[GitHub] spark issue #16248: [SPARK-18810][SPARKR] SparkR install.spark does not work...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16248 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16248: [SPARK-18810][SPARKR] SparkR install.spark does not work...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16248 **[Test build #70009 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70009/consoleFull)** for PR 16248 at commit [`3e5034d`](https://github.com/apache/spark/commit/3e5034d18aa1edfe77310a8b52bccd2cd30ef130).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16248: [SPARK-18810][SPARKR] SparkR install.spark does not work...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16248 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70009/
[GitHub] spark issue #16104: [SPARK-18675][SQL] CTAS for hive serde table should work...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16104 Merged build finished. Test PASSed.
[GitHub] spark issue #16104: [SPARK-18675][SQL] CTAS for hive serde table should work...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16104 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70006/
[GitHub] spark issue #16104: [SPARK-18675][SQL] CTAS for hive serde table should work...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16104 **[Test build #70006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70006/consoleFull)** for PR 16104 at commit [`8607425`](https://github.com/apache/spark/commit/8607425d025944204ae38c38679a9204ffd1c144).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16219: [SPARK-18790][SS] Keep a general offset history o...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16219
[GitHub] spark issue #16219: [SPARK-18790][SS] Keep a general offset history of strea...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16219 Thanks! Merging to master and 2.1.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13909 **[Test build #70010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70010/consoleFull)** for PR 13909 at commit [`f418062`](https://github.com/apache/spark/commit/f418062e8c54732c4b78716d27b8c699ac9df980).
[GitHub] spark issue #16219: [SPARK-18790][SS] Keep a general offset history of strea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16219 **[Test build #3493 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3493/consoleFull)** for PR 16219 at commit [`0830349`](https://github.com/apache/spark/commit/083034925d068c1c7c9123d97fc3e647da4faee4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16220 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70003/
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16220 Merged build finished. Test PASSed.
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16220 **[Test build #70003 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70003/consoleFull)** for PR 16220 at commit [`4be4149`](https://github.com/apache/spark/commit/4be4149d81d9860445ce4b53ae5951c1467632f4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class CartesianDeserializer(Serializer):`
  * `class PairDeserializer(Serializer):`
  * `case class FileStreamSourceOffset(logOffset: Long) extends Offset`
[GitHub] spark issue #16251: [SPARK-18826][SS]Add 'newestFirst' option to FileStreamS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16251 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70002/
[GitHub] spark issue #16251: [SPARK-18826][SS]Add 'newestFirst' option to FileStreamS...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16251 Merged build finished. Test PASSed.
[GitHub] spark issue #16251: [SPARK-18826][SS]Add 'newestFirst' option to FileStreamS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16251 **[Test build #70002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70002/consoleFull)** for PR 16251 at commit [`58a57d4`](https://github.com/apache/spark/commit/58a57d4004c45ff2290b95ec8c70ef95828d379b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15915#discussion_r91893984

--- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -331,7 +332,7 @@ private[spark] class MemoryStore(
     var unrollMemoryUsedByThisBlock = 0L
     // Underlying buffer for unrolling the block
     val redirectableStream = new RedirectableOutputStream
-    val bbos = new ChunkedByteBufferOutputStream(initialMemoryThreshold.toInt, allocator)
+    val bbos = new ChunkedByteBufferOutputStream(chunkSize, allocator)
--- End diff --

Don't we need to add a check for the size? It is still exposed to overflow when converting `pageSizeBytes` from long to int, right?
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15915#discussion_r91892123

--- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -78,6 +80,7 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long)
   }
   // Note: use getSizeAsKb (not bytes) to maintain compatibility if no units are provided
   blockSize = conf.getSizeAsKb("spark.broadcast.blockSize", "4m").toInt * 1024
--- End diff --

`spark.broadcast.blockSize` has a special meaning. I don't think we should replace it with `pageSizeBytes`.
[GitHub] spark issue #16248: [SPARK-18810][SPARKR] SparkR install.spark does not work...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16248 **[Test build #70009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70009/consoleFull)** for PR 16248 at commit [`3e5034d`](https://github.com/apache/spark/commit/3e5034d18aa1edfe77310a8b52bccd2cd30ef130).
[GitHub] spark pull request #16248: [SPARK-18810][SPARKR] SparkR install.spark does n...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16248#discussion_r91891490

--- Diff: R/pkg/R/utils.R ---
@@ -851,3 +851,12 @@ rbindRaws <- function(inputData){
   out[!rawcolumns] <- lapply(out[!rawcolumns], unlist)
   out
 }
+
+# Get basename without extension from URL
+basenameSansExtFromUrl <- function(url) {
--- End diff --

My concern was bringing in another dependency just for this (it's in `tools`). The regex was in fact copy-pasted from `file_path_sans_ext` (hence the name), except for the compression part, which is what you are referring to. I could copy that over as well. Would you prefer `compression` to be TRUE (the default is FALSE) so that `.gz` is removed?

```
> library(tools)
> file_path_sans_ext
function (x, compression = FALSE)
{
    if (compression)
        x <- sub("[.](gz|bz2|xz)$", "", x)
    sub("([^.]+)\\.[[:alnum:]]+$", "\\1", x)
}
```
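For readers less familiar with R's `tools` package, the extension-stripping logic under discussion can be sketched in Java (a hedged translation, not the SparkR code: the helper name `sansExt` is hypothetical, and POSIX `[[:alnum:]]` becomes `\p{Alnum}` in Java regex syntax):

```java
public class SansExt {
    // Mirrors tools::file_path_sans_ext: optionally strip a compression
    // suffix (.gz/.bz2/.xz) first, then drop the final ".<alnum>" extension.
    static String sansExt(String x, boolean compression) {
        if (compression) {
            x = x.replaceFirst("[.](gz|bz2|xz)$", "");
        }
        return x.replaceFirst("([^.]+)\\.\\p{Alnum}+$", "$1");
    }

    public static void main(String[] args) {
        // ".tgz" is a single extension, so only one pass is needed.
        System.out.println(sansExt("spark-2.1.0-bin-hadoop2.7.tgz", false));
        // With compression = true, ".gz" is peeled off before ".csv".
        System.out.println(sansExt("data.csv.gz", true));
    }
}
```

Note that dotted version numbers such as `2.1.0` survive because the pattern is anchored at the end of the string, which is the property the copied regex relies on.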
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16030 the new behavior LGTM, but I'm not sure if we still need to keep the old behavior
[GitHub] spark issue #16219: [SPARK-18790][SS] Keep a general offset history of strea...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16219 LGTM
[GitHub] spark pull request #16142: [SPARK-18716][CORE] Restrict the disk usage of sp...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16142#discussion_r91889826

--- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala ---
@@ -90,6 +91,10 @@ private[spark] class EventLoggingListener(
    * Creates the log file in the configured log directory.
    */
   def start() {
+    val statusList = Option(fileSystem.listStatus(new Path(logBaseDir))).map(_.toSeq)
+      .getOrElse(Seq[FileStatus]())
+    EventLoggingListener.cleanRedundantLogFiles(sparkConf, fileSystem, statusList)
--- End diff --

Makes sense. I will revert the related changes first.
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14638 Do you mean replacing the whole of the current `TableReader.scala`, which was introduced in SPARK-1251? I guess Spark chose this direct-access approach at the time for performance reasons. Yes, this option targets only `TextInputFormat`s. For non-file-based Hive tables like Orc/Parquet, this option is ignored.

```scala
val isTextInputFormatTable = classOf[TextInputFormat].isAssignableFrom(hiveTable.getInputFormatClass)
```
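The `isAssignableFrom` check quoted above asks whether the table's input-format class is `TextInputFormat` or a subclass of it. A minimal sketch with stand-in classes (the class names here are hypothetical placeholders, not the real Hadoop types):

```java
public class AssignabilityDemo {
    // Hypothetical stand-ins for Hadoop's input-format hierarchy.
    static class InputFormat {}
    static class TextInputFormat extends InputFormat {}
    static class SequenceFileInputFormat extends InputFormat {}

    public static void main(String[] args) {
        // True: a TextInputFormat table (or any subclass) would pass the check.
        System.out.println(
            TextInputFormat.class.isAssignableFrom(TextInputFormat.class));
        // False: sibling formats fail the check, so the option would be skipped.
        System.out.println(
            TextInputFormat.class.isAssignableFrom(SequenceFileInputFormat.class));
    }
}
```

`A.isAssignableFrom(B)` reads "a `B` instance can be assigned to a variable of type `A`", which is why the target class goes on the left.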
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/15915#discussion_r9169

--- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -78,6 +80,7 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long)
   }
   // Note: use getSizeAsKb (not bytes) to maintain compatibility if no units are provided
   blockSize = conf.getSizeAsKb("spark.broadcast.blockSize", "4m").toInt * 1024
+  chunkSize = SparkEnv.get.memoryManager.pageSizeBytes.toInt
   checksumEnabled = conf.getBoolean("spark.broadcast.checksum", true)
--- End diff --

@JoshRosen We use `SparkEnv.get.memoryManager.pageSizeBytes` as the chunk size. Since `pageSizeBytes` returns a `Long`, there is still a potential integer overflow, isn't there? Besides, users will never know the low-level details, nor the effect on chunk size when they modify `pageSizeBytes`.
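The truncation the reviewers are worried about is easy to reproduce on the JVM; a minimal sketch in Java, whose narrowing cast has the same semantics as Scala's `.toInt`:

```java
public class ChunkSizeOverflow {
    public static void main(String[] args) {
        // Any long value at or above 2^31 wraps when narrowed to int.
        long pageSizeBytes = 1L << 31;        // 2147483648 bytes, i.e. 2 GiB
        int chunkSize = (int) pageSizeBytes;  // silent narrowing, like Scala's .toInt
        System.out.println(chunkSize);        // prints -2147483648

        // Math.toIntExact is one way to make the overflow loud instead of silent.
        try {
            Math.toIntExact(pageSizeBytes);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```

A negative chunk size passed to a buffer constructor would typically fail much later and far from the cause, which is why the review asks for an explicit size check at the conversion site.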
[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15915 **[Test build #70008 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70008/consoleFull)** for PR 15915 at commit [`8551892`](https://github.com/apache/spark/commit/85518921494bb4e24fcd913bafba45025da126cd).
[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15915 **[Test build #70007 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70007/consoleFull)** for PR 15915 at commit [`45aeddb`](https://github.com/apache/spark/commit/45aeddb95984fb9e3940bea3e6227977f44033e8).
[GitHub] spark issue #16245: [SPARK-18824][SQL] Add optimizer rule to reorder Filter ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16245 Merged build finished. Test PASSed.
[GitHub] spark issue #16245: [SPARK-18824][SQL] Add optimizer rule to reorder Filter ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16245 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/6/
[GitHub] spark issue #16146: [SPARK-18091] [SQL] [BACKPORT-1.6] Deep if expressions c...
Github user kapilsingh5050 commented on the issue: https://github.com/apache/spark/pull/16146 Yes, I'll do that, but the test failures here are different. I still have to figure out the root cause.
[GitHub] spark issue #16245: [SPARK-18824][SQL] Add optimizer rule to reorder Filter ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16245 **[Test build #6 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/6/consoleFull)** for PR 16245 at commit [`66a3d98`](https://github.com/apache/spark/commit/66a3d983d978b902858a34dde992640a489f5351).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16104: [SPARK-18675][SQL] CTAS for hive serde table should work...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16104 **[Test build #70006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70006/consoleFull)** for PR 16104 at commit [`8607425`](https://github.com/apache/spark/commit/8607425d025944204ae38c38679a9204ffd1c144).
[GitHub] spark issue #16146: [SPARK-18091] [SQL] [BACKPORT-1.6] Deep if expressions c...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16146 hi @kapilsingh5050, can you also include https://github.com/apache/spark/pull/16244? It does fix the Maven tests.
[GitHub] spark issue #16104: [SPARK-18675][SQL] CTAS for hive serde table should work...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16104 LGTM
[GitHub] spark pull request #16104: [SPARK-18675][SQL] CTAS for hive serde table shou...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16104#discussion_r91886939

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -121,21 +121,61 @@ case class InsertIntoHiveTable(
     return dir
   }

-  private def getExternalScratchDir(extURI: URI, hadoopConf: Configuration): Path = {
-    getStagingDir(new Path(extURI.getScheme, extURI.getAuthority, extURI.getPath), hadoopConf)
+  private def getExternalScratchDir(extURI: URI): Path = {
+    getStagingDir(new Path(extURI.getScheme, extURI.getAuthority, extURI.getPath))
   }

-  def getExternalTmpPath(path: Path, hadoopConf: Configuration): Path = {
+  def getExternalTmpPath(path: Path): Path = {
+    val hiveVersion = externalCatalog.asInstanceOf[HiveExternalCatalog].client.version.fullVersion
+    if (hiveVersion.startsWith("0.12") ||
+        hiveVersion.startsWith("0.13") ||
+        hiveVersion.startsWith("0.14") ||
+        hiveVersion.startsWith("1.0")) {
+      oldStyleExternalTempPath(path)
+    } else if (hiveVersion.startsWith("1.1") || hiveVersion.startsWith("1.2")) {
+      newStyleExternalTempPath(path)
+    } else {
+      throw new IllegalStateException("Unsupported hive version: " + hiveVersion)
--- End diff --

uh, I see. Thanks!
[GitHub] spark issue #16245: [SPARK-18824][SQL] Add optimizer rule to reorder Filter ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16245 **[Test build #70005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70005/consoleFull)** for PR 16245 at commit [`63c50b8`](https://github.com/apache/spark/commit/63c50b8066a77506c6751710d5b5b5edb77ca933). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16104: [SPARK-18675][SQL] CTAS for hive serde table shou...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16104#discussion_r91886458

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -121,21 +121,61 @@ case class InsertIntoHiveTable(
     return dir
   }

-  private def getExternalScratchDir(extURI: URI, hadoopConf: Configuration): Path = {
-    getStagingDir(new Path(extURI.getScheme, extURI.getAuthority, extURI.getPath), hadoopConf)
+  private def getExternalScratchDir(extURI: URI): Path = {
+    getStagingDir(new Path(extURI.getScheme, extURI.getAuthority, extURI.getPath))
   }

-  def getExternalTmpPath(path: Path, hadoopConf: Configuration): Path = {
+  def getExternalTmpPath(path: Path): Path = {
+    val hiveVersion = externalCatalog.asInstanceOf[HiveExternalCatalog].client.version.fullVersion
+    if (hiveVersion.startsWith("0.12") ||
+        hiveVersion.startsWith("0.13") ||
+        hiveVersion.startsWith("0.14") ||
+        hiveVersion.startsWith("1.0")) {
+      oldStyleExternalTempPath(path)
+    } else if (hiveVersion.startsWith("1.1") || hiveVersion.startsWith("1.2")) {
+      newStyleExternalTempPath(path)
+    } else {
+      throw new IllegalStateException("Unsupported hive version: " + hiveVersion)
--- End diff --

We will fail in other places anyway, e.g. `IsolatedClientLoader.hiveVersion`
[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r91886381

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -432,6 +458,78 @@ private[spark] class Executor(
   }

   /**
+   * Supervises the killing / cancellation of a task by sending the interrupted flag, optionally
+   * sending a Thread.interrupt(), and monitoring the task until it finishes.
+   */
+  private class TaskReaper(
+      taskRunner: TaskRunner,
+      val interruptThread: Boolean)
+    extends Runnable {
+
+    private[this] val taskId: Long = taskRunner.taskId
+
+    private[this] val killPollingFrequencyMs: Long =
+      conf.getTimeAsMs("spark.task.killPollingFrequency", "10s")
--- End diff --

+1 on the naming suggestion; I'll do this tomorrow.
[GitHub] spark issue #16252: [SPARK-18827][Core] Fix cannot read broadcast on disk
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16252 **[Test build #70004 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70004/consoleFull)** for PR 16252 at commit [`58acc06`](https://github.com/apache/spark/commit/58acc06148e243420ab12ced77749be1767c4bc0).
[GitHub] spark issue #16189: [SPARK-18761][CORE] Introduce "task reaper" to oversee t...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/16189 @lins05, I'll see if there's a way to get a nicer executor exit status to be reported back to the driver.
[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r91886287

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -432,6 +458,78 @@ private[spark] class Executor(
   }

   /**
+   * Supervises the killing / cancellation of a task by sending the interrupted flag, optionally
+   * sending a Thread.interrupt(), and monitoring the task until it finishes.
+   */
+  private class TaskReaper(
+      taskRunner: TaskRunner,
+      val interruptThread: Boolean)
+    extends Runnable {
+
+    private[this] val taskId: Long = taskRunner.taskId
+
+    private[this] val killPollingFrequencyMs: Long =
+      conf.getTimeAsMs("spark.task.killPollingFrequency", "10s")
+
+    private[this] val killTimeoutMs: Long = conf.getTimeAsMs("spark.task.killTimeout", "2m")
+
+    private[this] val takeThreadDump: Boolean =
+      conf.getBoolean("spark.task.threadDumpKilledTasks", true)
+
+    override def run(): Unit = {
+      val startTimeMs = System.currentTimeMillis()
+      def elapsedTimeMs = System.currentTimeMillis() - startTimeMs
+      try {
+        while (!taskRunner.isFinished && (elapsedTimeMs < killTimeoutMs || killTimeoutMs <= 0)) {
+          taskRunner.kill(interruptThread = interruptThread)
--- End diff --

That's a good point. I'll update this tomorrow to only interrupt once.
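The change JoshRosen agrees to above — re-sending the idempotent kill flag on every poll but delivering the thread interrupt only once — can be sketched outside Spark. Below is a hedged, language-agnostic illustration in Python; `ToyTask` and `reap` are invented names for illustration, not the actual `TaskReaper` code:

```python
import threading
import time

class ToyTask:
    """Stand-in for a running task that stops when asked to."""
    def __init__(self):
        self.finished = False
        self.interrupts = 0           # counts interrupt deliveries
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        self._stop.wait()             # simulate work until told to stop
        self.finished = True

    def request_kill(self):
        self._stop.set()              # idempotent: safe to call repeatedly

    def interrupt(self):
        self.interrupts += 1          # analogous to Thread.interrupt()

def reap(task, poll_interval_s=0.01, timeout_s=2.0):
    """Ask the task to die, interrupt it once, then poll until done or timeout."""
    deadline = time.monotonic() + timeout_s
    task.request_kill()
    task.interrupt()                  # delivered exactly once, outside the loop
    while not task.finished and time.monotonic() < deadline:
        task.request_kill()           # re-setting the kill flag is harmless
        time.sleep(poll_interval_s)
    return task.finished
```

The design point under discussion is visible here: the kill flag is idempotent and can be re-sent on every poll, whereas the interrupt is a side-effecting signal that should be delivered exactly once before the polling loop begins.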
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14638 Is there any high-level API that we can use to read hive tables? The current hive table reader is so low-level that we have to support features like `skip.header.line.count`, and I think it doesn't work well with non-file based hive tables.
[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16245#discussion_r91886133

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -514,6 +514,25 @@ case class OptimizeCodegen(conf: CatalystConf) extends Rule[LogicalPlan] {

 /**
+ * Reorders the predicates in `Filter` so more expensive expressions like UDF can evaluate later.
+ */
+object ReorderPredicatesInFilter extends Rule[LogicalPlan] with PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case Filter(pred, child) =>
+      // Reverses the expressions to get the suffix deterministic expressions in the predicate.
+      // E.g., the original expressions are 'a > 1, rand(0), 'b > 2, 'c > 3.
+      // The reversed expressions are 'c > 3, 'b > 2, rand(0), 'a > 1.
+      // The suffix deterministic expressions are 'c > 3, 'b > 2.
+      val (deterministicExprs, others) = splitConjunctivePredicates(pred).reverse
--- End diff --

The split is widely used in the optimizer. I think I may rewrite this reverse and span to alleviate the performance concern.
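For readers following the thread: the reverse-then-span trick discussed above isolates the trailing run of deterministic conjuncts, which is the only part of an ANDed predicate that can be safely reordered without changing how often a non-deterministic expression (like `rand(0)`) evaluates. A minimal sketch of just that splitting logic, in Python with hypothetical names (not Catalyst code):

```python
def split_trailing_deterministic(conjuncts, is_deterministic):
    """Split ANDed predicates into (untouchable prefix, trailing deterministic run).

    Walk the conjuncts from the end; the run of deterministic predicates at
    the tail is the portion a reordering rule may rearrange by cost.
    """
    n = 0
    for p in reversed(conjuncts):
        if is_deterministic(p):
            n += 1
        else:
            break                     # stop at the last non-deterministic predicate
    k = len(conjuncts) - n
    return conjuncts[:k], conjuncts[k:]
```

Using the example from the code comment — predicates `'a > 1, rand(0), 'b > 2, 'c > 3` — the prefix is `'a > 1, rand(0)` and the reorderable suffix is `'b > 2, 'c > 3`, matching the comment's "suffix deterministic expressions".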
[GitHub] spark pull request #16252: [SPARK-18827][Core] Fix cannot read broadcast on ...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/16252

[SPARK-18827][Core] Fix cannot read broadcast on disk

## What changes were proposed in this pull request?

Fix a bug where a broadcast variable could not be read back from disk.

## How was this patch tested?

Added a unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-18827

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16252.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16252

commit 58acc06148e243420ab12ced77749be1767c4bc0
Author: Yuming Wang
Date: 2016-12-12T05:44:20Z

    Fix cannot read broadcast on disk
[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15915 Merged build finished. Test FAILed.
[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15915 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70001/ Test FAILed.
[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15915 **[Test build #70001 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70001/consoleFull)** for PR 15915 at commit [`b021557`](https://github.com/apache/spark/commit/b02155798061255ef04cf61a911a0ff6467a6a7a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add StripedLock for each table's rela...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16135 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69998/ Test PASSed.
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add StripedLock for each table's rela...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16135 Merged build finished. Test PASSed.
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add StripedLock for each table's rela...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16135 **[Test build #69998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69998/consoleFull)** for PR 16135 at commit [`16c47c5`](https://github.com/apache/spark/commit/16c47c5e5ada1ec17555e679fa424d5b93e082c0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16219: [SPARK-18790][SS] Keep a general offset history of strea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16219 **[Test build #3493 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3493/consoleFull)** for PR 16219 at commit [`0830349`](https://github.com/apache/spark/commit/083034925d068c1c7c9123d97fc3e647da4faee4).
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16220 **[Test build #70003 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70003/consoleFull)** for PR 16220 at commit [`4be4149`](https://github.com/apache/spark/commit/4be4149d81d9860445ce4b53ae5951c1467632f4).
[GitHub] spark issue #16220: [SPARK-18796][SS]StreamingQueryManager should not block ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16220 retest this please
[GitHub] spark issue #16251: [SPARK-18826][SS]Add 'newestFirst' option to FileStreamS...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16251 **[Test build #70002 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70002/consoleFull)** for PR 16251 at commit [`58a57d4`](https://github.com/apache/spark/commit/58a57d4004c45ff2290b95ec8c70ef95828d379b).
[GitHub] spark pull request #16251: [SPARK-18826][SS]Add 'newestFirst' option to File...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/16251

[SPARK-18826][SS] Add 'newestFirst' option to FileStreamSource

## What changes were proposed in this pull request?

When starting a stream with a lot of backfill and `maxFilesPerTrigger`, the user often wants to process the most recent files first. This keeps latency low for recent data while historical data is slowly backfilled. This PR adds a new option `newestFirst` to control this behavior: when it is true, `FileStreamSource` sorts the files by modification time from newest to oldest, and takes the first `maxFilesPerTrigger` files as a new batch.

## How was this patch tested?

The added test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark newest-first

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16251.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16251

commit 58a57d4004c45ff2290b95ec8c70ef95828d379b
Author: Shixiong Zhu
Date: 2016-12-12T05:15:10Z

    Add 'newestFirst' option to FileStreamSource
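The batch-selection behavior the PR description outlines — sort files by modification time, optionally newest first, then take at most `maxFilesPerTrigger` — can be sketched independently of Spark. A hedged illustration in Python with invented names, not the actual `FileStreamSource` code:

```python
def select_batch(files_with_mtime, max_files, newest_first):
    """Pick the next batch of files from (name, mtime) pairs.

    With newest_first=True, recent files are served first (low latency for
    fresh data); with False, files are processed oldest-first (plain backfill).
    """
    ordered = sorted(files_with_mtime, key=lambda f: f[1], reverse=newest_first)
    return [name for name, _ in ordered[:max_files]]
```

With three files modified at times 1, 3, and 2, `newest_first=True` and a limit of 2 yields the files modified at times 3 and 2, while `newest_first=False` yields those modified at times 1 and 2.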
[GitHub] spark pull request #16248: [SPARK-18810][SPARKR] SparkR install.spark does n...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16248#discussion_r91883327

--- Diff: R/pkg/R/utils.R ---
@@ -851,3 +851,12 @@ rbindRaws <- function(inputData){
   out[!rawcolumns] <- lapply(out[!rawcolumns], unlist)
   out
 }
+
+# Get basename without extension from URL
+basenameSansExtFromUrl <- function(url) {
--- End diff --

can we use `file_path_sans_ext` [1] for removing the extension? I worry we might publish it as `.tar.gz` someday, and then removing just the last `.` will be insufficient.

[1] https://stat.ethz.ch/R-manual/R-patched/library/tools/html/fileutils.html
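The concern above is that stripping only the text after the last `.` breaks on compound extensions such as `.tar.gz` (and can also eat part of a version number like `2.7`). A small Python sketch of extension stripping that handles compound extensions — a hypothetical helper for illustration, not the SparkR implementation:

```python
import posixpath

def basename_sans_ext(url):
    """Return the URL's basename with any known archive extension removed.

    Compound extensions are listed before their suffixes (".tar.gz" before
    ".gz") so the longest match wins.
    """
    name = posixpath.basename(url)
    for ext in (".tar.gz", ".tar.bz2", ".tgz", ".tar", ".zip", ".gz"):
        if name.endswith(ext):
            return name[: -len(ext)]
    return name
```

Note how a naive "split on the last dot" would turn `spark-2.1.0-bin-hadoop2.7.tgz` into `spark-2.1.0-bin-hadoop2` if the file were published without an extension match, and would leave `.tar` behind on a `.tar.gz` — the whitelist-of-extensions approach (which is also what R's `tools::file_path_sans_ext(compression = TRUE)` addresses) avoids both.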
[GitHub] spark pull request #16248: [SPARK-18810][SPARKR] SparkR install.spark does n...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16248#discussion_r91883219

--- Diff: R/pkg/R/install.R ---
@@ -104,7 +113,12 @@ install.spark <- function(hadoopVersion = "2.7", mirrorUrl = NULL,
   if (tarExists && !overwrite) {
     message("tar file found.")
   } else {
-    robustDownloadTar(mirrorUrl, version, hadoopVersion, packageName, packageLocalPath)
+    if (releaseUrl != "") {
+      message("Downloading from alternate URL:\n- ", releaseUrl)
+      downloadUrl(releaseUrl, packageLocalPath, paste0("Fetch failed from ", mirrorUrl))
--- End diff --

this should be `releaseUrl` instead of `mirrorUrl` in the `paste0`?
[GitHub] spark issue #16249: [SPARKR] Refactor scripts for R
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16249 Can we open a JIRA for this? It's good to track this change.
[GitHub] spark pull request #16214: [SPARK-18325][SPARKR] Add example for using nativ...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16214#discussion_r91881854

--- Diff: examples/src/main/r/native-r-package.R ---
@@ -0,0 +1,68 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This example illustrates how to install third-party R packages to executors
+# in your SparkR jobs distributed by "spark.lapply".
+#
+# Note: This example will install packages to a temporary directory on your machine.
+# The directory will be removed automatically when the example exits.
+# Your environment should be connected to the internet to run this example;
+# otherwise, you should change "repos" to your private repository URL.
+# The environment also needs the necessary tools, such as gcc, to compile
+# and install the R package "e1071".
+#
+# To run this example use
+#   ./bin/spark-submit examples/src/main/r/native-r-package.R
+
+# Load SparkR library into your R session
+library(SparkR)
+
+# Initialize SparkSession
+sparkR.session(appName = "SparkR-native-r-package-example")
+
+# $example on$
+# The directory where the third-party R packages are installed.
+libDir <- paste0(tempdir(), "/", "Rlib")
+dir.create(libDir)
+
+# Download the e1071 package source code to a directory
+packagesDir <- paste0(tempdir(), "/", "packages")
+dir.create(packagesDir)
+download.packages("e1071", packagesDir, repos = "https://cran.r-project.org")
+filename <- list.files(packagesDir, "^e1071")
+packagesPath <- file.path(packagesDir, filename)
+# Add the third-party R package to be downloaded with this Spark job on every node.
+spark.addFile(packagesPath)
+
+path <- spark.getSparkFiles(filename)
+costs <- exp(seq(from = log(1), to = log(1000), length.out = 5))
+train <- function(cost) {
+  if ("e1071" %in% rownames(installed.packages(lib = libDir)) == FALSE) {
+    install.packages(path, repos = NULL, type = "source")
--- End diff --

Yeah, we have the package content, but it's a source package rather than a binary package, so we cannot use `library` to load the package directly. This is the pain point for this example. If we illustrated this example with a binary package, we would have to provide scripts for different OS versions, and it would require all nodes in the user's cluster to have the same architecture. So I use a source package; I think it's a more universal example.
[GitHub] spark issue #16214: [SPARK-18325][SPARKR] Add example for using native R pac...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16214 Merged build finished. Test PASSed.
[GitHub] spark issue #16214: [SPARK-18325][SPARKR] Add example for using native R pac...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16214 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/7/ Test PASSed.
[GitHub] spark issue #16214: [SPARK-18325][SPARKR] Add example for using native R pac...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16214 **[Test build #7 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/7/consoleFull)** for PR 16214 at commit [`d3ec5fa`](https://github.com/apache/spark/commit/d3ec5fabf686c4a96a5032d716e5ef1eff7fb8c1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16245#discussion_r91881557

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -514,6 +514,25 @@ case class OptimizeCodegen(conf: CatalystConf) extends Rule[LogicalPlan] {

 /**
+ * Reorders the predicates in `Filter` so more expensive expressions like UDF can evaluate later.
+ */
+object ReorderPredicatesInFilter extends Rule[LogicalPlan] with PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case Filter(pred, child) =>
+      // Reverses the expressions to get the suffix deterministic expressions in the predicate.
+      // E.g., the original expressions are 'a > 1, rand(0), 'b > 2, 'c > 3.
+      // The reversed expressions are 'c > 3, 'b > 2, rand(0), 'a > 1.
+      // The suffix deterministic expressions are 'c > 3, 'b > 2.
+      val (deterministicExprs, others) = splitConjunctivePredicates(pred).reverse
--- End diff --

how is the performance of this split, reverse, and span?
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r91881499

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java ---
@@ -57,6 +60,12 @@ public BufferHolder(UnsafeRow row, int initialSize) {
     this.row.pointTo(buffer, buffer.length);
   }

+  public BufferHolder(int initialSizeInBytes) {
--- End diff --

This is a special use of `BufferHolder`. Better to add a few comments explaining it.
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r91881279

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -56,33 +58,93 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
   }

   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-    val arrayClass = classOf[GenericArrayData].getName
-    val values = ctx.freshName("values")
-    ctx.addMutableState("Object[]", values, s"this.$values = null;")
+    val array = ctx.freshName("array")

-    ev.copy(code = s"""
-      this.$values = new Object[${children.size}];""" +
+    val et = dataType.elementType
+    val evals = children.map(e => e.genCode(ctx))
+    val isPrimitiveArray = ctx.isPrimitiveType(et)
+    val primitiveTypeName = if (isPrimitiveArray) ctx.primitiveTypeName(et) else ""
+    val (preprocess, arrayData, arrayWriter) =
+      genArrayData.getCodeArrayData(ctx, et, children.size, isPrimitiveArray, array)
+
+    ev.copy(code =
+      preprocess +
       ctx.splitExpressions(
         ctx.INPUT_ROW,
-        children.zipWithIndex.map { case (e, i) =>
-          val eval = e.genCode(ctx)
-          eval.code + s"""
-            if (${eval.isNull}) {
-              $values[$i] = null;
+        evals.zipWithIndex.map { case (eval, i) =>
+          eval.code +
+            (if (isPrimitiveArray) {
+              (if (!children(i).nullable) {
+                s"\n$arrayWriter.write($i, ${eval.value});"
+              } else {
+                s"""
+                if (${eval.isNull}) {
+                  $arrayWriter.setNull$primitiveTypeName($i);
+                } else {
+                  $arrayWriter.write($i, ${eval.value});
+                }
+                """
+              })
             } else {
-              $values[$i] = ${eval.value};
-            }
-          """
+              s"""
+              if (${eval.isNull}) {
+                $array[$i] = null;
+              } else {
+                $array[$i] = ${eval.value};
+              }
+              """
+            })
         }) +
-      s"""
-        final ArrayData ${ev.value} = new $arrayClass($values);
-        this.$values = null;
-      """, isNull = "false")
+      s"\nfinal ArrayData ${ev.value} = $arrayData;\n",
+      isNull = "false")
   }

   override def prettyName: String = "array"
 }

+private [sql] object genArrayData {
--- End diff --

Name convention: genArrayData -> GenArrayData.
[GitHub] spark issue #16195: [Spark-18765] [CORE] Make values for spark.yarn.{am|driv...
Github user daisukebe commented on the issue: https://github.com/apache/spark/pull/16195 @vanzin , 2.0 already has this capability per https://issues.apache.org/jira/browse/SPARK-529, thus my patch targets 1.6.
[GitHub] spark pull request #16214: [SPARK-18325][SPARKR] Add example for using nativ...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16214#discussion_r91881006 --- Diff: docs/sparkr.md --- @@ -472,21 +472,17 @@ should fit in a single machine. If that is not the case they can do something li `dapply` -{% highlight r %} -# Perform distributed training of multiple models with spark.lapply. Here, we pass -# a read-only list of arguments which specifies family the generalized linear model should be. --- End diff -- Sounds good, updated.
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r91880888 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java --- @@ -18,8 +18,11 @@ package org.apache.spark.sql.catalyst.expressions.codegen; import org.apache.spark.sql.catalyst.expressions.UnsafeRow; +import org.apache.spark.unsafe.array.ByteArrayMethods; --- End diff -- and this.
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r91880870 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java --- @@ -18,8 +18,11 @@ package org.apache.spark.sql.catalyst.expressions.codegen; import org.apache.spark.sql.catalyst.expressions.UnsafeRow; +import org.apache.spark.unsafe.array.ByteArrayMethods; import org.apache.spark.unsafe.Platform; +import static org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.calculateHeaderPortionInBytes; --- End diff -- Unnecessary import?
[GitHub] spark pull request #15915: [SPARK-18485][CORE] Underlying integer overflow w...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/15915#discussion_r91880781 --- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala --- @@ -223,8 +222,10 @@ private[spark] abstract class MemoryManager( case MemoryMode.OFF_HEAP => offHeapExecutionMemoryPool.poolSize } val size = ByteArrayMethods.nextPowerOf2(maxTungstenMemory / cores / safetyFactor) -val default = math.min(maxPageSize, math.max(minPageSize, size)) -conf.getSizeAsBytes("spark.buffer.pageSize", default) +val maxPageSize = math.min(64L * minPageSize, math.max(minPageSize, size)) +val userSetting = conf.getSizeAsBytes("spark.buffer.pageSize") +// In case of too large page size. +math.min(userSetting, maxPageSize) } --- End diff -- @JoshRosen The `SparkEnv.memoryManager.pageSizeBytes` returns `Long`; if we reuse it as the chunk size, there is still a potential integer overflow, isn't there? Here, I restricted the upper limit of the page size in case it is too large.
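The clamping logic discussed in this comment can be sketched in isolation. This is a hypothetical standalone sketch (the object and method names are illustrative, not the actual `MemoryManager` code):

```scala
// Hypothetical sketch of the clamping discussed above: the user-configured
// page size (a Long) is capped so that code which later treats the page
// size as a chunk length cannot receive an unreasonably large value.
object PageSizeSketch {
  def clampPageSize(userSetting: Long, minPageSize: Long, computed: Long): Long = {
    // Upper bound: at most 64x the minimum page size, but never below the minimum.
    val maxPageSize = math.min(64L * minPageSize, math.max(minPageSize, computed))
    // Respect the user setting only up to the computed upper bound.
    math.min(userSetting, maxPageSize)
  }
}
```

With a 1 MB minimum and a 4 MB computed size, an oversized user setting is clamped to 4 MB, while a smaller user setting passes through unchanged.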
[GitHub] spark issue #16245: [SQL][WIP] Add optimizer rule to reorder Filter predicat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16245 **[Test build #6 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/6/consoleFull)** for PR 16245 at commit [`66a3d98`](https://github.com/apache/spark/commit/66a3d983d978b902858a34dde992640a489f5351).
[GitHub] spark issue #16214: [SPARK-18325][SPARKR] Add example for using native R pac...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16214 **[Test build #7 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/7/consoleFull)** for PR 16214 at commit [`d3ec5fa`](https://github.com/apache/spark/commit/d3ec5fabf686c4a96a5032d716e5ef1eff7fb8c1).
[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15915 **[Test build #70001 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70001/consoleFull)** for PR 15915 at commit [`b021557`](https://github.com/apache/spark/commit/b02155798061255ef04cf61a911a0ff6467a6a7a).
[GitHub] spark issue #16135: [SPARK-18700][SQL] Add StripedLock for each table's rela...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16135 **[Test build #69998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69998/consoleFull)** for PR 16135 at commit [`16c47c5`](https://github.com/apache/spark/commit/16c47c5e5ada1ec17555e679fa424d5b93e082c0).
[GitHub] spark pull request #16135: [SPARK-18700][SQL] Add StripedLock for each table...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/16135#discussion_r91879183 --- Diff: core/src/main/scala/org/apache/spark/metrics/source/StaticSources.scala --- @@ -105,6 +111,7 @@ object HiveCatalogMetrics extends Source { METRIC_FILE_CACHE_HITS.dec(METRIC_FILE_CACHE_HITS.getCount()) METRIC_HIVE_CLIENT_CALLS.dec(METRIC_HIVE_CLIENT_CALLS.getCount()) METRIC_PARALLEL_LISTING_JOB_COUNT.dec(METRIC_PARALLEL_LISTING_JOB_COUNT.getCount()) + METRIC_DATASOUCE_TABLE_CACHE_HITS.dec(METRIC_DATASOUCE_TABLE_CACHE_HITS.getCount()) --- End diff -- Sorry, this newly added metric will be deleted in the next patch, as mentioned in the earlier comment.
[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/14638 Thank you so much, @jamartinh , @srowen , @HyukjinKwon , and @gatorsmile . We can distinguish the two existing problems here. First, **a)** Spark returns an incorrect result for an existing Hive table that already has the `skip.header.line.count` table property. This is the most common use case, which this issue aimed to solve. Second, more ridiculously, **b)** Spark can create a table with the `skip.header.line.count` table property, and only Hive returns the correct result from that table.

**SPARK (Current master branch)**
```scala
scala> sql("CREATE TABLE t2 (id INT, value VARCHAR(10)) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' TBLPROPERTIES('skip.header.line.count'='1')")
scala> sql("LOAD DATA LOCAL INPATH '/data/test.csv' OVERWRITE INTO TABLE t2")
scala> sql("SELECT * FROM t2").show
+----+-----+
|  id|value|
+----+-----+
|null|   c2|
|   1|    a|
|   2|    b|
+----+-----+
```

**Hive**
```
hive> select * from t2;
OK
1	a
2	b
```

@gatorsmile . I totally agree on the Apache Spark development direction. But, IMO, `TBLPROPERTIES` vs. `OPTION` is not the proper issue for this PR, because this PR only updates `TableReader.scala` to support the existing table property, case **a)**. For `TBLPROPERTIES`, I simply used it because it is already supported in Spark. I can update the PR description to focus on **a)** instead of **b)**. Someday, Apache Spark may delete (or block) the `TBLPROPERTIES` SQL syntax in favor of the `OPTION` syntax. That is okay; it would just be a kind of intentional regression. No problem at all. However, even in that case, we had better read a Hive table with `skip.header.line.count` correctly.
[GitHub] spark issue #16086: [SPARK-18653][SQL] Fix incorrect space padding for unico...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/16086 I am thinking about a simpler approach. However, it is fine to close this for now.
[GitHub] spark issue #15297: [SPARK-9862]Handling data skew
Github user YuhuWang2002 commented on the issue: https://github.com/apache/spark/pull/15297 retest this please
[GitHub] spark issue #15620: [SPARK-18091] [SQL] Deep if expressions cause Generated ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15620 the test is good now https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.0-test-maven-hadoop-2.2/
[GitHub] spark issue #11045: [SPARK-8321][SQL][WIP] Authorization Support(on all oper...
Github user winningsix commented on the issue: https://github.com/apache/spark/pull/11045 @yaooqinn yes, the validation is working on the server side.
[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/15915 @srowen Sorry for the delay, I will update it as soon as possible based on the comments.
[GitHub] spark issue #15904: [SPARK-18470][STREAMING][WIP] Provide Spark Streaming Mo...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/15904 @vanzin Sorry for the delay, I will update it as soon as possible based on your comments.
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user michalsenkyr commented on the issue: https://github.com/apache/spark/pull/16240 Possible optimization: instead of conversions using `to`, we can use `Builder`s, which would eliminate the conversion overhead. This would require adding a new codegen method that operates similarly to `MapObjects` but uses a provided `Builder` to build the collection directly. I will wait for a response to this PR before attempting any more modifications.
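As a rough illustration of the suggestion in plain Scala (not the Catalyst codegen itself; the names here are illustrative), building a collection through its `Builder` avoids materializing an intermediate collection and then converting it with `to`:

```scala
// Sketch: constructing a Seq directly through a Builder instead of
// building an intermediate collection and converting it with `to`.
object BuilderSketch {
  def buildSeq(n: Int): Seq[Int] = {
    val builder = Seq.newBuilder[Int]
    builder.sizeHint(n) // pre-size where the underlying collection supports it
    var i = 0
    while (i < n) {
      builder += i
      i += 1
    }
    builder.result() // the final collection is built exactly once
  }
}
```

A codegen variant of this would emit the `+=` calls per element, much like `MapObjects` emits per-element assignments today.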
[GitHub] spark issue #16249: [SPARKR] Refactor scripts for R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16249 Merged build finished. Test PASSed.
[GitHub] spark issue #16249: [SPARKR] Refactor scripts for R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16249 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69996/ Test PASSed.
[GitHub] spark issue #16249: [SPARKR] Refactor scripts for R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16249 **[Test build #69996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69996/consoleFull)** for PR 16249 at commit [`d237526`](https://github.com/apache/spark/commit/d237526a2aec8f2e5f57172f9933c8c2d1963d39). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16180: [DOCS][MINOR] Clarify Where AccumulatorV2s are Displayed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16180 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69997/ Test PASSed.
[GitHub] spark issue #16180: [DOCS][MINOR] Clarify Where AccumulatorV2s are Displayed
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16180 Merged build finished. Test PASSed.
[GitHub] spark issue #16180: [DOCS][MINOR] Clarify Where AccumulatorV2s are Displayed
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16180 **[Test build #69997 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69997/consoleFull)** for PR 16180 at commit [`0b31d6c`](https://github.com/apache/spark/commit/0b31d6cc2bc245f5270b0de5f33cc5a66ad9f135). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13909 Merged build finished. Test PASSed.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13909 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69995/ Test PASSed.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13909 **[Test build #69995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69995/consoleFull)** for PR 13909 at commit [`438944b`](https://github.com/apache/spark/commit/438944b0cc79d824898d44032674cb77395b59fb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16180: [DOCS][MINOR] Clarify Where AccumulatorV2s are Displayed
Github user anabranch commented on the issue: https://github.com/apache/spark/pull/16180 @srowen completed.
[GitHub] spark issue #16180: [DOCS][MINOR] Clarify Where AccumulatorV2s are Displayed
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16180 **[Test build #69997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69997/consoleFull)** for PR 16180 at commit [`0b31d6c`](https://github.com/apache/spark/commit/0b31d6cc2bc245f5270b0de5f33cc5a66ad9f135).
[GitHub] spark pull request #16180: [DOCS][MINOR] Clarify Where AccumulatorV2s are Di...
Github user anabranch commented on a diff in the pull request: https://github.com/apache/spark/pull/16180#discussion_r91865764 --- Diff: docs/programming-guide.md --- @@ -1345,14 +1345,17 @@ therefore be efficiently supported in parallel. They can be used to implement co MapReduce) or sums. Spark natively supports accumulators of numeric types, and programmers can add support for new types. -If accumulators are created with a name, they will be -displayed in Spark's UI. This can be useful for understanding the progress of -running stages (NOTE: this is not yet supported in Python). +As a user, you can create `Accumulators` that are both named and unnamed. Named accumulators will display in Spark's UI along with their running totals during execution. As seen in the image below, a named accumulator (in this instance `counter`) will display --- End diff -- Made these clarifications. I think it is important, however, to call out that they can be named or unnamed, so I just rephrased that. Saying only that something can have a name does not make it clear enough, to me, that it can also be unnamed.
[GitHub] spark issue #16249: [SPARKR] Refactor scripts for R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16249 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69994/ Test PASSed.
[GitHub] spark issue #16249: [SPARKR] Refactor scripts for R
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16249 Merged build finished. Test PASSed.
[GitHub] spark issue #16249: [SPARKR] Refactor scripts for R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16249 **[Test build #69994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69994/consoleFull)** for PR 16249 at commit [`550eaa9`](https://github.com/apache/spark/commit/550eaa9e551f171cd4bbda3cf4ff7bb1c70a61fd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16250: [CORE][MINOR] Stylistic changes in DAGScheduler (to ease...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16250 Merged build finished. Test PASSed.
[GitHub] spark issue #16250: [CORE][MINOR] Stylistic changes in DAGScheduler (to ease...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16250 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69991/ Test PASSed.