[GitHub] spark issue #20124: [WIP][SPARK-22126][ML] Fix model-specific optimization s...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/20124 (Happy new year!) Just commented on the JIRA; let me know what you think. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20127: [SPARK-22932] [SQL] Refactor AnalysisContext
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20127#discussion_r159150948
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -70,6 +71,8 @@ object AnalysisContext {
   }

   def get: AnalysisContext = value.get()
+
+  def reset(): Unit = value.remove()
--- End diff --
`private`
[GitHub] spark pull request #20127: [SPARK-22932] [SQL] Refactor AnalysisContext
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20127#discussion_r159150900
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -95,6 +98,17 @@ class Analyzer(
     this(catalog, conf, conf.optimizerMaxIterations)
   }

+  override def execute(plan: LogicalPlan): LogicalPlan = {
+    AnalysisContext.reset()
+    try {
+      executeSameContext(plan)
+    } finally {
+      AnalysisContext.reset()
+    }
+  }
+
+  private def executeSameContext(plan: LogicalPlan): LogicalPlan = super.execute(plan)
--- End diff --
`executeWithSameContext`?
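The pattern under review here, resetting a thread-local context around the top-level `execute` while nested calls reuse the current context, can be sketched in isolation. The simplified `AnalysisContext` below is illustrative (it stores an `Int` rather than Spark's real context object); only the `reset`/`execute`/`executeSameContext` shape mirrors the diff:

```scala
// Minimal sketch of the reset-around-execute pattern from the diff above.
// The Int-valued context is a stand-in for Spark's real AnalysisContext.
object AnalysisContext {
  private val value = new ThreadLocal[Int]() {
    override def initialValue(): Int = 0
  }
  def get: Int = value.get()
  def set(v: Int): Unit = value.set(v)
  def reset(): Unit = value.remove()
}

class Analyzer {
  // Top-level entry point: starts from, and leaves behind, a fresh context.
  def execute(plan: String): String = {
    AnalysisContext.reset()
    try {
      executeSameContext(plan)
    } finally {
      AnalysisContext.reset() // never leak state to the next query on this thread
    }
  }

  // Nested analysis calls go through this variant so they share the
  // caller's context instead of wiping it.
  private def executeSameContext(plan: String): String = {
    AnalysisContext.set(AnalysisContext.get + 1)
    s"$plan:depth=${AnalysisContext.get}"
  }
}
```

The `finally` block is what makes the top-level entry safe: even if analysis throws, the thread-local is cleared before the thread is reused.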
[GitHub] spark issue #20132: [SPARK-13030][ML] Follow-up cleanups for OneHotEncoderEs...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20132 The simplified logic for encoder looks good to me.
[GitHub] spark issue #20132: [SPARK-13030][ML] Follow-up cleanups for OneHotEncoderEs...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20132 LGTM
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20133 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85576/ Test FAILed.
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20133 **[Test build #85576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85576/testReport)** for PR 20133 at commit [`8ae8f18`](https://github.com/apache/spark/commit/8ae8f1832a62caf10a62511f339402c0d94f89ea).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20133 Merged build finished. Test FAILed.
[GitHub] spark issue #20129: [SPARK-22933][SPARKR] R Structured Streaming API for wit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20129 **[Test build #85577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85577/testReport)** for PR 20129 at commit [`137d1cb`](https://github.com/apache/spark/commit/137d1cb186aa826842ff7897cfd165429fb0b44b).
[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/20078 @felixcheung At the beginning, if numReceivers > totalExecutorCores, there are no CPU cores left for batch processing, and `ExecutorAllocationManager` can't listen to the metrics of any batches. As a result, it doesn't work.
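The constraint sharkdtu describes is simple arithmetic: each running receiver pins one core for its lifetime, so batch processing only gets what is left over. A hedged sketch (the helper name is mine, not Spark's):

```scala
// Illustration of the scheduling constraint described above: receivers each
// occupy one core, so if numReceivers >= totalExecutorCores, zero cores
// remain for batch processing and no batch metrics are ever produced.
def coresLeftForBatches(totalExecutorCores: Int, numReceivers: Int): Int =
  math.max(totalExecutorCores - numReceivers, 0)
```

When the result is zero, `ExecutorAllocationManager` sees no completed batches and therefore never scales up, which is the deadlock the PR is about.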
[GitHub] spark pull request #20070: SPARK-22896 Improvement in String interpolation
Github user chetkhatri commented on a diff in the pull request: https://github.com/apache/spark/pull/20070#discussion_r159152519
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/LatentDirichletAllocationExample.scala ---
@@ -46,7 +46,10 @@ object LatentDirichletAllocationExample {
     val topics = ldaModel.topicsMatrix
     for (topic <- Range(0, 3)) {
       print(s"Topic $topic :")
-      for (word <- Range(0, ldaModel.vocabSize)) { print(s" ${topics(word, topic)}") }
+      for (word <- Range(0, ldaModel.vocabSize))
+      {
--- End diff --
@srowen sure done.
[GitHub] spark pull request #19992: [SPARK-22805][CORE] Use StorageLevel aliases in e...
Github user superbobry commented on a diff in the pull request: https://github.com/apache/spark/pull/19992#discussion_r159153806
--- Diff: core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala ---
@@ -2022,12 +1947,7 @@ private[spark] object JsonProtocolSuite extends Assertions {
         |    "Port": 300
         |  },
         |  "Block ID": "rdd_0_0",
-        |  "Storage Level": {
--- End diff --
I've added a test ensuring all predefined storage levels can be read from the legacy format. Sidenote: I've also noticed that the legacy format incorrectly handled the predefined `StorageLevel.OFF_HEAP`, and in fact any other custom storage level with `useOffHeap = true`. It looks like a bug to me, wdyt?
[GitHub] spark issue #20129: [SPARK-22933][SPARKR] R Structured Streaming API for wit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20129 **[Test build #85577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85577/testReport)** for PR 20129 at commit [`137d1cb`](https://github.com/apache/spark/commit/137d1cb186aa826842ff7897cfd165429fb0b44b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20129: [SPARK-22933][SPARKR] R Structured Streaming API for wit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20129 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85577/ Test PASSed.
[GitHub] spark issue #20129: [SPARK-22933][SPARKR] R Structured Streaming API for wit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20129 Merged build finished. Test PASSed.
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20133 **[Test build #85578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85578/testReport)** for PR 20133 at commit [`0894f5e`](https://github.com/apache/spark/commit/0894f5e5a6cac2f73ad30fc80de3cd82b3020de6).
[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19992 **[Test build #85579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85579/testReport)** for PR 19992 at commit [`9fbfe40`](https://github.com/apache/spark/commit/9fbfe40f5ca83f080f56f3e91c7a6f3f27471df5).
[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19992 Merged build finished. Test FAILed.
[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19992 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85579/ Test FAILed.
[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19992 **[Test build #85579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85579/testReport)** for PR 19992 at commit [`9fbfe40`](https://github.com/apache/spark/commit/9fbfe40f5ca83f080f56f3e91c7a6f3f27471df5).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19992 **[Test build #85580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85580/testReport)** for PR 19992 at commit [`cb1fe6a`](https://github.com/apache/spark/commit/cb1fe6a572d8085d36884bf950a840f972976458).
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159156756
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala ---
@@ -39,6 +41,17 @@ object ParserUtils {
     throw new ParseException(s"Operation not allowed: $message", ctx)
   }

+  def duplicateClausesNotAllowed(message: String, ctx: ParserRuleContext): Nothing = {
+    throw new ParseException(s"Found duplicate clauses: $message", ctx)
--- End diff --
Can't we merge these two functions to check the duplication? e.g.,
```
def checkDuplicateClauses[T](
    nodes: util.List[T], clauseName: String, ctx: ParserRuleContext): Unit = {
  if (nodes.size() > 1) {
    throw new ParseException(s"Found duplicate clauses: $clauseName", ctx)
  }
}
```
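maropu's suggested helper folds the size check and the error into one call site. A self-contained sketch, with ANTLR's `ParserRuleContext` replaced by a plain `String` and a simplified `ParseException` so it runs standalone (both substitutions are mine, not Spark's types):

```scala
// Standalone sketch of the merged duplicate-clause check suggested above.
// `ctx` is a String stand-in for ANTLR's ParserRuleContext.
case class ParseException(message: String, ctx: String) extends Exception(message)

def checkDuplicateClauses[T](
    nodes: java.util.List[T], clauseName: String, ctx: String): Unit = {
  // Each optional clause (LOCATION, COMMENT, ...) may appear at most once;
  // the ANTLR grammar collects repeats into a list, so size > 1 means a duplicate.
  if (nodes.size() > 1) {
    throw ParseException(s"Found duplicate clauses: $clauseName", ctx)
  }
}
```

Usage: the parser would call `checkDuplicateClauses(ctx.locationSpec, "LOCATION", ctx)` once per optional clause, replacing both the `if` and the throwing helper.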
[GitHub] spark pull request #19520: [SPARK-22298][WEB-UI] url encode APP id before ge...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19520
[GitHub] spark pull request #19613: Fixed a typo
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19613
[GitHub] spark pull request #19739: [SPARK-22513][BUILD] Provide build profile for ha...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19739
[GitHub] spark pull request #20027: Branch 2.2
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20027
[GitHub] spark pull request #19933: [SPARK-22744][CORE] Add a configuration to show t...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19933
[GitHub] spark pull request #20104: Merge pull request #1 from apache/master
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20104
[GitHub] spark pull request #18916: [SPARK-21705][CORE][DOC]Add spark.internal.config...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18916
[GitHub] spark pull request #19936: Branch 0.5
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19936
[GitHub] spark pull request #19919: [SPARK-22727] spark.executor.instances's default ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19919
[GitHub] spark pull request #20131: [MINOR] Fix a bunch of typos
Github user srowen closed the pull request at: https://github.com/apache/spark/pull/20131
[GitHub] spark pull request #20044: [SPARK-22857] Optimize code by inspecting code
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20044
[GitHub] spark pull request #19035: [SPARK-21822][SQL]When insert Hive Table is finis...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19035
[GitHub] spark pull request #19917: [SPARK-22725][SQL] Add failing test for select wi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19917
[GitHub] spark pull request #20130: [BUILD] Close stale PRs
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20130
[GitHub] spark pull request #20132: [SPARK-13030][ML] Follow-up cleanups for OneHotEn...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20132#discussion_r159157608
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoderEstimator.scala ---
@@ -205,60 +210,58 @@ class OneHotEncoderModel private[ml] (

   import OneHotEncoderModel._

-  // Returns the category size for a given index with `dropLast` and `handleInvalid`
+  // Returns the category size for each index with `dropLast` and `handleInvalid`
   // taken into account.
-  private def configedCategorySize(orgCategorySize: Int, idx: Int): Int = {
+  private def getConfigedCategorySizes: Array[Int] = {
     val dropLast = getDropLast
     val keepInvalid = getHandleInvalid == OneHotEncoderEstimator.KEEP_INVALID

     if (!dropLast && keepInvalid) {
       // When `handleInvalid` is "keep", an extra category is added as last category
       // for invalid data.
-      orgCategorySize + 1
+      categorySizes.map(_ + 1)
     } else if (dropLast && !keepInvalid) {
       // When `dropLast` is true, the last category is removed.
-      orgCategorySize - 1
+      categorySizes.map(_ - 1)
     } else {
       // When `dropLast` is true and `handleInvalid` is "keep", the extra category for invalid
       // data is removed. Thus, it is the same as the plain number of categories.
-      orgCategorySize
+      categorySizes
     }
   }

   private def encoder: UserDefinedFunction = {
-    val oneValue = Array(1.0)
-    val emptyValues = Array.empty[Double]
-    val emptyIndices = Array.empty[Int]
-    val dropLast = getDropLast
-    val handleInvalid = getHandleInvalid
-    val keepInvalid = handleInvalid == OneHotEncoderEstimator.KEEP_INVALID
+    val keepInvalid = getHandleInvalid == OneHotEncoderEstimator.KEEP_INVALID
+    val configedSizes = getConfigedCategorySizes
+    val localCategorySizes = categorySizes

     // The udf performed on input data. The first parameter is the input value. The second
-    udf { (label: Double, idx: Int) =>
-      val plainNumCategories = categorySizes(idx)
-      val size = configedCategorySize(plainNumCategories, idx)
-
-      if (label < 0) {
-        throw new SparkException(s"Negative value: $label. Input can't be negative.")
-      } else if (label == size && dropLast && !keepInvalid) {
-        // When `dropLast` is true and `handleInvalid` is not "keep",
-        // the last category is removed.
-        Vectors.sparse(size, emptyIndices, emptyValues)
-      } else if (label >= plainNumCategories && keepInvalid) {
-        // When `handleInvalid` is "keep", encodes invalid data to last category (and removed
-        // if `dropLast` is true)
-        if (dropLast) {
-          Vectors.sparse(size, emptyIndices, emptyValues)
+    // parameter is the index in inputCols of the column being encoded.
+    udf { (label: Double, colIdx: Int) =>
+      val origCategorySize = localCategorySizes(colIdx)
+      // idx: index in vector of the single 1-valued element
+      val idx = if (label >= 0 && label < origCategorySize) {
+        label
+      } else {
+        if (keepInvalid) {
+          origCategorySize
         } else {
-          Vectors.sparse(size, Array(size - 1), oneValue)
+          if (label < 0) {
+            throw new SparkException(s"Negative value: $label. Input can't be negative. " +
--- End diff --
I have a question. Since we don't allow negative values when fitting, should we allow them in transforming even when `handleInvalid` is KEEP_INVALID?
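The size adjustment at the heart of this diff is a pure function of `dropLast` and `handleInvalid`, and can be sketched without any Spark dependencies. The helper below mirrors the three cases in the PR's `getConfigedCategorySizes` (the free-standing signature is mine; in Spark it is a method reading params from the model):

```scala
// Sketch of how `dropLast` and `handleInvalid = "keep"` change each output
// vector's size, per the three cases in the diff above.
def configedCategorySizes(
    categorySizes: Array[Int], dropLast: Boolean, keepInvalid: Boolean): Array[Int] = {
  if (!dropLast && keepInvalid) {
    categorySizes.map(_ + 1) // extra last category collects invalid values
  } else if (dropLast && !keepInvalid) {
    categorySizes.map(_ - 1) // last category dropped (avoids collinearity)
  } else {
    categorySizes            // the two adjustments cancel, or neither applies
  }
}
```

Precomputing the whole array once, instead of recomputing per row inside the UDF as the old `configedCategorySize(orgCategorySize, idx)` did, is exactly the simplification viirya is reviewing.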
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20133 **[Test build #85578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85578/testReport)** for PR 20133 at commit [`0894f5e`](https://github.com/apache/spark/commit/0894f5e5a6cac2f73ad30fc80de3cd82b3020de6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20133 Merged build finished. Test PASSed.
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20133 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85578/ Test PASSed.
[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19992 **[Test build #85580 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85580/testReport)** for PR 19992 at commit [`cb1fe6a`](https://github.com/apache/spark/commit/cb1fe6a572d8085d36884bf950a840f972976458).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14180: [SPARK-16367][PYSPARK] Support for deploying Anaconda an...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/14180 gentle ping @ueshin
[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19992 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85580/ Test PASSed.
[GitHub] spark issue #19992: [SPARK-22805][CORE] Use StorageLevel aliases in event lo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19992 Merged build finished. Test PASSed.
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19077 ping @cloud-fan shall we continue with this PR?
[GitHub] spark issue #18714: [SPARK-20236][SQL] runtime partition overwrite
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/18714 Is this PR still targeted to 2.3? @cloud-fan @gatorsmile
[GitHub] spark issue #20127: [SPARK-22932] [SQL] Refactor AnalysisContext
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20127 lgtm
[GitHub] spark pull request #20068: [SPARK-17916][SQL] Fix empty string being parsed ...
Github user aa8y commented on a diff in the pull request: https://github.com/apache/spark/pull/20068#discussion_r159160508
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
@@ -152,7 +152,11 @@ class CSVOptions(
     writerSettings.setIgnoreLeadingWhitespaces(ignoreLeadingWhiteSpaceFlagInWrite)
     writerSettings.setIgnoreTrailingWhitespaces(ignoreTrailingWhiteSpaceFlagInWrite)
     writerSettings.setNullValue(nullValue)
-    writerSettings.setEmptyValue(nullValue)
+    // The Univocity parser parses empty strings as `null` by default. This is the default behavior
+    // for Spark too, since `nullValue` defaults to an empty string and has a higher precedence to
+    // setEmptyValue(). But when `nullValue` is set to a different value, that would mean that the
+    // empty string should be parsed not as `null` but as an empty string.
+    writerSettings.setEmptyValue("")
--- End diff --
I talked about this with Hyukjin Kwon before. I think the previous behavior should _not_ be exposed as an option, as the previous behavior was a bug: it _always_ coerced empty values to `null`s. If `nullValue` was not set, it defaulted to `""`, which coerced `""` to `null`; setting the empty value to `""` had no effect in this case. If it was set to something else, say `\N`, then the empty value was also set to `\N`, which resulted in parsing both `\N` and `""` to `null`, since `""` was no longer considered an empty value and coercing `""` to `null` is the Univocity parser's default. Setting the empty value explicitly to the `""` literal ensures that an empty string is always parsed as an empty string, unless `nullValue` is unset or set to `""`, which is what people would do if they want `""` parsed as `null` (the old behavior).
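The semantics aa8y describes boil down to one precedence rule on the read side. The function below is an illustration of that rule only (it is not the Univocity or Spark API; `interpretField` and its signature are mine):

```scala
// Illustration (not the Univocity API) of the null-token semantics above:
// a field matching the configured nullValue token becomes null; anything
// else, including a quoted empty string, is kept as-is after the fix.
def interpretField(raw: String, nullValue: String): Option[String] = {
  if (raw == nullValue) None // field equals the configured null token
  else Some(raw)             // everything else, including "", survives
}
```

With `nullValue = ""` (the default), the empty string still maps to null, preserving the default behavior; only a non-default `nullValue` such as `\N` lets `""` through as a real empty string.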
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159170624
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -1971,8 +1971,8 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
       s"""
         |CREATE TABLE t(a int, b int, c int, d int)
         |USING parquet
-        |PARTITIONED BY(a, b)
         |LOCATION "${dir.toURI}"
+        |PARTITIONED BY(a, b)
--- End diff --
This is an end-to-end test for `ORDER-INSENSITIVENESS`. I do not want to introduce a new one for it
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159170627
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -875,12 +875,13 @@ class HiveDDLSuite
   test("desc table for Hive table - bucketed + sorted table") {
     withTable("tbl") {
-      sql(s"""
-        CREATE TABLE tbl (id int, name string)
-        PARTITIONED BY (ds string)
-        CLUSTERED BY(id)
-        SORTED BY(id, name) INTO 1024 BUCKETS
-        """)
+      sql(
+        s"""
+          |CREATE TABLE tbl (id int, name string)
+          |CLUSTERED BY(id)
+          |SORTED BY(id, name) INTO 1024 BUCKETS
+          |PARTITIONED BY (ds string)
+        """.stripMargin)
--- End diff --
The same here.
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20133 **[Test build #85584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85584/testReport)** for PR 20133 at commit [`68170bb`](https://github.com/apache/spark/commit/68170bb45c64bb5b694bfdffd7c7f02801f9b82e).
[GitHub] spark issue #20133: [SPARK-22934] [SQL] Make optional clauses order insensit...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20133 **[Test build #85585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85585/testReport)** for PR 20133 at commit [`9818ab5`](https://github.com/apache/spark/commit/9818ab53d5b32aa89fe825a8a6ebce867ed51f01).
[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20078 hmm, that sounds like a different problem, why is numReceivers set to > spark.cores.max?
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159170197
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -408,9 +417,17 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
       .map(visitIdentifierList(_).toArray)
       .getOrElse(Array.empty[String])
     val properties = Option(ctx.tableProps).map(visitPropertyKeyValues).getOrElse(Map.empty)
-    val bucketSpec = Option(ctx.bucketSpec()).map(visitBucketSpec)
+    val bucketSpec = if (ctx.bucketSpec().size > 1) {
+      duplicateClausesNotAllowed("CLUSTERED BY", ctx)
--- End diff --
Sure
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159168759
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -1971,8 +1971,8 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils {
       s"""
         |CREATE TABLE t(a int, b int, c int, d int)
         |USING parquet
-        |PARTITIONED BY(a, b)
         |LOCATION "${dir.toURI}"
+        |PARTITIONED BY(a, b)
--- End diff --
Is it a relevant change? Since the PR is about ORDER-INSENSITIVENESS, can we keep the original code instead of making an irrelevant change like this?
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19498 **[Test build #85581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85581/testReport)** for PR 19498 at commit [`174ec21`](https://github.com/apache/spark/commit/174ec2139a7e0af049e2954494525fd3fff145e2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20087 Merged build finished. Test FAILed.
[GitHub] spark pull request #19968: [SPARK-22769][CORE] When driver stopping, there i...
Github user KaiXinXiaoLei commented on a diff in the pull request: https://github.com/apache/spark/pull/19968#discussion_r159167935 --- Diff: core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala --- @@ -100,6 +102,7 @@ private[netty] class Dispatcher(nettyEnv: NettyRpcEnv, numUsableCores: Int) exte return } unregisterRpcEndpoint(rpcEndpointRef.name) + endpointsIsStopped.putIfAbsent(rpcEndpointRef.name, true) --- End diff -- OK, thanks
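The diff above records a stopped endpoint with `putIfAbsent`, so the first stop wins and concurrent repeat calls become no-ops. A minimal sketch of that idiom (the map's value type and the endpoint name are illustrative assumptions, not the actual Dispatcher code):

```scala
import java.util.concurrent.ConcurrentHashMap

// Record that an endpoint has been stopped exactly once, even if
// unregistration races across several threads: putIfAbsent only writes
// when the key is missing, so later calls leave the first value intact.
val endpointsIsStopped = new ConcurrentHashMap[String, java.lang.Boolean]()

endpointsIsStopped.putIfAbsent("driver-endpoint", true)  // first stop wins
endpointsIsStopped.putIfAbsent("driver-endpoint", false) // no-op: key already present
```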
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159168832 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLParserSuite.scala --- @@ -1153,65 +1191,165 @@ class DDLParserSuite extends PlanTest with SharedSQLContext { } } + test("Test CTAS against data source tables") { +val s1 = + """ +|CREATE TABLE IF NOT EXISTS mydb.page_view +|USING parquet +|COMMENT 'This is the staging page view table' +|LOCATION '/user/external/page_view' +|TBLPROPERTIES ('p1'='v1', 'p2'='v2') +|AS SELECT * FROM src + """.stripMargin + +val s2 = + """ +|CREATE TABLE IF NOT EXISTS mydb.page_view +|USING parquet +|LOCATION '/user/external/page_view' +|COMMENT 'This is the staging page view table' +|TBLPROPERTIES ('p1'='v1', 'p2'='v2') +|AS SELECT * FROM src + """.stripMargin + +val s3 = + """ +|CREATE TABLE IF NOT EXISTS mydb.page_view +|USING parquet +|COMMENT 'This is the staging page view table' +|LOCATION '/user/external/page_view' +|TBLPROPERTIES ('p1'='v1', 'p2'='v2') +|AS SELECT * FROM src + """.stripMargin + +checkParsing(s1) +checkParsing(s2) +checkParsing(s3) + +def checkParsing(sql: String): Unit = { + val (desc, exists) = extractTableDesc(sql) + assert(exists) + assert(desc.identifier.database == Some("mydb")) + assert(desc.identifier.table == "page_view") + assert(desc.storage.locationUri == Some(new URI("/user/external/page_view"))) + assert(desc.schema.isEmpty) // will be populated later when the table is actually created + assert(desc.comment == Some("This is the staging page view table")) + assert(desc.viewText.isEmpty) + assert(desc.viewDefaultDatabase.isEmpty) + assert(desc.viewQueryColumnNames.isEmpty) + assert(desc.partitionColumnNames.isEmpty) + assert(desc.provider == Some("parquet")) + assert(desc.properties == Map("p1" -> "v1", "p2" -> "v2")) +} + } + test("Test CTAS #1") { val s1 = - """CREATE EXTERNAL TABLE IF NOT EXISTS mydb.page_view + """ +|CREATE EXTERNAL TABLE IF NOT EXISTS mydb.page_view |COMMENT 'This is the staging page view table' |STORED AS RCFILE |LOCATION '/user/external/page_view' |TBLPROPERTIES ('p1'='v1', 'p2'='v2') -|AS SELECT * FROM src""".stripMargin +|AS SELECT * FROM src + """.stripMargin --- End diff -- nit. extra space before `"""`.
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19498 **[Test build #85581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85581/testReport)** for PR 19498 at commit [`174ec21`](https://github.com/apache/spark/commit/174ec2139a7e0af049e2954494525fd3fff145e2).
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159168804 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -875,12 +875,13 @@ class HiveDDLSuite test("desc table for Hive table - bucketed + sorted table") { withTable("tbl") { - sql(s""" -CREATE TABLE tbl (id int, name string) -PARTITIONED BY (ds string) -CLUSTERED BY(id) -SORTED BY(id, name) INTO 1024 BUCKETS -""") + sql( +s""" + |CREATE TABLE tbl (id int, name string) + |CLUSTERED BY(id) + |SORTED BY(id, name) INTO 1024 BUCKETS + |PARTITIONED BY (ds string) +""".stripMargin) --- End diff -- Can we keep the original `HiveDDLSuite.scala` file, too?
[GitHub] spark pull request #20127: [SPARK-22932] [SQL] Refactor AnalysisContext
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20127
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159164626 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -408,9 +417,17 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) { .map(visitIdentifierList(_).toArray) .getOrElse(Array.empty[String]) val properties = Option(ctx.tableProps).map(visitPropertyKeyValues).getOrElse(Map.empty) -val bucketSpec = Option(ctx.bucketSpec()).map(visitBucketSpec) +val bucketSpec = if (ctx.bucketSpec().size > 1) { + duplicateClausesNotAllowed("CLUSTERED BY", ctx) --- End diff -- Can you split the validation logic and the extraction logic? In this case I'd move the check to line 411 and do the extract on line 420.
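The suggestion is to run the duplicate-clause check before, and separately from, extracting the clause value. A hedged sketch of that shape (the helper names here are hypothetical stand-ins, not the actual SparkSqlParser methods):

```scala
// Hypothetical helpers illustrating the review comment: validate all
// duplicate clauses up front, then extract each single optional value.
def checkDuplicateClauses[T](clauses: Seq[T], clauseName: String): Unit = {
  if (clauses.size > 1) {
    throw new IllegalArgumentException(s"Found duplicate clauses: $clauseName")
  }
}

def extractOptional[T](clauses: Seq[T]): Option[T] = clauses.headOption

val bucketClauses = Seq("CLUSTERED BY (id) INTO 4 BUCKETS")
checkDuplicateClauses(bucketClauses, "CLUSTERED BY") // validation first
val bucketSpec = extractOptional(bucketClauses)      // extraction afterwards
```

Keeping the two concerns apart means every clause is validated in one place, even when a later branch never extracts it.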
[GitHub] spark issue #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support for Arra...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20114 Merged to master.
[GitHub] spark issue #18714: [SPARK-20236][SQL] runtime partition overwrite
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18714 ah yes, please please :)
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159150554 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -384,22 +384,31 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) { * CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db_name.]table_name * USING table_provider * [OPTIONS table_property_list] - * [PARTITIONED BY (col_name, col_name, ...)] - * [CLUSTERED BY (col_name, col_name, ...) - *[SORTED BY (col_name [ASC|DESC], ...)] - *INTO num_buckets BUCKETS - * ] - * [LOCATION path] - * [COMMENT table_comment] - * [TBLPROPERTIES (property_name=property_value, ...)] + * create_table_clauses * [[AS] select_statement]; + * + * create_table_clauses (order insensitive): + * [PARTITIONED BY (col_name, col_name, ...)] --- End diff -- Isn't `[OPTIONS table_property_list]` one of `create_table_clauses`?
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159170195 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala --- @@ -39,6 +41,17 @@ object ParserUtils { throw new ParseException(s"Operation not allowed: $message", ctx) } + def duplicateClausesNotAllowed(message: String, ctx: ParserRuleContext): Nothing = { +throw new ParseException(s"Found duplicate clauses: $message", ctx) --- End diff -- Sounds good to me!
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20087 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85582/ Test FAILed.
[GitHub] spark pull request #20106: [SPARK-21616][SPARKR][DOCS] update R migration gu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20106
[GitHub] spark pull request #20114: [SPARK-22530][PYTHON][SQL] Adding Arrow support f...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20114
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19498 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85581/ Test PASSed.
[GitHub] spark issue #20127: [SPARK-22932] [SQL] Refactor AnalysisContext
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20127 Thanks! Merged to master
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19498 retest this please
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19498 Merged build finished. Test PASSed.
[GitHub] spark pull request #20128: [SPARK-21893][SPARK-22142][TESTS][FOLLOWUP] Enabl...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20128
[GitHub] spark pull request #20127: [SPARK-22932] [SQL] Refactor AnalysisContext
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20127#discussion_r159168110 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -70,6 +71,8 @@ object AnalysisContext { } def get: AnalysisContext = value.get() + def reset(): Unit = value.remove() --- End diff -- Will be resolved by the future PR.
[GitHub] spark issue #20128: [SPARK-21893][SPARK-22142][TESTS][FOLLOWUP] Enables PySp...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20128 Merged to master. Thank you @srowen, @felixcheung and @ueshin for reviewing this.
[GitHub] spark issue #20131: [MINOR] Fix a bunch of typos
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20131 Merged to master - https://github.com/apache/spark/commit/c284c4e1f6f684ca8db1cc446fdcc43b46e3413c
[GitHub] spark pull request #20133: [SPARK-22934] [SQL] Make optional clauses order i...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20133#discussion_r159170240 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -384,22 +384,31 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) { * CREATE [TEMPORARY] TABLE [IF NOT EXISTS] [db_name.]table_name * USING table_provider * [OPTIONS table_property_list] - * [PARTITIONED BY (col_name, col_name, ...)] - * [CLUSTERED BY (col_name, col_name, ...) - *[SORTED BY (col_name [ASC|DESC], ...)] - *INTO num_buckets BUCKETS - * ] - * [LOCATION path] - * [COMMENT table_comment] - * [TBLPROPERTIES (property_name=property_value, ...)] + * create_table_clauses * [[AS] select_statement]; + * + * create_table_clauses (order insensitive): + * [PARTITIONED BY (col_name, col_name, ...)] --- End diff -- forgot it.
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20087 **[Test build #85582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85582/testReport)** for PR 20087 at commit [`e9f705d`](https://github.com/apache/spark/commit/e9f705d0ad783da5bd091632e98a6151d4d21cb6). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20106: [SPARK-21616][SPARKR][DOCS] update R migration guide and...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20106 Merged to master.
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20087 **[Test build #85582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85582/testReport)** for PR 20087 at commit [`e9f705d`](https://github.com/apache/spark/commit/e9f705d0ad783da5bd091632e98a6151d4d21cb6).
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20087 **[Test build #85583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85583/testReport)** for PR 20087 at commit [`d3aa7a0`](https://github.com/apache/spark/commit/d3aa7a01320b6af2866d7cf7c4f178eb23eae3ad).
[GitHub] spark issue #20082: [SPARK-22897][CORE]: Expose stageAttemptId in TaskContex...
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/20082 @cloud-fan Please take another look.
[GitHub] spark issue #19968: [SPARK-22769][CORE] When driver stopping, there is error...
Github user KaiXinXiaoLei commented on the issue: https://github.com/apache/spark/pull/19968 @srowen OK, I will update, thanks
[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...
Github user CodingCat commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r159171970 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -261,6 +261,17 @@ object SQLConf { .booleanConf .createWithDefault(false) + val HADOOPFSRELATION_SIZE_FACTOR = buildConf( +"org.apache.spark.sql.execution.datasources.sizeFactor") --- End diff -- this is only for HadoopFSRelation
[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20072 **[Test build #85586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85586/testReport)** for PR 20072 at commit [`e97f419`](https://github.com/apache/spark/commit/e97f419a5c3347242832287a9e5b0e5662f9e6bb).
[GitHub] spark issue #20072: [SPARK-22790][SQL] add a configurable factor to describe...
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/20072 @wzhfy thanks for the review, please take a look
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20087 **[Test build #85583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85583/testReport)** for PR 20087 at commit [`d3aa7a0`](https://github.com/apache/spark/commit/d3aa7a01320b6af2866d7cf7c4f178eb23eae3ad). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20087 Merged build finished. Test FAILed.
[GitHub] spark issue #20087: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20087 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85583/ Test FAILed.
[GitHub] spark issue #20078: [SPARK-22900] [Spark-Streaming] Remove unnecessary restr...
Github user sharkdtu commented on the issue: https://github.com/apache/spark/pull/20078 @felixcheung if you submit spark on yarn with `spark.streaming.dynamicAllocation.enabled=true`, the `num-executors` cannot be set. So, at the beginning, there are only 2 (the default value) executors.
[GitHub] spark pull request #20127: [SPARK-22932] [SQL] Refactor AnalysisContext
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20127#discussion_r159174832 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -95,6 +98,17 @@ class Analyzer( this(catalog, conf, conf.optimizerMaxIterations) } + override def execute(plan: LogicalPlan): LogicalPlan = { +AnalysisContext.reset() +try { + executeSameContext(plan) +} finally { + AnalysisContext.reset() +} + } + + private def executeSameContext(plan: LogicalPlan): LogicalPlan = super.execute(plan) --- End diff -- +1
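The `execute` override in the diff wraps the real work in reset/try/finally so each top-level analysis starts from, and leaves behind, a clean per-thread context, while nested `executeSameContext` calls share one context. A minimal sketch of the pattern, with a simplified stand-in (`DemoContext` holds an `Int`, not the real `AnalysisContext`):

```scala
// Stand-in for Spark's AnalysisContext: a thread-local value that must be
// reset around every top-level execute. Illustrative only.
object DemoContext {
  private val value = new ThreadLocal[Int] {
    override def initialValue(): Int = 0
  }
  def get: Int = value.get()
  def set(v: Int): Unit = value.set(v)
  def reset(): Unit = value.remove()
}

// Reset, run, and reset again in a finally block, so no state leaks to the
// next query even when the body throws.
def execute[A](body: => A): A = {
  DemoContext.reset()
  try body
  finally DemoContext.reset()
}
```

For example, `execute { DemoContext.set(7); DemoContext.get }` returns 7, and after it returns `DemoContext.get` is back at the initial 0.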
[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r159175087 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala --- @@ -82,7 +82,11 @@ case class HadoopFsRelation( } } - override def sizeInBytes: Long = location.sizeInBytes + override def sizeInBytes: Long = { +val sizeFactor = sqlContext.conf.sizeToMemorySizeFactor +(location.sizeInBytes * sizeFactor).toLong --- End diff -- we should add a safe check for overflow.
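One way to add the requested overflow guard is to clamp the scaled size at `Long.MaxValue` explicitly. (On the JVM the `Double`-to-`Long` conversion already saturates at `Long.MaxValue`, so the clamp mainly documents the intent and makes the boundary testable.) A sketch, not the actual HadoopFsRelation code:

```scala
// Scale a file size by a configurable factor without overflowing Long.
// The multiply is done in Double, which can exceed the Long range; the
// explicit comparison clamps the result at Long.MaxValue.
def scaledSizeInBytes(sizeInBytes: Long, sizeFactor: Double): Long = {
  val scaled = sizeInBytes * sizeFactor
  if (scaled >= Long.MaxValue.toDouble) Long.MaxValue
  else scaled.toLong
}
```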
[GitHub] spark pull request #20072: [SPARK-22790][SQL] add a configurable factor to d...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20072#discussion_r159175078 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -261,6 +261,17 @@ object SQLConf { .booleanConf .createWithDefault(false) + val DISK_TO_MEMORY_SIZE_FACTOR = buildConf( +"org.apache.spark.sql.execution.datasources.sizeFactor") --- End diff -- `...sizeFactor` is too vague, how about `fileDataSizeFactor`?
[GitHub] spark issue #19077: [SPARK-21860][core]Improve memory reuse for heap memory ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19077 The idea LGTM, but I think we can simplify the implementation by allowing the memory allocator to return a larger block of memory than requested.
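The suggested simplification is that a pooled allocator need not match request sizes exactly: rounding every request up to a fixed granularity lets a slightly larger pooled buffer satisfy it, improving reuse. A sketch of that idea (the 1 KB granularity is an illustrative assumption, not Spark's actual policy):

```scala
// Round an allocation request up to the pool granularity, so buffers of
// nearby sizes map to the same bucket and can be reused for each other.
def roundUpToGranularity(requested: Long, granularity: Long = 1024L): Long =
  ((requested + granularity - 1) / granularity) * granularity
```

With this in place the pool only has to track one free list per bucket size, rather than searching for an exact-size match on every allocation.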
[GitHub] spark pull request #20082: [SPARK-22897][CORE]: Expose stageAttemptId in Tas...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20082#discussion_r159175320 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -79,6 +79,7 @@ private[spark] abstract class Task[T]( SparkEnv.get.blockManager.registerTask(taskAttemptId) context = new TaskContextImpl( stageId, + stageAttemptId, // stageAttemptId and stageAttemptNumber are semantically equal --- End diff -- How much work would it be to rename the internal `stageAttemptId` to `stageAttemptNumber`?
[GitHub] spark issue #20082: [SPARK-22897][CORE]: Expose stageAttemptId in TaskContex...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20082 LGTM
[GitHub] spark issue #20120: [SPARK-22926] [SQL] Respect table-level conf compression...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20120 I think we should document the difference between table options and properties. AFAIK we added table properties to data source tables in Spark 2.3; previously, table options were the only place for users to put configs that change behavior.
[GitHub] spark issue #20119: [SPARK-21475][Core]Revert "[SPARK-21475][CORE] Use NIO's...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20119 let's also cc the author. @jerryshao do you know if there is a way to fix the regression?