[GitHub] spark issue #19020: [SPARK-3181] [ML] Implement huber loss for LinearRegress...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19020 **[Test build #82410 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82410/testReport)** for PR 19020 at commit [`8c6622f`](https://github.com/apache/spark/commit/8c6622f68ea81cedbeb3f03f957b335a99dedd46). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...
Github user szhem commented on the issue: https://github.com/apache/spark/pull/19294 @gatorsmile I believe that in the Spark SQL code path `path` cannot be null, because in that case `FileFormatWriter` [fails even before](https://github.com/apache/spark/blob/3f958a99921d149fb9fdf7ba7e78957afdad1405/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L118) `setupJob` ([which in turn calls `setupCommitter`](https://github.com/apache/spark/blob/e47f48c737052564e92903de16ff16707fae32c3/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L124)) is called on the committer. The interesting part is that [`FileOutputCommitter` allows null output paths](https://github.com/apache/hadoop/blob/5af572b6443715b7a741296c1bd520a1840f9a7c/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java#L96), and the line you highlighted is executed only in that case.
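A minimal sketch of the behavior described above, using hypothetical names (`Committer`, `sqlWritePath`, `rddWritePath` are illustrative, not Spark's actual API): the SQL write path rejects a null path before the committer is ever set up, while a `FileOutputCommitter`-style committer tolerates a missing output path.

```scala
// Minimal sketch with hypothetical names (not Spark's API). It mirrors the
// described behavior: the SQL write path rejects a null path before the
// committer is ever set up, while the committer itself tolerates one.
case class Committer(outputPath: Option[String]) {
  // FileOutputCommitter-like: an absent output path is allowed; there is
  // simply nothing to move on commit.
  def commitJob(): Unit = outputPath.foreach(p => println(s"committing to $p"))
}

// FileFormatWriter-like: fails before setupJob (and thus setupCommitter)
// would ever run.
def sqlWritePath(path: String): Committer = {
  require(path != null, "path cannot be null in the SQL write path")
  Committer(Some(path))
}

// Hadoop RDD write path: no up-front check, so the committer may see no
// output path at all (e.g. OutputFormats that produce no filesystem output).
def rddWritePath(path: String): Committer =
  Committer(Option(path))
```

Under this model, only the RDD-style path can ever hand the committer a missing output path, which is the case the PR is about.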
[GitHub] spark pull request #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/19370#discussion_r142317006
--- Diff: bin/sparkR2.cmd ---
@@ -18,7 +18,7 @@ rem limitations under the License.
 rem
--- End diff --
it looks like we should add this to the appveyor list...
[GitHub] spark issue #19416: [SPARK-22187][SS] Update unsaferow format for saved stat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19416 Merged build finished. Test PASSed.
[GitHub] spark issue #19416: [SPARK-22187][SS] Update unsaferow format for saved stat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19416 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82407/ Test PASSed.
[GitHub] spark issue #19416: [SPARK-22187][SS] Update unsaferow format for saved stat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19416 **[Test build #82407 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82407/testReport)** for PR 19416 at commit [`64a8d86`](https://github.com/apache/spark/commit/64a8d865f71a92ed9f76879eb6c5a24d1fef8cec).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class FlatMapGroupsWithState_StateManager(`
  * `case class FlatMapGroupsWithState_StateData(`
[GitHub] spark issue #19418: [SPARK-19984][SQL] Fix for ERROR codegen.CodeGenerator: ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19418 Merged build finished. Test FAILed.
[GitHub] spark issue #19418: [SPARK-19984][SQL] Fix for ERROR codegen.CodeGenerator: ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19418 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82408/ Test FAILed.
[GitHub] spark issue #19418: [SPARK-19984][SQL] Fix for ERROR codegen.CodeGenerator: ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19418 **[Test build #82408 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82408/testReport)** for PR 19418 at commit [`0fa4d61`](https://github.com/apache/spark/commit/0fa4d6154a4fe9d46c020dc979a0a835776cd83d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19393: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19393 One minor comment, otherwise LGTM.
[GitHub] spark issue #19417: [SPARK-22158][SQL][BRANCH-2.2] convertMetastore should n...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19417 This is the backport of #19382, @gatorsmile.
[GitHub] spark issue #19418: [SPARK-19984][SQL] Fix for ERROR codegen.CodeGenerator: ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19418 **[Test build #82409 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82409/testReport)** for PR 19418 at commit [`c717e9b`](https://github.com/apache/spark/commit/c717e9b8011942536d6b94831c671b4d8fdd7047).
[GitHub] spark pull request #19393: [SPARK-21644][SQL] LocalLimit.maxRows is defined ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19393#discussion_r142311845
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -296,13 +296,20 @@ object LimitPushDown extends Rule[LogicalPlan] {
     }
   }
 
-  private def maybePushLimit(limitExp: Expression, plan: LogicalPlan): LogicalPlan = {
-    (limitExp, plan.maxRows) match {
-      case (IntegerLiteral(maxRow), Some(childMaxRows)) if maxRow < childMaxRows =>
+  private def maybePushLocalLimit(limitExp: Expression, plan: LogicalPlan): LogicalPlan = {
+    (limitExp, plan.maxRowsPerPartition) match {
+      case (IntegerLiteral(newLimit), Some(childMaxRows)) if newLimit < childMaxRows =>
+        // If the child has a cap on max rows per partition and the cap is smaller than
+        // the new limit, put a new LocalLimit there.
--- End diff --
I think it is `the cap is larger than the new limit`?
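The guard being reviewed can be modeled outside Catalyst. A sketch under assumed names (`shouldPushLocalLimit` is hypothetical, not the actual rule): a new `LocalLimit` is worthwhile only when the child's known per-partition cap is larger than the new limit, i.e. the limit is not already guaranteed — which is exactly viirya's reading of the comment.

```scala
// Hypothetical model of the pushdown guard: push a LocalLimit below an
// operator only when the child may still produce more rows per partition
// than the new limit allows.
def shouldPushLocalLimit(newLimit: Int, childMaxRowsPerPartition: Option[Long]): Boolean =
  childMaxRowsPerPartition match {
    // Cap larger than the new limit: the limit is not yet guaranteed,
    // so inserting a new LocalLimit helps.
    case Some(cap) => newLimit < cap
    // Unknown cap: the real rule handles this via other cases; treated
    // as a no-op in this sketch.
    case None => false
  }
```

When the cap is already at or below the new limit, adding another `LocalLimit` would be redundant, so the guard returns false.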
[GitHub] spark issue #19418: [SPARK-19984][SQL] Fix for ERROR codegen.CodeGenerator: ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19418 **[Test build #82408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82408/testReport)** for PR 19418 at commit [`0fa4d61`](https://github.com/apache/spark/commit/0fa4d6154a4fe9d46c020dc979a0a835776cd83d).
[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19370 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82405/ Test PASSed.
[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19370 Merged build finished. Test PASSed.
[GitHub] spark pull request #19418: [SPARK-19984][SQL] Fix for ERROR codegen.CodeGene...
GitHub user rekhajoshm opened a pull request: https://github.com/apache/spark/pull/19418 [SPARK-19984][SQL] Fix for ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: for generated java file

## What changes were proposed in this pull request?

From the erratic error observed and a quick analysis of the code, it seems that `leftKeys(i).dataType` in `SortMergeJoinExec` is an `AtomicType`. It also seems that casting/promotion has played a role, as the variable gets reported as `long` but does not match the expected case flow for primitives:

```scala
case dt: DataType if isPrimitiveType(dt) => s"($c1 > $c2 ? 1 : $c1 < $c2 ? -1 : 0)"
```

The fix is to not invoke `compare` if the method does not exist.

## How was this patch tested?

Existing tests.

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rekhajoshm/spark SPARK-19984 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19418.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19418

commit e3677c9fa9697e0d34f9df52442085a6a481c9e9 Author: Rekha Joshi Date: 2015-05-05T23:10:08Z Merge pull request #1 from apache/master Pulling functionality from apache spark
commit 106fd8eee8f6a6f7c67cfc64f57c1161f76d8f75 Author: Rekha Joshi Date: 2015-05-08T21:49:09Z Merge pull request #2 from apache/master pull latest from apache spark
commit 0be142d6becba7c09c6eba0b8ea1efe83d649e8c Author: Rekha Joshi Date: 2015-06-22T00:08:08Z Merge pull request #3 from apache/master Pulling functionality from apache spark
commit 6c6ee12fd733e3f9902e10faf92ccb78211245e3 Author: Rekha Joshi Date: 2015-09-17T01:03:09Z Merge pull request #4 from apache/master Pulling functionality from apache spark
commit b123c601e459d1ad17511fd91dd304032154882a Author: Rekha Joshi Date: 2015-11-25T18:50:32Z Merge pull request #5 from apache/master pull request from apache/master
commit c73c32aadd6066e631956923725a48d98a18777e Author: Rekha Joshi Date: 2016-03-18T19:13:51Z Merge pull request #6 from apache/master pull latest from apache spark
commit 7dbf7320057978526635bed09dabc8cf8657a28a Author: Rekha Joshi Date: 2016-04-05T20:26:40Z Merge pull request #8 from apache/master pull latest from apache spark
commit 5e9d71827f8e2e4d07027281b80e4e073e7fecd1 Author: Rekha Joshi Date: 2017-05-01T23:00:30Z Merge pull request #9 from apache/master Pull apache spark
commit 63d99b3ce5f222d7126133170a373591f0ac67dd Author: Rekha Joshi Date: 2017-09-30T22:26:44Z Merge pull request #10 from apache/master pull latest apache spark
commit 0fa4d6154a4fe9d46c020dc979a0a835776cd83d Author: rjoshi2 Date: 2017-10-03T04:40:53Z [SPARK-19984][SQL] Fix for ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: for generated java file
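The codegen case quoted in the PR description can be illustrated in isolation. This is a sketch, not Spark's actual `CodeGenerator`: `genComparison` is a hypothetical helper that emits the inline ternary for primitive types and otherwise assumes the generated values expose a `compare` method.

```scala
// Sketch with a hypothetical helper (not Spark's CodeGenerator): primitive
// types get the inline ternary from the quoted case; other types are assumed
// to expose a compare() method on the generated values.
def genComparison(isPrimitive: Boolean, c1: String, c2: String): String =
  if (isPrimitive) s"($c1 > $c2 ? 1 : $c1 < $c2 ? -1 : 0)"
  else s"$c1.compare($c2)"
```

The reported compile error corresponds to the second branch being chosen for a value that is actually a primitive `long`, which has no `compare` method in generated Java.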
[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19370 **[Test build #82405 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82405/testReport)** for PR 19370 at commit [`d62ae59`](https://github.com/apache/spark/commit/d62ae59d892aa61a9f61af1411f2602a2b3e9ae1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19405: [SPARK-22178] [SQL] Refresh Persistent Views by REFRESH ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19405 LGTM except for one minor comment.
[GitHub] spark issue #19417: [SPARK-22158][SQL][BRANCH-2.2] convertMetastore should n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19417 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82406/ Test PASSed.
[GitHub] spark issue #19417: [SPARK-22158][SQL][BRANCH-2.2] convertMetastore should n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19417 Merged build finished. Test PASSed.
[GitHub] spark issue #19417: [SPARK-22158][SQL][BRANCH-2.2] convertMetastore should n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19417 **[Test build #82406 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82406/testReport)** for PR 19417 at commit [`47cb5ef`](https://github.com/apache/spark/commit/47cb5ef6badf6d509ae8f3e448a0cdfc4cd4f811).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19405: [SPARK-22178] [SQL] Refresh Persistent Views by R...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19405#discussion_r142310673
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetadataCacheSuite.scala ---
@@ -31,14 +31,22 @@ import org.apache.spark.sql.test.SQLTestUtils
 class HiveMetadataCacheSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
 
   test("SPARK-16337 temporary view refresh") {
-    withTempView("view_refresh") {
+    checkRefreshView(isTemp = true)
+  }
+
+  test("view refresh") {
--- End diff --
We didn't cover the persistent view case for refresh, that's why the bug happens...
[GitHub] spark issue #19327: [SPARK-22136][SS] Implement stream-stream outer joins.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19327 Merged build finished. Test PASSed.
[GitHub] spark issue #19327: [SPARK-22136][SS] Implement stream-stream outer joins.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19327 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82404/ Test PASSed.
[GitHub] spark issue #19327: [SPARK-22136][SS] Implement stream-stream outer joins.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19327 **[Test build #82404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82404/testReport)** for PR 19327 at commit [`9a12c78`](https://github.com/apache/spark/commit/9a12c789ca7a871d12cb36f4b605673e93af8a43).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17819 Merged build finished. Test PASSed.
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17819 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82403/ Test PASSed.
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17819 **[Test build #82403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82403/testReport)** for PR 17819 at commit [`000844a`](https://github.com/apache/spark/commit/000844ab1f0dffef9b51b96f7edc1e1ab9e9e0b7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos
Github user tawfiqul-islam commented on the issue: https://github.com/apache/spark/pull/19390 Hi, is there any update on this issue? Is it fixed yet?
[GitHub] spark issue #19389: [SPARK-22165][SQL] Resolve type conflicts between decima...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19389 Do you mean before / after in PR description? They are bugs to fix, aren't they?
[GitHub] spark pull request #19416: [SPARK-22187][SS] Update unsaferow format for sav...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19416#discussion_r142304163
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/FlatMapGroupsWithState_StateManager.scala ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, BoundReference, CaseWhen, CreateNamedStruct, GetStructField, IsNull, Literal, UnsafeRow}
+import org.apache.spark.sql.execution.ObjectOperator
+import org.apache.spark.sql.execution.streaming.GroupStateImpl
+import org.apache.spark.sql.execution.streaming.GroupStateImpl.NO_TIMESTAMP
+import org.apache.spark.sql.types.{IntegerType, LongType, StructType}
+
+
+class FlatMapGroupsWithState_StateManager(
+    stateEncoder: ExpressionEncoder[Any],
+    shouldStoreTimestamp: Boolean) extends Serializable {
+
+  val stateSchema = {
+    val schema = new StructType().add("groupState", stateEncoder.schema, nullable = true)
+    if (shouldStoreTimestamp) schema.add("timeoutTimestamp", LongType) else schema
+  }
+
+  def getState(store: StateStore, keyRow: UnsafeRow): FlatMapGroupsWithState_StateData = {
+    val stateRow = store.get(keyRow)
+    stateDataForGets.withNew(
+      keyRow, stateRow, getStateObj(stateRow), getTimestamp(stateRow))
+  }
+
+  def putState(store: StateStore, keyRow: UnsafeRow, state: Any, timestamp: Long): Unit = {
+    val stateRow = getStateRow(state)
+    setTimestamp(stateRow, timestamp)
+    store.put(keyRow, stateRow)
+  }
+
+  def removeState(store: StateStore, keyRow: UnsafeRow): Unit = {
+    store.remove(keyRow)
+  }
+
+  def getAllState(store: StateStore): Iterator[FlatMapGroupsWithState_StateData] = {
+    val stateDataForGetAllState = FlatMapGroupsWithState_StateData()
+    store.getRange(None, None).map { pair =>
+      stateDataForGetAllState.withNew(
+        pair.key, pair.value, getStateObjFromRow(pair.value), getTimestamp(pair.value))
+    }
+  }
+
+  private val stateAttributes: Seq[Attribute] = stateSchema.toAttributes
+
+  // Get the serializer for the state, taking into account whether we need to save timestamps
+  private val stateSerializer = {
+    val nestedStateExpr = CreateNamedStruct(
+      stateEncoder.namedExpressions.flatMap(e => Seq(Literal(e.name), e)))
+    if (shouldStoreTimestamp) {
+      Seq(nestedStateExpr, Literal(GroupStateImpl.NO_TIMESTAMP))
+    } else {
+      Seq(nestedStateExpr)
+    }
+  }
+
+  // Get the deserializer for the state. Note that this must be done in the driver, as
+  // resolving and binding of deserializer expressions to the encoded type can be safely done
+  // only in the driver.
+  private val stateDeserializer = {
+    val boundRefToNestedState = BoundReference(nestedStateOrdinal, stateEncoder.schema, true)
+    val deser = stateEncoder.resolveAndBind().deserializer.transformUp {
+      case BoundReference(ordinal, _, _) => GetStructField(boundRefToNestedState, ordinal)
+    }
+    CaseWhen(Seq(IsNull(boundRefToNestedState) -> Literal(null)), elseValue = deser).toCodegen()
+  }
+
+  private lazy val nestedStateOrdinal = 0
+  private lazy val timeoutTimestampOrdinal = 1
+
+  // Converters for translating state between rows and Java objects
+  private lazy val getStateObjFromRow = ObjectOperator.deserializeRowToObject(
+    stateDeserializer, stateAttributes)
+  private lazy val getStateRowFromObj = ObjectOperator.serializeObjectToRow(stateSerializer)
+
+  private lazy val stateDataForGets = FlatMapGroupsWithState_StateData()
+
+  /** Returns the state as Java object if defined */
+  private def getStateObj(stateRow: UnsafeRow): Any = {
+    if
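The `stateSchema` construction in the quoted diff — append a `timeoutTimestamp` column only when timestamps must be stored — can be modeled without Spark dependencies. A plain-Scala sketch under that assumption (the `Field` case class stands in for Spark's `StructField`; names are illustrative):

```scala
// Sketch of the stateSchema logic without Spark dependencies: Field stands in
// for Spark's StructField. The timeout column is appended only when needed,
// and the nested group state is nullable so "no state" can be represented.
case class Field(name: String, dataType: String, nullable: Boolean = false)

def stateSchema(shouldStoreTimestamp: Boolean): Seq[Field] = {
  val base = Seq(Field("groupState", "struct", nullable = true))
  if (shouldStoreTimestamp) base :+ Field("timeoutTimestamp", "long") else base
}
```

This mirrors the design choice in the diff: a single nullable nested column for the user state, with the timeout timestamp as a sibling top-level column rather than a field inside it.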
[GitHub] spark issue #19417: [SPARK-22158][SQL][BRANCH-2.2] convertMetastore should n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19417 **[Test build #82406 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82406/testReport)** for PR 19417 at commit [`47cb5ef`](https://github.com/apache/spark/commit/47cb5ef6badf6d509ae8f3e448a0cdfc4cd4f811).
[GitHub] spark issue #19416: [SPARK-22187][SS] Update unsaferow format for saved stat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19416 **[Test build #82407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82407/testReport)** for PR 19416 at commit [`64a8d86`](https://github.com/apache/spark/commit/64a8d865f71a92ed9f76879eb6c5a24d1fef8cec).
[GitHub] spark pull request #19416: [SPARK-22187][SS] Update unsaferow format for sav...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19416#discussion_r142303254
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala ---
@@ -376,9 +388,35 @@ class FlatMapGroupsWithStateSuite extends StateStoreMetricsTest with BeforeAndAf
       expectedTimeoutTimestamp = currentBatchTimestamp + 5000) // timestamp should change
 
     testStateUpdateWithData(
+      s"ProcessingTimeTimeout - $testName - timeout updated after state removed",
+      stateUpdates = state => { state.remove(); state.setTimeoutDuration(5000) },
+      timeoutConf = ProcessingTimeTimeout,
+      priorState = priorState,
+      priorTimeoutTimestamp = priorTimeoutTimestamp,
+      expectedState = None,
+      expectedTimeoutTimestamp = currentBatchTimestamp + 5000)
+
+    // Tests with EventTimeTimeout
+
+    if (priorState == None) {
+      testStateUpdateWithData(
+        s"EventTimeTimeout - $testName - setting timeout without init state not allowed",
+        stateUpdates = state => {
+          state.setTimeoutTimestamp(1)
+        },
+        timeoutConf = EventTimeTimeout,
+        priorState = None,
+        priorTimeoutTimestamp = priorTimeoutTimestamp,
+        expectedState = None,
+        expectedTimeoutTimestamp = 1)
+    }
+
+    testStateUpdateWithData(
       s"EventTimeTimeout - $testName - state and timeout timestamp updated",
       stateUpdates =
-        (state: GroupState[Int]) => { state.update(5); state.setTimeoutTimestamp(5000) },
+        (state: GroupState[Int]) => {
--- End diff --
undo this change.
[GitHub] spark pull request #19416: [SPARK-22187][SS] Update unsaferow format for sav...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19416#discussion_r142303281
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala ---
@@ -376,9 +388,35 @@ class FlatMapGroupsWithStateSuite extends StateStoreMetricsTest with BeforeAndAf
       expectedTimeoutTimestamp = currentBatchTimestamp + 5000) // timestamp should change
 
     testStateUpdateWithData(
+      s"ProcessingTimeTimeout - $testName - timeout updated after state removed",
+      stateUpdates = state => { state.remove(); state.setTimeoutDuration(5000) },
+      timeoutConf = ProcessingTimeTimeout,
+      priorState = priorState,
+      priorTimeoutTimestamp = priorTimeoutTimestamp,
+      expectedState = None,
+      expectedTimeoutTimestamp = currentBatchTimestamp + 5000)
+
+    // Tests with EventTimeTimeout
+
+    if (priorState == None) {
+      testStateUpdateWithData(
+        s"EventTimeTimeout - $testName - setting timeout without init state not allowed",
+        stateUpdates = state => {
--- End diff --
condense to single line.
[GitHub] spark pull request #19417: [SPARK-22158][SQL][BRANCH-2.2] convertMetastore s...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/19417 [SPARK-22158][SQL][BRANCH-2.2] convertMetastore should not ignore table property

## What changes were proposed in this pull request?

From the beginning, **convertMetastoreOrc** has ignored table properties and used an empty map instead. This PR fixes that. **convertMetastoreParquet** also ignores them.

```scala
val options = Map[String, String]()
```

- [SPARK-14070: HiveMetastoreCatalog.scala](https://github.com/apache/spark/pull/11891/files#diff-ee66e11b56c21364760a5ed2b783f863R650)
- [Master branch: HiveStrategies.scala](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L197)

## How was this patch tested?

Pass the Jenkins with an updated test suite.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-22158-BRANCH-2.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19417.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19417

commit 47cb5ef6badf6d509ae8f3e448a0cdfc4cd4f811 Author: Dongjoon Hyun Date: 2017-10-02T22:00:26Z [SPARK-22158][SQL][BRANCH-2.2] convertMetastore should not ignore table property

From the beginning, convertMetastoreOrc ignores table properties and uses an empty map instead. This PR fixes that. For the diff, please see [this](https://github.com/apache/spark/pull/19382/files?w=1). convertMetastoreParquet also ignores them.

```scala
val options = Map[String, String]()
```

- [SPARK-14070: HiveMetastoreCatalog.scala](https://github.com/apache/spark/pull/11891/files#diff-ee66e11b56c21364760a5ed2b783f863R650)
- [Master branch: HiveStrategies.scala](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L197)

Pass the Jenkins with an updated test suite.

Author: Dongjoon Hyun

Closes #19382 from dongjoon-hyun/SPARK-22158.
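The fix described above can be sketched as a map merge rather than an empty map. This is a hypothetical shape (`buildOptions` is not the actual `HiveStrategies` code): forward the metastore table's properties, letting any caller-supplied options win on key collisions.

```scala
// Sketch: before the fix, an empty Map dropped the table properties; the fix
// forwards them instead. Caller options take precedence on key collisions
// because Map ++ keeps the right-hand side's value for duplicate keys.
def buildOptions(
    tableProperties: Map[String, String],
    extraOptions: Map[String, String] = Map.empty): Map[String, String] =
  tableProperties ++ extraOptions
```

With the old `val options = Map[String, String]()`, a property such as a compression setting stored on the table would silently disappear when the metastore relation was converted to a data source relation.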
[GitHub] spark pull request #19416: [SPARK-22187][SS] Update unsaferow format for sav...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19416#discussion_r142303228
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala ---
@@ -397,50 +435,23 @@ class FlatMapGroupsWithStateSuite extends StateStoreMetricsTest with BeforeAndAf
       timeoutConf = EventTimeTimeout,
       priorState = priorState,
       priorTimeoutTimestamp = priorTimeoutTimestamp,
-      expectedState = Some(5), // state should change
-      expectedTimeoutTimestamp = NO_TIMESTAMP) // timestamp should not update
-    }
-  }
-
-  // Currently disallowed cases for StateStoreUpdater.updateStateForKeysWithData(),
-  // Try to remove these cases in the future
-  for (priorTimeoutTimestamp <- Seq(NO_TIMESTAMP, 1000)) {
--- End diff --
These functions test the cases where an exception used to be thrown to avoid null state + timeout being saved. The exception checks have now been replaced by the correct output tests (see the added lines above).
[GitHub] spark pull request #19416: [SPARK-22187][SS] Update unsaferow format for sav...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19416#discussion_r142303057
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/FlatMapGroupsWithState_StateManager.scala ---
[GitHub] spark pull request #19416: [SPARK-22187][SS] Update unsaferow format for sav...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19416#discussion_r142303019 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/FlatMapGroupsWithState_StateManager.scala --- @@ -0,0 +1,143 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.streaming.state + +import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder +import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, BoundReference, CaseWhen, CreateNamedStruct, GetStructField, IsNull, Literal, UnsafeRow} +import org.apache.spark.sql.execution.ObjectOperator +import org.apache.spark.sql.execution.streaming.GroupStateImpl +import org.apache.spark.sql.execution.streaming.GroupStateImpl.NO_TIMESTAMP +import org.apache.spark.sql.types.{IntegerType, LongType, StructType} + + +class FlatMapGroupsWithState_StateManager( --- End diff -- Add docs. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19416: [SPARK-22187][SS] Update unsaferow format for sav...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19416#discussion_r142302989 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala --- @@ -62,26 +60,7 @@ case class FlatMapGroupsWithStateExec( import GroupStateImpl._ private val isTimeoutEnabled = timeoutConf != NoTimeout - private val timestampTimeoutAttribute = -AttributeReference("timeoutTimestamp", dataType = IntegerType, nullable = false)() - private val stateAttributes: Seq[Attribute] = { -val encSchemaAttribs = stateEncoder.schema.toAttributes -if (isTimeoutEnabled) encSchemaAttribs :+ timestampTimeoutAttribute else encSchemaAttribs - } - // Get the serializer for the state, taking into account whether we need to save timestamps - private val stateSerializer = { -val encoderSerializer = stateEncoder.namedExpressions -if (isTimeoutEnabled) { - encoderSerializer :+ Literal(GroupStateImpl.NO_TIMESTAMP) -} else { - encoderSerializer -} - } - // Get the deserializer for the state. Note that this must be done in the driver, as - // resolving and binding of deserializer expressions to the encoded type can be safely done - // only in the driver. - private val stateDeserializer = stateEncoder.resolveAndBind().deserializer - + val stateManager = new FlatMapGroupsWithState_StateManager(stateEncoder, isTimeoutEnabled) --- End diff -- Refactored this class to separate out the state management from the processing. This results in this class being far simpler.
[GitHub] spark pull request #19416: [SPARK-22187][SS] Update unsaferow format for sav...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/19416 [SPARK-22187][SS] Update unsaferow format for saved state such that we can set timeouts when state is null ## What changes were proposed in this pull request? Currently, the group state of a user-defined type is encoded as top-level columns in the UnsafeRows stored in the state store. The timeout timestamp is also saved (when needed) as the last top-level column. Since the group state is serialized to top-level columns, you cannot save "null" as a value of state (setting null in all the top-level columns is not equivalent). So we don't let the user set the timeout without initializing the state for a key. Based on user experience, this leads to confusion. This PR changes the row format such that the state is saved as nested columns. This allows the state to be set to null, and avoids these confusing corner cases. ## How was this patch tested? Refactored tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tdas/spark SPARK-22187 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19416.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19416 commit 301e0a15b87be8cd1c71090ece3497191bbd3881 Author: Tathagata Das Date: 2017-09-29T03:10:34Z Refactored all state operations into separate inner class commit 64a8d865f71a92ed9f76879eb6c5a24d1fef8cec Author: Tathagata Das Date: 2017-10-03T02:39:05Z Refactored and changed state format
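The layout change described in the PR can be sketched with hypothetical state fields (`count` and `lastSeen` are illustrative names, not taken from the PR; the nested `groupState` / `timeoutTimestamp` column names follow the new state manager's schema):

```scala
import org.apache.spark.sql.types._

// Old format: user state fields sit at the top level of the stored row,
// next to the timeout column. A "null" state can only be approximated by
// nulling every field, which is ambiguous if the state type itself
// allows all-null fields.
val oldSchema = new StructType()
  .add("count", LongType)
  .add("lastSeen", LongType)
  .add("timeoutTimestamp", LongType)

// New format: the entire user state is one nullable struct column, so
// state == null is representable independently of the timeout column.
val newSchema = new StructType()
  .add("groupState", new StructType()
    .add("count", LongType)
    .add("lastSeen", LongType), nullable = true)
  .add("timeoutTimestamp", LongType)
```

With the nested layout, setting a timeout before initializing the state simply stores a row whose `groupState` column is null.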
[GitHub] spark issue #19382: [SPARK-22158][SQL] convertMetastore should not ignore ta...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19382 Thank you, @gatorsmile. I'll.
[GitHub] spark issue #19061: [SPARK-21568][CORE][WIP] ConsoleProgressBar should only ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19061 Merged build finished. Test PASSed.
[GitHub] spark issue #19061: [SPARK-21568][CORE][WIP] ConsoleProgressBar should only ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19061 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82401/ Test PASSed.
[GitHub] spark issue #19061: [SPARK-21568][CORE][WIP] ConsoleProgressBar should only ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19061 **[Test build #82401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82401/testReport)** for PR 19061 at commit [`a465619`](https://github.com/apache/spark/commit/a465619c6393156c65ec808000f6ab35753c27a5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19370 **[Test build #82405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82405/testReport)** for PR 19370 at commit [`d62ae59`](https://github.com/apache/spark/commit/d62ae59d892aa61a9f61af1411f2602a2b3e9ae1).
[GitHub] spark pull request #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19370#discussion_r142297093 --- Diff: bin/run-example.cmd --- @@ -17,6 +17,13 @@ rem See the License for the specific language governing permissions and rem limitations under the License. rem -set SPARK_HOME=%~dp0.. +rem Figure out where the Spark framework is installed +set FIND_SPARK_HOME_SCRIPT=%~dp0find_spark_home.py --- End diff -- Shouldn't we change this one too?
[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19370 retest this please
[GitHub] spark issue #19327: [SPARK-22136][SS] Implement stream-stream outer joins.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19327 **[Test build #82404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82404/testReport)** for PR 19327 at commit [`9a12c78`](https://github.com/apache/spark/commit/9a12c789ca7a871d12cb36f4b605673e93af8a43).
[GitHub] spark issue #19389: [SPARK-22165][SQL] Resolve type conflicts between decima...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19389 @gatorsmile, could you elaborate which behaviour changes you mean?
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17819 **[Test build #82403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82403/testReport)** for PR 17819 at commit [`000844a`](https://github.com/apache/spark/commit/000844ab1f0dffef9b51b96f7edc1e1ab9e9e0b7).
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17819 retest this please.
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17819 Merged build finished. Test FAILed.
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17819 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82402/ Test FAILed.
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17819 **[Test build #82402 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82402/testReport)** for PR 17819 at commit [`000844a`](https://github.com/apache/spark/commit/000844ab1f0dffef9b51b96f7edc1e1ab9e9e0b7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19413: [SPARK-20466][CORE] HadoopRDD#addLocalConfiguration thro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19413 Merged build finished. Test PASSed.
[GitHub] spark issue #19413: [SPARK-20466][CORE] HadoopRDD#addLocalConfiguration thro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19413 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82399/ Test PASSed.
[GitHub] spark issue #19413: [SPARK-20466][CORE] HadoopRDD#addLocalConfiguration thro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19413 **[Test build #82399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82399/testReport)** for PR 19413 at commit [`fd1df3d`](https://github.com/apache/spark/commit/fd1df3d7c4ded4bb431153f1195f7dbb0e811491). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...
Github user liufengdb commented on a diff in the pull request: https://github.com/apache/spark/pull/19394#discussion_r142290483 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala --- @@ -280,13 +280,20 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ results.toArray } + private[spark] def executeCollectIterator(): (Long, Iterator[InternalRow]) = { +val countsAndBytes = getByteArrayRdd().collect() --- End diff -- This still fetches all the compressed rows to the driver, before building the hashed relation. Ideally, you should fetch the rows from executors incrementally.
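The incremental alternative the reviewer suggests could look roughly like the following sketch (illustrative only, not Spark's internals; `RDD.toLocalIterator` is a real API that pulls one partition's data to the driver at a time, at the cost of running one job per partition):

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper: instead of collect()-ing every partition's
// compressed byte batches at once, stream partitions to the driver one
// by one and feed each batch into the relation builder as it arrives.
def buildIncrementally(batches: RDD[Array[Byte]])(consume: Array[Byte] => Unit): Unit = {
  // Holds at most one partition's batch in driver memory at a time.
  batches.toLocalIterator.foreach(consume)
}
```

The trade-off is latency (one Spark job per partition) versus peak driver memory, which is presumably why `collect()` was used in the first cut.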
[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18460 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82398/ Test PASSed.
[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18460 Merged build finished. Test PASSed.
[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18460 **[Test build #82398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82398/testReport)** for PR 18460 at commit [`b2d1338`](https://github.com/apache/spark/commit/b2d13382310d9d53811d47434d8262ad371a4456). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19327: [SPARK-22136][SS] Implement stream-stream outer j...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/19327#discussion_r142288587 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -233,16 +234,53 @@ object UnsupportedOperationChecker { throwError("Full outer joins with streaming DataFrames/Datasets are not supported") } -case LeftOuter | LeftSemi | LeftAnti => +case LeftSemi | LeftAnti => if (right.isStreaming) { -throwError("Left outer/semi/anti joins with a streaming DataFrame/Dataset " + -"on the right is not supported") +throwError("Left semi/anti joins with a streaming DataFrame/Dataset " + +"on the right are not supported") } +// We support streaming left outer joins with static on the right always, and with +// stream on both sides under the appropriate conditions. +case LeftOuter => + if (!left.isStreaming && right.isStreaming) { +throwError("Left outer join with a streaming DataFrame/Dataset " + + "on the right and a static DataFrame/Dataset on the left is not supported") + } else if (left.isStreaming && right.isStreaming) { +val watermarkInJoinKeys = StreamingJoinHelper.isWatermarkInJoinKeys(subPlan) + +val hasValidWatermarkRange = + StreamingJoinHelper.getStateValueWatermark( +left.outputSet, right.outputSet, condition, Some(100)).isDefined + --- End diff -- extra line.
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/17819 @gatorsmile The related test is added. Please take a look again. Thanks.
[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17819 **[Test build #82402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82402/testReport)** for PR 17819 at commit [`000844a`](https://github.com/apache/spark/commit/000844ab1f0dffef9b51b96f7edc1e1ab9e9e0b7).
[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19394 cc @liufengdb
[GitHub] spark issue #17359: [SPARK-20028][SQL] Add aggreagate expression nGrams
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17359 cc @wzhfy @viirya Are you interested in reviewing this PR?
[GitHub] spark issue #19389: [SPARK-22165][SQL] Resolve type conflicts between decima...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19389 Please ensure no behavior change is introduced when fixing such issues. Also cc @cloud-fan
[GitHub] spark issue #19389: [SPARK-22165][SQL] Resolve type conflicts between decima...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19389 This PR introduces behavior changes. We are unable to do this.
[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r142278786 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2120,6 +2120,19 @@ class Dataset[T] private[sql]( } /** + * Returns a new Dataset by adding columns with metadata. + */ + private[spark] def withColumns( + colNames: Seq[String], + cols: Seq[Column], + metadata: Seq[Metadata]): DataFrame = { +val newCols = colNames.zip(cols).zip(metadata).map { case ((colName, col), metadata) => --- End diff -- Yes. We should check the number of metadata entries too. I'll add it when adding the related test.
[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r142278697 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2120,6 +2120,19 @@ class Dataset[T] private[sql]( } /** + * Returns a new Dataset by adding columns with metadata. + */ + private[spark] def withColumns( --- End diff -- Yeah, I see. I've left a comment https://github.com/apache/spark/pull/17819#discussion_r142172037 saying that I will add a test later.
[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r142278563 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2120,6 +2120,19 @@ class Dataset[T] private[sql]( } /** + * Returns a new Dataset by adding columns with metadata. + */ + private[spark] def withColumns( + colNames: Seq[String], + cols: Seq[Column], + metadata: Seq[Metadata]): DataFrame = { +val newCols = colNames.zip(cols).zip(metadata).map { case ((colName, col), metadata) => --- End diff -- Is it possible that the number of elements in metadata does not match? Then the results will be unexpected because of this implementation.
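The hazard being pointed out is that Scala's `zip` silently truncates to the shorter collection, so a mismatched `metadata` length would drop columns rather than fail. A minimal guard could look like the following sketch (not the PR's exact code; metadata is modeled as plain strings for brevity):

```scala
// zip alone pairs only the first min(n, m) elements, silently losing
// trailing columns, so a length check should fail fast instead.
def withColumnsChecked(colNames: Seq[String], metadata: Seq[String]): Seq[(String, String)] = {
  require(colNames.size == metadata.size,
    s"Expected ${colNames.size} metadata entries, got ${metadata.size}")
  colNames.zip(metadata)
}
```

Without the `require`, calling this with three names and two metadata entries would quietly return two pairs and lose the third column.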
[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19294 If `path` could be `null`, [this line](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SQLHadoopMapReduceCommitProtocol.scala#L54) will still fail with an error message like `Can not create a Path from a null string`. In our Spark SQL code path, how can `path` be null?
[GitHub] spark issue #19061: [SPARK-21568][CORE] ConsoleProgressBar should only be en...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19061 **[Test build #82401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82401/testReport)** for PR 19061 at commit [`a465619`](https://github.com/apache/spark/commit/a465619c6393156c65ec808000f6ab35753c27a5).
[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r142278414 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2120,6 +2120,19 @@ class Dataset[T] private[sql]( } /** + * Returns a new Dataset by adding columns with metadata. + */ + private[spark] def withColumns( --- End diff -- This is not being tested in SQL
[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19390 Merged build finished. Test PASSed.
[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19390 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82400/ Test PASSed.
[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19390 **[Test build #82400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82400/testReport)** for PR 19390 at commit [`ce6902e`](https://github.com/apache/spark/commit/ce6902e445ff407ad2edc5a38392407eadfaf2e2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19392: [SPARK-22169][SQL] table name with numbers and ch...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19392#discussion_r142277890 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -510,11 +510,15 @@ rowFormat ; tableIdentifier -: (db=identifier '.')? table=identifier +: (db=identifierPart '.')? table=identifierPart ; functionIdentifier -: (db=identifier '.')? function=identifier +: (db=identifierPart '.')? function=identifierPart +; + +identifierPart +: identifier | BIGINT_LITERAL | SMALLINT_LITERAL | TINYINT_LITERAL | BYTELENGTH_LITERAL --- End diff -- But the rule in `validateName` allows a name consisting of numbers.
[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19390 **[Test build #82400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82400/testReport)** for PR 19390 at commit [`ce6902e`](https://github.com/apache/spark/commit/ce6902e445ff407ad2edc5a38392407eadfaf2e2).
[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19294#discussion_r142276032 --- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala --- @@ -57,6 +57,15 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String) */ private def absPathStagingDir: Path = new Path(path, "_temporary-" + jobId) + /** + * Checks whether there are files to be committed to an absolute output location. + * + * As the committing and aborting the job occurs on driver where `addedAbsPathFiles` is always + * null, it is necessary to check whether the output path is specified, that may not be the case + * for committers not writing to distributed file systems. + */ + private def hasAbsPathFiles: Boolean = path != null --- End diff -- Please add `@param path` to line 38 for explaining it.
[GitHub] spark pull request #19401: [SPARK-22176][SQL] Fix overflow issue in Dataset....
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19401
[GitHub] spark issue #19401: [SPARK-22176][SQL] Fix overflow issue in Dataset.show
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19401 Thanks! Merged to master/2.2
[GitHub] spark issue #19401: [SPARK-22176][SQL] Fix overflow issue in Dataset.show
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19401 LGTM
[GitHub] spark issue #19402: [SPARK-22167][R][BUILD] sparkr packaging issue allow zin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19402 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82393/ Test PASSed.
[GitHub] spark issue #19402: [SPARK-22167][R][BUILD] sparkr packaging issue allow zin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19402 Merged build finished. Test PASSed.
[GitHub] spark issue #19402: [SPARK-22167][R][BUILD] sparkr packaging issue allow zin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19402 **[Test build #82393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82393/testReport)** for PR 19402 at commit [`40a7f6c`](https://github.com/apache/spark/commit/40a7f6cad2391d9f172695a14a2cbf91b7a38a83).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #18098: [SPARK-16944][Mesos] Improve data locality when l...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18098
[GitHub] spark pull request #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18966#discussion_r142267872

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -769,16 +769,27 @@ class CodegenContext {
       foldFunctions: Seq[String] => String = _.mkString("", ";\n", ";")): String = {
     val blocks = new ArrayBuffer[String]()
     val blockBuilder = new StringBuilder()
+    val defaultMaxLines = 100
+    val maxLines = if (SparkEnv.get != null) {
+      SparkEnv.get.conf.getInt("spark.sql.codegen.expressions.maxCodegenLinesPerFunction",

--- End diff --

Can we use bytecode size?
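The pattern in the diff above, reading a tuning knob from the conf while surviving a null `SparkEnv.get`, can be sketched without a Spark dependency. `FakeConf` and the helper below are illustrative stand-ins, not Spark APIs:

```scala
// Stand-in for SparkConf.getInt: parse an int setting or fall back.
final case class FakeConf(settings: Map[String, String]) {
  def getInt(key: String, default: Int): Int =
    settings.get(key).map(_.toInt).getOrElse(default)
}

object CodegenConfSketch {
  private val Key = "spark.sql.codegen.expressions.maxCodegenLinesPerFunction"

  // Mirrors the diff: SparkEnv.get may be null (e.g. outside a running
  // SparkContext), in which case the hard-coded default applies. Here the
  // possibly-missing environment is modeled as an Option.
  def maxLines(env: Option[FakeConf]): Int = {
    val defaultMaxLines = 100
    env.map(_.getInt(Key, defaultMaxLines)).getOrElse(defaultMaxLines)
  }
}
```

The default also applies when the environment exists but the key is unset, which is why the fallback value appears in both branches.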
[GitHub] spark pull request #19382: [SPARK-22158][SQL] convertMetastore should not ig...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19382
[GitHub] spark issue #19413: [SPARK-20466][CORE] HadoopRDD#addLocalConfiguration thro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19413 **[Test build #82399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82399/testReport)** for PR 19413 at commit [`fd1df3d`](https://github.com/apache/spark/commit/fd1df3d7c4ded4bb431153f1195f7dbb0e811491).
[GitHub] spark issue #19413: [SPARK-20466][CORE] HadoopRDD#addLocalConfiguration thro...
Github user sahilTakiar commented on the issue: https://github.com/apache/spark/pull/19413 Fixed the style check.
[GitHub] spark issue #19382: [SPARK-22158][SQL] convertMetastore should not ignore ta...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19382 Please submit a separate PR to 2.2. Thanks!
[GitHub] spark issue #19382: [SPARK-22158][SQL] convertMetastore should not ignore ta...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19382 Thanks! Merged to master/2.2
[GitHub] spark issue #19412: [SPARK-22142][BUILD][STREAMING] Move Flume support behin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19412 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82392/ Test PASSed.
[GitHub] spark issue #19412: [SPARK-22142][BUILD][STREAMING] Move Flume support behin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19412 Merged build finished. Test PASSed.
[GitHub] spark issue #19412: [SPARK-22142][BUILD][STREAMING] Move Flume support behin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19412 **[Test build #82392 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82392/testReport)** for PR 19412 at commit [`4b45f9c`](https://github.com/apache/spark/commit/4b45f9c30a6e06cd4a3509f93c6370374e8f4c05).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #19415: Branch 2.2 udf nullability
GitHub user ptkool opened a pull request: https://github.com/apache/spark/pull/19415 Branch 2.2 udf nullability ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Shopify/spark branch-2.2-udf_nullability Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19415.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19415 commit cfa6bcbe83b9a4b9607e23ac889963b6aa02f0d9 Author: Ryan Blue Date: 2017-05-01T21:48:02Z [SPARK-20540][CORE] Fix unstable executor requests. There are two problems fixed in this commit. First, the ExecutorAllocationManager sets a timeout to avoid requesting executors too often. However, the timeout is always updated based on its value and a timeout, not the current time. If the call is delayed by locking for more than the ongoing scheduler timeout, the manager will request more executors on every run. This seems to be the main cause of SPARK-20540. The second problem is that the total number of requested executors is not tracked by the CoarseGrainedSchedulerBackend. Instead, it calculates the value based on the current status of 3 variables: the number of known executors, the number of executors that have been killed, and the number of pending executors. But, the number of pending executors is never less than 0, even though there may be more known than requested.
When executors are killed and not replaced, this can cause the request sent to YARN to be incorrect because there were too many executors due to the scheduler's state being slightly out of date. This is fixed by tracking the currently requested size explicitly. ## How was this patch tested? Existing tests. Author: Ryan Blue Closes #17813 from rdblue/SPARK-20540-fix-dynamic-allocation. (cherry picked from commit 2b2dd08e975dd7fbf261436aa877f1d7497ed31f) Signed-off-by: Marcelo Vanzin commit 5a0a8b0396df2feadb8333876cc08edf219fa177 Author: Sean Owen Date: 2017-05-02T00:01:05Z [SPARK-20459][SQL] JdbcUtils throws IllegalStateException: Cause already initialized after getting SQLException ## What changes were proposed in this pull request? Avoid failing to initCause on JDBC exception with cause initialized to null ## How was this patch tested? Existing tests Author: Sean Owen Closes #17800 from srowen/SPARK-20459. (cherry picked from commit af726cd6117de05c6e3b9616b8699d884a53651b) Signed-off-by: Xiao Li commit b7c1c2f973635a2ec05aedd89456765d830dfdce Author: Felix Cheung Date: 2017-05-02T04:03:48Z [SPARK-20192][SPARKR][DOC] SparkR migration guide to 2.2.0 ## What changes were proposed in this pull request? Updating R Programming Guide ## How was this patch tested? manually Author: Felix Cheung Closes #17816 from felixcheung/r22relnote. (cherry picked from commit d20a976e8918ca8d607af452301e8014fe14e64a) Signed-off-by: Felix Cheung commit b146481fff1ce529245f9c03b35c73ea604712d0 Author: Kazuaki Ishizaki Date: 2017-05-02T05:56:41Z [SPARK-20537][CORE] Fixing OffHeapColumnVector reallocation ## What changes were proposed in this pull request? As #17773 revealed `OnHeapColumnVector` may copy a part of the original storage. `OffHeapColumnVector` reallocation also copies to the new storage data up to 'elementsAppended'. This variable is only updated when using the `ColumnVector.appendX` API, while `ColumnVector.putX` is more commonly used. 
This PR copies the new storage data up to the previously-allocated size in `OffHeapColumnVector`. ## How was this patch tested? Existing test suites Author: Kazuaki Ishizaki Closes #17811 from kiszk/SPARK-20537. (cherry picked from commit afb21bf22a59c9416c04637412fb69d1442e6826) Signed-off-by: Wenchen Fan commit ef5e2a0509801f6afced3bc80f8d700acf84e0dd Author: Burak Yavuz Date:
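The first SPARK-20540 problem described in the commit message above (advancing the throttle deadline from the previous deadline instead of from the current time) reduces to a one-line difference. The names below are illustrative, not the actual ExecutorAllocationManager fields:

```scala
object ThrottleSketch {
  val intervalMs = 100L

  // Buggy update: the next deadline is derived from the previous one. If a
  // run was delayed (e.g. by lock contention) well past the old deadline,
  // the "new" deadline is still in the past, so throttling never kicks in
  // and executors are requested on every run.
  def buggyNextDeadline(prevDeadline: Long): Long = prevDeadline + intervalMs

  // Fixed update: derive the deadline from the current clock, guaranteeing
  // at least one full interval between consecutive executor requests.
  def fixedNextDeadline(now: Long): Long = now + intervalMs
}
```

For example, if the previous deadline was t=500 ms and lock contention delayed the run until t=1000 ms, the buggy update yields a deadline of 600 ms, already in the past, while the fixed update yields 1100 ms.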
[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18460 Retest this please.