[GitHub] spark issue #19020: [SPARK-3181] [ML] Implement huber loss for LinearRegress...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19020
  
**[Test build #82410 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82410/testReport)**
 for PR 19020 at commit 
[`8c6622f`](https://github.com/apache/spark/commit/8c6622f68ea81cedbeb3f03f957b335a99dedd46).


---




[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-10-02 Thread szhem
Github user szhem commented on the issue:

https://github.com/apache/spark/pull/19294
  
@gatorsmile I believe that in the Spark SQL code path `path` cannot be null, because in that case `FileFormatWriter` [fails even before](https://github.com/apache/spark/blob/3f958a99921d149fb9fdf7ba7e78957afdad1405/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L118) `setupJob` ([which in turn calls setupCommitter](https://github.com/apache/spark/blob/e47f48c737052564e92903de16ff16707fae32c3/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L124)) is called on the committer.

The interesting part is that the [FileOutputCommitter allows null output 
paths](https://github.com/apache/hadoop/blob/5af572b6443715b7a741296c1bd520a1840f9a7c/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java#L96)
 and the line you highlighted is executed only in that case.


---




[GitHub] spark pull request #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on...

2017-10-02 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/19370#discussion_r142317006
  
--- Diff: bin/sparkR2.cmd ---
@@ -18,7 +18,7 @@ rem limitations under the License.
 rem
--- End diff --

it looks like we should add this to the appveyor list...


---




[GitHub] spark issue #19416: [SPARK-22187][SS] Update unsaferow format for saved stat...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19416
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19416: [SPARK-22187][SS] Update unsaferow format for saved stat...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19416
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82407/
Test PASSed.


---




[GitHub] spark issue #19416: [SPARK-22187][SS] Update unsaferow format for saved stat...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19416
  
**[Test build #82407 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82407/testReport)**
 for PR 19416 at commit 
[`64a8d86`](https://github.com/apache/spark/commit/64a8d865f71a92ed9f76879eb6c5a24d1fef8cec).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class FlatMapGroupsWithState_StateManager(`
  * `case class FlatMapGroupsWithState_StateData(`


---




[GitHub] spark issue #19418: [SPARK-19984][SQL] Fix for ERROR codegen.CodeGenerator: ...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19418
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19418: [SPARK-19984][SQL] Fix for ERROR codegen.CodeGenerator: ...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19418
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82408/
Test FAILed.


---




[GitHub] spark issue #19418: [SPARK-19984][SQL] Fix for ERROR codegen.CodeGenerator: ...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19418
  
**[Test build #82408 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82408/testReport)**
 for PR 19418 at commit 
[`0fa4d61`](https://github.com/apache/spark/commit/0fa4d6154a4fe9d46c020dc979a0a835776cd83d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19393: [SPARK-21644][SQL] LocalLimit.maxRows is defined incorre...

2017-10-02 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19393
  
One minor comment otherwise LGTM.


---




[GitHub] spark issue #19417: [SPARK-22158][SQL][BRANCH-2.2] convertMetastore should n...

2017-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19417
  
This is the backport of #19382 , @gatorsmile .



---




[GitHub] spark issue #19418: [SPARK-19984][SQL] Fix for ERROR codegen.CodeGenerator: ...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19418
  
**[Test build #82409 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82409/testReport)**
 for PR 19418 at commit 
[`c717e9b`](https://github.com/apache/spark/commit/c717e9b8011942536d6b94831c671b4d8fdd7047).


---




[GitHub] spark pull request #19393: [SPARK-21644][SQL] LocalLimit.maxRows is defined ...

2017-10-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19393#discussion_r142311845
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -296,13 +296,20 @@ object LimitPushDown extends Rule[LogicalPlan] {
 }
   }
 
-  private def maybePushLimit(limitExp: Expression, plan: LogicalPlan): LogicalPlan = {
-    (limitExp, plan.maxRows) match {
-      case (IntegerLiteral(maxRow), Some(childMaxRows)) if maxRow < childMaxRows =>
+  private def maybePushLocalLimit(limitExp: Expression, plan: LogicalPlan): LogicalPlan = {
+    (limitExp, plan.maxRowsPerPartition) match {
+      case (IntegerLiteral(newLimit), Some(childMaxRows)) if newLimit < childMaxRows =>
+        // If the child has a cap on max rows per partition and the cap is smaller than
+        // the new limit, put a new LocalLimit there.
--- End diff --

I think it is `the cap is larger than the new limit`?
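For concreteness, a standalone restatement of the guard in the quoted diff (hypothetical helper name, illustrative values):

```scala
// Only push a new LocalLimit when the child's per-partition cap is larger than the new limit.
def shouldPushLocalLimit(newLimit: Int, childMaxRowsPerPartition: Option[Long]): Boolean =
  childMaxRowsPerPartition.exists(cap => newLimit < cap)

// shouldPushLocalLimit(10, Some(100))  == true   // cap (100) is larger than the limit (10): push LocalLimit(10)
// shouldPushLocalLimit(200, Some(100)) == false  // cap is already tighter than the limit: nothing to push
```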


---




[GitHub] spark issue #19418: [SPARK-19984][SQL] Fix for ERROR codegen.CodeGenerator: ...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19418
  
**[Test build #82408 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82408/testReport)**
 for PR 19418 at commit 
[`0fa4d61`](https://github.com/apache/spark/commit/0fa4d6154a4fe9d46c020dc979a0a835776cd83d).


---




[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19370
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82405/
Test PASSed.


---




[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19370
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #19418: [SPARK-19984][SQL] Fix for ERROR codegen.CodeGene...

2017-10-02 Thread rekhajoshm
GitHub user rekhajoshm opened a pull request:

https://github.com/apache/spark/pull/19418

[SPARK-19984][SQL] Fix for ERROR codegen.CodeGenerator: failed to compile: 
org.codehaus.commons.compiler.CompileException: for generated java file

## What changes were proposed in this pull request?
From the erratic error observed and a quick analysis of the code, it seems that `leftKeys(i).dataType` in `SortMergeJoinExec` is an `AtomicType`. It also seems that casting/promotion has played a role: the variable gets reported as `long`, but does not match the expected case flow for primitives
{code}
case dt: DataType if isPrimitiveType(dt) => s"($c1 > $c2 ? 1 : $c1 < $c2 ? -1 : 0)"
{code}
The fix is to not invoke `compare` if the method does not exist.
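A minimal sketch of that guard, assuming a standalone helper rather than the actual `CodeGenerator` change (the type set and fallback are illustrative only):

```scala
import org.apache.spark.sql.types._

// Emit the primitive three-way compare only for types whose generated Java value is a
// primitive; for everything else fall back to a .compare call on the boxed value.
def genCompareSnippet(dt: DataType, c1: String, c2: String): String = {
  val numericPrimitives: Set[DataType] =
    Set(ByteType, ShortType, IntegerType, LongType, FloatType, DoubleType)
  if (numericPrimitives.contains(dt)) {
    s"($c1 > $c2 ? 1 : $c1 < $c2 ? -1 : 0)"   // valid Java for primitives such as long
  } else {
    s"$c1.compare($c2)"                       // only valid when the value is an object
  }
}
```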

## How was this patch tested?
existing test

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rekhajoshm/spark SPARK-19984

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19418.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19418


commit e3677c9fa9697e0d34f9df52442085a6a481c9e9
Author: Rekha Joshi 
Date:   2015-05-05T23:10:08Z

Merge pull request #1 from apache/master

Pulling functionality from apache spark

commit 106fd8eee8f6a6f7c67cfc64f57c1161f76d8f75
Author: Rekha Joshi 
Date:   2015-05-08T21:49:09Z

Merge pull request #2 from apache/master

pull latest from apache spark

commit 0be142d6becba7c09c6eba0b8ea1efe83d649e8c
Author: Rekha Joshi 
Date:   2015-06-22T00:08:08Z

Merge pull request #3 from apache/master

Pulling functionality from apache spark

commit 6c6ee12fd733e3f9902e10faf92ccb78211245e3
Author: Rekha Joshi 
Date:   2015-09-17T01:03:09Z

Merge pull request #4 from apache/master

Pulling functionality from apache spark

commit b123c601e459d1ad17511fd91dd304032154882a
Author: Rekha Joshi 
Date:   2015-11-25T18:50:32Z

Merge pull request #5 from apache/master

pull request from apache/master

commit c73c32aadd6066e631956923725a48d98a18777e
Author: Rekha Joshi 
Date:   2016-03-18T19:13:51Z

Merge pull request #6 from apache/master

pull latest from apache spark

commit 7dbf7320057978526635bed09dabc8cf8657a28a
Author: Rekha Joshi 
Date:   2016-04-05T20:26:40Z

Merge pull request #8 from apache/master

pull latest from apache spark

commit 5e9d71827f8e2e4d07027281b80e4e073e7fecd1
Author: Rekha Joshi 
Date:   2017-05-01T23:00:30Z

Merge pull request #9 from apache/master

Pull apache spark

commit 63d99b3ce5f222d7126133170a373591f0ac67dd
Author: Rekha Joshi 
Date:   2017-09-30T22:26:44Z

Merge pull request #10 from apache/master

pull latest apache spark

commit 0fa4d6154a4fe9d46c020dc979a0a835776cd83d
Author: rjoshi2 
Date:   2017-10-03T04:40:53Z

[SPARK-19984][SQL] Fix for ERROR codegen.CodeGenerator: failed to compile: 
org.codehaus.commons.compiler.CompileException: for generated java file




---




[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19370
  
**[Test build #82405 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82405/testReport)**
 for PR 19370 at commit 
[`d62ae59`](https://github.com/apache/spark/commit/d62ae59d892aa61a9f61af1411f2602a2b3e9ae1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19405: [SPARK-22178] [SQL] Refresh Persistent Views by REFRESH ...

2017-10-02 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19405
  
LGTM except for one minor comment.


---




[GitHub] spark issue #19417: [SPARK-22158][SQL][BRANCH-2.2] convertMetastore should n...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19417
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82406/
Test PASSed.


---




[GitHub] spark issue #19417: [SPARK-22158][SQL][BRANCH-2.2] convertMetastore should n...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19417
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19417: [SPARK-22158][SQL][BRANCH-2.2] convertMetastore should n...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19417
  
**[Test build #82406 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82406/testReport)**
 for PR 19417 at commit 
[`47cb5ef`](https://github.com/apache/spark/commit/47cb5ef6badf6d509ae8f3e448a0cdfc4cd4f811).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19405: [SPARK-22178] [SQL] Refresh Persistent Views by R...

2017-10-02 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19405#discussion_r142310673
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveMetadataCacheSuite.scala 
---
@@ -31,14 +31,22 @@ import org.apache.spark.sql.test.SQLTestUtils
 class HiveMetadataCacheSuite extends QueryTest with SQLTestUtils with 
TestHiveSingleton {
 
   test("SPARK-16337 temporary view refresh") {
-withTempView("view_refresh") {
+checkRefreshView(isTemp = true)
+  }
+
+  test("view refresh") {
--- End diff --

We didn't cover the persistent view case for refresh; that's why the bug happens...
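A rough sketch of what the persistent-view case exercises (table and view names are made up here):

```scala
import org.apache.spark.sql.SparkSession

def refreshPersistentView(spark: SparkSession): Unit = {
  spark.sql("CREATE TABLE view_refresh_src(id INT) USING parquet")
  // No TEMPORARY keyword: this is the persistent-view path that was previously untested.
  spark.sql("CREATE VIEW view_refresh AS SELECT id FROM view_refresh_src")
  // ... files behind view_refresh_src change outside Spark ...
  spark.sql("REFRESH TABLE view_refresh")  // SPARK-22178: REFRESH TABLE should also refresh views
  spark.sql("SELECT COUNT(*) FROM view_refresh").show()
}
```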


---




[GitHub] spark issue #19327: [SPARK-22136][SS] Implement stream-stream outer joins.

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19327
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19327: [SPARK-22136][SS] Implement stream-stream outer joins.

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19327
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82404/
Test PASSed.


---




[GitHub] spark issue #19327: [SPARK-22136][SS] Implement stream-stream outer joins.

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19327
  
**[Test build #82404 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82404/testReport)**
 for PR 19327 at commit 
[`9a12c78`](https://github.com/apache/spark/commit/9a12c789ca7a871d12cb36f4b605673e93af8a43).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17819
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17819
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82403/
Test PASSed.


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17819
  
**[Test build #82403 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82403/testReport)**
 for PR 17819 at commit 
[`000844a`](https://github.com/apache/spark/commit/000844ab1f0dffef9b51b96f7edc1e1ab9e9e0b7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-10-02 Thread tawfiqul-islam
Github user tawfiqul-islam commented on the issue:

https://github.com/apache/spark/pull/19390
  
Hi, is there any update on this issue? Is it fixed yet?


---




[GitHub] spark issue #19389: [SPARK-22165][SQL] Resolve type conflicts between decima...

2017-10-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19389
  
Do you mean before / after in PR description? They are bugs to fix, aren't 
they?


---




[GitHub] spark pull request #19416: [SPARK-22187][SS] Update unsaferow format for sav...

2017-10-02 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/19416#discussion_r142304163
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/FlatMapGroupsWithState_StateManager.scala
 ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.catalyst.expressions.{Attribute, 
AttributeReference, BoundReference, CaseWhen, CreateNamedStruct, 
GetStructField, IsNull, Literal, UnsafeRow}
+import org.apache.spark.sql.execution.ObjectOperator
+import org.apache.spark.sql.execution.streaming.GroupStateImpl
+import org.apache.spark.sql.execution.streaming.GroupStateImpl.NO_TIMESTAMP
+import org.apache.spark.sql.types.{IntegerType, LongType, StructType}
+
+
+class FlatMapGroupsWithState_StateManager(
+stateEncoder: ExpressionEncoder[Any],
+shouldStoreTimestamp: Boolean) extends Serializable {
+
+  val stateSchema = {
+val schema = new StructType().add("groupState", stateEncoder.schema, 
nullable = true)
+if (shouldStoreTimestamp) schema.add("timeoutTimestamp", LongType) 
else schema
+  }
+
+  def getState(store: StateStore, keyRow: UnsafeRow): 
FlatMapGroupsWithState_StateData = {
+val stateRow = store.get(keyRow)
+stateDataForGets.withNew(
+  keyRow, stateRow, getStateObj(stateRow), getTimestamp(stateRow))
+  }
+
+  def putState(store: StateStore, keyRow: UnsafeRow, state: Any, 
timestamp: Long): Unit = {
+val stateRow = getStateRow(state)
+setTimestamp(stateRow, timestamp)
+store.put(keyRow, stateRow)
+  }
+
+  def removeState(store: StateStore, keyRow: UnsafeRow): Unit = {
+store.remove(keyRow)
+  }
+
+  def getAllState(store: StateStore): 
Iterator[FlatMapGroupsWithState_StateData] = {
+val stateDataForGetAllState = FlatMapGroupsWithState_StateData()
+store.getRange(None, None).map { pair =>
+  stateDataForGetAllState.withNew(
+pair.key, pair.value, getStateObjFromRow(pair.value), 
getTimestamp(pair.value))
+}
+  }
+
+  private val stateAttributes: Seq[Attribute] = stateSchema.toAttributes
+
+  // Get the serializer for the state, taking into account whether we need 
to save timestamps
+  private val stateSerializer = {
+val nestedStateExpr = CreateNamedStruct(
+  stateEncoder.namedExpressions.flatMap(e => Seq(Literal(e.name), e)))
+if (shouldStoreTimestamp) {
+  Seq(nestedStateExpr, Literal(GroupStateImpl.NO_TIMESTAMP))
+} else {
+  Seq(nestedStateExpr)
+}
+  }
+
+  // Get the deserializer for the state. Note that this must be done in 
the driver, as
+  // resolving and binding of deserializer expressions to the encoded type 
can be safely done
+  // only in the driver.
+  private val stateDeserializer = {
+val boundRefToNestedState = BoundReference(nestedStateOrdinal, 
stateEncoder.schema, true)
+val deser = stateEncoder.resolveAndBind().deserializer.transformUp {
+  case BoundReference(ordinal, _, _) => 
GetStructField(boundRefToNestedState, ordinal)
+}
+CaseWhen(Seq(IsNull(boundRefToNestedState) -> Literal(null)), 
elseValue = deser).toCodegen()
+  }
+
+  private lazy val nestedStateOrdinal = 0
+  private lazy val timeoutTimestampOrdinal = 1
+
+  // Converters for translating state between rows and Java objects
+  private lazy val getStateObjFromRow = 
ObjectOperator.deserializeRowToObject(
+stateDeserializer, stateAttributes)
+  private lazy val getStateRowFromObj = 
ObjectOperator.serializeObjectToRow(stateSerializer)
+
+  private lazy val stateDataForGets = FlatMapGroupsWithState_StateData()
+
+  /** Returns the state as Java object if defined */
+  private def getStateObj(stateRow: UnsafeRow): Any = {
+if 

[GitHub] spark issue #19417: [SPARK-22158][SQL][BRANCH-2.2] convertMetastore should n...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19417
  
**[Test build #82406 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82406/testReport)**
 for PR 19417 at commit 
[`47cb5ef`](https://github.com/apache/spark/commit/47cb5ef6badf6d509ae8f3e448a0cdfc4cd4f811).


---




[GitHub] spark issue #19416: [SPARK-22187][SS] Update unsaferow format for saved stat...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19416
  
**[Test build #82407 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82407/testReport)**
 for PR 19416 at commit 
[`64a8d86`](https://github.com/apache/spark/commit/64a8d865f71a92ed9f76879eb6c5a24d1fef8cec).


---




[GitHub] spark pull request #19416: [SPARK-22187][SS] Update unsaferow format for sav...

2017-10-02 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/19416#discussion_r142303254
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala
 ---
@@ -376,9 +388,35 @@ class FlatMapGroupsWithStateSuite extends 
StateStoreMetricsTest with BeforeAndAf
 expectedTimeoutTimestamp = currentBatchTimestamp + 5000) // 
timestamp should change
 
   testStateUpdateWithData(
+s"ProcessingTimeTimeout - $testName - timeout updated after state 
removed",
+stateUpdates = state => { state.remove(); 
state.setTimeoutDuration(5000) },
+timeoutConf = ProcessingTimeTimeout,
+priorState = priorState,
+priorTimeoutTimestamp = priorTimeoutTimestamp,
+expectedState = None,
+expectedTimeoutTimestamp = currentBatchTimestamp + 5000)
+
+  // Tests with EventTimeTimeout
+
+  if (priorState == None) {
+testStateUpdateWithData(
+  s"EventTimeTimeout - $testName - setting timeout without init 
state not allowed",
+  stateUpdates = state => {
+state.setTimeoutTimestamp(1)
+  },
+  timeoutConf = EventTimeTimeout,
+  priorState = None,
+  priorTimeoutTimestamp = priorTimeoutTimestamp,
+  expectedState = None,
+  expectedTimeoutTimestamp = 1)
+  }
+
+  testStateUpdateWithData(
 s"EventTimeTimeout - $testName - state and timeout timestamp 
updated",
 stateUpdates =
-  (state: GroupState[Int]) => { state.update(5); 
state.setTimeoutTimestamp(5000) },
+  (state: GroupState[Int]) => {
--- End diff --

undo this change.


---




[GitHub] spark pull request #19416: [SPARK-22187][SS] Update unsaferow format for sav...

2017-10-02 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/19416#discussion_r142303281
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala
 ---
@@ -376,9 +388,35 @@ class FlatMapGroupsWithStateSuite extends 
StateStoreMetricsTest with BeforeAndAf
 expectedTimeoutTimestamp = currentBatchTimestamp + 5000) // 
timestamp should change
 
   testStateUpdateWithData(
+s"ProcessingTimeTimeout - $testName - timeout updated after state 
removed",
+stateUpdates = state => { state.remove(); 
state.setTimeoutDuration(5000) },
+timeoutConf = ProcessingTimeTimeout,
+priorState = priorState,
+priorTimeoutTimestamp = priorTimeoutTimestamp,
+expectedState = None,
+expectedTimeoutTimestamp = currentBatchTimestamp + 5000)
+
+  // Tests with EventTimeTimeout
+
+  if (priorState == None) {
+testStateUpdateWithData(
+  s"EventTimeTimeout - $testName - setting timeout without init 
state not allowed",
+  stateUpdates = state => {
--- End diff --

condense to single line.


---




[GitHub] spark pull request #19417: [SPARK-22158][SQL][BRANCH-2.2] convertMetastore s...

2017-10-02 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/19417

[SPARK-22158][SQL][BRANCH-2.2] convertMetastore should not ignore table 
property

## What changes were proposed in this pull request?

From the beginning, **convertMetastoreOrc** has ignored table properties and used an empty map instead. This PR fixes that. **convertMetastoreParquet** also ignores them.

```scala
val options = Map[String, String]()
```

- [SPARK-14070: 
HiveMetastoreCatalog.scala](https://github.com/apache/spark/pull/11891/files#diff-ee66e11b56c21364760a5ed2b783f863R650)
- [Master branch: 
HiveStrategies.scala](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L197
)
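A hedged sketch of the intended direction (not the exact `HiveStrategies`/`HiveMetastoreCatalog` code; the helper is illustrative): forward the metastore table's own properties instead of the empty map shown above.

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogTable

// Before: val options = Map[String, String]()   // table properties dropped
// After (sketch): carry the table's properties through as the relation's options.
def relationOptions(tableMeta: CatalogTable): Map[String, String] = tableMeta.properties
```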

## How was this patch tested?

Pass the Jenkins with an updated test suite.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-22158-BRANCH-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19417


commit 47cb5ef6badf6d509ae8f3e448a0cdfc4cd4f811
Author: Dongjoon Hyun 
Date:   2017-10-02T22:00:26Z

[SPARK-22158][SQL][BRANCH-2.2] convertMetastore should not ignore table 
property

From the beginning, convertMetastoreOrc has ignored table properties and used an empty map instead. This PR fixes that. For the diff, please see [this](https://github.com/apache/spark/pull/19382/files?w=1). convertMetastoreParquet also ignores them.

```scala
val options = Map[String, String]()
```

- [SPARK-14070: 
HiveMetastoreCatalog.scala](https://github.com/apache/spark/pull/11891/files#diff-ee66e11b56c21364760a5ed2b783f863R650)
- [Master branch: 
HiveStrategies.scala](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L197
)

Pass the Jenkins with an updated test suite.

Author: Dongjoon Hyun 

Closes #19382 from dongjoon-hyun/SPARK-22158.




---




[GitHub] spark pull request #19416: [SPARK-22187][SS] Update unsaferow format for sav...

2017-10-02 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/19416#discussion_r142303228
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala
 ---
@@ -397,50 +435,23 @@ class FlatMapGroupsWithStateSuite extends 
StateStoreMetricsTest with BeforeAndAf
 timeoutConf = EventTimeTimeout,
 priorState = priorState,
 priorTimeoutTimestamp = priorTimeoutTimestamp,
-expectedState = Some(5), // state 
should change
-expectedTimeoutTimestamp = NO_TIMESTAMP) // 
timestamp should not update
-}
-  }
-
-  // Currently disallowed cases for 
StateStoreUpdater.updateStateForKeysWithData(),
-  // Try to remove these cases in the future
-  for (priorTimeoutTimestamp <- Seq(NO_TIMESTAMP, 1000)) {
--- End diff --

These functions test the cases where an exception used to be thrown to prevent a null state + timeout from being saved. The exception checks have now been replaced by tests for the correct output (see the added lines above).


---




[GitHub] spark pull request #19416: [SPARK-22187][SS] Update unsaferow format for sav...

2017-10-02 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/19416#discussion_r142303057
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/FlatMapGroupsWithState_StateManager.scala
 ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.catalyst.expressions.{Attribute, 
AttributeReference, BoundReference, CaseWhen, CreateNamedStruct, 
GetStructField, IsNull, Literal, UnsafeRow}
+import org.apache.spark.sql.execution.ObjectOperator
+import org.apache.spark.sql.execution.streaming.GroupStateImpl
+import org.apache.spark.sql.execution.streaming.GroupStateImpl.NO_TIMESTAMP
+import org.apache.spark.sql.types.{IntegerType, LongType, StructType}
+
+
+class FlatMapGroupsWithState_StateManager(
+stateEncoder: ExpressionEncoder[Any],
+shouldStoreTimestamp: Boolean) extends Serializable {
+
+  val stateSchema = {
+val schema = new StructType().add("groupState", stateEncoder.schema, 
nullable = true)
+if (shouldStoreTimestamp) schema.add("timeoutTimestamp", LongType) 
else schema
+  }
+
+  def getState(store: StateStore, keyRow: UnsafeRow): 
FlatMapGroupsWithState_StateData = {
+val stateRow = store.get(keyRow)
+stateDataForGets.withNew(
+  keyRow, stateRow, getStateObj(stateRow), getTimestamp(stateRow))
+  }
+
+  def putState(store: StateStore, keyRow: UnsafeRow, state: Any, 
timestamp: Long): Unit = {
+val stateRow = getStateRow(state)
+setTimestamp(stateRow, timestamp)
+store.put(keyRow, stateRow)
+  }
+
+  def removeState(store: StateStore, keyRow: UnsafeRow): Unit = {
+store.remove(keyRow)
+  }
+
+  def getAllState(store: StateStore): 
Iterator[FlatMapGroupsWithState_StateData] = {
+val stateDataForGetAllState = FlatMapGroupsWithState_StateData()
+store.getRange(None, None).map { pair =>
+  stateDataForGetAllState.withNew(
+pair.key, pair.value, getStateObjFromRow(pair.value), 
getTimestamp(pair.value))
+}
+  }
+
+  private val stateAttributes: Seq[Attribute] = stateSchema.toAttributes
+
+  // Get the serializer for the state, taking into account whether we need 
to save timestamps
+  private val stateSerializer = {
+val nestedStateExpr = CreateNamedStruct(
+  stateEncoder.namedExpressions.flatMap(e => Seq(Literal(e.name), e)))
+if (shouldStoreTimestamp) {
+  Seq(nestedStateExpr, Literal(GroupStateImpl.NO_TIMESTAMP))
+} else {
+  Seq(nestedStateExpr)
+}
+  }
+
+  // Get the deserializer for the state. Note that this must be done in 
the driver, as
+  // resolving and binding of deserializer expressions to the encoded type 
can be safely done
+  // only in the driver.
+  private val stateDeserializer = {
+val boundRefToNestedState = BoundReference(nestedStateOrdinal, 
stateEncoder.schema, true)
+val deser = stateEncoder.resolveAndBind().deserializer.transformUp {
+  case BoundReference(ordinal, _, _) => 
GetStructField(boundRefToNestedState, ordinal)
+}
+CaseWhen(Seq(IsNull(boundRefToNestedState) -> Literal(null)), 
elseValue = deser).toCodegen()
+  }
+
+  private lazy val nestedStateOrdinal = 0
+  private lazy val timeoutTimestampOrdinal = 1
+
+  // Converters for translating state between rows and Java objects
+  private lazy val getStateObjFromRow = 
ObjectOperator.deserializeRowToObject(
+stateDeserializer, stateAttributes)
+  private lazy val getStateRowFromObj = 
ObjectOperator.serializeObjectToRow(stateSerializer)
+
+  private lazy val stateDataForGets = FlatMapGroupsWithState_StateData()
+
+  /** Returns the state as Java object if defined */
+  private def getStateObj(stateRow: UnsafeRow): Any = {
+if 

[GitHub] spark pull request #19416: [SPARK-22187][SS] Update unsaferow format for sav...

2017-10-02 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/19416#discussion_r142303019
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/FlatMapGroupsWithState_StateManager.scala
 ---
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.catalyst.expressions.{Attribute, 
AttributeReference, BoundReference, CaseWhen, CreateNamedStruct, 
GetStructField, IsNull, Literal, UnsafeRow}
+import org.apache.spark.sql.execution.ObjectOperator
+import org.apache.spark.sql.execution.streaming.GroupStateImpl
+import org.apache.spark.sql.execution.streaming.GroupStateImpl.NO_TIMESTAMP
+import org.apache.spark.sql.types.{IntegerType, LongType, StructType}
+
+
+class FlatMapGroupsWithState_StateManager(
--- End diff --

Add docs.


---




[GitHub] spark pull request #19416: [SPARK-22187][SS] Update unsaferow format for sav...

2017-10-02 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/19416#discussion_r142302989
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala
 ---
@@ -62,26 +60,7 @@ case class FlatMapGroupsWithStateExec(
   import GroupStateImpl._
 
   private val isTimeoutEnabled = timeoutConf != NoTimeout
-  private val timestampTimeoutAttribute =
-AttributeReference("timeoutTimestamp", dataType = IntegerType, 
nullable = false)()
-  private val stateAttributes: Seq[Attribute] = {
-val encSchemaAttribs = stateEncoder.schema.toAttributes
-if (isTimeoutEnabled) encSchemaAttribs :+ timestampTimeoutAttribute 
else encSchemaAttribs
-  }
-  // Get the serializer for the state, taking into account whether we need 
to save timestamps
-  private val stateSerializer = {
-val encoderSerializer = stateEncoder.namedExpressions
-if (isTimeoutEnabled) {
-  encoderSerializer :+ Literal(GroupStateImpl.NO_TIMESTAMP)
-} else {
-  encoderSerializer
-}
-  }
-  // Get the deserializer for the state. Note that this must be done in 
the driver, as
-  // resolving and binding of deserializer expressions to the encoded type 
can be safely done
-  // only in the driver.
-  private val stateDeserializer = 
stateEncoder.resolveAndBind().deserializer
-
+  val stateManager = new FlatMapGroupsWithState_StateManager(stateEncoder, 
isTimeoutEnabled)
--- End diff --

Refactored this class to separate out the state management from the processing, which makes this class far simpler.


---




[GitHub] spark pull request #19416: [SPARK-22187][SS] Update unsaferow format for sav...

2017-10-02 Thread tdas
GitHub user tdas opened a pull request:

https://github.com/apache/spark/pull/19416

[SPARK-22187][SS] Update unsaferow format for saved state such that we can 
set timeouts when state is null

## What changes were proposed in this pull request?

Currently, the group state of a user-defined type is encoded as top-level columns in the UnsafeRows stored in the state store. The timeout timestamp is also saved (when needed) as the last top-level column. Since the group state is serialized into top-level columns, you cannot save "null" as the value of the state (setting null in all the top-level columns is not equivalent). So we don't let the user set a timeout without initializing the state for a key. Based on user experience, this leads to confusion.

This PR changes the row format so that the state is saved as nested columns. This allows the state to be set to null and avoids these confusing corner cases.
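A short sketch of the nested layout (essentially the `stateSchema` quoted in the review diffs earlier in this digest), assuming the user-state schema is given:

```scala
import org.apache.spark.sql.types._

// Old layout: user-state fields flattened into top-level columns (a null state is not representable).
// New layout: one nullable "groupState" struct column, plus an optional timeout column.
def stateSchema(userStateSchema: StructType, shouldStoreTimestamp: Boolean): StructType = {
  val base = new StructType().add("groupState", userStateSchema, nullable = true)
  if (shouldStoreTimestamp) base.add("timeoutTimestamp", LongType) else base
}
```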

## How was this patch tested?
Refactored tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tdas/spark SPARK-22187

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19416.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19416


commit 301e0a15b87be8cd1c71090ece3497191bbd3881
Author: Tathagata Das 
Date:   2017-09-29T03:10:34Z

Refactored all state operations into separate inner class

commit 64a8d865f71a92ed9f76879eb6c5a24d1fef8cec
Author: Tathagata Das 
Date:   2017-10-03T02:39:05Z

Refactored and changed state format




---




[GitHub] spark issue #19382: [SPARK-22158][SQL] convertMetastore should not ignore ta...

2017-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19382
  
Thank you, @gatorsmile . I'll.


---




[GitHub] spark issue #19061: [SPARK-21568][CORE][WIP] ConsoleProgressBar should only ...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19061
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19061: [SPARK-21568][CORE][WIP] ConsoleProgressBar should only ...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19061
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82401/
Test PASSed.


---




[GitHub] spark issue #19061: [SPARK-21568][CORE][WIP] ConsoleProgressBar should only ...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19061
  
**[Test build #82401 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82401/testReport)**
 for PR 19061 at commit 
[`a465619`](https://github.com/apache/spark/commit/a465619c6393156c65ec808000f6ab35753c27a5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19370
  
**[Test build #82405 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82405/testReport)**
 for PR 19370 at commit 
[`d62ae59`](https://github.com/apache/spark/commit/d62ae59d892aa61a9f61af1411f2602a2b3e9ae1).


---




[GitHub] spark pull request #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on...

2017-10-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19370#discussion_r142297093
  
--- Diff: bin/run-example.cmd ---
@@ -17,6 +17,13 @@ rem See the License for the specific language governing 
permissions and
 rem limitations under the License.
 rem
 
-set SPARK_HOME=%~dp0..
+rem Figure out where the Spark framework is installed
+set FIND_SPARK_HOME_SCRIPT=%~dp0find_spark_home.py
--- End diff --

Shouldn't we change this one too?


---




[GitHub] spark issue #19370: [SPARK-18136] Fix setup of SPARK_HOME variable on Window...

2017-10-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19370
  
retest this please


---




[GitHub] spark issue #19327: [SPARK-22136][SS] Implement stream-stream outer joins.

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19327
  
**[Test build #82404 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82404/testReport)**
 for PR 19327 at commit 
[`9a12c78`](https://github.com/apache/spark/commit/9a12c789ca7a871d12cb36f4b605673e93af8a43).


---




[GitHub] spark issue #19389: [SPARK-22165][SQL] Resolve type conflicts between decima...

2017-10-02 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19389
  
@gatorsmile, could you elaborate which behaviour changes you mean?


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17819
  
**[Test build #82403 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82403/testReport)**
 for PR 17819 at commit 
[`000844a`](https://github.com/apache/spark/commit/000844ab1f0dffef9b51b96f7edc1e1ab9e9e0b7).


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-10-02 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17819
  
retest this please.


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17819
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17819
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82402/
Test FAILed.


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17819
  
**[Test build #82402 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82402/testReport)**
 for PR 17819 at commit 
[`000844a`](https://github.com/apache/spark/commit/000844ab1f0dffef9b51b96f7edc1e1ab9e9e0b7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19413: [SPARK-20466][CORE] HadoopRDD#addLocalConfiguration thro...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19413
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19413: [SPARK-20466][CORE] HadoopRDD#addLocalConfiguration thro...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19413
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82399/
Test PASSed.


---




[GitHub] spark issue #19413: [SPARK-20466][CORE] HadoopRDD#addLocalConfiguration thro...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19413
  
**[Test build #82399 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82399/testReport)**
 for PR 19413 at commit 
[`fd1df3d`](https://github.com/apache/spark/commit/fd1df3d7c4ded4bb431153f1195f7dbb0e811491).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-02 Thread liufengdb
Github user liufengdb commented on a diff in the pull request:

https://github.com/apache/spark/pull/19394#discussion_r142290483
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -280,13 +280,20 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] 
with Logging with Serializ
 results.toArray
   }
 
+  private[spark] def executeCollectIterator(): (Long, 
Iterator[InternalRow]) = {
+val countsAndBytes = getByteArrayRdd().collect()
--- End diff --

This still fetches all the compressed rows to the driver, before building 
the hashed relation. Ideally, you should fetch the rows from executors 
incrementally. 
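One way to read "fetch incrementally" is the `toLocalIterator` pattern; a minimal sketch under that assumption (not the PR's code):

```scala
import org.apache.spark.rdd.RDD

// collect() materializes every partition's bytes on the driver at once; toLocalIterator
// runs one job per partition, so the driver only holds a single partition's data at a time.
def iterateIncrementally[T](rdd: RDD[T]): Iterator[T] = rdd.toLocalIterator
```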


---




[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18460
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82398/
Test PASSed.


---




[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18460
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18460
  
**[Test build #82398 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82398/testReport)**
 for PR 18460 at commit 
[`b2d1338`](https://github.com/apache/spark/commit/b2d13382310d9d53811d47434d8262ad371a4456).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19327: [SPARK-22136][SS] Implement stream-stream outer j...

2017-10-02 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/19327#discussion_r142288587
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
 ---
@@ -233,16 +234,53 @@ object UnsupportedOperationChecker {
 throwError("Full outer joins with streaming 
DataFrames/Datasets are not supported")
   }
 
-case LeftOuter | LeftSemi | LeftAnti =>
+case LeftSemi | LeftAnti =>
   if (right.isStreaming) {
-throwError("Left outer/semi/anti joins with a streaming 
DataFrame/Dataset " +
-"on the right is not supported")
+throwError("Left semi/anti joins with a streaming 
DataFrame/Dataset " +
+"on the right are not supported")
   }
 
+// We support streaming left outer joins with static on the 
right always, and with
+// stream on both sides under the appropriate conditions.
+case LeftOuter =>
+  if (!left.isStreaming && right.isStreaming) {
+throwError("Left outer join with a streaming 
DataFrame/Dataset " +
+  "on the right and a static DataFrame/Dataset on the left 
is not supported")
+  } else if (left.isStreaming && right.isStreaming) {
+val watermarkInJoinKeys = 
StreamingJoinHelper.isWatermarkInJoinKeys(subPlan)
+
+val hasValidWatermarkRange =
+  StreamingJoinHelper.getStateValueWatermark(
+left.outputSet, right.outputSet, condition, 
Some(100)).isDefined
+
--- End diff --

extra line.


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-10-02 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17819
  
@gatorsmile The related test is added. Please take a look again. Thanks.


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17819
  
**[Test build #82402 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82402/testReport)**
 for PR 17819 at commit 
[`000844a`](https://github.com/apache/spark/commit/000844ab1f0dffef9b51b96f7edc1e1ab9e9e0b7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19394
  
cc @liufengdb 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17359: [SPARK-20028][SQL] Add aggreagate expression nGrams

2017-10-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17359
  
cc @wzhfy @viirya Are you interested in reviewing this PR?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19389: [SPARK-22165][SQL] Resolve type conflicts between decima...

2017-10-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19389
  
Please ensure no behavior change is introduced when fixing such issues. 
Also cc @cloud-fan 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19389: [SPARK-22165][SQL] Resolve type conflicts between decima...

2017-10-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19389
  
This PR introduces behavior changes, so we are unable to accept it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17819#discussion_r142278786
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2120,6 +2120,19 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Returns a new Dataset by adding columns with metadata.
+   */
+  private[spark] def withColumns(
+  colNames: Seq[String],
+  cols: Seq[Column],
+  metadata: Seq[Metadata]): DataFrame = {
+val newCols = colNames.zip(cols).zip(metadata).map { case ((colName, 
col), metadata) =>
--- End diff --

Yes, we should check the number of metadata elements too. I'll add it when adding the
related test.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17819#discussion_r142278697
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2120,6 +2120,19 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Returns a new Dataset by adding columns with metadata.
+   */
+  private[spark] def withColumns(
--- End diff --

Yeah, I see. I've left a comment at
https://github.com/apache/spark/pull/17819#discussion_r142172037 saying that I will
add a test later.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-02 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17819#discussion_r142278563
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2120,6 +2120,19 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Returns a new Dataset by adding columns with metadata.
+   */
+  private[spark] def withColumns(
+  colNames: Seq[String],
+  cols: Seq[Column],
+  metadata: Seq[Metadata]): DataFrame = {
+val newCols = colNames.zip(cols).zip(metadata).map { case ((colName, 
col), metadata) =>
--- End diff --

Is it possible that the number of elements in `metadata` does not match? If so, the
results will be unexpected because of this implementation.

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-10-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19294
  
If `path` could be `null`, [this 
line](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SQLHadoopMapReduceCommitProtocol.scala#L54)
 will still fail with an error message like `Can not create a Path from a null 
string`. 

In our Spark SQL code path, how can `path` be null?




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19061: [SPARK-21568][CORE] ConsoleProgressBar should only be en...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19061
  
**[Test build #82401 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82401/testReport)**
 for PR 19061 at commit 
[`a465619`](https://github.com/apache/spark/commit/a465619c6393156c65ec808000f6ab35753c27a5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-02 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17819#discussion_r142278414
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2120,6 +2120,19 @@ class Dataset[T] private[sql](
   }
 
   /**
+   * Returns a new Dataset by adding columns with metadata.
+   */
+  private[spark] def withColumns(
--- End diff --

This is not being tested in SQL


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19390
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19390
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82400/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19390
  
**[Test build #82400 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82400/testReport)**
 for PR 19390 at commit 
[`ce6902e`](https://github.com/apache/spark/commit/ce6902e445ff407ad2edc5a38392407eadfaf2e2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19392: [SPARK-22169][SQL] table name with numbers and ch...

2017-10-02 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19392#discussion_r142277890
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -510,11 +510,15 @@ rowFormat
 ;
 
 tableIdentifier
-: (db=identifier '.')? table=identifier
+: (db=identifierPart '.')? table=identifierPart
 ;
 
 functionIdentifier
-: (db=identifier '.')? function=identifier
+: (db=identifierPart '.')? function=identifierPart
+;
+
+identifierPart
+: identifier | BIGINT_LITERAL | SMALLINT_LITERAL | TINYINT_LITERAL | 
BYTELENGTH_LITERAL
--- End diff --

But the rule in `validateName` allows a name consisting of numbers.
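
Purely illustrative queries for the identifiers being discussed; the table names below are made up, and whether the unquoted forms parse depends on how this grammar change lands. Backtick-quoting is the existing escape hatch.

```scala
import org.apache.spark.sql.SparkSession

object NumericIdentifierSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("identifier-sketch").getOrCreate()
    // Quoted identifiers are the existing way to reference such names.
    spark.sql("SELECT * FROM `1m_events`")
    // Forms the proposed identifierPart rule is meant to admit without quoting:
    // spark.sql("SELECT * FROM 1m_events")
    // spark.sql("SELECT * FROM 1d.2g")
  }
}
```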


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19390
  
**[Test build #82400 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82400/testReport)**
 for PR 19390 at commit 
[`ce6902e`](https://github.com/apache/spark/commit/ce6902e445ff407ad2edc5a38392407eadfaf2e2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-10-02 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19294#discussion_r142276032
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
 ---
@@ -57,6 +57,15 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: 
String)
*/
   private def absPathStagingDir: Path = new Path(path, "_temporary-" + 
jobId)
 
+  /**
+   * Checks whether there are files to be committed to an absolute output 
location.
+   *
+   * As the committing and aborting the job occurs on driver where 
`addedAbsPathFiles` is always
+   * null, it is necessary to check whether the output path is specified, 
that may not be the case
+   * for committers not writing to distributed file systems.
+   */
+  private def hasAbsPathFiles: Boolean = path != null
--- End diff --

Please add `@param path` at line 38 to explain it.
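
For illustration, a standalone sketch of the null-path guard under discussion; the class name is made up and this is not the actual `HadoopMapReduceCommitProtocol`:

```scala
import org.apache.hadoop.fs.Path

class NullSafeStagingSketch(jobId: String, path: String) {
  // A null path means the OutputFormat does not write to a filesystem
  // location, so there can be no files committed to absolute output paths.
  private def hasAbsPathFiles: Boolean = path != null

  private def absPathStagingDir: Path = new Path(path, "_temporary-" + jobId)

  // Only touch the staging directory when an output path actually exists;
  // otherwise `new Path(null, ...)` would fail with
  // "Can not create a Path from a null string".
  def cleanupStagingDir(deleteDir: Path => Unit): Unit = {
    if (hasAbsPathFiles) {
      deleteDir(absPathStagingDir)
    }
  }
}
```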


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19401: [SPARK-22176][SQL] Fix overflow issue in Dataset....

2017-10-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19401


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19401: [SPARK-22176][SQL] Fix overflow issue in Dataset.show

2017-10-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19401
  
Thanks! Merged to master/2.2


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19401: [SPARK-22176][SQL] Fix overflow issue in Dataset.show

2017-10-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19401
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19402: [SPARK-22167][R][BUILD] sparkr packaging issue allow zin...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19402
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82393/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19402: [SPARK-22167][R][BUILD] sparkr packaging issue allow zin...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19402
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19402: [SPARK-22167][R][BUILD] sparkr packaging issue allow zin...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19402
  
**[Test build #82393 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82393/testReport)**
 for PR 19402 at commit 
[`40a7f6c`](https://github.com/apache/spark/commit/40a7f6cad2391d9f172695a14a2cbf91b7a38a83).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18098: [SPARK-16944][Mesos] Improve data locality when l...

2017-10-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18098


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions ...

2017-10-02 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18966#discussion_r142267872
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -769,16 +769,27 @@ class CodegenContext {
   foldFunctions: Seq[String] => String = _.mkString("", ";\n", ";")): 
String = {
 val blocks = new ArrayBuffer[String]()
 val blockBuilder = new StringBuilder()
+val defaultMaxLines = 100
+val maxLines = if (SparkEnv.get != null) {
+  
SparkEnv.get.conf.getInt("spark.sql.codegen.expressions.maxCodegenLinesPerFunction",
--- End diff --

Can we use bytecode size?
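
For reference, a standalone sketch (not the actual `CodeGenerator` change) of the splitting policy shown in the diff: accumulate generated snippets into a block until a configurable line budget is exceeded, then start a new block. The threshold name and default come from the diff above; measuring real bytecode size, as suggested, would need the compiled method instead of source lines.

```scala
import scala.collection.mutable.ArrayBuffer

object SplitByLinesSketch {
  def split(expressions: Seq[String], maxLines: Int = 100): Seq[String] = {
    val blocks = new ArrayBuffer[String]()
    val current = new StringBuilder()
    var currentLines = 0
    expressions.foreach { code =>
      val lines = code.count(_ == '\n') + 1
      // Close the current block once adding this snippet would exceed the budget.
      if (currentLines + lines > maxLines && current.nonEmpty) {
        blocks += current.toString()
        current.clear()
        currentLines = 0
      }
      current.append(code).append("\n")
      currentLines += lines
    }
    if (current.nonEmpty) blocks += current.toString()
    blocks.toSeq
  }
}
```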


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19382: [SPARK-22158][SQL] convertMetastore should not ig...

2017-10-02 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19382


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19413: [SPARK-20466][CORE] HadoopRDD#addLocalConfiguration thro...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19413
  
**[Test build #82399 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82399/testReport)**
 for PR 19413 at commit 
[`fd1df3d`](https://github.com/apache/spark/commit/fd1df3d7c4ded4bb431153f1195f7dbb0e811491).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19413: [SPARK-20466][CORE] HadoopRDD#addLocalConfiguration thro...

2017-10-02 Thread sahilTakiar
Github user sahilTakiar commented on the issue:

https://github.com/apache/spark/pull/19413
  
Fixed the style check.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19382: [SPARK-22158][SQL] convertMetastore should not ignore ta...

2017-10-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19382
  
Please submit a separate PR to 2.2. Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19382: [SPARK-22158][SQL] convertMetastore should not ignore ta...

2017-10-02 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19382
  
Thanks! Merged to master/2.2


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19412: [SPARK-22142][BUILD][STREAMING] Move Flume support behin...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19412
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82392/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19412: [SPARK-22142][BUILD][STREAMING] Move Flume support behin...

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19412
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19412: [SPARK-22142][BUILD][STREAMING] Move Flume support behin...

2017-10-02 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19412
  
**[Test build #82392 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82392/testReport)**
 for PR 19412 at commit 
[`4b45f9c`](https://github.com/apache/spark/commit/4b45f9c30a6e06cd4a3509f93c6370374e8f4c05).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19415: Branch 2.2 udf nullability

2017-10-02 Thread ptkool
GitHub user ptkool opened a pull request:

https://github.com/apache/spark/pull/19415

Branch 2.2 udf nullability

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Shopify/spark branch-2.2-udf_nullability

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19415.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19415


commit cfa6bcbe83b9a4b9607e23ac889963b6aa02f0d9
Author: Ryan Blue 
Date:   2017-05-01T21:48:02Z

[SPARK-20540][CORE] Fix unstable executor requests.

There are two problems fixed in this commit. First, the
ExecutorAllocationManager sets a timeout to avoid requesting executors
too often. However, the timeout is always updated based on its value and
a timeout, not the current time. If the call is delayed by locking for
more than the ongoing scheduler timeout, the manager will request more
executors on every run. This seems to be the main cause of SPARK-20540.

The second problem is that the total number of requested executors is
not tracked by the CoarseGrainedSchedulerBackend. Instead, it calculates
the value based on the current status of 3 variables: the number of
known executors, the number of executors that have been killed, and the
number of pending executors. But, the number of pending executors is
never less than 0, even though there may be more known than requested.
When executors are killed and not replaced, this can cause the request
sent to YARN to be incorrect because there were too many executors due
to the scheduler's state being slightly out of date. This is fixed by 
tracking
the currently requested size explicitly.

## How was this patch tested?

Existing tests.

Author: Ryan Blue 

Closes #17813 from rdblue/SPARK-20540-fix-dynamic-allocation.

(cherry picked from commit 2b2dd08e975dd7fbf261436aa877f1d7497ed31f)
Signed-off-by: Marcelo Vanzin 
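
A simplified, hypothetical model of the bookkeeping change described in this commit message (class and method names are made up, and this is not the actual `CoarseGrainedSchedulerBackend`): rather than re-deriving the requested total from counters that can drift out of sync, the backend remembers the target it last sent to the cluster manager and adjusts that single number.

```scala
class ExecutorRequestTracker(initialTarget: Int) {
  private var requestedTotal = initialTarget

  /** Set a new absolute target (e.g. from dynamic allocation). */
  def requestTotalExecutors(newTotal: Int): Int = {
    require(newTotal >= 0, s"invalid executor target $newTotal")
    requestedTotal = newTotal
    requestedTotal
  }

  /** Killing executors without replacement lowers the tracked target directly. */
  def killExecutors(count: Int, replace: Boolean): Int = {
    if (!replace) requestedTotal = math.max(requestedTotal - count, 0)
    requestedTotal
  }

  /** The value sent to the cluster manager on the next sync. */
  def currentTarget: Int = requestedTotal
}
```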

commit 5a0a8b0396df2feadb8333876cc08edf219fa177
Author: Sean Owen 
Date:   2017-05-02T00:01:05Z

[SPARK-20459][SQL] JdbcUtils throws IllegalStateException: Cause already 
initialized after getting SQLException

## What changes were proposed in this pull request?

Avoid failing to initCause on JDBC exception with cause initialized to null

## How was this patch tested?

Existing tests

Author: Sean Owen 

Closes #17800 from srowen/SPARK-20459.

(cherry picked from commit af726cd6117de05c6e3b9616b8699d884a53651b)
Signed-off-by: Xiao Li 
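
A minimal sketch of the defensive pattern this commit message describes: `initCause` throws `IllegalStateException` when the cause slot was already set (even to null via a constructor), so only attach the cause when it looks safe and swallow the exception otherwise. The helper name is made up and this is not the actual `JdbcUtils` code.

```scala
import java.sql.SQLException

object SafeInitCause {
  def attachCause(e: SQLException, cause: Throwable): SQLException = {
    if (cause != null && cause != e && e.getCause == null) {
      try {
        e.initCause(cause)
      } catch {
        case _: IllegalStateException => // cause already initialized; keep the original
      }
    }
    e
  }
}
```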

commit b7c1c2f973635a2ec05aedd89456765d830dfdce
Author: Felix Cheung 
Date:   2017-05-02T04:03:48Z

[SPARK-20192][SPARKR][DOC] SparkR migration guide to 2.2.0

## What changes were proposed in this pull request?

Updating R Programming Guide

## How was this patch tested?

manually

Author: Felix Cheung 

Closes #17816 from felixcheung/r22relnote.

(cherry picked from commit d20a976e8918ca8d607af452301e8014fe14e64a)
Signed-off-by: Felix Cheung 

commit b146481fff1ce529245f9c03b35c73ea604712d0
Author: Kazuaki Ishizaki 
Date:   2017-05-02T05:56:41Z

[SPARK-20537][CORE] Fixing OffHeapColumnVector reallocation

## What changes were proposed in this pull request?

As #17773 revealed `OnHeapColumnVector` may copy a part of the original 
storage.

`OffHeapColumnVector` reallocation also copies to the new storage data up 
to 'elementsAppended'. This variable is only updated when using the 
`ColumnVector.appendX` API, while `ColumnVector.putX` is more commonly used.
This PR copies the new storage data up to the previously-allocated size 
in `OffHeapColumnVector`.

## How was this patch tested?

Existing test suites

Author: Kazuaki Ishizaki 

Closes #17811 from kiszk/SPARK-20537.

(cherry picked from commit afb21bf22a59c9416c04637412fb69d1442e6826)
Signed-off-by: Wenchen Fan 
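
A simplified, hypothetical model of the reallocation fix described above (not the real `OffHeapColumnVector`): when growing the backing storage, copy everything up to the previously allocated capacity, not just `elementsAppended`, because values written via put-style APIs never bump that counter.

```scala
class GrowableIntVector(initialCapacity: Int) {
  private var data = new Array[Int](initialCapacity)
  private var capacity = initialCapacity
  var elementsAppended = 0   // only updated by append-style writes

  // put-style write: does not touch elementsAppended
  def put(ordinal: Int, value: Int): Unit = data(ordinal) = value

  def append(value: Int): Unit = {
    reserve(elementsAppended + 1)
    data(elementsAppended) = value
    elementsAppended += 1
  }

  def reserve(requiredCapacity: Int): Unit = {
    if (requiredCapacity > capacity) {
      val newData = new Array[Int](requiredCapacity * 2)
      // Copy up to the old capacity (the fix); copying only elementsAppended
      // would silently drop values written through put().
      System.arraycopy(data, 0, newData, 0, capacity)
      data = newData
      capacity = newData.length
    }
  }

  def get(ordinal: Int): Int = data(ordinal)
}
```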

commit ef5e2a0509801f6afced3bc80f8d700acf84e0dd
Author: Burak Yavuz 
Date:   

[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...

2017-10-02 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18460
  
Retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


