[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/15807
  
For whole-stage codegen, I think the lifetime of sub-expressions is within a 
single iteration over a row. Thus, `isInitialized` and `subExpr1` should be 
re-initialized at the beginning of each iteration, for example right after 
reading a row.
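
A minimal sketch of that lifetime rule as plain Scala rather than Spark's actual generated code; all names here (`SubExprLifetimeSketch`, `evalSubExpr1`, `processRow`) are illustrative:

```
object SubExprLifetimeSketch {
  private var isInitialized = false
  private var subExpr1 = 0

  // Stand-in for the real subexpression evaluation.
  private def evalSubExpr1(row: Array[Int]): Unit = {
    subExpr1 = row(0) + row(1)
    isInitialized = true
  }

  // Evaluates the subexpression at most once per row.
  def getSubExpr1(row: Array[Int]): Int = {
    if (!isInitialized) evalSubExpr1(row)
    subExpr1
  }

  // Reset the cached state at the beginning of each iteration,
  // i.e. right after reading a row.
  def processRow(row: Array[Int]): Unit = {
    isInitialized = false
    val a = getSubExpr1(row) // first use: evaluates
    val b = getSubExpr1(row) // second use: cached
    println(a + b)
  }
}
```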


[GitHub] spark issue #15822: [Minor][PySpark] Improve error message when running PySp...

2016-11-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15822
  
Yeah, as per the discussion at https://github.com/apache/spark/pull/15659#issuecomment-259157516.


[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13909
  
**[Test build #68396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68396/consoleFull)** for PR 13909 at commit [`990d6c8`](https://github.com/apache/spark/commit/990d6c8bb7f159088f1f614bf80d91a5801616cf).


[GitHub] spark issue #15822: [Minor][PySpark] Improve error message when running PySp...

2016-11-08 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15822
  
Thanks - did this come from a discussion somewhere?



[GitHub] spark pull request #15816: SPARK-18368: Fix regexp_replace with task seriali...

2016-11-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15816


[GitHub] spark issue #15816: SPARK-18368: Fix regexp_replace with task serialization.

2016-11-08 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15816
  
I'm surprised too that we didn't catch this earlier...


[GitHub] spark issue #15816: SPARK-18368: Fix regexp_replace with task serialization.

2016-11-08 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15816
  
Merging in master/branch-2.1/branch-2.0.



[GitHub] spark issue #15823: [SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputF...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15823
  
Merged build finished. Test PASSed.


[GitHub] spark issue #15823: [SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputF...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15823
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68391/
Test PASSed.


[GitHub] spark issue #15823: [SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputF...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15823
  
**[Test build #68391 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68391/consoleFull)** for PR 15823 at commit [`4e79c37`](https://github.com/apache/spark/commit/4e79c377f99602adb5c3fd82f57de1773bb9b64d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #15627: [SPARK-18099][YARN] Fail if same files added to distribu...

2016-11-08 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/15627
  
@kishorvpatil Thank you for fixing this!


[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15807
  
Even if we modify it to hold the results of subexpressions in member 
variables, the above code example would not work under whole-stage codegen.

The above code example is in fact similar to subexpression elimination in 
non-whole-stage codegen, which also wraps subexpression evaluations in 
functions.

But for whole-stage codegen, since we might evaluate expressions against the 
input row or against local variables, the function approach can't work: a 
member function has no access to those local variables.
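
A minimal sketch of why the function approach breaks down, assuming the subexpression consumes an operator-local variable; names are illustrative, not Spark's generated code:

```
object WholeStageLocalsSketch {
  // In whole-stage codegen, operators are fused into one loop and exchange
  // values through local variables like `localValue` below. A member method
  // (the "function approach") cannot see those locals, so the subexpression
  // has to be evaluated inline in the loop body instead.
  def consume(rows: Iterator[Array[Int]]): Unit = {
    while (rows.hasNext) {
      val row = rows.next()
      val localValue = row(0)      // produced by an upstream operator
      val subExpr = localValue + 1 // must be computed inline, not in a method
      println(subExpr)
    }
  }
}
```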


[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15807
  
Why can't whole-stage codegen use member variables to keep the result of a 
subexpression?


[GitHub] spark pull request #15799: [SPARK-18333] [SQL] Revert hacks in parquet and o...

2016-11-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15799


[GitHub] spark issue #15823: [SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputF...

2016-11-08 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/15823
  
LGTM. I'm not at my laptop; it would be great if you can merge, @rxin. Thanks.


[GitHub] spark issue #15799: [SPARK-18333] [SQL] Revert hacks in parquet and orc read...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15799
  
merging to master/2.1


[GitHub] spark issue #15825: [SPARK-18377][SQL] warehouse path should be a static con...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15825
  
**[Test build #68395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68395/consoleFull)** for PR 15825 at commit [`1200e2d`](https://github.com/apache/spark/commit/1200e2d47cbb0b6d67950e2ad3512798395d65cf).


[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15807
  
> Isn't the result of the subexpression kept in member variables?

For non-whole-stage codegen, yes. For whole-stage codegen, no.


[GitHub] spark pull request #15797: [SPARK-17990][SPARK-18302][SQL] correct several p...

2016-11-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15797#discussion_r87141007
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -810,13 +825,44 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
      newSpecs: Seq[TablePartitionSpec]): Unit = withClient {
    client.renamePartitions(
      db, table, specs.map(lowerCasePartitionSpec), newSpecs.map(lowerCasePartitionSpec))
+
+    val tableMeta = getTable(db, table)
+    val partitionColumnNames = tableMeta.partitionColumnNames
+    // Hive metastore is not case preserving and keeps partition columns with lower cased names.
+    // When Hive renames partitions for managed tables, it creates the partition location with
+    // a default path generated from the new spec with lower cased partition column names. This is
+    // unexpected and we need to rename them manually and alter the partition location.
+    val hasUpperCasePartitionColumn = partitionColumnNames.exists(col => col.toLowerCase != col)
+    if (tableMeta.tableType == MANAGED && hasUpperCasePartitionColumn) {
+      val tablePath = new Path(tableMeta.location)
+      val fs = tablePath.getFileSystem(hadoopConf)
+      val newParts = newSpecs.map { spec =>
+        val partition = client.getPartition(db, table, lowerCasePartitionSpec(spec))
+        val wrongPath = new Path(partition.storage.locationUri.get)
+        val rightPath = ExternalCatalogUtils.generatePartitionPath(
+          spec, partitionColumnNames, tablePath)
+        try {
+          fs.rename(wrongPath, rightPath)
+        } catch {
+          case e: IOException =>
+            throw new SparkException(s"Unable to rename partition path $wrongPath", e)
--- End diff --

Maybe we also need to add `rightPath` here
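
In the spirit of that suggestion, a purely illustrative one-liner for the `catch` block in the diff above (not the PR's final wording) that mentions both paths:

```
throw new SparkException(s"Unable to rename partition path from $wrongPath to $rightPath", e)
```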


[GitHub] spark issue #15799: [SPARK-18333] [SQL] Revert hacks in parquet and orc read...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15799
  
LGTM


[GitHub] spark issue #15233: [SPARK-17659] [SQL] Partitioned View is Not Supported By...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15233
  
LGTM


[GitHub] spark issue #15825: [SPARK-18377][SQL] warehouse path should be a static con...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15825
  
retest this please


[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15807
  
Isn't the result of the subexpression kept in member variables? What I am 
talking about is something like:
```
private boolean isInitialized = false;
private int subExpr1 = 0;

private void evalSubExpr1() {
  // ... evaluate the subexpression and assign it to subExpr1 ...
  isInitialized = true;
}

public int getSubExpr() {
  if (!isInitialized) {
    evalSubExpr1();
  }
  return subExpr1;
}
```


[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15807
  
E.g.,

```
if (isNull(subexpr)) {
  ...
} else {
  AssertNotNull(subexpr)            // subexpr2, first used here
  SomeExpr(AssertNotNull(subexpr))  // SomeExpr(subexpr2)
}

if (isNull(subexpr)) {
  ...
} else {
  AssertNotNull(subexpr)            // subexpr2
  SomeExpr2(AssertNotNull(subexpr)) // SomeExpr2(subexpr2)
}

SomeExpr3(AssertNotNull(subexpr))   // SomeExpr3(subexpr2)
```

`AssertNotNull(subexpr)` is recognized as a subexpression. When it is first 
used in the else branch at the top, it is evaluated inside that branch. But 
`SomeExpr3(AssertNotNull(subexpr))`, which also uses it, can't access the 
evaluated subexpression result.
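
A minimal Scala analogue of that scoping issue, with `someExpr`/`someExpr3` as illustrative stand-ins for arbitrary expressions rather than Spark's generated code:

```
object BranchScopeSketch {
  def someExpr(x: Int): Int = x + 1
  def someExpr3(x: Int): Int = x * 3

  def process(subexpr: Option[Int]): Int = {
    if (subexpr.nonEmpty) {
      val subexpr2 = subexpr.get    // first use: evaluated inside this branch
      println(someExpr(subexpr2))   // fine: same scope
    }
    // `subexpr2` is out of scope here, so its result cannot be reused;
    // the subexpression would have to be evaluated a second time.
    someExpr3(subexpr.getOrElse(0))
  }
}
```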



[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15807
  
> we can't access the subexpression outside it later

I don't quite understand; can you give an example?


[GitHub] spark issue #15824: [SPARK-18376][SQL] Skip subexpression elimination for co...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15824
  
**[Test build #68394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68394/consoleFull)** for PR 15824 at commit [`548e45f`](https://github.com/apache/spark/commit/548e45f63f64a3f9e9709af39610ce50390b0fa1).


[GitHub] spark issue #15824: [SPARK-18376][SQL] Skip subexpression elimination for co...

2016-11-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15824
  
retest this please.


[GitHub] spark issue #15824: [SPARK-18376][SQL] Skip subexpression elimination for co...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15824
  
Merged build finished. Test FAILed.


[GitHub] spark issue #15824: [SPARK-18376][SQL] Skip subexpression elimination for co...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15824
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68393/
Test FAILed.


[GitHub] spark issue #15824: [SPARK-18376][SQL] Skip subexpression elimination for co...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15824
  
**[Test build #68393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68393/consoleFull)** for PR 15824 at commit [`548e45f`](https://github.com/apache/spark/commit/548e45f63f64a3f9e9709af39610ce50390b0fa1).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE...

2016-11-08 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15814#discussion_r87138232
  
--- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ---
@@ -87,25 +120,40 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String)

  override def commitJob(jobContext: JobContext, taskCommits: Seq[TaskCommitMessage]): Unit = {
    committer.commitJob(jobContext)
+    val filesToMove = taskCommits.map(_.obj.asInstanceOf[Map[String, String]]).reduce(_ ++ _)
+    logDebug(s"Committing files staged for absolute locations $filesToMove")
+    val fs = absPathStagingDir.getFileSystem(jobContext.getConfiguration)
+    for ((src, dst) <- filesToMove) {
+      fs.rename(new Path(src), new Path(dst))
+    }
+    fs.delete(absPathStagingDir, true)
  }

  override def abortJob(jobContext: JobContext): Unit = {
    committer.abortJob(jobContext, JobStatus.State.FAILED)
+    val fs = absPathStagingDir.getFileSystem(jobContext.getConfiguration)
+    fs.delete(absPathStagingDir, true)
  }

  override def setupTask(taskContext: TaskAttemptContext): Unit = {
    committer = setupCommitter(taskContext)
    committer.setupTask(taskContext)
+    addedAbsPathFiles = mutable.Map[String, String]()
  }

  override def commitTask(taskContext: TaskAttemptContext): TaskCommitMessage = {
    val attemptId = taskContext.getTaskAttemptID
    SparkHadoopMapRedUtil.commitTask(
      committer, taskContext, attemptId.getJobID.getId, attemptId.getTaskID.getId)
-    EmptyTaskCommitMessage
+    new TaskCommitMessage(addedAbsPathFiles.toMap)
--- End diff --

Yeah, it can go either way; it's unclear which one is better. Renaming on job 
commit gives a higher chance of corrupting data, whereas renaming in task 
commit is slightly more performant.



[GitHub] spark pull request #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE...

2016-11-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15814#discussion_r87138139
  
--- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ---
@@ -87,25 +120,40 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String)

  override def commitJob(jobContext: JobContext, taskCommits: Seq[TaskCommitMessage]): Unit = {
    committer.commitJob(jobContext)
+    val filesToMove = taskCommits.map(_.obj.asInstanceOf[Map[String, String]]).reduce(_ ++ _)
+    logDebug(s"Committing files staged for absolute locations $filesToMove")
+    val fs = absPathStagingDir.getFileSystem(jobContext.getConfiguration)
+    for ((src, dst) <- filesToMove) {
+      fs.rename(new Path(src), new Path(dst))
+    }
+    fs.delete(absPathStagingDir, true)
  }

  override def abortJob(jobContext: JobContext): Unit = {
    committer.abortJob(jobContext, JobStatus.State.FAILED)
+    val fs = absPathStagingDir.getFileSystem(jobContext.getConfiguration)
+    fs.delete(absPathStagingDir, true)
  }

  override def setupTask(taskContext: TaskAttemptContext): Unit = {
    committer = setupCommitter(taskContext)
    committer.setupTask(taskContext)
+    addedAbsPathFiles = mutable.Map[String, String]()
  }

  override def commitTask(taskContext: TaskAttemptContext): TaskCommitMessage = {
    val attemptId = taskContext.getTaskAttemptID
    SparkHadoopMapRedUtil.commitTask(
      committer, taskContext, attemptId.getJobID.getId, attemptId.getTaskID.getId)
-    EmptyTaskCommitMessage
+    new TaskCommitMessage(addedAbsPathFiles.toMap)
--- End diff --

Why don't we just rename temp files to dest files in commitTask?


[GitHub] spark issue #15803: [SPARK-18298][Web UI]change gmt time to local zone time ...

2016-11-08 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/spark/pull/15803
  
I agree with showing the timezone with the date string.

But always using GMT/UTC is not a good choice: application logs (via log4j) 
are usually printed in the local timezone (like CST). That means if I want to 
cross-check the application logs against the EventLog, I must translate 
between the two timezones manually.
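
A minimal sketch of rendering a timestamp with an explicit zone name via `java.time`, as an illustration of the idea rather than the PR's actual change:

```
import java.time.{Instant, ZoneId}
import java.time.format.DateTimeFormatter

object TimestampRenderSketch {
  // "z" appends the zone name, e.g. "2016-11-08 10:30:00 CST",
  // so UI times and log times can be compared at a glance.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss z")

  def render(millis: Long, zone: ZoneId = ZoneId.systemDefault()): String =
    fmt.format(Instant.ofEpochMilli(millis).atZone(zone))
}
```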


[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15807
  
@cloud-fan Then once the first expression to use the subexpression is inside 
an if/else branch, we can't access the subexpression outside it later. Should 
we evaluate it again?



[GitHub] spark issue #15824: [SPARK-18376][SQL] Skip subexpression elimination for co...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15824
  
**[Test build #68393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68393/consoleFull)** for PR 15824 at commit [`548e45f`](https://github.com/apache/spark/commit/548e45f63f64a3f9e9709af39610ce50390b0fa1).


[GitHub] spark issue #15824: [SPARK-18376][SQL] Skip subexpression elimination for co...

2016-11-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15824
  
retest this please.


[GitHub] spark issue #15825: [SPARK-18377][SQL] warehouse path should be a static con...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15825
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68390/
Test FAILed.


[GitHub] spark issue #15825: [SPARK-18377][SQL] warehouse path should be a static con...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15825
  
Merged build finished. Test FAILed.


[GitHub] spark issue #15825: [SPARK-18377][SQL] warehouse path should be a static con...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15825
  
**[Test build #68390 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68390/consoleFull)** for PR 15825 at commit [`1200e2d`](https://github.com/apache/spark/commit/1200e2d47cbb0b6d67950e2ad3512798395d65cf).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE ...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15814
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68388/
Test FAILed.


[GitHub] spark issue #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE ...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15814
  
Merged build finished. Test FAILed.


[GitHub] spark issue #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE ...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15814
  
**[Test build #68388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68388/consoleFull)** for PR 15814 at commit [`91f87de`](https://github.com/apache/spark/commit/91f87de270bbd743f0b40b82016391706a936ca1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE ...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15814
  
Merged build finished. Test FAILed.


[GitHub] spark issue #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE ...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15814
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68387/
Test FAILed.


[GitHub] spark issue #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE ...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15814
  
**[Test build #68387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68387/consoleFull)** for PR 15814 at commit [`4296612`](https://github.com/apache/spark/commit/4296612bd373bc1118b2cbfc10530b1895aaef60).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request #15748: [SPARK-18240][ML][PySpark] Add Summary of BiKMean...

2016-11-08 Thread zhengruifeng
Github user zhengruifeng closed the pull request at:

https://github.com/apache/spark/pull/15748


[GitHub] spark issue #15233: [SPARK-17659] [SQL] Partitioned View is Not Supported By...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15233
  
Merged build finished. Test PASSed.


[GitHub] spark issue #15233: [SPARK-17659] [SQL] Partitioned View is Not Supported By...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15233
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68386/
Test PASSed.


[GitHub] spark issue #15233: [SPARK-17659] [SQL] Partitioned View is Not Supported By...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15233
  
**[Test build #68386 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68386/consoleFull)** for PR 15233 at commit [`c6d3acd`](https://github.com/apache/spark/commit/c6d3acd953ad70f6d66252d13e6cf4c0e507d33c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15814#discussion_r87135950
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -226,6 +238,34 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {

    insertCmd
  }
+
+  /**
+   * Given a set of input partitions, returns those that have locations that differ from the
+   * Hive default (e.g. /k1=v1/k2=v2). These partitions were manually assigned locations by
+   * the user.
+   *
+   * @return a mapping from partition specs to their custom locations
+   */
+  private def getCustomPartitionLocations(
+      spark: SparkSession,
+      table: CatalogTable,
+      basePath: Path,
+      partitions: Seq[CatalogTablePartition]): Map[TablePartitionSpec, String] = {
+    val hadoopConf = spark.sessionState.newHadoopConf
+    val fs = basePath.getFileSystem(hadoopConf)
+    val qualifiedBasePath = basePath.makeQualified(fs.getUri, fs.getWorkingDirectory)
+    partitions.flatMap { p =>
+      val defaultLocation = qualifiedBasePath.suffix(
+        "/" + PartitioningUtils.getPathFragment(p.spec, table.partitionSchema)).toString
+      val catalogLocation = new Path(p.storage.locationUri.get).makeQualified(
+        fs.getUri, fs.getWorkingDirectory).toString
+      if (catalogLocation != defaultLocation) {
--- End diff --

Why do we distinguish partition locations that equal the default location? 
Partitions always have locations (custom-specified or generated by default); 
do we really need to care about who set them?


[GitHub] spark issue #15823: [SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputF...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15823
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68384/
Test PASSed.


[GitHub] spark issue #15823: [SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputF...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15823
  
Merged build finished. Test PASSed.


[GitHub] spark issue #15823: [SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputF...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15823
  
**[Test build #68384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68384/consoleFull)** for PR 15823 at commit [`51d1fbe`](https://github.com/apache/spark/commit/51d1fbe464fa49f4503435f0f10790d8f1ec35ad).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE ...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15814
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68385/
Test FAILed.


[GitHub] spark issue #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE ...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15814
  
Merged build finished. Test FAILed.


[GitHub] spark issue #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE ...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15814
  
**[Test build #68385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68385/consoleFull)** for PR 15814 at commit [`fbd7b42`](https://github.com/apache/spark/commit/fbd7b425290644fcfdcaf92491aaae2ac67eefb9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #15797: [SPARK-17990][SPARK-18302][SQL] correct several partitio...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15797
  
**[Test build #68392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68392/consoleFull)** for PR 15797 at commit [`9dbc3f1`](https://github.com/apache/spark/commit/9dbc3f12f85c376c813d0eb2ea12854e9afe3853).


[GitHub] spark issue #15823: [SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputF...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15823
  
**[Test build #68391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68391/consoleFull)** for PR 15823 at commit [`4e79c37`](https://github.com/apache/spark/commit/4e79c377f99602adb5c3fd82f57de1773bb9b64d).


[GitHub] spark issue #15824: [SPARK-18376][SQL] Skip subexpression elimination for co...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15824
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68389/
Test FAILed.


[GitHub] spark issue #15824: [SPARK-18376][SQL] Skip subexpression elimination for co...

2016-11-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15824
  
Merged build finished. Test FAILed.


[GitHub] spark issue #15824: [SPARK-18376][SQL] Skip subexpression elimination for co...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15824
  
**[Test build #68389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68389/consoleFull)** for PR 15824 at commit [`548e45f`](https://github.com/apache/spark/commit/548e45f63f64a3f9e9709af39610ce50390b0fa1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #15819: [SPARK-18372][SQL].Staging directory fail to be removed

2016-11-08 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15819
  
Can you add some documentation? The current code is very difficult to 
follow.



[GitHub] spark issue #15823: [SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputF...

2016-11-08 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15823
  
LGTM otherwise.






[GitHub] spark pull request #15823: [SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `...

2016-11-08 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15823#discussion_r87134013
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
 ---
@@ -42,7 +43,13 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: 
String)
   @transient private var committer: OutputCommitter = _
 
   protected def setupCommitter(context: TaskAttemptContext): 
OutputCommitter = {
-context.getOutputFormatClass.newInstance().getOutputCommitter(context)
+val format = context.getOutputFormatClass.newInstance
--- End diff --

use `newInstance()` since it clearly has side effects.
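For illustration, the Scala convention being applied here (this is the diff's own expression, shown with the suggested parentheses):

```scala
val format = context.getOutputFormatClass.newInstance() // side-effecting, so keep the ()
```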






[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-11-08 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/13758
  
@cloud-fan yes, we could take the same approach as #15044. I have just 
implemented it in my local environment, and it achieves a similar performance 
improvement.
I will submit that approach to #13909 within the next few hours, then cc you 
and @hvanhovell.





[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15807
  
can we just evaluate subexpressions like a Scala lazy val?
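
For illustration, a minimal hand-written sketch of that idea (the names `LazySubExpr`, `isInitialized`, `cached`, and `resetForNewRow` are hypothetical, not Spark's actual generated code): compute on first use within a row, reuse afterwards, and reset for the next row.

```scala
// A lazy-val-style evaluator for one common subexpression.
class LazySubExpr(compute: () => Double) {
  private var isInitialized = false // guard flag, like the bitmap behind a Scala lazy val
  private var cached = 0.0          // only meaningful while isInitialized is true

  // Must run at the start of processing each new input row.
  def resetForNewRow(): Unit = isInitialized = false

  def value: Double = {
    if (!isInitialized) { // compute on first use within the row, reuse afterwards
      cached = compute()
      isInitialized = true
    }
    cached
  }
}
```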





[GitHub] spark issue #15825: [SPARK-18377][SQL] warehouse path should be a static con...

2016-11-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15825
  
CC @yhuai @rxin @srowen @gatorsmile





[GitHub] spark issue #15825: [SPARK-18377][SQL] warehouse path should be a static con...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15825
  
**[Test build #68390 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68390/consoleFull)** for PR 15825 at commit [`1200e2d`](https://github.com/apache/spark/commit/1200e2d47cbb0b6d67950e2ad3512798395d65cf).





[GitHub] spark pull request #15825: [SPARK-18377][SQL] warehouse path should be a sta...

2016-11-08 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/15825

[SPARK-18377][SQL] warehouse path should be a static conf

## What changes were proposed in this pull request?

It's weird that every session can set its own warehouse path at runtime; we 
should forbid that and make it a static conf.

## How was this patch tested?

existing tests.
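
A sketch of the intended usage under the semantics described above (assumptions: `.master("local[*]")` is only for a local demo, and the exact rejection behavior for runtime overrides is defined by this PR):

```scala
import org.apache.spark.sql.SparkSession

// Static confs are supplied once, when the session is built.
val spark = SparkSession.builder()
  .master("local[*]") // just for a local demo
  .config("spark.sql.warehouse.dir", "/tmp/my-warehouse") // fixed for the app's lifetime
  .getOrCreate()

// With the warehouse path as a static conf, a per-session runtime override
// like the following should be rejected rather than silently take effect:
// spark.conf.set("spark.sql.warehouse.dir", "/tmp/other")
```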


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark warehouse

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15825.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15825


commit 1200e2d47cbb0b6d67950e2ad3512798395d65cf
Author: Wenchen Fan 
Date:   2016-11-09T04:43:34Z

warehouse path should be a static conf







[GitHub] spark pull request #15262: [SPARK-17690][STREAMING][SQL] Add mini-dfs cluste...

2016-11-08 Thread ScrapCodes
Github user ScrapCodes closed the pull request at:

https://github.com/apache/spark/pull/15262





[GitHub] spark issue #15262: [SPARK-17690][STREAMING][SQL] Add mini-dfs cluster based...

2016-11-08 Thread ScrapCodes
Github user ScrapCodes commented on the issue:

https://github.com/apache/spark/pull/15262
  
I was going to close this for now.

@srowen Those deps should not have changed; I have not added anything to the 
compile scope. I have not analyzed how those deps are generated. Do you have 
an understanding of why that would happen?





[GitHub] spark issue #15824: [SQL] Skip subexpression elimination for conditional exp...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15824
  
**[Test build #68389 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68389/consoleFull)** for PR 15824 at commit [`548e45f`](https://github.com/apache/spark/commit/548e45f63f64a3f9e9709af39610ce50390b0fa1).





[GitHub] spark issue #15824: [SQL] Skip subexpression elimination for conditional exp...

2016-11-08 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/15824
  
cc @cloud-fan @kiszk 





[GitHub] spark pull request #15824: [SQL] Skip subexpression elimination for conditio...

2016-11-08 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/15824

[SQL] Skip subexpression elimination for conditional expressions

## What changes were proposed in this pull request?

As per discussion at #15807, we should disallow subexpression elimination 
for expressions wrapped in conditional expressions such as `If`.
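
For illustration, a hypothetical example (not taken from the PR) of why elimination inside a conditional can be unsafe; it assumes an active SparkSession `spark`:

```scala
import org.apache.spark.sql.functions._

// risky(x) throws for x <= 0; it stands in for any expensive or erroring
// expression that a conditional is meant to guard.
val risky = udf((x: Int) => { require(x > 0, "x must be positive"); math.log(x.toDouble) })

val df = spark.range(-2, 3).select(col("id").cast("int").as("x"))

// risky(x) + risky(x) is a common subexpression, but it must only be
// evaluated inside the `when` branch; hoisting it to the top level would
// also run it for rows with x <= 0 and fail the whole query.
val out = df.select(
  when(col("x") > 0, risky(col("x")) + risky(col("x"))).otherwise(lit(0.0)).as("y"))
```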

## How was this patch tested?

Jenkins tests.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 
no-subexpr-eliminate-conditionexpr

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15824.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15824


commit 548e45f63f64a3f9e9709af39610ce50390b0fa1
Author: Liang-Chi Hsieh 
Date:   2016-11-09T04:25:21Z

Skip subexpression elimination for conditional expressions.







[GitHub] spark issue #15807: [SPARK-18147][SQL] do not fail for very complex aggregat...

2016-11-08 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/15807
  
@viirya @cloud-fan It looks reasonable to me to skip subexpression 
elimination for expressions wrapped in conditional expressions such as `If`. 
This is because we have only one place, at the top level, to put a common subexpression.
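
A hand-written sketch of that constraint (`expensiveFn` and the variable names are hypothetical; this is an assumption about the shape of the generated code, not Spark's literal output):

```scala
def expensiveFn(v: Int): Int = { require(v > 0); v * v } // hypothetical guarded computation
val x = -1

// With elimination, the shared subexpression lands in the single top-level
// slot and runs unconditionally, so the branch no longer guards it:
//   val subExpr1 = expensiveFn(x)                        // throws for x = -1
//   val result   = if (x > 0) subExpr1 + subExpr1 else 0

// Without elimination inside the conditional, evaluation stays guarded:
val result = if (x > 0) expensiveFn(x) + expensiveFn(x) else 0
```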





[GitHub] spark pull request #15563: [SPARK-16759][CORE] Add a configuration property ...

2016-11-08 Thread weiqingy
Github user weiqingy commented on a diff in the pull request:

https://github.com/apache/spark/pull/15563#discussion_r87131269
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2587,17 +2589,16 @@ private[spark] class CallerContext(
taskId: Option[Long] = None,
taskAttemptNumber: Option[Int] = None) extends Logging {
 
-   val appIdStr = if (appId.isDefined) s"_${appId.get}" else ""
-   val appAttemptIdStr = if (appAttemptId.isDefined) 
s"_${appAttemptId.get}" else ""
-   val jobIdStr = if (jobId.isDefined) s"_JId_${jobId.get}" else ""
-   val stageIdStr = if (stageId.isDefined) s"_SId_${stageId.get}" else ""
-   val stageAttemptIdStr = if (stageAttemptId.isDefined) 
s"_${stageAttemptId.get}" else ""
-   val taskIdStr = if (taskId.isDefined) s"_TId_${taskId.get}" else ""
-   val taskAttemptNumberStr =
- if (taskAttemptNumber.isDefined) s"_${taskAttemptNumber.get}" else ""
-
-   val context = "SPARK_" + from + appIdStr + appAttemptIdStr +
- jobIdStr + stageIdStr + stageAttemptIdStr + taskIdStr + 
taskAttemptNumberStr
+   val context = "SPARK_" +
--- End diff --

Done.





[GitHub] spark issue #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE ...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15814
  
**[Test build #68388 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68388/consoleFull)** for PR 15814 at commit [`91f87de`](https://github.com/apache/spark/commit/91f87de270bbd743f0b40b82016391706a936ca1).





[GitHub] spark pull request #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE...

2016-11-08 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15814#discussion_r87109861
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala ---
@@ -86,6 +86,16 @@ abstract class FileCommitProtocol {
   def newTaskTempFile(taskContext: TaskAttemptContext, dir: 
Option[String], ext: String): String
 
   /**
+   * Similar to newTaskTempFile(), but allows files to be committed to an 
absolute output location.
+   * Depending on the implementation, there may be weaker guarantees 
around adding files this way.
+   */
+  def newTaskTempFileAbsPath(
--- End diff --

Done





[GitHub] spark pull request #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE...

2016-11-08 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15814#discussion_r87111922
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
 ---
@@ -42,17 +44,21 @@ class HadoopMapReduceCommitProtocol(jobId: String, 
path: String)
   /** OutputCommitter from Hadoop is not serializable so marking it 
transient. */
   @transient private var committer: OutputCommitter = _
 
+  /**
+   * Tracks files staged by this task for absolute output paths. These 
outputs are not managed by
+   * the Hadoop OutputCommitter, so we must move these to their final 
locations on job commit.
+   */
+  @transient private var addedAbsPathFiles: mutable.Map[String, String] = 
null
+
+  private def absPathStagingDir: Path = new Path(path, "_temporary-" + 
jobId)
--- End diff --

Done





[GitHub] spark pull request #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE...

2016-11-08 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15814#discussion_r87112095
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ---
@@ -350,13 +350,15 @@ case class BroadcastHint(child: LogicalPlan) extends 
UnaryNode {
  * Options for writing new data into a table.
  *
  * @param enabled whether to overwrite existing data in the table.
- * @param specificPartition only data in the specified partition will be 
overwritten.
+ * @param staticPartitionKeys if non-empty, specifies that we only want to 
overwrite partitions
--- End diff --

Yep it is in the next sentence.





[GitHub] spark pull request #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE...

2016-11-08 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15814#discussion_r87113460
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ---
@@ -182,41 +182,53 @@ case class DataSourceAnalysis(conf: CatalystConf) 
extends Rule[LogicalPlan] {
   "Cannot overwrite a path that is also being read from.")
   }
 
-  val overwritingSinglePartition =
-overwrite.specificPartition.isDefined &&
+  val partitionSchema = query.resolve(
+t.partitionSchema, t.sparkSession.sessionState.analyzer.resolver)
+  val partitionsTrackedByCatalog =
 t.sparkSession.sessionState.conf.manageFilesourcePartitions &&
+l.catalogTable.isDefined && 
l.catalogTable.get.partitionColumnNames.nonEmpty &&
 l.catalogTable.get.tracksPartitionsInCatalog
 
-  val effectiveOutputPath = if (overwritingSinglePartition) {
-val partition = t.sparkSession.sessionState.catalog.getPartition(
-  l.catalogTable.get.identifier, overwrite.specificPartition.get)
-new Path(partition.storage.locationUri.get)
-  } else {
-outputPath
-  }
-
-  val effectivePartitionSchema = if (overwritingSinglePartition) {
-Nil
-  } else {
-query.resolve(t.partitionSchema, 
t.sparkSession.sessionState.analyzer.resolver)
+  var initialMatchingPartitions: Seq[TablePartitionSpec] = Nil
+  var customPartitionLocations: Map[TablePartitionSpec, String] = 
Map.empty
+
+  // When partitions are tracked by the catalog, compute all custom 
partition locations that
+  // may be relevant to the insertion job.
+  if (partitionsTrackedByCatalog) {
+val matchingPartitions = 
t.sparkSession.sessionState.catalog.listPartitions(
--- End diff --

You also need the set of matching partitions (including those with default 
locations) in order to determine which ones to delete at the end of an 
overwrite call.

This makes the optimization quite messy, so I'd rather not push it to the 
catalog for now.





[GitHub] spark pull request #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE...

2016-11-08 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15814#discussion_r87129853
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala ---
@@ -86,6 +86,16 @@ abstract class FileCommitProtocol {
   def newTaskTempFile(taskContext: TaskAttemptContext, dir: 
Option[String], ext: String): String
 
   /**
+   * Similar to newTaskTempFile(), but allows files to be committed to an 
absolute output location.
+   * Depending on the implementation, there may be weaker guarantees 
around adding files this way.
+   */
+  def newTaskTempFileAbsPath(
--- End diff --

I thought about combining it but I think the method semantics become too 
subtle then.





[GitHub] spark pull request #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE...

2016-11-08 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15814#discussion_r87112188
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -418,6 +418,8 @@ case class DataSource(
 val plan =
   InsertIntoHadoopFsRelationCommand(
 outputPath,
+Map.empty,
--- End diff --

Done





[GitHub] spark pull request #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE...

2016-11-08 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15814#discussion_r87111758
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
 ---
@@ -42,17 +44,21 @@ class HadoopMapReduceCommitProtocol(jobId: String, 
path: String)
   /** OutputCommitter from Hadoop is not serializable so marking it 
transient. */
   @transient private var committer: OutputCommitter = _
 
+  /**
+   * Tracks files staged by this task for absolute output paths. These 
outputs are not managed by
+   * the Hadoop OutputCommitter, so we must move these to their final 
locations on job commit.
+   */
+  @transient private var addedAbsPathFiles: mutable.Map[String, String] = 
null
--- End diff --

They are files. We need to track the unique output location of each file 
here in order to know where to place it. We could use directories, but they 
would each end up with one file anyway.
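
A hedged sketch of that bookkeeping, reusing names from the quoted diff (`addedAbsPathFiles`, `absPathStagingDir`); the helper names and the rename-on-commit step are illustrative, not necessarily the PR's exact implementation:

```scala
import scala.collection.mutable
import org.apache.hadoop.fs.{FileSystem, Path}

// staged temp file -> final absolute destination, one entry per file
val addedAbsPathFiles = mutable.Map[String, String]()

def stageFileForAbsPath(absPathStagingDir: Path, absoluteDir: String, fileName: String): String = {
  val finalPath = new Path(absoluteDir, fileName).toString
  val tempPath  = new Path(absPathStagingDir, fileName).toString
  addedAbsPathFiles(tempPath) = finalPath // remember where this file must land
  tempPath                                // the task writes here for now
}

def moveStagedFilesOnJobCommit(fs: FileSystem): Unit = {
  // The Hadoop OutputCommitter does not manage these files, so the commit
  // protocol moves each one to its final absolute location itself.
  addedAbsPathFiles.foreach { case (temp, dest) =>
    fs.rename(new Path(temp), new Path(dest))
  }
}
```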





[GitHub] spark pull request #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE...

2016-11-08 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15814#discussion_r87112037
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -178,18 +178,13 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] 
with Logging {
 "partitions with value: " + 
dynamicPartitionKeys.keys.mkString("[", ",", "]"), ctx)
 }
 val overwrite = ctx.OVERWRITE != null
-val overwritePartition =
-  if (overwrite && partitionKeys.nonEmpty && 
dynamicPartitionKeys.isEmpty) {
-Some(partitionKeys.map(t => (t._1, t._2.get)))
-  } else {
-None
-  }
+val staticPartitionKeys = partitionKeys.filter(_._2.nonEmpty).map(t => 
(t._1, t._2.get))
--- End diff --

Done





[GitHub] spark issue #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE ...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15814
  
**[Test build #68387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68387/consoleFull)** for PR 15814 at commit [`4296612`](https://github.com/apache/spark/commit/4296612bd373bc1118b2cbfc10530b1895aaef60).





[GitHub] spark issue #15233: [SPARK-17659] [SQL] Partitioned View is Not Supported By...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15233
  
**[Test build #68386 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68386/consoleFull)** for PR 15233 at commit [`c6d3acd`](https://github.com/apache/spark/commit/c6d3acd953ad70f6d66252d13e6cf4c0e507d33c).





[GitHub] spark issue #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false work w...

2016-11-08 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15820
  
Wow, looks like the new github comment interface did all kinds of weird 
things, apologies about that.





[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15820#discussion_r87129126
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
 ---
@@ -83,6 +86,113 @@ private[kafka010] case class CachedKafkaConsumer 
private(
 record
   }
 
+  @tailrec
+  final def getAndIgnoreLostData(
+  offset: Long,
+  untilOffset: Long,
+  pollTimeoutMs: Long): ConsumerRecord[Array[Byte], Array[Byte]] = {
+// scalastyle:off
+// When `failOnDataLoss` is `false`, we need to handle the following 
cases (note: untilOffset and latestOffset are exclusive):
+// 1. Some data are aged out, and `offset < beginningOffset <= 
untilOffset - 1 <= latestOffset - 1`
+//  Seek to the beginningOffset and fetch the data.
+// 2. Some data are aged out, and `offset <= untilOffset - 1 < 
beginningOffset`.
+//  There is nothing to fetch, return null.
+// 3. The topic is deleted.
+//  There is nothing to fetch, return null.
+// 4. The topic is deleted and recreated, and `beginningOffset <= 
offset <= untilOffset - 1 <= latestOffset - 1`.
+//  We cannot detect this case. We can still fetch data like 
nothing happens.
+// 5. The topic is deleted and recreated, and `beginningOffset <= 
offset < latestOffset - 1 < untilOffset - 1`.
+//  Same as 4.
+// 6. The topic is deleted and recreated, and `beginningOffset <= 
latestOffset - 1 < offset <= untilOffset - 1`.
+//  There is nothing to fetch, return null.
+// 7. The topic is deleted and recreated, and `offset < 
beginningOffset <= untilOffset - 1`.
+//  Same as 1.
+// 8. The topic is deleted and recreated, and `offset <= untilOffset - 
1 < beginningOffset`.
+//  There is nothing to fetch, return null.
+// scalastyle:on
+if (offset >= untilOffset) {
+  // Case 2 or 8
+  // We seek to beginningOffset but beginningOffset >= untilOffset
+  reset()
+  return null
+}
+
+logDebug(s"Get $groupId $topicPartition nextOffset 
$nextOffsetInFetchedData requested $offset")
+var outOfOffset = false
+if (offset != nextOffsetInFetchedData) {
+  logInfo(s"Initial fetch for $topicPartition $offset")
+  seek(offset)
+  try {
+poll(pollTimeoutMs)
+  } catch {
+case e: OffsetOutOfRangeException =>
+  logWarning(s"Cannot fetch offset $offset, try to recover from 
the beginning offset", e)
+  outOfOffset = true
+  }
+} else if (!fetchedData.hasNext()) {
+  // The last pre-fetched data has been drained.
+  seek(offset)
+  try {
+poll(pollTimeoutMs)
+  } catch {
+case e: OffsetOutOfRangeException =>
+  logWarning(s"Cannot fetch offset $offset, try to recover from 
the beginning offset", e)
+  outOfOffset = true
+  }
+}
+if (outOfOffset) {
+  val beginningOffset = getBeginningOffset()
+  if (beginningOffset <= offset) {
+val latestOffset = getLatestOffset()
+if (latestOffset <= offset) {
+  // Case 3 or 6
+  logWarning(s"Offset ${offset} is later than the latest offset 
$latestOffset. " +
+s"Skipped [$offset, $untilOffset)")
+  reset()
+  return null
+} else {
+  // Case 4 or 5
+  getAndIgnoreLostData(offset, untilOffset, pollTimeoutMs)
--- End diff --

- Why is this not an early return?
- The arguments to the recursive function have not changed at this point, 
right?  Why does it terminate?





[GitHub] spark pull request #15233: [SPARK-17659] [SQL] Partitioned View is Not Suppo...

2016-11-08 Thread gatorsmile
GitHub user gatorsmile reopened a pull request:

https://github.com/apache/spark/pull/15233

[SPARK-17659] [SQL] Partitioned View is Not Supported By SHOW CREATE TABLE

### What changes were proposed in this pull request?

`Partitioned View` is not supported by Spark SQL. For a Hive partitioned 
view, SHOW CREATE TABLE is unable to generate the right DDL, so SHOW CREATE 
TABLE should not support it, just like the other Hive-only features. This PR 
issues an exception when it detects that a view is a partitioned view.
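
A hypothetical repro under a Hive-enabled session (the view itself would be created through Hive, since Spark SQL cannot create partitioned views):

```scala
// Created via Hive, e.g.:
//   CREATE VIEW v PARTITIONED ON (ds) AS SELECT key, value, ds FROM src
// After this PR, asking Spark for its DDL should fail fast with an
// exception instead of emitting incorrect DDL:
spark.sql("SHOW CREATE TABLE v") // expected: an exception for a partitioned view
```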
### How was this patch tested?

Added a test case


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark partitionedView

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15233.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15233


commit c6d3acd953ad70f6d66252d13e6cf4c0e507d33c
Author: gatorsmile 
Date:   2016-09-25T02:57:11Z

fix.







[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15820#discussion_r87129981
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
 ---
@@ -83,6 +86,113 @@ private[kafka010] case class CachedKafkaConsumer 
private(
 record
   }
 
+  @tailrec
+  final def getAndIgnoreLostData(
+  offset: Long,
+  untilOffset: Long,
+  pollTimeoutMs: Long): ConsumerRecord[Array[Byte], Array[Byte]] = {
+// scalastyle:off
+// When `failOnDataLoss` is `false`, we need to handle the following 
cases (note: untilOffset and latestOffset are exclusive):
+// 1. Some data are aged out, and `offset < beginningOffset <= 
untilOffset - 1 <= latestOffset - 1`
+//  Seek to the beginningOffset and fetch the data.
+// 2. Some data are aged out, and `offset <= untilOffset - 1 < 
beginningOffset`.
+//  There is nothing to fetch, return null.
+// 3. The topic is deleted.
+//  There is nothing to fetch, return null.
+// 4. The topic is deleted and recreated, and `beginningOffset <= 
offset <= untilOffset - 1 <= latestOffset - 1`.
+//  We cannot detect this case. We can still fetch data like 
nothing happens.
+// 5. The topic is deleted and recreated, and `beginningOffset <= 
offset < latestOffset - 1 < untilOffset - 1`.
+//  Same as 4.
+// 6. The topic is deleted and recreated, and `beginningOffset <= 
latestOffset - 1 < offset <= untilOffset - 1`.
+//  There is nothing to fetch, return null.
+// 7. The topic is deleted and recreated, and `offset < 
beginningOffset <= untilOffset - 1`.
+//  Same as 1.
+// 8. The topic is deleted and recreated, and `offset <= untilOffset - 
1 < beginningOffset`.
+//  There is nothing to fetch, return null.
+// scalastyle:on
+if (offset >= untilOffset) {
+  // Case 2 or 8
+  // We seek to beginningOffset but beginningOffset >= untilOffset
+  reset()
+  return null
+}
+
+logDebug(s"Get $groupId $topicPartition nextOffset 
$nextOffsetInFetchedData requested $offset")
+var outOfOffset = false
--- End diff --

Can this var be eliminated by just using a single try around the if/else? 
It's the same catch condition in either case.
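
A sketch of that refactor against the quoted method (assumption: the recovery logic that used to be guarded by `outOfOffset` moves into the catch, leaving behavior unchanged):

```scala
// One try around the whole if/else replaces the per-branch catches and the
// `outOfOffset` var; out-of-range recovery then happens in a single place.
try {
  if (offset != nextOffsetInFetchedData) {
    logInfo(s"Initial fetch for $topicPartition $offset")
    seek(offset)
    poll(pollTimeoutMs)
  } else if (!fetchedData.hasNext()) {
    // The last pre-fetched data has been drained.
    seek(offset)
    poll(pollTimeoutMs)
  }
} catch {
  case e: OffsetOutOfRangeException =>
    logWarning(s"Cannot fetch offset $offset, try to recover from the beginning offset", e)
    // the former `if (outOfOffset)` recovery branch moves here
}
```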





[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15820#discussion_r87129817
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
 ---
@@ -83,6 +86,113 @@ private[kafka010] case class CachedKafkaConsumer 
private(
 record
   }
 
+  @tailrec
+  final def getAndIgnoreLostData(
+  offset: Long,
+  untilOffset: Long,
+  pollTimeoutMs: Long): ConsumerRecord[Array[Byte], Array[Byte]] = {
+// scalastyle:off
+// When `failOnDataLoss` is `false`, we need to handle the following 
cases (note: untilOffset and latestOffset are exclusive):
+// 1. Some data are aged out, and `offset < beginningOffset <= 
untilOffset - 1 <= latestOffset - 1`
+//  Seek to the beginningOffset and fetch the data.
+// 2. Some data are aged out, and `offset <= untilOffset - 1 < 
beginningOffset`.
+//  There is nothing to fetch, return null.
+// 3. The topic is deleted.
+//  There is nothing to fetch, return null.
+// 4. The topic is deleted and recreated, and `beginningOffset <= 
offset <= untilOffset - 1 <= latestOffset - 1`.
+//  We cannot detect this case. We can still fetch data like 
nothing happens.
+// 5. The topic is deleted and recreated, and `beginningOffset <= 
offset < latestOffset - 1 < untilOffset - 1`.
+//  Same as 4.
+// 6. The topic is deleted and recreated, and `beginningOffset <= 
latestOffset - 1 < offset <= untilOffset - 1`.
+//  There is nothing to fetch, return null.
+// 7. The topic is deleted and recreated, and `offset < 
beginningOffset <= untilOffset - 1`.
+//  Same as 1.
+// 8. The topic is deleted and recreated, and `offset <= untilOffset - 
1 < beginningOffset`.
+//  There is nothing to fetch, return null.
+// scalastyle:on
+if (offset >= untilOffset) {
+  // Case 2 or 8
+  // We seek to beginningOffset but beginningOffset >= untilOffset
+  reset()
+  return null
+}
+
+logDebug(s"Get $groupId $topicPartition nextOffset 
$nextOffsetInFetchedData requested $offset")
+var outOfOffset = false
+if (offset != nextOffsetInFetchedData) {
+  logInfo(s"Initial fetch for $topicPartition $offset")
+  seek(offset)
+  try {
+poll(pollTimeoutMs)
+  } catch {
+case e: OffsetOutOfRangeException =>
+  logWarning(s"Cannot fetch offset $offset, try to recover from 
the beginning offset", e)
+  outOfOffset = true
+  }
+} else if (!fetchedData.hasNext()) {
+  // The last pre-fetched data has been drained.
+  seek(offset)
--- End diff --

I don't think it's necessary to seek every time the fetched data is empty; 
in normal operation the poll should return the next offset, right?
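
A fragment of what that could look like in the quoted method (assumption: after the buffer is drained, poll continues from the consumer's current position, so no seek is needed):

```scala
if (offset != nextOffsetInFetchedData) {
  // Only seek when the requested offset differs from the consumer's position.
  seek(offset)
}
// When the pre-fetched buffer is merely drained, polling again without a
// seek should return records continuing at `offset` in normal operation.
poll(pollTimeoutMs)
```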





[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15820#discussion_r87129927
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
 ---
@@ -83,6 +86,113 @@ private[kafka010] case class CachedKafkaConsumer 
private(
 record
   }
 
+  @tailrec
+  final def getAndIgnoreLostData(
+  offset: Long,
+  untilOffset: Long,
+  pollTimeoutMs: Long): ConsumerRecord[Array[Byte], Array[Byte]] = {
+// scalastyle:off
+// When `failOnDataLoss` is `false`, we need to handle the following 
cases (note: untilOffset and latestOffset are exclusive):
+// 1. Some data are aged out, and `offset < beginningOffset <= 
untilOffset - 1 <= latestOffset - 1`
+//  Seek to the beginningOffset and fetch the data.
+// 2. Some data are aged out, and `offset <= untilOffset - 1 < 
beginningOffset`.
+//  There is nothing to fetch, return null.
+// 3. The topic is deleted.
+//  There is nothing to fetch, return null.
+// 4. The topic is deleted and recreated, and `beginningOffset <= 
offset <= untilOffset - 1 <= latestOffset - 1`.
+//  We cannot detect this case. We can still fetch data like 
nothing happens.
+// 5. The topic is deleted and recreated, and `beginningOffset <= 
offset < latestOffset - 1 < untilOffset - 1`.
+//  Same as 4.
+// 6. The topic is deleted and recreated, and `beginningOffset <= 
latestOffset - 1 < offset <= untilOffset - 1`.
+//  There is nothing to fetch, return null.
+// 7. The topic is deleted and recreated, and `offset < 
beginningOffset <= untilOffset - 1`.
+//  Same as 1.
+// 8. The topic is deleted and recreated, and `offset <= untilOffset - 
1 < beginningOffset`.
+//  There is nothing to fetch, return null.
+// scalastyle:on
+if (offset >= untilOffset) {
+  // Case 2 or 8
+  // We seek to beginningOffset but beginningOffset >= untilOffset
+  reset()
+  return null
+}
+
+logDebug(s"Get $groupId $topicPartition nextOffset 
$nextOffsetInFetchedData requested $offset")
+var outOfOffset = false
+if (offset != nextOffsetInFetchedData) {
+  logInfo(s"Initial fetch for $topicPartition $offset")
+  seek(offset)
+  try {
+poll(pollTimeoutMs)
+  } catch {
+case e: OffsetOutOfRangeException =>
+  logWarning(s"Cannot fetch offset $offset, try to recover from 
the beginning offset", e)
+  outOfOffset = true
+  }
+} else if (!fetchedData.hasNext()) {
+  // The last pre-fetched data has been drained.
+  seek(offset)
+  try {
+poll(pollTimeoutMs)
+  } catch {
+case e: OffsetOutOfRangeException =>
+  logWarning(s"Cannot fetch offset $offset, try to recover from 
the beginning offset", e)
+  outOfOffset = true
+  }
+}
+if (outOfOffset) {
+  val beginningOffset = getBeginningOffset()
+  if (beginningOffset <= offset) {
+val latestOffset = getLatestOffset()
+if (latestOffset <= offset) {
+  // Case 3 or 6
+  logWarning(s"Offset ${offset} is later than the latest offset 
$latestOffset. " +
+s"Skipped [$offset, $untilOffset)")
+  reset()
+  return null
+} else {
+  // Case 4 or 5
+  getAndIgnoreLostData(offset, untilOffset, pollTimeoutMs)
+}
+  } else {
+// Case 1 or 7
+logWarning(s"Buffer miss for $groupId $topicPartition [$offset, 
$beginningOffset})")
+return getAndIgnoreLostData(beginningOffset, untilOffset, 
pollTimeoutMs)
+  }
+} else {
+  if (!fetchedData.hasNext()) {
+// We cannot fetch anything after `polling`. Two possible cases:
+// - `beginningOffset` is `offset` but there is nothing for 
`beginningOffset` right now.
+// - Cannot fetch any data before the timeout.
+// Because there is no way to distinguish, just skip the rest 
offsets in the current
+// partition.
+logWarning(s"Buffer miss for $groupId $topicPartition [$offset, 
$untilOffset)")
+reset()
+return null
+  }
+
+  val record = fetchedData.next()
+  if (record.offset >= untilOffset) {
+// Case 2
+logWarning(s"Buffer miss for $groupId $topicPartition [$offset, 
$untilOffset)")
+reset()
+return null
+  } else {
+if (record.offset != offset) {
+  // Case 1
+  logWarning(s"Buffer miss for $groupId $topicPartition [$offset, 

[GitHub] spark issue #15233: [SPARK-17659] [SQL] Partitioned View is Not Supported By...

2016-11-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15233
  
retest this please





[GitHub] spark issue #15233: [SPARK-17659] [SQL] Partitioned View is Not Supported By...

2016-11-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15233
  
Sure, let me reopen it. : )





[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15820#discussion_r87127811
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
 ---
@@ -83,6 +86,113 @@ private[kafka010] case class CachedKafkaConsumer 
private(
 record
   }
 
+  @tailrec
+  final def getAndIgnoreLostData(
+  offset: Long,
+  untilOffset: Long,
+  pollTimeoutMs: Long): ConsumerRecord[Array[Byte], Array[Byte]] = {
+// scalastyle:off
+// When `failOnDataLoss` is `false`, we need to handle the following 
cases (note: untilOffset and latestOffset are exclusive):
+// 1. Some data are aged out, and `offset < beginningOffset <= 
untilOffset - 1 <= latestOffset - 1`
+//  Seek to the beginningOffset and fetch the data.
+// 2. Some data are aged out, and `offset <= untilOffset - 1 < 
beginningOffset`.
+//  There is nothing to fetch, return null.
+// 3. The topic is deleted.
+//  There is nothing to fetch, return null.
+// 4. The topic is deleted and recreated, and `beginningOffset <= 
offset <= untilOffset - 1 <= latestOffset - 1`.
+//  We cannot detect this case. We can still fetch data like 
nothing happens.
+// 5. The topic is deleted and recreated, and `beginningOffset <= 
offset < latestOffset - 1 < untilOffset - 1`.
+//  Same as 4.
+// 6. The topic is deleted and recreated, and `beginningOffset <= 
latestOffset - 1 < offset <= untilOffset - 1`.
+//  There is nothing to fetch, return null.
+// 7. The topic is deleted and recreated, and `offset < 
beginningOffset <= untilOffset - 1`.
+//  Same as 1.
+// 8. The topic is deleted and recreated, and `offset <= untilOffset - 
1 < beginningOffset`.
+//  There is nothing to fetch, return null.
+// scalastyle:on
+if (offset >= untilOffset) {
+  // Case 2 or 8
+  // We seek to beginningOffset but beginningOffset >= untilOffset
+  reset()
+  return null
+}
+
+logDebug(s"Get $groupId $topicPartition nextOffset 
$nextOffsetInFetchedData requested $offset")
+var outOfOffset = false
+if (offset != nextOffsetInFetchedData) {
+  logInfo(s"Initial fetch for $topicPartition $offset")
+  seek(offset)
+  try {
+poll(pollTimeoutMs)
+  } catch {
+case e: OffsetOutOfRangeException =>
+  logWarning(s"Cannot fetch offset $offset, try to recover from 
the beginning offset", e)
+  outOfOffset = true
+  }
+} else if (!fetchedData.hasNext()) {
+  // The last pre-fetched data has been drained.
+  seek(offset)
+  try {
+poll(pollTimeoutMs)
+  } catch {
+case e: OffsetOutOfRangeException =>
+  logWarning(s"Cannot fetch offset $offset, try to recover from 
the beginning offset", e)
+  outOfOffset = true
+  }
+}
+if (outOfOffset) {
+  val beginningOffset = getBeginningOffset()
+  if (beginningOffset <= offset) {
+val latestOffset = getLatestOffset()
+if (latestOffset <= offset) {
+  // Case 3 or 6
+  logWarning(s"Offset ${offset} is later than the latest offset 
$latestOffset. " +
+s"Skipped [$offset, $untilOffset)")
+  reset()
+  return null
+} else {
+  // Case 4 or 5
+  getAndIgnoreLostData(offset, untilOffset, pollTimeoutMs)
+}
+  } else {
+// Case 1 or 7
+logWarning(s"Buffer miss for $groupId $topicPartition [$offset, 
$beginningOffset})")
+return getAndIgnoreLostData(beginningOffset, untilOffset, 
pollTimeoutMs)
+  }
+} else {
+  if (!fetchedData.hasNext()) {
+// We cannot fetch anything after `polling`. Two possible cases:
+// - `beginningOffset` is `offset` but there is nothing for 
`beginningOffset` right now.
+// - Cannot fetch any data before the timeout.
+// Because there is no way to distinguish, just skip the rest 
offsets in the current
+// partition.
+logWarning(s"Buffer miss for $groupId $topicPartition [$offset, 
$untilOffset)")
+reset()
+return null
+  }
+
+  val record = fetchedData.next()
+  if (record.offset >= untilOffset) {
+// Case 2
+logWarning(s"Buffer miss for $groupId $topicPartition [$offset, 
$untilOffset)")
+reset()
+return null
+  } else {
+if (record.offset != offset) {
+  // Case 1
+  logWarning(s"Buffer miss for $groupId $topicPartition [$offset, 

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15820#discussion_r87130059
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
 ---
@@ -83,6 +86,113 @@ private[kafka010] case class CachedKafkaConsumer 
private(
 record
   }
 
+  @tailrec
+  final def getAndIgnoreLostData(
+  offset: Long,
+  untilOffset: Long,
+  pollTimeoutMs: Long): ConsumerRecord[Array[Byte], Array[Byte]] = {
+// scalastyle:off
+// When `failOnDataLoss` is `false`, we need to handle the following 
cases (note: untilOffset and latestOffset are exclusive):
+// 1. Some data are aged out, and `offset < beginningOffset <= 
untilOffset - 1 <= latestOffset - 1`
+//  Seek to the beginningOffset and fetch the data.
+// 2. Some data are aged out, and `offset <= untilOffset - 1 < 
beginningOffset`.
+//  There is nothing to fetch, return null.
+// 3. The topic is deleted.
+//  There is nothing to fetch, return null.
+// 4. The topic is deleted and recreated, and `beginningOffset <= 
offset <= untilOffset - 1 <= latestOffset - 1`.
+//  We cannot detect this case. We can still fetch data like 
nothing happens.
+// 5. The topic is deleted and recreated, and `beginningOffset <= 
offset < latestOffset - 1 < untilOffset - 1`.
+//  Same as 4.
+// 6. The topic is deleted and recreated, and `beginningOffset <= 
latestOffset - 1 < offset <= untilOffset - 1`.
+//  There is nothing to fetch, return null.
+// 7. The topic is deleted and recreated, and `offset < 
beginningOffset <= untilOffset - 1`.
+//  Same as 1.
+// 8. The topic is deleted and recreated, and `offset <= untilOffset - 
1 < beginningOffset`.
+//  There is nothing to fetch, return null.
+// scalastyle:on
+if (offset >= untilOffset) {
+  // Case 2 or 8
+  // We seek to beginningOffset but beginningOffset >= untilOffset
+  reset()
+  return null
+}
+
+logDebug(s"Get $groupId $topicPartition nextOffset 
$nextOffsetInFetchedData requested $offset")
+var outOfOffset = false
+if (offset != nextOffsetInFetchedData) {
+  logInfo(s"Initial fetch for $topicPartition $offset")
+  seek(offset)
+  try {
+poll(pollTimeoutMs)
+  } catch {
+case e: OffsetOutOfRangeException =>
+  logWarning(s"Cannot fetch offset $offset, try to recover from 
the beginning offset", e)
+  outOfOffset = true
+  }
+} else if (!fetchedData.hasNext()) {
+  // The last pre-fetched data has been drained.
+  seek(offset)
+  try {
+poll(pollTimeoutMs)
+  } catch {
+case e: OffsetOutOfRangeException =>
+  logWarning(s"Cannot fetch offset $offset, try to recover from 
the beginning offset", e)
+  outOfOffset = true
+  }
+}
+if (outOfOffset) {
+  val beginningOffset = getBeginningOffset()
+  if (beginningOffset <= offset) {
+val latestOffset = getLatestOffset()
+if (latestOffset <= offset) {
+  // Case 3 or 6
+  logWarning(s"Offset ${offset} is later than the latest offset 
$latestOffset. " +
+s"Skipped [$offset, $untilOffset)")
+  reset()
+  return null
+} else {
+  // Case 4 or 5
+  getAndIgnoreLostData(offset, untilOffset, pollTimeoutMs)
--- End diff --

- Why isn't this an early return?
- Unless I'm misreading, this is a recursive call without changing the 
arguments.  Why is it guaranteed to terminate?





[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15820#discussion_r87128373
  
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala ---
@@ -83,6 +86,113 @@ private[kafka010] case class CachedKafkaConsumer private(
     record
   }
 
+  @tailrec
+  final def getAndIgnoreLostData(
+      offset: Long,
+      untilOffset: Long,
+      pollTimeoutMs: Long): ConsumerRecord[Array[Byte], Array[Byte]] = {
+    // scalastyle:off
+    // When `failOnDataLoss` is `false`, we need to handle the following cases (note: untilOffset and latestOffset are exclusive):
+    // 1. Some data are aged out, and `offset < beginningOffset <= untilOffset - 1 <= latestOffset - 1`
+    //      Seek to the beginningOffset and fetch the data.
+    // 2. Some data are aged out, and `offset <= untilOffset - 1 < beginningOffset`.
+    //      There is nothing to fetch, return null.
+    // 3. The topic is deleted.
+    //      There is nothing to fetch, return null.
+    // 4. The topic is deleted and recreated, and `beginningOffset <= offset <= untilOffset - 1 <= latestOffset - 1`.
+    //      We cannot detect this case. We can still fetch data like nothing happens.
+    // 5. The topic is deleted and recreated, and `beginningOffset <= offset < latestOffset - 1 < untilOffset - 1`.
+    //      Same as 4.
+    // 6. The topic is deleted and recreated, and `beginningOffset <= latestOffset - 1 < offset <= untilOffset - 1`.
+    //      There is nothing to fetch, return null.
+    // 7. The topic is deleted and recreated, and `offset < beginningOffset <= untilOffset - 1`.
+    //      Same as 1.
+    // 8. The topic is deleted and recreated, and `offset <= untilOffset - 1 < beginningOffset`.
+    //      There is nothing to fetch, return null.
+    // scalastyle:on
+    if (offset >= untilOffset) {
+      // Case 2 or 8
+      // We seek to beginningOffset but beginningOffset >= untilOffset
+      reset()
+      return null
+    }
+
+    logDebug(s"Get $groupId $topicPartition nextOffset $nextOffsetInFetchedData requested $offset")
+    var outOfOffset = false
+    if (offset != nextOffsetInFetchedData) {
+      logInfo(s"Initial fetch for $topicPartition $offset")
+      seek(offset)
+      try {
+        poll(pollTimeoutMs)
+      } catch {
+        case e: OffsetOutOfRangeException =>
+          logWarning(s"Cannot fetch offset $offset, try to recover from the beginning offset", e)
+          outOfOffset = true
+      }
+    } else if (!fetchedData.hasNext()) {
+      // The last pre-fetched data has been drained.
+      seek(offset)
+      try {
+        poll(pollTimeoutMs)
+      } catch {
+        case e: OffsetOutOfRangeException =>
+          logWarning(s"Cannot fetch offset $offset, try to recover from the beginning offset", e)
+          outOfOffset = true
+      }
+    }
+    if (outOfOffset) {
+      val beginningOffset = getBeginningOffset()
+      if (beginningOffset <= offset) {
+        val latestOffset = getLatestOffset()
+        if (latestOffset <= offset) {
+          // Case 3 or 6
+          logWarning(s"Offset ${offset} is later than the latest offset $latestOffset. " +
+            s"Skipped [$offset, $untilOffset)")
+          reset()
+          return null
+        } else {
+          // Case 4 or 5
+          getAndIgnoreLostData(offset, untilOffset, pollTimeoutMs)
+        }
+      } else {
+        // Case 1 or 7
+        logWarning(s"Buffer miss for $groupId $topicPartition [$offset, $beginningOffset})")
+        return getAndIgnoreLostData(beginningOffset, untilOffset, pollTimeoutMs)
+      }
+    } else {
+      if (!fetchedData.hasNext()) {
+        // We cannot fetch anything after `polling`. Two possible cases:
+        // - `beginningOffset` is `offset` but there is nothing for `beginningOffset` right now.
+        // - Cannot fetch any data before timeout.
+        // Because there is no way to distinguish, just skip the rest offsets in the current
+        // partition.
+        logWarning(s"Buffer miss for $groupId $topicPartition [$offset, $untilOffset)")
+        reset()
+        return null
+      }
+
+      val record = fetchedData.next()
+      if (record.offset >= untilOffset) {
+        // Case 2
+        logWarning(s"Buffer miss for $groupId $topicPartition [$offset, $untilOffset)")
+        reset()
+        return null
+      } else {
+        if (record.offset != offset) {
+          // Case 1
+          logWarning(s"Buffer miss for $groupId $topicPartition [$offset,
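The eight cases above collapse into a small decision table. Below is a minimal, Kafka-free sketch of that table; the Action type, the decide function, and the concrete offsets are illustrative only, not code from the PR:

object LostDataCases {
  sealed trait Action
  case object SeekToBeginning extends Action // cases 1 and 7
  case object ReturnNull extends Action      // cases 2, 3, 6 and 8
  case object FetchNormally extends Action   // cases 4 and 5

  // untilOffset and latestOffset are exclusive, matching the comment in the diff.
  def decide(offset: Long, untilOffset: Long,
             beginningOffset: Long, latestOffset: Long): Action =
    if (beginningOffset >= untilOffset) ReturnNull      // cases 2 and 8: the whole requested range aged out
    else if (offset >= latestOffset) ReturnNull         // cases 3 and 6: nothing exists at or after offset
    else if (offset < beginningOffset) SeekToBeginning  // cases 1 and 7: the front of the range aged out
    else FetchNormally                                  // cases 4 and 5: the loss is undetectable

  def main(args: Array[String]): Unit = {
    // Case 1: offsets [0, 10) aged out; a request for offset 5 recovers by seeking to 10.
    assert(decide(offset = 5, untilOffset = 20, beginningOffset = 10, latestOffset = 100) == SeekToBeginning)
    // Case 6: the topic was recreated with fewer records than the requested offset.
    assert(decide(offset = 50, untilOffset = 60, beginningOffset = 0, latestOffset = 30) == ReturnNull)
  }
}

In cases 1 and 7 the caller would re-run decide with offset = beginningOffset, mirroring the recursive call in the diff.
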
[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15820#discussion_r87129204
  
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala ---
@@ -83,6 +86,113 @@ private[kafka010] case class CachedKafkaConsumer private(
     record
   }
 
+  @tailrec
+  final def getAndIgnoreLostData(
+      offset: Long,
+      untilOffset: Long,
+      pollTimeoutMs: Long): ConsumerRecord[Array[Byte], Array[Byte]] = {
+    // scalastyle:off
+    // When `failOnDataLoss` is `false`, we need to handle the following cases (note: untilOffset and latestOffset are exclusive):
+    // 1. Some data are aged out, and `offset < beginningOffset <= untilOffset - 1 <= latestOffset - 1`
+    //      Seek to the beginningOffset and fetch the data.
+    // 2. Some data are aged out, and `offset <= untilOffset - 1 < beginningOffset`.
+    //      There is nothing to fetch, return null.
+    // 3. The topic is deleted.
+    //      There is nothing to fetch, return null.
+    // 4. The topic is deleted and recreated, and `beginningOffset <= offset <= untilOffset - 1 <= latestOffset - 1`.
+    //      We cannot detect this case. We can still fetch data like nothing happens.
+    // 5. The topic is deleted and recreated, and `beginningOffset <= offset < latestOffset - 1 < untilOffset - 1`.
+    //      Same as 4.
+    // 6. The topic is deleted and recreated, and `beginningOffset <= latestOffset - 1 < offset <= untilOffset - 1`.
+    //      There is nothing to fetch, return null.
+    // 7. The topic is deleted and recreated, and `offset < beginningOffset <= untilOffset - 1`.
+    //      Same as 1.
+    // 8. The topic is deleted and recreated, and `offset <= untilOffset - 1 < beginningOffset`.
+    //      There is nothing to fetch, return null.
+    // scalastyle:on
+    if (offset >= untilOffset) {
+      // Case 2 or 8
+      // We seek to beginningOffset but beginningOffset >= untilOffset
+      reset()
+      return null
+    }
+
+    logDebug(s"Get $groupId $topicPartition nextOffset $nextOffsetInFetchedData requested $offset")
+    var outOfOffset = false
--- End diff ---

Can't this var be eliminated with a single try around the following if/else?
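
A minimal sketch of that shape, reusing only names that appear in the quoted diff (illustrative, not code from the PR): one try wraps both branches and evaluates to a Boolean, so the var and the duplicated catch blocks go away.

// Drop-in replacement for the two try/catch blocks in the quoted method body.
val outOfOffset = try {
  if (offset != nextOffsetInFetchedData) {
    logInfo(s"Initial fetch for $topicPartition $offset")
    seek(offset)
    poll(pollTimeoutMs)
  } else if (!fetchedData.hasNext()) {
    // The last pre-fetched data has been drained.
    seek(offset)
    poll(pollTimeoutMs)
  }
  false // both branches completed without running out of range
} catch {
  case e: OffsetOutOfRangeException =>
    logWarning(s"Cannot fetch offset $offset, try to recover from the beginning offset", e)
    true
}

One reason the PR may keep a flag rather than recover inside the catch: evaluating the try as an expression keeps the recursive getAndIgnoreLostData calls outside the catch handler, which @tailrec requires — Scala does not treat a call made inside a try or catch block as a tail call.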





[GitHub] spark issue #15814: [SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE ...

2016-11-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15814
  
**[Test build #68385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68385/consoleFull)** for PR 15814 at commit [`fbd7b42`](https://github.com/apache/spark/commit/fbd7b425290644fcfdcaf92491aaae2ac67eefb9).




