[GitHub] spark issue #17083: [SPARK-19750][UI][branch-2.1] Fix redirect issue from ht...

2017-03-01 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/17083
  
Ping @vanzin, do you have any further comments? Thanks a lot.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17128: [SPARK-18352][DOCS] wholeFile JSON update doc and progra...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17128
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17128: [SPARK-18352][DOCS] wholeFile JSON update doc and progra...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17128
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73729/
Test PASSed.





[GitHub] spark issue #17128: [SPARK-18352][DOCS] wholeFile JSON update doc and progra...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17128
  
**[Test build #73729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73729/testReport)** for PR 17128 at commit [`f5daeae`](https://github.com/apache/spark/commit/f5daeae056fdae4ef42282206173f8484498968e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17034: [SPARK-19704][ML] AFTSurvivalRegression should support n...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17034
  
**[Test build #73738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73738/testReport)** for PR 17034 at commit [`31293b2`](https://github.com/apache/spark/commit/31293b2dc9483b8bcf7639420a23fc4f2b219598).





[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17122
  
**[Test build #73737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73737/testReport)** for PR 17122 at commit [`6697928`](https://github.com/apache/spark/commit/6697928e4ff8cf93c4b63c9b6e4b18bec4a2f87a).





[GitHub] spark issue #16971: [SPARK-19573][SQL] Make NaN/null handling consistent in ...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16971
  
**[Test build #73739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73739/testReport)** for PR 16971 at commit [`2071aae`](https://github.com/apache/spark/commit/2071aaec4cb3805c2cebbf2732f274d881182f3d).





[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...

2017-03-01 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103865522
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
--- End diff --

ok~ let me fix it~





[GitHub] spark pull request #17124: [SPARK-19779][SS]Delete needless tmp file after r...

2017-03-01 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/17124#discussion_r103864707
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala ---
@@ -282,8 +282,12 @@ private[state] class HDFSBackedStateStoreProvider(
     // target file will break speculation, skipping the rename step is the only choice. It's still
     // semantically correct because Structured Streaming requires rerunning a batch should
     // generate the same output. (SPARK-19677)
+    // Also, a tmp file of delta file that generated by the first batch after restart
--- End diff --

This comment is not 100% correct; this may also happen in a speculation task.

This PR is just a follow-up to delete the temp file that #17012 forgot to delete. IMO, there is no need to add a comment for it.





[GitHub] spark pull request #17124: [SPARK-19779][SS]Delete needless tmp file after r...

2017-03-01 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/17124#discussion_r103865389
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala ---
@@ -295,6 +295,28 @@ class StateStoreSuite extends SparkFunSuite with BeforeAndAfter with PrivateMeth
     provider.getStore(0).commit()
   }
 
+  test("SPARK-19779: A tmp file of delta file should not be reserved on HDFS " +
--- End diff --

Instead of adding a new test, I prefer to just add several lines to the above `SPARK-19677: Committing a delta file atop an existing one should not fail on HDFS` test. E.g.

```
  test("SPARK-19677: Committing a delta file atop an existing one should not fail on HDFS") {
    val conf = new Configuration()
    conf.set("fs.fake.impl", classOf[RenameLikeHDFSFileSystem].getName)
    conf.set("fs.default.name", "fake:///")

    val provider = newStoreProvider(hadoopConf = conf)
    provider.getStore(0).commit()
    provider.getStore(0).commit()

    // Verify we don't leak temp files
    val tempFiles = FileUtils.listFiles(new File(provider.id.checkpointLocation), null, true)
      .asScala.filter(_.getName.contains("temp-"))
    assert(tempFiles.isEmpty)
  }
```





[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...

2017-03-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17125#discussion_r103865408
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -128,6 +129,15 @@ case class CreateViewCommand(
     qe.assertAnalyzed()
     val analyzedPlan = qe.analyzed
 
+    // CREATE VIEW AS INSERT INTO ... is not allowed, we should throw an AnalysisException.
+    analyzedPlan match {
+      case i: InsertIntoHadoopFsRelationCommand =>
--- End diff --

can we fix it at parser side? cc @hvanhovell 





[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...

2017-03-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103865312
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
--- End diff --

but it will only be used in `getDatabase`, and we can save a metastore call to get the default database.
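The idea being discussed, keeping the default database locally so a `getDatabase("default")` lookup can skip a metastore round trip, can be sketched without Spark. This is an illustrative toy only: `CatalogSketch`, `Catalog`, `Database`, and `metastoreCalls` are hypothetical names, not Spark's actual API.

```scala
// Toy sketch of memoizing the default database: only non-default
// lookups hit the (simulated) metastore.
object CatalogSketch {
  var metastoreCalls = 0

  case class Database(name: String)

  class Catalog {
    // Built once, locally, instead of asking the metastore each time.
    private lazy val defaultDb = Database("default")

    def getDatabase(name: String): Database =
      if (name == "default") defaultDb
      else { metastoreCalls += 1; Database(name) } // simulated metastore call
  }

  def demo(): Int = {
    val c = new Catalog
    c.getDatabase("default")
    c.getDatabase("default") // repeated default lookups cost nothing
    c.getDatabase("other")
    metastoreCalls
  }
}
```

With this shape, only the lookup of `"other"` pays the metastore cost; both default-database lookups are served from the cached instance.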





[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103865206
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
     throw new UnsupportedOperationException
   }
+
+  /*
+   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
+   */
+  // true: require to insert shouldStop() into a loop
+  protected var shouldStopRequired: Boolean = false
+
--- End diff --

ditto





[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103865202
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala ---
@@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): String = {
     throw new UnsupportedOperationException
   }
+
+  /*
+   * for optimization to suppress shouldStop() in a loop of WholeStageCodegen
+   */
+  // true: require to insert shouldStop() into a loop
--- End diff --

Updated comments around here





[GitHub] spark issue #17097: [SPARK-19765][SQL] UNCACHE TABLE should re-cache all cac...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17097
  
**[Test build #73736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73736/testReport)** for PR 17097 at commit [`e881f29`](https://github.com/apache/spark/commit/e881f29bf5839af2f2ed723ccdb77516c795ef90).





[GitHub] spark pull request #17075: [SPARK-19727][SQL] Fix for round function that mo...

2017-03-01 Thread wojtek-szymanski
Github user wojtek-szymanski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17075#discussion_r103864843
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala ---
@@ -193,7 +193,7 @@ class DecimalSuite extends SparkFunSuite with PrivateMethodTester {
     assert(Decimal(Long.MaxValue, 100, 0).toUnscaledLong === Long.MaxValue)
   }
 
-  test("changePrecision() on compact decimal should respect rounding mode") {
+  test("changePrecision/toPrecission on compact decimal should respect rounding mode") {
--- End diff --

Thanks, fixed





[GitHub] spark pull request #17075: [SPARK-19727][SQL] Fix for round function that mo...

2017-03-01 Thread wojtek-szymanski
Github user wojtek-szymanski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17075#discussion_r103864772
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -362,17 +374,13 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   def abs: Decimal = if (this.compare(Decimal.ZERO) < 0) this.unary_- else this
 
   def floor: Decimal = if (scale == 0) this else {
-    val value = this.clone()
-    value.changePrecision(
-      DecimalType.bounded(precision - scale + 1, 0).precision, 0, ROUND_FLOOR)
-    value
+    toPrecision(DecimalType.bounded(precision - scale + 1, 0).precision, 0, ROUND_FLOOR)
+      .getOrElse(clone())
   }
 
   def ceil: Decimal = if (scale == 0) this else {
-    val value = this.clone()
-    value.changePrecision(
-      DecimalType.bounded(precision - scale + 1, 0).precision, 0, ROUND_CEILING)
-    value
+    toPrecision(DecimalType.bounded(precision - scale + 1, 0).precision, 0, ROUND_CEILING)
+      .getOrElse(clone())
--- End diff --

See my comment above





[GitHub] spark pull request #17075: [SPARK-19727][SQL] Fix for round function that mo...

2017-03-01 Thread wojtek-szymanski
Github user wojtek-szymanski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17075#discussion_r103864736
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -362,17 +374,13 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   def abs: Decimal = if (this.compare(Decimal.ZERO) < 0) this.unary_- else this
 
   def floor: Decimal = if (scale == 0) this else {
-    val value = this.clone()
-    value.changePrecision(
-      DecimalType.bounded(precision - scale + 1, 0).precision, 0, ROUND_FLOOR)
-    value
+    toPrecision(DecimalType.bounded(precision - scale + 1, 0).precision, 0, ROUND_FLOOR)
+      .getOrElse(clone())
--- End diff --

You're right, thanks. My suggestion is to raise an internal error if 
setting new precision in `floor` or `ceil` would fail.





[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...

2017-03-01 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/16910#discussion_r103864544
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -1588,6 +1590,153 @@ class HiveDDLSuite
     }
   }
 
+  test("insert data to a hive serde table which has a non-existing location should succeed") {
+    withTable("t") {
+      withTempDir { dir =>
+        spark.sql(
+          s"""
+             |CREATE TABLE t(a string, b int)
+             |USING hive
+             |LOCATION '$dir'
--- End diff --

ok~





[GitHub] spark issue #17132: [SPARK-19792][webui]In the Master Page,the column named ...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17132
  
**[Test build #3591 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3591/testReport)** for PR 17132 at commit [`6794b6b`](https://github.com/apache/spark/commit/6794b6bfc1def36c70471c75f6c2f9b188b23add).





[GitHub] spark pull request #17075: [SPARK-19727][SQL] Fix for round function that mo...

2017-03-01 Thread wojtek-szymanski
Github user wojtek-szymanski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17075#discussion_r103864520
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala ---
@@ -233,6 +233,18 @@ class MathFunctionsSuite extends QueryTest with SharedSQLContext {
     )
   }
 
+  test("round/bround with data frame from a local Seq of Product") {
+    val df = spark.createDataFrame(Seq(NumericRow(BigDecimal("5.9"
--- End diff --

Actually, the problem occurs only when creating a data frame from `Product`. I am unable to reproduce the issue with `Seq(BigDecimal("5.9")).toDF("value")`.





[GitHub] spark issue #17076: [SPARK-19745][ML] SVCAggregator captures coefficients in...

2017-03-01 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17076
  
@yanboliang yeah I agree we can do it in this PR.





[GitHub] spark issue #17076: [SPARK-19745][ML] SVCAggregator captures coefficients in...

2017-03-01 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/17076
  
+1 @MLnick. It's only a three-line change; shall we update it here?





[GitHub] spark pull request #17075: [SPARK-19727][SQL] Fix for round function that mo...

2017-03-01 Thread wojtek-szymanski
Github user wojtek-szymanski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17075#discussion_r103864255
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -362,17 +374,13 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   def abs: Decimal = if (this.compare(Decimal.ZERO) < 0) this.unary_- else this
 
   def floor: Decimal = if (scale == 0) this else {
-    val value = this.clone()
-    value.changePrecision(
-      DecimalType.bounded(precision - scale + 1, 0).precision, 0, ROUND_FLOOR)
-    value
+    toPrecision(DecimalType.bounded(precision - scale + 1, 0).precision, 0, ROUND_FLOOR)
--- End diff --

Theoretically, it should be `Some`. On the other hand, if something goes wrong when setting the new precision in `floor` or `ceil`, I would raise an internal error:

    def floor: Decimal = if (scale == 0) this else {
      val newPrecision = DecimalType.bounded(precision - scale + 1, 0).precision
      toPrecision(newPrecision, 0, ROUND_FLOOR).getOrElse(
        throw new AnalysisException(s"Overflow when setting precision to $newPrecision"))
    }





[GitHub] spark pull request #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for ...

2017-03-01 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17081#discussion_r103864206
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -364,7 +364,12 @@ case class DataSource(
         catalogTable.get,
         catalogTable.get.stats.map(_.sizeInBytes.toLong).getOrElse(defaultTableSize))
     } else {
-      new InMemoryFileIndex(sparkSession, globbedPaths, options, Some(partitionSchema))
--- End diff --

ok, I think it is more reasonable~ thanks~





[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...

2017-03-01 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/17034#discussion_r103864147
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala ---
@@ -27,6 +27,9 @@ import org.apache.spark.ml.util.TestingUtils._
 import org.apache.spark.mllib.random.{ExponentialGenerator, WeibullGenerator}
 import org.apache.spark.mllib.util.MLlibTestSparkContext
 import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.functions.{col, lit}
+import org.apache.spark.sql.types.{ByteType, DecimalType, FloatType, IntegerType, LongType,
+  ShortType}
--- End diff --

The style rule is generally to use `_` when you're importing >= 5 things. 
You can revert it back, thanks!
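The rule can be illustrated with the standard library, so the snippet compiles without Spark; the package and object names below are illustrative only.

```scala
// Few names from a package: an explicit import list is fine.
import scala.collection.immutable.{HashMap, HashSet}
// Five or more names from the same package: prefer the wildcard form,
// analogous to `import org.apache.spark.sql.types._`.
import scala.collection.immutable._

object ImportStyleDemo {
  // Uses both imported names to show the imports resolve as expected.
  def demo(): Boolean =
    HashMap("a" -> 1).contains("a") && HashSet(1).contains(1)
}
```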





[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...

2017-03-01 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103863975
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
@@ -30,7 +33,7 @@ import org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) {
--- End diff --

If we pass a defaultDB, it seems we would introduce an instance of defaultDB, as we discussed above.





[GitHub] spark issue #17076: [SPARK-19745][ML] SVCAggregator captures coefficients in...

2017-03-01 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/17076
  
@imatiach-msft `LinearRegression`, `LogisticRegression` and `AFTSurvivalRegression` do not have the `lazy`; they only do `private val gradientSumArray ...`, so they would need to be updated.
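A minimal, Spark-free sketch of why the `lazy` matters here (the `LazyBufferDemo` and `Aggregator` names are illustrative, not the real aggregator classes): a `lazy val` buffer is not allocated at construction time, only on first access, and its initializer runs exactly once.

```scala
// Illustrative only: demonstrates lazy-val allocation semantics.
object LazyBufferDemo {
  var initCount = 0

  class Aggregator {
    // With `lazy`, the buffer is not allocated when the object is
    // constructed, so e.g. shipping a fresh aggregator to executors
    // does not pay for an uninitialized buffer.
    lazy val gradientSumArray: Array[Double] = {
      initCount += 1
      new Array[Double](4)
    }
  }

  def demo(): Int = {
    val agg = new Aggregator
    require(initCount == 0)      // constructing alone allocates nothing
    agg.gradientSumArray(0) = 1.0
    agg.gradientSumArray(1) = 2.0
    initCount                    // the initializer ran exactly once
  }
}
```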





[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...

2017-03-01 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17125#discussion_r103863856
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -128,6 +129,15 @@ case class CreateViewCommand(
     qe.assertAnalyzed()
     val analyzedPlan = qe.analyzed
 
+    // CREATE VIEW AS INSERT INTO ... is not allowed, we should throw an AnalysisException.
+    analyzedPlan match {
+      case i: InsertIntoHadoopFsRelationCommand =>
--- End diff --

```
queryNoWith
    : insertInto? queryTerm queryOrganization    #singleInsertQuery
    | fromClause multiInsertQueryBody+           #multiInsertQuery
    ;
```
Seems we have mixed them together.





[GitHub] spark issue #17034: [SPARK-19704][ML] AFTSurvivalRegression should support n...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17034
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73731/
Test PASSed.





[GitHub] spark issue #17034: [SPARK-19704][ML] AFTSurvivalRegression should support n...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17034
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #17075: [SPARK-19727][SQL] Fix for round function that mo...

2017-03-01 Thread wojtek-szymanski
Github user wojtek-szymanski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17075#discussion_r103863771
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -223,12 +223,24 @@ final class Decimal extends Ordered[Decimal] with 
Serializable {
   }
 
   /**
+   * Create new `Decimal` with given precision and scale.
+   *
+   * @return `Some(decimal)` if successful or `None` if overflow would 
occur
+   */
+  private[sql] def toPrecision(precision: Int, scale: Int,
--- End diff --

Fixed, thanks
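The `Option`-based contract under discussion can be sketched with plain `java.math.BigDecimal` instead of Spark's internal `Decimal` (the method body is illustrative, not Spark's implementation): return `Some(rounded value)` when it fits the target precision, `None` when it would overflow.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Minimal sketch of an overflow-safe toPrecision.
def toPrecision(value: JBigDecimal, precision: Int, scale: Int): Option[JBigDecimal] = {
  val rounded = value.setScale(scale, RoundingMode.HALF_UP)
  if (rounded.precision() <= precision) Some(rounded) else None
}

toPrecision(new JBigDecimal("3.14159"), 3, 2)  // Some(3.14)
toPrecision(new JBigDecimal("314.159"), 3, 2)  // None: 314.16 needs 5 digits
```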





[GitHub] spark issue #17034: [SPARK-19704][ML] AFTSurvivalRegression should support n...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17034
  
**[Test build #73731 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73731/testReport)**
 for PR 17034 at commit 
[`0185b45`](https://github.com/apache/spark/commit/0185b454aaa043406c39d1e8f19c98d3d345a836).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17075: [SPARK-19727][SQL] Fix for round function that mo...

2017-03-01 Thread wojtek-szymanski
Github user wojtek-szymanski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17075#discussion_r103863738
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -339,36 +339,34 @@ case class Cast(child: Expression, dataType: 
DataType, timeZoneId: Option[String
   }
 
   /**
-   * Change the precision / scale in a given decimal to those set in 
`decimalType` (if any),
-   * returning null if it overflows or modifying `value` in-place and 
returning it if successful.
+   * Create new `Decimal` with precision and scale given in `decimalType` 
(if any),
+   * returning null if it overflows or creating a new `value` and 
returning it if successful.
*
-   * NOTE: this modifies `value` in-place, so don't call it on external 
data.
*/
-  private[this] def changePrecision(value: Decimal, decimalType: 
DecimalType): Decimal = {
-if (value.changePrecision(decimalType.precision, decimalType.scale)) 
value else null
-  }
+  private[this] def toPrecision(value: Decimal, decimalType: DecimalType): 
Decimal =
+value.toPrecision(decimalType.precision, decimalType.scale).orNull
 
   private[this] def castToDecimal(from: DataType, target: DecimalType): 
Any => Any = from match {
 case StringType =>
   buildCast[UTF8String](_, s => try {
-changePrecision(Decimal(new JavaBigDecimal(s.toString)), target)
+toPrecision(Decimal(new JavaBigDecimal(s.toString)), target)
   } catch {
 case _: NumberFormatException => null
   })
 case BooleanType =>
-  buildCast[Boolean](_, b => changePrecision(if (b) Decimal.ONE else 
Decimal.ZERO, target))
+  buildCast[Boolean](_, b => toPrecision(if (b) Decimal.ONE else 
Decimal.ZERO, target))
 case DateType =>
   buildCast[Int](_, d => null) // date can't cast to decimal in Hive
 case TimestampType =>
   // Note that we lose precision here.
-  buildCast[Long](_, t => 
changePrecision(Decimal(timestampToDouble(t)), target))
+  buildCast[Long](_, t => toPrecision(Decimal(timestampToDouble(t)), 
target))
 case dt: DecimalType =>
-  b => changePrecision(b.asInstanceOf[Decimal].clone(), target)
--- End diff --

Nope, there is one more here:

case BooleanType =>
  buildCast[Boolean](_, b => toPrecision(if (b) Decimal.ONE else 
Decimal.ZERO, target))

Both `ONE` and `ZERO` are singletons, so changing precision on them in 
place is not a good idea.
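The hazard behind this comment can be shown with an illustrative mock (not Spark's `Decimal`): mutating a shared singleton in place leaks the change to every other user of that singleton.

```scala
// Hypothetical mutable decimal, used only to demonstrate the aliasing bug.
final class MutDec(var unscaled: Long, var scale: Int) {
  def changePrecisionInPlace(newScale: Int): this.type = {
    var s = scale
    while (s < newScale) { unscaled *= 10; s += 1 }
    scale = newScale
    this
  }
  override def toString: String = s"${unscaled}e-${scale}"
}
object MutDec { val ONE = new MutDec(1L, 0) }

MutDec.ONE.changePrecisionInPlace(2)  // "fixes" one cast in place...
println(MutDec.ONE)                   // ...but ONE is now 100e-2 for everyone
```

This is why the copying `toPrecision` is preferable to an in-place `changePrecision` when the input may be `Decimal.ONE` or `Decimal.ZERO`.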





[GitHub] spark pull request #17095: [SPARK-19763][SQL]qualified external datasource t...

2017-03-01 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17095#discussion_r103863691
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -254,7 +254,18 @@ class SessionCatalog(
 val db = 
formatDatabaseName(tableDefinition.identifier.database.getOrElse(getCurrentDatabase))
 val table = formatTableName(tableDefinition.identifier.table)
 validateName(table)
-val newTableDefinition = tableDefinition.copy(identifier = 
TableIdentifier(table, Some(db)))
+
+val newTableDefinition = if 
(tableDefinition.storage.locationUri.isDefined) {
--- End diff --

Yes, this logic should be applied to all of them. The database location 
already has this logic; shall I add the partition-location logic in 
another PR?





[GitHub] spark issue #17132: [SPARK-19792][webui]In the Master Page,the column named ...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17132
  
Can one of the admins verify this patch?





[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103863428
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): 
String = {
 throw new UnsupportedOperationException
   }
+
+  /*
+   * for optimization to suppress shouldStop() in a loop of 
WholeStageCodegen
+   */
+  // true: require to insert shouldStop() into a loop
+  protected var shouldStopRequired: Boolean = false
+
--- End diff --

Please add a simple comment.
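To make the flag's intent concrete, here is a hedged sketch of what it controls (the emitted Java below is illustrative, not Spark's exact generated code): when the consumer can never stop early, the per-iteration `shouldStop()` check can be omitted, leaving a simple counted loop that is friendlier to JIT loop optimizations.

```scala
// Sketch: toggling the per-iteration shouldStop() check in generated code.
def emitRangeLoop(shouldStopRequired: Boolean): String = {
  val check = if (shouldStopRequired) "\n  if (shouldStop()) return;" else ""
  s"""for (long idx = localIdx; idx < localEnd; idx++) {
     |  consume(idx);$check
     |}""".stripMargin
}

println(emitRangeLoop(shouldStopRequired = false))
```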





[GitHub] spark pull request #17132: [SPARK-19792][webui]In the Master Page,the column...

2017-03-01 Thread 10110346
GitHub user 10110346 opened a pull request:

https://github.com/apache/spark/pull/17132

[SPARK-19792][webui] In the Master Page, the column named “Memory per 
Node” is misleading

Signed-off-by: liuxian 

## What changes were proposed in this pull request?

On the Spark web UI, the Master Page has two tables: the Running 
Applications table and the Completed Applications table. The column 
named “Memory per Node” is misleading, because a node may have more 
than one executor. It should be renamed to “Memory per Executor” to 
avoid confusing users.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/10110346/spark wid-lx-0302

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17132.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17132









[GitHub] spark pull request #17076: [SPARK-19745][ML] SVCAggregator captures coeffici...

2017-03-01 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/17076#discussion_r103863345
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -463,6 +458,8 @@ private class LinearSVCAggregator(
*/
   def add(instance: Instance): this.type = {
 instance match { case Instance(label, weight, features) =>
+  require(numFeatures == features.size, s"Dimensions mismatch when 
adding new instance." +
+s" Expecting $numFeatures but got ${features.size}.")
   if (weight == 0.0) return this
--- End diff --

Yes good catch - LoR and LinR both have this check.
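A self-contained sketch of the guard being discussed (the `Instance` shape mirrors the diff; everything else is illustrative): the dimension check runs before the `weight == 0.0` early return, so even zero-weight instances with wrong dimensions fail fast with a clear message.

```scala
// Minimal aggregator sketch with the fail-fast dimension guard.
case class Instance(label: Double, weight: Double, features: Array[Double])

class Agg(numFeatures: Int) {
  private val sums = Array.ofDim[Double](numFeatures)

  def add(instance: Instance): this.type = {
    require(numFeatures == instance.features.length,
      s"Dimensions mismatch when adding new instance. " +
      s"Expecting $numFeatures but got ${instance.features.length}.")
    if (instance.weight == 0.0) return this  // validated even when skipped
    var i = 0
    while (i < numFeatures) { sums(i) += instance.weight * instance.features(i); i += 1 }
    this
  }

  def result: Array[Double] = sums.clone()
}
```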





[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103863351
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): 
String = {
 throw new UnsupportedOperationException
   }
+
+  /*
+   * for optimization to suppress shouldStop() in a loop of 
WholeStageCodegen
+   */
+  // true: require to insert shouldStop() into a loop
--- End diff --

Btw, the usual style is:

 /**
  * 
  * 
  */





[GitHub] spark pull request #17075: [SPARK-19727][SQL] Fix for round function that mo...

2017-03-01 Thread wojtek-szymanski
Github user wojtek-szymanski commented on a diff in the pull request:

https://github.com/apache/spark/pull/17075#discussion_r103863388
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -339,36 +339,34 @@ case class Cast(child: Expression, dataType: 
DataType, timeZoneId: Option[String
   }
 
   /**
-   * Change the precision / scale in a given decimal to those set in 
`decimalType` (if any),
-   * returning null if it overflows or modifying `value` in-place and 
returning it if successful.
+   * Create new `Decimal` with precision and scale given in `decimalType` 
(if any),
+   * returning null if it overflows or creating a new `value` and 
returning it if successful.
*
-   * NOTE: this modifies `value` in-place, so don't call it on external 
data.
*/
-  private[this] def changePrecision(value: Decimal, decimalType: 
DecimalType): Decimal = {
-if (value.changePrecision(decimalType.precision, decimalType.scale)) 
value else null
-  }
+  private[this] def toPrecision(value: Decimal, decimalType: DecimalType): 
Decimal =
+value.toPrecision(decimalType.precision, decimalType.scale).orNull
 
   private[this] def castToDecimal(from: DataType, target: DecimalType): 
Any => Any = from match {
 case StringType =>
   buildCast[UTF8String](_, s => try {
-changePrecision(Decimal(new JavaBigDecimal(s.toString)), target)
+toPrecision(Decimal(new JavaBigDecimal(s.toString)), target)
--- End diff --

agree





[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...

2017-03-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17125#discussion_r103863369
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -128,6 +129,15 @@ case class CreateViewCommand(
 qe.assertAnalyzed()
 val analyzedPlan = qe.analyzed
 
+// CREATE VIEW AS INSERT INTO ... is not allowed, we should throw an 
AnalysisException.
+analyzedPlan match {
+  case i: InsertIntoHadoopFsRelationCommand =>
--- End diff --

Hmm, why is `INSERT INTO ...` a query?





[GitHub] spark pull request #17095: [SPARK-19763][SQL]qualified external datasource t...

2017-03-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17095#discussion_r103863244
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -254,7 +254,18 @@ class SessionCatalog(
 val db = 
formatDatabaseName(tableDefinition.identifier.database.getOrElse(getCurrentDatabase))
 val table = formatTableName(tableDefinition.identifier.table)
 validateName(table)
-val newTableDefinition = tableDefinition.copy(identifier = 
TableIdentifier(table, Some(db)))
+
+val newTableDefinition = if 
(tableDefinition.storage.locationUri.isDefined) {
--- End diff --

shall we apply it to all locations like database location, partition 
location?





[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...

2017-03-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103863070
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
 ---
@@ -30,7 +33,7 @@ import 
org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases 
don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) 
{
--- End diff --

we still have conf/hadoopConf in `InMemoryCatalog` and 
`HiveExternalCatalog`, we can just add one more parameter.





[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...

2017-03-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16910#discussion_r103862924
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -1588,6 +1590,153 @@ class HiveDDLSuite
 }
   }
 
+  test("insert data to a hive serde table which has a non-existing 
location should succeed") {
+withTable("t") {
+  withTempDir { dir =>
+spark.sql(
+  s"""
+ |CREATE TABLE t(a string, b int)
+ |USING hive
+ |LOCATION '$dir'
--- End diff --

can we just call `dir.delete` before creating this table?





[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...

2017-03-01 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103862862
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
 ---
@@ -30,7 +33,7 @@ import 
org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases 
don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) 
{
--- End diff --

I think keeping conf/hadoopConf here is more useful, since later logic 
can use them, and the subclasses also have these two confs.





[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103862946
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): 
String = {
 throw new UnsupportedOperationException
   }
+
+  /*
+   * for optimization to suppress shouldStop() in a loop of 
WholeStageCodegen
+   */
+  // true: require to insert shouldStop() into a loop
--- End diff --

Your comment style looks weird. Please put `true...` in the /*... */





[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103862749
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): 
String = {
 throw new UnsupportedOperationException
   }
+
+  /*
+   * for optimization to suppress shouldStop() in a loop of 
WholeStageCodegen
+   */
+  // true: require to insert shouldStop() into a loop
--- End diff --

?





[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16910
  
**[Test build #73735 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73735/testReport)**
 for PR 16910 at commit 
[`a4f771a`](https://github.com/apache/spark/commit/a4f771a60f0c716e1811acab5fffead1929d8e80).





[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17122
  
**[Test build #73734 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73734/testReport)**
 for PR 17122 at commit 
[`9528ccc`](https://github.com/apache/spark/commit/9528ccc2d63d8c657d74f455dd2589d8e883d51c).





[GitHub] spark pull request #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for ...

2017-03-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17081#discussion_r103862631
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -364,7 +364,12 @@ case class DataSource(
 catalogTable.get,
 
catalogTable.get.stats.map(_.sizeInBytes.toLong).getOrElse(defaultTableSize))
 } else {
-  new InMemoryFileIndex(sparkSession, globbedPaths, options, 
Some(partitionSchema))
--- End diff --

I'd like to create the file status cache as a local variable, pass it to 
`getOrInferFileFormatSchema`, then use it here. It's much easier to 
reason about the lifetime of this cache this way.





[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103862423
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -206,6 +206,16 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): 
String = {
 throw new UnsupportedOperationException
   }
+
+  /*
+   * for optimization to suppress shouldStop() in a loop of 
WholeStageCodegen
+   */
+  // true: require to insert shouldStop() into a loop
--- End diff --

??





[GitHub] spark pull request #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for ...

2017-03-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17081#discussion_r103862351
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -86,7 +86,7 @@ case class DataSource(
   lazy val providingClass: Class[_] = 
DataSource.lookupDataSource(className)
   lazy val sourceInfo: SourceInfo = sourceSchema()
   private val caseInsensitiveOptions = CaseInsensitiveMap(options)
-
+  private lazy val fileStatusCache = 
FileStatusCache.getOrCreate(sparkSession)
--- End diff --

what's the life time of this cache?





[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103862311
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
 ---
@@ -434,6 +434,17 @@ case class RangeExec(range: 
org.apache.spark.sql.catalyst.plans.logical.Range)
 val input = ctx.freshName("input")
 // Right now, Range is only used when there is one upstream.
 ctx.addMutableState("scala.collection.Iterator", input, s"$input = 
inputs[0];")
+
+val localIdx = ctx.freshName("localIdx")
+val localEnd = ctx.freshName("localEnd")
+val range = ctx.freshName("range")
+// we need to place consume() before calling isShouldStopRequired
--- End diff --

Thank you, done.





[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103862282
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -206,6 +207,13 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): 
String = {
 throw new UnsupportedOperationException
   }
+
+  /* for optimization */
+  var shouldStopRequired: Boolean = false
--- End diff --

Sure, done


[GitHub] spark pull request #17095: [SPARK-19763][SQL]qualified external datasource t...

2017-03-01 Thread windpiger
Github user windpiger commented on a diff in the pull request:

https://github.com/apache/spark/pull/17095#discussion_r103862289
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -254,7 +254,18 @@ class SessionCatalog(
 val db = 
formatDatabaseName(tableDefinition.identifier.database.getOrElse(getCurrentDatabase))
 val table = formatTableName(tableDefinition.identifier.table)
 validateName(table)
-val newTableDefinition = tableDefinition.copy(identifier = 
TableIdentifier(table, Some(db)))
+
+val newTableDefinition = if 
(tableDefinition.storage.locationUri.isDefined) {
--- End diff --

If the location has no scheme (such as hdfs or file), then when we restore it 
from the metastore we cannot tell which filesystem the table is stored on. 
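To illustrate the point with a sketch (plain java.net.URI, not Spark or Hadoop code; the `qualify` helper is hypothetical but mimics what Hadoop's `Path.makeQualified` does): a scheme-less location is ambiguous, while a qualified URI pins the filesystem.

```java
import java.net.URI;

public class QualifiedPathSketch {
    // Hypothetical helper: qualify a location against the default
    // filesystem URI, leaving already-qualified locations untouched.
    static URI qualify(URI defaultFs, URI location) {
        if (location.getScheme() != null) {
            return location;                 // already qualified, keep as-is
        }
        return defaultFs.resolve(location);  // inherit scheme and authority
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://nn:8020/");
        // "/user/hive/t1" alone does not say whether it is on hdfs or file.
        URI qualified = qualify(defaultFs, URI.create("/user/hive/t1"));
        System.out.println(qualified); // hdfs://nn:8020/user/hive/t1
    }
}
```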


[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103862257
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -77,6 +77,7 @@ trait CodegenSupport extends SparkPlan {
*/
   final def produce(ctx: CodegenContext, parent: CodegenSupport): String = 
executeQuery {
 this.parent = parent
+
--- End diff --

good catch. done.


[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103862272
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -206,6 +207,13 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): 
String = {
 throw new UnsupportedOperationException
   }
+
+  /* for optimization */
--- End diff --

I see. done.


[GitHub] spark pull request #17095: [SPARK-19763][SQL]qualified external datasource t...

2017-03-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17095#discussion_r103862062
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -254,7 +254,18 @@ class SessionCatalog(
 val db = 
formatDatabaseName(tableDefinition.identifier.database.getOrElse(getCurrentDatabase))
 val table = formatTableName(tableDefinition.identifier.table)
 validateName(table)
-val newTableDefinition = tableDefinition.copy(identifier = 
TableIdentifier(table, Some(db)))
+
+val newTableDefinition = if 
(tableDefinition.storage.locationUri.isDefined) {
--- End diff --

But why do we have to store the fully qualified path? What do we gain from 
this? 


[GitHub] spark pull request #17127: [SPARK-19734][PYTHON][ML] Correct OneHotEncoder d...

2017-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17127


[GitHub] spark pull request #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for ...

2017-03-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17081#discussion_r103861992
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -122,7 +122,7 @@ case class DataSource(
 val qualified = hdfsPath.makeQualified(fs.getUri, 
fs.getWorkingDirectory)
 SparkHadoopUtil.get.globPathIfNecessary(qualified)
   }.toArray
-  new InMemoryFileIndex(sparkSession, globbedPaths, options, None)
+  new InMemoryFileIndex(sparkSession, globbedPaths, options, None, 
fileStatusCache)
--- End diff --

This also impacts the streaming code path. If it is fine for streaming, the 
code changes look good to me.


[GitHub] spark issue #17127: [SPARK-19734][PYTHON][ML] Correct OneHotEncoder doc stri...

2017-03-01 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/17127
  
Merged into master, thanks.


[GitHub] spark pull request #17104: [MINOR][ML] Fix comments in LSH Examples and Pyth...

2017-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17104


[GitHub] spark issue #17104: [MINOR][ML] Fix comments in LSH Examples and Python API

2017-03-01 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/17104
  
LGTM, merged into master. Thanks.


[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...

2017-03-01 Thread windpiger
Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/16910
  
OK, doing it now ~ it was still fine yesterday...


[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...

2017-03-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103861521
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
 ---
@@ -30,7 +33,7 @@ import 
org.apache.spark.sql.catalyst.expressions.Expression
  *
  * Implementations should throw [[NoSuchDatabaseException]] when databases 
don't exist.
  */
-abstract class ExternalCatalog {
+abstract class ExternalCatalog(conf: SparkConf, hadoopConf: Configuration) 
{
--- End diff --

How about we just pass in a `defaultDB: CatalogDatabase`? Then we don't 
need to add the `protected def warehousePath: String`.


[GitHub] spark pull request #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for ...

2017-03-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17081#discussion_r103861424
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -364,7 +364,12 @@ case class DataSource(
 catalogTable.get,
 
catalogTable.get.stats.map(_.sizeInBytes.toLong).getOrElse(defaultTableSize))
 } else {
-  new InMemoryFileIndex(sparkSession, globbedPaths, options, 
Some(partitionSchema))
+  new InMemoryFileIndex(
+sparkSession,
+globbedPaths,
+options,
+Some(partitionSchema),
+fileStatusCache)
--- End diff --

```Scala
  new InMemoryFileIndex(
sparkSession, globbedPaths, options, Some(partitionSchema), 
fileStatusCache)
```

This is also valid


[GitHub] spark pull request #17001: [SPARK-19667][SQL]create table with hiveenabled i...

2017-03-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17001#discussion_r103861355
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala
 ---
@@ -74,7 +77,19 @@ abstract class ExternalCatalog {
*/
   def alterDatabase(dbDefinition: CatalogDatabase): Unit
 
-  def getDatabase(db: String): CatalogDatabase
+  def getDatabase(db: String): CatalogDatabase = {
+val database = getDatabaseInternal(db)
+// The default database's location always uses the warehouse path.
+// Since the location of database stored in metastore is qualified,
+// we also make the warehouse location qualified.
+if (db == SessionCatalog.DEFAULT_DATABASE) {
--- End diff --

makes sense


[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103861360
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
 ---
@@ -434,6 +434,17 @@ case class RangeExec(range: 
org.apache.spark.sql.catalyst.plans.logical.Range)
 val input = ctx.freshName("input")
 // Right now, Range is only used when there is one upstream.
 ctx.addMutableState("scala.collection.Iterator", input, s"$input = 
inputs[0];")
+
+val localIdx = ctx.freshName("localIdx")
+val localEnd = ctx.freshName("localEnd")
+val range = ctx.freshName("range")
+// we need to place consume() before calling isShouldStopRequired
--- End diff --

It would be better to describe the reason why consume() may modify 
`shouldStopRequired`.


[GitHub] spark pull request #16938: [SPARK-19583][SQL]CTAS for data source table with...

2017-03-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16938


[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17081
  
**[Test build #73733 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73733/testReport)**
 for PR 17081 at commit 
[`9a73947`](https://github.com/apache/spark/commit/9a73947efea334ba0cfc5b5508003807a93ff806).


[GitHub] spark issue #16910: [SPARK-19575][SQL]Reading from or writing to a hive serd...

2017-03-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16910
  
can you resolve the conflict?


[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103861241
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -206,6 +207,13 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): 
String = {
 throw new UnsupportedOperationException
   }
+
+  /* for optimization */
+  var shouldStopRequired: Boolean = false
--- End diff --

Please add `protected`.


[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17081
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73726/
Test PASSed.


[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17081
  
Merged build finished. Test PASSed.


[GitHub] spark issue #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for FileFor...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17081
  
**[Test build #73726 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73726/testReport)**
 for PR 17081 at commit 
[`60fa037`](https://github.com/apache/spark/commit/60fa03757d223f833e2fa161326a48a9015d4c6c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #16938: [SPARK-19583][SQL]CTAS for data source table with a crea...

2017-03-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16938
  
thanks, merging to master!


[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103860895
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -77,6 +77,7 @@ trait CodegenSupport extends SparkPlan {
*/
   final def produce(ctx: CodegenContext, parent: CodegenSupport): String = 
executeQuery {
 this.parent = parent
+
--- End diff --

Extra blank line.


[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103860938
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -206,6 +207,13 @@ trait CodegenSupport extends SparkPlan {
   def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: ExprCode): 
String = {
 throw new UnsupportedOperationException
   }
+
+  /* for optimization */
--- End diff --

This deserves a better comment.


[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...

2017-03-01 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17125#discussion_r103860516
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -128,6 +129,15 @@ case class CreateViewCommand(
 qe.assertAnalyzed()
 val analyzedPlan = qe.analyzed
 
+// CREATE VIEW AS INSERT INTO ... is not allowed, we should throw an 
AnalysisException.
+analyzedPlan match {
+  case i: InsertIntoHadoopFsRelationCommand =>
--- End diff --

The SQL parser only allows `CREATE VIEW AS query` here, and a query can only 
be a `SELECT ...`, an `INSERT INTO ...`, or a CTE, so perhaps we don't have to 
consider other commands here.
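A sketch of the guard being discussed (hypothetical Java types; the real check pattern-matches on InsertIntoHadoopFsRelationCommand inside CreateViewCommand and throws AnalysisException):

```java
public class ViewGuardSketch {
    interface LogicalPlan {}
    static class Select implements LogicalPlan {}      // ordinary query plan
    static class InsertInto implements LogicalPlan {}  // stand-in command plan

    // Reject plans that are commands rather than queries before the
    // view definition is stored.
    static void checkViewQuery(LogicalPlan plan, String viewName) {
        if (plan instanceof InsertInto) {
            throw new IllegalArgumentException(
                "Creating view " + viewName + " as insert into a table is not allowed");
        }
    }

    public static void main(String[] args) {
        checkViewQuery(new Select(), "v1");  // a plain SELECT is fine
        try {
            checkViewQuery(new InsertInto(), "v2");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```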


[GitHub] spark issue #17076: [SPARK-19745][ML] SVCAggregator captures coefficients in...

2017-03-01 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/17076
  
@sethah Thanks for the good catch. I verified this optimization and found 
that it indeed reduces the size of the shuffle data. This looks good to me. BTW, 
as @MLnick suggested, could you add the lazy evaluation of the gradient array to 
all other aggregators in this PR? Since it's a small change, I'd prefer to 
make it here. Thanks.
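The lazy evaluation being suggested can be sketched as follows (hypothetical Java, not the actual SVCAggregator): the gradient buffer is allocated only on the first contribution, so freshly created aggregator copies serialized for the shuffle carry no large zero-filled array.

```java
public class LazyGradientSketch {
    private final int dim;
    private double[] gradient;  // null until first add(); keeps empty copies small

    LazyGradientSketch(int dim) { this.dim = dim; }

    // Accumulate one weighted example into the gradient.
    void add(double[] features, double weight) {
        if (gradient == null) {
            gradient = new double[dim];  // lazy allocation on first use
        }
        for (int i = 0; i < dim; i++) {
            gradient[i] += weight * features[i];
        }
    }

    // Empty aggregators still report a zero gradient of the right size.
    double[] result() {
        return gradient == null ? new double[dim] : gradient;
    }
}
```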


[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...

2017-03-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17125#discussion_r103860095
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -128,6 +129,15 @@ case class CreateViewCommand(
 qe.assertAnalyzed()
 val analyzedPlan = qe.analyzed
 
+// CREATE VIEW AS INSERT INTO ... is not allowed, we should throw an 
AnalysisException.
+analyzedPlan match {
+  case i: InsertIntoHadoopFsRelationCommand =>
--- End diff --

Shall we forbid all commands? E.g. `CREATE VIEW xxx AS CREATE TABLE ...` 
should also be disallowed, right?


[GitHub] spark pull request #17081: [SPARK-18726][SQL][FOLLOW-UP]resolveRelation for ...

2017-03-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17081#discussion_r103859650
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
 ---
@@ -364,7 +364,8 @@ case class DataSource(
 catalogTable.get,
 
catalogTable.get.stats.map(_.sizeInBytes.toLong).getOrElse(defaultTableSize))
 } else {
-  new InMemoryFileIndex(sparkSession, globbedPaths, options, 
Some(partitionSchema))
+  new InMemoryFileIndex(sparkSession, globbedPaths, options, 
Some(partitionSchema),
+fileStatusCache)
--- End diff --

Nit: indent issue


[GitHub] spark issue #17131: [SPARK-19766][SQL][BRANCH-2.0] Constant alias columns in...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17131
  
**[Test build #73732 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73732/consoleFull)**
 for PR 17131 at commit 
[`4975ac7`](https://github.com/apache/spark/commit/4975ac7f3a6a714c80e5f875ab54dd60f4aa22a5).


[GitHub] spark issue #17119: [SPARK-19784][SQL][WIP]refresh table after alter the loc...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17119
  
Merged build finished. Test FAILed.


[GitHub] spark issue #17119: [SPARK-19784][SQL][WIP]refresh table after alter the loc...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17119
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73727/
Test FAILed.


[GitHub] spark issue #17119: [SPARK-19784][SQL][WIP]refresh table after alter the loc...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17119
  
**[Test build #73727 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73727/testReport)**
 for PR 17119 at commit 
[`be98a0f`](https://github.com/apache/spark/commit/be98a0fabc9244ccb9e376ac8e7aef5125675c9b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #17131: [SPARK-19766][SQL][BRANCH-2.0] Constant alias columns in...

2017-03-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17131
  
ok to test





[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...

2017-03-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17125#discussion_r103859056
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -128,6 +129,15 @@ case class CreateViewCommand(
 qe.assertAnalyzed()
 val analyzedPlan = qe.analyzed
 
+// CREATE VIEW AS INSERT INTO ... is not allowed, we should throw an 
AnalysisException.
+analyzedPlan match {
+  case i: InsertIntoHadoopFsRelationCommand =>
+throw new AnalysisException("Creating a view as insert into a 
table is not allowed")
--- End diff --

It would be nice to include the view name in the error message.
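A minimal sketch of the suggestion, using hypothetical, simplified stand-ins for Spark's classes (`TableIdentifier`, `AnalysisException`, and a string `planKind` in place of the real analyzed-plan match), showing how the view identifier could be threaded into the message:

```scala
// Hypothetical stand-ins, not Spark's actual types.
final case class TableIdentifier(table: String, database: Option[String] = None) {
  override def toString: String =
    database.map(d => s"`$d`.`$table`").getOrElse(s"`$table`")
}

class AnalysisException(message: String) extends Exception(message)

// `planKind` stands in for matching on the analyzed plan's type.
def assertNotInsert(name: TableIdentifier, planKind: String): Unit =
  planKind match {
    case "InsertIntoHadoopFsRelationCommand" | "InsertIntoDataSourceCommand" =>
      // Naming the view makes the failure actionable for the user.
      throw new AnalysisException(
        s"Creating view $name as insert into a table is not allowed")
    case _ => ()
  }
```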





[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...

2017-03-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17125#discussion_r103858978
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -128,6 +129,15 @@ case class CreateViewCommand(
 qe.assertAnalyzed()
 val analyzedPlan = qe.analyzed
 
+// CREATE VIEW AS INSERT INTO ... is not allowed, we should throw an 
AnalysisException.
+analyzedPlan match {
+  case i: InsertIntoHadoopFsRelationCommand =>
--- End diff --

`_: InsertIntoHadoopFsRelationCommand`
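The suggested `_: InsertIntoHadoopFsRelationCommand` is the idiomatic form when the bound value is never used in the case body. A self-contained sketch with hypothetical stand-in types:

```scala
// Hypothetical stand-ins for Spark's command classes.
sealed trait Command
final case class InsertIntoHadoopFsRelationCommand(path: String) extends Command
final case class ProjectCommand(cols: Seq[String]) extends Command

// `_: T` matches on type only; binding a name (`i: T`) that is never
// referenced can trigger an unused-variable lint warning.
def isDisallowedInView(plan: Command): Boolean = plan match {
  case _: InsertIntoHadoopFsRelationCommand => true
  case _ => false
}
```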





[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...

2017-03-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17125#discussion_r103858997
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -128,6 +129,15 @@ case class CreateViewCommand(
 qe.assertAnalyzed()
 val analyzedPlan = qe.analyzed
 
+// CREATE VIEW AS INSERT INTO ... is not allowed, we should throw an 
AnalysisException.
+analyzedPlan match {
+  case i: InsertIntoHadoopFsRelationCommand =>
+throw new AnalysisException("Creating a view as insert into a 
table is not allowed")
+  case i: InsertIntoDataSourceCommand =>
--- End diff --

The same here





[GitHub] spark issue #17131: [SPARK-19766][SQL][BRANCH-2.0] Constant alias columns in...

2017-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17131
  
Can one of the admins verify this patch?





[GitHub] spark issue #17034: [SPARK-19704][ML] AFTSurvivalRegression should support n...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17034
  
**[Test build #73731 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73731/testReport)**
 for PR 17034 at commit 
[`0185b45`](https://github.com/apache/spark/commit/0185b454aaa043406c39d1e8f19c98d3d345a836).





[GitHub] spark issue #17122: [SPARK-19786][SQL] Facilitate loop optimizations in a JI...

2017-03-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17122
  
**[Test build #73730 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73730/testReport)**
 for PR 17122 at commit 
[`5ff8dca`](https://github.com/apache/spark/commit/5ff8dcae1bce0b553d4aefc563addc001e6a6691).





[GitHub] spark pull request #17125: [SPARK-19211][SQL] Explicitly prevent Insert into...

2017-03-01 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17125#discussion_r103858555
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -604,7 +604,14 @@ class Analyzer(
 
 def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
   case i @ InsertIntoTable(u: UnresolvedRelation, parts, child, _, _) 
if child.resolved =>
-i.copy(table = EliminateSubqueryAliases(lookupTableFromCatalog(u)))
+val newTable = EliminateSubqueryAliases(lookupTableFromCatalog(u))
+// Inserting into a view is not allowed, we should throw an 
AnalysisException.
+newTable match {
+  case v: View =>
+u.failAnalysis(s"${v.desc.identifier} is a view, inserting 
into a view is not allowed")
--- End diff --

Can we move this to `PreprocessTableInsertion`?
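A rough sketch, using hypothetical stand-in plan nodes rather than Spark's real Catalyst classes, of what the check could look like if hosted in a `PreprocessTableInsertion`-style rule:

```scala
// Simplified, hypothetical stand-ins for Spark's logical plan nodes.
sealed trait LogicalPlan
final case class View(name: String) extends LogicalPlan
final case class Relation(name: String) extends LogicalPlan
final case class InsertIntoTable(table: LogicalPlan) extends LogicalPlan

// The rule rejects inserts whose target resolved to a view and
// passes every other plan through unchanged.
def preprocessTableInsertion(plan: LogicalPlan): LogicalPlan = plan match {
  case InsertIntoTable(View(name)) =>
    throw new IllegalArgumentException(
      s"$name is a view, inserting into a view is not allowed")
  case other => other
}
```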





[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103858494
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -77,6 +77,10 @@ trait CodegenSupport extends SparkPlan {
*/
   final def produce(ctx: CodegenContext, parent: CodegenSupport): String = 
executeQuery {
 this.parent = parent
+
+// to track the existence of apply() call in the current 
produce-consume cycle
+// if apply is not called (e.g. in aggregation), we can skip shoudStop 
in the inner-most loop
+parent.shouldStopRequired = false
--- End diff --

I wanted to ensure that `produce()` starts with `parent.shouldStopRequired = false`, because another produce-consume cycle could have set `shouldStopRequired` to true if a parent has more than one produce-consume cycle. In most cases, though, that would not happen, so I eliminated this line for simplicity.
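The reset protocol described above can be sketched as follows (hypothetical and heavily simplified; in Spark the flag lives on `CodegenSupport` and guards generated code, not a Scala callback):

```scala
// Each produce-consume cycle starts with the flag cleared, so a value
// set by an earlier cycle on the same parent cannot leak into this one.
class Operator {
  var shouldStopRequired: Boolean = true

  def produce(consume: Operator => Unit): Boolean = {
    shouldStopRequired = false // cleared at the start of the cycle
    consume(this)              // consume may set it back, e.g. on append()
    shouldStopRequired
  }
}
```

For instance, an aggregation whose consume path never appends rows leaves the flag false, so the generated inner-most loop can omit the `shouldStop()` check.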





[GitHub] spark pull request #17122: [SPARK-19786][SQL] Facilitate loop optimizations ...

2017-03-01 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/17122#discussion_r103858474
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala ---
@@ -69,6 +69,7 @@ trait BaseLimitExec extends UnaryExecNode with 
CodegenSupport {
   override def doConsume(ctx: CodegenContext, input: Seq[ExprCode], row: 
ExprCode): String = {
 val stopEarly = ctx.freshName("stopEarly")
 ctx.addMutableState("boolean", stopEarly, s"$stopEarly = false;")
+shouldStopRequired = true // loop may break early even without append 
in loop body
--- End diff --

Good catch. This implementation was based on a slightly older revision that did not yet have the `stopEarly()` method. Removed this line.





[GitHub] spark pull request #17131: [SPARK-19766][SQL][BRANCH-2.0] Constant alias col...

2017-03-01 Thread stanzhai
GitHub user stanzhai opened a pull request:

https://github.com/apache/spark/pull/17131

[SPARK-19766][SQL][BRANCH-2.0] Constant alias columns in INNER JOIN should 
not be folded by FoldablePropagation rule

This PR fixes the issue for branch-2.0.

Refer to #17099.

@gatorsmile 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/stanzhai/spark fix-inner-join-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17131


commit 4975ac7f3a6a714c80e5f875ab54dd60f4aa22a5
Author: Stan Zhai 
Date:   2017-03-02T05:56:07Z

fix innner join







[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...

2017-03-01 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/17034#discussion_r103858261
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala
 ---
@@ -27,6 +27,8 @@ import org.apache.spark.ml.util.TestingUtils._
 import org.apache.spark.mllib.random.{ExponentialGenerator, 
WeibullGenerator}
 import org.apache.spark.mllib.util.MLlibTestSparkContext
 import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
--- End diff --

Yes, I will update this. Thanks for the review!





[GitHub] spark pull request #17034: [SPARK-19704][ML] AFTSurvivalRegression should su...

2017-03-01 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/17034#discussion_r103858210
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/AFTSurvivalRegressionSuite.scala
 ---
@@ -361,6 +363,36 @@ class AFTSurvivalRegressionSuite
   }
   }
 
+  test("should support all NumericType censors, and not support other 
types") {
+val df = spark.createDataFrame(Seq(
+  (0, Vectors.dense(0)),
+  (1, Vectors.dense(1)),
+  (2, Vectors.dense(2)),
+  (3, Vectors.dense(3)),
+  (4, Vectors.dense(4))
+)).toDF("label", "features")
+  .withColumn("censor", lit(0.0))
+val aft = new AFTSurvivalRegression().setMaxIter(1)
+val expected = aft.fit(df)
+
+val types = Seq(ShortType, LongType, IntegerType, FloatType, ByteType, 
DecimalType(10, 0))
+types.foreach { t =>
+  val actual = aft.fit(df.select(col("label"), col("features"),
+col("censor").cast(t)))
+  assert(expected.intercept === actual.intercept)
+  assert(expected.coefficients === actual.coefficients)
+}
+
+val dfWithStringCensors = spark.createDataFrame(Seq(
+  (0, Vectors.dense(0, 2, 3), "0")
+)).toDF("label", "features", "censor")
+val thrown = intercept[IllegalArgumentException] {
--- End diff --

This follows the implementation in `MLTestingUtils.checkNumericTypes`, so I would prefer not to change it.




