[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16422
  
What is the behavior of DESC COLUMN for complex/nested types (map, struct, array)?
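
For reference, a hedged sketch of the two command shapes in question (`spark` is a SparkSession; the table and column names are made up). Whether the nested-field form is accepted is exactly the open question:

```scala
// Syntax per this PR: DESCRIBE [EXTENDED|FORMATTED] table_name column_name
spark.sql("DESC FORMATTED events eventTime")      // top-level column
spark.sql("DESC FORMATTED events payload.field")  // nested struct field: behavior under discussion
```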





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94208490
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder {
* Create a [[DescribeTableCommand]] logical plan.
*/
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan 
= withOrigin(ctx) {
-// Describe column are not supported yet. Return null and let the 
parser decide
-// what to do with this (create an exception or pass it on to a 
different system).
 if (ctx.describeColName != null) {
-  null
+  if (ctx.partitionSpec != null) {
+throw new ParseException("DESC TABLE COLUMN for a specific 
partition is not supported", ctx)
+  } else {
+val columnName = ctx.describeColName.getText
--- End diff --

I assume we are following Hive syntax here? What is the behavior of Hive?





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94208116
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder {
* Create a [[DescribeTableCommand]] logical plan.
*/
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan 
= withOrigin(ctx) {
-// Describe column are not supported yet. Return null and let the 
parser decide
-// what to do with this (create an exception or pass it on to a 
different system).
 if (ctx.describeColName != null) {
-  null
+  if (ctx.partitionSpec != null) {
+throw new ParseException("DESC TABLE COLUMN for a specific 
partition is not supported", ctx)
+  } else {
+val columnName = ctx.describeColName.getText
--- End diff --

mysql? Sorry, I do not follow what you asked above. 





[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15664#discussion_r94207926
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -108,14 +108,32 @@ object JdbcUtils extends Logging {
   }
 
   /**
-   * Returns a PreparedStatement that inserts a row into table via conn.
+   * Returns an Insert SQL statement for inserting a row into the target 
table via JDBC conn.
*/
-  def insertStatement(conn: Connection, table: String, rddSchema: 
StructType, dialect: JdbcDialect)
-  : PreparedStatement = {
-val columns = rddSchema.fields.map(x => 
dialect.quoteIdentifier(x.name)).mkString(",")
+  def getInsertStatement(
+  table: String,
+  rddSchema: StructType,
+  tableSchema: StructType,
--- End diff --

If `tableSchema` is None, we should fall back to the original behavior. Please add that extra logic. Thanks!
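
A minimal sketch of that fallback, assuming `tableSchema` is threaded through as an `Option[StructType]` (the helper name below is hypothetical):

```scala
import org.apache.spark.sql.types.StructType

// Prefer the database's column names when the table schema could be fetched;
// otherwise fall back to the RDD schema's names (the original behavior).
def insertColumnNames(rddSchema: StructType, tableSchema: Option[StructType]): Seq[String] =
  tableSchema match {
    case Some(dbSchema) =>
      rddSchema.fields.map { f =>
        dbSchema.fields.find(_.name.equalsIgnoreCase(f.name)).map(_.name).getOrElse(f.name)
      }
    case None =>
      rddSchema.fields.map(_.name)
  }
```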





[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15664#discussion_r94207873
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala
 ---
@@ -57,26 +57,28 @@ class JdbcRelationProvider extends 
CreatableRelationProvider
 val table = jdbcOptions.table
 val createTableOptions = jdbcOptions.createTableOptions
 val isTruncate = jdbcOptions.isTruncate
+val isCaseSensitive = sqlContext.conf.caseSensitiveAnalysis
 
 val conn = JdbcUtils.createConnectionFactory(jdbcOptions)()
 try {
   val tableExists = JdbcUtils.tableExists(conn, url, table)
   if (tableExists) {
+val tableSchema = JdbcUtils.getSchemaOption(conn, url, table)
 mode match {
   case SaveMode.Overwrite =>
 if (isTruncate && isCascadingTruncateTable(url) == 
Some(false)) {
   // In this case, we should truncate table and then load.
   truncateTable(conn, table)
-  saveTable(df, url, table, jdbcOptions)
+  saveTable(df, url, table, tableSchema.get, isCaseSensitive, 
jdbcOptions)
--- End diff --

We need to pass `tableSchema`. It is not safe to do `tableSchema.get`. : )
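
A tiny sketch of the hazard: `getSchemaOption` returns an `Option`, and `.get` on `None` throws at runtime, so passing the `Option` through lets the callee pick the fallback:

```scala
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val rddSchema = StructType(Seq(StructField("a", StringType)))
val tableSchema: Option[StructType] = None  // e.g. the driver could not describe the table
// tableSchema.get                          // would throw java.util.NoSuchElementException
val effective = tableSchema.getOrElse(rddSchema)  // total: falls back instead of throwing
```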





[GitHub] spark issue #16423: Update known_translations for contributor names and also...

2016-12-29 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/16423
  
hi @yhuai, could you also:

add:
```
lw-lin - Liwei Lin
```

update:
```
sharkdtu - Xiaogang Tu
cenyuhai - Yuhai Ceng
```





[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-29 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
Yes, let me backport the test cases that check the staging file.

On Thu, Dec 29, 2016 at 10:11 PM, Xiao Li  wrote:

> Is it possible to backport the test cases in #16399?






[GitHub] spark issue #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against all Pyth...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16405
  
**[Test build #70737 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70737/testReport)**
 for PR 16405 at commit 
[`5ffa382`](https://github.com/apache/spark/commit/5ffa3826d2c6989d8add1e20f945c7cb67507cc0).





[GitHub] spark issue #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against all Pyth...

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16405
  
retest this please





[GitHub] spark pull request #13077: [SPARK-10748] [Mesos] Log error instead of crashi...

2016-12-29 Thread devaraj-kavali
Github user devaraj-kavali commented on a diff in the pull request:

https://github.com/apache/spark/pull/13077#discussion_r94205390
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala
 ---
@@ -559,15 +560,29 @@ private[spark] class MesosClusterScheduler(
   } else {
 val offer = offerOption.get
 val queuedTasks = tasks.getOrElseUpdate(offer.offerId, new 
ArrayBuffer[TaskInfo])
-val task = createTaskInfo(submission, offer)
-queuedTasks += task
-logTrace(s"Using offer ${offer.offerId.getValue} to launch driver 
" +
-  submission.submissionId)
-val newState = new MesosClusterSubmissionState(submission, 
task.getTaskId, offer.slaveId,
-  None, new Date(), None, getDriverFrameworkID(submission))
-launchedDrivers(submission.submissionId) = newState
-launchedDriversState.persist(submission.submissionId, newState)
-afterLaunchCallback(submission.submissionId)
+breakable {
--- End diff --

Here it needs to continue the for loop from the catch block with the next set of drivers. It cannot simply return from the exception handler, since it still needs to launch the other candidates. I can adopt the other suggestion, i.e. moving the following code into the try clause; I will update the PR by moving the code into the try block. Please let me know if that doesn't make sense.
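
A minimal sketch of that shape (the names here are illustrative, not the PR's code): the per-driver launch work sits inside `try`, so a failure is logged and the loop moves on to the next candidate instead of returning:

```scala
case class Submission(submissionId: String)

def launchOne(s: Submission): Unit = { /* build the TaskInfo, queue it, persist state */ }

def scheduleDrivers(candidates: Seq[Submission]): Unit = {
  for (submission <- candidates) {
    try {
      launchOne(submission)  // any exception here must not abort the whole round
    } catch {
      case e: Exception =>
        // Log and continue with the next candidate rather than returning.
        println(s"Failed to launch driver ${submission.submissionId}: ${e.getMessage}")
    }
  }
}
```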





[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16433
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70735/
Test PASSed.





[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16433
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16433
  
**[Test build #70735 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70735/testReport)**
 for PR 16433 at commit 
[`f3f78b8`](https://github.com/apache/spark/commit/f3f78b8522e3eb0bba5cd6927f49d199d5bd9f92).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  implicit class EqualsIgnoreCRLF(source: String) `





[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15819
  
Is it possible to backport the test cases in https://github.com/apache/spark/pull/16399?





[GitHub] spark issue #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against all Pyth...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16405
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against all Pyth...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16405
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70734/
Test FAILed.





[GitHub] spark issue #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against all Pyth...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16405
  
**[Test build #70734 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70734/testReport)**
 for PR 16405 at commit 
[`5ffa382`](https://github.com/apache/spark/commit/5ffa3826d2c6989d8add1e20f945c7cb67507cc0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15819#discussion_r94204568
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala
 ---
@@ -76,6 +76,8 @@ class InsertIntoHiveTableSuite extends QueryTest with 
TestHiveSingleton with Bef
   sql("SELECT * FROM createAndInsertTest"),
   testData.collect().toSeq
 )
+
+
--- End diff --

Nit: please remove these two lines.





[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16296
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16296
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70736/
Test FAILed.





[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16296
  
**[Test build #70736 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70736/testReport)**
 for PR 16296 at commit 
[`a1dbf61`](https://github.com/apache/spark/commit/a1dbf6115b081360a1f565ea4daa02e2c63ef112).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class DetermineHiveSerde(conf: SQLConf) extends Rule[LogicalPlan] `





[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16233
  
Generally, the solution also looks ok to me. I think the test case coverage 
needs to be improved.





[GitHub] spark issue #16233: [SPARK-18801][SQL] Support resolve a nested view

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16233
  
Could you also add a test case for verifying the error behaviors?

For example, in the definition of a nested view, how does the Analyzer behave when the dependent databases, tables, or views are dropped?

```
|- view1 (defaultDatabase = db1)
  |- operator
|- table2 (defaultDatabase = db1)
|- view2 (defaultDatabase = db2)
   |- view3 (defaultDatabase = db3)
   |- table3 (defaultDatabase = db5)
  |- view4 (defaultDatabase = db4)
```

In the following cases, what kind of error do we get when resolving `view1`?
- What happened when `db2` is dropped?
- What happened when `view2` is dropped?
- What happened when `view3` is dropped?
- What happened when `table3` is dropped?
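
The first of these cases might be exercised roughly as sketched below (a hypothetical test body, assuming the suite provides the usual `sql`/`intercept` helpers and imports `org.apache.spark.sql.AnalysisException`):

```scala
// Drop a database that view1's definition depends on, then verify that
// resolution fails with a clear AnalysisException, not an internal error.
test("resolve view1 after dependent database db2 is dropped") {
  sql("DROP DATABASE db2 CASCADE")
  val e = intercept[AnalysisException] {
    sql("SELECT * FROM db1.view1").collect()
  }
  assert(e.getMessage.contains("db2"))
}
```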





[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16371
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16371
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70733/
Test PASSed.





[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16371
  
**[Test build #70733 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70733/testReport)**
 for PR 16371 at commit 
[`c2f3841`](https://github.com/apache/spark/commit/c2f384111d0b5d1f54355f2ac0189f2241e40fd3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16233#discussion_r94202865
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
 ---
@@ -465,6 +465,35 @@ class SessionCatalogSuite extends SparkFunSuite {
 assert(plan == SubqueryAlias("range", tmpView, 
Option(TableIdentifier("vw1"
   }
 
+  test("lookup view relation") {
+val externalCatalog = newBasicCatalog()
+val sessionCatalog = new SessionCatalog(externalCatalog)
+val metadata1 = externalCatalog.getTable("db3", "view1")
--- End diff --

Nit: `metadata1` -> `metadata`





[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16233#discussion_r94202897
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
 ---
@@ -465,6 +465,35 @@ class SessionCatalogSuite extends SparkFunSuite {
 assert(plan == SubqueryAlias("range", tmpView, 
Option(TableIdentifier("vw1"
   }
 
+  test("lookup view relation") {
--- End diff --

Nit: lookup -> `look up`





[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16233#discussion_r94202769
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -658,6 +719,21 @@ class Analyzer(
   Generate(newG.asInstanceOf[Generator], join, outer, qualifier, 
output, child)
 }
 
+  // A special case for View, replace the output attributes with the 
attributes that have the
+  // same names from the child. If the corresponding attribute is not 
found, throw an
+  // AnalysisException.
+  // On the resolution of the view, the output attributes are 
generated from the view schema,
+  // and the view query is resolved later. After the view query has 
been resolved, we should
+  // map the output of the logical plan to the output of the view, 
here we simply replace the
+  // output attributes of the view with the attributes that have the 
same names from the child.
+  // TODO: Also check the dataTypes and nullabilites of the output.
+  case v @ View(_, output, child) if child.isDefined =>
--- End diff --

The same here. `case v @ View(_, output, Some(child))`





[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16233#discussion_r94202754
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2224,6 +2316,16 @@ object EliminateSubqueryAliases extends 
Rule[LogicalPlan] {
 }
 
 /**
+ * Removes [[View]] operators from the plan. The operator is respected 
till the end of analysis
+ * stage because we want to see which part of a analyzed logical plan is 
generated from a view.
+ */
+object EliminateView extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case View(_, output, child) if child.isDefined => Project(output, 
child.get)
--- End diff --

This line can be simplified to `case View(_, output, Some(child)) => Project(output, child)`.
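
A standalone toy illustration (not Spark's classes) of why matching `Some(child)` reads better: the pattern binds the value and checks non-emptiness in one step, leaving no `isDefined`/`get` pair to drift apart:

```scala
sealed trait Plan
case class Leaf(name: String) extends Plan
case class Project(child: Plan) extends Plan
case class View(name: String, child: Option[Plan]) extends Plan

def eliminateView(p: Plan): Plan = p match {
  case View(_, Some(child)) => Project(child)  // child is bound directly, no .get
  case other                => other
}
```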





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-29 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94202748
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,100 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, 
column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+column: String,
+isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+// The displayed names are based on Hive.
+// (Link for the corresponding Hive Jira: 
https://issues.apache.org/jira/browse/HIVE-7050)
+if (isFormatted) {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("min", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "min value of the 
column").build())(),
+AttributeReference("max", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "max value of the 
column").build())(),
+AttributeReference("num_nulls", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "number of nulls of 
the column").build())(),
+AttributeReference("distinct_count", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "distinct count of 
the column").build())(),
+AttributeReference("avg_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"average length of the values of the column").build())(),
+AttributeReference("max_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"max length of the values of the column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+} else {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+}
+  }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val resolver = sparkSession.sessionState.conf.resolver
+val attribute = {
+  val field = catalog.lookupRelation(table).schema.find(f => 
resolver(f.name, column))
--- End diff --

ok, thanks!





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-29 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94202675
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder {
* Create a [[DescribeTableCommand]] logical plan.
*/
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan 
= withOrigin(ctx) {
-// Describe column are not supported yet. Return null and let the 
parser decide
-// what to do with this (create an exception or pass it on to a 
different system).
 if (ctx.describeColName != null) {
-  null
+  if (ctx.partitionSpec != null) {
+throw new ParseException("DESC TABLE COLUMN for a specific 
partition is not supported", ctx)
+  } else {
+val columnName = ctx.describeColName.getText
--- End diff --

It seems MySQL doesn't support struct or nested types. @gatorsmile Can you give some advice on this?





[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16233#discussion_r94202502
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -658,6 +719,21 @@ class Analyzer(
   Generate(newG.asInstanceOf[Generator], join, outer, qualifier, 
output, child)
 }
 
+  // A special case for View, replace the output attributes with the 
attributes that have the
+  // same names from the child. If the corresponding attribute is not 
found, throw an
+  // AnalysisException.
+  // On the resolution of the view, the output attributes are 
generated from the view schema,
+  // and the view query is resolved later. After the view query has 
been resolved, we should
+  // map the output of the logical plan to the output of the view, 
here we simply replace the
+  // output attributes of the view with the attributes that have the 
same names from the child.
+  // TODO: Also check the dataTypes and nullabilites of the output.
+  case v @ View(_, output, child) if child.isDefined =>
--- End diff --

Add the same comment here.





[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16233#discussion_r94202491
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2224,6 +2316,16 @@ object EliminateSubqueryAliases extends 
Rule[LogicalPlan] {
 }
 
 /**
+ * Removes [[View]] operators from the plan. The operator is respected 
till the end of analysis
+ * stage because we want to see which part of a analyzed logical plan is 
generated from a view.
+ */
+object EliminateView extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case View(_, output, child) if child.isDefined => Project(output, 
child.get)
--- End diff --

If `child` is not defined, we will issue an exception in `CheckAnalysis`. 
Here, please add a comment to explain the error handling; otherwise, it looks 
like a bug. 





[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16233#discussion_r94202380
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
 ---
@@ -377,6 +378,39 @@ case class InsertIntoTable(
   override lazy val resolved: Boolean = childrenResolved && table.resolved
 }
 
+/** Factory for constructing new `View` nodes. */
+object View {
+  def apply(desc: CatalogTable): View = View(desc, 
desc.schema.toAttributes, None)
+}
+
+/**
+ * A container for holding the view description(CatalogTable), and the 
output of the view. The
+ * child will be defined if the view is resolved with Hive support, else 
it should be None.
+ * This operator will be removed at the end of analysis stage.
+ *
+ * @param desc A view description(CatalogTable) that provides necessary 
information to resolve the
+ * view.
+ * @param output The output of a view operator, this is generated during 
planning the view, so that
+ *   we are able to decouple the output from the underlying 
structure.
+ * @param child The logical plan of a view operator, it should be 
non-empty if the view is resolved
+ *  with Hive support, else it should be None.
+ */
+case class View(
+desc: CatalogTable,
+output: Seq[Attribute],
+child: Option[LogicalPlan] = None) extends LogicalPlan with 
MultiInstanceRelation {
+
+  override lazy val resolved: Boolean = child.exists(_.resolved)
+
+  override def children: Seq[LogicalPlan] = child.toSeq
+
+  override def newInstance(): LogicalPlan = copy(output = 
output.map(_.newInstance()))
--- End diff --

Do you have any test case that needs to call this function? For example, 
self joins.





[GitHub] spark pull request #16233: [SPARK-18801][SQL] Support resolve a nested view

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16233#discussion_r94202164
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -845,6 +846,7 @@ private[hive] class HiveClientImpl(
 table.comment.foreach { c => hiveTable.setProperty("comment", c) }
 table.viewOriginalText.foreach { t => hiveTable.setViewOriginalText(t) 
}
 table.viewText.foreach { t => hiveTable.setViewExpandedText(t) }
+table.viewDefaultDatabase.foreach {t => 
hiveTable.setProperty("viewDefaultDatabase", t)}
--- End diff --

First, we should define `viewDefaultDatabase ` in the object 
`HiveExternalCatalog`. See the 
[examples](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L1059-L1083).

Second, we do not convert the table properties here. Normally, all the 
table properties should be converted in 
[`restoreTableMetadata`](https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L615).
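
A hedged sketch of that convention (the constant name below is hypothetical, modeled on the linked examples): define the key once, write it as a prefixed table property, and convert it back when restoring the metadata rather than in the Hive client layer:

```scala
object HiveExternalCatalogSketch {
  val SPARK_SQL_PREFIX = "spark.sql."
  val VIEW_DEFAULT_DATABASE: String = SPARK_SQL_PREFIX + "view.defaultDatabase"  // hypothetical key

  // Writing side: persist the view's default database as a table property.
  def toProperties(viewDefaultDatabase: Option[String]): Map[String, String] =
    viewDefaultDatabase.map(db => VIEW_DEFAULT_DATABASE -> db).toMap

  // Restoring side: read it back while rebuilding the CatalogTable metadata.
  def fromProperties(props: Map[String, String]): Option[String] =
    props.get(VIEW_DEFAULT_DATABASE)
}
```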





[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16296
  
**[Test build #70736 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70736/testReport)**
 for PR 16296 at commit 
[`a1dbf61`](https://github.com/apache/spark/commit/a1dbf6115b081360a1f565ea4daa02e2c63ef112).





[GitHub] spark pull request #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.n...

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16429#discussion_r94201674
  
--- Diff: python/pyspark/serializers.py ---
@@ -382,18 +382,30 @@ def _hijack_namedtuple():
 return
 
 global _old_namedtuple  # or it will put in closure
+global _old_namedtuple_kwdefaults  # or it will put in closure too
 
 def _copy_func(f):
 return types.FunctionType(f.__code__, f.__globals__, f.__name__,
   f.__defaults__, f.__closure__)
 
+def _kwdefaults(f):
+kargs = getattr(f, "__kwdefaults__", None)
--- End diff --

`__kwdefaults__` can be `None` or can be missing entirely.





[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16433
  
**[Test build #70735 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70735/testReport)**
 for PR 16433 at commit 
[`f3f78b8`](https://github.com/apache/spark/commit/f3f78b8522e3eb0bba5cd6927f49d199d5bd9f92).





[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16433
  
Build started: [TESTS] 
`org.apache.spark.sql.streaming.StreamingQueryStatusAndProgressSuite` 
[![PR-16433](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=D1A3B54F-82B5-481D-ADE8-7CC273C97303=true)](https://ci.appveyor.com/project/spark-test/spark/branch/D1A3B54F-82B5-481D-ADE8-7CC273C97303)
Diff: 
https://github.com/apache/spark/compare/master...spark-test:D1A3B54F-82B5-481D-ADE8-7CC273C97303





[GitHub] spark issue #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against all Pyth...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16405
  
**[Test build #70734 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70734/testReport)**
 for PR 16405 at commit 
[`5ffa382`](https://github.com/apache/spark/commit/5ffa3826d2c6989d8add1e20f945c7cb67507cc0).





[GitHub] spark issue #16405: [SPARK-19002][BUILD][PYTHON] Check pep8 against all Pyth...

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16405
  
I just manually ran `./dev/create-release/translate-contributors.py` which 
had a conflict for sure.





[GitHub] spark pull request #16435: [SPARK-19027][SQL] estimate size of object buffer...

2016-12-29 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/16435#discussion_r94201244
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala
 ---
@@ -154,21 +200,19 @@ class ObjectAggregationIterator(
   while (inputRows.hasNext && !sortBased) {
 val newInput = safeProjection(inputRows.next())
 val groupingKey = groupingProjection.apply(newInput)
-val buffer: InternalRow = getAggregationBufferByKey(hashMap, 
groupingKey)
-processRow(buffer, newInput)
-
-// The the hash map gets too large, makes a sorted spill and clear 
the map.
-if (hashMap.size >= fallbackCountThreshold) {
-  logInfo(
-s"Aggregation hash map reaches threshold " +
-  s"capacity ($fallbackCountThreshold entries), spilling and 
falling back to sort" +
-  s" based aggregation. You may change the threshold by adjust 
option " +
-  SQLConf.OBJECT_AGG_SORT_BASED_FALLBACK_THRESHOLD.key
-  )
-
+val buffer: (InternalRow, Long, Int) = 
getAggregationBufferByKey(hashMap, groupingKey)
+if (buffer == null) {
   // Falls back to sort-based aggregation
   sortBased = true
-
+} else {
+  processRow(buffer._1, newInput)
+  if (buffer._3 == 10) {
--- End diff --

Why 10? Would it be better to add a comment explaining how this value was chosen?
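
One way to address this, sketched with a hypothetical constant name: hoist the literal into a named, documented value so the interval explains itself:

```scala
object ObjectAggConstants {
  // Re-estimate the aggregation buffer's size only every N processed rows,
  // since size estimation is expensive relative to updating the buffer.
  // The value 10 is the PR's literal; the constant name is hypothetical.
  val SizeEstimationInterval: Int = 10
}
```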





[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
@srowen, thank you Sean. I think it is okay for now. To be honest, I found some more of the same instances, but I haven't fixed, tested, and verified them yet. I may need one more pass to deal with them all cleanly. I hope it is okay to go ahead and merge this as is.





[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16433
  
In most cases, they explicitly write `\n` (e.g. when writing CSV and JSON). _Apparently_, these are the only tests failing due to this problem.





[GitHub] spark pull request #16433: [SPARK-19022][TESTS] Fix tests dependent on OS du...

2016-12-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16433#discussion_r94200602
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryStatusAndProgressSuite.scala
 ---
@@ -30,10 +30,16 @@ import 
org.apache.spark.sql.streaming.StreamingQueryStatusAndProgressSuite._
 
 
 class StreamingQueryStatusAndProgressSuite extends StreamTest {
+  implicit class EqualsIgnoreCRLF(source: String) {
+def equalsIgnoreCRLF(target: String): Boolean = {
+  source.stripMargin.replaceAll("\r\n|\r|\n", System.lineSeparator) ===
--- End diff --

Oh, sure.
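
For context, a self-contained sketch of the helper under discussion, normalizing every line-ending variant to `\n` before comparing so the same assertion holds on Windows (CRLF) and Unix (LF):

```scala
object CrlfInsensitive {
  implicit class EqualsIgnoreCRLF(val source: String) extends AnyVal {
    def equalsIgnoreCRLF(target: String): Boolean =
      source.replaceAll("\r\n|\r|\n", "\n") == target.replaceAll("\r\n|\r|\n", "\n")
  }
}

// Usage: import CrlfInsensitive._ ; then "a\r\nb".equalsIgnoreCRLF("a\nb") is true.
```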





[GitHub] spark issue #16439: [SPARK-19026]SPARK_LOCAL_DIRS(multiple directories on di...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16439
  
Can one of the admins verify this patch?





[GitHub] spark pull request #16439: [SPARK-19026]SPARK_LOCAL_DIRS(multiple directorie...

2016-12-29 Thread zuotingbing
GitHub user zuotingbing opened a pull request:

https://github.com/apache/spark/pull/16439

[SPARK-19026]SPARK_LOCAL_DIRS(multiple directories on different disks) 
cannot be deleted 

JIRA Issue: https://issues.apache.org/jira/browse/SPARK-19026

SPARK_LOCAL_DIRS (Standalone) can be a comma-separated list of multiple directories on different disks, e.g. SPARK_LOCAL_DIRS=/dir1,/dir2,/dir3. If an IOException occurs while creating the sub-directory on dir3, the sub-directories that were created successfully on dir1 and dir2 can never be deleted once the application finishes.
So we should catch the IOException in Utils.createDirectory; otherwise the variable "appDirectories(appId)", which the function maybeCleanupApplication relies on, will never be set. If the number of "executor-**" folders exceeds 32k (ext3), we cannot create any executor on this worker node.
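
A hedged sketch of the fix being described (method and variable names are illustrative): if creating a directory on one disk fails, delete the ones already created and rethrow, so no orphaned executor directories are left behind:

```scala
import java.io.{File, IOException}
import scala.collection.mutable.ArrayBuffer

def createAppDirs(localDirs: Seq[String], appId: String): Seq[File] = {
  val created = ArrayBuffer.empty[File]
  try {
    localDirs.foreach { root =>
      val dir = new File(root, s"executor-$appId")
      if (!dir.mkdirs()) throw new IOException(s"Failed to create $dir")
      created += dir
    }
    created.toSeq
  } catch {
    case e: IOException =>
      created.foreach(_.delete())  // clean up the partially created directories
      throw e
  }
}
```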

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zuotingbing/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16439.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16439


commit e4fe18e2fb894c6e41d49e620d342d6be475b5ed
Author: zuotingbing 
Date:   2016-12-30T02:53:13Z

local directories cannot be cleanuped when create directory of executor-*** 
throws IOException







[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16371
  
**[Test build #70733 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70733/testReport)**
 for PR 16371 at commit 
[`c2f3841`](https://github.com/apache/spark/commit/c2f384111d0b5d1f54355f2ac0189f2241e40fd3).





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94199600
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,100 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+    table: TableIdentifier,
+    column: String,
+    isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+    // The displayed names are based on Hive.
+    // (Link for the corresponding Hive Jira: https://issues.apache.org/jira/browse/HIVE-7050)
+    if (isFormatted) {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("min", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "min value of the column").build())(),
+        AttributeReference("max", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "max value of the column").build())(),
+        AttributeReference("num_nulls", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "number of nulls of the column").build())(),
+        AttributeReference("distinct_count", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "distinct count of the column").build())(),
+        AttributeReference("avg_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "average length of the values of the column").build())(),
+        AttributeReference("max_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "max length of the values of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    } else {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    }
+  }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    val resolver = sparkSession.sessionState.conf.resolver
+    val attribute = {
+      val field = catalog.lookupRelation(table).schema.find(f => resolver(f.name, column))
--- End diff --

shall we call `getTempViewOrPermanentTableMetadata`?
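
For illustration, a sketch of the suggested lookup; it is hypothetical and assumes it runs where `sessionState` is accessible (the command in the diff lives inside Spark's own sql package):

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.types.StructField

// Resolve the column via catalog metadata instead of building a full logical
// plan with lookupRelation; getTempViewOrPermanentTableMetadata handles both
// temp views and permanent tables.
def findColumn(spark: SparkSession, table: TableIdentifier, column: String): Option[StructField] = {
  val catalog = spark.sessionState.catalog
  val resolver = spark.sessionState.conf.resolver
  catalog.getTempViewOrPermanentTableMetadata(table).schema.find(f => resolver(f.name, column))
}
```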





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94199552
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        val columnName = ctx.describeColName.getText
--- End diff --

The parser rule for the column name is:
```
describeColName
    : identifier ('.' (identifier | STRING))*
    ;
```
Can we just make it `identifier`? Should "a.b" refer to a column named "a.b", 
or to the inner field "b" of column "a"? Let's check with other databases.





[GitHub] spark issue #16415: [SPARK-19007]Speedup and optimize the GradientBoostedTre...

2016-12-29 Thread zdh2292390
Github user zdh2292390 commented on the issue:

https://github.com/apache/spark/pull/16415
  
@jkbradley Yes, changing the storageLevel in predErrorCheckpointer can fix 
the problem.

Can you please merge my latest commit, which just changes the storageLevel 
in predErrorCheckpointer?

When you begin to fix the remaining problems, can I join in?
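
For context, a hedged sketch of the kind of change under discussion (`predError` and the exact storage level are assumptions based on this thread):

```
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PredErrorPersistSketch {
  // Persist the prediction-error RDD at MEMORY_AND_DISK so evicted partitions
  // spill to disk instead of being recomputed from a long GBT lineage.
  def persistPredError(predError: RDD[(Double, Double)]): RDD[(Double, Double)] =
    predError.persist(StorageLevel.MEMORY_AND_DISK)
}
```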





[GitHub] spark issue #16414: [SPARK-19009][DOC] Add streaming rest api doc

2016-12-29 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16414
  
cc @srowen 





[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15664
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15664
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70731/
Test PASSed.





[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15664
  
**[Test build #70731 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70731/testReport)**
 for PR 15664 at commit 
[`2cc738a`](https://github.com/apache/spark/commit/2cc738a45773e143cdc233b06a97f0e1c2aae1f2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16386
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16386
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70730/
Test FAILed.





[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16386
  
**[Test build #70730 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70730/testReport)**
 for PR 16386 at commit 
[`9dc084d`](https://github.com/apache/spark/commit/9dc084d5f0938cc8f5aad1233a99d37bf85957fd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16438: [SPARK-19029] [SQL] Remove databaseName from SimpleCatal...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16438
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70727/
Test PASSed.





[GitHub] spark issue #16438: [SPARK-19029] [SQL] Remove databaseName from SimpleCatal...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16438
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16438: [SPARK-19029] [SQL] Remove databaseName from SimpleCatal...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16438
  
**[Test build #70727 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70727/testReport)**
 for PR 16438 at commit 
[`b38c53c`](https://github.com/apache/spark/commit/b38c53c24d921d107957821a494a505b7214fa7c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16424
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70732/
Test PASSed.





[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16424
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16424
  
**[Test build #70732 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70732/testReport)**
 for PR 16424 at commit 
[`dce40b5`](https://github.com/apache/spark/commit/dce40b52e4f64b8e3eb5949664b02225039ce565).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-29 Thread NathanHowell
Github user NathanHowell commented on the issue:

https://github.com/apache/spark/pull/16386
  
@HyukjinKwon I just pushed a change that makes the corrupt record handling 
consistent: if a corrupt record column is defined, it will always get the JSON 
text for failed records. If `wholeFile` is enabled, a warning is emitted.

I think more discussion is needed to figure out the best way to handle corrupt 
records and exceptions; perhaps it can be shelved for now and picked up later 
under another ticket?





[GitHub] spark pull request #16397: [SPARK-18922][TESTS] Fix more path-related test f...

2016-12-29 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16397#discussion_r94190092
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogBackwardCompatibilitySuite.scala ---
@@ -185,8 +191,8 @@ class HiveExternalCatalogBackwardCompatibilitySuite extends QueryTest
     identifier = TableIdentifier("tbl9", Some("test_db")),
     tableType = CatalogTableType.EXTERNAL,
     storage = CatalogStorageFormat.empty.copy(
-      locationUri = Some(defaultTablePath("tbl9") + "-__PLACEHOLDER__"),
-      properties = Map("path" -> tempDir.getAbsolutePath)),
+      locationUri = Some(defaultTableURI("tbl9").toString + "-__PLACEHOLDER__"),
--- End diff --

I think this `.toString` is superfluous in string concatenation, but it's not 
worth changing.
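
For the record, a tiny demonstration of the point (java.net.URI stands in for whatever the test's defaultTableURI helper returns):

```
object ToStringConcat extends App {
  val uri = new java.net.URI("file:/tmp/spark-warehouse/tbl9")
  // Scala string concatenation applies toString implicitly, so both forms
  // produce the same string.
  assert(uri.toString + "-__PLACEHOLDER__" == uri + "-__PLACEHOLDER__")
}
```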





[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16424
  
**[Test build #70732 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70732/testReport)**
 for PR 16424 at commit 
[`dce40b5`](https://github.com/apache/spark/commit/dce40b52e4f64b8e3eb5949664b02225039ce565).





[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2016-12-29 Thread imatiach-msft
Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/16355
  
The only problem I see is that with this code we generate k-1 clusters instead 
of k, but the algorithm documentation states that it is not guaranteed to 
generate k clusters; there can be fewer if the leaf clusters are not divisible 
(see 
spark/mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala):

 **_Iteratively it finds divisible clusters on the bottom level and bisects 
each of them using k-means, until there are `k` leaf clusters in total or no 
leaf clusters are divisible._**

It seems that in the dataset Alok gave, one of the clusters that was assumed 
to be divisible was split into two clusters, one containing all the points and 
the other none, which is what caused the error (his cluster 162, child of 81, 
was empty, but cluster 163 was non-empty after reassignment).








[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15664
  
**[Test build #70731 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70731/testReport)**
 for PR 15664 at commit 
[`2cc738a`](https://github.com/apache/spark/commit/2cc738a45773e143cdc233b06a97f0e1c2aae1f2).





[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...

2016-12-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15664
  
Retest this please





[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...

2016-12-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15664
  
The only failure is unrelated to this PR.
```
[info] StreamSuite:
[info] - fatal errors from a source should be sent to the user *** FAILED 
*** (101 milliseconds)
[info]   org.apache.spark.sql.streaming.StreamingQueryException: Query [id 
= b3869a67-68d5-4fc1-a3e2-d5879a49166c, runId = 
cdb70502-63b9-439a-9ece-bdbdcad5c2b4] terminated with exception: null
[info]   at 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:296)
[info]   at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:186)
[info]   Cause: 
org.apache.spark.sql.streaming.StreamSuite$$anonfun$12$$anon$2:
```





[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15664
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15664
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70726/
Test FAILed.





[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15664
  
**[Test build #70726 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70726/testReport)**
 for PR 15664 at commit 
[`2cc738a`](https://github.com/apache/spark/commit/2cc738a45773e143cdc233b06a97f0e1c2aae1f2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16424: [SPARK-19016][SQL][DOC] Document scalable partiti...

2016-12-29 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/16424#discussion_r94187848
  
--- Diff: docs/sql-programming-guide.md ---
@@ -526,11 +526,18 @@ By default `saveAsTable` will create a "managed table", meaning that the locatio
 be controlled by the metastore. Managed tables will also have their data deleted automatically
 when a table is dropped.
 
-Currently, `saveAsTable` does not expose an API supporting the creation of an "External table" from a `DataFrame`,
-however, this functionality can be achieved by providing a `path` option to the `DataFrameWriter` with `path` as the key
-and location of the external table as its value (String) when saving the table with `saveAsTable`. When an External table
+Currently, `saveAsTable` does not expose an API supporting the creation of an "external table" from a `DataFrame`,
+however. This functionality can be achieved by providing a `path` option to the `DataFrameWriter` with `path` as the key
--- End diff --

Should it say: However, this functionality?





[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...

2016-12-29 Thread ericl
Github user ericl commented on the issue:

https://github.com/apache/spark/pull/16424
  
LGTM, just one comment





[GitHub] spark issue #16426: [MINOR][TEST] Add `finally` clause for `sc.stop()`

2016-12-29 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16426
  
It looks OK to me. Do you have a moment to grep for other instances in the 
tests (at least Java ones) where a new SparkContext is created in the test, to 
see if any other contexts need to be stopped?





[GitHub] spark issue #16433: [SPARK-19022][TESTS] Fix tests dependent on OS due to di...

2016-12-29 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16433
  
Maybe a dumb question, but why is this only a problem in this test and not in 
many other test suites? I'd sort of imagine the newline difference affects a 
lot of comparisons.





[GitHub] spark pull request #16417: [SPARK-19014][SQL] support complex aggregate buff...

2016-12-29 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16417#discussion_r94187427
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -310,6 +310,31 @@ class CodegenContext {
       // The UTF8String may came from UnsafeRow, otherwise clone is cheap (re-use the bytes)
       case StringType => s"$row.update($ordinal, $value.clone())"
       case udt: UserDefinedType[_] => setColumn(row, udt.sqlType, ordinal, value)
+      case s: StructType if UnsafeRow.isMutable(s) =>
+        val nestedRow = freshName("nestedRow")
+        val updateFields = s.zipWithIndex.map { case (field, index) =>
+          val ev = ExprCode(
+            code = "",
+            isNull = s"value.isNullAt($index)",
+            value = getValue("value", field.dataType, index.toString))
+          updateColumn(nestedRow, field.dataType, index, ev, field.nullable)
+        }
+        val updateFieldsCode = splitExpressions(
+          updateFields, "updateFields", Seq("InternalRow" -> nestedRow, "InternalRow" -> "value"))
+        val setColumnFunc = freshName("setColumnFunc")
+        val funcCode = s"""
+          public void $setColumnFunc(InternalRow row, InternalRow value) {
+            if (row instanceof UnsafeRow) {
+              ((UnsafeRow) row).setNotNullAt($ordinal);
+              final InternalRow $nestedRow = row.getStruct($ordinal, ${s.length});
+              $updateFieldsCode
--- End diff --

A normal use case in aggregation would be that the buffer is written back. In 
that case the `value` row and the `nestedRow` should be the same, and then we 
don't have to write anything (we can skip the updates).

A related thought is that we could just copy the bytes from `value` to 
`nestedRow`.
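
A minimal sketch of that short-circuit (illustrative Scala, not the generated Java): when the buffer is updated in place, the destination struct and the incoming value are the same object, so the per-field copy can be skipped with a reference-equality check.

```
object WriteBackSkipSketch {
  // copyFields stands in for the generated field-by-field update code.
  def updateStructColumn(nestedRow: AnyRef, value: AnyRef)(copyFields: => Unit): Unit = {
    if (nestedRow ne value) { // same object means the buffer is written back to itself
      copyFields
    }
  }
}
```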





[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...

2016-12-29 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/16355
  
@yu-iskw Pinging on this since you wrote bisecting k-means originally.  Do 
you have time to take a look?  Thanks!





[GitHub] spark issue #16415: [SPARK-19007]Speedup and optimize the GradientBoostedTre...

2016-12-29 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/16415
  
Thanks for checking!

Does changing the storageLevel in predErrorCheckpointer fix the problem?

"other use cases": Well, I remember thinking about this a lot when adding 
the periodic checkpointer, and it had to do with the fact that RDDs may be 
materialized later than checkpointer.update() gets called.  Now that I look 
again, it's possible that we could maintain 2 instead of 3 cached RDDs in the 
checkpointer's persistedQueue, but I'd want to check this more carefully.

Problem with 2 RDDs left cached after the loop: This could be fixed by 
adding a finalize() method to trait LDAOptimizer which can clean up the extra 
cached RDD.  Unfortunately, now that the trait is public, we cannot change it.  
This fix will need to wait until we move the implementation to the spark.ml 
package, at which time we can fix the API.





[GitHub] spark pull request #16423: Update known_translations for contributor names a...

2016-12-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16423





[GitHub] spark issue #16386: [SPARK-18352][SQL] Support parsing multiline json files

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16386
  
**[Test build #70730 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70730/testReport)**
 for PR 16386 at commit 
[`9dc084d`](https://github.com/apache/spark/commit/9dc084d5f0938cc8f5aad1233a99d37bf85957fd).





[GitHub] spark issue #16423: Update known_translations for contributor names and also...

2016-12-29 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16423
  
Thanks. Merging to master.





[GitHub] spark pull request #13077: [SPARK-10748] [Mesos] Log error instead of crashi...

2016-12-29 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/13077#discussion_r94185533
  
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala ---
@@ -559,15 +560,29 @@ private[spark] class MesosClusterScheduler(
       } else {
         val offer = offerOption.get
         val queuedTasks = tasks.getOrElseUpdate(offer.offerId, new ArrayBuffer[TaskInfo])
-        val task = createTaskInfo(submission, offer)
-        queuedTasks += task
-        logTrace(s"Using offer ${offer.offerId.getValue} to launch driver " +
-          submission.submissionId)
-        val newState = new MesosClusterSubmissionState(submission, task.getTaskId, offer.slaveId,
-          None, new Date(), None, getDriverFrameworkID(submission))
-        launchedDrivers(submission.submissionId) = newState
-        launchedDriversState.persist(submission.submissionId, newState)
-        afterLaunchCallback(submission.submissionId)
+        breakable {
--- End diff --

More of a style thing, but why use `breakable`? Can you just return from the 
exception case, or move the code following the catch into the try clause?
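
A hypothetical sketch of the first alternative, with stand-in names: handle the failure inside the try and return early, so no `breakable` block is needed.

```
object EarlyReturnSketch {
  def createTaskInfo(): String = throw new RuntimeException("offer mismatch")

  def launchDriver(): Unit = {
    val task =
      try {
        createTaskInfo()
      } catch {
        case e: Exception =>
          Console.err.println(s"Cannot create task for driver: ${e.getMessage}")
          return // early return replaces the breakable/break pair
      }
    println(s"launching $task")
  }

  def main(args: Array[String]): Unit = launchDriver()
}
```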





[GitHub] spark pull request #16408: [SPARK-19003][DOCS] Add Java example in Spark Str...

2016-12-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16408





[GitHub] spark issue #16408: [SPARK-19003][DOCS] Add Java example in Spark Streaming ...

2016-12-29 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16408
  
Merged to master/2.1





[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16424
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16424
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70729/
Test PASSed.





[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16424
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16424
  
**[Test build #70729 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70729/testReport)**
 for PR 16424 at commit 
[`99337cd`](https://github.com/apache/spark/commit/99337cdd6102545c82635bce95f1aa7452c44851).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...

2016-12-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16424
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70728/
Test PASSed.





[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16424
  
**[Test build #70728 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70728/testReport)**
 for PR 16424 at commit 
[`499213e`](https://github.com/apache/spark/commit/499213e0f842a8e549ee1cbad828e1c42c204cf4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16436: [SPARK-18698][ML] Adding public constructor that ...

2016-12-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16436





[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16424
  
**[Test build #70729 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70729/testReport)**
 for PR 16424 at commit 
[`99337cd`](https://github.com/apache/spark/commit/99337cdd6102545c82635bce95f1aa7452c44851).





[GitHub] spark issue #16436: [SPARK-18698][ML] Adding public constructor that takes u...

2016-12-29 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/16436
  
LGTM
Merging with master
Thanks @imatiach-msft !





[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...

2016-12-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16424
  
**[Test build #70728 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70728/testReport)**
 for PR 16424 at commit 
[`499213e`](https://github.com/apache/spark/commit/499213e0f842a8e549ee1cbad828e1c42c204cf4).





[GitHub] spark issue #16424: [SPARK-19016][SQL][DOC] Document scalable partition hand...

2016-12-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/16424
  
@ericl @CodingCat Thanks for the review! Fixed per your comments.





[GitHub] spark pull request #16424: [SPARK-19016][SQL][DOC] Document scalable partiti...

2016-12-29 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/16424#discussion_r94181109
  
--- Diff: docs/sql-programming-guide.md ---
@@ -526,11 +526,18 @@ By default `saveAsTable` will create a "managed table", meaning that the locatio
 be controlled by the metastore. Managed tables will also have their data deleted automatically
 when a table is dropped.
 
-Currently, `saveAsTable` does not expose an API supporting the creation of an "External table" from a `DataFrame`,
-however, this functionality can be achieved by providing a `path` option to the `DataFrameWriter` with `path` as the key
-and location of the external table as its value (String) when saving the table with `saveAsTable`. When an External table
+Currently, `saveAsTable` does not expose an API supporting the creation of an "external table" from a `DataFrame`,
+however. This functionality can be achieved by providing a `path` option to the `DataFrameWriter` with `path` as the key
--- End diff --

No...




