[GitHub] spark pull request #16603: [SPARK-19244][Core] Sort MemoryConsumers accordin...

2017-01-17 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/16603#discussion_r96581689
  
--- Diff: core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java ---
@@ -144,8 +164,24 @@ public long acquireExecutionMemory(long required, MemoryConsumer consumer) {
       // spilling, avoid to have too many spilled files.
       if (got < required) {
         // Call spill() on other consumers to release memory
+        // Sort the consumers according their memory usage. So we avoid spilling the same consumer
+        // which is just spilled in last few times and re-spilling on it will produce many small
+        // spill files.
+        List<MemoryConsumer> sortedList = new ArrayList<>(consumers.size());
         for (MemoryConsumer c: consumers) {
           if (c != consumer && c.getUsed() > 0 && c.getMode() == mode) {
+            sortedList.add(c);
+          }
+        }
+        Collections.sort(sortedList, new ConsumerComparator());
+        for (int listIndex = 0; listIndex < sortedList.size(); listIndex++) {
+          MemoryConsumer c = sortedList.get(listIndex);
+          // Try to only spill on the consumer which has the required size of memory.
+          // As the consumers are sorted in descending order, if the next consumer doesn't have
+          // the required memory, then we need to spill the current consumer at least.
+          boolean doSpill = (listIndex + 1) == sortedList.size() ||
+            sortedList.get(listIndex + 1).getUsed() < (required - got);
+          if (doSpill) {
--- End diff --


I like the fact that this implementation does not need to incur the cost of a remove in a TreeMap.
Unfortunately, I don't think it is sufficient: the implementation assumes that spill() will always give back getUsed() - from the rest of the code in the method, this does not look like a valid assumption to make.

This can result in spilling a large number of smaller blocks, and potentially the requesting consumer itself.

For example: required = 500MB, consumers = 1.5GB, 1GB, 500MB, 2MB, 1MB, ...
If spilling the 500MB consumer resulted in (say) releasing only 490MB, we might end up spilling a large number of blocks, (potentially) end up spilling the requesting consumer itself, and can also end up returning less than requested while enough memory does exist to satisfy the request.
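
To make the failure mode concrete, here is a minimal, self-contained sketch (not Spark's actual code; the consumer sizes and the 98% yield fraction are made-up assumptions) of the victim-selection loop above when spill() returns less than getUsed():

```
// Sketch only: models the quoted selection logic outside of Spark.
object SpillSketch {
  // Hypothetical consumer: `used` is the reported usage; `yieldFraction`
  // models spill() releasing less memory than getUsed() reports.
  final case class Consumer(used: Long, yieldFraction: Double) {
    def spill(): Long = (used * yieldFraction).toLong
  }

  // Walk consumers in descending order of usage and spill the current one
  // if the next one cannot cover the remaining need (the quoted doSpill rule).
  def acquire(required: Long, consumers: Seq[Consumer]): Long = {
    val sorted = consumers.sortBy(-_.used)
    var got = 0L
    var i = 0
    while (got < required && i < sorted.length) {
      val nextUsed = if (i + 1 < sorted.length) sorted(i + 1).used else 0L
      if (nextUsed < required - got) got += sorted(i).spill()
      i += 1
    }
    got
  }

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024L
    // required = 500MB against consumers of 1.5GB, 1GB, 500MB, 2MB, 1MB.
    val consumers = Seq(1500L, 1000L, 500L, 2L, 1L).map(s => Consumer(s * mb, 0.98))
    val got = acquire(500 * mb, consumers)
    // The 500MB consumer yields only ~490MB, so the loop cascades into the
    // tiny consumers and still returns less than requested (~493MB), even
    // though the untouched 1.5GB and 1GB consumers could have satisfied it.
    println(s"got ${got / mb} MB of requested 500 MB")
  }
}
```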





[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

2017-01-17 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16613#discussion_r96580856
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -275,21 +286,80 @@ case class AlterViewAsCommand(
       throw new AnalysisException(s"${viewMeta.identifier} is not a view.")
     }
 
-    val viewSQL: String = new SQLBuilder(analyzedPlan).toSQL
-    // Validate the view SQL - make sure we can parse it and analyze it.
-    // If we cannot analyze the generated query, there is probably a bug in SQL generation.
-    try {
-      session.sql(viewSQL).queryExecution.assertAnalyzed()
-    } catch {
-      case NonFatal(e) =>
-        throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e)
-    }
+    val newProperties = generateViewProperties(viewMeta.properties, session, analyzedPlan)
 
     val updatedViewMeta = viewMeta.copy(
       schema = analyzedPlan.schema,
+      properties = newProperties,
       viewOriginalText = Some(originalText),
-      viewText = Some(viewSQL))
+      viewText = Some(originalText))
 
     session.sessionState.catalog.alterTable(updatedViewMeta)
   }
 }
+
+object ViewHelper {
+
+  import CatalogTable._
+
+  /**
+   * Generate the view default database in `properties`.
+   */
+  def generateViewDefaultDatabase(databaseName: String): Map[String, String] = {
+    Map(VIEW_DEFAULT_DATABASE -> databaseName)
+  }
+
+  /**
+   * Generate the view query output column names in `properties`.
+   */
+  def generateQueryColumnNames(columns: Seq[String]): Map[String, String] = {
+    val props = new mutable.HashMap[String, String]
+    if (columns.nonEmpty) {
+      props.put(VIEW_QUERY_OUTPUT_NUM_COLUMNS, columns.length.toString)
+      columns.zipWithIndex.foreach { case (colName, index) =>
+        props.put(s"$VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX$index", colName)
+      }
+    }
+    props.toMap
+  }
+
+  /**
+   * Remove the view query output column names in `properties`.
+   */
+  def removeQueryColumnNames(properties: Map[String, String]): Map[String, String] = {
+    // We can't use `filterKeys` here, as the map returned by `filterKeys` is not serializable,
+    // while `CatalogTable` should be serializable.
+    properties.filterNot { case (key, _) =>
+      key.startsWith(VIEW_QUERY_OUTPUT_PREFIX)
+    }
+  }
+
+  /**
+   * Generate the view properties in CatalogTable, including:
+   * 1. view default database that is used to provide the default database name on view resolution.
+   * 2. the output column names of the query that creates a view, this is used to map the output of
+   *    the view child to the view output during view resolution.
+   *
+   * @param properties the `properties` in CatalogTable.
+   * @param session the spark session.
+   * @param analyzedPlan the analyzed logical plan that represents the child of a view.
+   * @return new view properties including view default database and query column names properties.
+   */
+  def generateViewProperties(
--- End diff --

yea, will update that.
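
As a side note for readers, the property round trip that `generateQueryColumnNames` and `removeQueryColumnNames` in the quoted diff implement can be sketched standalone (the key strings below are simplified assumptions, not Spark's actual constants):

```
// Standalone sketch of storing/removing view query output column names in
// table properties; the key names here are illustrative assumptions only.
object ViewPropsSketch {
  val ViewQueryOutputPrefix = "view.query.out."       // assumed prefix
  val NumColsKey = ViewQueryOutputPrefix + "numCols"  // assumed key
  val ColNamePrefix = ViewQueryOutputPrefix + "col."  // assumed prefix

  def generateQueryColumnNames(columns: Seq[String]): Map[String, String] =
    if (columns.isEmpty) Map.empty
    else Map(NumColsKey -> columns.length.toString) ++
      columns.zipWithIndex.map { case (name, i) => s"$ColNamePrefix$i" -> name }

  def removeQueryColumnNames(props: Map[String, String]): Map[String, String] =
    props.filterNot { case (key, _) => key.startsWith(ViewQueryOutputPrefix) }

  def main(args: Array[String]): Unit = {
    val props = generateQueryColumnNames(Seq("col1", "col2"))
    println(props)                         // numCols -> 2, col.0 -> col1, col.1 -> col2
    println(removeQueryColumnNames(props)) // Map()
  }
}
```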





[GitHub] spark issue #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema order a...

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16606
  
**[Test build #71580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71580/testReport)** for PR 16606 at commit [`5e60f14`](https://github.com/apache/spark/commit/5e60f1417f6b85e2f4fbab86d6b506d0cc2a553b).





[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

2017-01-17 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16613#discussion_r96580761
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -207,29 +210,35 @@ case class CreateViewCommand(
   }
 
   /**
-   * Returns a [[CatalogTable]] that can be used to save in the catalog. This comment canonicalize
-   * SQL based on the analyzed plan, and also creates the proper schema for the view.
+   * Returns a [[CatalogTable]] that can be used to save in the catalog. Generate the view-specific
+   * properties(e.g. view default database, view query output column names) and store them as
+   * properties in the CatalogTable, and also creates the proper schema for the view.
+   *
+   * @param session the spark session.
+   * @param aliasedPlan if `userSpecifiedColumns` is defined, the aliased plan outputs the user
+   *                    specified columns, else it is the same as the `analyzedPlan`.
+   * @param analyzedPlan the analyzed logical plan that represents the child of a view.
--- End diff --

We generate the `queryColumnNames` from the `analyzedPlan`, and we generate the view schema from the `aliasedPlan`; they are not the same when `userSpecifiedColumns` is defined.
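
To illustrate with a made-up example (not from this PR): for `CREATE VIEW v (a, b) AS SELECT col1, col2 FROM t`, the `aliasedPlan` outputs `a, b` while the `analyzedPlan` outputs `col1, col2`. A tiny sketch of the positional mapping performed at view resolution:

```
// Hypothetical names; shows how the stored query output columns map onto the
// user-specified view columns by position.
object ViewColumnMappingSketch {
  def main(args: Array[String]): Unit = {
    val analyzedOutput = Seq("col1", "col2") // query output, stored in properties
    val aliasedOutput  = Seq("a", "b")       // user-specified columns, view schema
    aliasedOutput.zip(analyzedOutput).foreach { case (viewCol, queryCol) =>
      println(s"view column $viewCol <- query column $queryCol")
    }
  }
}
```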





[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16613#discussion_r96580350
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -275,21 +286,80 @@ case class AlterViewAsCommand(
       throw new AnalysisException(s"${viewMeta.identifier} is not a view.")
     }
 
-    val viewSQL: String = new SQLBuilder(analyzedPlan).toSQL
-    // Validate the view SQL - make sure we can parse it and analyze it.
-    // If we cannot analyze the generated query, there is probably a bug in SQL generation.
-    try {
-      session.sql(viewSQL).queryExecution.assertAnalyzed()
-    } catch {
-      case NonFatal(e) =>
-        throw new RuntimeException(s"Failed to analyze the canonicalized SQL: $viewSQL", e)
-    }
+    val newProperties = generateViewProperties(viewMeta.properties, session, analyzedPlan)
 
     val updatedViewMeta = viewMeta.copy(
       schema = analyzedPlan.schema,
+      properties = newProperties,
       viewOriginalText = Some(originalText),
-      viewText = Some(viewSQL))
+      viewText = Some(originalText))
 
     session.sessionState.catalog.alterTable(updatedViewMeta)
   }
 }
+
+object ViewHelper {
+
+  import CatalogTable._
+
+  /**
+   * Generate the view default database in `properties`.
+   */
+  def generateViewDefaultDatabase(databaseName: String): Map[String, String] = {
+    Map(VIEW_DEFAULT_DATABASE -> databaseName)
+  }
+
+  /**
+   * Generate the view query output column names in `properties`.
+   */
+  def generateQueryColumnNames(columns: Seq[String]): Map[String, String] = {
+    val props = new mutable.HashMap[String, String]
+    if (columns.nonEmpty) {
+      props.put(VIEW_QUERY_OUTPUT_NUM_COLUMNS, columns.length.toString)
+      columns.zipWithIndex.foreach { case (colName, index) =>
+        props.put(s"$VIEW_QUERY_OUTPUT_COLUMN_NAME_PREFIX$index", colName)
+      }
+    }
+    props.toMap
+  }
+
+  /**
+   * Remove the view query output column names in `properties`.
+   */
+  def removeQueryColumnNames(properties: Map[String, String]): Map[String, String] = {
+    // We can't use `filterKeys` here, as the map returned by `filterKeys` is not serializable,
+    // while `CatalogTable` should be serializable.
+    properties.filterNot { case (key, _) =>
+      key.startsWith(VIEW_QUERY_OUTPUT_PREFIX)
+    }
+  }
+
+  /**
+   * Generate the view properties in CatalogTable, including:
+   * 1. view default database that is used to provide the default database name on view resolution.
+   * 2. the output column names of the query that creates a view, this is used to map the output of
+   *    the view child to the view output during view resolution.
+   *
+   * @param properties the `properties` in CatalogTable.
+   * @param session the spark session.
+   * @param analyzedPlan the analyzed logical plan that represents the child of a view.
+   * @return new view properties including view default database and query column names properties.
+   */
+  def generateViewProperties(
--- End diff --

looks like all other methods in this class can be `private`?





[GitHub] spark pull request #16613: [SPARK-19024][SQL] Implement new approach to writ...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16613#discussion_r96580275
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -207,29 +210,35 @@ case class CreateViewCommand(
   }
 
   /**
-   * Returns a [[CatalogTable]] that can be used to save in the catalog. This comment canonicalize
-   * SQL based on the analyzed plan, and also creates the proper schema for the view.
+   * Returns a [[CatalogTable]] that can be used to save in the catalog. Generate the view-specific
+   * properties(e.g. view default database, view query output column names) and store them as
+   * properties in the CatalogTable, and also creates the proper schema for the view.
+   *
+   * @param session the spark session.
+   * @param aliasedPlan if `userSpecifiedColumns` is defined, the aliased plan outputs the user
+   *                    specified columns, else it is the same as the `analyzedPlan`.
+   * @param analyzedPlan the analyzed logical plan that represents the child of a view.
--- End diff --

why we need both `aliasedPlan` and `analyzedPlan`?





[GitHub] spark issue #16613: [SPARK-19024][SQL] Implement new approach to write a per...

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16613
  
**[Test build #71579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71579/testReport)** for PR 16613 at commit [`2d49ef2`](https://github.com/apache/spark/commit/2d49ef26936448dd70768562c4ef429542f56e4e).





[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

2017-01-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16621
  
After merging https://github.com/apache/spark/pull/16517, this PR has a few conflicts.





[GitHub] spark pull request #16517: [SPARK-18243][SQL] Port Hive writing to use FileF...

2017-01-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16517





[GitHub] spark issue #16517: [SPARK-18243][SQL] Port Hive writing to use FileFormat i...

2017-01-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16517
  
Thanks! Merged to master. 





[GitHub] spark issue #16624: [WIP] Add two test cases for `SET -v`.

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16624
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71571/
Test FAILed.





[GitHub] spark issue #16624: [WIP] Add two test cases for `SET -v`.

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16624
  
**[Test build #71571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71571/testReport)** for PR 16624 at commit [`87de8da`](https://github.com/apache/spark/commit/87de8da846a7b4d368c1475ba3fc3d83cc865220).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16624: [WIP] Add two test cases for `SET -v`.

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16624
  
Merged build finished. Test FAILed.





[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...

2017-01-17 Thread paragpc
Github user paragpc commented on the issue:

https://github.com/apache/spark/pull/11867
  
I am not sure why the build is failing with the following error:

stderr: fatal: unable to access 'https://github.com/apache/spark.git/': Failed connect to github.com:443; Operation now in progress

at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1640)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandWithCredentials(CliGitAPIImpl.java:1388)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$300(CliGitAPIImpl.java:62)
at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$1.execute(CliGitAPIImpl.java:313)
at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:152)
at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler$1.call(RemoteGitImpl.java:145)
at hudson.remoting.UserRequest.perform(UserRequest.java:120)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:326)
at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:68)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
at ..remote call to amp-jenkins-worker-04(Native Method)
at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1416)
at hudson.remoting.UserResponse.retrieve(UserRequest.java:220)
at hudson.remoting.Channel.call(Channel.java:781)
at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.execute(RemoteGitImpl.java:145)
at sun.reflect.GeneratedMethodAccessor287.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.jenkinsci.plugins.gitclient.RemoteGitImpl$CommandInvocationHandler.invoke(RemoteGitImpl.java:131)
at com.sun.proxy.$Proxy58.execute(Unknown Source)
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:761)
... 11 more

The error does not seem related to my changes; can anyone help? cc @vanzin, @zsxwing, @squito





[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11867
  
Merged build finished. Test FAILed.





[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11867
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71578/
Test FAILed.





[GitHub] spark issue #16517: [SPARK-18243][SQL] Port Hive writing to use FileFormat i...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16517
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71569/
Test PASSed.





[GitHub] spark issue #16517: [SPARK-18243][SQL] Port Hive writing to use FileFormat i...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16517
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16517: [SPARK-18243][SQL] Port Hive writing to use FileFormat i...

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16517
  
**[Test build #71569 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71569/testReport)** for PR 16517 at commit [`150efa2`](https://github.com/apache/spark/commit/150efa2266f298205b272e9347032b2a85ab665c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class HiveFileFormat(fileSinkConf: FileSinkDesc) extends FileFormat with Logging `





[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-01-17 Thread actuaryzhang
Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/16630
  
The following code illustrates the idea of this PR. 

```
val datasetWithWeight = Seq(
  (1.0, 1.0, 0.0, 5.0),
  (0.5, 2.0, 1.0, 2.0),
  (1.0, 3.0, 2.0, 1.0),
  (0.0, 4.0, 3.0, 3.0)
).toDF("y", "w", "x1", "x2")

val formula = new RFormula()
  .setFormula("y ~ x1 + x2")
  .setFeaturesCol("features")
  .setLabelCol("label")
val output = formula.fit(datasetWithWeight).transform(datasetWithWeight)

val glr = new GeneralizedLinearRegression()
val model = glr.fit(output)
model.summary.summaryTable.show
```

This prints out: 
```

+---------+--------------------+-------------------+-------------------+-------------------+
|  Feature|            Estimate|           StdError|             TValue|             PValue|
+---------+--------------------+-------------------+-------------------+-------------------+
|Intercept|  1.4523809523809539| 0.9245946589975053| 1.5708299180050451| 0.3609009059280113|
|       x1|            -0.33387|0.28171808490950573|-1.1832159566199243|0.44669962096188565|
|       x2|-0.11904761904761924|       0.2129588548|-0.5590169943749482| 0.6754896416955616|
+---------+--------------------+-------------------+-------------------+-------------------+
```





[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11867
  
Merged build finished. Test FAILed.





[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11867
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71577/
Test FAILed.





[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16630
  
Can one of the admins verify this patch?





[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16605
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71570/
Test PASSed.





[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16605
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16605
  
**[Test build #71570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71570/testReport)** for PR 16605 at commit [`c5d8070`](https://github.com/apache/spark/commit/c5d80701cc5429841534c980030f983e9e941e46).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16630: [SPARK-19270][ML] Add summary table to GLM summar...

2017-01-17 Thread actuaryzhang
GitHub user actuaryzhang opened a pull request:

https://github.com/apache/spark/pull/16630

[SPARK-19270][ML] Add summary table to GLM summary

## What changes were proposed in this pull request?

Add an R-like summary table to the GLM summary, which includes the feature names (if they exist), parameter estimates, standard errors, t-statistics and p-values. This allows Scala users to easily gather these commonly used inference results.

@srowen @yanboliang 

## How was this patch tested?
New tests: one for the feature names, and one for the summary table.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/actuaryzhang/spark glmTable

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16630.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16630









[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16621#discussion_r96577649
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -215,37 +215,43 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
 
 
 /**
- * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data
- * source information.
+ * Replaces [[SimpleCatalogRelation]] with data source table if its table provider is not hive.
  */
 class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
-  private def readDataSourceTable(
-      sparkSession: SparkSession,
-      simpleCatalogRelation: SimpleCatalogRelation): LogicalPlan = {
-    val table = simpleCatalogRelation.catalogTable
-    val pathOption = table.storage.locationUri.map("path" -> _)
-    val dataSource =
-      DataSource(
-        sparkSession,
-        userSpecifiedSchema = Some(table.schema),
-        partitionColumns = table.partitionColumnNames,
-        bucketSpec = table.bucketSpec,
-        className = table.provider.get,
-        options = table.storage.properties ++ pathOption)
-
-    LogicalRelation(
-      dataSource.resolveRelation(),
-      expectedOutputAttributes = Some(simpleCatalogRelation.output),
-      catalogTable = Some(table))
+  private def readDataSourceTable(table: CatalogTable): LogicalPlan = {
+    val qualifiedTableName = QualifiedTableName(table.database, table.identifier.table)
+    val cache = sparkSession.sessionState.catalog.tableRelationCache
+    val withHiveSupport =
+      sparkSession.sparkContext.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION) == "hive"
+
+    cache.get(qualifiedTableName, new Callable[LogicalPlan]() {
+      override def call(): LogicalPlan = {
+        val pathOption = table.storage.locationUri.map("path" -> _)
+        val dataSource =
+          DataSource(
+            sparkSession,
+            // In older version(prior to 2.1) of Spark, the table schema can be empty and should be
+            // inferred at runtime. We should still support it.
+            userSpecifiedSchema = if (table.schema.isEmpty) None else Some(table.schema),
+            partitionColumns = table.partitionColumnNames,
+            bucketSpec = table.bucketSpec,
+            className = table.provider.get,
+            options = table.storage.properties ++ pathOption,
+            // TODO: improve `InMemoryCatalog` and remove this limitation.
+            catalogTable = if (withHiveSupport) Some(table) else None)
+
+        LogicalRelation(dataSource.resolveRelation(), catalogTable = Some(table))
--- End diff --

cc @wzhfy 
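
For context, the `cache.get(key, Callable)` call in the quoted diff is Guava's get-with-loader pattern; a minimal sketch follows (the key and value types are simplified stand-ins for `QualifiedTableName` and `LogicalPlan`):

```
import java.util.concurrent.Callable
import com.google.common.cache.{Cache, CacheBuilder}

object RelationCacheSketch {
  final case class QualifiedName(database: String, table: String)

  // Bounded cache, analogous in spirit to the table relation cache.
  val cache: Cache[QualifiedName, String] =
    CacheBuilder.newBuilder().maximumSize(1000).build[QualifiedName, String]()

  def lookup(name: QualifiedName): String =
    cache.get(name, new Callable[String] {
      // Runs only on a cache miss; later lookups return the cached value.
      override def call(): String = s"resolved relation for ${name.database}.${name.table}"
    })

  def main(args: Array[String]): Unit = {
    println(lookup(QualifiedName("default", "t"))) // computed on first access
    println(lookup(QualifiedName("default", "t"))) // served from the cache
  }
}
```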





[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16585
  
LGTM, pending jenkins





[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16621#discussion_r96577543
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
@@ -1322,4 +1322,26 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
       sparkSession.sparkContext.conf.set(DEBUG_MODE, previousValue)
     }
   }
+
+  test("SPARK-18464: support old table which doesn't store schema in table properties") {
--- End diff --

this test was removed in https://github.com/apache/spark/pull/16003, but I find it is still useful and not covered by other tests, so I am adding it back.





[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16621#discussion_r96577471
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -1626,17 +1626,6 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     assert(d.size == d.distinct.size)
   }
 
-  test("SPARK-17625: data source table in InMemoryCatalog should guarantee output consistency") {
--- End diff --

we don't need this test anymore, see https://github.com/apache/spark/pull/16621/files#r96577427





[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16621#discussion_r96577427
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -215,37 +215,43 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
 
 
 /**
- * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data
- * source information.
+ * Replaces [[SimpleCatalogRelation]] with data source table if its table provider is not hive.
  */
 class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
-  private def readDataSourceTable(
-      sparkSession: SparkSession,
-      simpleCatalogRelation: SimpleCatalogRelation): LogicalPlan = {
-    val table = simpleCatalogRelation.catalogTable
-    val pathOption = table.storage.locationUri.map("path" -> _)
-    val dataSource =
-      DataSource(
-        sparkSession,
-        userSpecifiedSchema = Some(table.schema),
-        partitionColumns = table.partitionColumnNames,
-        bucketSpec = table.bucketSpec,
-        className = table.provider.get,
-        options = table.storage.properties ++ pathOption)
-
-    LogicalRelation(
-      dataSource.resolveRelation(),
-      expectedOutputAttributes = Some(simpleCatalogRelation.output),
-      catalogTable = Some(table))
+  private def readDataSourceTable(table: CatalogTable): LogicalPlan = {
+    val qualifiedTableName = QualifiedTableName(table.database, table.identifier.table)
+    val cache = sparkSession.sessionState.catalog.tableRelationCache
+    val withHiveSupport =
+      sparkSession.sparkContext.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION) == "hive"
+
+    cache.get(qualifiedTableName, new Callable[LogicalPlan]() {
+      override def call(): LogicalPlan = {
+        val pathOption = table.storage.locationUri.map("path" -> _)
+        val dataSource =
+          DataSource(
+            sparkSession,
+            // In older version(prior to 2.1) of Spark, the table schema can be empty and should be
+            // inferred at runtime. We should still support it.
+            userSpecifiedSchema = if (table.schema.isEmpty) None else Some(table.schema),
+            partitionColumns = table.partitionColumnNames,
+            bucketSpec = table.bucketSpec,
+            className = table.provider.get,
+            options = table.storage.properties ++ pathOption,
+            // TODO: improve `InMemoryCatalog` and remove this limitation.
+            catalogTable = if (withHiveSupport) Some(table) else None)
+
+        LogicalRelation(dataSource.resolveRelation(), catalogTable = Some(table))

Note that, previously we would set `expectedOutputAttributes` here, which was added by https://github.com/apache/spark/pull/15182.

However, this doesn't work when the table schema needs to be inferred at runtime, and it turns out that we don't need to do it at all. `AnalyzeColumnCommand` now gets attributes from the [resolved table relation plan](https://github.com/apache/spark/pull/16621/files#diff-027d6bd7c8cf4f64f99acc058389d859R44), so it's fine for rule `FindDataSourceTable` to change outputs during analysis.





[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16605
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71568/
Test PASSed.





[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16605
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16605
  
**[Test build #71568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71568/testReport)** for PR 16605 at commit [`22fb9d1`](https://github.com/apache/spark/commit/22fb9d14abcf7b2590c07739c2ce9641abb64ea5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16621#discussion_r96576872
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -586,12 +594,12 @@ class SessionCatalog(
             desc = metadata,
             output = metadata.schema.toAttributes,
             child = parser.parsePlan(viewText))
-          SubqueryAlias(relationAlias, child, Option(name))
+          SubqueryAlias(relationAlias, child, Some(name.copy(table = table, database = Some(db))))
         } else {
           SubqueryAlias(relationAlias, SimpleCatalogRelation(metadata), None)
         }
       } else {
-        SubqueryAlias(relationAlias, tempTables(table), Option(name))
+        SubqueryAlias(relationAlias, tempTables(table), None)
--- End diff --

the existing way is to set `None`, see https://github.com/apache/spark/pull/16621/files#diff-ca4533edbf148c89cc0c564ab6b0aeaaL75

This shows the evil of duplicated code: we have inconsistent behaviors with and without hive support. I think we should only set the table identifier for a persisted view, @hvanhovell is that true?





[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16621
  
**[Test build #71576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71576/testReport)** for PR 16621 at commit [`d636389`](https://github.com/apache/spark/commit/d636389947af3041832c63582f9073b92421d7f0).





[GitHub] spark issue #16517: [SPARK-18243][SQL] Port Hive writing to use FileFormat i...

2017-01-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16517
  
No concern after the latest changes. LGTM pending Jenkins





[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

2017-01-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16621
  
No more comments. It looks pretty good! Let us see whether all the test 
cases can pass. 





[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

2017-01-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16621#discussion_r96575899
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -1799,6 +1799,7 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
       .getTableMetadata(TableIdentifier("tbl")).storage.locationUri.get
 
     sql(s"ALTER TABLE tbl SET LOCATION '${dir.getCanonicalPath}'")
+    spark.catalog.refreshTable("tbl")
--- End diff --

+1 :)





[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

2017-01-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16621#discussion_r96575520
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -586,12 +594,12 @@ class SessionCatalog(
             desc = metadata,
             output = metadata.schema.toAttributes,
             child = parser.parsePlan(viewText))
-          SubqueryAlias(relationAlias, child, Option(name))
+          SubqueryAlias(relationAlias, child, Some(name.copy(table = table, database = Some(db))))
         } else {
           SubqueryAlias(relationAlias, SimpleCatalogRelation(metadata), None)
         }
       } else {
-        SubqueryAlias(relationAlias, tempTables(table), Option(name))
+        SubqueryAlias(relationAlias, tempTables(table), None)
--- End diff --

Should we keep the existing way? This was introduced for the EXPLAIN command on views; see the PR: https://github.com/apache/spark/pull/14657





[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16621
  
**[Test build #71574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71574/testReport)** for PR 16621 at commit [`bbccdae`](https://github.com/apache/spark/commit/bbccdae6640a5efe047dba2df569384a78bd986e).





[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16585
  
**[Test build #71575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71575/testReport)** for PR 16585 at commit [`2b61d47`](https://github.com/apache/spark/commit/2b61d472a74766d4a1c2af4cf2278a87b7b12698).





[GitHub] spark pull request #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHol...

2017-01-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16585#discussion_r96574565
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -435,6 +435,31 @@ def test_udf_with_input_file_name(self):
         row = self.spark.read.json(filePath).select(sourceFile(input_file_name())).first()
         self.assertTrue(row[0].find("people1.json") != -1)
 
+    def test_udf_with_input_file_name_for_hadooprdd(self):
+        from pyspark.sql.functions import udf, input_file_name
+        from pyspark.sql.types import StringType
+
+        def filename(path):
+            return path
+
+        self.spark.udf.register('sameText', filename)
--- End diff --

oh. wrongly copied.





[GitHub] spark pull request #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHol...

2017-01-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16585#discussion_r96574569
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -435,6 +435,31 @@ def test_udf_with_input_file_name(self):
         row = self.spark.read.json(filePath).select(sourceFile(input_file_name())).first()
         self.assertTrue(row[0].find("people1.json") != -1)
 
+    def test_udf_with_input_file_name_for_hadooprdd(self):
+        from pyspark.sql.functions import udf, input_file_name
+        from pyspark.sql.types import StringType
+
+        def filename(path):
+            return path
+
+        self.spark.udf.register('sameText', filename)
+        sameText = udf(filename, StringType())
+
+        rdd = self.sc.textFile('python/test_support/sql/people.json')
+        df = self.spark.read.json(rdd).select(input_file_name().alias('file'))
+        row = df.select(sameText(df['file'])).first()
+        self.assertTrue(row[0].find("people.json") != -1)
+
+        rdd2 = self.sc.newAPIHadoopFile(
+            'python/test_support/sql/people.json',
+            'org.apache.hadoop.mapreduce.lib.input.TextInputFormat',
+            'org.apache.hadoop.io.LongWritable',
+            'org.apache.hadoop.io.Text')
+
+        df2 = self.spark.read.json(rdd2).select(input_file_name().alias('file'))
+        row = df2.select(sameText(df2['file'])).first()
--- End diff --

sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12064
  
**[Test build #71573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71573/testReport)** for PR 12064 at commit [`eebae43`](https://github.com/apache/spark/commit/eebae43c84a1179260648f7f5cdbb63a60fcc40d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16621
  
**[Test build #71572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71572/testReport)** for PR 16621 at commit [`2883c8b`](https://github.com/apache/spark/commit/2883c8bc6c22bfde24711060b5110b896f4c8b4a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16603: [SPARK-19244][Core] Sort MemoryConsumers accordin...

2017-01-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16603#discussion_r96574267
  
--- Diff: core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java ---
@@ -144,23 +170,31 @@ public long acquireExecutionMemory(long required, MemoryConsumer consumer) {
       // spilling, avoid to have too many spilled files.
       if (got < required) {
         // Call spill() on other consumers to release memory
+        // Sort the consumers according their memory usage. So we avoid spilling the same consumer
+        // which is just spilled in last few times and re-spilling on it will produce many small
+        // spill files.
+        List<MemoryConsumer> sortedList = new ArrayList<>();
         for (MemoryConsumer c: consumers) {
           if (c != consumer && c.getUsed() > 0 && c.getMode() == mode) {
-            try {
-              long released = c.spill(required - got, consumer);
-              if (released > 0) {
-                logger.debug("Task {} released {} from {} for {}", taskAttemptId,
-                  Utils.bytesToString(released), c, consumer);
-                got += memoryManager.acquireExecutionMemory(required - got, taskAttemptId, mode);
-                if (got >= required) {
-                  break;
-                }
+            sortedList.add(c);
+          }
+        }
+        Collections.sort(sortedList, new ConsumerComparator());
+        for (MemoryConsumer c: sortedList) {
+          try {
+            long released = c.spill(required - got, consumer);
+            if (released > 0) {
+              logger.debug("Task {} released {} from {} for {}", taskAttemptId,
+                Utils.bytesToString(released), c, consumer);
+              got += memoryManager.acquireExecutionMemory(required - got, taskAttemptId, mode);
+              if (got >= required) {
+                break;
               }
-            } catch (IOException e) {
-              logger.error("error while calling spill() on " + c, e);
-              throw new OutOfMemoryError("error while calling spill() on " + c + " : "
-                + e.getMessage());
             }
+          } catch (IOException e) {
+            logger.error("error while calling spill() on " + c, e);
+            throw new OutOfMemoryError("error while calling spill() on " + c + " : "
+              + e.getMessage());
           }
--- End diff --

Actually the newest update already satisfies the example you showed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16576: [SPARK-19215] Add necessary check for `RDD.checkp...

2017-01-17 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16576#discussion_r96573963
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -1539,6 +1539,9 @@ abstract class RDD[T: ClassTag](
     // NOTE: we use a global lock here due to complexities downstream with ensuring
     // children RDD partitions point to the correct parent partitions. In the future
     // we should revisit this consideration.
+    if (doCheckpointCalled) {
+      logWarning(s"Because job has been executed on RDD ${id}, checkpoint won't work")
--- End diff --

re-ping @zsxwing - would you mind taking a look? This is a simple PR, but it will greatly help Spark developers avoid a common usage mistake.
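
For context, a minimal sketch of the pitfall the new warning targets (illustrative only, not code from the PR; it assumes an active SparkContext `sc` with a checkpoint directory configured):

```
// Pitfall: checkpoint() requested after a job has already run on the RDD.
sc.setCheckpointDir("/tmp/checkpoints")

val bad = sc.parallelize(1 to 100).map(_ * 2)
bad.count()        // a job has now been executed on this RDD...
bad.checkpoint()   // ...so this request is silently ignored (hence the warning)

// Correct order: mark for checkpointing before the first action.
val good = sc.parallelize(1 to 100).map(_ * 2)
good.checkpoint()
good.count()       // this job also materializes the checkpoint
```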


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16621
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71567/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16603: [SPARK-19244][Core] Sort MemoryConsumers accordin...

2017-01-17 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/16603#discussion_r96572902
  
--- Diff: core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java ---
@@ -144,23 +170,31 @@ public long acquireExecutionMemory(long required, MemoryConsumer consumer) {
       // spilling, avoid to have too many spilled files.
       if (got < required) {
         // Call spill() on other consumers to release memory
+        // Sort the consumers according their memory usage. So we avoid spilling the same consumer
+        // which is just spilled in last few times and re-spilling on it will produce many small
+        // spill files.
+        List<MemoryConsumer> sortedList = new ArrayList<>();
         for (MemoryConsumer c: consumers) {
           if (c != consumer && c.getUsed() > 0 && c.getMode() == mode) {
-            try {
-              long released = c.spill(required - got, consumer);
-              if (released > 0) {
-                logger.debug("Task {} released {} from {} for {}", taskAttemptId,
-                  Utils.bytesToString(released), c, consumer);
-                got += memoryManager.acquireExecutionMemory(required - got, taskAttemptId, mode);
-                if (got >= required) {
-                  break;
-                }
+            sortedList.add(c);
+          }
+        }
+        Collections.sort(sortedList, new ConsumerComparator());
+        for (MemoryConsumer c: sortedList) {
+          try {
+            long released = c.spill(required - got, consumer);
+            if (released > 0) {
+              logger.debug("Task {} released {} from {} for {}", taskAttemptId,
+                Utils.bytesToString(released), c, consumer);
+              got += memoryManager.acquireExecutionMemory(required - got, taskAttemptId, mode);
+              if (got >= required) {
+                break;
              }
-            } catch (IOException e) {
-              logger.error("error while calling spill() on " + c, e);
-              throw new OutOfMemoryError("error while calling spill() on " + c + " : "
-                + e.getMessage());
            }
+          } catch (IOException e) {
+            logger.error("error while calling spill() on " + c, e);
+            throw new OutOfMemoryError("error while calling spill() on " + c + " : "
+              + e.getMessage());
          }
--- End diff --


Use ceiling and not floor.
Ensure that the requirements are satisfied: what I wrote was on the fly to convey the idea and not meant to be used literally - and apparently there were some errors. I have edited the examples so that there is no further confusion.

The basic idea is simple: instead of picking the largest or a random consumer, pick one that is sufficient to meet the memory requirement. If none exists, evict the largest and retry until done.
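
A minimal, self-contained sketch of that selection strategy (illustrative only, not the PR's code; `Consumer` stands in for `MemoryConsumer`, and `spill` is assumed to possibly release less than requested, which is exactly the case discussed above):

```
// Pick the smallest consumer that alone covers the outstanding need;
// if none exists, spill the largest and retry with the rest.
trait Consumer {
  def used: Long
  def spill(size: Long): Long  // may release less than `size`
}

def acquireBySpilling(consumers: Seq[Consumer], need: Long): Long = {
  var got = 0L
  var remaining = consumers.sortBy(_.used)    // ascending by memory usage
  while (got < need && remaining.nonEmpty) {
    val candidate = remaining
      .find(_.used >= need - got)             // smallest sufficient consumer
      .getOrElse(remaining.last)              // otherwise evict the largest
    got += candidate.spill(need - got)
    remaining = remaining.filterNot(_ eq candidate)
  }
  got
}
```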




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

2017-01-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16621
  
Sure. No problem. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16621
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16621
  
can we do it later? We are going to merge `CatalogRelation` implementations 
and unify the table relation representations soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16621
  
**[Test build #71567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71567/testReport)** for PR 16621 at commit [`919aaa2`](https://github.com/apache/spark/commit/919aaa2fbdf21fb4760a855c538e4bc9efa25d4b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class QualifiedTableName(database: String, name: String)`
  * `class FindHiveSerdeTable(session: SparkSession) extends Rule[LogicalPlan] `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHol...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16585#discussion_r96572553
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -435,6 +435,31 @@ def test_udf_with_input_file_name(self):
         row = self.spark.read.json(filePath).select(sourceFile(input_file_name())).first()
         self.assertTrue(row[0].find("people1.json") != -1)
 
+    def test_udf_with_input_file_name_for_hadooprdd(self):
+        from pyspark.sql.functions import udf, input_file_name
+        from pyspark.sql.types import StringType
+
+        def filename(path):
+            return path
+
+        self.spark.udf.register('sameText', filename)
+        sameText = udf(filename, StringType())
+
+        rdd = self.sc.textFile('python/test_support/sql/people.json')
+        df = self.spark.read.json(rdd).select(input_file_name().alias('file'))
+        row = df.select(sameText(df['file'])).first()
+        self.assertTrue(row[0].find("people.json") != -1)
+
+        rdd2 = self.sc.newAPIHadoopFile(
+            'python/test_support/sql/people.json',
+            'org.apache.hadoop.mapreduce.lib.input.TextInputFormat',
+            'org.apache.hadoop.io.LongWritable',
+            'org.apache.hadoop.io.Text')
+
+        df2 = self.spark.read.json(rdd2).select(input_file_name().alias('file'))
+        row = df2.select(sameText(df2['file'])).first()
--- End diff --

nit: `row2`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHol...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16585#discussion_r96572514
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -435,6 +435,31 @@ def test_udf_with_input_file_name(self):
         row = self.spark.read.json(filePath).select(sourceFile(input_file_name())).first()
         self.assertTrue(row[0].find("people1.json") != -1)
 
+    def test_udf_with_input_file_name_for_hadooprdd(self):
+        from pyspark.sql.functions import udf, input_file_name
+        from pyspark.sql.types import StringType
+
+        def filename(path):
+            return path
+
+        self.spark.udf.register('sameText', filename)
--- End diff --

where do we call this registered function?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16621: [SPARK-19265][SQL] make table relation cache general and...

2017-01-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16621
  
Could we rename `SimpleCatalogRelation` to `UnresolvedCatalogRelation`? The 
current name looks very confusing to me. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16621#discussion_r96572359
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -215,37 +215,44 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
 
 
 /**
- * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data
- * source information.
+ * Replaces [[SimpleCatalogRelation]] with data source table if its table provider is not hive.
  */
 class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
-  private def readDataSourceTable(
-      sparkSession: SparkSession,
-      simpleCatalogRelation: SimpleCatalogRelation): LogicalPlan = {
-    val table = simpleCatalogRelation.catalogTable
-    val pathOption = table.storage.locationUri.map("path" -> _)
-    val dataSource =
-      DataSource(
-        sparkSession,
-        userSpecifiedSchema = Some(table.schema),
-        partitionColumns = table.partitionColumnNames,
-        bucketSpec = table.bucketSpec,
-        className = table.provider.get,
-        options = table.storage.properties ++ pathOption)
-
-    LogicalRelation(
-      dataSource.resolveRelation(),
-      expectedOutputAttributes = Some(simpleCatalogRelation.output),
-      catalogTable = Some(table))
+  private def readDataSourceTable(relation: SimpleCatalogRelation): LogicalPlan = {
+    val table = relation.catalogTable
+    val cache = sparkSession.sessionState.catalog.tableRelationCache
+    val withHiveSupport =
+      sparkSession.sparkContext.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION) == "hive"
+
+    cache.get(table.qualifiedIdentifier, new Callable[LogicalPlan]() {
+      override def call(): LogicalPlan = {
+        val pathOption = table.storage.locationUri.map("path" -> _)
+        val dataSource =
+          DataSource(
+            sparkSession,
+            userSpecifiedSchema = Some(table.schema),

good catch!
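
For readers following along, a minimal sketch of the Guava-cache pattern the new `readDataSourceTable` relies on (illustrative only; the `String` value stands in for the cached `LogicalPlan`, and the size bound is a made-up example):

```
import java.util.concurrent.Callable
import com.google.common.cache.{Cache, CacheBuilder}

// Key type mirroring the one the PR adds; the value stands in for a LogicalPlan.
case class QualifiedTableName(database: String, name: String)

val relationCache: Cache[QualifiedTableName, String] =
  CacheBuilder.newBuilder()
    .maximumSize(1000)  // illustrative bound only
    .build[QualifiedTableName, String]()

def resolve(key: QualifiedTableName): String =
  // get(key, loader) runs the loader only on a cache miss, so repeated
  // lookups of the same table reuse the already-resolved relation.
  relationCache.get(key, new Callable[String] {
    override def call(): String = s"resolved relation for ${key.database}.${key.name}"
  })

// refreshTable-style invalidation: drop the entry so the next lookup re-resolves.
def refresh(key: QualifiedTableName): Unit = relationCache.invalidate(key)
```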


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71564/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13599
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13599
  
**[Test build #71564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71564/testReport)** for PR 13599 at commit [`ea9e0c4`](https://github.com/apache/spark/commit/ea9e0c4e80ea568c066156e76cc8abefb911fb59).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16624: [WIP] Add two test cases for `SET -v`.

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16624
  
**[Test build #71571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71571/testReport)** for PR 16624 at commit [`87de8da`](https://github.com/apache/spark/commit/87de8da846a7b4d368c1475ba3fc3d83cc865220).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12064
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71566/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12064
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

2017-01-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16621#discussion_r96571473
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
@@ -650,14 +659,21 @@ class SessionCatalog(
    * Refresh the cache entry for a metastore table, if any.
    */
   def refreshTable(name: TableIdentifier): Unit = synchronized {
+    val dbName = formatDatabaseName(name.database.getOrElse(currentDb))
+    val tableName = formatTableName(name.table)
+
     // Go through temporary tables and invalidate them.
-    // If the database is defined, this is definitely not a temp table.
+    // If the database is defined, this may be a global temporary view.
     // If the database is not defined, there is a good chance this is a temp table.
     if (name.database.isEmpty) {
-      tempTables.get(formatTableName(name.table)).foreach(_.refresh())
-    } else if (formatDatabaseName(name.database.get) == globalTempViewManager.database) {
-      globalTempViewManager.get(formatTableName(name.table)).foreach(_.refresh())
+      tempTables.get(tableName).foreach(_.refresh())
+    } else if (dbName == globalTempViewManager.database) {
+      globalTempViewManager.get(tableName).foreach(_.refresh())
     }
+
+    // Also invalidate the table relation cache.
--- End diff --

After an offline discussion, I am fine with removing it. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12064: [SPARK-14272][ML] Evaluate GaussianMixtureModel with Log...

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12064
  
**[Test build #71566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71566/testReport)** for PR 12064 at commit [`fd85c5d`](https://github.com/apache/spark/commit/fd85c5d221a0cc52c8b5f4662182d487e34db63b).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16547: [SPARK-19168][Structured Streaming] StateStore should be...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16547
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71565/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16547: [SPARK-19168][Structured Streaming] StateStore should be...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16547
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16585
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71561/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16547: [SPARK-19168][Structured Streaming] StateStore should be...

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16547
  
**[Test build #71565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71565/testReport)** for PR 16547 at commit [`0f9e54d`](https://github.com/apache/spark/commit/0f9e54d9efe4c9d7f446cb2f4dc46741cef776f7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16585
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16585
  
**[Test build #71561 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71561/testReport)** for PR 16585 at commit [`2ce65cb`](https://github.com/apache/spark/commit/2ce65cb8336b32d8309f189e2c63a576c5a60ee5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setName for D...

2017-01-17 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16609
  
@emlyn I'm not sure this should be associated with RDD, since we are working with DataFrames here.
As for the existing `name` methods in RDD.R - they are not public APIs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16621: [SPARK-19265][SQL] make table relation cache gene...

2017-01-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16621#discussion_r96570432
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -215,37 +215,44 @@ case class DataSourceAnalysis(conf: CatalystConf) extends Rule[LogicalPlan] {
 
 
 /**
- * Replaces [[SimpleCatalogRelation]] with data source table if its table property contains data
- * source information.
+ * Replaces [[SimpleCatalogRelation]] with data source table if its table provider is not hive.
  */
 class FindDataSourceTable(sparkSession: SparkSession) extends Rule[LogicalPlan] {
-  private def readDataSourceTable(
-      sparkSession: SparkSession,
-      simpleCatalogRelation: SimpleCatalogRelation): LogicalPlan = {
-    val table = simpleCatalogRelation.catalogTable
-    val pathOption = table.storage.locationUri.map("path" -> _)
-    val dataSource =
-      DataSource(
-        sparkSession,
-        userSpecifiedSchema = Some(table.schema),
-        partitionColumns = table.partitionColumnNames,
-        bucketSpec = table.bucketSpec,
-        className = table.provider.get,
-        options = table.storage.properties ++ pathOption)
-
-    LogicalRelation(
-      dataSource.resolveRelation(),
-      expectedOutputAttributes = Some(simpleCatalogRelation.output),
-      catalogTable = Some(table))
+  private def readDataSourceTable(relation: SimpleCatalogRelation): LogicalPlan = {
+    val table = relation.catalogTable
+    val cache = sparkSession.sessionState.catalog.tableRelationCache
+    val withHiveSupport =
+      sparkSession.sparkContext.conf.get(StaticSQLConf.CATALOG_IMPLEMENTATION) == "hive"
+
+    cache.get(table.qualifiedIdentifier, new Callable[LogicalPlan]() {
+      override def call(): LogicalPlan = {
+        val pathOption = table.storage.locationUri.map("path" -> _)
+        val dataSource =
+          DataSource(
+            sparkSession,
+            userSpecifiedSchema = Some(table.schema),

```
// In older version(prior to 2.1) of Spark, the table schema can be empty and should be
// inferred at runtime. We should still support it.
```

Is it still valid?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16606#discussion_r96569991
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/PartitionedWriteSuite.scala ---
@@ -92,6 +92,30 @@ class PartitionedWriteSuite extends QueryTest with SharedSQLContext {
     }
   }
 
+
+  test("saveAsTable with inconsistent columns order" +
--- End diff --

does this test improve the test coverage?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16606#discussion_r96569809
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -1374,4 +1377,47 @@ class HiveDDLSuite
       assert(e2.message.contains("Hive data source can only be used with tables"))
     }
   }
+
+  test("table partition schema should be ordered") {
+    withTable("t", "t1") {
+      val path = Utils.createTempDir(namePrefix = "t")
+      val path1 = Utils.createTempDir(namePrefix = "t1")
+      try {
+        spark.sql(s"""
+             |create table t (id long, P1 int, P2 int)
+             |using parquet
+             |options (path "$path")
+             |partitioned by (P1, P2)""".stripMargin)
--- End diff --

this test can pass without your changes, right? I think we can just keep the one below.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16623: [SPARK-19066][SPARKR][Backport-2.1]:LDA doesn't set opti...

2017-01-17 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16623
  
merged to 2.1. thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16606#discussion_r96569387
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -1374,4 +1377,47 @@ class HiveDDLSuite
       assert(e2.message.contains("Hive data source can only be used with tables"))
     }
   }
+
+  test("table partition schema should be ordered") {
+    withTable("t", "t1") {
+      val path = Utils.createTempDir(namePrefix = "t")
+      val path1 = Utils.createTempDir(namePrefix = "t1")
+      try {
+        spark.sql(s"""
--- End diff --

nit: code style, please follow existing code: 
https://github.com/apache/spark/pull/16606/files#diff-b7094baa12601424a5d19cb930e3402fR1255


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16606#discussion_r96569222
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -1374,4 +1377,47 @@ class HiveDDLSuite
       assert(e2.message.contains("Hive data source can only be used with tables"))
     }
   }
+
+  test("table partition schema should be ordered") {
+    withTable("t", "t1") {
+      val path = Utils.createTempDir(namePrefix = "t")
--- End diff --

use `withTempDir`
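
A sketch of the suggested pattern (assuming the suite mixes in `SQLTestUtils`, which provides `withTempDir`; the helper creates a temp directory and deletes it in a finally block, replacing the manual `Utils.createTempDir` plus try/finally):

```
withTempDir { dir =>
  spark.sql(
    s"""
       |create table t (id long, P1 int, P2 int)
       |using parquet
       |options (path "$dir")
       |partitioned by (P1, P2)
     """.stripMargin)
}  // the directory is deleted here even if the body throws
```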


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16606#discussion_r96569205
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
@@ -1374,4 +1377,47 @@ class HiveDDLSuite
       assert(e2.message.contains("Hive data source can only be used with tables"))
     }
   }
+
+  test("table partition schema should be ordered") {
--- End diff --

table partition schema should respect the order of partition columns
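
For context, a sketch of the behavior the renamed test would assert (illustrative only, assuming Spark's convention that partition columns appear at the end of the resolved table schema, in `PARTITIONED BY` order):

```
spark.sql(
  """
    |CREATE TABLE t (id LONG, p1 INT, p2 INT)
    |USING parquet
    |PARTITIONED BY (p1, p2)
  """.stripMargin)

spark.table("t").schema.fieldNames
// expected: Array("id", "p1", "p2") -- partition columns last, in declared order
```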


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16623: [SPARK-19066][SPARKR][Backport-2.1]:LDA doesn't set opti...

2017-01-17 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16623
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...

2017-01-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16606#discussion_r96569171
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala ---
@@ -138,6 +138,7 @@ case class CreateDataSourceTableAsSelectCommand(
     val tableIdentWithDB = table.identifier.copy(database = Some(db))
     val tableName = tableIdentWithDB.unquotedString
 
+    var tableWithSchema = table.copy(schema = query.output.toStructType)
--- End diff --

shall we set the schema in `AnalyzeCreateTable`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16589: [SPARK-19231][SPARKR] add error handling for down...

2017-01-17 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16589#discussion_r96569089
  
--- Diff: R/pkg/R/install.R ---
@@ -201,14 +221,20 @@ directDownloadTar <- function(mirrorUrl, version, hadoopVersion, packageName, pa
   msg <- sprintf(fmt, version, ifelse(hadoopVersion == "without", "Free build", hadoopVersion),
                  packageRemotePath)
   message(msg)
-  downloadUrl(packageRemotePath, packageLocalPath, paste0("Fetch failed from ", mirrorUrl))
+  downloadUrl(packageRemotePath, packageLocalPath)
--- End diff --

yea I agree. I guess I'm trying to bubble up error messages to the top level, but deciding which exception to throw is making this non-trivial (never thought I'd say that!)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-17 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16503
  
@vanzin 
Sorry for the careless mistake I made. I've changed it; please take another look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16589: [SPARK-19231][SPARKR] add error handling for down...

2017-01-17 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16589#discussion_r96567761
  
--- Diff: R/pkg/R/install.R ---
@@ -201,14 +221,20 @@ directDownloadTar <- function(mirrorUrl, version, hadoopVersion, packageName, pa
   msg <- sprintf(fmt, version, ifelse(hadoopVersion == "without", "Free build", hadoopVersion),
                  packageRemotePath)
   message(msg)
-  downloadUrl(packageRemotePath, packageLocalPath, paste0("Fetch failed from ", mirrorUrl))
+  downloadUrl(packageRemotePath, packageLocalPath)
--- End diff --

I didn't relate this to the update in L176 - I think this is fine. In general, I think this file has gotten a little unwieldy, with error messages coming from different functions. I wonder if there is a better way to refactor things to set up some expectations on where errors are thrown, etc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16589: [SPARK-19231][SPARKR] add error handling for down...

2017-01-17 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/16589#discussion_r96289817
  
--- Diff: R/pkg/R/install.R ---
@@ -54,7 +54,7 @@
 #' }
 #' @param overwrite If \code{TRUE}, download and overwrite the existing tar file in localDir
 #'                  and force re-install Spark (in case the local directory or file is corrupted)
-#' @return \code{install.spark} returns the local directory where Spark is found or installed
+#' @return the (invisible) local directory where Spark is found or installed
--- End diff --

Got it. Thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16503
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16503
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16503
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71560/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16503
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71558/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16503
  
**[Test build #71558 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71558/testReport)** for PR 16503 at commit [`52af8c5`](https://github.com/apache/spark/commit/52af8c5359a48e31f665f282c0a50aaacb19ae4d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] Use ask to replace askWithRetry in canComm...

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16503
  
**[Test build #71560 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71560/testReport)** for PR 16503 at commit [`69b412a`](https://github.com/apache/spark/commit/69b412ac9fd6d6ebd27049cdbaf7a2c5ef75455b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14204
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71557/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14204: [SPARK-16520] [WEBUI] Link executors to correspon...

2017-01-17 Thread nblintao
Github user nblintao commented on a diff in the pull request:

https://github.com/apache/spark/pull/14204#discussion_r96568190
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/executorspage.js ---
@@ -408,12 +420,17 @@ $(document).ready(function () {
                 data: 'id', render: function (data, type) {
                     return type === 'display' ? ("Thread Dump" ) : data;
                 }
-            }
+            },
+            {data: 'worker', render: formatWorkersCells}
         ],
         "columnDefs": [
             {
                 "targets": [ 16 ],
                 "visible": getThreadDumpEnabled()
+            },
+            {
+                "targets": [ 17 ],
+                "visible": workersExist(response)
--- End diff --

Fixed. Thanks!


[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15125
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71556/


[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15125
  
Merged build finished. Test PASSed.


[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2017-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14204
  
Merged build finished. Test PASSed.


[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15125
  
**[Test build #71556 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71556/testReport)** for PR 15125 at commit [`e786838`](https://github.com/apache/spark/commit/e786838af3912953d61787210213c269b4a5cdba).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2017-01-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14204
  
**[Test build #71557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71557/testReport)** for PR 14204 at commit [`d23643c`](https://github.com/apache/spark/commit/d23643ce79efe98e33e42c23548478d672f5de81).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.

