[GitHub] spark issue #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread petermaxlee
Github user petermaxlee commented on the issue:

https://github.com/apache/spark/pull/13989
  
What do you mean by both positive and negative cases?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread petermaxlee
Github user petermaxlee commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69075298
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ---
@@ -166,8 +166,8 @@ private[sql] class SessionState(sparkSession: SparkSession) {
 
   def executePlan(plan: LogicalPlan): QueryExecution = new QueryExecution(sparkSession, plan)
 
-  def invalidateTable(tableName: String): Unit = {
-    catalog.invalidateTable(sqlParser.parseTableIdentifier(tableName))
+  def refreshTable(tableName: String): Unit = {
--- End diff --

I just picked the name that was already exposed to users (refresh in the 
catalog and in SQL).






[GitHub] spark issue #13603: [SPARK-15865][CORE] Blacklist should not result in job h...

2016-06-29 Thread kayousterhout
Github user kayousterhout commented on the issue:

https://github.com/apache/spark/pull/13603
  
LGTM!





[GitHub] spark pull request #13971: [SPARK-16289][SQL] Implement posexplode table gen...

2016-06-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13971#discussion_r69075261
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratorExpressionSuite.scala ---
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.unsafe.types.UTF8String
+
+class GeneratorExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
+  private def checkTuple(actual: ExplodeBase, expected: Seq[InternalRow]): Unit = {
+    assert(actual.eval(null).toSeq === expected)
--- End diff --

And how should we check the zero-row case? See Line 39:

https://github.com/apache/spark/pull/13971/files#diff-6715134a4e95980149a7600ecb71674cR41





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread petermaxlee
Github user petermaxlee commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69075247
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -85,5 +85,10 @@ case class LogicalRelation(
   expectedOutputAttributes,
   metastoreTableIdentifier).asInstanceOf[this.type]
 
+  override def refresh(): Unit = relation match {
+    case fs: HadoopFsRelation => fs.refresh()
--- End diff --

I don't agree on this one. LogicalRelation might not be the only one that 
needs to override this. There can certainly be other logical plans in the 
future that keep some state and need to implement refresh. Defining "refresh" 
with a default implementation also means that only plans that actually have 
something to refresh should override it.

I'm going to update refresh in LogicalPlan to make this more clear.
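
The design argued for here (a default `refresh()` on the base class that only stateful plans override) can be sketched with toy classes. This is a minimal illustration, not Spark's code: `Plan`, `Project`, `EmptyLeaf`, and `FileLeaf` are hypothetical stand-ins for `LogicalPlan` and its subclasses.

```scala
// Toy stand-ins for Spark's plan classes; every name here is hypothetical.
object RefreshDefaultSketch {
  abstract class Plan {
    def children: Seq[Plan]
    // Default: a node with no state of its own just forwards to its children.
    def refresh(): Unit = children.foreach(_.refresh())
  }

  // A stateless inner node inherits the default.
  final case class Project(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }

  // A stateless leaf also inherits the default; with no children it does nothing.
  case object EmptyLeaf extends Plan {
    def children: Seq[Plan] = Nil
  }

  // A leaf that caches a file listing, so it is the only class overriding refresh().
  final class FileLeaf(listFiles: () => Seq[String]) extends Plan {
    private var cached: Option[Seq[String]] = None
    def children: Seq[Plan] = Nil
    def files: Seq[String] = {
      if (cached.isEmpty) cached = Some(listFiles())
      cached.get
    }
    override def refresh(): Unit = { cached = None }
  }
}
```

Calling `refresh()` on the root reaches the stateful leaf through the default implementation, so stateless nodes need no code of their own.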





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread petermaxlee
Github user petermaxlee commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69075198
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -265,6 +265,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
   s"Reference '$name' is ambiguous, could be: $referenceNames.")
 }
   }
+
+  /**
+   * Invalidates any metadata cached in the plan recursively.
+   */
+  def refresh(): Unit = children.foreach(_.refresh())
--- End diff --

I don't get it. Why would this be more expensive than any other recursive 
calls that happen in logical plans?





[GitHub] spark pull request #13972: [SPARK-16294][SQL] Labelling support for the incl...

2016-06-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13972





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69074558
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -265,6 +265,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
   s"Reference '$name' is ambiguous, could be: $referenceNames.")
 }
   }
+
+  /**
+   * Invalidates any metadata cached in the plan recursively.
+   */
+  def refresh(): Unit = children.foreach(_.refresh())
--- End diff --

I think we should avoid a recursive implementation if possible. It is too 
expensive for a large tree.





[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/13972
  
@yinxusen Do you have time to consolidate example files for 
`mllib-data-types.md`?





[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/13972
  
LGTM2. Merged into master and branch-2.0. Thanks!





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69074411
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -85,5 +85,10 @@ case class LogicalRelation(
   expectedOutputAttributes,
   metastoreTableIdentifier).asInstanceOf[this.type]
 
+  override def refresh(): Unit = relation match {
+    case fs: HadoopFsRelation => fs.refresh()
--- End diff --

I know, but we need to write the comments for the code readers.





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread petermaxlee
Github user petermaxlee commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69074328
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -265,6 +265,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
   s"Reference '$name' is ambiguous, could be: $referenceNames.")
 }
   }
+
+  /**
+   * Invalidates any metadata cached in the plan recursively.
+   */
+  def refresh(): Unit = children.foreach(_.refresh())
--- End diff --

But this function is not tail recursive.
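
To make the point concrete: in a tree walk of this shape the recursive call sits inside the closure passed to `foreach`, so it is not in tail position, and I would expect the compiler to reject `@tailrec` on it. The sketch below uses a hypothetical `Node` class (not Spark's `LogicalPlan`) and shows the recursive form next to an explicit-stack alternative. Both visit every node once; the iterative form only avoids deep JVM call stacks on very tall trees.

```scala
object TraversalSketch {
  // Hypothetical minimal tree node; `refreshed` records that the node was visited.
  final class Node(val name: String, val children: Seq[Node]) {
    var refreshed: Boolean = false
  }

  // Recursive form, same shape as the diff: the recursive call happens inside
  // foreach, so it cannot be turned into a loop by tail-call optimization.
  def refreshRecursive(node: Node): Unit = {
    node.refreshed = true
    node.children.foreach(refreshRecursive)
  }

  // Iterative form: an explicit list of pending nodes replaces the call stack.
  def refreshIterative(root: Node): Unit = {
    var pending: List[Node] = List(root)
    while (pending.nonEmpty) {
      val node = pending.head
      node.refreshed = true
      // Push the children in front of the remaining work (depth-first order).
      pending = node.children.toList ::: pending.tail
    }
  }
}
```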





[GitHub] spark pull request #13971: [SPARK-16289][SQL] Implement posexplode table gen...

2016-06-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13971#discussion_r69074335
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/GeneratorExpressionSuite.scala ---
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.unsafe.types.UTF8String
+
+class GeneratorExpressionSuite extends SparkFunSuite with ExpressionEvalHelper {
+  private def checkTuple(actual: ExplodeBase, expected: Seq[InternalRow]): Unit = {
+    assert(actual.eval(null).toSeq === expected)
--- End diff --

Oh, thank you for the review too, @cloud-fan.
Do we have an example of `checkEvaluation` being used to check a generator, 
i.e. multiple InternalRows?
I just thought `checkEvaluation` is just for a single row, e.g., values, 
arrays, maps.
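
The difference between a single-value check and a generator check can be sketched in plain Scala, without Spark's test helpers. The `posexplode` and `checkRows` functions below are hypothetical stand-ins: a generator maps one input to zero or more output rows, so the check has to compare the whole sequence of rows, and an empty input should yield an empty sequence.

```scala
object PosExplodeSketch {
  // Hypothetical stand-in for a generator: one collection in, zero or more
  // (position, value) rows out, with positions starting at 0.
  def posexplode[A](input: Seq[A]): Seq[(Int, A)] =
    input.zipWithIndex.map { case (value, pos) => (pos, value) }

  // Compare the full sequence of produced rows, not a single value.
  def checkRows[A](actual: Seq[(Int, A)], expected: Seq[(Int, A)]): Unit =
    assert(actual == expected, s"expected $expected but got $actual")
}
```

With this shape, the zero-row case is just `checkRows(posexplode(Seq.empty[String]), Seq.empty)`.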





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69074265
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -265,6 +265,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
   s"Reference '$name' is ambiguous, could be: $referenceNames.")
 }
   }
+
+  /**
+   * Invalidates any metadata cached in the plan recursively.
+   */
+  def refresh(): Unit = children.foreach(_.refresh())
--- End diff --

You need to mark it. 





[GitHub] spark issue #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13989
  
The test cases are not enough to cover metadata refreshing. The current 
metadata cache is only used for data source tables. However, Hive tables (for 
example, Parquet and ORC) can still be converted to data source tables, so we 
also need to check the behavior in those cases. 

Please design more test cases for metadata refreshing, including both 
positive and negative cases.





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread petermaxlee
Github user petermaxlee commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69074253
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2307,6 +2307,19 @@ class Dataset[T] private[sql](
   def distinct(): Dataset[T] = dropDuplicates()
 
   /**
+   * Refreshes the metadata and data cached in Spark for data associated with this Dataset.
+   * An example use case is to invalidate the file system metadata cached by Spark, when the
+   * underlying files have been updated by an external process.
+   *
+   * @group action
+   * @since 2.0.0
+   */
+  def refresh(): Unit = {
+    unpersist(false)
--- End diff --

Ah, I see. We can't unpersist.






[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread petermaxlee
Github user petermaxlee commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69074131
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -85,5 +85,10 @@ case class LogicalRelation(
   expectedOutputAttributes,
   metastoreTableIdentifier).asInstanceOf[this.type]
 
+  override def refresh(): Unit = relation match {
+    case fs: HadoopFsRelation => fs.refresh()
--- End diff --

What do you mean? Other leaf nodes don't keep state, do they?






[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread petermaxlee
Github user petermaxlee commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69074039
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -265,6 +265,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
   s"Reference '$name' is ambiguous, could be: $referenceNames.")
 }
   }
+
+  /**
+   * Invalidates any metadata cached in the plan recursively.
+   */
+  def refresh(): Unit = children.foreach(_.refresh())
--- End diff --

This is not a tailrec?





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69073906
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ---
@@ -166,8 +166,8 @@ private[sql] class SessionState(sparkSession: SparkSession) {
 
   def executePlan(plan: LogicalPlan): QueryExecution = new QueryExecution(sparkSession, plan)
 
-  def invalidateTable(tableName: String): Unit = {
-    catalog.invalidateTable(sqlParser.parseTableIdentifier(tableName))
+  def refreshTable(tableName: String): Unit = {
--- End diff --

To be honest, I still think `invalidateTable` is the right name. We are not 
doing a `refresh`.





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69073454
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -85,5 +85,10 @@ case class LogicalRelation(
   expectedOutputAttributes,
   metastoreTableIdentifier).asInstanceOf[this.type]
 
+  override def refresh(): Unit = relation match {
+    case fs: HadoopFsRelation => fs.refresh()
--- End diff --

You have to document the reason why only `LogicalRelation` overrides this 
function.





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69073383
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -265,6 +265,11 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
   s"Reference '$name' is ambiguous, could be: $referenceNames.")
 }
   }
+
+  /**
+   * Invalidates any metadata cached in the plan recursively.
+   */
+  def refresh(): Unit = children.foreach(_.refresh())
--- End diff --

use @tailrec





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69073191
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -139,18 +139,6 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
   }
 
   def refreshTable(tableIdent: TableIdentifier): Unit = {
-    // refreshTable does not eagerly reload the cache. It just invalidate the cache.
-    // Next time when we use the table, it will be populated in the cache.
-    // Since we also cache ParquetRelations converted from Hive Parquet tables and
-    // adding converted ParquetRelations into the cache is not defined in the load function
-    // of the cache (instead, we add the cache entry in convertToParquetRelation),
-    // it is better at here to invalidate the cache to avoid confusing waring logs from the
-    // cache loader (e.g. cannot find data source provider, which is only defined for
-    // data source table.).
--- End diff --

Keep the comments? 
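
The behavior the removed comment describes (invalidate now, reload lazily on next use) can be illustrated with a small stand-alone cache. This is only a sketch under assumptions: `RelationCache`, its `load` function, and the `loads` counter are hypothetical and are not the cache used in `HiveMetastoreCatalog`.

```scala
object LazyCacheSketch {
  // Hypothetical cache keyed by table name; `load` stands in for whatever
  // rebuilds the relation for a table.
  final class RelationCache[V](load: String => V) {
    private val entries = scala.collection.mutable.Map.empty[String, V]
    var loads: Int = 0 // how many times `load` actually ran

    // Populate on first use, then serve from the cache.
    def get(table: String): V =
      entries.getOrElseUpdate(table, { loads += 1; load(table) })

    // Invalidate only: drop the entry and do no reloading here.
    def invalidate(table: String): Unit = {
      entries.remove(table)
      ()
    }
  }
}
```

After `invalidate`, nothing is reloaded until the next `get`, which is the distinction between invalidating and eagerly refreshing that the comment documents.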





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69072136
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2307,6 +2307,19 @@ class Dataset[T] private[sql](
   def distinct(): Dataset[T] = dropDuplicates()
 
   /**
+   * Refreshes the metadata and data cached in Spark for data associated with this Dataset.
+   * An example use case is to invalidate the file system metadata cached by Spark, when the
+   * underlying files have been updated by an external process.
+   *
+   * @group action
+   * @since 2.0.0
+   */
+  def refresh(): Unit = {
+    unpersist(false)
--- End diff --

This new API has different behaviors from the `refreshTable` API and 
`Refresh Table` SQL statement. See the following code:

https://github.com/apache/spark/blob/02a029df43392c5d73697203bf6ff51b8d6efb83/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala#L349-L374

IMO, if we are using the word `refresh`, we have to make them consistent.





[GitHub] spark issue #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data source to b...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13988
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data source to b...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13988
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61523/
Test PASSed.





[GitHub] spark issue #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data source to b...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13988
  
**[Test build #61523 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61523/consoleFull)** for PR 13988 at commit [`211bfb4`](https://github.com/apache/spark/commit/211bfb47acc79c51327b3f1c40aa86802470f436).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69071788
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala ---
@@ -85,5 +85,10 @@ case class LogicalRelation(
   expectedOutputAttributes,
   metastoreTableIdentifier).asInstanceOf[this.type]
 
+  override def refresh(): Unit = relation match {
+    case fs: HadoopFsRelation => fs.refresh()
--- End diff --

How about the other leaf nodes? `LogicalRelation` is just one of them.





[GitHub] spark issue #13978: [SPARK-16256][DOCS] Minor fixes on the Structured Stream...

2016-06-29 Thread ScrapCodes
Github user ScrapCodes commented on the issue:

https://github.com/apache/spark/pull/13978
  
Looks good !





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread petermaxlee
Github user petermaxlee commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69071622
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2307,6 +2307,19 @@ class Dataset[T] private[sql](
   def distinct(): Dataset[T] = dropDuplicates()
 
   /**
+   * Refreshes the metadata and data cached in Spark for data associated with this Dataset.
+   * An example use case is to invalidate the file system metadata cached by Spark, when the
+   * underlying files have been updated by an external process.
+   *
+   * @group action
+   * @since 2.0.0
+   */
+  def refresh(): Unit = {
+    unpersist(false)
--- End diff --

Other refresh methods also remove cached data, so I thought this is better.






[GitHub] spark issue #13969: [SPARK-16284][SQL] Implement reflect SQL function

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13969
  
**[Test build #3152 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3152/consoleFull)** for PR 13969 at commit [`0e43c95`](https://github.com/apache/spark/commit/0e43c9560de9ce49953f90337e83bb30858915fc).





[GitHub] spark issue #13966: [SPARK-16276][SQL] Implement elt SQL function

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13966
  
**[Test build #3153 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3153/consoleFull)** for PR 13966 at commit [`bbccf10`](https://github.com/apache/spark/commit/bbccf1002a6f3a0d2bf9abc8ef68465245fa4983).





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/13989#discussion_r69071525
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2307,6 +2307,19 @@ class Dataset[T] private[sql](
   def distinct(): Dataset[T] = dropDuplicates()
 
   /**
+   * Refreshes the metadata and data cached in Spark for data associated with this Dataset.
+   * An example use case is to invalidate the file system metadata cached by Spark, when the
+   * underlying files have been updated by an external process.
+   *
+   * @group action
+   * @since 2.0.0
+   */
+  def refresh(): Unit = {
+    unpersist(false)
--- End diff --

It will remove the cached data. This is different from what JIRA describes. 
CC @rxin 





[GitHub] spark pull request #13966: [SPARK-16276][SQL] Implement elt SQL function

2016-06-29 Thread petermaxlee
Github user petermaxlee commented on a diff in the pull request:

https://github.com/apache/spark/pull/13966#discussion_r69070865
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -162,6 +163,46 @@ case class ConcatWs(children: Seq[Expression])
   }
 }
 
+@ExpressionDescription(
+  usage = "_FUNC_(n, str1, str2, ...) - returns the n-th string",
--- End diff --

updated





[GitHub] spark issue #13987: [SPARK-16313][SQL] Spark should not silently drop except...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13987
  
**[Test build #61528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61528/consoleFull)** for PR 13987 at commit [`bd2040a`](https://github.com/apache/spark/commit/bd2040a64e80f91b8805c3dcd1e99d3dbb7e6524).





[GitHub] spark pull request #13966: [SPARK-16276][SQL] Implement elt SQL function

2016-06-29 Thread petermaxlee
Github user petermaxlee commented on a diff in the pull request:

https://github.com/apache/spark/pull/13966#discussion_r69070679
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -162,6 +163,46 @@ case class ConcatWs(children: Seq[Expression])
   }
 }
 
+@ExpressionDescription(
+  usage = "_FUNC_(n, str1, str2, ...) - returns the n-th string",
+  extended = "> SELECT _FUNC_(1, 'scala', 'java') FROM src LIMIT 1;\n" + "'scala'")
+case class Elt(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
--- End diff --

Created https://issues.apache.org/jira/browse/SPARK-16315
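
For readers skimming the thread, the behavior described by the `usage` string in the diff above can be sketched in plain Scala. This is only an illustrative model, not the expression from the PR: `EltSketch` and its `elt` helper are hypothetical names, and returning `None` for an out-of-range index is an assumption borrowed from Hive's documented `elt` behavior (NULL when `n` is less than 1 or greater than the number of strings), which this thread does not confirm for the Spark implementation.

```scala
// Illustrative model of the `usage` string only; not the code under review.
object EltSketch {
  // `n` is 1-based. An out-of-range `n` yields None, standing in for SQL NULL
  // (assumption taken from Hive's elt, not from this thread).
  def elt(n: Int, strings: String*): Option[String] =
    if (n >= 1 && n <= strings.length) Some(strings(n - 1)) else None
}
```

With this model, `EltSketch.elt(1, "scala", "java")` yields `Some("scala")`, matching the `extended` example quoted in the diff.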





[GitHub] spark issue #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13989
  
Before, I tried to merge `invalidateTable` and `refreshTable`. @yhuai left 
the following comment: 
https://github.com/apache/spark/pull/13156#discussion_r63729506

I think maybe we can keep them separate?





[GitHub] spark issue #13982: [SPARK-16304] LinkageError should not crash Spark execut...

2016-06-29 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13982
  
cc @JoshRosen and @ericl 





[GitHub] spark issue #13767: [MINOR][SQL] Not dropping all necessary tables

2016-06-29 Thread techaddict
Github user techaddict commented on the issue:

https://github.com/apache/spark/pull/13767
  
cc: @srowen 





[GitHub] spark issue #13990: [SPARK-16287][SQL][WIP] Implement str_to_map SQL functio...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13990
  
**[Test build #61525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61525/consoleFull)** for PR 13990 at commit [`1f888ab`](https://github.com/apache/spark/commit/1f888abb532c905dac11b404819786fd2641e38f).





[GitHub] spark issue #13987: [SPARK-16313][SQL] Spark should not silently drop except...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13987
  
**[Test build #61526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61526/consoleFull)** for PR 13987 at commit [`dbf9e58`](https://github.com/apache/spark/commit/dbf9e58bdac662721d26f3bd5ca76a2c2acdb0ee).





[GitHub] spark issue #13926: [SPARK-16229] [SQL] Drop Empty Table After CREATE TABLE ...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13926
  
**[Test build #61527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61527/consoleFull)** for PR 13926 at commit [`c0f08a5`](https://github.com/apache/spark/commit/c0f08a518332deac260bc69c787cba06ddf9cf98).





[GitHub] spark pull request #13990: [SPARK-16287][SQL][WIP] Implement str_to_map SQL ...

2016-06-29 Thread techaddict
GitHub user techaddict opened a pull request:

https://github.com/apache/spark/pull/13990

[SPARK-16287][SQL][WIP] Implement str_to_map SQL function

## What changes were proposed in this pull request?
This PR adds `str_to_map` SQL function in order to remove Hive fallback.
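
As a rough illustration of what a `str_to_map`-style function computes, here is a minimal plain-Scala sketch. It is not the code in this PR: `StrToMapSketch` and `strToMap` are hypothetical names, the default delimiters `,` and `:` are assumed from Hive's documented defaults, and the delimiters are treated as literal strings here, which may differ from the real function.

```scala
import java.util.regex.Pattern

// Hypothetical model of str_to_map semantics; not the expression added by the PR.
object StrToMapSketch {
  // Splits `text` into pairs on `pairDelim`, then splits each pair on the first
  // `keyValueDelim`. A pair without a key/value delimiter maps its key to None.
  def strToMap(
      text: String,
      pairDelim: String = ",",
      keyValueDelim: String = ":"): Map[String, Option[String]] =
    text
      .split(Pattern.quote(pairDelim)) // quote: treat the delimiter literally
      .iterator
      .filter(_.nonEmpty)
      .map { pair =>
        val idx = pair.indexOf(keyValueDelim)
        if (idx < 0) (pair, Option.empty[String])
        else (pair.substring(0, idx), Option(pair.substring(idx + keyValueDelim.length)))
      }
      .toMap
}
```

For example, `StrToMapSketch.strToMap("a:1,b:2")` maps `a` to `Some("1")` and `b` to `Some("2")`; custom delimiters can be passed as the second and third arguments.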

## How was this patch tested?
Pass the Jenkins tests, including the newly added test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/techaddict/spark SPARK-16287

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13990.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13990


commit af59f57cecd93de49ec5bd20058199d93a9f2445
Author: Sandeep Singh 
Date:   2016-06-30T03:54:05Z

First pass without arguments

commit dc6b1f439e32768828bdb7d1a10f8b8178fa4c13
Author: Sandeep Singh 
Date:   2016-06-30T04:32:54Z

Add delimiter options

commit a8e6631edf6d124f218b15589427664f5b454759
Author: Sandeep Singh 
Date:   2016-06-30T04:36:08Z

Merge master

commit 1f888abb532c905dac11b404819786fd2641e38f
Author: Sandeep Singh 
Date:   2016-06-30T04:37:13Z

merge fix







[GitHub] spark issue #13926: [SPARK-16229] [SQL] Drop Empty Table After CREATE TABLE ...

2016-06-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13926
  
retest this please





[GitHub] spark issue #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13989
  
**[Test build #61524 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61524/consoleFull)** for PR 13989 at commit [`82f9bec`](https://github.com/apache/spark/commit/82f9bec79125ad3f1c4da504891a75adb5b33f2f).





[GitHub] spark issue #13926: [SPARK-16229] [SQL] Drop Empty Table After CREATE TABLE ...

2016-06-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13926
  
ping @hvanhovell Could you please take a look at this again? : )





[GitHub] spark issue #13886: [SPARK-16185] [SQL] Better Error Messages When Creating ...

2016-06-29 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13886
  
Could you please review this PR again? @cloud-fan Thanks! 





[GitHub] spark issue #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13989
  
cc @cloud-fan / @liancheng 





[GitHub] spark issue #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread petermaxlee
Github user petermaxlee commented on the issue:

https://github.com/apache/spark/pull/13989
  
cc @rxin





[GitHub] spark pull request #13989: [SPARK-16311][SQL] Improve metadata refresh

2016-06-29 Thread petermaxlee
GitHub user petermaxlee opened a pull request:

https://github.com/apache/spark/pull/13989

[SPARK-16311][SQL] Improve metadata refresh

## What changes were proposed in this pull request?
This patch implements the 3 things specified in SPARK-16311:

(1) Append a message to the FileNotFoundException saying that a workaround 
is to explicitly refresh the metadata.
(2) Make metadata refresh work on temporary tables/views.
(3) Make metadata refresh work on Datasets/DataFrames, by introducing a 
Dataset.refresh() method.

And one additional small change:
(4) Merge invalidateTable and refreshTable.

## How was this patch tested?
Created a new test suite that creates a temporary directory and then 
deletes a file from it to verify Spark can read the directory once refresh is 
called.
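
The scenario this test exercises (a cached file listing going stale after an external delete, then being repaired by a refresh) can be modeled without Spark. The sketch below is an assumption-laden stand-in, not the test suite from the PR: `RefreshSketch` and `CachedDir` are hypothetical names, and the class only imitates a relation that caches its file listing.

```scala
import java.nio.file.{Files, Path}

// Hypothetical stand-in for a relation that caches its file listing.
object RefreshSketch {
  final class CachedDir(dir: Path) {
    // Listing captured once at construction, like cached file system metadata.
    private var cached: Seq[Path] = list()

    private def list(): Seq[Path] = {
      val stream = Files.list(dir)
      try stream.toArray().toSeq.map(_.asInstanceOf[Path]).sortBy(_.toString)
      finally stream.close()
    }

    // Re-lists the directory, dropping entries for files that no longer exist.
    def refresh(): Unit = cached = list()

    // Reads every file in the cached listing; throws if a listed file is gone.
    def readAll(): Seq[String] =
      cached.map(p => new String(Files.readAllBytes(p), "UTF-8"))
  }
}
```

In this model, deleting a file after constructing `CachedDir` makes `readAll()` throw until `refresh()` is called, which mirrors the failure-then-workaround flow described in items (1) to (3) above.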


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/petermaxlee/spark SPARK-16311

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13989.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13989


commit cbfbbc7d27ae086805625fa41dbcbad50783fee8
Author: petermaxlee 
Date:   2016-06-30T04:50:37Z

[SPARK-16311][SQL] Improve metadata refresh

commit f7150345245accd0e71a351e9da9ebac9b80a520
Author: petermaxlee 
Date:   2016-06-30T04:53:58Z

Add test suite







[GitHub] spark issue #13979: [SPARK-SPARK-16302] [SQL] Set the right number of partit...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13979
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61520/
Test PASSed.





[GitHub] spark issue #13979: [SPARK-SPARK-16302] [SQL] Set the right number of partit...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13979
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13979: [SPARK-SPARK-16302] [SQL] Set the right number of partit...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13979
  
**[Test build #61520 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61520/consoleFull)** for PR 13979 at commit [`f49ad08`](https://github.com/apache/spark/commit/f49ad0809d84ad8b512afd4cb58ac377426b8d3e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #13987: [SPARK-16313][SQL] Spark should not silently drop...

2016-06-29 Thread clockfly
Github user clockfly commented on a diff in the pull request:

https://github.com/apache/spark/pull/13987#discussion_r69067474
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala ---
@@ -58,10 +56,16 @@ class ListingFileCatalog(
   }
 
   override protected def leafFiles: mutable.LinkedHashMap[Path, FileStatus] = {
+    if (cachedLeafFiles eq null) {
+      refresh()
+    }
     cachedLeafFiles
   }
 
   override protected def leafDirToChildrenFiles: Map[Path, Array[FileStatus]] = {
+    if (cachedLeafDirToChildrenFiles eq null) {
+      refresh()

There is a side effect: `refresh()` resets `cachedPartitionSpec` to null, 
which may clear already-inferred partition information. 

```
  override def refresh(): Unit = {
    val files = listLeafFiles(paths)
    cachedLeafFiles =
      new mutable.LinkedHashMap[Path, FileStatus]() ++= files.map(f => f.getPath -> f)
    cachedLeafDirToChildrenFiles = files.toArray.groupBy(_.getPath.getParent)
    cachedPartitionSpec = null
  }
```
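
One possible way to avoid that side effect, sketched here under simplifying assumptions, is to let the lazy getters populate only the listing cache and leave the partition spec to an explicit `refresh()`. This is not a proposed patch: `LazyListingSketch`, `Catalog`, `ensureListed`, and the `inferences` counter are hypothetical, and plain strings stand in for Hadoop's `Path` and `FileStatus`.

```scala
// Hypothetical, simplified model of the catalog discussed above.
object LazyListingSketch {
  final class Catalog(listFiles: () => Seq[String]) {
    private var cachedLeafFiles: Option[Seq[String]] = None
    private var cachedPartitionSpec: Option[String] = None
    // Counts how often the (pretend) partition inference runs.
    var inferences: Int = 0

    // Lazy initialization: fills the listing only, never touches the spec.
    private def ensureListed(): Seq[String] = {
      if (cachedLeafFiles.isEmpty) cachedLeafFiles = Some(listFiles())
      cachedLeafFiles.get
    }

    def leafFiles: Seq[String] = ensureListed()

    def partitionSpec: String = {
      if (cachedPartitionSpec.isEmpty) {
        inferences += 1
        cachedPartitionSpec = Some(s"inferred from ${leafFiles.size} files")
      }
      cachedPartitionSpec.get
    }

    // Explicit refresh: re-lists and drops the spec so it is inferred again.
    def refresh(): Unit = {
      cachedLeafFiles = Some(listFiles())
      cachedPartitionSpec = None
    }
  }
}
```

The design choice in this sketch is that only an explicit `refresh()` invalidates inferred partition information; reading `leafFiles` for the first time never does.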





[GitHub] spark issue #13987: [SPARK-16313][SQL] Spark should not silently drop except...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13987
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61521/
Test FAILed.





[GitHub] spark issue #13987: [SPARK-16313][SQL] Spark should not silently drop except...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13987
  
**[Test build #61521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61521/consoleFull)** for PR 13987 at commit [`f3eb4fb`](https://github.com/apache/spark/commit/f3eb4fbac5317fe9a29b2494a6006cb92932a456).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13987: [SPARK-16313][SQL] Spark should not silently drop except...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13987
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer

2016-06-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13906
  
@cloud-fan Yea, that's a good point.





[GitHub] spark issue #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data source to b...

2016-06-29 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13988
  
I still need to correct some nits and check the consistency with JSON data 
source but I opened this just to check if it breaks anything. I will submit 
some more commits soon.





[GitHub] spark issue #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data source to b...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13988
  
**[Test build #61523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61523/consoleFull)** for PR 13988 at commit [`211bfb4`](https://github.com/apache/spark/commit/211bfb47acc79c51327b3f1c40aa86802470f436).





[GitHub] spark pull request #13988: [WIP][SPARK-16101][SQL] Refactoring CSV data sour...

2016-06-29 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/13988

[WIP][SPARK-16101][SQL] Refactoring CSV data source to be consistent with 
JSON data source

## What changes were proposed in this pull request?

This PR refactors CSV data source to be consistent with JSON data source.

This PR removes the `CSVParser` class and introduces new classes 
`UnivocityParser`, `UnivocityGenerator` and `CSVUtils` to be consistent with 
JSON data source (`JacksonParser`, `JacksonGenerator` and `JacksonUtils`). 
Also, DefaultSource moves to `CSVRelation` just like `JSONRelation`.

## How was this patch tested?

Existing tests should cover this.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-16101

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13988.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13988


commit 211bfb47acc79c51327b3f1c40aa86802470f436
Author: hyukjinkwon 
Date:   2016-06-30T03:50:58Z

Refactoring CSV data source to be consistent with JSON data source







[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13829
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13829
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61517/
Test PASSed.





[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13829
  
**[Test build #61517 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61517/consoleFull)**
 for PR 13829 at commit 
[`943f7de`](https://github.com/apache/spark/commit/943f7de62204af5fee228e938d293e3283f4b395).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer

2016-06-29 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13906
  
@liancheng, I think we still need to keep some simple rules for unary 
nodes, which also help the binary cases, since the empty relation is propagated up.
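
A minimal sketch of the propagation argument above, using a toy plan type rather than Catalyst's `LogicalPlan`. The names `Plan`, `Empty`, `Scan`, `Filter`, `InnerJoin` and `collapse` are hypothetical and exist only for illustration. It shows how collapsing a unary node over an empty input lets the join above it collapse too:

```scala
object EmptyPropagationSketch {
  // Toy plan nodes (assumed for illustration, not Spark classes).
  sealed trait Plan
  case object Empty extends Plan                       // stands in for an empty local relation
  case class Scan(table: String) extends Plan          // a potentially expensive scan
  case class Filter(child: Plan) extends Plan          // a unary node
  case class InnerJoin(left: Plan, right: Plan) extends Plan

  // Bottom-up rewrite: children are collapsed before their parent is inspected.
  def collapse(plan: Plan): Plan = plan match {
    case Filter(child) =>
      collapse(child) match {
        case Empty => Empty          // unary rule: filtering nothing yields nothing
        case c     => Filter(c)
      }
    case InnerJoin(l, r) =>
      (collapse(l), collapse(r)) match {
        case (Empty, _) | (_, Empty) => Empty   // binary rule: one empty side is enough
        case (cl, cr)                => InnerJoin(cl, cr)
      }
    case leaf => leaf
  }
}
```

Without the unary rule, `InnerJoin(Filter(Empty), Scan("big"))` would keep the `Filter` node on its left side, and the join rule would never see an empty child.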





[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-29 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r69065541
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala
 ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+/**
+ * Collapse plans consisting empty local relations generated by 
[[PruneFilters]].
+ * 1. InnerJoin with one or two empty children.
+ * 2. Project/Generate/Filter/Sample/Join/Limit/Union/Repartition with all 
empty children.
+ * 3. Aggregate with all empty children and grpExprs containing all 
aggExprs.
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean =
+plan.isInstanceOf[LocalRelation] && 
plan.asInstanceOf[LocalRelation].data.isEmpty
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case p @ Join(_, _, Inner, _) if 
p.children.exists(isEmptyLocalRelation) =>
--- End diff --

Yea, we can also add `Intersect`.





[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-29 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r69065425
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala
 ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+/**
+ * Collapse plans consisting empty local relations generated by 
[[PruneFilters]].
+ * 1. InnerJoin with one or two empty children.
+ * 2. Project/Generate/Filter/Sample/Join/Limit/Union/Repartition with all 
empty children.
+ * 3. Aggregate with all empty children and grpExprs containing all 
aggExprs.
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean =
+plan.isInstanceOf[LocalRelation] && 
plan.asInstanceOf[LocalRelation].data.isEmpty
--- End diff --

```scala
plan match {
  case p: LocalRelation => p.data.isEmpty
  case _ => false
}
```





[GitHub] spark issue #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimizer

2016-06-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13906
  
My feeling is that this optimization rule is mostly useful for binary plan 
nodes like inner join and intersection, where we can avoid scanning the output of 
the non-empty side.

On the other hand, for unary plan nodes, firstly it doesn't bring much 
performance benefit, especially when whole stage codegen is enabled; secondly 
there are non-obvious and tricky corner cases, like `Aggregate` and `Generate`.

That said, although this patch is not a big one, it does introduce 
non-trivial complexity. For example, I didn't immediately realize why 
`Aggregate` must be special cased (`COUNT(x)` may return 0 for empty 
input). The `Generate` case is even trickier.

So my suggestion is to only implement this rule for inner join and 
intersection, which are much simpler to handle. What do you think?
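
To illustrate the `Aggregate` corner case, here is a small sketch in plain Scala collections. It is a toy model, not Spark's `Aggregate` operator, and `globalCount` and `groupedCount` are hypothetical helper names. It follows the standard SQL behavior that a global aggregate always yields one row, while a grouped aggregate yields one row per group:

```scala
object EmptyAggregateSketch {
  // Global aggregate (no GROUP BY): always produces exactly one row,
  // so counting over an empty input gives a single row holding 0.
  def globalCount(rows: Seq[Option[Int]]): Seq[Long] =
    Seq(rows.count(_.isDefined).toLong)

  // Grouped aggregate: one output row per distinct key,
  // so an empty input produces no output rows at all.
  def groupedCount(rows: Seq[(String, Option[Int])]): Map[String, Long] =
    rows.groupBy(_._1).map { case (key, group) =>
      key -> group.count(_._2.isDefined).toLong
    }
}
```

Replacing the global case with an empty relation would drop the single `0` row, which is why an aggregate over an empty child cannot be collapsed unconditionally.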





[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13829
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13829
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61515/
Test PASSed.





[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13829
  
**[Test build #61515 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61515/consoleFull)**
 for PR 13829 at commit 
[`4265771`](https://github.com/apache/spark/commit/42657717041b055c9a9d1266f9a29d8e39edab20).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r69065025
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala
 ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+/**
+ * Collapse plans consisting empty local relations generated by 
[[PruneFilters]].
+ * 1. InnerJoin with one or two empty children.
+ * 2. Project/Generate/Filter/Sample/Join/Limit/Union/Repartition with all 
empty children.
+ * 3. Aggregate with all empty children and grpExprs containing all 
aggExprs.
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean =
+plan.isInstanceOf[LocalRelation] && 
plan.asInstanceOf[LocalRelation].data.isEmpty
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case p @ Join(_, _, Inner, _) if 
p.children.exists(isEmptyLocalRelation) =>
--- End diff --

I think this rule is very useful: we can avoid scanning one join side if 
the other side is empty.





[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r69064885
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala
 ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+/**
+ * Collapse plans consisting empty local relations generated by 
[[PruneFilters]].
+ * 1. InnerJoin with one or two empty children.
+ * 2. Project/Generate/Filter/Sample/Join/Limit/Union/Repartition with all 
empty children.
+ * 3. Aggregate with all empty children and grpExprs containing all 
aggExprs.
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean =
+plan.isInstanceOf[LocalRelation] && 
plan.asInstanceOf[LocalRelation].data.isEmpty
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case p @ Join(_, _, Inner, _) if 
p.children.exists(isEmptyLocalRelation) =>
+  LocalRelation(p.output, data = Seq.empty)
+
+case p: LogicalPlan if p.children.nonEmpty && 
p.children.forall(isEmptyLocalRelation) =>
+  p match {
+case _: Project | _: Generate | _: Filter | _: Sample | _: Join |
+ _: Sort | _: GlobalLimit | _: LocalLimit | _: Union | _: 
Repartition =>
+  LocalRelation(p.output, data = Seq.empty)
+case Aggregate(ge, ae, _) if ae.forall(ge.contains(_)) =>
--- End diff --

What exactly are we checking here? It looks to me like we can propagate the 
empty relation if the aggregate list has no aggregate function, e.g. `select 
col + 1 from tbl group by col` should also work.
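
The difference between the two checks can be sketched with a toy expression type. This is not Catalyst's `Expression` API, and every name below is hypothetical. Whether the looser check is sufficient in every corner case is not settled by this thread, so treat it as an illustration only:

```scala
object AggregateCheckSketch {
  // Toy expression tree (assumed for illustration).
  sealed trait Expr { def children: Seq[Expr] }
  case class Col(name: String) extends Expr { val children: Seq[Expr] = Nil }
  case class Lit(value: Int) extends Expr { val children: Seq[Expr] = Nil }
  case class Add(left: Expr, right: Expr) extends Expr {
    val children: Seq[Expr] = Seq(left, right)
  }
  case class Count(child: Expr) extends Expr { val children: Seq[Expr] = Seq(child) }

  // True if the expression tree contains an aggregate function anywhere.
  def containsAggregateFunction(e: Expr): Boolean = e match {
    case _: Count => true
    case other    => other.children.exists(containsAggregateFunction)
  }

  // Shape of the check in the quoted diff: every output expression
  // must itself be one of the grouping expressions.
  def everyOutputIsGroupingExpr(grouping: Seq[Expr], output: Seq[Expr]): Boolean =
    output.forall(grouping.contains)

  // Shape of the check suggested in the comment: no output expression
  // contains an aggregate function.
  def noAggregateFunction(output: Seq[Expr]): Boolean =
    !output.exists(containsAggregateFunction)
}
```

For `select col + 1 from tbl group by col`, the output `Add(Col("col"), Lit(1))` is not itself a grouping expression, so the first check rejects it, while the second check accepts it.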





[GitHub] spark issue #13978: [SPARK-16256][DOCS] Minor fixes on the Structured Stream...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13978
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61522/
Test PASSed.





[GitHub] spark issue #13978: [SPARK-16256][DOCS] Minor fixes on the Structured Stream...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13978
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13978: [SPARK-16256][DOCS] Minor fixes on the Structured Stream...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13978
  
**[Test build #61522 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61522/consoleFull)**
 for PR 13978 at commit 
[`f440214`](https://github.com/apache/spark/commit/f440214efb0f79d3a82be45bd3d67aa6c4038fda).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11863
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61513/
Test PASSed.





[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11863
  
Merged build finished. Test PASSed.





[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11863
  
**[Test build #61513 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61513/consoleFull)**
 for PR 11863 at commit 
[`cffb0e0`](https://github.com/apache/spark/commit/cffb0e0fb89808732c3ab3c1c7d83049549e2e2d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #13906: [SPARK-16208][SQL] Add `CollapseEmptyPlan` optimi...

2016-06-29 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13906#discussion_r69064054
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CollapseEmptyPlan.scala
 ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans._
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+/**
+ * Collapse plans consisting empty local relations generated by 
[[PruneFilters]].
+ * 1. InnerJoin with one or two empty children.
+ * 2. Project/Generate/Filter/Sample/Join/Limit/Union/Repartition with all 
empty children.
+ * 3. Aggregate with all empty children and grpExprs containing all 
aggExprs.
+ */
+object CollapseEmptyPlan extends Rule[LogicalPlan] with PredicateHelper {
+  private def isEmptyLocalRelation(plan: LogicalPlan): Boolean =
+plan.isInstanceOf[LocalRelation] && 
plan.asInstanceOf[LocalRelation].data.isEmpty
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
+case p @ Join(_, _, Inner, _) if 
p.children.exists(isEmptyLocalRelation) =>
+  LocalRelation(p.output, data = Seq.empty)
+
+case p: LogicalPlan if p.children.nonEmpty && 
p.children.forall(isEmptyLocalRelation) =>
+  p match {
+case _: Project | _: Generate | _: Filter | _: Sample | _: Join |
--- End diff --

Actually `Generate` can't be included here. Our `Generate` also supports 
Hive style UDTFs, which have weird semantics: for a UDTF `f`, after all rows 
have been processed, `f.close()` will be called, and *more rows can be generated* 
within `f.close()`. This means a UDTF may generate one or more rows even if the 
underlying input is empty.

See [here][1] and PR #5338 for more details.

[1]: https://github.com/apache/spark/pull/5383/files
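
A rough sketch of the `close()` behavior described above. The `TableFunction` trait is a hypothetical stand-in, loosely modeled on the process/close life cycle, and is not the Hive or Spark UDTF interface:

```scala
object UdtfCloseSketch {
  // Hypothetical table function interface (assumed for illustration).
  trait TableFunction {
    def process(row: Int): Seq[String] // rows emitted while consuming one input row
    def close(): Seq[String]           // rows emitted after the input is exhausted
  }

  // Emits nothing per row, then a single summary row on close.
  final class CountRows extends TableFunction {
    private var seen = 0
    def process(row: Int): Seq[String] = { seen += 1; Nil }
    def close(): Seq[String] = Seq(s"rows seen: $seen")
  }

  // Feeds every input row to the function, then appends whatever close() emits.
  def generate(input: Seq[Int], f: TableFunction): Seq[String] =
    input.flatMap(f.process) ++ f.close()
}
```

With an empty input, `generate` still returns the summary row from `close()`, so replacing such a node with an empty relation would change the query result.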





[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13829
  
Merged build finished. Test FAILed.





[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13829
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61514/
Test FAILed.





[GitHub] spark issue #13829: [SPARK-16071][SQL] Checks size limit when doubling the a...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13829
  
**[Test build #61514 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61514/consoleFull)**
 for PR 13829 at commit 
[`3a831e0`](https://github.com/apache/spark/commit/3a831e03cfbe0722701a88c9bdbc164098197113).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13978: [SPARK-16256][DOCS] Minor fixes on the Structured Stream...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13978
  
**[Test build #61522 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61522/consoleFull)**
 for PR 13978 at commit 
[`f440214`](https://github.com/apache/spark/commit/f440214efb0f79d3a82be45bd3d67aa6c4038fda).





[GitHub] spark issue #13987: [SPARK-16313][SQL] Spark should not silently drop except...

2016-06-29 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/13987
  
LGTM





[GitHub] spark issue #13987: [SPARK-16313][SQL] Spark should not silently drop except...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13987
  
**[Test build #61521 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61521/consoleFull)**
 for PR 13987 at commit 
[`f3eb4fb`](https://github.com/apache/spark/commit/f3eb4fbac5317fe9a29b2494a6006cb92932a456).





[GitHub] spark pull request #13987: [SPARK-16313][SQL] Spark should not silently drop...

2016-06-29 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/13987

[SPARK-16313][SQL] Spark should not silently drop exceptions in file listing

## What changes were proposed in this pull request?
Spark silently drops exceptions during file listing. This is very bad
behavior because it can mask legitimate errors, and the resulting plan will
silently have 0 rows. This patch changes file listing so that it no longer
drops these errors silently.
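
As a rough illustration only (this is not the code from the patch), the difference between the two behaviors can be sketched in plain Scala. `FileListingSketch`, `Lister`, `failingLister`, `listSilently`, and `listStrictly` are hypothetical names made up for this example; none of them are Spark APIs.

```scala
import java.io.IOException
import scala.util.control.NonFatal

object FileListingSketch {
  // Hypothetical stand-in for a file system listing call (not a Spark API).
  type Lister = String => Seq[String]

  // Hypothetical lister: one path lists fine, every other path fails.
  val failingLister: Lister = {
    case "/ok" => Seq("/ok/part-0")
    case other => throw new IOException(s"cannot list $other")
  }

  // Swallows every non-fatal failure, so a broken path looks like an empty one.
  def listSilently(paths: Seq[String], list: Lister): Seq[String] =
    paths.flatMap { p =>
      try list(p)
      catch { case NonFatal(_) => Seq.empty[String] }
    }

  // Lets the exception propagate, so the caller sees the real error.
  def listStrictly(paths: Seq[String], list: Lister): Seq[String] =
    paths.flatMap(list)
}
```

With these definitions, `listSilently(Seq("/ok", "/broken"), failingLister)` returns only the entries under `/ok` and hides the failure, while `listStrictly` with the same arguments throws the `IOException`.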

## How was this patch tested?
Manually verified.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-16313

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13987.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13987


commit f3eb4fbac5317fe9a29b2494a6006cb92932a456
Author: Reynold Xin 
Date:   2016-06-30T03:00:16Z

[SPARK-16313][SQL] Spark should not silently drop exceptions in file listing







[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13972
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61519/
Test PASSed.





[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13972
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13972
  
**[Test build #61519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61519/consoleFull)** for PR 13972 at commit [`7ea9c75`](https://github.com/apache/spark/commit/7ea9c753fc8b490f2b0549b6dbb303bd0b8a573f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13979: [SPARK-SPARK-16302] [SQL] Set the right number of partit...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13979
  
**[Test build #61520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61520/consoleFull)** for PR 13979 at commit [`f49ad08`](https://github.com/apache/spark/commit/f49ad0809d84ad8b512afd4cb58ac377426b8d3e).





[GitHub] spark issue #12384: [SPARK-14608] [ML] transformSchema needs better document...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12384
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61518/
Test PASSed.





[GitHub] spark issue #12384: [SPARK-14608] [ML] transformSchema needs better document...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12384
  
Merged build finished. Test PASSed.





[GitHub] spark issue #12384: [SPARK-14608] [ML] transformSchema needs better document...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12384
  
**[Test build #61518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61518/consoleFull)** for PR 12384 at commit [`ddbc56a`](https://github.com/apache/spark/commit/ddbc56a6cdbbd1280bd50dd55972e50f0eaa3dd5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13941: [SPARK-16249][ML] Change visibility of Object ml.cluster...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13941
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13941: [SPARK-16249][ML] Change visibility of Object ml.cluster...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13941
  
**[Test build #61516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61516/consoleFull)** for PR 13941 at commit [`11a077c`](https://github.com/apache/spark/commit/11a077cd3e86c169465375c24ac50ad28801f2e2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13941: [SPARK-16249][ML] Change visibility of Object ml.cluster...

2016-06-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13941
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61516/
Test PASSed.





[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11863
  
**[Test build #3150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3150/consoleFull)** for PR 11863 at commit [`f863369`](https://github.com/apache/spark/commit/f86336951d4dd196812420e4e902f105ea95e81b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread yinxusen
Github user yinxusen commented on the issue:

https://github.com/apache/spark/pull/13972
  
@mengxr With this PR merged, I think we can also fix [SPARK-13015 (mllib-data-types.md)](https://issues.apache.org/jira/browse/SPARK-13015) with a consolidated example file.





[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/13972
  
@yinxusen Thanks!





[GitHub] spark issue #13972: [SPARK-16294][SQL] Labelling support for the include_exa...

2016-06-29 Thread yinxusen
Github user yinxusen commented on the issue:

https://github.com/apache/spark/pull/13972
  
LGTM




