[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r80629937
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, ColumnStat, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table in the current database to generate statistics,
+ * which will be used in query optimizations.
+ */
+case class AnalyzeColumnCommand(
+    tableIdent: TableIdentifier,
+    columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val sessionState = sparkSession.sessionState
+    val db = tableIdent.database.getOrElse(sessionState.catalog.getCurrentDatabase)
+    val tableIdentWithDB = TableIdentifier(tableIdent.table, Some(db))
+    val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdentWithDB))
+
+    relation match {
+      case catalogRel: CatalogRelation =>
+        updateStats(catalogRel.catalogTable,
+          AnalyzeTableCommand.calculateTotalSize(sessionState, catalogRel.catalogTable))
+
+      case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
+        updateStats(logicalRel.catalogTable.get, logicalRel.relation.sizeInBytes)
+
+      case otherRelation =>
+        throw new AnalysisException("ANALYZE TABLE is not supported for " +
+          s"${otherRelation.nodeName}.")
+    }
+
+    def updateStats(catalogTable: CatalogTable, newTotalSize: Long): Unit = {
+      val (rowCount, columnStats) = computeColStats(sparkSession, relation)
+      val statistics = Statistics(
+        sizeInBytes = newTotalSize,
+        rowCount = Some(rowCount),
+        colStats = columnStats ++ catalogTable.stats.map(_.colStats).getOrElse(Map()))
+      sessionState.catalog.alterTable(catalogTable.copy(stats = Some(statistics)))
+      // Refresh the cached data source table in the catalog.
+      sessionState.catalog.refreshTable(tableIdentWithDB)
--- End diff --

why do we need to refresh the table when updating statistics? AFAIK, refresh 
table is used when the table data files have changed, and we need to list the 
files again and invalidate the table cache.
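
For reference, a minimal sketch of the call being discussed (table name illustrative; assumes an active SparkSession named `spark` and the Spark 2.x Catalog API):

    // Drops the cached relation and file listing for `t`, so the next query against `t`
    // re-resolves the table and reads its metadata (including any freshly written stats).
    spark.catalog.refreshTable("t")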


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-26 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r80629684
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -87,19 +87,27 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
   }
 
   /**
-   * Create an [[AnalyzeTableCommand]] command. This currently only implements the NOSCAN
-   * option (other options are passed on to Hive) e.g.:
+   * Create an [[AnalyzeTableCommand]] command or an [[AnalyzeColumnCommand]] command.
+   * Example SQL for analyzing table :
    * {{{
-   *   ANALYZE TABLE table COMPUTE STATISTICS NOSCAN;
+   *   ANALYZE TABLE table COMPUTE STATISTICS [NOSCAN];
+   * }}}
+   * Example SQL for analyzing columns :
+   * {{{
+   *   ANALYZE TABLE table COMPUTE STATISTICS FOR COLUMNS column1, column2;
    * }}}
    */
   override def visitAnalyze(ctx: AnalyzeContext): LogicalPlan = withOrigin(ctx) {
     if (ctx.partitionSpec == null &&
       ctx.identifier != null &&
       ctx.identifier.getText.toLowerCase == "noscan") {
-      AnalyzeTableCommand(visitTableIdentifier(ctx.tableIdentifier).toString)
+      AnalyzeTableCommand(visitTableIdentifier(ctx.tableIdentifier))
+    } else if (ctx.identifierSeq() == null) {
+      AnalyzeTableCommand(visitTableIdentifier(ctx.tableIdentifier), noscan = false)
--- End diff --

analyze table without `noscan`
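
For reference, a sketch of which statement would reach which command, going by the doc comment in the diff above (table and column names are illustrative; assumes an active SparkSession named `spark`):

    spark.sql("ANALYZE TABLE t COMPUTE STATISTICS NOSCAN")                  // AnalyzeTableCommand, noscan = true
    spark.sql("ANALYZE TABLE t COMPUTE STATISTICS")                         // AnalyzeTableCommand, noscan = false (this branch)
    spark.sql("ANALYZE TABLE t COMPUTE STATISTICS FOR COLUMNS key, value")  // AnalyzeColumnCommand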





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r80629335
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -0,0 +1,180 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, ColumnStat, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table in the current database to generate statistics,
--- End diff --

`... in the current database ...` users can specify the database in the table 
name, right? I think we can just say `given table`.





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r80629224
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -87,19 +87,27 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
   }
 
   /**
-   * Create an [[AnalyzeTableCommand]] command. This currently only implements the NOSCAN
-   * option (other options are passed on to Hive) e.g.:
+   * Create an [[AnalyzeTableCommand]] command or an [[AnalyzeColumnCommand]] command.
+   * Example SQL for analyzing table :
    * {{{
-   *   ANALYZE TABLE table COMPUTE STATISTICS NOSCAN;
+   *   ANALYZE TABLE table COMPUTE STATISTICS [NOSCAN];
+   * }}}
+   * Example SQL for analyzing columns :
+   * {{{
+   *   ANALYZE TABLE table COMPUTE STATISTICS FOR COLUMNS column1, column2;
    * }}}
    */
   override def visitAnalyze(ctx: AnalyzeContext): LogicalPlan = withOrigin(ctx) {
     if (ctx.partitionSpec == null &&
       ctx.identifier != null &&
       ctx.identifier.getText.toLowerCase == "noscan") {
-      AnalyzeTableCommand(visitTableIdentifier(ctx.tableIdentifier).toString)
+      AnalyzeTableCommand(visitTableIdentifier(ctx.tableIdentifier))
+    } else if (ctx.identifierSeq() == null) {
+      AnalyzeTableCommand(visitTableIdentifier(ctx.tableIdentifier), noscan = false)
--- End diff --

when will we hit this branch?





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r80628790
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ---
@@ -31,6 +31,7 @@
 import com.esotericsoftware.kryo.io.Input;
 import com.esotericsoftware.kryo.io.Output;
 
+import org.apache.spark.sql.catalyst.plans.logical.Except;
--- End diff --

unnecessary change





[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

2016-09-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/12601#discussion_r80628570
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala
 ---
@@ -19,37 +19,102 @@ package org.apache.spark.sql.execution.datasources.jdbc
 
 import java.util.Properties
 
-import org.apache.spark.sql.SQLContext
-import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
+import scala.collection.JavaConverters.mapAsJavaMapConverter
 
-class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
+import org.apache.spark.sql.{AnalysisException, DataFrame, SaveMode, SQLContext}
+import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
+
+class JdbcRelationProvider extends CreatableRelationProvider
+  with RelationProvider with DataSourceRegister {
 
   override def shortName(): String = "jdbc"
 
-  /** Returns a new base relation with the given parameters. */
   override def createRelation(
       sqlContext: SQLContext,
       parameters: Map[String, String]): BaseRelation = {
     val jdbcOptions = new JDBCOptions(parameters)
-    if (jdbcOptions.partitionColumn != null
-      && (jdbcOptions.lowerBound == null
-        || jdbcOptions.upperBound == null
-        || jdbcOptions.numPartitions == null)) {
-      sys.error("Partitioning incompletely specified")
-    }
+    val partitionColumn = jdbcOptions.partitionColumn
+    val lowerBound = jdbcOptions.lowerBound
+    val upperBound = jdbcOptions.upperBound
+    val numPartitions = jdbcOptions.numPartitions
 
-    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
+    val partitionInfo = if (partitionColumn == null) {
       null
     } else {
       JDBCPartitioningInfo(
-        jdbcOptions.partitionColumn,
-        jdbcOptions.lowerBound.toLong,
-        jdbcOptions.upperBound.toLong,
-        jdbcOptions.numPartitions.toInt)
+        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
     }
     val parts = JDBCRelation.columnPartition(partitionInfo)
     val properties = new Properties() // Additional properties that we will pass to getConnection
     parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
     JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
   }
+
+  /*
+   * The following structure applies to this code:
--- End diff --

If the table does not exist and the mode is `OVERWRITE`, we create a table, 
then insert rows into the table, and finally return a BaseRelation. 
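
For context, a rough sketch of the user-facing call this code path serves (URL, table name and data are illustrative; assumes an active SparkSession named `spark`):

    import java.util.Properties
    import org.apache.spark.sql.SaveMode

    val df = spark.range(10).toDF("id")
    val props = new Properties()  // connection properties such as user/password

    // With SaveMode.Overwrite and a table that does not yet exist, the provider creates
    // the table, inserts df's rows, and returns a BaseRelation, as described above.
    df.write.mode(SaveMode.Overwrite).jdbc("jdbc:h2:mem:testdb", "people", props)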





[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

2016-09-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12601#discussion_r80628287
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala
 ---
@@ -19,37 +19,102 @@ package org.apache.spark.sql.execution.datasources.jdbc
 
 import java.util.Properties
 
-import org.apache.spark.sql.SQLContext
-import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
+import scala.collection.JavaConverters.mapAsJavaMapConverter
 
-class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
+import org.apache.spark.sql.{AnalysisException, DataFrame, SaveMode, SQLContext}
+import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
+
+class JdbcRelationProvider extends CreatableRelationProvider
+  with RelationProvider with DataSourceRegister {
 
   override def shortName(): String = "jdbc"
 
-  /** Returns a new base relation with the given parameters. */
   override def createRelation(
       sqlContext: SQLContext,
       parameters: Map[String, String]): BaseRelation = {
     val jdbcOptions = new JDBCOptions(parameters)
-    if (jdbcOptions.partitionColumn != null
-      && (jdbcOptions.lowerBound == null
-        || jdbcOptions.upperBound == null
-        || jdbcOptions.numPartitions == null)) {
-      sys.error("Partitioning incompletely specified")
-    }
+    val partitionColumn = jdbcOptions.partitionColumn
+    val lowerBound = jdbcOptions.lowerBound
+    val upperBound = jdbcOptions.upperBound
+    val numPartitions = jdbcOptions.numPartitions
 
-    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
+    val partitionInfo = if (partitionColumn == null) {
       null
     } else {
       JDBCPartitioningInfo(
-        jdbcOptions.partitionColumn,
-        jdbcOptions.lowerBound.toLong,
-        jdbcOptions.upperBound.toLong,
-        jdbcOptions.numPartitions.toInt)
+        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
     }
     val parts = JDBCRelation.columnPartition(partitionInfo)
     val properties = new Properties() // Additional properties that we will pass to getConnection
     parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
     JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
   }
+
+  /*
+   * The following structure applies to this code:
--- End diff --

Now at least three reviewers are confused by this bit. Do you mind if 
I submit a PR to clean up this part?





[GitHub] spark issue #15242: [MINOR][PySpark][DOCS] Fix examples in PySpark documenta...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15242
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65949/
Test FAILed.





[GitHub] spark issue #15242: [MINOR][PySpark][DOCS] Fix examples in PySpark documenta...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15242
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15242: [MINOR][PySpark][DOCS] Fix examples in PySpark documenta...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15242
  
**[Test build #65949 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65949/consoleFull)**
 for PR 15242 at commit 
[`85cda01`](https://github.com/apache/spark/commit/85cda012ad47d3d514b42e325e4503251778ff73).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15255: [SPARK-17680] [SQL] [TEST] Added a Testcase for Verifyin...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15255
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15255: [SPARK-17680] [SQL] [TEST] Added a Testcase for Verifyin...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15255
  
**[Test build #65946 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65946/consoleFull)**
 for PR 15255 at commit 
[`741d59c`](https://github.com/apache/spark/commit/741d59c2565f70404409cc4a8afcf002148c3d74).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15255: [SPARK-17680] [SQL] [TEST] Added a Testcase for Verifyin...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15255
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65946/
Test PASSed.





[GitHub] spark pull request #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work ...

2016-09-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/12601#discussion_r80627940
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala
 ---
@@ -19,37 +19,102 @@ package org.apache.spark.sql.execution.datasources.jdbc
 
 import java.util.Properties
 
-import org.apache.spark.sql.SQLContext
-import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}
+import scala.collection.JavaConverters.mapAsJavaMapConverter
 
-class JdbcRelationProvider extends RelationProvider with DataSourceRegister {
+import org.apache.spark.sql.{AnalysisException, DataFrame, SaveMode, SQLContext}
+import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider, DataSourceRegister, RelationProvider}
+
+class JdbcRelationProvider extends CreatableRelationProvider
+  with RelationProvider with DataSourceRegister {
 
   override def shortName(): String = "jdbc"
 
-  /** Returns a new base relation with the given parameters. */
   override def createRelation(
       sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
     val jdbcOptions = new JDBCOptions(parameters)
-    if (jdbcOptions.partitionColumn != null
-      && (jdbcOptions.lowerBound == null
-        || jdbcOptions.upperBound == null
-        || jdbcOptions.numPartitions == null)) {
-      sys.error("Partitioning incompletely specified")
-    }
+    val partitionColumn = jdbcOptions.partitionColumn
+    val lowerBound = jdbcOptions.lowerBound
+    val upperBound = jdbcOptions.upperBound
+    val numPartitions = jdbcOptions.numPartitions
 
-    val partitionInfo = if (jdbcOptions.partitionColumn == null) {
+    val partitionInfo = if (partitionColumn == null) {
       null
     } else {
       JDBCPartitioningInfo(
-        jdbcOptions.partitionColumn,
-        jdbcOptions.lowerBound.toLong,
-        jdbcOptions.upperBound.toLong,
-        jdbcOptions.numPartitions.toInt)
+        partitionColumn, lowerBound.toLong, upperBound.toLong, numPartitions.toInt)
     }
     val parts = JDBCRelation.columnPartition(partitionInfo)
     val properties = new Properties() // Additional properties that we will pass to getConnection
     parameters.foreach(kv => properties.setProperty(kv._1, kv._2))
     JDBCRelation(jdbcOptions.url, jdbcOptions.table, parts, properties)(sqlContext.sparkSession)
   }
+
+  /*
+   * The following structure applies to this code:
--- End diff --

What does this table mean? What are `CreateTable`, `saveTable`, and `BaseRelation`?





[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15090
  
**[Test build #65950 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65950/consoleFull)**
 for PR 15090 at commit 
[`7cd8f14`](https://github.com/apache/spark/commit/7cd8f144c3d64fc407f42b069bd7a53d70604974).





[GitHub] spark pull request #15243: Fixing comment since Actor is not used anymore.

2016-09-26 Thread danix800
Github user danix800 closed the pull request at:

https://github.com/apache/spark/pull/15243





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-26 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r80626981
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, ColumnStat, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table in the current database to generate statistics,
+ * which will be used in query optimizations.
+ */
+case class AnalyzeColumnCommand(
+    tableIdent: TableIdentifier,
+    columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val sessionState = sparkSession.sessionState
+    val db = tableIdent.database.getOrElse(sessionState.catalog.getCurrentDatabase)
+    val tableIdentWithDB = TableIdentifier(tableIdent.table, Some(db))
+    val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdentWithDB))
+
+    relation match {
+      case catalogRel: CatalogRelation =>
+        updateStats(catalogRel.catalogTable,
+          AnalyzeTableCommand.calculateTotalSize(sessionState, catalogRel.catalogTable))
+
+      case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
+        updateStats(logicalRel.catalogTable.get, logicalRel.relation.sizeInBytes)
+
+      case otherRelation =>
+        throw new AnalysisException("ANALYZE TABLE is not supported for " +
+          s"${otherRelation.nodeName}.")
+    }
+
+    def updateStats(catalogTable: CatalogTable, newTotalSize: Long): Unit = {
+      val (rowCount, columnStats) = computeColStats(sparkSession, relation)
+      val statistics = Statistics(
+        sizeInBytes = newTotalSize,
+        rowCount = Some(rowCount),
+        colStats = columnStats ++ catalogTable.stats.map(_.colStats).getOrElse(Map()))
+      sessionState.catalog.alterTable(catalogTable.copy(stats = Some(statistics)))
+      // Refresh the cached data source table in the catalog.
+      sessionState.catalog.refreshTable(tableIdentWithDB)
+    }
+
+    Seq.empty[Row]
+  }
+
+  def computeColStats(
+      sparkSession: SparkSession,
+      relation: LogicalPlan): (Long, Map[String, ColumnStat]) = {
+
+    // check correctness of column names
+    val attributesToAnalyze = mutable.MutableList[Attribute]()
+    val caseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis
+    columnNames.foreach { col =>
+      val exprOption = relation.output.find { attr =>
+        if (caseSensitive) attr.name == col else attr.name.equalsIgnoreCase(col)
+      }
+      val expr = exprOption.getOrElse(throw new AnalysisException(s"Invalid column name: $col."))
+      // do deduplication
+      if (!attributesToAnalyze.contains(expr)) {
+        attributesToAnalyze += expr
+      }
+    }
+
+    // Collect statistics per column.
+    // The first element in the result will be the overall row count, the following elements
+    // will be structs containing all column stats.
+    // The layout of each struct follows the layout of the ColumnStats.
+    val ndvMaxErr = sparkSession.sessionState.conf.ndvMaxError
+    val expressions = Count(Literal(

[GitHub] spark issue #15255: [SPARK-17680] [SQL] [TEST] Added a Testcase for Verifyin...

2016-09-26 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15255
  
... Hit one more bug in the write path...






[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...

2016-09-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13680





[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...

2016-09-26 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13680
  
thanks for your great work! merging to master!





[GitHub] spark pull request #15251: Fix two comments since Actor is not used anymore.

2016-09-26 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15251





[GitHub] spark issue #15251: Fix two comments since Actor is not used anymore.

2016-09-26 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15251
  
Thanks - merging in master.






[GitHub] spark issue #15242: [MINOR][PySpark][DOCS] Fix examples in PySpark documenta...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15242
  
**[Test build #65949 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65949/consoleFull)**
 for PR 15242 at commit 
[`85cda01`](https://github.com/apache/spark/commit/85cda012ad47d3d514b42e325e4503251778ff73).





[GitHub] spark issue #15251: Fix two comments since Actor is not used anymore.

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15251
  
**[Test build #3290 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3290/consoleFull)**
 for PR 15251 at commit 
[`dbaf93c`](https://github.com/apache/spark/commit/dbaf93c96f130e78f6386479cc275223384294eb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15168: [SPARK-17612][SQL] Support `DESCRIBE table PARTITION` SQ...

2016-09-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15168
  
Hi, @hvanhovell .
Could you review this PR again, please?





[GitHub] spark issue #15252: [SPARK-17677][SQL] Break WindowExec.scala into multiple ...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15252
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65945/
Test PASSed.





[GitHub] spark issue #15252: [SPARK-17677][SQL] Break WindowExec.scala into multiple ...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15252
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15252: [SPARK-17677][SQL] Break WindowExec.scala into multiple ...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15252
  
**[Test build #65945 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65945/consoleFull)**
 for PR 15252 at commit 
[`952ef99`](https://github.com/apache/spark/commit/952ef99036ac96eef25156097ab4e07c29428c28).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15242: [MINOR][PySpark][DOCS] Fix examples in PySpark documenta...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15242
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65947/
Test FAILed.





[GitHub] spark issue #15242: [MINOR][PySpark][DOCS] Fix examples in PySpark documenta...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15242
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15242: [MINOR][PySpark][DOCS] Fix examples in PySpark documenta...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15242
  
**[Test build #65947 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65947/consoleFull)**
 for PR 15242 at commit 
[`88c1069`](https://github.com/apache/spark/commit/88c1069bdda685d6e93628b8165c1fe360a17f92).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14897: [SPARK-17338][SQL] add global temp view

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14897
  
**[Test build #65948 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65948/consoleFull)**
 for PR 14897 at commit 
[`2732531`](https://github.com/apache/spark/commit/273253177c3c46c0ac3932eea62fa343551ead65).





[GitHub] spark issue #15253: [SPARK-17678][REPL][Branch-1.6] Honor spark.replClassSer...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15253
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15253: [SPARK-17678][REPL][Branch-1.6] Honor spark.replClassSer...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15253
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65944/
Test PASSed.





[GitHub] spark issue #15253: [SPARK-17678][REPL][Branch-1.6] Honor spark.replClassSer...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15253
  
**[Test build #65944 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65944/consoleFull)**
 for PR 15253 at commit 
[`fafdf43`](https://github.com/apache/spark/commit/fafdf432e3d35c19b07e86ff7ddac23d8298a66e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #15248: [SPARK-17671] Spark 2.0 history server summary pa...

2016-09-26 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15248#discussion_r80623063
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/v1/ApplicationListResource.scala
 ---
@@ -32,7 +32,14 @@ private[v1] class ApplicationListResource(uiRoot: 
UIRoot) {
   @DefaultValue("3000-01-01") @QueryParam("maxDate") maxDate: 
SimpleDateParam,
   @QueryParam("limit") limit: Integer)
   : Iterator[ApplicationInfo] = {
-val allApps = uiRoot.getApplicationInfoList
+val allApps = {
--- End diff --

Is this something you can do with a one-liner, by adding `.view` here, i.e. 
`uiRoot.getApplicationInfoList.view`?
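
A standalone sketch of the idea (AppInfo stands in for ApplicationInfo; the actual types in the PR may differ): the lazy view lets a later take(limit) stop early instead of materializing the whole list first.

    case class AppInfo(id: String)

    def listApps(allApps: Seq[AppInfo], limit: Integer): Iterator[AppInfo] = {
      // Treat a missing limit as Integer.MAX_VALUE, so take() always applies.
      val n = if (limit == null) Integer.MAX_VALUE else limit.intValue()
      allApps.view.take(n).iterator
    }

    listApps(Seq(AppInfo("a"), AppInfo("b"), AppInfo("c")), 2).toList  // List(AppInfo(a), AppInfo(b))
    listApps(Seq(AppInfo("a")), null).toList                           // List(AppInfo(a))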





[GitHub] spark pull request #15248: [SPARK-17671] Spark 2.0 history server summary pa...

2016-09-26 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15248#discussion_r80623021
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
@@ -178,6 +178,23 @@ class HistoryServer(
 provider.getListing()
   }
 
+  /**
+* Returns a list of available applications, in descending order 
according to their end time.
+*
+* @param limit the number of applications to return
+* @return List of known applications with a limit.
+*/
+  def getApplicationInfoList(limit: Int): Iterator[ApplicationInfo] = {
--- End diff --

If you use Integer.MaxValue to mean unlimited, then just the call to 
limit() works as intended.





[GitHub] spark issue #15246: [MINOR][SQL] Use resource path for test_script.sh

2016-09-26 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15246
  
Hm, I see why this happens to work: the file is not packaged inside a jar file. 
Normally that's what getResource is for, and if the file were in a jar this 
wouldn't work. It's not a bad idea, but isn't it easier to just set the 
working dir for the project in your IDE? IJ can do that.
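
For reference, a small sketch of the getResource pattern under discussion (the resource name is taken from the PR title; the surrounding test code is assumed, not quoted):

    // Resolves relative to the test classpath rather than the process working directory.
    val url = Thread.currentThread().getContextClassLoader.getResource("test_script.sh")
    // This yields a usable on-disk path only while the resource lives in a directory
    // (e.g. target/test-classes); if it were packaged inside a jar, the URL would point
    // into the jar and the file would have to be copied out before handing it to a shell.
    val path = new java.io.File(url.toURI).getAbsolutePath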





[GitHub] spark issue #15252: [SPARK-17677][SQL] Break WindowExec.scala into multiple ...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15252
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65943/
Test PASSed.





[GitHub] spark issue #15252: [SPARK-17677][SQL] Break WindowExec.scala into multiple ...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15252
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15252: [SPARK-17677][SQL] Break WindowExec.scala into multiple ...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15252
  
**[Test build #65943 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65943/consoleFull)**
 for PR 15252 at commit 
[`1124edf`](https://github.com/apache/spark/commit/1124edf477a48ca66cea4287d2e81acafd9e5646).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15255: [SPARK-17680] [SQL] [TEST] Added a Testcase for Verifyin...

2016-09-26 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15255
  
... Hit a bug in the write path... Need to fix that first...





[GitHub] spark issue #15254: [SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConv...

2016-09-26 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15254
  
cc @JoshRosen and @davies 





[GitHub] spark issue #15255: [SPARK-17680] [SQL] [TEST] Added a Testcase for Verifyin...

2016-09-26 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15255
  
Also need to add a test case for data source tables. Will do it later





[GitHub] spark issue #15250: [SPARK-17676][CORE] FsHistoryProvider should ignore hidd...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15250
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65942/
Test PASSed.





[GitHub] spark issue #15250: [SPARK-17676][CORE] FsHistoryProvider should ignore hidd...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15250
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15250: [SPARK-17676][CORE] FsHistoryProvider should ignore hidd...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15250
  
**[Test build #65942 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65942/consoleFull)**
 for PR 15250 at commit 
[`b125f2f`](https://github.com/apache/spark/commit/b125f2f2761ae81abd2c2c049666fc1b0ca51e0c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15242: [MINOR][PySpark][DOCS] Fix examples in PySpark documenta...

2016-09-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15242
  
Let me update the PR description just in case.





[GitHub] spark issue #15242: [MINOR][PySpark][DOCS] Fix examples in PySpark documenta...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15242
  
**[Test build #65947 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65947/consoleFull)**
 for PR 15242 at commit 
[`88c1069`](https://github.com/apache/spark/commit/88c1069bdda685d6e93628b8165c1fe360a17f92).





[GitHub] spark issue #15242: [MINOR][PySpark][DOCS] Fix examples in PySpark documenta...

2016-09-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15242
  
@srowen I just scanned through twice, and I think that should be all of them.





[GitHub] spark issue #15254: [SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConv...

2016-09-26 Thread lins05
Github user lins05 commented on the issue:

https://github.com/apache/spark/pull/15254
  
I guess we can also remove another workaround 
[here](https://github.com/apache/spark/blob/v2.0.0/python/pyspark/rdd.py#L2320-L2328)
 ?





[GitHub] spark issue #15255: [SPARK-17680] [SQL] [TEST] Added a Testcase for Verifyin...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15255
  
**[Test build #65946 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65946/consoleFull)**
 for PR 15255 at commit 
[`741d59c`](https://github.com/apache/spark/commit/741d59c2565f70404409cc4a8afcf002148c3d74).





[GitHub] spark pull request #15255: [SPARK-17680] [SQL] [TEST] Added a Testcase for V...

2016-09-26 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/15255

[SPARK-17680] [SQL] [TEST] Added a Testcase for Verifying Unicode Character 
Support for Column Names and Comments

### What changes were proposed in this pull request?
When the version of the Hive metastore is higher than 0.12, Spark SQL 
supports Unicode characters in column names when they are specified within 
backticks (`). The Hive metastore has supported Unicode characters in column 
names since 0.13; see the JIRA: https://issues.apache.org/jira/browse/HIVE-6013

In Spark SQL, table comments and view comments always allow Unicode 
characters without backticks.
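
For example, a minimal sketch (hypothetical table and column names, assuming a 
0.13+ Hive metastore): the Unicode column name must be backquoted, while the 
comment needs no quoting.

    spark.sql("CREATE TABLE unicode_demo(`列1` INT COMMENT '第一列') USING parquet")
    spark.sql("SELECT `列1` FROM unicode_demo").show()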

BTW, I will submit a separate PR for database and table name validation, 
because we do not support Unicode characters in those two cases.

### How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark unicodeSupport

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15255.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15255


commit 41d3bae9c3730b64a54c17098e203a78620b2032
Author: gatorsmile 
Date:   2016-09-27T04:20:25Z

added

commit 741d59c2565f70404409cc4a8afcf002148c3d74
Author: gatorsmile 
Date:   2016-09-27T05:06:18Z

improved







[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15090
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15090
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65941/
Test PASSed.





[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15090
  
**[Test build #65941 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65941/consoleFull)**
 for PR 15090 at commit 
[`08df669`](https://github.com/apache/spark/commit/08df66937d8834c8e0b7300beea1973705f852b7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15254: [SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConv...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15254
  
Can one of the admins verify this patch?





[GitHub] spark issue #15254: [SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConv...

2016-09-26 Thread JasonMWhite
Github user JasonMWhite commented on the issue:

https://github.com/apache/spark/pull/15254
  
@davies you authored https://github.com/apache/spark/pull/5570 and reported 
the issue in Py4J at https://github.com/bartdag/py4j/issues/160. I happened 
across this while spelunking through the Py4J code in PySpark, and it seems 
like the workaround is no longer needed. Do you agree?





[GitHub] spark pull request #15254: [SPARK-17679] [PYSPARK] remove unnecessary Py4J L...

2016-09-26 Thread JasonMWhite
GitHub user JasonMWhite opened a pull request:

https://github.com/apache/spark/pull/15254

[SPARK-17679] [PYSPARK] remove unnecessary Py4J ListConverter patch

## What changes were proposed in this pull request?

This PR removes a patch on ListConverter from 
https://github.com/apache/spark/pull/5570, as it is no longer necessary. The 
underlying issue in Py4J, https://github.com/bartdag/py4j/issues/160, was fixed 
by 
https://github.com/bartdag/py4j/commit/224b94b6665e56a93a064073886e1d803a4969d2 
and the fix is included in 0.10.3, the Py4J version currently used by Spark.

## How was this patch tested?

The original test added in https://github.com/apache/spark/pull/5570 
remains.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JasonMWhite/spark remove_listconverter_patch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15254.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15254


commit c2b50d427e2c6712d4e648a6da560b95f5e6d669
Author: Jason White 
Date:   2016-09-27T04:14:49Z

remove unnecessary Py4J ListConverter patch







[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-09-26 Thread kayousterhout
Github user kayousterhout commented on the issue:

https://github.com/apache/spark/pull/15249
  
It's awesome to have this separated out.  I should have time to review it 
tomorrow, and then hopefully we can (finally) merge it in the next few days!





[GitHub] spark issue #15213: [SPARK-17644] [CORE] Do not add failedStages when abortS...

2016-09-26 Thread markhamstra
Github user markhamstra commented on the issue:

https://github.com/apache/spark/pull/15213
  
@scwf I understand that you were trying to make the least invasive fix 
possible to deal with the problem.  That's usually a good thing to do, but even 
when that kind of fix is getting to the root of the problem it can still result 
in layers of patches that are hard to make sense of.  That's not really the 
fault of any one patch; rather, the blame lies more with those of us who often 
didn't produce clear, maintainable code in the first place.  When it's possible 
to see re-organizing principles that will make the code clearer, reduce 
duplication, make future maintenance less error prone, etc., then it's usually 
a good idea to do a little larger refactoring instead of just a minimally 
invasive fix.

I think this is a small example of where that kind of refactoring makes 
sense, so that's why I made my code suggestion.  If you can see ways to make 
things even clearer, then feel free to suggest them.  I'm sure that Kay, Imran 
and others who also have been trying to make these kinds of clarifying changes 
in the DAGScheduler will also chime in if they have further suggestions. 





[GitHub] spark issue #13998: [SPARK-12177][Streaming][Kafka] limit api surface area

2016-09-26 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/13998
  
I ran that test 100 times locally without error... do you have any suggestions
on how to reproduce it?

On Mon, Sep 26, 2016 at 6:40 PM, Cody Koeninger  wrote:

> Sure I'll give it another look
>
> On Sep 26, 2016 3:46 PM, "Tathagata Das"  wrote:
>
>> @koeninger  Could you take a look at this
>> test flakiness in
>> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/1792/
>>






[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14079
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14079
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65939/
Test PASSed.





[GitHub] spark issue #14079: [SPARK-8425][CORE] New Blacklist Mechanism

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14079
  
**[Test build #65939 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65939/consoleFull)**
 for PR 14079 at commit 
[`278fff3`](https://github.com/apache/spark/commit/278fff343eaa1b917f17d7591e39b0543538d253).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15249
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65938/
Test PASSed.





[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15249
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15249: [SPARK-17675] [CORE] Expand Blacklist for TaskSets

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15249
  
**[Test build #65938 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65938/consoleFull)**
 for PR 15249 at commit 
[`882b385`](https://github.com/apache/spark/commit/882b385c966112c0345fce7fe92e3a0aa31ed22d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r80617904
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
 ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.{util => ju}
+
+import scala.collection.JavaConverters._
+
+import org.apache.kafka.clients.consumer.{Consumer, KafkaConsumer}
+import 
org.apache.kafka.clients.consumer.internals.NoOpConsumerRebalanceListener
+import org.apache.kafka.common.TopicPartition
+
+import org.apache.spark.SparkContext
+import org.apache.spark.internal.Logging
+import org.apache.spark.scheduler.ExecutorCacheTaskLocation
+import org.apache.spark.sql._
+import org.apache.spark.sql.execution.streaming._
+import org.apache.spark.sql.kafka010.KafkaSource._
+import org.apache.spark.sql.types._
+
+/**
+ * A [[Source]] that uses Kafka's own [[KafkaConsumer]] API to read data 
from Kafka. The design
+ * for this source is as follows.
+ *
+ * - The [[KafkaSourceOffset]] is the custom [[Offset]] defined for this 
source that contains
+ *   a map of TopicPartition -> offset. Note that this offset is 1 + 
(available offset). For
+ *   example if the last record in a Kafka topic "t", partition 2 is 
offset 5, then
+ *   KafkaSourceOffset will contain TopicPartition("t", 2) -> 6. This is 
done to keep it consistent
+ *   with the semantics of `KafkaConsumer.position()`.
+ *
+ * - The [[ConsumerStrategy]] class defines which Kafka topics and 
partitions should be read
+ *   by this source. These strategies directly correspond to the different 
consumption options
+ *   in . This class is designed to return a configured
+ *   [[KafkaConsumer]] that is used by the [[KafkaSource]] to query for 
the offsets.
+ *   See the docs on 
[[org.apache.spark.sql.kafka010.KafkaSource.ConsumerStrategy]] for
+ *   more details.
+ *
+ * - The [[KafkaSource]] is written to do the following.
+ *
+ *  - As soon as the source is created, the pre-configured KafkaConsumer 
returned by the
+ *[[ConsumerStrategy]] is used to query the initial offsets that this 
source should
+ *start reading from. This is used to create the first batch.
+ *
+ *   - `getOffset()` uses the KafkaConsumer to query the latest available 
offsets, which are
+ * returned as a [[KafkaSourceOffset]].
+ *
+ *   - `getBatch()` returns a DF that reads from the 'start offset' until 
the 'end offset' in
+ * each partition. The end offset is excluded to be consistent 
with the semantics of
+ * [[KafkaSourceOffset]] and `KafkaConsumer.position()`.
+ *
+ *   - The DF returned is based on [[KafkaSourceRDD]] which is constructed 
such that the
+ * data from Kafka topic + partition is consistently read by the same 
executors across
+ * batches, and cached KafkaConsumers in the executors can be reused 
efficiently. See the
+ * docs on [[KafkaSourceRDD]] for more details.
+ */
+private[kafka010] case class KafkaSource(
+sqlContext: SQLContext,
+consumerStrategy: ConsumerStrategy,
+executorKafkaParams: ju.Map[String, Object],
+sourceOptions: Map[String, String])
+  extends Source with Logging {
+
+  private val consumer = consumerStrategy.createConsumer()
+  private val sc = sqlContext.sparkContext
+  private val initialPartitionOffsets = fetchPartitionOffsets(seekToLatest 
= false)
+  logInfo(s"Initial offsets: $initialPartitionOffsets")
+
+  override def schema: StructType = KafkaSource.kafkaSchema
+
+  /** Returns the maximum available offset for this source. */
+  override def getOffset: Option[Offset] = {
+val offset = KafkaSourceOffset(fetchPartitionOffsets(seekToLatest = 
true))
+logInfo(s"GetOffset: $offset")
+Some(offset)
+  }
+
+  /** Returns the data that is between th

[GitHub] spark pull request #12618: [SPARK-14857] [SQL] Table/Database Name Validatio...

2016-09-26 Thread gatorsmile
Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/12618





[GitHub] spark issue #12618: [SPARK-14857] [SQL] Table/Database Name Validation in Se...

2016-09-26 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/12618
  
The changes in this PR have become completely outdated. I will resubmit a new 
PR for it. Thanks!





[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-26 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/15102
  
OK, I finished a line-by-line compare-and-comment pass.

The biggest thing I'm having trouble reconciling is the stated emphasis on 
limiting user options in order to give guarantees, while throwing those 
guarantees away with only warn-level logging (e.g. the handling of the offset 
out of range exception).
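
For what it's worth, a rough sketch of the two behaviors being contrasted 
(names and structure are illustrative, not the PR's actual code):

    import org.apache.kafka.clients.consumer.{Consumer, OffsetOutOfRangeException}
    import org.apache.kafka.common.TopicPartition

    // When the requested offset has been aged out of the broker, the source can
    // either warn and keep going (silently losing data) or fail the batch so the
    // violated guarantee is visible to the user.
    def seekAndPoll(
        consumer: Consumer[Array[Byte], Array[Byte]],
        tp: TopicPartition,
        offset: Long): Unit = {
      try {
        consumer.seek(tp, offset)
        consumer.poll(512)
      } catch {
        case e: OffsetOutOfRangeException =>
          // warn-level handling would log and reset to the earliest available offset;
          // failing fast instead keeps the no-silent-data-loss guarantee explicit:
          throw new IllegalStateException(s"Offset $offset for $tp was aged out of Kafka", e)
      }
    }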





[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r80617145
  
--- Diff: external/kafka-0-10-sql/pom.xml ---
@@ -0,0 +1,82 @@
+
+
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.spark</groupId>
+    <artifactId>spark-parent_2.11</artifactId>
+    <version>2.1.0-SNAPSHOT</version>
+    <relativePath>../../pom.xml</relativePath>
+  </parent>
+
+  <groupId>org.apache.spark</groupId>
+  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
+  <properties>
+    <sbt.project.name>sql-kafka-0-10</sbt.project.name>
+  </properties>
+  <packaging>jar</packaging>
+  <name>Spark Integration for Kafka 0.10</name>
--- End diff --

This has the same name as the DStream module; it should probably be changed.





[GitHub] spark issue #15221: [SPARK-17648][CORE] TaskScheduler really needs offers to...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15221
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15221: [SPARK-17648][CORE] TaskScheduler really needs offers to...

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15221
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65940/
Test FAILed.





[GitHub] spark issue #15221: [SPARK-17648][CORE] TaskScheduler really needs offers to...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15221
  
**[Test build #65940 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65940/consoleFull)**
 for PR 15221 at commit 
[`f2f1252`](https://github.com/apache/spark/commit/f2f125253796f967c578a075b2594caa943eb952).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r80617036
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceRDD.scala
 ---
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.{util => ju}
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.kafka.clients.consumer.{ConsumerRecord, 
OffsetOutOfRangeException}
+import org.apache.kafka.common.TopicPartition
+
+import org.apache.spark.{Partition, SparkContext, TaskContext}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+
+
+/** Offset range that one partition of the KafkaSourceRDD has to read */
+private[kafka010] case class KafkaSourceRDDOffsetRange(
+topicPartition: TopicPartition,
+fromOffset: Long,
+untilOffset: Long,
+preferredLoc: Option[String]) {
+  def topic: String = topicPartition.topic
+  def partition: Int = topicPartition.partition
+  def size: Long = untilOffset - fromOffset
+}
+
+
+/** Partition of the KafkaSourceRDD */
+private[kafka010] case class KafkaSourceRDDPartition(
+  index: Int, offsetRange: KafkaSourceRDDOffsetRange) extends Partition
+
+
+/**
+ * An RDD that reads data from Kafka based on offset ranges across 
multiple partitions.
+ * Additionally, it allows preferred locations to be set for each topic + 
partition, so that
+ * the [[KafkaSource]] can ensure the same executor always reads the same 
topic + partition
+ * and cached KafkaConsuemrs (see [[CachedKafkaConsumer]] can be used read 
data efficiently.
+ *
+ * Note that this is a simplified version of the 
org.apache.spark.streaming.kafka010.KafkaRDD.
+ *
+ * @param executorKafkaParams Kafka configuration for creating 
KafkaConsumer on the executors
+ * @param offsetRanges Offset ranges that define the Kafka data belonging 
to this RDD
+ */
+private[kafka010] class KafkaSourceRDD(
+sc: SparkContext,
+executorKafkaParams: ju.Map[String, Object],
+offsetRanges: Seq[KafkaSourceRDDOffsetRange])
+  extends RDD[ConsumerRecord[Array[Byte], Array[Byte]]](sc, Nil) {
+
+  override def persist(newLevel: StorageLevel): this.type = {
+logError("Kafka ConsumerRecord is not serializable. " +
+  "Use .map to extract fields before calling .persist or .window")
+super.persist(newLevel)
+  }
+
+  override def getPartitions: Array[Partition] = {
+offsetRanges.zipWithIndex.map { case (o, i) => new 
KafkaSourceRDDPartition(i, o) }.toArray
+  }
+
+  override def count(): Long = offsetRanges.map(_.size).sum
+
+  override def isEmpty(): Boolean = count == 0L
+
+  override def take(num: Int): Array[ConsumerRecord[Array[Byte], 
Array[Byte]]] = {
+val nonEmptyPartitions =
+  
this.partitions.map(_.asInstanceOf[KafkaSourceRDDPartition]).filter(_.offsetRange.size
 > 0)
+
+if (num < 1 || nonEmptyPartitions.isEmpty) {
+  return new Array[ConsumerRecord[Array[Byte], Array[Byte]]](0)
+}
+
+// Determine in advance how many messages need to be taken from each 
partition
+val parts = nonEmptyPartitions.foldLeft(Map[Int, Int]()) { (result, 
part) =>
+  val remain = num - result.values.sum
+  if (remain > 0) {
+val taken = Math.min(remain, part.offsetRange.size)
+result + (part.index -> taken.toInt)
+  } else {
+result
+  }
+}
+
+val buf = new ArrayBuffer[ConsumerRecord[Array[Byte], Array[Byte]]]
+val res = context.runJob(
+  this,
+  (tc: TaskContext, it: Iterator[ConsumerRecord[Array[Byte], 
Array[Byte]]]) =>
+  it.take(parts(tc.partitionId)).toArray, parts.keys.toArray
+)
+res.foreach(buf ++= _)
+buf.toArray
+  }
+
+  override def compute(
+  

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r80616878
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceRDD.scala
 ---
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.{util => ju}
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.kafka.clients.consumer.{ConsumerRecord, 
OffsetOutOfRangeException}
+import org.apache.kafka.common.TopicPartition
+
+import org.apache.spark.{Partition, SparkContext, TaskContext}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+
+
+/** Offset range that one partition of the KafkaSourceRDD has to read */
+private[kafka010] case class KafkaSourceRDDOffsetRange(
+topicPartition: TopicPartition,
+fromOffset: Long,
+untilOffset: Long,
+preferredLoc: Option[String]) {
+  def topic: String = topicPartition.topic
+  def partition: Int = topicPartition.partition
+  def size: Long = untilOffset - fromOffset
+}
+
+
+/** Partition of the KafkaSourceRDD */
+private[kafka010] case class KafkaSourceRDDPartition(
+  index: Int, offsetRange: KafkaSourceRDDOffsetRange) extends Partition
+
+
+/**
+ * An RDD that reads data from Kafka based on offset ranges across 
multiple partitions.
+ * Additionally, it allows preferred locations to be set for each topic + 
partition, so that
+ * the [[KafkaSource]] can ensure the same executor always reads the same 
topic + partition
+ * and cached KafkaConsumers (see [[CachedKafkaConsumer]]) can be used to read 
data efficiently.
+ *
+ * Note that this is a simplified version of the 
org.apache.spark.streaming.kafka010.KafkaRDD.
+ *
+ * @param executorKafkaParams Kafka configuration for creating 
KafkaConsumer on the executors
+ * @param offsetRanges Offset ranges that define the Kafka data belonging 
to this RDD
+ */
+private[kafka010] class KafkaSourceRDD(
+sc: SparkContext,
+executorKafkaParams: ju.Map[String, Object],
+offsetRanges: Seq[KafkaSourceRDDOffsetRange])
+  extends RDD[ConsumerRecord[Array[Byte], Array[Byte]]](sc, Nil) {
+
+  override def persist(newLevel: StorageLevel): this.type = {
+logError("Kafka ConsumerRecord is not serializable. " +
+  "Use .map to extract fields before calling .persist or .window")
+super.persist(newLevel)
+  }
+
+  override def getPartitions: Array[Partition] = {
+offsetRanges.zipWithIndex.map { case (o, i) => new 
KafkaSourceRDDPartition(i, o) }.toArray
+  }
+
+  override def count(): Long = offsetRanges.map(_.size).sum
+
+  override def isEmpty(): Boolean = count == 0L
+
+  override def take(num: Int): Array[ConsumerRecord[Array[Byte], 
Array[Byte]]] = {
+val nonEmptyPartitions =
+  
this.partitions.map(_.asInstanceOf[KafkaSourceRDDPartition]).filter(_.offsetRange.size
 > 0)
+
+if (num < 1 || nonEmptyPartitions.isEmpty) {
+  return new Array[ConsumerRecord[Array[Byte], Array[Byte]]](0)
+}
+
+// Determine in advance how many messages need to be taken from each 
partition
+val parts = nonEmptyPartitions.foldLeft(Map[Int, Int]()) { (result, 
part) =>
+  val remain = num - result.values.sum
+  if (remain > 0) {
+val taken = Math.min(remain, part.offsetRange.size)
+result + (part.index -> taken.toInt)
+  } else {
+result
+  }
+}
+
+val buf = new ArrayBuffer[ConsumerRecord[Array[Byte], Array[Byte]]]
+val res = context.runJob(
+  this,
+  (tc: TaskContext, it: Iterator[ConsumerRecord[Array[Byte], 
Array[Byte]]]) =>
+  it.take(parts(tc.partitionId)).toArray, parts.keys.toArray
+)
+res.foreach(buf ++= _)
+buf.toArray
+  }
+
+  override def compute(
+  

[GitHub] spark issue #15252: [SPARK-17677][SQL] Break WindowExec.scala into multiple ...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15252
  
**[Test build #65945 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65945/consoleFull)**
 for PR 15252 at commit 
[`952ef99`](https://github.com/apache/spark/commit/952ef99036ac96eef25156097ab4e07c29428c28).





[GitHub] spark issue #15213: [SPARK-17644] [CORE] Do not add failedStages when abortS...

2016-09-26 Thread scwf
Github user scwf commented on the issue:

https://github.com/apache/spark/pull/15213
  
@markhamstra In my fix I just wanted to make minimal changes to the 
DAGScheduler, but your fix is also fine with me; I can update this PR according 
to your comment. Thanks :)
/cc @zsxwing, who may also have comments on this.





[GitHub] spark issue #15253: [SPARK-17678][REPL][Branch-1.6] Honor spark.replClassSer...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15253
  
**[Test build #65944 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65944/consoleFull)**
 for PR 15253 at commit 
[`fafdf43`](https://github.com/apache/spark/commit/fafdf432e3d35c19b07e86ff7ddac23d8298a66e).





[GitHub] spark pull request #15253: [SPARK-17678][REPL][Branch-1.6] Honor spark.replC...

2016-09-26 Thread jerryshao
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/15253

[SPARK-17678][REPL][Branch-1.6] Honor spark.replClassServer.port in 
scala-2.11 repl

## What changes were proposed in this pull request?

The Scala-2.11 REPL in Spark 1.6 doesn't honor the "spark.replClassServer.port" 
configuration, so users cannot set a fixed port number through 
"spark.replClassServer.port".

## How was this patch tested?

N/A




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-17678

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15253.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15253


commit fafdf432e3d35c19b07e86ff7ddac23d8298a66e
Author: jerryshao 
Date:   2016-09-27T03:44:45Z

Honor spark.replClassServer.port in scala-2.11 repl







[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r80616386
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceRDD.scala
 ---
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.{util => ju}
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.kafka.clients.consumer.{ConsumerRecord, 
OffsetOutOfRangeException}
+import org.apache.kafka.common.TopicPartition
+
+import org.apache.spark.{Partition, SparkContext, TaskContext}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+
+
+/** Offset range that one partition of the KafkaSourceRDD has to read */
+private[kafka010] case class KafkaSourceRDDOffsetRange(
+topicPartition: TopicPartition,
+fromOffset: Long,
+untilOffset: Long,
+preferredLoc: Option[String]) {
+  def topic: String = topicPartition.topic
+  def partition: Int = topicPartition.partition
+  def size: Long = untilOffset - fromOffset
+}
+
+
+/** Partition of the KafkaSourceRDD */
+private[kafka010] case class KafkaSourceRDDPartition(
+  index: Int, offsetRange: KafkaSourceRDDOffsetRange) extends Partition
+
+
+/**
+ * An RDD that reads data from Kafka based on offset ranges across 
multiple partitions.
+ * Additionally, it allows preferred locations to be set for each topic + 
partition, so that
+ * the [[KafkaSource]] can ensure the same executor always reads the same 
topic + partition
+ * and cached KafkaConsumers (see [[CachedKafkaConsumer]]) can be used to read 
data efficiently.
+ *
+ * Note that this is a simplified version of the 
org.apache.spark.streaming.kafka010.KafkaRDD.
+ *
+ * @param executorKafkaParams Kafka configuration for creating 
KafkaConsumer on the executors
+ * @param offsetRanges Offset ranges that define the Kafka data belonging 
to this RDD
+ */
+private[kafka010] class KafkaSourceRDD(
+sc: SparkContext,
+executorKafkaParams: ju.Map[String, Object],
+offsetRanges: Seq[KafkaSourceRDDOffsetRange])
+  extends RDD[ConsumerRecord[Array[Byte], Array[Byte]]](sc, Nil) {
+
+  override def persist(newLevel: StorageLevel): this.type = {
+logError("Kafka ConsumerRecord is not serializable. " +
+  "Use .map to extract fields before calling .persist or .window")
+super.persist(newLevel)
+  }
+
+  override def getPartitions: Array[Partition] = {
+offsetRanges.zipWithIndex.map { case (o, i) => new 
KafkaSourceRDDPartition(i, o) }.toArray
+  }
+
+  override def count(): Long = offsetRanges.map(_.size).sum
+
--- End diff --

Why missing countApprox?
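
(For context, a sketch of the kind of override being asked about, mirroring the 
DStream KafkaRDD; illustrative only, not code from this PR:)

    import org.apache.spark.partial.{BoundedDouble, PartialResult}

    // Every offset range's size is known up front, so an exact answer can be
    // returned immediately instead of running an approximate counting job.
    override def countApprox(timeout: Long, confidence: Double): PartialResult[BoundedDouble] = {
      val c = offsetRanges.map(_.size).sum.toDouble
      new PartialResult(new BoundedDouble(c, 1.0, c, c), true)
    }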





[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r80616098
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceRDD.scala
 ---
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.{util => ju}
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.kafka.clients.consumer.{ConsumerRecord, 
OffsetOutOfRangeException}
+import org.apache.kafka.common.TopicPartition
+
+import org.apache.spark.{Partition, SparkContext, TaskContext}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+
+
+/** Offset range that one partition of the KafkaSourceRDD has to read */
+private[kafka010] case class KafkaSourceRDDOffsetRange(
+topicPartition: TopicPartition,
+fromOffset: Long,
+untilOffset: Long,
+preferredLoc: Option[String]) {
+  def topic: String = topicPartition.topic
+  def partition: Int = topicPartition.partition
+  def size: Long = untilOffset - fromOffset
+}
+
+
+/** Partition of the KafkaSourceRDD */
+private[kafka010] case class KafkaSourceRDDPartition(
+  index: Int, offsetRange: KafkaSourceRDDOffsetRange) extends Partition
+
+
+/**
+ * An RDD that reads data from Kafka based on offset ranges across 
multiple partitions.
+ * Additionally, it allows preferred locations to be set for each topic + 
partition, so that
+ * the [[KafkaSource]] can ensure the same executor always reads the same 
topic + partition
+ * and cached KafkaConsumers (see [[CachedKafkaConsumer]]) can be used to read 
data efficiently.
+ *
+ * Note that this is a simplified version of the 
org.apache.spark.streaming.kafka010.KafkaRDD.
--- End diff --

Why is this the only place that has this note, when much of the code in 
other classes (e.g. the cached consumer) is taken from that as well? I'm not 
saying it's necessary to have it everywhere; it just seems inconsistent.





[GitHub] spark pull request #15172: [SPARK-13331] AES support for over-the-wire encry...

2016-09-26 Thread cjjnjust
Github user cjjnjust commented on a diff in the pull request:

https://github.com/apache/spark/pull/15172#discussion_r80615974
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/sasl/aes/SparkAesCipher.java
 ---
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.sasl.aes;
+
+import java.io.IOException;
+import java.security.InvalidAlgorithmParameterException;
+import java.security.InvalidKeyException;
+import java.security.NoSuchAlgorithmException;
+import java.util.Arrays;
+import java.util.Properties;
+import javax.crypto.Cipher;
+import javax.crypto.Mac;
+import javax.crypto.SecretKey;
+import javax.crypto.ShortBufferException;
+import javax.crypto.spec.SecretKeySpec;
+import javax.crypto.spec.IvParameterSpec;
+import javax.security.sasl.SaslException;
+
+import org.apache.commons.crypto.cipher.CryptoCipher;
+import org.apache.commons.crypto.utils.Utils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * AES cipher for encryption and decryption.
+ */
+public class SparkAesCipher {
--- End diff --

The sequence number and HMAC are necessary parts of SASL encryption: the 
sequence number ensures that identical data sent at different offsets produces 
different output, and the HMAC is used for an integrity check. See the 
com.sun.security.sasl.digest.DigestMD5Base class in the JDK as an example.

The JDK's SASL framework doesn't provide an interface for passing in an AES 
cipher; it creates the cipher internally. For example, the DigestMD5 client 
creates its cipher internally (DigestMD5Base.java:1219). However, it does 
provide the SaslEncryptBackend interface for users to customize the SASL 
client/server, which is why this patch implements wrap/unwrap.

Another way to make the patch clean and clear would be to add AES cipher 
support to the JDK's Digest-MD5 mechanism, or to change the JDK SASL framework 
to provide an interface for registering a customized cipher, but that would be 
slow and would depend on a JDK release.
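
(A rough sketch of the wrap-side idea described above; the key handling, 
algorithm choice, and framing are illustrative, not the actual SparkAesCipher 
code:)

    import java.nio.ByteBuffer
    import javax.crypto.Mac
    import javax.crypto.spec.SecretKeySpec

    // Append a 4-byte sequence number and an HMAC over (payload ++ seq) so the peer
    // can detect tampering and replayed or reordered frames. A real implementation
    // would also AES-encrypt the payload before framing it.
    def wrap(payload: Array[Byte], seqNum: Int, integrityKey: Array[Byte]): Array[Byte] = {
      val seq = ByteBuffer.allocate(4).putInt(seqNum).array()
      val mac = Mac.getInstance("HmacMD5")
      mac.init(new SecretKeySpec(integrityKey, "HmacMD5"))
      mac.update(payload)
      mac.update(seq)
      payload ++ seq ++ mac.doFinal()
    }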





[GitHub] spark issue #15251: Fix two comments since Actor is not used anymore.

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15251
  
**[Test build #3290 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3290/consoleFull)**
 for PR 15251 at commit 
[`dbaf93c`](https://github.com/apache/spark/commit/dbaf93c96f130e78f6386479cc275223384294eb).





[GitHub] spark issue #15251: Fix two comments since Actor is not used anymore.

2016-09-26 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15251
  
Jenkins, test this please.






[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r80615842
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
 ---
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.{util => ju}
+import java.util.UUID
+
+import scala.collection.JavaConverters._
+
+import org.apache.kafka.clients.consumer.ConsumerConfig
+import org.apache.kafka.common.serialization.ByteArrayDeserializer
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.execution.streaming.Source
+import org.apache.spark.sql.kafka010.KafkaSource._
+import org.apache.spark.sql.sources.{DataSourceRegister, 
StreamSourceProvider}
+import org.apache.spark.sql.types.StructType
+
+/**
+ * The provider class for the [[KafkaSource]]. This provider is designed 
such that it throws
+ * IllegalArgumentException when the Kafka Dataset is created, so that it 
can catch
+ * missing options even before the query is started.
+ */
+private[kafka010] class KafkaSourceProvider extends StreamSourceProvider
+with DataSourceRegister with Logging {
+  import KafkaSourceProvider._
+
+  /**
+   * Returns the name and schema of the source. In addition, it also 
verifies whether the options
+   * are correct and sufficient to create the [[KafkaSource]] when the 
query is started.
+   */
+  override def sourceSchema(
+  sqlContext: SQLContext,
+  schema: Option[StructType],
+  providerName: String,
+  parameters: Map[String, String]): (String, StructType) = {
+validateOptions(parameters)
+("kafka", KafkaSource.kafkaSchema)
+  }
+
+  override def createSource(
+  sqlContext: SQLContext,
+  metadataPath: String,
+  schema: Option[StructType],
+  providerName: String,
+  parameters: Map[String, String]): Source = {
+  validateOptions(parameters)
+val caseInsensitiveParams = parameters.map { case (k, v) => 
(k.toLowerCase, v) }
+val specifiedKafkaParams =
+  parameters
+.keySet
+.filter(_.toLowerCase.startsWith("kafka."))
+.map { k => k.drop(6).toString -> parameters(k) }
+.toMap
+
+val deserClassName = classOf[ByteArrayDeserializer].getName
+val uniqueGroupId = 
s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}"
+
+val autoOffsetResetValue = 
caseInsensitiveParams.get(STARTING_OFFSET_OPTION_KEY) match {
+  case Some(value) => value.trim()  // same values as those supported 
by auto.offset.reset
+  case None => "latest"
+}
+
+val kafkaParamsForStrategy =
+  ConfigUpdater("source", specifiedKafkaParams)
+.set(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, deserClassName)
+.set(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, 
deserClassName)
+
+// So that consumers in Kafka source do not mess with any existing 
group id
+.set(ConsumerConfig.GROUP_ID_CONFIG, s"$uniqueGroupId-driver")
+
+// So that consumers can start from earliest or latest
+.set(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, autoOffsetResetValue)
+
+// So that consumers in the driver do not commit offsets 
unnecessarily
+.set(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false")
+
+// So that the driver does not pull too much data
+.set(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, new 
java.lang.Integer(1))
+
+// If buffer config is not set, set it to reasonable value to work 
around
+// buffer issues (see KAFKA-3135)
+.setIfUnset(ConsumerConfig.RECEIVE_BUFFER_CONFIG, 65536: 
java.lang.Integer)
+.build()
+
+val kafkaParamsForExecutors =
+  ConfigUpdate

[GitHub] spark issue #15252: [SPARK-17677][SQL] Break WindowExec.scala into multiple ...

2016-09-26 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15252
  
cc @hvanhovell for review. This is mostly simple copy/paste. I did look over 
the IntelliJ warnings and spacing issues.

Also @yhuai and @hvanhovell - I noticed there was virtually zero unit test 
coverage for this complicated feature. There are some end-to-end tests, but we 
should have built unit tests covering the fundamental classes (e.g. 
BoundOrdering, RowBuffer, WindowFunctionFrame). We need to be more careful when 
creating and reviewing new code.






[GitHub] spark issue #15252: [SPARK-17677][SQL] Break WindowExec.scala into multiple ...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15252
  
**[Test build #65943 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65943/consoleFull)**
 for PR 15252 at commit 
[`1124edf`](https://github.com/apache/spark/commit/1124edf477a48ca66cea4287d2e81acafd9e5646).





[GitHub] spark pull request #15252: [SPARK-17677][SQL] Break WindowExec.scala into mu...

2016-09-26 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/15252

[SPARK-17677][SQL] Break WindowExec.scala into multiple files

## What changes were proposed in this pull request?
As of Spark 2.0, all the window function execution code is in 
WindowExec.scala. This file is pretty large (over 1k loc) and contains a lot of 
different abstractions. This patch creates a new package sql.execution.window, 
moves WindowExec.scala into it, and breaks WindowExec.scala into multiple, more 
maintainable pieces:

- AggregateProcessor.scala
- BoundOrdering.scala
- RowBuffer.scala
- WindowExec.scala
- WindowFunctionFrame.scala

## How was this patch tested?
This patch mostly moves code around, and should not change any existing 
test coverage.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-17677

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15252.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15252


commit 1124edf477a48ca66cea4287d2e81acafd9e5646
Author: Reynold Xin 
Date:   2016-09-27T03:39:27Z

[SPARK-17677][SQL] Break WindowExec.scala into multiple files







[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r80615307
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
 ---
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.{util => ju}
+import java.util.UUID
+
+import scala.collection.JavaConverters._
+
+import org.apache.kafka.clients.consumer.ConsumerConfig
+import org.apache.kafka.common.serialization.ByteArrayDeserializer
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.execution.streaming.Source
+import org.apache.spark.sql.kafka010.KafkaSource._
+import org.apache.spark.sql.sources.{DataSourceRegister, 
StreamSourceProvider}
+import org.apache.spark.sql.types.StructType
+
+/**
+ * The provider class for the [[KafkaSource]]. This provider is designed 
such that it throws
+ * IllegalArgumentException when the Kafka Dataset is created, so that it 
can catch
+ * missing options even before the query is started.
+ */
+private[kafka010] class KafkaSourceProvider extends StreamSourceProvider
+with DataSourceRegister with Logging {
+  import KafkaSourceProvider._
+
+  /**
+   * Returns the name and schema of the source. In addition, it also 
verifies whether the options
+   * are correct and sufficient to create the [[KafkaSource]] when the 
query is started.
+   */
+  override def sourceSchema(
+  sqlContext: SQLContext,
+  schema: Option[StructType],
+  providerName: String,
+  parameters: Map[String, String]): (String, StructType) = {
+validateOptions(parameters)
+("kafka", KafkaSource.kafkaSchema)
+  }
+
+  override def createSource(
+  sqlContext: SQLContext,
+  metadataPath: String,
+  schema: Option[StructType],
+  providerName: String,
+  parameters: Map[String, String]): Source = {
+  validateOptions(parameters)
+val caseInsensitiveParams = parameters.map { case (k, v) => 
(k.toLowerCase, v) }
+val specifiedKafkaParams =
+  parameters
+.keySet
+.filter(_.toLowerCase.startsWith("kafka."))
+.map { k => k.drop(6).toString -> parameters(k) }
+.toMap
+
+val deserClassName = classOf[ByteArrayDeserializer].getName
+val uniqueGroupId = 
s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}"
+
+val autoOffsetResetValue = 
caseInsensitiveParams.get(STARTING_OFFSET_OPTION_KEY) match {
+  case Some(value) => value.trim()  // same values as those supported 
by auto.offset.reset
+  case None => "latest"
+}
+
+val kafkaParamsForStrategy =
+  ConfigUpdater("source", specifiedKafkaParams)
+.set(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, deserClassName)
+.set(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, 
deserClassName)
+
+// So that consumers in Kafka source do not mess with any existing 
group id
+.set(ConsumerConfig.GROUP_ID_CONFIG, s"$uniqueGroupId-driver")
+
+// So that consumers can start from earliest or latest
+.set(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, autoOffsetResetValue)
+
+// So that consumers in the driver do not commit offsets 
unnecessarily
+.set(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false")
+
+// So that the driver does not pull too much data
+.set(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, new 
java.lang.Integer(1))
+
+// If buffer config is not set, set it to reasonable value to work 
around
+// buffer issues (see KAFKA-3135)
+.setIfUnset(ConsumerConfig.RECEIVE_BUFFER_CONFIG, 65536: 
java.lang.Integer)
+.build()
+
+val kafkaParamsForExecutors =
+  ConfigUpdate

[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-26 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r80615261
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -0,0 +1,179 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, 
CatalogTable}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, ColumnStat, 
LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table in the current database 
to generate statistics,
+ * which will be used in query optimizations.
+ */
+case class AnalyzeColumnCommand(
+tableIdent: TableIdentifier,
+columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val sessionState = sparkSession.sessionState
+val db = 
tableIdent.database.getOrElse(sessionState.catalog.getCurrentDatabase)
+val tableIdentWithDB = TableIdentifier(tableIdent.table, Some(db))
+val relation = 
EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdentWithDB))
+
+relation match {
+  case catalogRel: CatalogRelation =>
+updateStats(catalogRel.catalogTable,
+  AnalyzeTableCommand.calculateTotalSize(sessionState, 
catalogRel.catalogTable))
+
+  case logicalRel: LogicalRelation if 
logicalRel.catalogTable.isDefined =>
+updateStats(logicalRel.catalogTable.get, 
logicalRel.relation.sizeInBytes)
+
+  case otherRelation =>
+throw new AnalysisException("ANALYZE TABLE is not supported for " +
+  s"${otherRelation.nodeName}.")
+}
+
+def updateStats(catalogTable: CatalogTable, newTotalSize: Long): Unit 
= {
+  val (rowCount, columnStats) = computeColStats(sparkSession, relation)
+  val statistics = Statistics(
+sizeInBytes = newTotalSize,
+rowCount = Some(rowCount),
+colStats = columnStats ++ 
catalogTable.stats.map(_.colStats).getOrElse(Map()))
+  sessionState.catalog.alterTable(catalogTable.copy(stats = 
Some(statistics)))
+  // Refresh the cached data source table in the catalog.
+  sessionState.catalog.refreshTable(tableIdentWithDB)
+}
+
+Seq.empty[Row]
+  }
+
+  def computeColStats(
+  sparkSession: SparkSession,
+  relation: LogicalPlan): (Long, Map[String, ColumnStat]) = {
+
+// check correctness of column names
+val attributesToAnalyze = mutable.MutableList[Attribute]()
+val caseSensitive = 
sparkSession.sessionState.conf.caseSensitiveAnalysis
+columnNames.foreach { col =>
+  val exprOption = relation.output.find { attr =>
+if (caseSensitive) attr.name == col else 
attr.name.equalsIgnoreCase(col)
+  }
+  val expr = exprOption.getOrElse(throw new 
AnalysisException(s"Invalid column name: $col."))
+  // do deduplication
+  if (!attributesToAnalyze.contains(expr)) {
+attributesToAnalyze += expr
+  }
+}
+
+// Collect statistics per column.
+// The first element in the result will be the overall row count, the 
following elements
+// will be structs containing all column stats.
+// The layout of each struct follows the layout of the ColumnStats.
+val ndvMaxErr = sparkSession.sessionState.conf.ndvMaxError
+val expressions = Count(Lit

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r80614899
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
 ---
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.{util => ju}
+import java.util.UUID
+
+import scala.collection.JavaConverters._
+
+import org.apache.kafka.clients.consumer.ConsumerConfig
+import org.apache.kafka.common.serialization.ByteArrayDeserializer
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.execution.streaming.Source
+import org.apache.spark.sql.kafka010.KafkaSource._
+import org.apache.spark.sql.sources.{DataSourceRegister, 
StreamSourceProvider}
+import org.apache.spark.sql.types.StructType
+
+/**
+ * The provider class for the [[KafkaSource]]. This provider is designed 
such that it throws
+ * IllegalArgumentException when the Kafka Dataset is created, so that it 
can catch
+ * missing options even before the query is started.
+ */
+private[kafka010] class KafkaSourceProvider extends StreamSourceProvider
+with DataSourceRegister with Logging {
+  import KafkaSourceProvider._
+
+  /**
+   * Returns the name and schema of the source. In addition, it also 
verifies whether the options
+   * are correct and sufficient to create the [[KafkaSource]] when the 
query is started.
+   */
+  override def sourceSchema(
+  sqlContext: SQLContext,
+  schema: Option[StructType],
+  providerName: String,
+  parameters: Map[String, String]): (String, StructType) = {
+validateOptions(parameters)
+("kafka", KafkaSource.kafkaSchema)
+  }
+
+  override def createSource(
+  sqlContext: SQLContext,
+  metadataPath: String,
+  schema: Option[StructType],
+  providerName: String,
+  parameters: Map[String, String]): Source = {
+  validateOptions(parameters)
+val caseInsensitiveParams = parameters.map { case (k, v) => 
(k.toLowerCase, v) }
+val specifiedKafkaParams =
+  parameters
+.keySet
+.filter(_.toLowerCase.startsWith("kafka."))
+.map { k => k.drop(6).toString -> parameters(k) }
+.toMap
+
+val deserClassName = classOf[ByteArrayDeserializer].getName
+val uniqueGroupId = 
s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}"
+
+val autoOffsetResetValue = 
caseInsensitiveParams.get(STARTING_OFFSET_OPTION_KEY) match {
+  case Some(value) => value.trim()  // same values as those supported 
by auto.offset.reset
+  case None => "latest"
+}
+
+val kafkaParamsForStrategy =
+  ConfigUpdater("source", specifiedKafkaParams)
+.set(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, deserClassName)
+.set(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, 
deserClassName)
+
+// So that consumers in Kafka source do not mess with any existing 
group id
+.set(ConsumerConfig.GROUP_ID_CONFIG, s"$uniqueGroupId-driver")
+
+// So that consumers can start from earliest or latest
+.set(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, autoOffsetResetValue)
+
+// So that consumers in the driver do not commit offsets 
unnecessarily
+.set(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false")
+
+// So that the driver does not pull too much data
+.set(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, new 
java.lang.Integer(1))
+
+// If buffer config is not set, set it to reasonable value to work 
around
+// buffer issues (see KAFKA-3135)
+.setIfUnset(ConsumerConfig.RECEIVE_BUFFER_CONFIG, 65536: 
java.lang.Integer)
+.build()
+
+val kafkaParamsForExecutors =
+  ConfigUpdate

[GitHub] spark pull request #15102: [SPARK-17346][SQL] Add Kafka source for Structure...

2016-09-26 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/15102#discussion_r80614481
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
 ---
@@ -0,0 +1,263 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.{util => ju}
+import java.util.UUID
+
+import scala.collection.JavaConverters._
+
+import org.apache.kafka.clients.consumer.ConsumerConfig
+import org.apache.kafka.common.serialization.ByteArrayDeserializer
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.execution.streaming.Source
+import org.apache.spark.sql.kafka010.KafkaSource._
+import org.apache.spark.sql.sources.{DataSourceRegister, 
StreamSourceProvider}
+import org.apache.spark.sql.types.StructType
+
+/**
+ * The provider class for the [[KafkaSource]]. This provider is designed 
such that it throws
+ * IllegalArgumentException when the Kafka Dataset is created, so that it 
can catch
+ * missing options even before the query is started.
+ */
+private[kafka010] class KafkaSourceProvider extends StreamSourceProvider
+with DataSourceRegister with Logging {
+  import KafkaSourceProvider._
+
+  /**
+   * Returns the name and schema of the source. In addition, it also 
verifies whether the options
+   * are correct and sufficient to create the [[KafkaSource]] when the 
query is started.
+   */
+  override def sourceSchema(
+  sqlContext: SQLContext,
+  schema: Option[StructType],
+  providerName: String,
+  parameters: Map[String, String]): (String, StructType) = {
+validateOptions(parameters)
+("kafka", KafkaSource.kafkaSchema)
+  }
+
+  override def createSource(
+  sqlContext: SQLContext,
+  metadataPath: String,
+  schema: Option[StructType],
+  providerName: String,
+  parameters: Map[String, String]): Source = {
+  validateOptions(parameters)
+val caseInsensitiveParams = parameters.map { case (k, v) => 
(k.toLowerCase, v) }
+val specifiedKafkaParams =
+  parameters
+.keySet
+.filter(_.toLowerCase.startsWith("kafka."))
+.map { k => k.drop(6).toString -> parameters(k) }
+.toMap
+
+val deserClassName = classOf[ByteArrayDeserializer].getName
+val uniqueGroupId = 
s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}"
--- End diff --

@marmbrus the new consumer shouldn't be using ZK for consumer groups, so this 
shouldn't leak state into ZK. I quickly verified this with a Spark job that had 
auto commit turned off. It will show up in the list of consumer groups though, 
and a meaningless name may make it hard for people who want to do monitoring 
via Kafka tools.
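
For reference, a tiny Scala sketch of the group id scheme from the diff above 
(the metadata path and object name here are made up); this is the kind of name 
that ends up in consumer-group listings:

    import java.util.UUID

    object KafkaGroupIdSketch extends App {
      // Mirrors the diff: spark-kafka-source-<uuid>-<metadataPath.hashCode>.
      val metadataPath = "/checkpoints/my-query/sources/0"  // hypothetical path
      val uniqueGroupId =
        s"spark-kafka-source-${UUID.randomUUID}-${metadataPath.hashCode}"
      // The driver consumer appends "-driver", per the diff; the resulting name
      // is hard to map back to a particular query when monitoring with Kafka
      // tools.
      println(s"$uniqueGroupId-driver")
    }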





[GitHub] spark issue #15251: Comment fixing

2016-09-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15251
  
Can one of the admins verify this patch?





[GitHub] spark issue #15243: Fixing comment since Actor is not used anymore.

2016-09-26 Thread danix800
Github user danix800 commented on the issue:

https://github.com/apache/spark/pull/15243
  
Hi, @dongjoon-hyun. Fixed!





[GitHub] spark pull request #15243: Fixing comment since Actor is not used anymore.

2016-09-26 Thread danix800
Github user danix800 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15243#discussion_r80614009
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/worker/WorkerWatcher.scala ---
@@ -21,7 +21,7 @@ import org.apache.spark.internal.Logging
 import org.apache.spark.rpc._
 
 /**
- * Actor which connects to a worker process and terminates the JVM if the 
connection is severed.
+ * WorkerWatcher RpcEndpoint which connects to a worker process and 
terminates the JVM if the connection is severed.
--- End diff --

Thanks! Fixed again!





[GitHub] spark pull request #15251: Comment fixing

2016-09-26 Thread danix800
GitHub user danix800 opened a pull request:

https://github.com/apache/spark/pull/15251

Comment fixing

## What changes were proposed in this pull request?

Fix two comments since Actor is not used anymore.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/danix800/spark comment-fixing

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15251.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15251


commit 3fe33a29064dcc06e8deebfeec2045af4a44dfed
Author: Ding Fei 
Date:   2016-09-26T13:01:33Z

Fixing comment since Actor is not used anymore.

commit dbaf93c96f130e78f6386479cc275223384294eb
Author: Ding Fei 
Date:   2016-09-27T03:10:00Z

Fix two comments since Actor is not used anymore.







[GitHub] spark issue #15250: [SPARK-17676][CORE] FsHistoryProvider should ignore hidd...

2016-09-26 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15250
  
**[Test build #65942 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65942/consoleFull)**
 for PR 15250 at commit 
[`b125f2f`](https://github.com/apache/spark/commit/b125f2f2761ae81abd2c2c049666fc1b0ca51e0c).




