[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16626
  
We also need to test the support of `InMemoryCatalog`. Please do not add a 
test case yet, though. I think I really need to finish 
https://github.com/apache/spark/pull/16592 ASAP; it will make it simple for 
everyone to test both catalogs.
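A rough sketch of what "test both catalogs" could look like once that shared infrastructure lands. Everything here is hypothetical and illustrative; none of these names come from the actual PR or from Spark's test API:

```scala
// Hypothetical sketch only: run the same test body against each catalog
// implementation, so a change is exercised on both code paths.
sealed trait CatalogKind { def name: String }
case object InMemory extends CatalogKind { val name = "in-memory" }
case object Hive extends CatalogKind { val name = "hive" }

def testWithAllCatalogs(testName: String)(body: CatalogKind => Unit): Unit =
  Seq(InMemory, Hive).foreach { kind =>
    println(s"$testName [${kind.name} catalog]")
    body(kind)
  }

testWithAllCatalogs("ALTER TABLE ADD COLUMNS") { kind =>
  // a real test would create a table through this catalog and alter it
  assert(kind.name.nonEmpty)
}
```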


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16995: [SPARK-19340][SQL] CSV file will result in an exc...

2017-02-19 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16995#discussion_r101961405
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -404,6 +386,35 @@ case class DataSource(
   }
 
   /**
+   * Creates Hadoop relation based on format and globbed file paths
+   * @param format format of the data source file
+   * @param globPaths Path to the file resolved by Hadoop library
+   * @return Hadoop relation object
+   */
+  def createHadoopRelation(format: FileFormat,
+                           globPaths: Array[Path]): BaseRelation = {
+    val (dataSchema, partitionSchema) = getOrInferFileFormatSchema(format)
--- End diff --

You call `getOrInferFileFormatSchema` twice: one call is already made before 
`createHadoopRelation` is invoked.
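The duplication can be avoided by threading the already-computed schemas into the helper instead of re-inferring them inside it. A minimal self-contained sketch of that pattern; `Schemas`, `getOrInferSchemas`, and `createRelation` are illustrative stand-ins, not the real Spark API:

```scala
// Illustrative stand-in for the expensive schema-inference step.
final case class Schemas(dataSchema: String, partitionSchema: String)

var inferenceRuns = 0
def getOrInferSchemas(): Schemas = {
  inferenceRuns += 1 // in Spark this would scan the glob paths
  Schemas("data", "partition")
}

// Instead of calling getOrInferSchemas() again inside the helper,
// accept the result the caller already has.
def createRelation(schemas: Schemas): String =
  s"relation(${schemas.dataSchema}, ${schemas.partitionSchema})"

val schemas = getOrInferSchemas() // inference happens exactly once
val relation = createRelation(schemas)
```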





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r101961354
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -563,35 +574,47 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         //   want to alter the table location to a file path, we will fail. This should be fixed
         //   in the future.
 
-        val newLocation = tableDefinition.storage.locationUri
-        val storageWithPathOption = tableDefinition.storage.copy(
-          properties = tableDefinition.storage.properties ++ newLocation.map("path" -> _))
+        val newLocation = newTableDefinition.storage.locationUri
+        val storageWithPathOption = newTableDefinition.storage.copy(
+          properties = newTableDefinition.storage.properties ++ newLocation.map("path" -> _))
 
-        val oldLocation = getLocationFromStorageProps(oldTableDef)
+        val oldLocation = getLocationFromStorageProps(oldRawTableDef)
         if (oldLocation == newLocation) {
-          storageWithPathOption.copy(locationUri = oldTableDef.storage.locationUri)
+          storageWithPathOption.copy(locationUri = oldRawTableDef.storage.locationUri)
         } else {
           storageWithPathOption
         }
       }
 
-      val partitionProviderProp = if (tableDefinition.tracksPartitionsInCatalog) {
+      val partitionProviderProp = if (newTableDefinition.tracksPartitionsInCatalog) {
         TABLE_PARTITION_PROVIDER -> TABLE_PARTITION_PROVIDER_CATALOG
       } else {
         TABLE_PARTITION_PROVIDER -> TABLE_PARTITION_PROVIDER_FILESYSTEM
       }
 
-      // Sets the `schema`, `partitionColumnNames` and `bucketSpec` from the old table definition,
+      // Sets the `partitionColumnNames` and `bucketSpec` from the old table definition,
       // to retain the spark specific format if it is. Also add old data source properties to table
       // properties, to retain the data source table format.
-      val oldDataSourceProps = oldTableDef.properties.filter(_._1.startsWith(DATASOURCE_PREFIX))
-      val newTableProps = oldDataSourceProps ++ withStatsProps.properties + partitionProviderProp
-      val newDef = withStatsProps.copy(
+      val dataSourceProps = if (schemaChange) {
--- End diff --

Could we move this whole logic into the branch where we find that the table has a schema change?





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r101961067
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -563,35 +574,47 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         //   want to alter the table location to a file path, we will fail. This should be fixed
         //   in the future.
 
-        val newLocation = tableDefinition.storage.locationUri
-        val storageWithPathOption = tableDefinition.storage.copy(
-          properties = tableDefinition.storage.properties ++ newLocation.map("path" -> _))
+        val newLocation = newTableDefinition.storage.locationUri
+        val storageWithPathOption = newTableDefinition.storage.copy(
+          properties = newTableDefinition.storage.properties ++ newLocation.map("path" -> _))
 
-        val oldLocation = getLocationFromStorageProps(oldTableDef)
+        val oldLocation = getLocationFromStorageProps(oldRawTableDef)
         if (oldLocation == newLocation) {
-          storageWithPathOption.copy(locationUri = oldTableDef.storage.locationUri)
+          storageWithPathOption.copy(locationUri = oldRawTableDef.storage.locationUri)
         } else {
           storageWithPathOption
         }
       }
 
-      val partitionProviderProp = if (tableDefinition.tracksPartitionsInCatalog) {
+      val partitionProviderProp = if (newTableDefinition.tracksPartitionsInCatalog) {
         TABLE_PARTITION_PROVIDER -> TABLE_PARTITION_PROVIDER_CATALOG
       } else {
         TABLE_PARTITION_PROVIDER -> TABLE_PARTITION_PROVIDER_FILESYSTEM
       }
 
-      // Sets the `schema`, `partitionColumnNames` and `bucketSpec` from the old table definition,
+      // Sets the `partitionColumnNames` and `bucketSpec` from the old table definition,
       // to retain the spark specific format if it is. Also add old data source properties to table
       // properties, to retain the data source table format.
-      val oldDataSourceProps = oldTableDef.properties.filter(_._1.startsWith(DATASOURCE_PREFIX))
-      val newTableProps = oldDataSourceProps ++ withStatsProps.properties + partitionProviderProp
-      val newDef = withStatsProps.copy(
+      val dataSourceProps = if (schemaChange) {
+        val props =
+          tableMetaToTableProps(newTableDefinition).filter(_._1.startsWith(DATASOURCE_PREFIX))
+        if (newTableDefinition.provider.isDefined
+          && newTableDefinition.provider.get.toLowerCase != DDLUtils.HIVE_PROVIDER) {
+          // we only need to populate non-hive provider to the tableprops
+          props.put(DATASOURCE_PROVIDER, newTableDefinition.provider.get)
+        }
+        props
+      } else {
+        oldRawTableDef.properties.filter(_._1.startsWith(DATASOURCE_PREFIX))
+      }
+      val newTableProps =
+        dataSourceProps ++ maybeWithStatsPropsTable.properties + partitionProviderProp
--- End diff --

Let's create a new helper function for generating the table properties; 
`alterTable` is now 100+ lines long.
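A hedged sketch of the suggested extraction: pull the property generation out of `alterTable` into a small named helper. The helper name and the plain `Map[String, String]` types are simplifications of the real `CatalogTable` machinery, not the actual Spark code:

```scala
// Sketch: isolate the data-source-property logic behind one named helper.
val DatasourcePrefix = "spark.sql.sources." // illustrative prefix constant

def dataSourceProps(
    schemaChange: Boolean,
    newProps: Map[String, String],
    oldProps: Map[String, String]): Map[String, String] = {
  // On a schema change, take the freshly generated properties;
  // otherwise keep the old raw table's properties.
  val source = if (schemaChange) newProps else oldProps
  source.filter { case (key, _) => key.startsWith(DatasourcePrefix) }
}
```

With something like this in place, the long branch inside `alterTable` collapses to a single call site.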





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r101960398
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -504,15 +504,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
    * Note: As of now, this doesn't support altering table schema, partition column names and bucket
    * specification. We will ignore them even if users do specify different values for these fields.
    */
-  override def alterTable(tableDefinition: CatalogTable): Unit = withClient {
-    assert(tableDefinition.identifier.database.isDefined)
-    val db = tableDefinition.identifier.database.get
-    requireTableExists(db, tableDefinition.identifier.table)
-    verifyTableProperties(tableDefinition)
+  override def alterTable(newTableDefinition: CatalogTable): Unit = withClient {
+    assert(newTableDefinition.identifier.database.isDefined)
+    val db = newTableDefinition.identifier.database.get
+    requireTableExists(db, newTableDefinition.identifier.table)
+    verifyTableProperties(newTableDefinition)
 
     // convert table statistics to properties so that we can persist them through hive api
-    val withStatsProps = if (tableDefinition.stats.isDefined) {
-      val stats = tableDefinition.stats.get
+    val maybeWithStatsPropsTable: CatalogTable = if (newTableDefinition.stats.isDefined) {
--- End diff --

`: CatalogTable` is not needed.





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r101960271
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -523,18 +523,29 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
           statsProperties += (columnStatKeyPropName(colName, k) -> v)
         }
       }
-      tableDefinition.copy(properties = tableDefinition.properties ++ statsProperties)
+      newTableDefinition.copy(properties = newTableDefinition.properties ++ statsProperties)
     } else {
-      tableDefinition
+      newTableDefinition
     }
 
-    if (tableDefinition.tableType == VIEW) {
-      client.alterTable(withStatsProps)
+    if (newTableDefinition.tableType == VIEW) {
+      client.alterTable(maybeWithStatsPropsTable)
     } else {
-      val oldTableDef = getRawTable(db, withStatsProps.identifier.table)
--- End diff --

To the other reviewers: `oldTableDef` actually stores the raw table 
metadata. In the new changes, it is renamed to `oldRawTableDef`.





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r101960044
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -563,35 +574,47 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         //   want to alter the table location to a file path, we will fail. This should be fixed
         //   in the future.
 
-        val newLocation = tableDefinition.storage.locationUri
-        val storageWithPathOption = tableDefinition.storage.copy(
-          properties = tableDefinition.storage.properties ++ newLocation.map("path" -> _))
+        val newLocation = newTableDefinition.storage.locationUri
+        val storageWithPathOption = newTableDefinition.storage.copy(
+          properties = newTableDefinition.storage.properties ++ newLocation.map("path" -> _))
 
-        val oldLocation = getLocationFromStorageProps(oldTableDef)
+        val oldLocation = getLocationFromStorageProps(oldRawTableDef)
         if (oldLocation == newLocation) {
-          storageWithPathOption.copy(locationUri = oldTableDef.storage.locationUri)
+          storageWithPathOption.copy(locationUri = oldRawTableDef.storage.locationUri)
         } else {
           storageWithPathOption
         }
       }
 
-      val partitionProviderProp = if (tableDefinition.tracksPartitionsInCatalog) {
+      val partitionProviderProp = if (newTableDefinition.tracksPartitionsInCatalog) {
         TABLE_PARTITION_PROVIDER -> TABLE_PARTITION_PROVIDER_CATALOG
       } else {
         TABLE_PARTITION_PROVIDER -> TABLE_PARTITION_PROVIDER_FILESYSTEM
       }
 
-      // Sets the `schema`, `partitionColumnNames` and `bucketSpec` from the old table definition,
+      // Sets the `partitionColumnNames` and `bucketSpec` from the old table definition,
       // to retain the spark specific format if it is. Also add old data source properties to table
       // properties, to retain the data source table format.
-      val oldDataSourceProps = oldTableDef.properties.filter(_._1.startsWith(DATASOURCE_PREFIX))
-      val newTableProps = oldDataSourceProps ++ withStatsProps.properties + partitionProviderProp
-      val newDef = withStatsProps.copy(
+      val dataSourceProps = if (schemaChange) {
+        val props =
+          tableMetaToTableProps(newTableDefinition).filter(_._1.startsWith(DATASOURCE_PREFIX))
+        if (newTableDefinition.provider.isDefined
+          && newTableDefinition.provider.get.toLowerCase != DDLUtils.HIVE_PROVIDER) {
--- End diff --

The `&&` should be moved up to the end of line 601.
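The requested layout, sketched with simplified values (the `provider` option and `hiveProvider` string stand in for `newTableDefinition.provider` and `DDLUtils.HIVE_PROVIDER`); Spark's Scala style keeps the `&&` at the end of the previous line rather than starting the continuation with it:

```scala
// Simplified stand-ins for the values used in the real condition.
val provider: Option[String] = Some("parquet")
val hiveProvider = "hive"

// `&&` trails the first line instead of leading the second.
val isNonHiveProvider = provider.isDefined &&
  provider.get.toLowerCase != hiveProvider
```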





[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16949
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16949
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73145/
Test PASSed.





[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16949
  
**[Test build #73145 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73145/testReport)** for PR 16949 at commit [`ad570cf`](https://github.com/apache/spark/commit/ad570cff2f04b6d4e31feb1aaabe5483f8ad0cca).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16966: [SPARK-18409][ML]LSH approxNearestNeighbors should use a...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16966
  
**[Test build #73154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73154/testReport)** for PR 16966 at commit [`e90f2ec`](https://github.com/apache/spark/commit/e90f2ec7a835d31b1d5b17c21769a3144598be6c).





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r101959389
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -504,15 +504,15 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
    * Note: As of now, this doesn't support altering table schema, partition column names and bucket
    * specification. We will ignore them even if users do specify different values for these fields.
    */
-  override def alterTable(tableDefinition: CatalogTable): Unit = withClient {
-    assert(tableDefinition.identifier.database.isDefined)
-    val db = tableDefinition.identifier.database.get
-    requireTableExists(db, tableDefinition.identifier.table)
-    verifyTableProperties(tableDefinition)
+  override def alterTable(newTableDefinition: CatalogTable): Unit = withClient {
+    assert(newTableDefinition.identifier.database.isDefined)
+    val db = newTableDefinition.identifier.database.get
+    requireTableExists(db, newTableDefinition.identifier.table)
+    verifyTableProperties(newTableDefinition)
 
     // convert table statistics to properties so that we can persist them through hive api
-    val withStatsProps = if (tableDefinition.stats.isDefined) {
-      val stats = tableDefinition.stats.get
+    val maybeWithStatsPropsTable: CatalogTable = if (newTableDefinition.stats.isDefined) {
--- End diff --

Keep the original name `withStatsProps`.





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r101958919
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -174,6 +177,79 @@ case class AlterTableRenameCommand(
 }
 
 /**
+ * A command that add columns to a table
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   ALTER TABLE table_identifier
+ *   ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
+ * }}}
+ */
+case class AlterTableAddColumnsCommand(
+    table: TableIdentifier,
+    columns: Seq[StructField]) extends RunnableCommand {
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    val catalogTable = verifyAlterTableAddColumn(catalog, table)
+
+    // If an exception is thrown here we can just assume the table is uncached;
+    // this can happen with Hive tables when the underlying catalog is in-memory.
+    val wasCached = Try(sparkSession.catalog.isCached(table.unquotedString)).getOrElse(false)
+    if (wasCached) {
+      try {
+        sparkSession.catalog.uncacheTable(table.unquotedString)
+      } catch {
+        case NonFatal(e) => log.warn(e.toString, e)
+      }
+    }
+    // Invalidate the table last, otherwise uncaching the table would load the logical plan
+    // back into the hive metastore cache
+    catalog.refreshTable(table)
+    val partitionFields = catalogTable.schema.takeRight(catalogTable.partitionColumnNames.length)
+    val dataSchema = catalogTable.schema
+      .take(catalogTable.schema.length - catalogTable.partitionColumnNames.length)
+    catalog.alterTable(catalogTable.copy(schema =
+      catalogTable.schema.copy(fields = (dataSchema ++ columns ++ partitionFields).toArray)))
+
+    Seq.empty[Row]
+  }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
--- End diff --

We also need to explain what is supported.
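One possible wording, derived only from what the code quoted elsewhere in this PR allows (JSON, CSV, and Parquet data source tables; Hive serde tables are handled on a separate path). This is a suggestion, not the final doc comment:

```scala
/**
 * ALTER TABLE ADD COLUMNS supports native data source tables with the
 * JSON, CSV, and Parquet file formats, as well as Hive serde tables.
 * It does not support temporary views/tables, permanent views, or data
 * source tables backed by the text or ORC formats or an external provider.
 */
```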





[GitHub] spark issue #16726: [SPARK-19390][SQL] Replace the unnecessary usages of hiv...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16726
  
**[Test build #73153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73153/testReport)** for PR 16726 at commit [`75d8017`](https://github.com/apache/spark/commit/75d801765141dbc6b6acca06eb91a2465f6affaa).





[GitHub] spark pull request #16726: [SPARK-19390][SQL] Replace the unnecessary usages...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16726#discussion_r101958369
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -251,11 +251,11 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
     // Write path
     case InsertIntoTable(r: MetastoreRelation, partition, query, overwrite, ifNotExists)
       // Inserting into partitioned table is not supported in Parquet data source (yet).
-      if query.resolved && !r.hiveQlTable.isPartitioned && shouldConvertMetastoreParquet(r) =>
+      if query.resolved && !r.catalogTable.isPartitioned && shouldConvertToParquet(r) =>
--- End diff --

This exceeds 101 characters. Thus, rename it.





[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15415
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r101958217
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/TableScanSuite.scala ---
@@ -416,4 +416,21 @@ class TableScanSuite extends DataSourceTest with SharedSQLContext {
     val comments = planned.schema.fields.map(_.getComment().getOrElse("NO_COMMENT")).mkString(",")
     assert(comments === "SN,SA,NO_COMMENT")
   }
+
+  test("ALTER TABLE ADD COLUMNS does not support RelationProvider") {
+    withTable("ds_relationProvider") {
+      sql(
+        """
+          |CREATE TABLE ds_relationProvider
+          |USING org.apache.spark.sql.sources.SimpleScanSource
+          |OPTIONS (
+          |  From '1',
+          |  To '10'
+          |)""".stripMargin)
--- End diff --

Syntax issue





[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15415
  
**[Test build #73149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73149/testReport)** for PR 15415 at commit [`dfdf85d`](https://github.com/apache/spark/commit/dfdf85d4cf26864fdbcf57d2e60153d299741197).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15415
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73149/
Test PASSed.





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r101958137
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala ---
@@ -71,8 +71,20 @@ class JDBCSuite extends SparkFunSuite
     conn.prepareStatement("insert into test.people values ('mary', 2)").executeUpdate()
     conn.prepareStatement(
       "insert into test.people values ('joe ''foo'' \"bar\"', 3)").executeUpdate()
+
+    conn.prepareStatement("create table test.t_alter_add(c1 int, c2 int)").executeUpdate()
+    conn.prepareStatement("insert into test.t_alter_add values (1, 2)").executeUpdate()
+    conn.prepareStatement("insert into test.t_alter_add values (2, 4)").executeUpdate()
--- End diff --

We do not need to add the extra table for the invalid case.





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r101958020
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -174,6 +177,79 @@ case class AlterTableRenameCommand(
 }
 
 /**
+ * A command that add columns to a table
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   ALTER TABLE table_identifier
+ *   ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
+ * }}}
+ */
+case class AlterTableAddColumnsCommand(
+    table: TableIdentifier,
+    columns: Seq[StructField]) extends RunnableCommand {
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    val catalogTable = verifyAlterTableAddColumn(catalog, table)
+
+    // If an exception is thrown here we can just assume the table is uncached;
+    // this can happen with Hive tables when the underlying catalog is in-memory.
+    val wasCached = Try(sparkSession.catalog.isCached(table.unquotedString)).getOrElse(false)
+    if (wasCached) {
+      try {
+        sparkSession.catalog.uncacheTable(table.unquotedString)
+      } catch {
+        case NonFatal(e) => log.warn(e.toString, e)
+      }
+    }
+    // Invalidate the table last, otherwise uncaching the table would load the logical plan
+    // back into the hive metastore cache
+    catalog.refreshTable(table)
+    val partitionFields = catalogTable.schema.takeRight(catalogTable.partitionColumnNames.length)
+    val dataSchema = catalogTable.schema
+      .take(catalogTable.schema.length - catalogTable.partitionColumnNames.length)
+    catalog.alterTable(catalogTable.copy(schema =
+      catalogTable.schema.copy(fields = (dataSchema ++ columns ++ partitionFields).toArray)))
+
+    Seq.empty[Row]
+  }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  private def verifyAlterTableAddColumn(
+      catalog: SessionCatalog,
+      table: TableIdentifier): CatalogTable = {
+    val catalogTable = catalog.getTempViewOrPermanentTableMetadata(table)
+
+    if (catalogTable.tableType == CatalogTableType.VIEW) {
+      throw new AnalysisException(
+        s"${table.toString} is a VIEW, which does not support ALTER ADD COLUMNS.")
+    }
+
+    if (DDLUtils.isDatasourceTable(catalogTable)) {
+      DataSource.lookupDataSource(catalogTable.provider.get).newInstance() match {
+        // For datasource table, this command can only support the following File format.
+        // TextFileFormat only default to one column "value"
+        // OrcFileFormat can not handle difference between user-specified schema and
+        // inferred schema yet. TODO, once this issue is resolved , we can add Orc back.
+        // Hive type is already considered as hive serde table, so the logic will not
+        // come in here.
+        case _: JsonFileFormat | _: CSVFileFormat | _: ParquetFileFormat =>
+        case s =>
+          throw new AnalysisException(
+            s"""${table.toString} is a datasource table with type $s,
--- End diff --

`toString` is not needed?
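The splice in `run()` above, `(dataSchema ++ columns ++ partitionFields)`, keeps partition columns at the end of the schema while inserting the new columns after the existing data columns. A minimal, language-neutral sketch of that invariant (plain Python lists, not Spark code; `add_columns` is a hypothetical helper for illustration):

```python
# Sketch of the ADD COLUMNS schema splice: new columns go between the
# existing data columns and the trailing partition columns, so the
# partition columns always stay last in the schema.
def add_columns(schema, partition_cols, new_cols):
    """schema: ordered list of column names, with partition columns last."""
    n_part = len(partition_cols)
    data_cols = schema[:len(schema) - n_part]   # everything before partition columns
    part_cols = schema[len(schema) - n_part:]   # trailing partition columns
    assert part_cols == partition_cols          # sanity check on the invariant
    return data_cols + new_cols + part_cols

schema = ["id", "name", "year", "month"]        # "year", "month" partition the table
print(add_columns(schema, ["year", "month"], ["email"]))
# ['id', 'name', 'email', 'year', 'month']
```

This mirrors the `takeRight`/`take` split in the quoted diff, only with Python lists in place of `StructField`s.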





[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive ser...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16626#discussion_r101958045
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -174,6 +177,79 @@ case class AlterTableRenameCommand(
 }
 
 /**
+ * A command that add columns to a table
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   ALTER TABLE table_identifier
+ *   ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
+ * }}}
+*/
+case class AlterTableAddColumnsCommand(
+table: TableIdentifier,
+columns: Seq[StructField]) extends RunnableCommand {
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val catalogTable = verifyAlterTableAddColumn(catalog, table)
+
+// If an exception is thrown here we can just assume the table is uncached;
+// this can happen with Hive tables when the underlying catalog is in-memory.
+val wasCached = Try(sparkSession.catalog.isCached(table.unquotedString)).getOrElse(false)
+if (wasCached) {
+  try {
+sparkSession.catalog.uncacheTable(table.unquotedString)
+  } catch {
+case NonFatal(e) => log.warn(e.toString, e)
+  }
+}
+// Invalidate the table last, otherwise uncaching the table would load the logical plan
+// back into the hive metastore cache
+catalog.refreshTable(table)
+val partitionFields = catalogTable.schema.takeRight(catalogTable.partitionColumnNames.length)
+val dataSchema = catalogTable.schema
+  .take(catalogTable.schema.length - catalogTable.partitionColumnNames.length)
+catalog.alterTable(catalogTable.copy(schema =
+  catalogTable.schema.copy(fields = (dataSchema ++ columns ++ partitionFields).toArray)))
+
+Seq.empty[Row]
+  }
+
+  /**
+   * ALTER TABLE ADD COLUMNS command does not support temporary view/table,
+   * view, or datasource table with text, orc formats or external provider.
+   */
+  private def verifyAlterTableAddColumn(
+catalog: SessionCatalog,
+table: TableIdentifier): CatalogTable = {
--- End diff --

indent





[GitHub] spark issue #16923: [SPARK-19038][Hive][YARN] Correctly figure out keytab fi...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16923
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73146/
Test PASSed.





[GitHub] spark issue #16923: [SPARK-19038][Hive][YARN] Correctly figure out keytab fi...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16923
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16923: [SPARK-19038][Hive][YARN] Correctly figure out keytab fi...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16923
  
**[Test build #73146 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73146/testReport)** for PR 16923 at commit [`57060e3`](https://github.com/apache/spark/commit/57060e351a4e00f93a832a05dabaaa086086b1aa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16996
  
**[Test build #73152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73152/testReport)** for PR 16996 at commit [`ac0a1c6`](https://github.com/apache/spark/commit/ac0a1c61d6794de4d049b4dd50593da0aa4f9cfe).





[GitHub] spark issue #16997: Updated the SQL programming guide to explain about the E...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16997
  
Can one of the admins verify this patch?





[GitHub] spark pull request #16997: Updated the SQL programming guide to explain abou...

2017-02-19 Thread HarshSharma8
GitHub user HarshSharma8 opened a pull request:

https://github.com/apache/spark/pull/16997

Updated the SQL programming guide to explain about the Encoding opera…


## What changes were proposed in this pull request?

Made some updates to SQL programming guide to explain the Encoding 
operation with kryo.

## How was this patch tested?

Just updated the docs.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HarshSharma8/spark feature/docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16997.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16997


commit 103906fb23b5212858e89e9a090693b6fb2c6307
Author: Harsh Sharma 
Date:   2017-02-20T06:51:55Z

Updated the SQL programming guide to explain about the Encoding operation







[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16819
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73151/
Test PASSed.





[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16819
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16819
  
**[Test build #73151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73151/testReport)** for PR 16819 at commit [`8e99701`](https://github.com/apache/spark/commit/8e9970107c8e74b57718398d4972af7d4709ec2d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16978: [SPARK-19652][UI] Do auth checks for REST API access.

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16978
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16981
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16981
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73143/
Test PASSed.





[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16981
  
**[Test build #73143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73143/testReport)** for PR 16981 at commit [`9b1c015`](https://github.com/apache/spark/commit/9b1c015661529f4e0db9f295574dcd5ed66a2919).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16978: [SPARK-19652][UI] Do auth checks for REST API access.

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16978
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73139/
Test PASSed.





[GitHub] spark issue #16978: [SPARK-19652][UI] Do auth checks for REST API access.

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16978
  
**[Test build #73139 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73139/testReport)** for PR 16978 at commit [`7288160`](https://github.com/apache/spark/commit/7288160e5c3c2cce72133f68693ad2ab47f346d0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16977: [SPARK-19651][CORE] ParallelCollectionRDD.collect...

2017-02-19 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16977#discussion_r101955822
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/ParallelCollectionRDD.scala ---
@@ -105,6 +105,17 @@ private[spark] class ParallelCollectionRDD[T: ClassTag](
   override def getPreferredLocations(s: Partition): Seq[String] = {
 locationPrefs.getOrElse(s.index, Nil)
   }
+
+  override def collect(): Array[T] = toArray(data)
+
+  override def take(num: Int): Array[T] = toArray(data.take(num))
+
+  private def toArray(data: Seq[T]): Array[T] = {
+// We serialize the data and deserialize it back, to simulate the behavior of sending it to
+// remote executors and collect it back.
+val ser = sc.env.closureSerializer.newInstance()
+ser.deserialize[Seq[T]](ser.serialize(data)).toArray
+  }
--- End diff --

> ... with a round-trip serialization to simulate the previously behavior and make sure collect returns a new copy of data.

I think the description quoted explains that.
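The round trip in `toArray` ensures that `collect()` hands back an independent copy of the data, just as it would if the records had really travelled to executors and back. A small illustration of that effect, using Python's `pickle` as a stand-in for Spark's closure serializer (an assumption for the sketch only):

```python
import pickle

# Sketch: a serialize/deserialize round trip produces a fresh, independent
# copy, mimicking what collect() returns when data really goes through
# executors. pickle here stands in for Spark's SerializerInstance.
def round_trip(data):
    return pickle.loads(pickle.dumps(data))

original = [[1, 2], [3, 4]]
copied = round_trip(original)

assert copied == original        # equal contents
assert copied is not original    # but a distinct object
copied[0].append(99)
assert original[0] == [1, 2]     # mutating the copy leaves the original intact
```

Without the round trip, a driver-local RDD could return the very objects it was built from, and callers mutating the result would silently corrupt the source collection.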






[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16819
  
**[Test build #73151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73151/testReport)** for PR 16819 at commit [`8e99701`](https://github.com/apache/spark/commit/8e9970107c8e74b57718398d4972af7d4709ec2d).





[GitHub] spark pull request #16977: [SPARK-19651][CORE] ParallelCollectionRDD.collect...

2017-02-19 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16977#discussion_r101954581
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/ParallelCollectionRDD.scala ---
@@ -105,6 +105,17 @@ private[spark] class ParallelCollectionRDD[T: ClassTag](
   override def getPreferredLocations(s: Partition): Seq[String] = {
 locationPrefs.getOrElse(s.index, Nil)
   }
+
+  override def collect(): Array[T] = toArray(data)
+
+  override def take(num: Int): Array[T] = toArray(data.take(num))
+
+  private def toArray(data: Seq[T]): Array[T] = {
+// We serialize the data and deserialize it back, to simulate the behavior of sending it to
+// remote executors and collect it back.
+val ser = sc.env.closureSerializer.newInstance()
+ser.deserialize[Seq[T]](ser.serialize(data)).toArray
+  }
--- End diff --

Why should we simulate like this?





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15125
  
**[Test build #73150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73150/testReport)** for PR 15125 at commit [`2639eb1`](https://github.com/apache/spark/commit/2639eb10f516a1c11f94cf2918cf2635f3b459bc).





[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16981
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73141/
Test PASSed.





[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15415
  
**[Test build #73149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73149/testReport)** for PR 15415 at commit [`dfdf85d`](https://github.com/apache/spark/commit/dfdf85d4cf26864fdbcf57d2e60153d299741197).





[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16981
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16981
  
**[Test build #73141 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73141/testReport)** for PR 16981 at commit [`31ca0ff`](https://github.com/apache/spark/commit/31ca0ff772d10561357d6ff375ce36275bba7550).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16865: [SPARK-19530][SQL] Use guava weigher for code cac...

2017-02-19 Thread viirya
Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/16865





[GitHub] spark issue #16865: [SPARK-19530][SQL] Use guava weigher for code cache evic...

2017-02-19 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16865
  
Ok. Close this for now.





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15125
  
**[Test build #73148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73148/testReport)** for PR 15125 at commit [`dd6c366`](https://github.com/apache/spark/commit/dd6c366f504833f064b126a7fe85ea9cdc42fde1).





[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment

2017-02-19 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16949
  
cc @vanzin Take a second review please!





[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16996
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73144/
Test FAILed.





[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16819
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73147/
Test FAILed.





[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16996
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16819
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16819
  
**[Test build #73147 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73147/testReport)**
 for PR 16819 at commit 
[`4f81680`](https://github.com/apache/spark/commit/4f81680364c16e5e70b65e785a439c184b1313e3).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16996
  
**[Test build #73144 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73144/testReport)**
 for PR 16996 at commit 
[`92c1452`](https://github.com/apache/spark/commit/92c1452da5f994a96f1bf5cf90df75492e742746).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-19 Thread dding3
Github user dding3 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r101952678
  
--- Diff: docs/graphx-programming-guide.md ---
@@ -720,25 +722,53 @@ class GraphOps[VD, ED] {
sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
mergeMsg: (A, A) => A)
 : Graph[VD, ED] = {
-// Receive the initial message at each vertex
-var g = mapVertices( (vid, vdata) => vprog(vid, vdata, initialMsg) 
).cache()
+val checkpointInterval = graph.vertices.sparkContext.getConf
--- End diff --

OK. I will change it back then.





[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16819
  
**[Test build #73147 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73147/testReport)**
 for PR 16819 at commit 
[`4f81680`](https://github.com/apache/spark/commit/4f81680364c16e5e70b65e785a439c184b1313e3).





[GitHub] spark pull request #16726: [SPARK-19390][SQL] Replace the unnecessary usages...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16726#discussion_r101952068
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala
 ---
@@ -166,13 +166,11 @@ class HiveTableScanSuite extends HiveComparisonTest 
with SQLTestUtils with TestH
  |PARTITION (p1='a',p2='c',p3='c',p4='d',p5='e')
  |SELECT v.id
""".stripMargin)
-val plan = sql(
-  s"""
- |SELECT * FROM $table
-   """.stripMargin).queryExecution.sparkPlan
+val plan = sql(s"SELECT * FROM $table").queryExecution.sparkPlan
 val relation = plan.collectFirst {
   case p: HiveTableScanExec => p.relation
 }.get
+// This test case is to verify `hiveQlTable` and 
`getHiveQlPartitions()`
 val tableCols = relation.hiveQlTable.getCols
--- End diff --

Let me remove it.





[GitHub] spark issue #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExec to ex...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16994
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73142/
Test PASSed.





[GitHub] spark issue #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExec to ex...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16994
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #16726: [SPARK-19390][SQL] Replace the unnecessary usages...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16726#discussion_r101952060
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala
 ---
@@ -166,13 +166,11 @@ class HiveTableScanSuite extends HiveComparisonTest 
with SQLTestUtils with TestH
  |PARTITION (p1='a',p2='c',p3='c',p4='d',p5='e')
  |SELECT v.id
""".stripMargin)
-val plan = sql(
-  s"""
- |SELECT * FROM $table
-   """.stripMargin).queryExecution.sparkPlan
+val plan = sql(s"SELECT * FROM $table").queryExecution.sparkPlan
 val relation = plan.collectFirst {
   case p: HiveTableScanExec => p.relation
 }.get
+// This test case is to verify `hiveQlTable` and 
`getHiveQlPartitions()`
 val tableCols = relation.hiveQlTable.getCols
--- End diff --

The whole test case for https://github.com/apache/spark/pull/14515 is not 
needed after the recent code refactoring.





[GitHub] spark issue #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExec to ex...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16994
  
**[Test build #73142 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73142/testReport)**
 for PR 16994 at commit 
[`4b73130`](https://github.com/apache/spark/commit/4b73130d33d2af1e74a688b7e19db0fb5d90f72e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class BucketedTableTestSpec(`





[GitHub] spark issue #16819: [SPARK-16441][YARN] Set maxNumExecutor depends on yarn c...

2017-02-19 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/16819
  
@srowen . Setting `spark.dynamicAllocation.maxExecutors` dynamically can avoid 
some strange problems:

1. [Spark application hang when dynamic allocation is 
enabled](https://issues.apache.org/jira/browse/SPARK-16441)
2. [Report failure reason from Reporter 
Thread](https://issues.apache.org/jira/browse/SPARK-19226)
3. The CLI shows success but the web UI doesn't, similar to 
[this](https://issues.apache.org/jira/secure/attachment/12846513/can-not-consume-taskEnd-events.jpg)

I added a unit test just now.






[GitHub] spark issue #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExec to ex...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16994
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73140/
Test PASSed.





[GitHub] spark issue #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExec to ex...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16994
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExec to ex...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16994
  
**[Test build #73140 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73140/testReport)**
 for PR 16994 at commit 
[`f1569bf`](https://github.com/apache/spark/commit/f1569bf1a0a3047aef860bde18d8d34f71548886).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class BucketTableTestSpec(`





[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-02-19 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15125#discussion_r101950972
  
--- Diff: docs/graphx-programming-guide.md ---
@@ -720,25 +722,53 @@ class GraphOps[VD, ED] {
sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
mergeMsg: (A, A) => A)
 : Graph[VD, ED] = {
-// Receive the initial message at each vertex
-var g = mapVertices( (vid, vdata) => vprog(vid, vdata, initialMsg) 
).cache()
+val checkpointInterval = graph.vertices.sparkContext.getConf
--- End diff --

Hmm, as this is just an implementation sketch, I don't think we should 
include such details of the checkpointer.





[GitHub] spark issue #16923: [SPARK-19038][Hive][YARN] Correctly figure out keytab fi...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16923
  
**[Test build #73146 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73146/testReport)**
 for PR 16923 at commit 
[`57060e3`](https://github.com/apache/spark/commit/57060e351a4e00f93a832a05dabaaa086086b1aa).





[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...

2017-02-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16981#discussion_r101950039
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -482,6 +482,15 @@ case class JsonTuple(children: Seq[Expression])
 /**
  * Converts an json input string to a [[StructType]] with the specified 
schema.
  */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(jsonStr, schema[, options]) - Returns a struct value 
with the given `jsonStr` and `schema`.",
+  extended = """
+Examples:
+  > SELECT _FUNC_('{"a":1}', '{"type":"struct", "fields":[{"name":"a", 
"type":"integer", "nullable":true}]}');
--- End diff --

I'll check





[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...

2017-02-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16981#discussion_r101950073
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
@@ -174,4 +174,44 @@ class JsonFunctionsSuite extends QueryTest with 
SharedSQLContext {
   .select(to_json($"struct").as("json"))
 checkAnswer(dfTwo, readBackTwo)
   }
+
+  test("SPARK-19637 Support to_json/from_json in SQL") {
+// to_json
+val df1 = Seq(Tuple1(Tuple1(1))).toDF("a")
+checkAnswer(
+  df1.selectExpr("to_json(a)"),
+  Row("""{"_1":1}""") :: Nil)
+
+val df2 = Seq(Tuple1(Tuple1(java.sql.Timestamp.valueOf("2015-08-26 18:00:00.0")))).toDF("a")
+checkAnswer(
+  df2.selectExpr("""to_json(a, '{"timestampFormat": "dd/MM/yyyy HH:mm"}')"""),
+  Row("""{"_1":"26/08/2015 18:00"}""") :: Nil)
+
+val errMsg1 = intercept[AnalysisException] {
+  df2.selectExpr("""to_json(a, '{"k": [{"k": "v"}]}')""").collect
+}
+assert(errMsg1.getMessage.startsWith(
+  """The format must be '{"key": "value", ...}', but {"k": [{"k": 
"v"}]}"""))
+
+// from_json
+val df3 = Seq("""{"a": 1}""").toDS()
+val schema1 = new StructType().add("a", IntegerType)
+checkAnswer(
+  df3.selectExpr(s"from_json(value, '${schema1.json}')"),
+  Row(Row(1)) :: Nil)
+
+val df4 = Seq("""{"time": "26/08/2015 18:00"}""").toDS()
+val schema2 = new StructType().add("time", TimestampType)
+checkAnswer(
+  df4.selectExpr(
+s"""from_json(value, '${schema2.json}', """ +
+   """'{"timestampFormat": "dd/MM/ HH:mm"}')"""),
--- End diff --

Okay, I'll fix it that way.





[GitHub] spark issue #16865: [SPARK-19530][SQL] Use guava weigher for code cache evic...

2017-02-19 Thread davies
Github user davies commented on the issue:

https://github.com/apache/spark/pull/16865
  
I still think it's not worth it





[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16981#discussion_r101948866
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -482,6 +482,15 @@ case class JsonTuple(children: Seq[Expression])
 /**
  * Converts an json input string to a [[StructType]] with the specified 
schema.
  */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(jsonStr, schema[, options]) - Returns a struct value 
with the given `jsonStr` and `schema`.",
+  extended = """
+Examples:
+  > SELECT _FUNC_('{"a":1}', '{"type":"struct", "fields":[{"name":"a", 
"type":"integer", "nullable":true}]}');
--- End diff --

Can we let users call the `named_struct` function to specify the schema?





[GitHub] spark issue #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.dir in ha...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16996
  
**[Test build #73144 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73144/testReport)**
 for PR 16996 at commit 
[`92c1452`](https://github.com/apache/spark/commit/92c1452da5f994a96f1bf5cf90df75492e742746).





[GitHub] spark issue #16949: [SPARK-16122][CORE] Add rest api for job environment

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16949
  
**[Test build #73145 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73145/testReport)**
 for PR 16949 at commit 
[`ad570cf`](https://github.com/apache/spark/commit/ad570cff2f04b6d4e31feb1aaabe5483f8ad0cca).





[GitHub] spark pull request #16818: [SPARK-19451][SQL][Core] Underlying integer overf...

2017-02-19 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16818#discussion_r101948606
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/WindowSpec.scala ---
@@ -180,16 +180,20 @@ class WindowSpec private[sql](
   private def between(typ: FrameType, start: Long, end: Long): WindowSpec 
= {
 val boundaryStart = start match {
   case 0 => CurrentRow
-  case Long.MinValue => UnboundedPreceding
-  case x if x < 0 => ValuePreceding(-start.toInt)
-  case x if x > 0 => ValueFollowing(start.toInt)
+  case x if x < Int.MinValue => UnboundedPreceding
--- End diff --

cc @hvanhovell 
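
For background on the guard in the diff above: narrowing a `Long` frame 
boundary with `.toInt` keeps only the low 32 bits, so values outside the 
`Int` range wrap silently. A self-contained sketch of the wrap and of a 
range-checked match in the spirit of the patch (illustrative names, not 
the actual `WindowSpec` code):

```scala
// Demonstrates why `start.toInt` is unsafe for frame boundaries outside the
// Int range, and how checking the range first avoids the silent wrap.
object NarrowingDemo {
  def main(args: Array[String]): Unit = {
    // Low 32 bits of Long.MinValue are all zero, so the narrowed value is 0,
    // which would be misread as CurrentRow instead of UnboundedPreceding.
    assert(Long.MinValue.toInt == 0)

    // One below Int.MinValue wraps to Int.MaxValue: a "preceding" boundary
    // silently turns into a huge "following" one.
    assert((Int.MinValue.toLong - 1).toInt == Int.MaxValue)

    // Guarding with the Int range first, as the patch does, avoids the wrap.
    def toBoundary(x: Long): String = x match {
      case 0 => "CurrentRow"
      case v if v < Int.MinValue => "UnboundedPreceding"
      case v if v > Int.MaxValue => "UnboundedFollowing"
      case v if v < 0 => s"ValuePreceding(${(-v).toInt})"
      case v => s"ValueFollowing(${v.toInt})"
    }
    assert(toBoundary(Long.MinValue) == "UnboundedPreceding")
    assert(toBoundary(-5L) == "ValuePreceding(5)")
    println("ok")
  }
}
```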





[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16981#discussion_r101948552
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
@@ -174,4 +174,44 @@ class JsonFunctionsSuite extends QueryTest with 
SharedSQLContext {
   .select(to_json($"struct").as("json"))
 checkAnswer(dfTwo, readBackTwo)
   }
+
+  test("SPARK-19637 Support to_json/from_json in SQL") {
+// to_json
+val df1 = Seq(Tuple1(Tuple1(1))).toDF("a")
+checkAnswer(
+  df1.selectExpr("to_json(a)"),
+  Row("""{"_1":1}""") :: Nil)
+
+val df2 = Seq(Tuple1(Tuple1(java.sql.Timestamp.valueOf("2015-08-26 18:00:00.0")))).toDF("a")
+checkAnswer(
+  df2.selectExpr("""to_json(a, '{"timestampFormat": "dd/MM/yyyy HH:mm"}')"""),
+  Row("""{"_1":"26/08/2015 18:00"}""") :: Nil)
+
+val errMsg1 = intercept[AnalysisException] {
+  df2.selectExpr("""to_json(a, '{"k": [{"k": "v"}]}')""").collect
--- End diff --

`collect` is not needed.





[GitHub] spark pull request #16996: [SPARK-19664][SQL]put hive.metastore.warehouse.di...

2017-02-19 Thread windpiger
GitHub user windpiger opened a pull request:

https://github.com/apache/spark/pull/16996

[SPARK-19664][SQL]put hive.metastore.warehouse.dir in hadoopconf to 
overwrite its original value


## What changes were proposed in this pull request?

In [SPARK-15959](https://issues.apache.org/jira/browse/SPARK-15959), we 
brought back `hive.metastore.warehouse.dir`. However, when the value of 
`spark.sql.warehouse.dir` is used to overwrite `hive.metastore.warehouse.dir`, 
it is set in `sparkContext.conf`, which does not overwrite the value in the 
Hadoop configuration. I think it should be set in 
`sparkContext.hadoopConfiguration` so that the original value there is 
overwritten.


https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L64
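
To illustrate the issue being fixed, here is a simplified model (not the 
actual `SharedState` code; the real stores are `SparkConf` and Hadoop's 
`Configuration`, modeled below as plain mutable maps): writing the warehouse 
path into one store leaves the Hive-visible value in the other untouched.

```scala
import scala.collection.mutable

// Two independent settings stores, standing in for sparkContext.conf and
// sparkContext.hadoopConfiguration.
object WarehouseConfDemo {
  def main(args: Array[String]): Unit = {
    val sparkConf  = mutable.Map("spark.sql.warehouse.dir" -> "/user/spark/warehouse")
    val hadoopConf = mutable.Map("hive.metastore.warehouse.dir" -> "/user/hive/warehouse")

    // Behavior being fixed: the value is written into sparkConf only,
    // so Hive still reads the stale value from hadoopConf.
    sparkConf("hive.metastore.warehouse.dir") = sparkConf("spark.sql.warehouse.dir")
    assert(hadoopConf("hive.metastore.warehouse.dir") == "/user/hive/warehouse")

    // Proposed fix: overwrite the value in the Hadoop configuration as well.
    hadoopConf("hive.metastore.warehouse.dir") = sparkConf("spark.sql.warehouse.dir")
    assert(hadoopConf("hive.metastore.warehouse.dir") == "/user/spark/warehouse")
    println("ok")
  }
}
```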

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/windpiger/spark hivemetawarehouseConf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16996.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16996


commit 92c1452da5f994a96f1bf5cf90df75492e742746
Author: windpiger 
Date:   2017-02-20T05:04:17Z

[SPARK-19664][SQL]put hive.metastore.warehouse.dir in hadoopconf to 
overwrite its original value







[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16981#discussion_r101948378
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
@@ -174,4 +174,44 @@ class JsonFunctionsSuite extends QueryTest with 
SharedSQLContext {
   .select(to_json($"struct").as("json"))
 checkAnswer(dfTwo, readBackTwo)
   }
+
+  test("SPARK-19637 Support to_json/from_json in SQL") {
+// to_json
+val df1 = Seq(Tuple1(Tuple1(1))).toDF("a")
+checkAnswer(
+  df1.selectExpr("to_json(a)"),
+  Row("""{"_1":1}""") :: Nil)
+
+val df2 = Seq(Tuple1(Tuple1(java.sql.Timestamp.valueOf("2015-08-26 18:00:00.0")))).toDF("a")
+checkAnswer(
+  df2.selectExpr("""to_json(a, '{"timestampFormat": "dd/MM/yyyy HH:mm"}')"""),
+  Row("""{"_1":"26/08/2015 18:00"}""") :: Nil)
+
+val errMsg1 = intercept[AnalysisException] {
+  df2.selectExpr("""to_json(a, '{"k": [{"k": "v"}]}')""").collect
+}
+assert(errMsg1.getMessage.startsWith(
+  """The format must be '{"key": "value", ...}', but {"k": [{"k": 
"v"}]}"""))
+
+// from_json
+val df3 = Seq("""{"a": 1}""").toDS()
+val schema1 = new StructType().add("a", IntegerType)
+checkAnswer(
+  df3.selectExpr(s"from_json(value, '${schema1.json}')"),
+  Row(Row(1)) :: Nil)
+
+val df4 = Seq("""{"time": "26/08/2015 18:00"}""").toDS()
+val schema2 = new StructType().add("time", TimestampType)
+checkAnswer(
+  df4.selectExpr(
+s"""from_json(value, '${schema2.json}', """ +
+   """'{"timestampFormat": "dd/MM/ HH:mm"}')"""),
--- End diff --

Regarding the format of options, another way is to use the MapType.

For example,
```Scala
from_json(value, '${schema2.json}', map("timestampFormat", "dd/MM/yyyy HH:mm"))
```

I am not sure whether using JSON to represent options is a good way.





[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...

2017-02-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16981#discussion_r101948229
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala
 ---
@@ -55,4 +60,24 @@ object JacksonUtils {
 
 schema.foreach(field => verifyType(field.name, field.dataType))
   }
+
+  private def validateStringLiteral(exp: Expression): String = exp match {
+case Literal(s, StringType) => s.toString
+case e => throw new AnalysisException(s"Must be a string literal, but: 
$e")
+  }
+
+  def validateSchemaLiteral(exp: Expression): StructType =
+DataType.fromJson(validateStringLiteral(exp)).asInstanceOf[StructType]
+
+  /**
+   * Convert a literal including a json option string (e.g., '{"mode": 
"PERMISSIVE", ...}')
--- End diff --

Aha, you mean we should use a map literal directly? Sorry, I missed that 
idea. Is this JSON option format unnecessary then? If so, I'll fix this to 
use a map literal here.
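
For reference, the flat-JSON option format under discussion (e.g. 
`'{"mode": "PERMISSIVE"}'`) maps onto a `Map[String, String]` roughly as 
sketched below. This is a simplistic regex-based illustration only, not the 
actual Jackson-based implementation; nested values such as 
`{"k": [{"k": "v"}]}` are rejected, mirroring the error message quoted in 
the test above.

```scala
// Simplistic conversion of a flat JSON options string into Map[String, String].
// Only top-level "string": "string" pairs are accepted; nested structures fail.
object JsonOptionsDemo {
  private val Pair = """"([^"]+)"\s*:\s*"([^"]*)"""".r

  def parseOptions(json: String): Map[String, String] = {
    val body = json.trim.stripPrefix("{").stripSuffix("}").trim
    if (body.isEmpty) return Map.empty
    // Reject nested arrays/objects outright.
    require(!body.exists(c => c == '[' || c == '{'),
      s"""The format must be '{"key": "value", ...}', but $json""")
    Pair.findAllMatchIn(body).map(m => m.group(1) -> m.group(2)).toMap
  }

  def main(args: Array[String]): Unit = {
    assert(parseOptions("""{"timestampFormat": "dd/MM/yyyy HH:mm"}""")
      == Map("timestampFormat" -> "dd/MM/yyyy HH:mm"))
    assert(scala.util.Try(parseOptions("""{"k": [{"k": "v"}]}""")).isFailure)
    println("ok")
  }
}
```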





[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16981#discussion_r101947987
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -482,6 +482,15 @@ case class JsonTuple(children: Seq[Expression])
 /**
  * Converts an json input string to a [[StructType]] with the specified 
schema.
  */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(jsonStr, schema[, options]) - Returns a struct value 
with the given `jsonStr` and `schema`.",
+  extended = """
+Examples:
+  > SELECT _FUNC_('{"a":1}', '{"type":"struct", "fields":[{"name":"a", 
"type":"integer", "nullable":true}]}');
+   {"a":1}
--- End diff --

More examples are needed to show users how to use option.





[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...

2017-02-19 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16981
  
@gatorsmile okay, I'll do it soon





[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16981
  
**[Test build #73143 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73143/testReport)**
 for PR 16981 at commit 
[`9b1c015`](https://github.com/apache/spark/commit/9b1c015661529f4e0db9f295574dcd5ed66a2919).





[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16981
  
Could you add SQL test cases to SQLQueryTestSuite?





[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16981#discussion_r101947601
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala
 ---
@@ -55,4 +60,24 @@ object JacksonUtils {
 
 schema.foreach(field => verifyType(field.name, field.dataType))
   }
+
+  private def validateStringLiteral(exp: Expression): String = exp match {
+case Literal(s, StringType) => s.toString
+case e => throw new AnalysisException(s"Must be a string literal, but: 
$e")
+  }
+
+  def validateSchemaLiteral(exp: Expression): StructType =
+DataType.fromJson(validateStringLiteral(exp)).asInstanceOf[StructType]
+
+  /**
+   * Convert a literal including a json option string (e.g., '{"mode": 
"PERMISSIVE", ...}')
--- End diff --

What is the reason we use the Json option string?





[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16981#discussion_r101947505
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
 ---
@@ -482,6 +482,15 @@ case class JsonTuple(children: Seq[Expression])
 /**
  * Converts an json input string to a [[StructType]] with the specified 
schema.
  */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(jsonStr, schema[, options]) - Return a struct value with 
the given `jsonStr` and `schema`.",
--- End diff --

`Return` -> `Returns`





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15125
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73138/
Test PASSed.





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15125
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15125
  
**[Test build #73138 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73138/testReport)**
 for PR 15125 at commit 
[`dae94aa`](https://github.com/apache/spark/commit/dae94aa1c216b390ad2fcc0b435b98e9fc2436b4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...

2017-02-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16981#discussion_r101947175
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala
 ---
@@ -55,4 +60,26 @@ object JacksonUtils {
 
 schema.foreach(field => verifyType(field.name, field.dataType))
   }
+
+  private def validateStringLiteral(exp: Expression): String = exp match {
+case Literal(s, StringType) => s.toString
+case e => throw new AnalysisException("Must be a string literal, but: 
" + e)
+  }
+
+  def validateSchemaLiteral(exp: Expression): StructType =
+DataType.fromJson(validateStringLiteral(exp)).asInstanceOf[StructType]
--- End diff --

okay, I'll do that ;)





[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...

2017-02-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16981#discussion_r101947094
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala
 ---
@@ -55,4 +60,26 @@ object JacksonUtils {
 
 schema.foreach(field => verifyType(field.name, field.dataType))
   }
+
+  private def validateStringLiteral(exp: Expression): String = exp match {
+case Literal(s, StringType) => s.toString
+case e => throw new AnalysisException("Must be a string literal, but: 
" + e)
+  }
+
+  def validateSchemaLiteral(exp: Expression): StructType =
+DataType.fromJson(validateStringLiteral(exp)).asInstanceOf[StructType]
--- End diff --

Ah, thanks. Yes, if it throws a class cast exception, I think we should 
produce a better exception and message rather than one that just says `A cannot be 
cast to B`. Maybe add a util for both places?
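
The shared util suggested here could look roughly like the following self-contained sketch. The types are minimal local stand-ins, not Catalyst's real classes (`DataType`, `StructType`, and `AnalysisException` below are mocks for illustration), but the shape of the fix — a pattern match instead of `asInstanceOf`, producing a readable error — is the point:

```scala
// Minimal stand-ins for the Catalyst types involved (assumption: these
// mirror only the parts of the real API needed for the sketch).
sealed trait DataType
case object IntegerType extends DataType
final case class StructType(fieldNames: Seq[String]) extends DataType

final class AnalysisException(msg: String) extends RuntimeException(msg)

// Instead of dt.asInstanceOf[StructType] (which throws a raw
// ClassCastException), match and raise a descriptive error.
def asStruct(dt: DataType): StructType = dt match {
  case s: StructType => s
  case other =>
    throw new AnalysisException(s"Schema should be a struct type, but got: $other")
}
```

With this, a non-struct schema literal fails with a clear message rather than `A cannot be cast to B`, and both `from_json` and `to_json` could route through the same helper.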





[GitHub] spark issue #16995: [SPARK-19340][SQL] CSV file will result in an exception ...

2017-02-19 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/16995
  
Could you add tests for this PR?





[GitHub] spark issue #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExec to ex...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16994
  
**[Test build #73142 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73142/testReport)**
 for PR 16994 at commit 
[`4b73130`](https://github.com/apache/spark/commit/4b73130d33d2af1e74a688b7e19db0fb5d90f72e).





[GitHub] spark issue #16981: [SPARK-19637][SQL] Add from_json/to_json in FunctionRegi...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16981
  
**[Test build #73141 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73141/testReport)**
 for PR 16981 at commit 
[`31ca0ff`](https://github.com/apache/spark/commit/31ca0ff772d10561357d6ff375ce36275bba7550).





[GitHub] spark issue #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExec to ex...

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16994
  
**[Test build #73140 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73140/testReport)**
 for PR 16994 at commit 
[`f1569bf`](https://github.com/apache/spark/commit/f1569bf1a0a3047aef860bde18d8d34f71548886).





[GitHub] spark pull request #16981: [SPARK-19637][SQL] Add from_json/to_json in Funct...

2017-02-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/16981#discussion_r101946265
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonUtils.scala
 ---
@@ -55,4 +60,26 @@ object JacksonUtils {
 
 schema.foreach(field => verifyType(field.name, field.dataType))
   }
+
+  private def validateStringLiteral(exp: Expression): String = exp match {
+case Literal(s, StringType) => s.toString
+case e => throw new AnalysisException("Must be a string literal, but: 
" + e)
+  }
+
+  def validateSchemaLiteral(exp: Expression): StructType =
+DataType.fromJson(validateStringLiteral(exp)).asInstanceOf[StructType]
--- End diff --

I just wrote it this way to match the existing code here: 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3010.
 Either is okay with me; though, if we modify the code the way you suggested, do we 
need to modify the `from_json` code, too?





[GitHub] spark issue #16978: [SPARK-19652][UI] Do auth checks for REST API access.

2017-02-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16978
  
**[Test build #73139 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73139/testReport)**
 for PR 16978 at commit 
[`7288160`](https://github.com/apache/spark/commit/7288160e5c3c2cce72133f68693ad2ab47f346d0).





[GitHub] spark issue #16865: [SPARK-19530][SQL] Use guava weigher for code cache evic...

2017-02-19 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16865
  
ping @davies Do you still think this is not helpful generally?





[GitHub] spark pull request #16994: [SPARK-15453] [SQL] [Follow-up] FileSourceScanExe...

2017-02-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16994#discussion_r101941261
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala ---
@@ -240,6 +240,7 @@ class BucketedReadSuite extends QueryTest with 
SQLTestUtils with TestHiveSinglet
   joinCondition: (DataFrame, DataFrame) => Column,
   shuffleLeft: Boolean,
   shuffleRight: Boolean,
+  numPartitions: Int = 10,
--- End diff --

Sure, let me do it.
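
As a side note on the diff above: giving the new `numPartitions` parameter a default value is what keeps every existing caller of the test helper source-compatible. A trivial hedged sketch (the helper below is invented for illustration and is not the suite's real signature):

```scala
// Adding a trailing parameter with a default: old two-argument call
// sites keep compiling unchanged, while new tests can override it.
def testBucketing(shuffleLeft: Boolean, shuffleRight: Boolean,
                  numPartitions: Int = 10): String =
  s"left=$shuffleLeft, right=$shuffleRight, partitions=$numPartitions"
```

Existing calls such as `testBucketing(true, false)` silently pick up `numPartitions = 10`, and only tests that care pass `numPartitions = 4` explicitly.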





[GitHub] spark pull request #16855: [SPARK-13931] Stage can hang if an executor fails...

2017-02-19 Thread GavinGavinNo1
Github user GavinGavinNo1 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16855#discussion_r101940439
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -664,6 +665,55 @@ class TaskSetManagerSuite extends SparkFunSuite with 
LocalSparkContext with Logg
 assert(thrown2.getMessage().contains("bigger than 
spark.driver.maxResultSize"))
   }
 
+  test("taskSetManager should not send Resubmitted tasks after being a 
zombie") {
+// Regression test for SPARK-13931
+val conf = new SparkConf().set("spark.speculation", "true")
+sc = new SparkContext("local", "test", conf)
+
+val sched = new FakeTaskScheduler(sc, ("execA", "host1"), ("execB", 
"host2"))
+sched.initialize(new FakeSchedulerBackend() {
+  override def killTask(taskId: Long, executorId: String, 
interruptThread: Boolean): Unit = {}
+})
+
+// count for Resubmitted tasks
+var resubmittedTasks = 0
+val dagScheduler = new FakeDAGScheduler(sc, sched) {
--- End diff --

@kayousterhout 





[GitHub] spark issue #16171: [SPARK-18739][ML][PYSPARK] Classification and regression...

2017-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16171
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73137/
Test PASSed.




