[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-09-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r138540329
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/describe-table-column.sql ---
@@ -0,0 +1,35 @@
+-- Test temp table
+CREATE TEMPORARY VIEW desc_col_temp_table (key int COMMENT 
'column_comment') USING PARQUET;
+
+DESC desc_col_temp_table key;
+
+DESC EXTENDED desc_col_temp_table key;
+
+DESC FORMATTED desc_col_temp_table key;
+
+-- Describe a column with qualified name
+DESC FORMATTED desc_col_temp_table desc_col_temp_table.key;
+
+-- Describe a non-existent column
+DESC desc_col_temp_table key1;
+
+-- Test persistent table
+CREATE TABLE desc_col_table (key int COMMENT 'column_comment') USING 
PARQUET;
--- End diff --

Yes, we should. I'll drop them in a follow-up PR.
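
For illustration, a minimal sketch of the cleanup the follow-up could append to the test file (the actual follow-up PR is not quoted in this thread):

```sql
-- Drop the objects created above; names match this test file.
DROP VIEW desc_col_temp_table;
DROP TABLE desc_col_table;
```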


---




[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-09-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r138539525
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/describe-table-column.sql.out ---
@@ -0,0 +1,184 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 15
+
+
+-- !query 0
+CREATE TEMPORARY VIEW desc_col_temp_table (key int COMMENT 
'column_comment') USING PARQUET
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+DESC desc_col_temp_table key
+-- !query 1 schema
+struct<info_name:string,info_value:string>
+-- !query 1 output
+col_name   key
+data_type  int
+comment    column_comment
+
+
+-- !query 2
+DESC EXTENDED desc_col_temp_table key
+-- !query 2 schema
+struct<info_name:string,info_value:string>
+-- !query 2 output
+col_name   key
+data_type  int
+comment    column_comment
+min    NULL
+max    NULL
+num_nulls  NULL
+distinct_count NULL
+avg_col_len    NULL
+max_col_len    NULL
+
+
+-- !query 3
+DESC FORMATTED desc_col_temp_table key
+-- !query 3 schema
+struct<info_name:string,info_value:string>
+-- !query 3 output
+col_name   key
+data_type  int
+comment    column_comment
+min    NULL
+max    NULL
+num_nulls  NULL
+distinct_count NULL
+avg_col_len    NULL
+max_col_len    NULL
+
+
+-- !query 4
+DESC FORMATTED desc_col_temp_table desc_col_temp_table.key
+-- !query 4 schema
+struct<info_name:string,info_value:string>
+-- !query 4 output
+col_name   key
+data_type  int
+comment    column_comment
+min    NULL
+max    NULL
+num_nulls  NULL
+distinct_count NULL
+avg_col_len    NULL
+max_col_len    NULL
+
+
+-- !query 5
+DESC desc_col_temp_table key1
+-- !query 5 schema
+struct<>
+-- !query 5 output
+org.apache.spark.sql.AnalysisException
+Column key1 does not exist;
+
+
+-- !query 6
+CREATE TABLE desc_col_table (key int COMMENT 'column_comment') USING 
PARQUET
+-- !query 6 schema
+struct<>
+-- !query 6 output
+
+
+
+-- !query 7
+ANALYZE TABLE desc_col_table COMPUTE STATISTICS FOR COLUMNS key
+-- !query 7 schema
+struct<>
+-- !query 7 output
+
+
+
+-- !query 8
+DESC desc_col_table key
+-- !query 8 schema
+struct<info_name:string,info_value:string>
+-- !query 8 output
+col_name   key
+data_type  int
+comment    column_comment
+
+
+-- !query 9
+DESC EXTENDED desc_col_table key
+-- !query 9 schema
+struct<info_name:string,info_value:string>
+-- !query 9 output
+col_name   key
+data_type  int
+comment    column_comment
+min    NULL
+max    NULL
--- End diff --

because the table is empty
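
ANALYZE ran over zero rows, so there was nothing to derive a min or max from. A hypothetical continuation (not part of this PR's test file) that would populate the stats:

```sql
-- Illustrative only: with rows present, re-analyzing fills in the stats.
INSERT INTO desc_col_table VALUES (1), (2), (3);
ANALYZE TABLE desc_col_table COMPUTE STATISTICS FOR COLUMNS key;
DESC EXTENDED desc_col_table key;  -- min/max would now report 1 and 3
```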


---




[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r138525087
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/describe-table-column.sql.out ---
@@ -0,0 +1,184 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 15
+
+
+-- !query 0
+CREATE TEMPORARY VIEW desc_col_temp_table (key int COMMENT 
'column_comment') USING PARQUET
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+DESC desc_col_temp_table key
+-- !query 1 schema
+struct<info_name:string,info_value:string>
+-- !query 1 output
+col_name   key
+data_type  int
+comment    column_comment
+
+
+-- !query 2
+DESC EXTENDED desc_col_temp_table key
+-- !query 2 schema
+struct<info_name:string,info_value:string>
+-- !query 2 output
+col_name   key
+data_type  int
+comment    column_comment
+min    NULL
+max    NULL
+num_nulls  NULL
+distinct_count NULL
+avg_col_len    NULL
+max_col_len    NULL
+
+
+-- !query 3
+DESC FORMATTED desc_col_temp_table key
+-- !query 3 schema
+struct<info_name:string,info_value:string>
+-- !query 3 output
+col_name   key
+data_type  int
+comment    column_comment
+min    NULL
+max    NULL
+num_nulls  NULL
+distinct_count NULL
+avg_col_len    NULL
+max_col_len    NULL
+
+
+-- !query 4
+DESC FORMATTED desc_col_temp_table desc_col_temp_table.key
+-- !query 4 schema
+struct<info_name:string,info_value:string>
+-- !query 4 output
+col_name   key
+data_type  int
+comment    column_comment
+min    NULL
+max    NULL
+num_nulls  NULL
+distinct_count NULL
+avg_col_len    NULL
+max_col_len    NULL
+
+
+-- !query 5
+DESC desc_col_temp_table key1
+-- !query 5 schema
+struct<>
+-- !query 5 output
+org.apache.spark.sql.AnalysisException
+Column key1 does not exist;
+
+
+-- !query 6
+CREATE TABLE desc_col_table (key int COMMENT 'column_comment') USING 
PARQUET
+-- !query 6 schema
+struct<>
+-- !query 6 output
+
+
+
+-- !query 7
+ANALYZE TABLE desc_col_table COMPUTE STATISTICS FOR COLUMNS key
+-- !query 7 schema
+struct<>
+-- !query 7 output
+
+
+
+-- !query 8
+DESC desc_col_table key
+-- !query 8 schema
+struct<info_name:string,info_value:string>
+-- !query 8 output
+col_name   key
+data_type  int
+comment    column_comment
+
+
+-- !query 9
+DESC EXTENDED desc_col_table key
+-- !query 9 schema
+struct<info_name:string,info_value:string>
+-- !query 9 output
+col_name   key
+data_type  int
+comment    column_comment
+min    NULL
+max    NULL
--- End diff --

why are min and max NULL?


---




[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r138524974
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/describe-table-column.sql ---
@@ -0,0 +1,35 @@
+-- Test temp table
+CREATE TEMPORARY VIEW desc_col_temp_table (key int COMMENT 
'column_comment') USING PARQUET;
+
+DESC desc_col_temp_table key;
+
+DESC EXTENDED desc_col_temp_table key;
+
+DESC FORMATTED desc_col_temp_table key;
+
+-- Describe a column with qualified name
+DESC FORMATTED desc_col_temp_table desc_col_temp_table.key;
+
+-- Describe a non-existent column
+DESC desc_col_temp_table key1;
+
+-- Test persistent table
+CREATE TABLE desc_col_table (key int COMMENT 'column_comment') USING 
PARQUET;
--- End diff --

shall we drop these testing tables at the end?


---




[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-09-12 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r138510770
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -626,6 +624,74 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, 
column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
--- End diff --

A follow-up PR to improve the comments has been sent: 
https://github.com/apache/spark/pull/19213


---




[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-09-12 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r138505758
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -626,6 +624,74 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, 
column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
--- End diff --

There are two other similar comments (`ShowPartitionsCommand`, 
`ShowColumnsCommand`) in this file; shall I remove them all?


---




[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r138498310
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -626,6 +624,74 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, 
column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
--- End diff --

This comment line seems unnecessary.


---




[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-09-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16422


---




[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-09-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r137918748
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -626,6 +624,73 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, 
column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+colNameParts: Seq[String],
+isExtended: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+Seq(
+  AttributeReference("info_name", StringType, nullable = false,
+new MetadataBuilder().putString("comment", "name of the column 
info").build())(),
+  AttributeReference("info_value", StringType, nullable = false,
+new MetadataBuilder().putString("comment", "value of the column 
info").build())()
+)
+  }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val resolver = sparkSession.sessionState.conf.resolver
+val relation = sparkSession.table(table).queryExecution.analyzed
+val field = {
+  relation.resolve(colNameParts, resolver).getOrElse {
+throw new AnalysisException(s"Column 
${UnresolvedAttribute(colNameParts).name} does not " +
+  s"exist")
+  }
+}
+if (!field.isInstanceOf[Attribute]) {
+  // If the field is not an attribute after `resolve`, then it's a 
nested field.
+  throw new AnalysisException(s"DESC TABLE COLUMN command is not 
supported for nested column:" +
+s" ${UnresolvedAttribute(colNameParts).name}")
+}
--- End diff --

```Scala
val colName = UnresolvedAttribute(colNameParts).name
val field = relation.resolve(colNameParts, resolver).getOrElse {
  throw new AnalysisException(s"Column $colName does not exist")
}
if (!field.isInstanceOf[Attribute]) {
  // If the field is not an attribute after `resolve`, then it's a nested field.
  throw new AnalysisException(
    s"DESC TABLE COLUMN command does not support nested data types: $colName")
}
```


---




[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-09-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r137918691
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -626,6 +624,73 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, 
column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+colNameParts: Seq[String],
+isExtended: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+Seq(
+  AttributeReference("info_name", StringType, nullable = false,
+new MetadataBuilder().putString("comment", "name of the column 
info").build())(),
+  AttributeReference("info_value", StringType, nullable = false,
+new MetadataBuilder().putString("comment", "value of the column 
info").build())()
+)
+  }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val resolver = sparkSession.sessionState.conf.resolver
+val relation = sparkSession.table(table).queryExecution.analyzed
+val field = {
--- End diff --

```Scala
val field = relation.resolve(colNameParts, resolver).getOrElse {
  val colName = UnresolvedAttribute(colNameParts).name
  throw new AnalysisException(s"Column $colName does not exist")
}
```


---




[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r127230217
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/describe-table-column.sql.out ---
@@ -0,0 +1,133 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 15
+
+
+-- !query 0
+CREATE TEMPORARY VIEW desc_col_temp_table (key int COMMENT 
'column_comment') USING PARQUET
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+DESC desc_col_temp_table key
+-- !query 1 schema
+struct<col_name:string,data_type:string,comment:string>
+-- !query 1 output
+col_name   data_type   comment 
+key    int     column_comment
+
+
+-- !query 2
+DESC EXTENDED desc_col_temp_table key
+-- !query 2 schema
+struct<col_name:string,data_type:string,min:string,max:string,num_nulls:string,distinct_count:string,avg_col_len:string,max_col_len:string,comment:string>
+-- !query 2 output
+col_name   data_type   min max num_nulls   distinct_count  avg_col_len max_col_len comment
--- End diff --

I think Hive's style would only be more readable if it supported describing 
multiple columns. So I did some tests, which show that Hive doesn't support 
that:
```
hive> desc formatted src key, value;
FAILED: ParseException line 1:22 missing EOF at ',' near 'key'
hive> desc formatted src key value;
FAILED: ParseException line 1:23 extraneous input 'value' expecting EOF 
near ''
```
Therefore, I think @cloud-fan 's proposed style is more readable.
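
To make the trade-off concrete, here are the two layouts under discussion, with illustrative values taken from the test output above:

```
-- Hive-style (horizontal): one wide row per described column
col_name  data_type  min   max   num_nulls  distinct_count  avg_col_len  max_col_len  comment
key       int        NULL  NULL  NULL       NULL            NULL         NULL         column_comment

-- Proposed style (vertical): one (name, value) row per piece of info
col_name   key
data_type  int
comment    column_comment
```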


---



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r126386081
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -626,6 +624,120 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, 
column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+colNameParts: Seq[String],
+isExtended: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+// The displayed names are based on Hive.
+// (Link for the corresponding Hive Jira: 
https://issues.apache.org/jira/browse/HIVE-7050)
+if (isExtended) {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("min", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "min value of the 
column").build())(),
+AttributeReference("max", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "max value of the 
column").build())(),
+AttributeReference("num_nulls", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "number of nulls of 
the column").build())(),
+AttributeReference("distinct_count", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "distinct count of 
the column").build())(),
+AttributeReference("avg_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"average length of the values of the column").build())(),
+AttributeReference("max_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"maximum length of the values of the column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+} else {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+}
+  }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val resolver = sparkSession.sessionState.conf.resolver
+val relation = sparkSession.table(table).queryExecution.analyzed
+val field = {
+  relation.resolve(colNameParts, resolver).getOrElse {
+throw new AnalysisException(s"Column 
${UnresolvedAttribute(colNameParts).name} does not " +
+  s"exist")
+  }
+}
+if (!field.isInstanceOf[Attribute]) {
+  // If the field is not an attribute after `resolve`, then it's a 
nested field.
+  throw new AnalysisException(s"DESC TABLE COLUMN command is not 
supported for nested column:" +
+s" ${UnresolvedAttribute(colNameParts).name}")
+}
+
+val catalogTable = catalog.getTempViewOrPermanentTableMetadata(table)
+val colStats = catalogTable.stats.map(_.colStats).getOrElse(Map.empty)
+val cs = colStats.get(field.name)
+
+val comment = if (field.metadata.contains("comment")) {
+  Option(field.metadata.getString("comment"))
+} else {
+  None
+}
+
+val fieldValues = if (isExtended) {
+  // Show column stats when extended or formatted is specified.
+  Seq(
+field.name,
+field.dataType.catalogString,
+cs.flatMap(_.min.map(_.toString)).getOrElse("NULL"),
+cs.flatMap(_.max.map(_.toString)).getOrElse("NULL"),
+cs.map(_.nullCount.toString).getOrElse("NULL"),
+cs.map(_.distinctCount.toString).getOrElse("NULL"),
+cs.map(_.avgLen.toString).getOrElse("NULL"),
+

[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r126355022
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/describe-table-column.sql.out ---
@@ -0,0 +1,133 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 15
+
+
+-- !query 0
+CREATE TEMPORARY VIEW desc_col_temp_table (key int COMMENT 
'column_comment') USING PARQUET
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+DESC desc_col_temp_table key
+-- !query 1 schema
+struct<col_name:string,data_type:string,comment:string>
+-- !query 1 output
+col_name   data_type   comment 
+key    int     column_comment
+
+
+-- !query 2
+DESC EXTENDED desc_col_temp_table key
+-- !query 2 schema
+struct<col_name:string,data_type:string,min:string,max:string,num_nulls:string,distinct_count:string,avg_col_len:string,max_col_len:string,comment:string>
+-- !query 2 output
+col_name   data_type   min max num_nulls   distinct_count  avg_col_len max_col_len comment
--- End diff --

ok, then we need to decide if we want to diverge from Hive here. cc 
@gatorsmile 


---



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r126354680
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -626,6 +624,120 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, 
column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+colNameParts: Seq[String],
+isExtended: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+// The displayed names are based on Hive.
+// (Link for the corresponding Hive Jira: 
https://issues.apache.org/jira/browse/HIVE-7050)
+if (isExtended) {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("min", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "min value of the 
column").build())(),
+AttributeReference("max", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "max value of the 
column").build())(),
+AttributeReference("num_nulls", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "number of nulls of 
the column").build())(),
+AttributeReference("distinct_count", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "distinct count of 
the column").build())(),
+AttributeReference("avg_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"average length of the values of the column").build())(),
+AttributeReference("max_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"maximum length of the values of the column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+} else {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+}
+  }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val resolver = sparkSession.sessionState.conf.resolver
+val relation = sparkSession.table(table).queryExecution.analyzed
+val field = {
+  relation.resolve(colNameParts, resolver).getOrElse {
+throw new AnalysisException(s"Column 
${UnresolvedAttribute(colNameParts).name} does not " +
+  s"exist")
+  }
+}
+if (!field.isInstanceOf[Attribute]) {
+  // If the field is not an attribute after `resolve`, then it's a 
nested field.
+  throw new AnalysisException(s"DESC TABLE COLUMN command is not 
supported for nested column:" +
+s" ${UnresolvedAttribute(colNameParts).name}")
+}
+
+val catalogTable = catalog.getTempViewOrPermanentTableMetadata(table)
+val colStats = catalogTable.stats.map(_.colStats).getOrElse(Map.empty)
+val cs = colStats.get(field.name)
+
+val comment = if (field.metadata.contains("comment")) {
+  Option(field.metadata.getString("comment"))
+} else {
+  None
+}
+
+val fieldValues = if (isExtended) {
+  // Show column stats when extended or formatted is specified.
+  Seq(
+field.name,
+field.dataType.catalogString,
+cs.flatMap(_.min.map(_.toString)).getOrElse("NULL"),
+cs.flatMap(_.max.map(_.toString)).getOrElse("NULL"),
+cs.map(_.nullCount.toString).getOrElse("NULL"),
+cs.map(_.distinctCount.toString).getOrElse("NULL"),
+cs.map(_.avgLen.toString).getOrElse("NULL"),
+

[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r126352882
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -626,6 +624,120 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, 
column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+colNameParts: Seq[String],
+isExtended: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+// The displayed names are based on Hive.
+// (Link for the corresponding Hive Jira: 
https://issues.apache.org/jira/browse/HIVE-7050)
+if (isExtended) {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("min", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "min value of the 
column").build())(),
+AttributeReference("max", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "max value of the 
column").build())(),
+AttributeReference("num_nulls", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "number of nulls of 
the column").build())(),
+AttributeReference("distinct_count", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "distinct count of 
the column").build())(),
+AttributeReference("avg_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"average length of the values of the column").build())(),
+AttributeReference("max_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"maximum length of the values of the column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+} else {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+}
+  }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val resolver = sparkSession.sessionState.conf.resolver
+val relation = sparkSession.table(table).queryExecution.analyzed
+val field = {
+  relation.resolve(colNameParts, resolver).getOrElse {
+throw new AnalysisException(s"Column 
${UnresolvedAttribute(colNameParts).name} does not " +
+  s"exist")
+  }
+}
+if (!field.isInstanceOf[Attribute]) {
+  // If the field is not an attribute after `resolve`, then it's a 
nested field.
+  throw new AnalysisException(s"DESC TABLE COLUMN command is not 
supported for nested column:" +
+s" ${UnresolvedAttribute(colNameParts).name}")
+}
+
+val catalogTable = catalog.getTempViewOrPermanentTableMetadata(table)
+val colStats = catalogTable.stats.map(_.colStats).getOrElse(Map.empty)
+val cs = colStats.get(field.name)
+
+val comment = if (field.metadata.contains("comment")) {
+  Option(field.metadata.getString("comment"))
+} else {
+  None
+}
+
+val fieldValues = if (isExtended) {
+  // Show column stats when extended or formatted is specified.
+  Seq(
+field.name,
+field.dataType.catalogString,
+cs.flatMap(_.min.map(_.toString)).getOrElse("NULL"),
+cs.flatMap(_.max.map(_.toString)).getOrElse("NULL"),
+cs.map(_.nullCount.toString).getOrElse("NULL"),
+cs.map(_.distinctCount.toString).getOrElse("NULL"),
+cs.map(_.avgLen.toString).getOrElse("NULL"),
+

[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r126352497
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/describe-table-column.sql.out ---
@@ -0,0 +1,133 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 15
+
+
+-- !query 0
+CREATE TEMPORARY VIEW desc_col_temp_table (key int COMMENT 
'column_comment') USING PARQUET
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+DESC desc_col_temp_table key
+-- !query 1 schema
+struct<col_name:string,data_type:string,comment:string>
+-- !query 1 output
+col_name   data_type   comment 
+key    int     column_comment
+
+
+-- !query 2
+DESC EXTENDED desc_col_temp_table key
+-- !query 2 schema
+struct<col_name:string,data_type:string,min:string,max:string,num_nulls:string,distinct_count:string,avg_col_len:string,max_col_len:string,comment:string>
+-- !query 2 output
+col_name   data_type   min max num_nulls   distinct_count  avg_col_len max_col_len comment
--- End diff --

I already checked with Hive previously [in this 
discussion](https://github.com/apache/spark/pull/16422#discussion_r125805788). 
The output here is the same as in Hive.


---



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r126345116
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/describe-table-column.sql.out ---
@@ -0,0 +1,133 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 15
+
+
+-- !query 0
+CREATE TEMPORARY VIEW desc_col_temp_table (key int COMMENT 
'column_comment') USING PARQUET
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+DESC desc_col_temp_table key
+-- !query 1 schema
+struct<col_name:string,data_type:string,comment:string>
+-- !query 1 output
+col_name   data_type   comment 
+key    int     column_comment
+
+
+-- !query 2
+DESC EXTENDED desc_col_temp_table key
+-- !query 2 schema
+struct<col_name:string,data_type:string,min:string,max:string,num_nulls:string,distinct_count:string,avg_col_len:string,max_col_len:string,comment:string>
+-- !query 2 output
+col_name   data_type   min max num_nulls   distinct_count  avg_col_len max_col_len comment
--- End diff --

can you check with Hive? I feel this output is not friendly to users. I'd 
like to see something like:
```
col_name   abc
data_type  int
max   3

```


---



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r126344763
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -626,6 +624,120 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, 
column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+colNameParts: Seq[String],
+isExtended: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+// The displayed names are based on Hive.
+// (Link for the corresponding Hive Jira: 
https://issues.apache.org/jira/browse/HIVE-7050)
+if (isExtended) {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("min", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "min value of the 
column").build())(),
+AttributeReference("max", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "max value of the 
column").build())(),
+AttributeReference("num_nulls", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "number of nulls of 
the column").build())(),
+AttributeReference("distinct_count", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "distinct count of 
the column").build())(),
+AttributeReference("avg_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"average length of the values of the column").build())(),
+AttributeReference("max_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"maximum length of the values of the column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+} else {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+}
+  }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val resolver = sparkSession.sessionState.conf.resolver
+val relation = sparkSession.table(table).queryExecution.analyzed
+val field = {
+  relation.resolve(colNameParts, resolver).getOrElse {
+throw new AnalysisException(s"Column 
${UnresolvedAttribute(colNameParts).name} does not " +
+  s"exist")
+  }
+}
+if (!field.isInstanceOf[Attribute]) {
+  // If the field is not an attribute after `resolve`, then it's a 
nested field.
+  throw new AnalysisException(s"DESC TABLE COLUMN command is not 
supported for nested column:" +
+s" ${UnresolvedAttribute(colNameParts).name}")
+}
+
+val catalogTable = catalog.getTempViewOrPermanentTableMetadata(table)
+val colStats = catalogTable.stats.map(_.colStats).getOrElse(Map.empty)
+val cs = colStats.get(field.name)
+
+val comment = if (field.metadata.contains("comment")) {
+  Option(field.metadata.getString("comment"))
+} else {
+  None
+}
+
+val fieldValues = if (isExtended) {
+  // Show column stats when extended or formatted is specified.
+  Seq(
+field.name,
+field.dataType.catalogString,
+cs.flatMap(_.min.map(_.toString)).getOrElse("NULL"),
+cs.flatMap(_.max.map(_.toString)).getOrElse("NULL"),
+cs.map(_.nullCount.toString).getOrElse("NULL"),
+cs.map(_.distinctCount.toString).getOrElse("NULL"),
+cs.map(_.avgLen.toString).getOrElse("NULL"),
+

[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r126344451
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -626,6 +624,120 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, 
column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+colNameParts: Seq[String],
+isExtended: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+// The displayed names are based on Hive.
+// (Link for the corresponding Hive Jira: 
https://issues.apache.org/jira/browse/HIVE-7050)
+if (isExtended) {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("min", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "min value of the 
column").build())(),
+AttributeReference("max", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "max value of the 
column").build())(),
+AttributeReference("num_nulls", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "number of nulls of 
the column").build())(),
+AttributeReference("distinct_count", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "distinct count of 
the column").build())(),
+AttributeReference("avg_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"average length of the values of the column").build())(),
+AttributeReference("max_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"maximum length of the values of the column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+} else {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+}
+  }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val resolver = sparkSession.sessionState.conf.resolver
+val relation = sparkSession.table(table).queryExecution.analyzed
+val field = {
+  relation.resolve(colNameParts, resolver).getOrElse {
+throw new AnalysisException(s"Column 
${UnresolvedAttribute(colNameParts).name} does not " +
+  s"exist")
+  }
+}
+if (!field.isInstanceOf[Attribute]) {
+  // If the field is not an attribute after `resolve`, then it's a 
nested field.
+  throw new AnalysisException(s"DESC TABLE COLUMN command is not 
supported for nested column:" +
+s" ${UnresolvedAttribute(colNameParts).name}")
+}
+
+val catalogTable = catalog.getTempViewOrPermanentTableMetadata(table)
+val colStats = catalogTable.stats.map(_.colStats).getOrElse(Map.empty)
+val cs = colStats.get(field.name)
--- End diff --

nit: `val colStats = catalogTable.stats.flatMap(_.colStats.get(field.name))`
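
Both forms yield the same `Option[ColumnStat]`; the nit collapses the intermediate `Map` lookup into a single `flatMap` chain. A sketch using the names from the diff above:

```scala
// Two-step version from the diff: materializes an intermediate Map first.
val colStats = catalogTable.stats.map(_.colStats).getOrElse(Map.empty)
val cs = colStats.get(field.name)

// Suggested one-liner: flatMap straight through the Option[Statistics].
val cs2 = catalogTable.stats.flatMap(_.colStats.get(field.name))
```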


---



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r126344265
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -626,6 +624,120 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, 
column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+colNameParts: Seq[String],
+isExtended: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+// The displayed names are based on Hive.
+// (Link for the corresponding Hive Jira: 
https://issues.apache.org/jira/browse/HIVE-7050)
+if (isExtended) {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("min", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "min value of the 
column").build())(),
+AttributeReference("max", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "max value of the 
column").build())(),
+AttributeReference("num_nulls", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "number of nulls of 
the column").build())(),
+AttributeReference("distinct_count", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "distinct count of 
the column").build())(),
+AttributeReference("avg_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"average length of the values of the column").build())(),
+AttributeReference("max_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"maximum length of the values of the column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+} else {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+}
+  }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val resolver = sparkSession.sessionState.conf.resolver
+val relation = sparkSession.table(table).queryExecution.analyzed
+val field = {
--- End diff --

nit:
```
val field = relation.resolve(colNameParts, resolver).getOrElse {
  ...
}
```


---



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-08 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r126275817
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/describe-table-column.sql.out ---
@@ -0,0 +1,133 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 15
+
+
+-- !query 0
+CREATE TEMPORARY VIEW desc_col_temp_table (key int COMMENT 
'column_comment') USING PARQUET
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+DESC desc_col_temp_table key
+-- !query 1 schema
+struct<col_name:string,data_type:string,comment:string>
+-- !query 1 output
+col_name   data_type   comment 
+key    int     column_comment
+
+
+-- !query 2
+DESC EXTENDED desc_col_temp_table key
--- End diff --

ok, I'll make them identical.


---



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-08 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r126275601
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -626,6 +624,117 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, 
column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+colNameParts: Seq[String],
+isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+// The displayed names are based on Hive.
+// (Link for the corresponding Hive Jira: 
https://issues.apache.org/jira/browse/HIVE-7050)
+if (isFormatted) {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("min", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "min value of the 
column").build())(),
+AttributeReference("max", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "max value of the 
column").build())(),
+AttributeReference("num_nulls", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "number of nulls of 
the column").build())(),
+AttributeReference("distinct_count", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "distinct count of 
the column").build())(),
+AttributeReference("avg_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"average length of the values of the column").build())(),
+AttributeReference("max_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"maximum length of the values of the column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+} else {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+}
+  }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+val resolver = sparkSession.sessionState.conf.resolver
+val relation = sparkSession.table(table).queryExecution.analyzed
+val attribute = {
+  val field = relation.resolve(
--- End diff --

right, that's better, thanks!


---



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-08 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r126275587
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -320,10 +320,15 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder(conf) {
* Create a [[DescribeTableCommand]] logical plan.
*/
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan 
= withOrigin(ctx) {
-// Describe column are not supported yet. Return null and let the 
parser decide
-// what to do with this (create an exception or pass it on to a 
different system).
 if (ctx.describeColName != null) {
-  null
+  if (ctx.partitionSpec != null) {
+throw new ParseException("DESC TABLE COLUMN for a specific 
partition is not supported", ctx)
+  } else {
+DescribeColumnCommand(
+  visitTableIdentifier(ctx.tableIdentifier),
+  ctx.describeColName.nameParts.asScala.map(_.getText),
+  ctx.FORMATTED != null)
--- End diff --

ok. I'll change this.


---



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r126275554
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/describe-table-column.sql.out ---
@@ -0,0 +1,133 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 15
+
+
+-- !query 0
+CREATE TEMPORARY VIEW desc_col_temp_table (key int COMMENT 
'column_comment') USING PARQUET
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+DESC desc_col_temp_table key
+-- !query 1 schema
+struct
+-- !query 1 output
+col_name   data_type   comment 
+keyint column_comment
+
+
+-- !query 2
+DESC EXTENDED desc_col_temp_table key
--- End diff --

Since we already treat `DESC EXTENDED` and `DESC FORMATTED` identically for 
tables, let us make DESC EXTENDED and FORMATTED the same for the column too.


---



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r126275528
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -320,10 +320,15 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder(conf) {
* Create a [[DescribeTableCommand]] logical plan.
*/
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan 
= withOrigin(ctx) {
-// Describe column are not supported yet. Return null and let the 
parser decide
-// what to do with this (create an exception or pass it on to a 
different system).
 if (ctx.describeColName != null) {
-  null
+  if (ctx.partitionSpec != null) {
+throw new ParseException("DESC TABLE COLUMN for a specific 
partition is not supported", ctx)
+  } else {
+DescribeColumnCommand(
+  visitTableIdentifier(ctx.tableIdentifier),
+  ctx.describeColName.getText,
+  ctx.FORMATTED != null)
--- End diff --

I think we are fine to keep EXTENDED and FORMATTED the same. 


---



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r126274465
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -626,6 +624,117 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+    table: TableIdentifier,
+    colNameParts: Seq[String],
+    isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+    // The displayed names are based on Hive.
+    // (Link for the corresponding Hive Jira: https://issues.apache.org/jira/browse/HIVE-7050)
+    if (isFormatted) {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("min", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "min value of the column").build())(),
+        AttributeReference("max", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "max value of the column").build())(),
+        AttributeReference("num_nulls", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "number of nulls of the column").build())(),
+        AttributeReference("distinct_count", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "distinct count of the column").build())(),
+        AttributeReference("avg_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "average length of the values of the column").build())(),
+        AttributeReference("max_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "maximum length of the values of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    } else {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    }
+  }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    val resolver = sparkSession.sessionState.conf.resolver
+    val relation = sparkSession.table(table).queryExecution.analyzed
+    val attribute = {
+      val field = relation.resolve(
--- End diff --

instead of changing code in `LogicalPlan`, can't we just do
```
val field = relation.resolve(colNameParts, resolver).getOrElse {
  throw ...
}
if (!field.isInstanceOf[Attribute]) {
  throw ...
}
```
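
Filled in, that suggestion might look like the following minimal sketch; the exception type and messages are illustrative, and `relation`, `colNameParts`, and `resolver` come from the surrounding command:
```scala
// Sketch only: messages are illustrative, not the final wording.
val field = relation.resolve(colNameParts, resolver).getOrElse {
  throw new AnalysisException(s"Column ${colNameParts.mkString(".")} does not exist")
}
if (!field.isInstanceOf[Attribute]) {
  // resolve returned a nested-field extraction rather than a top-level column
  throw new AnalysisException(
    s"DESC TABLE COLUMN is not supported for nested column: ${colNameParts.mkString(".")}")
}
```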





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r126274424
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -320,10 +320,15 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        DescribeColumnCommand(
+          visitTableIdentifier(ctx.tableIdentifier),
+          ctx.describeColName.nameParts.asScala.map(_.getText),
+          ctx.FORMATTED != null)
--- End diff --

Can we use `ctx.EXTENDED != null || ctx.FORMATTED != null`? I think this is more reasonable, although it differs from Hive.





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-06 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r125930485
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -262,7 +262,7 @@ describeFuncName
 ;
 
 describeColName
-: identifier ('.' (identifier | STRING))*
+: identifier
--- End diff --

@gatorsmile I have an idea to fix this, let me try it tomorrow.





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-05 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r125805788
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -320,10 +320,15 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        DescribeColumnCommand(
+          visitTableIdentifier(ctx.tableIdentifier),
+          ctx.describeColName.getText,
+          ctx.FORMATTED != null)
--- End diff --

I just tested in Hive 2.1: it only shows detailed column info with `formatted`. Shall we keep the same behavior as Hive?
```
hive> desc test key;
OK
key    int    from deserializer
Time taken: 0.063 seconds, Fetched: 1 row(s)
hive> desc extended test key;
OK
key    int    from deserializer
Time taken: 0.053 seconds, Fetched: 1 row(s)
hive> desc formatted test key;
OK
# col_name    data_type    min    max    num_nulls    distinct_count    avg_col_len    max_col_len    num_trues    num_falses    comment

key           int          1      2      0            2                                                                          from deserializer
Time taken: 0.067 seconds, Fetched: 3 row(s)
```





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-05 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r125801248
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -39,6 +39,40 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 
   setupTestData()
 
+  test("describe table column") {
--- End diff --

yea, that's better, thanks





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-05 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r125800773
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -262,7 +262,7 @@ describeFuncName
 ;
 
 describeColName
-: identifier ('.' (identifier | STRING))*
+: identifier
--- End diff --

That was my previous approach. As @cloud-fan mentioned in [#discussion_r122347812](https://github.com/apache/spark/pull/16422#discussion_r122347812) and [#discussion_r122348197](https://github.com/apache/spark/pull/16422#discussion_r122348197), it's hard to tell whether "col.a" is a qualified name or a nested name. Consider a more complex example:
```
`a.b`.`c.d`
```
```
`a.b.c.d`
```
It may be very tricky to deal with such cases. What's your opinion?
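
To make the ambiguity concrete, a hypothetical pair of statements:
```sql
-- Plausibly the nested field `c.d` inside a column named `a.b` ...
DESC some_table `a.b`.`c.d`;

-- ... versus a single top-level column whose name contains dots.
DESC some_table `a.b.c.d`;
```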





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r125692588
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala ---
@@ -158,6 +158,27 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
 }
   }
 
+  test("desc column with stats") {
--- End diff --

This can be moved to the new .sql too?





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r125692285
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -39,6 +39,40 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
 
   setupTestData()
 
+  test("describe table column") {
--- End diff --

How about creating `describe-table-column.sql` and moving them there? That also helps reviewers read the output.





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r125690880
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -320,10 +320,15 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        DescribeColumnCommand(
+          visitTableIdentifier(ctx.tableIdentifier),
+          ctx.describeColName.getText,
+          ctx.FORMATTED != null)
--- End diff --

How about creating a variable and then sharing it in both code paths?
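
For instance, a minimal sketch of that refactoring (the flag name is illustrative):
```scala
// Sketch: compute the flag once and share it in both branches.
val isFormatted = ctx.EXTENDED != null || ctx.FORMATTED != null
if (ctx.describeColName != null) {
  // ... build DescribeColumnCommand(..., isFormatted)
} else {
  // ... build DescribeTableCommand(..., isFormatted)
}
```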






[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r125690472
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -320,10 +320,15 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        DescribeColumnCommand(
+          visitTableIdentifier(ctx.tableIdentifier),
+          ctx.describeColName.getText,
+          ctx.FORMATTED != null)
--- End diff --

`ctx.EXTENDED != null || ctx.FORMATTED != null`





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r125689057
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -262,7 +262,7 @@ describeFuncName
 ;
 
 describeColName
-: identifier ('.' (identifier | STRING))*
+: identifier
--- End diff --

We should keep it the same, but issue an exception when users specify nested columns. Also add a test case to verify it, as sketched below.
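
Such a test might look roughly like this sketch (table name, schema, and expected message are illustrative):
```scala
// Sketch of a regression test for the nested-column case.
test("DESC TABLE COLUMN rejects nested columns") {
  withTable("desc_nested_tbl") {
    sql("CREATE TABLE desc_nested_tbl (s STRUCT<a: INT>) USING PARQUET")
    val e = intercept[AnalysisException] {
      sql("DESC desc_nested_tbl s.a")
    }
    assert(e.getMessage.contains("not supported for nested column"))
  }
}
```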





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-07-05 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r125563599
  
--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -262,7 +262,7 @@ describeFuncName
 ;
 
 describeColName
-: identifier ('.' (identifier | STRING))*
+: identifier
--- End diff --

@cloud-fan I changed the syntax here so that the command supports quoted column names but not nested columns. Could you take another look?





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-06-24 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r123872260
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -619,6 +620,104 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+    table: TableIdentifier,
+    column: String,
+    isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+    // The displayed names are based on Hive.
+    // (Link for the corresponding Hive Jira: https://issues.apache.org/jira/browse/HIVE-7050)
+    if (isFormatted) {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("min", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "min value of the column").build())(),
+        AttributeReference("max", StringType, nullable = true,
--- End diff --

@cloud-fan If we want to get the data type for the given column name, we need to get the CatalogTable like we do in `def run(sparkSession)`, but it seems we can't get that here in `output`?





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-06-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r122517212
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -619,6 +620,104 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+    table: TableIdentifier,
+    column: String,
+    isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+    // The displayed names are based on Hive.
+    // (Link for the corresponding Hive Jira: https://issues.apache.org/jira/browse/HIVE-7050)
+    if (isFormatted) {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("min", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "min value of the column").build())(),
+        AttributeReference("max", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "max value of the column").build())(),
+        AttributeReference("num_nulls", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "number of nulls of the column").build())(),
+        AttributeReference("distinct_count", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "distinct count of the column").build())(),
+        AttributeReference("avg_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "average length of the values of the column").build())(),
+        AttributeReference("max_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "maximum length of the values of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    } else {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    }
+  }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    val resolver = sparkSession.sessionState.conf.resolver
+    val catalogTable = catalog.getTempViewOrPermanentTableMetadata(table)
+    val attribute = {
+      val field = catalogTable.schema.find(f => resolver(f.name, column))
+      field.getOrElse {
+        if (column.contains(".")) {
+          throw new AnalysisException(
+            s"DESC TABLE COLUMN is not supported for nested column: $column")
+        } else {
+          throw new AnalysisException(s"Column $column does not exist.")
+        }
+      }
+    }
+
+    val colStats = catalogTable.stats.map(_.colStats).getOrElse(Map.empty)
+    val cs = colStats.get(attribute.name)
+
+    val comment = if (attribute.metadata.contains("comment")) {
+      Option(attribute.metadata.getString("comment"))
+    } else {
+      None
+    }
+
+    val result = if (isFormatted) {
+      // Show column stats only when formatted is specified.
+      Row(
+        attribute.name,
+        attribute.dataType.simpleString,
+        cs.flatMap(_.min.map(_.toString)).orNull,
+        cs.flatMap(_.max.map(_.toString)).orNull,
+        cs.map(_.nullCount.toString).orNull,
+        cs.map(_.distinctCount.toString).orNull,
+        cs.map(_.avgLen.toString).orNull,
+        cs.map(_.maxLen.toString).orNull,
+        comment.orNull)
+    } else {
+      Row(
+        attribute.name,
+        attribute.dataType.simpleString,
--- End diff --

`simpleString` -> `catalogString`. We do not 
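
For context, `catalogString` keeps the complete type name as stored in the catalog, while `simpleString` may truncate long nested types for display; a sketch of the suggested change:
```scala
// Sketch: use catalogString in both Row branches so long or deeply
// nested types are not truncated for display.
Row(
  attribute.name,
  attribute.dataType.catalogString,
  comment.orNull)
```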

[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r122348197
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -619,6 +620,104 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+    table: TableIdentifier,
+    column: String,
+    isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+    // The displayed names are based on Hive.
+    // (Link for the corresponding Hive Jira: https://issues.apache.org/jira/browse/HIVE-7050)
+    if (isFormatted) {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("min", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "min value of the column").build())(),
+        AttributeReference("max", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "max value of the column").build())(),
+        AttributeReference("num_nulls", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "number of nulls of the column").build())(),
+        AttributeReference("distinct_count", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "distinct count of the column").build())(),
+        AttributeReference("avg_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "average length of the values of the column").build())(),
+        AttributeReference("max_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "maximum length of the values of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    } else {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    }
+  }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    val resolver = sparkSession.sessionState.conf.resolver
+    val catalogTable = catalog.getTempViewOrPermanentTableMetadata(table)
+    val attribute = {
+      val field = catalogTable.schema.find(f => resolver(f.name, column))
+      field.getOrElse {
+        if (column.contains(".")) {
--- End diff --

What if users run
```
DESC t1 `col1.a`
```
They are looking for a column literally named "col1.a", not a nested column.




[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r122348016
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -619,6 +620,104 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+    table: TableIdentifier,
+    column: String,
+    isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+    // The displayed names are based on Hive.
+    // (Link for the corresponding Hive Jira: https://issues.apache.org/jira/browse/HIVE-7050)
+    if (isFormatted) {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("min", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "min value of the column").build())(),
+        AttributeReference("max", StringType, nullable = true,
--- End diff --

shall we use the actual column type?





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r122347812
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -306,10 +306,15 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        DescribeColumnCommand(
+          visitTableIdentifier(ctx.tableIdentifier),
+          ctx.describeColName.getText,
--- End diff --

the column name syntax is:
```
describeColName
: identifier ('.' (identifier | STRING))*
;
```
Since we will always have a table name before the column name, do we really need to support qualified column names here?





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-06-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r122347841
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -306,10 +306,15 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        DescribeColumnCommand(
+          visitTableIdentifier(ctx.tableIdentifier),
+          ctx.describeColName.getText,
--- End diff --

Or are we going to describe nested columns?





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-06-11 Thread wzhfy
GitHub user wzhfy reopened a pull request:

https://github.com/apache/spark/pull/16422

[SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED table column commands

## What changes were proposed in this pull request?

Support DESC (EXTENDED | FORMATTED) ? TABLE COLUMN command.
Support DESC FORMATTED TABLE COLUMN command to show column-level statistics.
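
For illustration, the intended usage is roughly as follows (table and column names are placeholders):
```sql
-- Name, data type and comment of a single column.
DESC my_table my_column;

-- With FORMATTED, column-level statistics are shown once computed.
ANALYZE TABLE my_table COMPUTE STATISTICS FOR COLUMNS my_column;
DESC FORMATTED my_table my_column;
```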

## How was this patch tested?

Add test cases

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wzhfy/spark descColumn

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16422.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16422


commit 3058ab1ab8540b596f14c0bd1fb205f23c887452
Author: Zhenhua Wang 
Date:   2016-12-28T13:47:34Z

support desc table column

commit d41a9cd5980c5c9cadefd65a09d059ff8b9266e4
Author: wangzhenhua 
Date:   2016-12-29T00:55:20Z

remove parser error interception because now we support it

commit 30cb1ae3bc7a8cb0b64fe570da9d8dadc24a3cf2
Author: Zhenhua Wang 
Date:   2016-12-29T12:03:49Z

fix comments

commit 4a68ed63bc993de438a42bcac6e47f5940b97320
Author: Zhenhua Wang 
Date:   2017-01-03T08:28:48Z

1.use getTempViewOrPermanentTableMetadata, 2.postpone nested column detection to run()







[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-05-23 Thread wzhfy
Github user wzhfy closed the pull request at:

https://github.com/apache/spark/pull/16422





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-01-02 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94361604
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        val columnName = ctx.describeColName.getText
+        if (columnName.contains(".")) {
+          throw new ParseException(
+            "DESC TABLE COLUMN for an inner column of a nested type is not supported", ctx)
--- End diff --

Sure, you can try it. 





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-01-02 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94357987
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        val columnName = ctx.describeColName.getText
+        if (columnName.contains(".")) {
+          throw new ParseException(
+            "DESC TABLE COLUMN for an inner column of a nested type is not supported", ctx)
--- End diff --

In this case, `formatted` becomes the table identifier. Should I postpone detection of nested columns to the `run()` method of `DescribeColumnCommand`? Then the existence of the table identifier will be checked first.





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-30 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94267919
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        val columnName = ctx.describeColName.getText
+        if (columnName.contains(".")) {
+          throw new ParseException(
+            "DESC TABLE COLUMN for an inner column of a nested type is not supported", ctx)
--- End diff --

This might generate a confusing error message.
```
sql("describe formatted default.tab1.s").show(false)
org.apache.spark.sql.catalyst.parser.ParseException:
DESC TABLE COLUMN for an inner column of a nested type is not supported(line 1, pos 0)
```





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94208490
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        val columnName = ctx.describeColName.getText
--- End diff --

I assume we are following Hive syntax here? What is the behavior of Hive?





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-29 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94208116
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        val columnName = ctx.describeColName.getText
--- End diff --

MySQL? Sorry, I do not follow what you asked above.





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-29 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94202748
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,100 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+    table: TableIdentifier,
+    column: String,
+    isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+    // The displayed names are based on Hive.
+    // (Link for the corresponding Hive Jira: https://issues.apache.org/jira/browse/HIVE-7050)
+    if (isFormatted) {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("min", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "min value of the column").build())(),
+        AttributeReference("max", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "max value of the column").build())(),
+        AttributeReference("num_nulls", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "number of nulls of the column").build())(),
+        AttributeReference("distinct_count", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "distinct count of the column").build())(),
+        AttributeReference("avg_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "average length of the values of the column").build())(),
+        AttributeReference("max_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "max length of the values of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    } else {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    }
+  }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    val resolver = sparkSession.sessionState.conf.resolver
+    val attribute = {
+      val field = catalog.lookupRelation(table).schema.find(f => resolver(f.name, column))
--- End diff --

ok, thanks!





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-29 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94202675
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        val columnName = ctx.describeColName.getText
--- End diff --

It seems MySQL doesn't support struct or nested types. @gatorsmile, can you give some advice on this?





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94199600
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,100 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+    table: TableIdentifier,
+    column: String,
+    isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] = {
+    // The displayed names are based on Hive.
+    // (Link for the corresponding Hive Jira: https://issues.apache.org/jira/browse/HIVE-7050)
+    if (isFormatted) {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("min", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "min value of the column").build())(),
+        AttributeReference("max", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "max value of the column").build())(),
+        AttributeReference("num_nulls", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "number of nulls of the column").build())(),
+        AttributeReference("distinct_count", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "distinct count of the column").build())(),
+        AttributeReference("avg_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "average length of the values of the column").build())(),
+        AttributeReference("max_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "max length of the values of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    } else {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    }
+  }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val catalog = sparkSession.sessionState.catalog
+    val resolver = sparkSession.sessionState.conf.resolver
+    val attribute = {
+      val field = catalog.lookupRelation(table).schema.find(f => resolver(f.name, column))
--- End diff --

shall we call `getTempViewOrPermanentTableMetadata`?





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94199552
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,21 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
    * Create a [[DescribeTableCommand]] logical plan.
    */
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-    // Describe column are not supported yet. Return null and let the parser decide
-    // what to do with this (create an exception or pass it on to a different system).
     if (ctx.describeColName != null) {
-      null
+      if (ctx.partitionSpec != null) {
+        throw new ParseException("DESC TABLE COLUMN for a specific partition is not supported", ctx)
+      } else {
+        val columnName = ctx.describeColName.getText
--- End diff --

the parser rule for the column name here:
```
describeColName
: identifier ('.' (identifier | STRING))*
;
```
Can we just make it `identifier`? Should "a.b" refer to a column named "a.b", or to the inner field "b" of column "a"? Let's check with other databases.





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94107668
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+    table: TableIdentifier,
+    column: String,
+    isExtended: Boolean,
+    isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] =
+    // Column names are based on Hive.
+    if (isFormatted) {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("min", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "min value of the column").build())(),
+        AttributeReference("max", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "max value of the column").build())(),
+        AttributeReference("num_nulls", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "number of nulls of the column").build())(),
+        AttributeReference("distinct_count", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "distinct count of the column").build())(),
+        AttributeReference("avg_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "average length of the values of the column").build())(),
+        AttributeReference("max_col_len", StringType, nullable = true,
+          new MetadataBuilder().putString("comment",
+            "max length of the values of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    } else {
+      Seq(
+        AttributeReference("col_name", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "name of the column").build())(),
+        AttributeReference("data_type", StringType, nullable = false,
+          new MetadataBuilder().putString("comment", "data type of the column").build())(),
+        AttributeReference("comment", StringType, nullable = true,
+          new MetadataBuilder().putString("comment", "comment of the column").build())())
+    }
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val result = new ArrayBuffer[Row]
--- End diff --

yea I'll delete it.





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94107442
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+column: String,
+isExtended: Boolean,
--- End diff --

I will remove it, since the result is the same with or without `isExtended`.
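
A hedged sketch of the trimmed-down signature (illustrative only; the merged
code may differ, and the placeholder run() body is not from this PR):

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.catalyst.TableIdentifier
    import org.apache.spark.sql.execution.command.RunnableCommand

    case class DescribeColumnCommand(
        table: TableIdentifier,
        column: String,
        isFormatted: Boolean)  // isExtended dropped: the output is identical either way
      extends RunnableCommand {
      override def run(sparkSession: SparkSession): Seq[Row] = Seq.empty  // placeholder body
    }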





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94106620
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,12 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
* Create a [[DescribeTableCommand]] logical plan.
*/
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-// Describe column are not supported yet. Return null and let the parser decide
-// what to do with this (create an exception or pass it on to a different system).
 if (ctx.describeColName != null) {
-  null
+  DescribeColumnCommand(
--- End diff --

yes we should
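
A sketch of how the guard could look inside visitDescribeTable (hedged: the
names follow the surrounding parser code, and `operationNotAllowed` is the
helper the parser already uses for unsupported syntax, but this is
illustrative rather than the PR's final diff):

    if (ctx.describeColName != null) {
      if (ctx.partitionSpec != null) {
        // DESC <table> <column> combined with a PARTITION clause is ambiguous; reject it early.
        operationNotAllowed("DESC TABLE COLUMN for a specific partition", ctx)
      }
      DescribeColumnCommand(
        visitTableIdentifier(ctx.tableIdentifier),
        ctx.describeColName.getText,
        isFormatted = ctx.FORMATTED != null)
    }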





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94106492
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+column: String,
+isExtended: Boolean,
+isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] =
+// Column names are based on Hive.
--- End diff --

I got these names by running Hive. I can't find any documentation about them, but I'll add a link to the corresponding Hive JIRA.





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94101640
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+column: String,
+isExtended: Boolean,
+isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] =
+// Column names are based on Hive.
+if (isFormatted) {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("min", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "min value of the 
column").build())(),
+AttributeReference("max", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "max value of the 
column").build())(),
+AttributeReference("num_nulls", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "number of nulls of 
the column").build())(),
+AttributeReference("distinct_count", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "distinct count of 
the column").build())(),
+AttributeReference("avg_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"average length of the values of the column").build())(),
+AttributeReference("max_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"max length of the values of the column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+} else {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+}
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val result = new ArrayBuffer[Row]
+val catalog = sparkSession.sessionState.catalog
+// Get the attribute referring to the given column
+val attribute = sparkSession.sessionState.executePlan(
--- End diff --

I don't get it. You resolve the attribute just for the column's comment, name, and data type? I think `CatalogTable.schema` already has this information.
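
A minimal sketch of the alternative being suggested (hedged: the method names
are from the SessionCatalog/StructField APIs as I understand them, and the
snippet assumes it runs inside DescribeColumnCommand with `table` and `column`
in scope):

    import org.apache.spark.sql.AnalysisException

    val field = sparkSession.sessionState.catalog
      .getTempViewOrPermanentTableMetadata(table)  // CatalogTable, temp views included
      .schema                                      // StructType: name, type, comment per field
      .find(_.name == column)
      .getOrElse(throw new AnalysisException(s"Column $column does not exist"))

    // field.dataType.catalogString and field.getComment() then give the type and
    // comment without analyzing a full logical plan.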





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94101475
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+column: String,
+isExtended: Boolean,
+isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] =
+// Column names are based on Hive.
+if (isFormatted) {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("min", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "min value of the 
column").build())(),
+AttributeReference("max", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "max value of the 
column").build())(),
+AttributeReference("num_nulls", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "number of nulls of 
the column").build())(),
+AttributeReference("distinct_count", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "distinct count of 
the column").build())(),
+AttributeReference("avg_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"average length of the values of the column").build())(),
+AttributeReference("max_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"max length of the values of the column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+} else {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+}
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val result = new ArrayBuffer[Row]
--- End diff --

why do we create an `ArrayBuffer`? Doesn't it always return a single row?





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94101445
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+column: String,
+isExtended: Boolean,
--- End diff --

where do we use `isExtended`?





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94101427
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+column: String,
+isExtended: Boolean,
+isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] =
+// Column names are based on Hive.
--- End diff --

can you add a link to the Hive spec about this?





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94101415
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * Command that looks like
--- End diff --

please follow other commands and add more description.
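
For reference, one possible expansion in the style other command classes use
(the wording here is illustrative, not the PR's):

    /**
     * A command to describe a single column of a table or a view, returning the
     * column's name, data type and comment. When FORMATTED is specified and
     * column statistics have been computed by ANALYZE TABLE, the statistics
     * (min, max, num_nulls, distinct_count, avg_col_len, max_col_len) are
     * returned as well. The syntax of using this command in SQL is:
     * {{{
     *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
     * }}}
     */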





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94101398
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,12 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
* Create a [[DescribeTableCommand]] logical plan.
*/
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan = withOrigin(ctx) {
-// Describe column are not supported yet. Return null and let the parser decide
-// what to do with this (create an exception or pass it on to a different system).
 if (ctx.describeColName != null) {
-  null
+  DescribeColumnCommand(
--- End diff --

shall we throw exception here if partition spec is given?





[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread wzhfy
GitHub user wzhfy opened a pull request:

https://github.com/apache/spark/pull/16422

[SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED table column commands

## What changes were proposed in this pull request?

Support the `DESC (EXTENDED | FORMATTED)? TABLE COLUMN` command.
`DESC FORMATTED TABLE COLUMN` additionally shows column-level statistics.
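
End-to-end example (illustrative table and column names, assuming a
SparkSession `spark`):

    spark.sql("CREATE TABLE t (key INT COMMENT 'a key') USING PARQUET")
    spark.sql("ANALYZE TABLE t COMPUTE STATISTICS FOR COLUMNS key")
    spark.sql("DESC FORMATTED t key").show(truncate = false)
    // FORMATTED output adds min/max/num_nulls/distinct_count/avg_col_len/max_col_len
    // to the plain col_name/data_type/comment columns.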

## How was this patch tested?

Add test cases

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wzhfy/spark descColumn

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16422.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16422


commit 3058ab1ab8540b596f14c0bd1fb205f23c887452
Author: Zhenhua Wang 
Date:   2016-12-28T13:47:34Z

support desc table column



