[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-24 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16594


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-23 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102887155
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -794,6 +795,7 @@ EXPLAIN: 'EXPLAIN';
 FORMAT: 'FORMAT';
 LOGICAL: 'LOGICAL';
 CODEGEN: 'CODEGEN';
+COST: 'COST';
--- End diff --

Thanks! Updated.


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102775882
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -794,6 +795,7 @@ EXPLAIN: 'EXPLAIN';
 FORMAT: 'FORMAT';
 LOGICAL: 'LOGICAL';
 CODEGEN: 'CODEGEN';
+COST: 'COST';
--- End diff --

Yes. Also please update the `hiveNonReservedKeyword` in `TableIdentifierParserSuite`.


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102647596
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -794,6 +795,7 @@ EXPLAIN: 'EXPLAIN';
 FORMAT: 'FORMAT';
 LOGICAL: 'LOGICAL';
 CODEGEN: 'CODEGEN';
+COST: 'COST';
--- End diff --

also put it in `nonReserved`


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-22 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102560014
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala
 ---
@@ -54,11 +57,29 @@ case class Statistics(
 
   /** Readable string representation for the Statistics. */
   def simpleString: String = {
-Seq(s"sizeInBytes=$sizeInBytes",
-  if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
+Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
+  if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
   s"isBroadcastable=$isBroadcastable"
 ).filter(_.nonEmpty).mkString(", ")
   }
+
+  /** Show the given number in a readable format. */
+  def format(number: BigInt, isSize: Boolean): String = {
+val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
+if (isSize) {
+  // The largest unit in Utils.bytesToString is TB
+  val PB = 1L << 50
+  if (number < 2 * PB) {
+// The number is not very large, so we can use Utils.bytesToString to show it.
+Utils.bytesToString(number.toLong)
+  } else {
+// The number is too large, show it in scientific notation.
+decimalValue.toString() + " B"
+  }
+} else {
+  decimalValue.toString()
--- End diff --

We can't make them consistent here, because the unit string is added inside 
`Utils.bytesToString`.
How about moving the size logic into `Utils.bytesToString` and making it 
support BigInt?
Then we can remove `def format`:
```
  def simpleString: String = {
    Seq(s"sizeInBytes=${Utils.bytesToString(sizeInBytes)}",
      if (rowCount.isDefined) {
        // Show row count in scientific notation.
        s"rowCount=${BigDecimal(rowCount.get, new MathContext(3, RoundingMode.HALF_UP)).toString()}"
      } else {
        ""
      },
      s"isBroadcastable=$isBroadcastable"
    ).filter(_.nonEmpty).mkString(", ")
  }
```
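The proposal above can be sketched as a `BigInt`-accepting size formatter. This is a minimal, hypothetical sketch: the object name, the "value of at least 2" unit rule and the 2048 EB cut-off are assumptions modelled on the quoted `format` diff, not Spark's actual `Utils.bytesToString`. It picks the largest binary unit that leaves a value of at least 2 and falls back to the diff's 3-significant-digit scientific notation once even EB is too small.

```scala
import java.math.{MathContext, RoundingMode}
import java.util.Locale

// Hypothetical sketch of a BigInt-aware bytesToString; not Spark's real Utils code.
object SizeFormatSketch {
  // Binary units (powers of 1024), labelled the way this thread does (KB, MB, ...).
  private val units: Seq[(String, BigInt)] = Seq(
    "EB" -> (BigInt(1) << 60),
    "PB" -> (BigInt(1) << 50),
    "TB" -> (BigInt(1) << 40),
    "GB" -> (BigInt(1) << 30),
    "MB" -> (BigInt(1) << 20),
    "KB" -> (BigInt(1) << 10))

  def bytesToString(size: BigInt): String = {
    val eb = units.head._2
    if (size >= eb * 2048) {
      // Too large even for EB: 3 significant digits, e.g. "1.00E+30 B".
      BigDecimal(size, new MathContext(3, RoundingMode.HALF_UP)).toString + " B"
    } else {
      // Use the largest unit whose value is still at least 2.
      units.find { case (_, unit) => size >= unit * 2 } match {
        case Some((label, unit)) =>
          val value = (BigDecimal(size) / BigDecimal(unit)).toDouble
          "%.1f %s".formatLocal(Locale.US, value, label)
        case None =>
          s"$size B"
      }
    }
  }
}
```

Because the input stays a `BigInt`, the `number.toLong` conversion from the original `format` is no longer needed, so estimates above `Long.MaxValue` cannot overflow.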


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102410793
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala
 ---
@@ -54,11 +57,29 @@ case class Statistics(
 
   /** Readable string representation for the Statistics. */
   def simpleString: String = {
-Seq(s"sizeInBytes=$sizeInBytes",
-  if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
+Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
+  if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
   s"isBroadcastable=$isBroadcastable"
 ).filter(_.nonEmpty).mkString(", ")
   }
+
+  /** Show the given number in a readable format. */
+  def format(number: BigInt, isSize: Boolean): String = {
+val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
+if (isSize) {
+  // The largest unit in Utils.bytesToString is TB
+  val PB = 1L << 50
+  if (number < 2 * PB) {
+// The number is not very large, so we can use Utils.bytesToString to show it.
+Utils.bytesToString(number.toLong)
+  } else {
+// The number is too large, show it in scientific notation.
+decimalValue.toString() + " B"
+  }
+} else {
+  decimalValue.toString()
--- End diff --

With or without units, the readability is the same, right? If we make them 
consistent, the impl of `def format(number: BigInt)` will look much cleaner. 


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102409892
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -282,7 +282,8 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
 if (statement == null) {
   null  // This is enough since ParseException will raise later.
 } else if (isExplainableStatement(statement)) {
-  ExplainCommand(statement, extended = ctx.EXTENDED != null, codegen = ctx.CODEGEN != null)
+  ExplainCommand(statement, extended = ctx.EXTENDED != null, codegen = ctx.CODEGEN != null,
+cost = ctx.COST != null)
--- End diff --

```
  ExplainCommand(
    statement,
    extended = ctx.EXTENDED != null,
    codegen = ctx.CODEGEN != null,
    cost = ctx.COST != null)
```


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102399167
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala
 ---
@@ -54,11 +57,29 @@ case class Statistics(
 
   /** Readable string representation for the Statistics. */
   def simpleString: String = {
-Seq(s"sizeInBytes=$sizeInBytes",
-  if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
+Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
+  if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
   s"isBroadcastable=$isBroadcastable"
 ).filter(_.nonEmpty).mkString(", ")
   }
+
+  /** Show the given number in a readable format. */
+  def format(number: BigInt, isSize: Boolean): String = {
+val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
+if (isSize) {
+  // The largest unit in Utils.bytesToString is TB
--- End diff --

Yeah, I also think TB is a little small as the largest unit.


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102398658
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -282,7 +282,8 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
 if (statement == null) {
   null  // This is enough since ParseException will raise later.
 } else if (isExplainableStatement(statement)) {
-  ExplainCommand(statement, extended = ctx.EXTENDED != null, codegen = ctx.CODEGEN != null)
+  ExplainCommand(statement, extended = ctx.EXTENDED != null, codegen = ctx.CODEGEN != null,
+cost = ctx.COST != null)
--- End diff --

Could you give me a hint about the expected style?


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-21 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102398371
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala
 ---
@@ -54,11 +57,29 @@ case class Statistics(
 
   /** Readable string representation for the Statistics. */
   def simpleString: String = {
-Seq(s"sizeInBytes=$sizeInBytes",
-  if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
+Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
+  if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
   s"isBroadcastable=$isBroadcastable"
 ).filter(_.nonEmpty).mkString(", ")
   }
+
+  /** Show the given number in a readable format. */
+  def format(number: BigInt, isSize: Boolean): String = {
+val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
+if (isSize) {
+  // The largest unit in Utils.bytesToString is TB
+  val PB = 1L << 50
+  if (number < 2 * PB) {
+// The number is not very large, so we can use Utils.bytesToString to show it.
+Utils.bytesToString(number.toLong)
+  } else {
+// The number is too large, show it in scientific notation.
+decimalValue.toString() + " B"
+  }
+} else {
+  decimalValue.toString()
--- End diff --

I'm not sure. Without a unit, would that be more readable than scientific 
notation?


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102138925
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala
 ---
@@ -54,11 +57,29 @@ case class Statistics(
 
   /** Readable string representation for the Statistics. */
   def simpleString: String = {
-Seq(s"sizeInBytes=$sizeInBytes",
-  if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
+Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
+  if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
   s"isBroadcastable=$isBroadcastable"
 ).filter(_.nonEmpty).mkString(", ")
   }
+
+  /** Show the given number in a readable format. */
+  def format(number: BigInt, isSize: Boolean): String = {
+val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
+if (isSize) {
+  // The largest unit in Utils.bytesToString is TB
+  val PB = 1L << 50
+  if (number < 2 * PB) {
+// The number is not very large, so we can use Utils.bytesToString to show it.
+Utils.bytesToString(number.toLong)
+  } else {
+// The number is too large, show it in scientific notation.
+decimalValue.toString() + " B"
+  }
+} else {
+  decimalValue.toString()
--- End diff --

https://en.wikipedia.org/wiki/Metric_prefix

Even without a unit, we can still use K, M, G, T, P, E, right?
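A hypothetical sketch of that suggestion for unitless row counts, using decimal metric prefixes (K = 10^3 up to E = 10^18). The helper name and the one-decimal output format are assumptions for illustration; this is not how Spark necessarily formats counts.

```scala
import java.util.Locale

// Hypothetical helper: formats a row count with decimal metric prefixes.
object CountFormatSketch {
  private val prefixes: Seq[(String, BigInt)] = Seq(
    "E" -> BigInt(10).pow(18),
    "P" -> BigInt(10).pow(15),
    "T" -> BigInt(10).pow(12),
    "G" -> BigInt(10).pow(9),
    "M" -> BigInt(10).pow(6),
    "K" -> BigInt(10).pow(3))

  def countToString(count: BigInt): String =
    // Use the largest prefix that does not exceed the count; small counts print as-is.
    prefixes.find { case (_, factor) => count >= factor } match {
      case Some((prefix, factor)) =>
        val value = (BigDecimal(count) / BigDecimal(factor)).toDouble
        "%.1f%s".formatLocal(Locale.US, value, prefix)
      case None =>
        count.toString
    }
}
```

Counts of 1000 E or more would still grow into ever-longer E values, so a scientific-notation fallback like the one in the `format` diff may still be needed at the top end.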


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102138379
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala
 ---
@@ -54,11 +57,29 @@ case class Statistics(
 
   /** Readable string representation for the Statistics. */
   def simpleString: String = {
-Seq(s"sizeInBytes=$sizeInBytes",
-  if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
+Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
+  if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
   s"isBroadcastable=$isBroadcastable"
 ).filter(_.nonEmpty).mkString(", ")
   }
+
+  /** Show the given number in a readable format. */
+  def format(number: BigInt, isSize: Boolean): String = {
+val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
+if (isSize) {
+  // The largest unit in Utils.bytesToString is TB
--- End diff --

How about improving `bytesToString` and making it support PB or higher? 


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102137730
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala 
---
@@ -92,7 +92,8 @@ case class ExecutedCommandExec(cmd: RunnableCommand) extends SparkPlan {
 case class ExplainCommand(
 logicalPlan: LogicalPlan,
 extended: Boolean = false,
-codegen: Boolean = false)
+codegen: Boolean = false,
+cost: Boolean = false)
--- End diff --

Please add `@param` like the other parameters.


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102137661
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -282,7 +282,8 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
 if (statement == null) {
   null  // This is enough since ParseException will raise later.
 } else if (isExplainableStatement(statement)) {
-  ExplainCommand(statement, extended = ctx.EXTENDED != null, codegen = ctx.CODEGEN != null)
+  ExplainCommand(statement, extended = ctx.EXTENDED != null, codegen = ctx.CODEGEN != null,
+cost = ctx.COST != null)
--- End diff --

Need to fix the style.


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102137390
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
@@ -197,20 +197,32 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
   """.stripMargin.trim
   }
 
-  override def toString: String = {
+  override def toString: String = completeString(appendStats = false)
+
+  def toStringWithStats: String = completeString(appendStats = true)
+
+  def completeString(appendStats: Boolean): String = {
--- End diff --

private?
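The suggestion is to keep the flag-taking helper private and expose only the two named methods. A minimal sketch of that shape, using a hypothetical stand-in class rather than Spark's `QueryExecution` (the plan lines and stats strings are made-up inputs):

```scala
// Hypothetical stand-in for the pattern under review; not Spark's QueryExecution.
class ExplainStringSketch(planLines: Seq[(String, String)]) {
  override def toString: String = completeString(appendStats = false)

  def toStringWithStats: String = completeString(appendStats = true)

  // Private: callers pick toString or toStringWithStats instead of passing a Boolean.
  private def completeString(appendStats: Boolean): String =
    planLines.map { case (node, stats) =>
      if (appendStats) s"$node, Statistics($stats)" else node
    }.mkString("\n")
}
```

Keeping the Boolean parameter private means the public API reads as two self-describing calls, and the helper's signature can change later without affecting callers.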


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-02-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r102137142
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala
 ---
@@ -54,11 +57,29 @@ case class Statistics(
 
   /** Readable string representation for the Statistics. */
   def simpleString: String = {
-Seq(s"sizeInBytes=$sizeInBytes",
-  if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
+Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
+  if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
   s"isBroadcastable=$isBroadcastable"
 ).filter(_.nonEmpty).mkString(", ")
   }
+
+  /** Show the given number in a readable format. */
+  def format(number: BigInt, isSize: Boolean): String = {
+val decimalValue = BigDecimal(number, new MathContext(3, RoundingMode.HALF_UP))
+if (isSize) {
+  // The largest unit in Utils.bytesToString is TB
+  val PB = 1L << 50
+  if (number < 2 * PB) {
+// The number is not very large, so we can use Utils.bytesToString to show it.
+Utils.bytesToString(number.toLong)
+  } else {
+// The number is too large, show it in scientific notation.
+decimalValue.toString() + " B"
+  }
+} else {
+  decimalValue.toString()
--- End diff --

Always represent it using scientific notation? Or only do it when the 
number is too large?
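The two options can be compared with a small hypothetical sketch that reuses the `MathContext(3, RoundingMode.HALF_UP)` rounding from the diff; the `10^6` threshold in the second option is an arbitrary assumption, not something the PR specifies.

```scala
import java.math.{MathContext, RoundingMode}

// Hypothetical comparison of the two row-count formatting options.
object RowCountFormatSketch {
  private val mc = new MathContext(3, RoundingMode.HALF_UP)
  private val threshold = BigInt(10).pow(6) // assumed cut-off for option 2

  /** Option 1: always round to 3 significant digits. */
  def always(count: BigInt): String = BigDecimal(count, mc).toString

  /** Option 2: print small counts exactly and round only large ones. */
  def onlyWhenLarge(count: BigInt): String =
    if (count < threshold) count.toString else BigDecimal(count, mc).toString
}
```

`java.math.BigDecimal.toString` only switches to E notation when rounding drops digits, so with this `MathContext` counts below 1000 print unchanged under either option, while option 1 turns 12345 into `1.23E+4`.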


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-23 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r97482084
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala
 ---
@@ -54,11 +56,32 @@ case class Statistics(
 
   /** Readable string representation for the Statistics. */
   def simpleString: String = {
-Seq(s"sizeInBytes=$sizeInBytes",
-  if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
+Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
+  if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
   s"isBroadcastable=$isBroadcastable"
 ).filter(_.nonEmpty).mkString(", ")
   }
+
+  /** Print the given number in a readable format. */
+  def format(number: BigInt, isSize: Boolean): String = {
--- End diff --

I'll try to use that method in combination with the current logic. Thanks for 
the reminder.


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-23 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r97481455
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala
 ---
@@ -54,11 +56,32 @@ case class Statistics(
 
   /** Readable string representation for the Statistics. */
   def simpleString: String = {
-Seq(s"sizeInBytes=$sizeInBytes",
-  if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
+Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
+  if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
   s"isBroadcastable=$isBroadcastable"
 ).filter(_.nonEmpty).mkString(", ")
   }
+
+  /** Print the given number in a readable format. */
+  def format(number: BigInt, isSize: Boolean): String = {
--- End diff --

That method only accepts a Long parameter, and estimated stats can still 
be unreadable even when using TB as the unit.


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r97478978
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala
 ---
@@ -54,11 +56,32 @@ case class Statistics(
 
   /** Readable string representation for the Statistics. */
   def simpleString: String = {
-Seq(s"sizeInBytes=$sizeInBytes",
-  if (rowCount.isDefined) s"rowCount=${rowCount.get}" else "",
+Seq(s"sizeInBytes=${format(sizeInBytes, isSize = true)}",
+  if (rowCount.isDefined) s"rowCount=${format(rowCount.get, isSize = false)}" else "",
   s"isBroadcastable=$isBroadcastable"
 ).filter(_.nonEmpty).mkString(", ")
   }
+
+  /** Print the given number in a readable format. */
+  def format(number: BigInt, isSize: Boolean): String = {
--- End diff --

We already have [`bytesToString`](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1109-L1132) in Utils.scala.





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-22 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r97217477
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -649,6 +649,14 @@ object SQLConf {
   .doubleConf
   .createWithDefault(0.05)
 
+  val SHOW_STATS_IN_EXPLAIN =
--- End diff --

OK. But since it affects the user interface, let's double-check with others. 
@rxin @hvanhovell @cloud-fan Shall we show the stats of LogicalPlan directly in 
the explain command?


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r97216973
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -649,6 +649,14 @@ object SQLConf {
   .doubleConf
   .createWithDefault(0.05)
 
+  val SHOW_STATS_IN_EXPLAIN =
--- End diff --

If `sizeInBytes` affects the planning decisions, I think it makes sense to 
let users see it.

When the plan is unexpected and the number is very large, they might 
turn on CBO or run the command to re-analyze the tables. Hiding it does not 
look right to me, even if the number is ugly. :)



[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-22 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r97216860
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -649,6 +649,14 @@ object SQLConf {
   .doubleConf
   .createWithDefault(0.05)
 
+  val SHOW_STATS_IN_EXPLAIN =
--- End diff --

I'm not sure. For example, after joining many tables, if `sizeInBytes` is 
computed the simple (non-CBO) way, we just multiply the sizes of all 
these tables, so `sizeInBytes` becomes a ridiculously large value. I think 
this will harm the user experience.
I agree that removing the flag can simplify the code a lot, but I'm hesitant to 
expose such information to all users.
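To make the concern concrete: if the naive estimate multiplies the children's sizes, as described above, three 10 GB (10 × 2^30 bytes) inputs already exceed `Long.MaxValue` bytes. A hypothetical sketch; the helper and the input sizes are illustrative assumptions, not Spark code:

```scala
// Hypothetical illustration of a naive join-size estimate that multiplies input sizes.
object NaiveJoinSizeSketch {
  /** Product of the children's sizes, as in the non-CBO estimate described above. */
  def naiveJoinSize(childSizes: Seq[BigInt]): BigInt = childSizes.product

  val tenGB: BigInt = BigInt(10) << 30 // 10 * 2^30 bytes
  val threeWayJoin: BigInt = naiveJoinSize(Seq(tenGB, tenGB, tenGB))
}
```

`threeWayJoin` is 1000 × 2^90 bytes (about 1.24E+30), so it cannot be passed to a `Long`-only formatter, which is why the `format` helper in this PR takes a `BigInt` and only calls `Utils.bytesToString` below `2 * PB`.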


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-21 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r97212822
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -649,6 +649,14 @@ object SQLConf {
   .doubleConf
   .createWithDefault(0.05)
 
+  val SHOW_STATS_IN_EXPLAIN =
--- End diff --

Then, when the stats are inaccurate, could that cause an inefficient plan? 
If so, why not show users the number?


[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-18 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r96589910
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -649,6 +649,14 @@ object SQLConf {
   .doubleConf
   .createWithDefault(0.05)
 
+  val SHOW_STATS_IN_EXPLAIN =
--- End diff --

It's disabled by default because the stats can be inaccurate (in some cases 
very inaccurate) and can confuse regular users. At the current stage, it's 
better as a feature for administrators and developers to see how CBO behaves 
in estimation, so I made the flag "internal".





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-18 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r96588756
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -649,6 +649,14 @@ object SQLConf {
   .doubleConf
   .createWithDefault(0.05)
 
+  val SHOW_STATS_IN_EXPLAIN =
--- End diff --

`SHOW_TABLE_STATS_IN_EXPLAIN` could be misleading, because we are showing 
stats not only for tables but for all logical plans.
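
A minimal Scala sketch of what "stats for all logical plans" means: every node 
in the tree, not just the table scans, gets its estimate appended when the flag 
is on. The plan classes, `render`, and the output format are hypothetical and 
illustrative only; the `showStats` parameter stands in for the flag that this 
PR's test sets via `spark.sql.statistics.showInExplain`.

```scala
// Hypothetical, simplified plan model; these are NOT Spark's LogicalPlan classes.
final case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

sealed trait Plan {
  def nodeName: String
  def stats: Stats
  def children: Seq[Plan]
}

final case class Scan(table: String, stats: Stats) extends Plan {
  def nodeName: String = s"Relation[$table]"
  def children: Seq[Plan] = Nil
}

final case class Filter(child: Plan, stats: Stats) extends Plan {
  def nodeName: String = "Filter"
  def children: Seq[Plan] = Seq(child)
}

final case class Join(left: Plan, right: Plan, stats: Stats) extends Plan {
  def nodeName: String = "Join"
  def children: Seq[Plan] = Seq(left, right)
}

object ExplainWithStats {
  // Renders one line per node, indented by depth. When showStats is true,
  // every node (scans, filters, joins) is annotated, not only table scans.
  def render(plan: Plan, showStats: Boolean, depth: Int = 0): Seq[String] = {
    val statsSuffix =
      if (showStats) {
        val rows = plan.stats.rowCount.map(r => s", rowCount=$r").getOrElse("")
        s" Statistics(sizeInBytes=${plan.stats.sizeInBytes}$rows)"
      } else ""
    val line = ("  " * depth) + plan.nodeName + statsSuffix
    line +: plan.children.flatMap(child => render(child, showStats, depth + 1))
  }

  def main(args: Array[String]): Unit = {
    val plan = Join(
      Scan("a", Stats(BigInt(100), Some(BigInt(10)))),
      Filter(Scan("b", Stats(BigInt(50))), Stats(BigInt(25))),
      Stats(BigInt(2500)))
    // Flag off (the default): plain tree, no stats.
    render(plan, showStats = false).foreach(println)
    // Flag on: each node carries its estimated stats.
    render(plan, showStats = true).foreach(println)
  }
}
```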





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-18 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r96585594
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala ---
@@ -27,6 +27,21 @@ import org.apache.spark.sql.test.SQLTestUtils
  */
 class HiveExplainSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
 
+  test("show stats in explain command") {
+    withSQLConf("spark.sql.statistics.showInExplain" -> "false") {
+      checkKeywordsNotExist(sql(" explain  select * from src "), "sizeInBytes", "rowCount")
--- End diff --

Thanks, fixed.





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r96473964
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -649,6 +649,14 @@ object SQLConf {
   .doubleConf
   .createWithDefault(0.05)
 
+  val SHOW_STATS_IN_EXPLAIN =
--- End diff --

If we do it by default, it can simplify this PR a lot.





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r96473423
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -649,6 +649,14 @@ object SQLConf {
   .doubleConf
   .createWithDefault(0.05)
 
+  val SHOW_STATS_IN_EXPLAIN =
--- End diff --

Why not do this by default? Do we need an extra flag?

If needed, the name should be `SHOW_TABLE_STATS_IN_EXPLAIN`





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16594#discussion_r96473148
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala ---
@@ -27,6 +27,21 @@ import org.apache.spark.sql.test.SQLTestUtils
  */
 class HiveExplainSuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
 
+  test("show stats in explain command") {
+    withSQLConf("spark.sql.statistics.showInExplain" -> "false") {
+      checkKeywordsNotExist(sql(" explain  select * from src "), "sizeInBytes", "rowCount")
--- End diff --

A general style suggestion: SQL keywords are normally written in upper case 
in the test cases.

`explain  select * from src` -> `EXPLAIN SELECT * FROM src`





[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain

2017-01-15 Thread wzhfy
GitHub user wzhfy opened a pull request:

https://github.com/apache/spark/pull/16594

[SPARK-17078] [SQL] Show stats when explain

## What changes were proposed in this pull request?

Currently, the estimated stats in logical plans can only be checked by 
debugging. We need to provide an easier and more efficient way for 
developers and users.
In this PR, we add an internal conf; when it is set to true, the stats can be 
checked with the `EXPLAIN EXTENDED` command.

## How was this patch tested?

Added a test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wzhfy/spark showStats

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16594.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16594


commit c3489fcad32caa1d6a9b7182e387a46aae5710fa
Author: wangzhenhua 
Date:   2017-01-16T07:24:23Z

show stats in explain command



