[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r216119774

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,20 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }

    +  test("SPARK-24442 Show should follow spark.show.default.number.of.rows") {
    +    withSQLConf("spark.sql.show.defaultNumRows" -> "100") {
    +      (1 to 1000).toDS().as[Int].show
    +    }
    +  }
    +
    +  test("SPARK-24442 Show should follow spark.show.default.truncate.characters.per.column") {
    +    withSQLConf("spark.sql.show.truncateMaxCharsPerColumn" -> "30") {
    +      (1 to 1000).map(x => "123456789_123456789_123456789_")
    +        .toDS().as[String]
    +        .show
    --- End diff --

    Plz compare the show result?
    https://github.com/apache/spark/blob/9241e1e7e66574cfafa68791771959dfc39c9684/sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala#L326

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r216119706

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,20 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }

    +  test("SPARK-24442 Show should follow spark.show.default.number.of.rows") {
    +    withSQLConf("spark.sql.show.defaultNumRows" -> "100") {
    +      (1 to 1000).toDS().as[Int].show
    +    }
    +  }
    +
    +  test("SPARK-24442 Show should follow spark.show.default.truncate.characters.per.column") {
    +    withSQLConf("spark.sql.show.truncateMaxCharsPerColumn" -> "30") {
    --- End diff --

    ditto
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r216119703

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,20 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
      }

    +  test("SPARK-24442 Show should follow spark.show.default.number.of.rows") {
    +    withSQLConf("spark.sql.show.defaultNumRows" -> "100") {
    --- End diff --

    `SQLConf.SQL_SHOW_DEFAULT_MAX_ROWS.key -> "100"`
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r216119684

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1950,6 +1962,11 @@ class SQLConf extends Serializable with Logging {
       def parallelFileListingInStatsComputation: Boolean =
         getConf(SQLConf.PARALLEL_FILE_LISTING_IN_STATS_COMPUTATION)

    +  def sqlShowDefaultMaxRecordsToShow: Int = getConf(SQLConf.SQL_SHOW_DEFAULT_MAX_ROWS)
    +
    +  def sqlShowDefaultMaxCharsPerTruncatedRow: Int =
    --- End diff --

    `sqlShowTruncateMaxCharsPerColumn`?
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r216119676

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1950,6 +1962,11 @@ class SQLConf extends Serializable with Logging {
       def parallelFileListingInStatsComputation: Boolean =
         getConf(SQLConf.PARALLEL_FILE_LISTING_IN_STATS_COMPUTATION)

    +  def sqlShowDefaultMaxRecordsToShow: Int = getConf(SQLConf.SQL_SHOW_DEFAULT_MAX_ROWS)
    --- End diff --

    `sqlShowDefaultMaxRows`?
Github user AndrewKL commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r216045605

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,22 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }

    +
    +  test("SPARK-2444git stat2 Show should follow spark.show.default.number.of.rows") {
    +    withSQLConf("spark.sql.show.defaultNumRows" -> "100") {
    +      val ds = (1 to 1000).toDS().as[Int].show
    --- End diff --

    The way show is currently implemented makes it difficult to fully test this. The truncate and max-length parameters are passed into the showString function that would normally be used to test the output. Unfortunately, the parameters and final string are then passed into the println function without an easy way to capture the results. This could be tested with a spying library, but there isn't one in Spark (yet).
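One workaround that avoids a spying library is to redirect stdout for the duration of the call. The sketch below is plain Scala with no Spark dependency; `show` here is a hypothetical stand-in for `Dataset.show` that just prints rows, but the capture pattern with `Console.withOut` is what a real test could apply around the actual call:

```scala
import java.io.ByteArrayOutputStream

object CaptureDemo {
  // Hypothetical stand-in for Dataset.show: prints rows instead of returning a string.
  def show(rows: Seq[Int], numRows: Int): Unit =
    rows.take(numRows).foreach(println)

  def main(args: Array[String]): Unit = {
    val out = new ByteArrayOutputStream()
    // Console.withOut redirects println output for the duration of the block,
    // letting a test assert on what show printed.
    Console.withOut(out) {
      show(1 to 1000, numRows = 100)
    }
    val printed = out.toString.trim.split("\\R")
    assert(printed.length == 100, s"expected 100 rows, got ${printed.length}")
  }
}
```

This sidesteps the println problem without changing `show` itself, at the cost of the test depending on console redirection rather than on the returned `showString`.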
Github user AndrewKL commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r215618109

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,22 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }

    +
    +  test("SPARK-2444git stat2 Show should follow spark.show.default.number.of.rows") {
    --- End diff --

    removed
Github user AndrewKL commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r215617996

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
         // scalastyle:on println

    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    --- End diff --

    will do
Github user AndrewKL commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r215617577

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
         // scalastyle:on println

    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    +  }
    +
    +  /**
    +   * Returns the default max characters per column to show before truncation when
    +   * the show function is called with truncate.
    +   * @since 2.3.0
    --- End diff --

    I'll update this.
Github user AndrewKL commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r215617515

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
         // scalastyle:on println

    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    --- End diff --

    I'll update this.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r213158510

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
         // scalastyle:on println

    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    --- End diff --

    How about `spark.sql.defaultNumRowsInShow`?
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r213158056

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,22 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }

    +
    +  test("SPARK-2444git stat2 Show should follow spark.show.default.number.of.rows") {
    +    withSQLConf("spark.sql.show.defaultNumRows" -> "100") {
    +      val ds = (1 to 1000).toDS().as[Int].show
    --- End diff --

    I think it's ok to check the number of rows in the show output.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r213157406

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
         // scalastyle:on println

    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    --- End diff --

    +1
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r213026874

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
         // scalastyle:on println

    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    --- End diff --

    we shouldn't be adding methods here
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r213010807

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
         // scalastyle:on println

    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    +  }
    +
    +  /**
    +   * Returns the default max characters per column to show before truncation when
    +   * the show function is called with truncate.
    +   * @since 2.3.0
    --- End diff --

    ditto
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r213010706

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
         // scalastyle:on println

    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    +  }
    +
    +  /**
    +   * Returns the default max characters per column to show before truncation when
    +   * the show function is called with truncate.
    +   * @since 2.3.0
    +   */
    +  private def maxCharactersPerColumnToShow(): Int = {
    --- End diff --

    ditto
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r213010879

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
         // scalastyle:on println

    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    --- End diff --

    not needed as this is private; moreover, it'd be since 2.4.0
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r213010672

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
         // scalastyle:on println

    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    --- End diff --

    I'd remove the `()`
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r212832846

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,22 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }

    +
    +  test("SPARK-2444git stat2 Show should follow spark.show.default.number.of.rows") {
    +    withSQLConf("spark.sql.show.defaultNumRows" -> "100") {
    +      val ds = (1 to 1000).toDS().as[Int].show
    --- End diff --

    Do we need to compare the result with an expected string using `assert`?
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r212832850

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,22 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }

    +
    +  test("SPARK-2444git stat2 Show should follow spark.show.default.number.of.rows") {
    +    withSQLConf("spark.sql.show.defaultNumRows" -> "100") {
    +      val ds = (1 to 1000).toDS().as[Int].show
    +    }
    +  }
    +
    +  test("SPARK-24442 Show should follow spark.show.default.truncate.characters.per.column") {
    +    withSQLConf("spark.sql.show.truncateMaxCharsPerColumn" -> "30") {
    +      val ds = (1 to 1000)
    +        .map(x => "123456789_123456789_123456789_")
    +        .toDS().as[String]
    +        .show
    --- End diff --

    ditto
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r212832795

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
    @@ -969,6 +969,22 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
         checkShowString(ds, expected)
       }

    +
    +  test("SPARK-2444git stat2 Show should follow spark.show.default.number.of.rows") {
    --- End diff --

    nit: Do we remove `git stat2`?
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r212832804

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
         // scalastyle:on println

    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    +  }
    +
    +  /**
    +   * Returns the default max characters per column to show before truncation when
    +   * the show function is called with truncate.
    +   * @since 2.3.0
    +   */
    +  private def maxCharactersPerColumnToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.truncateMaxCharsPerColumn", "20").toInt
    --- End diff --

    ditto
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r212832734

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
         // scalastyle:on println

    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    --- End diff --

    +1
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22162#discussion_r212801959

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -815,6 +815,24 @@ class Dataset[T] private[sql](
         println(showString(numRows, truncate, vertical))
         // scalastyle:on println

    +  /**
    +   * Returns the default number of rows to show when the show function is called without
    +   * a user specified max number of rows.
    +   * @since 2.3.0
    +   */
    +  private def numberOfRowsToShow(): Int = {
    +    this.sparkSession.conf.get("spark.sql.show.defaultNumRows", "20").toInt
    --- End diff --

    I would add it to `SQLConf`
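Moving the key into SQLConf gives a single typed definition with a default, instead of repeating string keys and `"20".toInt` at each call site. The self-contained sketch below illustrates the shape of such an entry; it is not Spark's actual `ConfigEntry`/`buildConf` API, just a minimal stand-in to show the idea:

```scala
// Hypothetical stand-in for a typed conf entry: a key, an Int type, and a default.
final case class IntConfEntry(key: String, default: Int) {
  // Read the value from a settings map, falling back to the default when unset.
  def readFrom(settings: Map[String, String]): Int =
    settings.get(key).map(_.toInt).getOrElse(default)
}

object ConfDemo {
  // Central definition: callers reference ShowDefaultMaxRows.key instead of a raw string.
  val ShowDefaultMaxRows = IntConfEntry("spark.sql.show.defaultNumRows", default = 20)

  def main(args: Array[String]): Unit = {
    assert(ShowDefaultMaxRows.readFrom(Map.empty) == 20)
    assert(ShowDefaultMaxRows.readFrom(
      Map("spark.sql.show.defaultNumRows" -> "100")) == 100)
  }
}
```

Centralizing the entry also lets tests refer to the key by constant, which is what the `SQLConf.SQL_SHOW_DEFAULT_MAX_ROWS.key` suggestion elsewhere in this thread is getting at.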
GitHub user AndrewKL opened a pull request:

    https://github.com/apache/spark/pull/22162

    [spark-24442][SQL] Added parameters to control the default number of records shown and the truncate length when a user calls .show() on a dataframe

    ## What changes were proposed in this pull request?

    https://issues.apache.org/jira/browse/SPARK-24442

    ## How was this patch tested?

    Unit tests plus local testing. This change is designed to not modify default behavior unless a user sets the following configs:

    "spark.sql.show.defaultNumRows" -> default 20 (as it was before)
    "spark.sql.show.truncateMaxCharsPerColumn" -> default 30 (as it was before)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AndrewKL/spark spark-24442

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22162.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22162

----
commit ed5991e53fe404b972c5a0a79febdd0398182cf6
Author: Long
Date:   2018-08-20T21:19:52Z

    [spark-24442][SQL] Added parameters to control the default number of records shown and the truncate length when a user calls .show() on a dataframe
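For context on what the truncate config would govern: `Dataset.show(truncate = true)` cuts cells longer than the limit and marks the cut with a "..." suffix. The standalone sketch below illustrates that per-cell rule (assuming the cut-at-limit-minus-3 behavior; it is an illustration, not a copy of Spark's `showString` code):

```scala
object TruncateDemo {
  // Sketch of per-column truncation: cells longer than maxChars are cut and
  // suffixed with "..." so the result is exactly maxChars characters long.
  def truncateCell(cell: String, maxChars: Int): String =
    if (maxChars > 0 && cell.length > maxChars) {
      if (maxChars < 4) cell.substring(0, maxChars) // too short to fit "..."
      else cell.substring(0, maxChars - 3) + "..."
    } else cell

  def main(args: Array[String]): Unit = {
    val s = "123456789_123456789_123456789_" // 30 chars, as in the PR's test data
    assert(truncateCell(s, 30) == s)         // fits exactly, left untouched
    assert(truncateCell(s, 20).length == 20)
    assert(truncateCell(s, 20).endsWith("..."))
  }
}
```

With a config like the proposed `spark.sql.show.truncateMaxCharsPerColumn`, `maxChars` would come from the session conf instead of being hard-coded.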