[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...
Github user mgaido91 closed the pull request at:

    https://github.com/apache/spark/pull/19447

---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19447#discussion_r143540779

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -279,11 +279,11 @@ class CodegenContext {
       inlineToOuterClass: Boolean = false): String = {
     // The number of named constants that can exist in the class is limited by the Constant Pool
     // limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
-    // threshold of 1600k bytes to determine when a function should be inlined to a private, nested
-    // sub-class.
+    // threshold to determine when a function should be inlined to a private, nested sub-class
+    val generatedClassLengthThreshold = SQLConf.get.generatedClassLengthThreshold
--- End diff --

thanks for the interesting link. I assume using SparkEnv has the same problem, doesn't it?
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19447#discussion_r143527821

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -279,11 +279,11 @@ class CodegenContext {
       inlineToOuterClass: Boolean = false): String = {
     // The number of named constants that can exist in the class is limited by the Constant Pool
     // limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
-    // threshold of 1600k bytes to determine when a function should be inlined to a private, nested
-    // sub-class.
+    // threshold to determine when a function should be inlined to a private, nested sub-class
+    val generatedClassLengthThreshold = SQLConf.get.generatedClassLengthThreshold
--- End diff --

Good to know [this discussion](https://github.com/apache/spark/pull/19449#pullrequestreview-67789784)
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19447#discussion_r143228937

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -934,6 +934,15 @@ object SQLConf {
     .intConf
     .createWithDefault(1)

+  val GENERATED_CLASS_LENGTH_THRESHOLD =
+    buildConf("spark.sql.codegen.generatedClass.size.threshold")
+      .doc("Threshold in bytes for the size of a generated class. If the generated class " +
+        "size is higher of this value, a private nested class is created and used." +
+        "This is useful to limit the number of named constants in the class " +
+        "and therefore its Constant Pool. The default is 1600k.")
+      .intConf
--- End diff --

Please add `.checkValue` here with a valid minimum and maximum.
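The `.checkValue` guard the reviewer asks for can be sketched in plain Scala. The class below is a hypothetical stand-in for Spark's `ConfigBuilder`, and the min/max bounds are illustrative assumptions, not values chosen by the PR:

```scala
// Hypothetical stand-in for a checkValue-style guard: validate the threshold
// at set-time instead of failing later during code generation.
// The bounds below are illustrative assumptions, not Spark's actual choices.
final case class IntConf(key: String, default: Int, valid: Int => Boolean, errMsg: String) {
  def set(value: Int): Int = {
    require(valid(value), s"$key: $errMsg (got $value)")
    value
  }
}

val generatedClassThreshold = IntConf(
  key = "spark.sql.codegen.generatedClass.size.threshold",
  default = 1600000,
  valid = (v: Int) => v > 0 && v <= 1600000,
  errMsg = "must be positive and at most 1600000 bytes")
```

A value outside the range fails fast with an `IllegalArgumentException` from `require`, which is roughly the behavior `.checkValue` provides in Spark's conf DSL.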
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19447#discussion_r143228854

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -934,6 +934,15 @@ object SQLConf {
     .intConf
     .createWithDefault(1)

+  val GENERATED_CLASS_LENGTH_THRESHOLD =
+    buildConf("spark.sql.codegen.generatedClass.size.threshold")
+      .doc("Threshold in bytes for the size of a generated class. If the generated class " +
--- End diff --

thanks for the explanation, I'll add it immediately.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19447#discussion_r143227547

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -934,6 +934,15 @@ object SQLConf {
     .intConf
     .createWithDefault(1)

+  val GENERATED_CLASS_LENGTH_THRESHOLD =
+    buildConf("spark.sql.codegen.generatedClass.size.threshold")
+      .doc("Threshold in bytes for the size of a generated class. If the generated class " +
--- End diff --

Users can change this value in the same way as other public parameters; `internal` just means that the parameter is not explicitly exposed to users.
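The point above, that `.internal()` only hides a parameter from documentation without blocking users from setting it, can be modeled in a few lines of plain Scala. This is a toy model with made-up names, not Spark's implementation:

```scala
// Toy model: the conf store never consults the `internal` flag when setting a
// value, so internal and public entries are equally settable. The flag only
// controls whether the entry appears in generated documentation.
case class ConfEntry(key: String, internal: Boolean)

val settings = scala.collection.mutable.Map[String, String]()

def set(entry: ConfEntry, value: String): Unit =
  settings(entry.key) = value  // note: no check on entry.internal

def documented(entries: Seq[ConfEntry]): Seq[String] =
  entries.filterNot(_.internal).map(_.key)

val threshold = ConfEntry("spark.sql.codegen.generatedClass.size.threshold", internal = true)
set(threshold, "800000")  // succeeds even though the entry is internal
```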
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19447#discussion_r143224951

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -934,6 +934,15 @@ object SQLConf {
     .intConf
     .createWithDefault(1)

+  val GENERATED_CLASS_LENGTH_THRESHOLD =
+    buildConf("spark.sql.codegen.generatedClass.size.threshold")
+      .doc("Threshold in bytes for the size of a generated class. If the generated class " +
--- End diff --

I agree with you, but could you please explain to me how to set a parameter which is internal? Thanks.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19447#discussion_r143223820

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -934,6 +934,15 @@ object SQLConf {
     .intConf
     .createWithDefault(1)

+  val GENERATED_CLASS_LENGTH_THRESHOLD =
+    buildConf("spark.sql.codegen.generatedClass.size.threshold")
+      .doc("Threshold in bytes for the size of a generated class. If the generated class " +
--- End diff --

I think this is not a frequently-used parameter for users. Also, other JVM implementation-specific parameters are internal, e.g., `WHOLESTAGE_HUGE_METHOD_LIMIT`. cc: @rednaxelafx
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19447#discussion_r143220404

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -934,6 +934,15 @@ object SQLConf {
     .intConf
     .createWithDefault(1)

+  val GENERATED_CLASS_LENGTH_THRESHOLD =
+    buildConf("spark.sql.codegen.generatedClass.size.threshold")
+      .doc("Threshold in bytes for the size of a generated class. If the generated class " +
--- End diff --

if users cannot set it, how can this parameter be changed?
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19447#discussion_r143219623

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -934,6 +934,15 @@ object SQLConf {
     .intConf
     .createWithDefault(1)

+  val GENERATED_CLASS_LENGTH_THRESHOLD =
+    buildConf("spark.sql.codegen.generatedClass.size.threshold")
+      .doc("Threshold in bytes for the size of a generated class. If the generated class " +
--- End diff --

Please add `.internal()`. I think this is not a parameter for users.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19447#discussion_r143217286

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -279,11 +279,13 @@ class CodegenContext {
       inlineToOuterClass: Boolean = false): String = {
     // The number of named constants that can exist in the class is limited by the Constant Pool
     // limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
-    // threshold of 1600k bytes to determine when a function should be inlined to a private, nested
-    // sub-class.
+    // threshold to determine when a function should be inlined to a private, nested sub-class
+    val generatedClassLengthThreshold = SparkEnv.get.conf.getInt(
--- End diff --

you'd better refer to other code in `CodeGenerator`: https://github.com/mgaido91/spark/blob/c69be31314d9aa96c3920073beaf7cca46d507fa/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L934
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19447#discussion_r143215289

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -279,11 +279,13 @@ class CodegenContext {
       inlineToOuterClass: Boolean = false): String = {
     // The number of named constants that can exist in the class is limited by the Constant Pool
     // limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
-    // threshold of 1600k bytes to determine when a function should be inlined to a private, nested
-    // sub-class.
+    // threshold to determine when a function should be inlined to a private, nested sub-class
+    val generatedClassLengthThreshold = SparkEnv.get.conf.getInt(
+      SQLConf.GENERATED_CLASS_LENGTH_THRESHOLD.key,
+      SQLConf.GENERATED_CLASS_LENGTH_THRESHOLD.defaultValue.get)
--- End diff --

Could you simplify the above three lines into the following?

```scala
val generatedClassLengthThreshold = SparkEnv.get.conf.get(
  SQLConf.GENERATED_CLASS_LENGTH_THRESHOLD)
```
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19447#discussion_r143213850

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -279,11 +279,13 @@ class CodegenContext {
       inlineToOuterClass: Boolean = false): String = {
     // The number of named constants that can exist in the class is limited by the Constant Pool
     // limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
-    // threshold of 1600k bytes to determine when a function should be inlined to a private, nested
-    // sub-class.
+    // threshold to determine when a function should be inlined to a private, nested sub-class
+    val generatedClassLengthThreshold = SparkEnv.get.conf.getInt(
--- End diff --

The code that you pointed out is not in `CodeGenerator.scala`. In [my case](https://github.com/apache/spark/pull/18966/files#diff-8bcc5aea39c73d4bf38aef6f6951d42cR776) at `CodeGenerator.scala`, there are some cases where `SparkEnv.get` is `null`. If you have not seen this `null` in your case, it would be good. @gatorsmile will implement a better approach to get this conf soon.
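The hazard described above, `SparkEnv.get` returning `null` in some codegen paths, can be guarded against by wrapping the lookup in `Option` and falling back to the default. The sketch below uses a stand-in `Env` class rather than Spark's real `SparkEnv`, so the names are illustrative only:

```scala
// Stand-in for SparkEnv: `currentEnv` may be null before the environment is
// initialized (e.g. in some tests), mirroring the case discussed above.
class Env(val conf: Map[String, Int])
var currentEnv: Env = null

val defaultThreshold = 1600000  // the hardcoded value the PR makes configurable

// Null-safe lookup: fall back to the default when there is no env, or when
// the env exists but the key was never set.
def threshold(key: String): Int =
  Option(currentEnv).flatMap(_.conf.get(key)).getOrElse(defaultThreshold)
```

`Option(x)` maps `null` to `None`, so the `getOrElse` branch covers both the missing-env and missing-key cases in one expression.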
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19447#discussion_r143204826

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -279,11 +279,13 @@ class CodegenContext {
       inlineToOuterClass: Boolean = false): String = {
     // The number of named constants that can exist in the class is limited by the Constant Pool
     // limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
-    // threshold of 1600k bytes to determine when a function should be inlined to a private, nested
-    // sub-class.
+    // threshold to determine when a function should be inlined to a private, nested sub-class
+    val generatedClassLengthThreshold = SparkEnv.get.conf.getInt(
--- End diff --

Can this be empty? In https://github.com/apache/spark/blob/83488cc3180ca18f829516f550766efb3095881e/sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java#L80 there is no check about it being `None`...
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19447#discussion_r143202186

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -279,11 +279,13 @@ class CodegenContext {
       inlineToOuterClass: Boolean = false): String = {
     // The number of named constants that can exist in the class is limited by the Constant Pool
     // limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
-    // threshold of 1600k bytes to determine when a function should be inlined to a private, nested
-    // sub-class.
+    // threshold to determine when a function should be inlined to a private, nested sub-class
+    val generatedClassLengthThreshold = SparkEnv.get.conf.getInt(
--- End diff --

Can you ensure `SparkEnv.get` is always non-null?
GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/19447

    [SPARK-22215][SQL] Add configuration to set the threshold for generated class

## What changes were proposed in this pull request?

SPARK-18016 introduced an arbitrary threshold for the size of a generated class (https://github.com/apache/spark/blob/83488cc3180ca18f829516f550766efb3095881e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L286). This value is hardcoded. In some cases, making it smaller can help avoid the error of exceeding the maximum number of entries in the Constant Pool. This PR introduces a new configuration parameter which defaults to the previous value, but allows setting it to a smaller one if needed.

## How was this patch tested?

manual tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-22215

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19447.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19447

commit c69be31314d9aa96c3920073beaf7cca46d507fa
Author: Marco Gaido
Date: 2017-10-06T12:58:36Z

    [SPARK-22215][SQL] Add configuration to set the threshold for generated class
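The decision the new parameter controls can be reduced to a few lines: once the code accumulated in the outer generated class would cross the threshold, the next function is emitted into a private nested sub-class instead. This is a simplified sketch with made-up names, not the actual `CodegenContext` logic:

```scala
// Simplified model of the choice this PR makes configurable: functions go
// into the outer class until its accumulated source size would exceed the
// threshold, after which they are inlined into a private nested sub-class.
// Names here are illustrative, not Spark's internals.
val thresholdBytes = 1600000  // previously hardcoded default, now tunable

def placement(outerClassSize: Int, functionSize: Int): String =
  if (outerClassSize + functionSize > thresholdBytes) "NestedSubClass"
  else "OuterClass"
```

Lowering the threshold moves functions into nested classes earlier, which spreads named constants across more classes and keeps each Constant Pool under the 65,536-entry limit.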