[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-11-03 Thread mgaido91
Github user mgaido91 closed the pull request at:

https://github.com/apache/spark/pull/19447


---




[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-09 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19447#discussion_r143540779
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -279,11 +279,11 @@ class CodegenContext {
       inlineToOuterClass: Boolean = false): String = {
     // The number of named constants that can exist in the class is limited by the Constant Pool
     // limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
-    // threshold of 1600k bytes to determine when a function should be inlined to a private, nested
-    // sub-class.
+    // threshold to determine when a function should be inlined to a private, nested sub-class
+    val generatedClassLengthThreshold = SQLConf.get.generatedClassLengthThreshold
--- End diff --

Thanks for the interesting link. I assume using SparkEnv has the same problem, doesn't it?
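
For reference, a minimal sketch of the two access patterns discussed in this thread, assuming the imports already available in CodeGenerator.scala (SparkEnv, SQLConf); `generatedClassLengthThreshold` is the helper proposed in this PR:

```scala
// Sketch only: the two ways of reading the threshold mentioned above.
// Reading through SparkEnv ties the value to the SparkConf of the running environment:
val viaSparkEnv = SparkEnv.get.conf.get(SQLConf.GENERATED_CLASS_LENGTH_THRESHOLD)
// Reading through the thread-local SQLConf, as the updated diff above does:
val viaSQLConf = SQLConf.get.generatedClassLengthThreshold
```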


---




[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-09 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19447#discussion_r143527821
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -279,11 +279,11 @@ class CodegenContext {
       inlineToOuterClass: Boolean = false): String = {
     // The number of named constants that can exist in the class is limited by the Constant Pool
     // limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
-    // threshold of 1600k bytes to determine when a function should be inlined to a private, nested
-    // sub-class.
+    // threshold to determine when a function should be inlined to a private, nested sub-class
+    val generatedClassLengthThreshold = SQLConf.get.generatedClassLengthThreshold
--- End diff --

Good to know [this discussion](https://github.com/apache/spark/pull/19449#pullrequestreview-67789784)


---




[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19447#discussion_r143228937
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -934,6 +934,15 @@ object SQLConf {
       .intConf
       .createWithDefault(1)
 
+  val GENERATED_CLASS_LENGTH_THRESHOLD =
+    buildConf("spark.sql.codegen.generatedClass.size.threshold")
+      .doc("Threshold in bytes for the size of a generated class. If the generated class " +
+        "size is higher of this value, a private nested class is created and used." +
+        "This is useful to limit the number of named constants in the class " +
+        "and therefore its Constant Pool. The default is 1600k.")
+      .intConf
--- End diff --

Please add `.checkValue` here with a valid minimum and maximum.
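
A minimal sketch of what that could look like inside the SQLConf builder chain; the bounds below are illustrative assumptions, not values agreed in this thread:

```scala
val GENERATED_CLASS_LENGTH_THRESHOLD =
  buildConf("spark.sql.codegen.generatedClass.size.threshold")
    .doc("Threshold in bytes for the size of a generated class.")
    .intConf
    // Reject nonsensical values; the upper bound mirrors the previously hardcoded 1600k value.
    .checkValue(v => v > 0 && v <= 1600000,
      "The generated class size threshold must be positive and at most 1600000 bytes.")
    .createWithDefault(1600000)
```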


---




[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19447#discussion_r143228854
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -934,6 +934,15 @@ object SQLConf {
       .intConf
       .createWithDefault(1)
 
+  val GENERATED_CLASS_LENGTH_THRESHOLD =
+    buildConf("spark.sql.codegen.generatedClass.size.threshold")
+      .doc("Threshold in bytes for the size of a generated class. If the generated class " +
--- End diff --

Thanks for the explanation, I'll add it immediately.


---




[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19447#discussion_r143227547
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -934,6 +934,15 @@ object SQLConf {
       .intConf
       .createWithDefault(1)
 
+  val GENERATED_CLASS_LENGTH_THRESHOLD =
+    buildConf("spark.sql.codegen.generatedClass.size.threshold")
+      .doc("Threshold in bytes for the size of a generated class. If the generated class " +
--- End diff --

Users can change this value in the same way as other public parameters; `internal` just means that the parameter is not explicitly exposed to users.
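
For example, given an existing SparkSession `spark`, an internal conf can still be set explicitly (a hedged illustration; the key shown is the one proposed in this PR):

```scala
// Setting an internal conf at runtime; it simply does not appear in the public docs.
spark.conf.set("spark.sql.codegen.generatedClass.size.threshold", 1000000L)
// Equivalent SQL form:
spark.sql("SET spark.sql.codegen.generatedClass.size.threshold=1000000")
```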


---




[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19447#discussion_r143224951
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -934,6 +934,15 @@ object SQLConf {
       .intConf
       .createWithDefault(1)
 
+  val GENERATED_CLASS_LENGTH_THRESHOLD =
+    buildConf("spark.sql.codegen.generatedClass.size.threshold")
+      .doc("Threshold in bytes for the size of a generated class. If the generated class " +
--- End diff --

I agree with you, but could you please explain to me how to set a parameter which is internal? Thanks.


---




[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19447#discussion_r143223820
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -934,6 +934,15 @@ object SQLConf {
       .intConf
       .createWithDefault(1)
 
+  val GENERATED_CLASS_LENGTH_THRESHOLD =
+    buildConf("spark.sql.codegen.generatedClass.size.threshold")
+      .doc("Threshold in bytes for the size of a generated class. If the generated class " +
--- End diff --

I think this is not a frequently-used parameter for users. Also, other JVM implementation-specific parameters are internal, e.g., `WHOLESTAGE_HUGE_METHOD_LIMIT`. cc: @rednaxelafx


---




[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19447#discussion_r143220404
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -934,6 +934,15 @@ object SQLConf {
       .intConf
       .createWithDefault(1)
 
+  val GENERATED_CLASS_LENGTH_THRESHOLD =
+    buildConf("spark.sql.codegen.generatedClass.size.threshold")
+      .doc("Threshold in bytes for the size of a generated class. If the generated class " +
--- End diff --

If users cannot set it, how can this parameter be changed?


---




[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19447#discussion_r143219623
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -934,6 +934,15 @@ object SQLConf {
       .intConf
       .createWithDefault(1)
 
+  val GENERATED_CLASS_LENGTH_THRESHOLD =
+    buildConf("spark.sql.codegen.generatedClass.size.threshold")
+      .doc("Threshold in bytes for the size of a generated class. If the generated class " +
--- End diff --

Please add `.internal()`. I think this is not a parameter for users.
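
A minimal sketch of the suggested change; the default shown mirrors the previously hardcoded 1600k value, and the rest of the builder chain follows the diff above:

```scala
val GENERATED_CLASS_LENGTH_THRESHOLD =
  buildConf("spark.sql.codegen.generatedClass.size.threshold")
    .internal()  // hide from user-facing docs, like other JVM-specific entries
    .doc("Threshold in bytes for the size of a generated class.")
    .intConf
    .createWithDefault(1600000)
```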


---




[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19447#discussion_r143217286
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -279,11 +279,13 @@ class CodegenContext {
       inlineToOuterClass: Boolean = false): String = {
     // The number of named constants that can exist in the class is limited by the Constant Pool
     // limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
-    // threshold of 1600k bytes to determine when a function should be inlined to a private, nested
-    // sub-class.
+    // threshold to determine when a function should be inlined to a private, nested sub-class
+    val generatedClassLengthThreshold = SparkEnv.get.conf.getInt(
--- End diff --

You'd better refer to the other code in `CodeGenerator`:

https://github.com/mgaido91/spark/blob/c69be31314d9aa96c3920073beaf7cca46d507fa/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L934


---




[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19447#discussion_r143215289
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -279,11 +279,13 @@ class CodegenContext {
       inlineToOuterClass: Boolean = false): String = {
     // The number of named constants that can exist in the class is limited by the Constant Pool
     // limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
-    // threshold of 1600k bytes to determine when a function should be inlined to a private, nested
-    // sub-class.
+    // threshold to determine when a function should be inlined to a private, nested sub-class
+    val generatedClassLengthThreshold = SparkEnv.get.conf.getInt(
+      SQLConf.GENERATED_CLASS_LENGTH_THRESHOLD.key,
+      SQLConf.GENERATED_CLASS_LENGTH_THRESHOLD.defaultValue.get)
--- End diff --

Could you simplify the above three lines into the following?
```scala
val generatedClassLengthThreshold = SparkEnv.get.conf.get(
  SQLConf.GENERATED_CLASS_LENGTH_THRESHOLD)
```


---




[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19447#discussion_r143213850
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -279,11 +279,13 @@ class CodegenContext {
       inlineToOuterClass: Boolean = false): String = {
     // The number of named constants that can exist in the class is limited by the Constant Pool
     // limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
-    // threshold of 1600k bytes to determine when a function should be inlined to a private, nested
-    // sub-class.
+    // threshold to determine when a function should be inlined to a private, nested sub-class
+    val generatedClassLengthThreshold = SparkEnv.get.conf.getInt(
--- End diff --

The code that you pointed out is not in `CodeGenerator.scala`. In [my case](https://github.com/apache/spark/pull/18966/files#diff-8bcc5aea39c73d4bf38aef6f6951d42cR776) in `CodeGenerator.scala`, there are some cases where `SparkEnv.get` is `null`. If you have not seen this `null` in your case, that would be good.
@gatorsmile will implement a better approach to get this conf soon.
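
A hedged sketch of how the read could be guarded until that better approach lands, assuming the imports already present in CodeGenerator.scala (SparkEnv, SQLConf); this is not the PR's final code:

```scala
// Fall back to the compile-time default when no SparkEnv is available (e.g. in some tests).
val generatedClassLengthThreshold = Option(SparkEnv.get)
  .map(_.conf.getInt(
    SQLConf.GENERATED_CLASS_LENGTH_THRESHOLD.key,
    SQLConf.GENERATED_CLASS_LENGTH_THRESHOLD.defaultValue.get))
  .getOrElse(SQLConf.GENERATED_CLASS_LENGTH_THRESHOLD.defaultValue.get)
```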


---




[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19447#discussion_r143204826
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -279,11 +279,13 @@ class CodegenContext {
       inlineToOuterClass: Boolean = false): String = {
     // The number of named constants that can exist in the class is limited by the Constant Pool
     // limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
-    // threshold of 1600k bytes to determine when a function should be inlined to a private, nested
-    // sub-class.
+    // threshold to determine when a function should be inlined to a private, nested sub-class
+    val generatedClassLengthThreshold = SparkEnv.get.conf.getInt(
--- End diff --

Can this be empty? In https://github.com/apache/spark/blob/83488cc3180ca18f829516f550766efb3095881e/sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java#L80 there is no check for it being `None`...


---




[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19447#discussion_r143202186
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -279,11 +279,13 @@ class CodegenContext {
       inlineToOuterClass: Boolean = false): String = {
     // The number of named constants that can exist in the class is limited by the Constant Pool
     // limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
-    // threshold of 1600k bytes to determine when a function should be inlined to a private, nested
-    // sub-class.
+    // threshold to determine when a function should be inlined to a private, nested sub-class
+    val generatedClassLengthThreshold = SparkEnv.get.conf.getInt(
--- End diff --

Can you ensure `SparkEnv.get` is always non-null?


---




[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/19447

[SPARK-22215][SQL] Add configuration to set the threshold for generated class

## What changes were proposed in this pull request?

SPARK-18016 introduced an arbitrary threshold for the size of a generated class (https://github.com/apache/spark/blob/83488cc3180ca18f829516f550766efb3095881e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L286). This value is hardcoded.
In some cases, making it smaller can help avoid the error of exceeding the maximum number of entries in the Constant Pool.
This PR introduces a new configuration parameter which defaults to the previous value, but allows setting it to a smaller one if needed.
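
As a usage illustration (a hedged sketch, not part of the patch): a job that hits the Constant Pool limit could lower the threshold through the new key at session start, e.g.:

```scala
import org.apache.spark.sql.SparkSession

// Lower the generated-class size threshold from the previously hardcoded 1600000 bytes.
val spark = SparkSession.builder()
  .appName("SPARK-22215-example")
  .config("spark.sql.codegen.generatedClass.size.threshold", "800000")
  .getOrCreate()
```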

## How was this patch tested?
Manual tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-22215

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19447.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19447


commit c69be31314d9aa96c3920073beaf7cca46d507fa
Author: Marco Gaido 
Date:   2017-10-06T12:58:36Z

[SPARK-22215][SQL] Add configuration to set the threshold for generated class




---
