[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...
Github user bdrillard commented on the issue: https://github.com/apache/spark/pull/19518

This PR was addressed by #19811, closing this one.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19518

@bdrillard can you close this pr?
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19518

Can one of the admins verify this patch?
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518

The good news is that [the issue](https://github.com/janino-compiler/janino/issues/33) in janino has been quickly fixed. The bad news is that there is no official date for the next release yet.
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518

> I think ldc is 2 bytes and ldc_w is 3 bytes?

You are right, thanks, updated.
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19518

> 4 or 5 bytes: others (ldc or ldc_w)

I think ldc is 2 bytes and ldc_w is 3 bytes?
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518

In #19811, first, I will take a hybrid approach of outer (flat) global variables and arrays. The threshold for the number of global variables would be configurable. I will give high priority to placing primitive variables in the outer class, for performance.

> I think it would be great if we were able to declare variables which are used only in an inner class in that inner class. Unfortunately, I think also that this is not trivial to achieve at all.

It would be great if we could do this. For now, #19811 will not address this.
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518

I created and ran another synthetic benchmark program comparing flat global variables, inner global variables, and an array, using janinoc for the target Java file. The performance is not much different from the previous one. In summary, the following are the performance results (**smaller is better**).

- 1: array
- 0.90: inner global variables
- 0.81: flat global variables

WDYT? Any comments are very appreciated. Here are [Test.java](https://gist.github.com/kiszk/63c2829488cb777d7ca78d45d20c021f) and [myInstance.py](https://gist.github.com/kiszk/049a62f5d1259481c400a86299bd0228) that I used.

```
$ cat /proc/cpuinfo | grep "model name" | uniq
model name : Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
$ python myInstance.py > MyInstance.java && janinoc MyInstance.java && javac Test.java && java -Xmx16g Test
Result(us): Array
0: 484251.446 1: 483374.255 2: 483956.692 3: 482498.241 4: 483602.261 5: 482654.567 6: 482896.671 7: 483458.625 8: 483194.317 9: 483387.234
10: 484103.729 11: 483536.493 12: 483790.828 13: 483590.991 14: 483993.488 15: 483455.164 16: 484040.009 17: 483225.837 18: 483126.520 19: 484105.989
20: 484988.935 21: 483766.245 22: 483667.930 23: 483271.499 24: 483071.606 25: 483174.438 26: 483602.474 27: 483210.405 28: 483907.061 29: 483071.964
BEST: 482498.241000, AVG: 483532.530
Result(us): InnerVars
0: 437016.533 1: 436125.481 2: 436360.534 3: 435857.758 4: 436166.243 5: 437089.913 6: 436168.359 7: 435570.397 8: 435550.848 9: 435256.088
10: 435252.679 11: 435765.156 12: 435646.739 13: 437303.993 14: 435315.530 15: 435752.545 16: 434857.606 17: 436776.190 18: 435444.877 19: 435657.649
20: 436248.147 21: 436322.998 22: 437214.262 23: 435907.223 24: 435431.025 25: 435274.317 26: 435412.202 27: 435670.321 28: 436494.045 29: 436347.838
BEST: 434857.606, AVG: 435975.250
Result(us): Vars
0: 353983.048 1: 354067.690 2: 353138.178 3: 354093.115 4: 354067.180 5: 352750.571 6: 353672.510 7: 355179.115 8: 353296.750 9: 354522.113
10: 355221.301 11: 355178.172 12: 353859.319 13: 353539.817 14: 352703.352 15: 353923.981 16: 354442.744 17: 355523.145 18: 354849.122 19: 354082.888
20: 354673.504 21: 355526.218 22: 355264.029 23: 355455.492 24: 355520.322 25: 353923.520 26: 353796.600 27: 355021.849 28: 355800.387 29: 353810.567
BEST: 352703.352, AVG: 354362.887
```
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518

In the int case, the following are the Java bytecode lengths for loading a value:

- 1 byte: -1, 0, 1, 2, 3, 4, 5 (iconst_?)
- 2 bytes: -128 ~ -2, 6 ~ 127 (bipush)
- 3 bytes: -32768 ~ -129, 128 ~ 32767 (sipush)
- 4 or 5 bytes: others (ldc or ldc_w)
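The size buckets above can be sketched as a small classifier. This is a hypothetical helper (not Spark or janino code); per the JVM spec, `ldc` is a 2-byte and `ldc_w` a 3-byte instruction, and the choice between them depends on the constant-pool index rather than on the value:

```python
def int_load_encoding(value, cp_index=1):
    """Classify how an int constant is loaded in Java bytecode.

    Returns (mnemonic, instruction length in bytes). Values outside the
    sipush range need a constant-pool entry; whether ldc (2 bytes) or
    ldc_w (3 bytes) is used depends on the pool index, not the value.
    """
    if -1 <= value <= 5:
        return ("iconst", 1)   # iconst_m1 .. iconst_5, no operand
    if -128 <= value <= 127:
        return ("bipush", 2)   # opcode + 1-byte immediate
    if -32768 <= value <= 32767:
        return ("sipush", 3)   # opcode + 2-byte immediate
    return ("ldc", 2) if cp_index <= 255 else ("ldc_w", 3)
```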
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19518

The PR looks good. Although it can relieve the constant pool pressure, I have a question: does it mean that every time a constant in the short range is used in the code, we add 1 byte to the bytecode, because sipush is followed by a two-byte value? I'm afraid it may be negative for some cases, like a Java program with many short constants.
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518

Here is [a PR](https://github.com/janino-compiler/janino/pull/34) to fix a problem regarding `sipush`.
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518

Next, I analyzed the usage of constant pool entries and Java bytecode ops using `janinoc`. The summary is as follows:

```
array[4]     : 6 + 0 * n entries, 6-8 java bytecode ops / read access
outerInstance: 3 + 3 * n entries, 5 java bytecode ops / read access
innerInstance: 9 + 3 * n entries, 6 java bytecode ops / read access
```

```
public class CP {
  int[] a = new int[100];
  int globalVar0;
  int globalVar1;
  private Inner inner = new Inner();

  private class Inner {
    int nestedVar0;
    int nestedVar1;
  }

  void access() {
    a[4] = 0;
    a[5] = 0;
    globalVar0 = 0;
    globalVar1 = 0;
    inner.nestedVar0 = 0;
    inner.nestedVar1 = 0;
  }

  static public void main(String[] argv) {
    CP cp = new CP();
    cp.access();
  }
}
```

Java bytecode

```
void access();
  descriptor: ()V
  Code:
    stack=3, locals=1, args_size=1
       0: aload_0
       1: getfield      #12  // Field a:[I
       4: iconst_4
       5: iconst_0
       6: iastore
       7: aload_0
       8: getfield      #12  // Field a:[I
      11: iconst_5
      12: iconst_0
      13: iastore
      14: aload_0
      15: iconst_0
      16: putfield      #16  // Field globalVar0:I
      19: aload_0
      20: iconst_0
      21: putfield      #19  // Field globalVar1:I
      24: aload_0
      25: getfield      #23  // Field inner:LCP$Inner;
      28: iconst_0
      29: putfield      #28  // Field CP$Inner.nestedVar0:I
      32: aload_0
      33: getfield      #23  // Field inner:LCP$Inner;
      36: iconst_0
      37: putfield      #31  // Field CP$Inner.nestedVar1:I
      40: return
```

Constant pool

```
 #1 = Utf8        CP
 #2 = Class       #1       // CP
 #9 = Utf8        a
#10 = Utf8        [I
#11 = NameAndType #9:#10   // a:[I
#12 = Fieldref    #2.#11   // CP.a:[I
#13 = Utf8        globalVar0
#14 = Utf8        I
#15 = NameAndType #13:#14  // globalVar0:I
#16 = Fieldref    #2.#15   // CP.globalVar0:I
#17 = Utf8        globalVar1
#18 = NameAndType #17:#14  // globalVar1:I
#19 = Fieldref    #2.#18   // CP.globalVar1:I
#20 = Utf8        inner
#21 = Utf8        LCP$Inner;
#22 = NameAndType #20:#21  // inner:LCP$Inner;
#23 = Fieldref    #2.#22   // CP.inner:LCP$Inner;
#24 = Utf8        CP$Inner
#25 = Class       #24      // CP$Inner
#26 = Utf8        nestedVar0
#27 = NameAndType #26:#14  // nestedVar0:I
#28 = Fieldref    #25.#27  // CP$Inner.nestedVar0:I
#29 = Utf8        nestedVar1
#30 = NameAndType #29:#14  // nestedVar1:I
#31 = Fieldref    #25.#30  // CP$Inner.nestedVar1:I
#32 = Utf8        LineNumberTable
#33 = Utf8        Code
#34 = Utf8        main
#35 = Utf8        ([Ljava/lang/String;)V
#36 = Utf8        <init>
#37 = NameAndType #36:#8   // "<init>":()V
#38 = Methodref   #2.#37   // CP."<init>":()V
#39 = NameAndType #7:#8    // access:()V
#40 = Methodref   #2.#39   // CP.access:()V
#41 = Methodref   #4.#37   // java/lang/Object."<init>":()V
#42 = Integer     100
#43 = Utf8        (LCP;)V
#44 = NameAndType #36:#43  // "<init>":(LCP;)V
#45 = Methodref   #25.#44  // CP$Inner."<init>":(LCP;)V
#46 = Utf8        Inner
#47 = Utf8        InnerClasses
```
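As a quick cross-check of the summary above, the entry counts it implies for n variables can be tabulated (a sketch; `cp_entries` is a hypothetical helper encoding the three formulas, not code from this PR):

```python
def cp_entries(n):
    """Constant-pool entries implied by the summary formulas (n variables).

    The array layout pays only a fixed cost; both field-based layouts add
    3 entries (Utf8 + NameAndType + Fieldref) per variable.
    """
    return {
        "array":         6,          # 6 + 0 * n
        "outerInstance": 3 + 3 * n,
        "innerInstance": 9 + 3 * n,
    }
```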
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518

First of all, I have to share sad news with you. `janino` does not use `sipush` for values from 128 to 32767. The current `janino` 3.0.7 uses `iconst...`, `bipush`, or `ldc`. `javac` uses `sipush` for values from 128 to 32767. In other words, if an index is greater than 127, one constant pool entry is used by bytecode compiled with `janino`. It should be fixed.

```
public class Array {
  int[] a = new int[100];
  void access() {
    a[5] = 0;
    a[6] = 0;
    a[127] = 0;
    a[128] = 0;
    a[1023] = 0;
    a[16383] = 0;
    a[32767] = 0;
    a[32768] = 0;
  }
  static public void main(String[] argv) {
    Array a = new Array();
    a.access();
  }
}
```

```
void access();
  descriptor: ()V
  flags:
  Code:
    stack=3, locals=1, args_size=1
       0: aload_0
       1: getfield #12  // Field a:[I
       4: iconst_5
       5: iconst_0
       6: iastore
       7: aload_0
       8: getfield #12  // Field a:[I
      11: bipush   6
      13: iconst_0
      14: iastore
      15: aload_0
      16: getfield #12  // Field a:[I
      19: bipush   127
      21: iconst_0
      22: iastore
      23: aload_0
      24: getfield #12  // Field a:[I
      27: ldc      #13  // int 128
      29: iconst_0
      30: iastore
      31: aload_0
      32: getfield #12  // Field a:[I
      35: ldc      #14  // int 1023
      37: iconst_0
      38: iastore
      39: aload_0
      40: getfield #12  // Field a:[I
      43: ldc      #15  // int 16383
      45: iconst_0
      46: iastore
      47: aload_0
      48: getfield #12  // Field a:[I
      51: ldc      #16  // int 32767
      53: iconst_0
      54: iastore
      55: aload_0
      56: getfield #12  // Field a:[I
      59: ldc      #17  // int 32768
      61: iconst_0
      62: iastore
      63: return
Constant pool:
 #1 = Utf8        Array
 #2 = Class       #1      // Array
 #9 = Utf8        a
#10 = Utf8        [I
#11 = NameAndType #9:#10  // a:[I
#12 = Fieldref    #2.#11  // Array.a:[I
#13 = Integer     128
#14 = Integer     1023
#15 = Integer     16383
#16 = Integer     32767
#17 = Integer     32768
```
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518

For the strategy, I'd like to give priority to primitive values being flat global variables. We also need to decide the priority among primitive types, according to which type has the largest performance difference between a flat global variable and an array, and which type is used more frequently (maybe boolean).
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19518

@viirya in general there is no benefit. There will be a benefit if we manage to declare them where they are used, but I am not sure this is feasible. In this way, they do not add any entry to the constant pool. For instance, if we have an inner class `InnerClass1` and we use `isNull_1` only there, then if we define `isNull_1` as a variable of `InnerClass1` instead of a variable of the outer class, we have no entry about it in the outer class.
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19518

@mgaido91 Thanks. I looked at the constant pool you posted. It's clear. Is there any benefit to declaring the variables in the inner classes? It looks like they still occupy constant pool entries.
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19518

@viirya if you take a look at the example I posted, you can see that we are not saving either `NameAndType` or `Fieldref` entries; thus I think the only solution we have found so far to save constant pool entries is to use arrays. What may be interesting, IMHO, is to evaluate where we are using a variable. Since when we have a lot of instance variables we are very likely to also have several inner classes (for splitting the methods), I think it would be great if we were able to declare variables which are used only in an inner class in that inner class. Unfortunately, I also think that this is not trivial to achieve at all. @kiszk what do you think?
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19518

> When we use an inner class approach, we still require constant pool entries for accessing instance variables (e.g. `this.inner001.globalVar5`) in one class.

@kiszk But can we still save the name/type and field name entries for the global variable?
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19518

Btw, can we configure the maximum number of global variables?
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19518

The hybrid approach sounds reasonable to me. Is there any special strategy to decide which fields become global variables and which go into the array?
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518

@mgaido91 Thank you very much. I did not know such a difference. I will validate with `janinoc` 3.0.0.
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19518

@kiszk I meant that `janinoc` creates a slightly different constant pool from `javac`. I am not sure about performance, but the number of constant pool entries is definitely different. For instance, let's take this example and analyze the `Outer` class constant pool:

```
class Outer {
  private Inner innerInstance = new Inner();

  private void outerMethod() {
    innerInstance.b = 1;
    innerInstance.a = 1;
  }

  private boolean outerMethod2() {
    return innerInstance.b > innerInstance.a;
  }

  private class Inner {
    int b = 2;
    private int a = 0;
  }
}
```

if you compile it with `javac`, the constant pool will be:

```
Constant pool:
 #1 = Methodref   #9.#25  // java/lang/Object."<init>":()V
 #2 = Class       #26     // Outer$Inner
 #3 = Methodref   #2.#27  // Outer$Inner."<init>":(LOuter;LOuter$1;)V
 #4 = Fieldref    #8.#28  // Outer.innerInstance:LOuter$Inner;
 #5 = Fieldref    #2.#29  // Outer$Inner.b:I
 #6 = Methodref   #2.#30  // Outer$Inner.access$102:(LOuter$Inner;I)I
 #7 = Methodref   #2.#31  // Outer$Inner.access$100:(LOuter$Inner;)I
 #8 = Class       #32     // Outer
 #9 = Class       #33     // java/lang/Object
#10 = Class       #34     // Outer$1
#11 = Utf8        InnerClasses
#12 = Utf8        Inner
#13 = Utf8        innerInstance
#14 = Utf8        LOuter$Inner;
#15 = Utf8        <init>
#16 = Utf8        ()V
#17 = Utf8        Code
#18 = Utf8        LineNumberTable
#19 = Utf8        outerMethod
#20 = Utf8        outerMethod2
#21 = Utf8        ()Z
#22 = Utf8        StackMapTable
#23 = Utf8        SourceFile
#24 = Utf8        Outer.java
#25 = NameAndType #15:#16 // "<init>":()V
#26 = Utf8        Outer$Inner
#27 = NameAndType #15:#35 // "<init>":(LOuter;LOuter$1;)V
#28 = NameAndType #13:#14 // innerInstance:LOuter$Inner;
#29 = NameAndType #36:#37 // b:I
#30 = NameAndType #38:#39 // access$102:(LOuter$Inner;I)I
#31 = NameAndType #40:#41 // access$100:(LOuter$Inner;)I
#32 = Utf8        Outer
#33 = Utf8        java/lang/Object
#34 = Utf8        Outer$1
#35 = Utf8        (LOuter;LOuter$1;)V
#36 = Utf8        b
#37 = Utf8        I
#38 = Utf8        access$102
#39 = Utf8        (LOuter$Inner;I)I
#40 = Utf8        access$100
#41 = Utf8        (LOuter$Inner;)I
```

(please note that it creates a fake getter and a fake setter method entry for the `private` inner variable `a`). If you compile the same class with `janinoc`, instead, the constant pool will be:

```
Constant pool:
 #1 = Utf8        Outer
 #2 = Class       #1      // Outer
 #3 = Utf8        java/lang/Object
 #4 = Class       #3      // java/lang/Object
 #5 = Utf8        SourceFile
 #6 = Utf8        Outer.java
 #7 = Utf8        outerMethod$
 #8 = Utf8        (LOuter;)V
 #9 = Utf8        innerInstance
#10 = Utf8        LOuter$Inner;
#11 = NameAndType #9:#10  // innerInstance:LOuter$Inner;
#12 = Fieldref    #2.#11  // Outer.innerInstance:LOuter$Inner;
#13 = Utf8        Outer$Inner
#14 = Class       #13     // Outer$Inner
#15 = Utf8        b
#16 = Utf8        I
#17 = NameAndType #15:#16 // b:I
#18 = Fieldref    #14.#17 // Outer$Inner.b:I
#19 = Utf8        a
#20 = NameAndType #19:#16 // a:I
#21 = Fieldref    #14.#20 // Outer$Inner.a:I
#22 = Utf8        LineNumberTable
#23 = Utf8        Code
#24 = Utf8        outerMethod2$
#25 = Utf8        (LOuter;)Z
#26 = Utf8        <init>
#27 = Utf8        ()V
#28 = NameAndType #26:#27 // "<init>":()V
#29 = Methodref   #4.#28  // java/lang/Object."<init>":()V
#30 = NameAndType #26:#8  // "<init>":(LOuter;)V
#31 = Methodref   #14.#30 // Outer$Inner."<init>":(LOuter;)V
#32 = Utf8        Inner
#33 = Utf8        InnerClasses
```

(note that `a` is now considered a regular field). Thus in all our
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19518

I like the latest hybrid idea from @kiszk in terms of performance and readability. Also, this is a corner case, so I don't want to affect most regular small queries.
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518

@mgaido91 Thank you for your questions.

1. I am using `javac` as shown. I am sorry, but I cannot understand what you are pointing out. In this benchmark, what are the differences between `javac` and `janinoc`?
2. I agree with you.
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518

Based on the performance results and the usage of constant pool entries, I would like to use a hybrid approach with flat global variables and arrays. For example, the first 500 variables are stored in flat global variables, and the others are stored in arrays with 32767 elements. I think that most non-extreme cases can enjoy simple code without array accesses, and good performance. WDYT?

```
class Foo {
  int globalVars1;
  int globalVars2;
  ...
  int globalVars499;
  int globalVars500;
  int[] globalArrays1 = new int[32767];
  int[] globalArrays2 = new int[32767];
  int[] globalArrays3 = new int[32767];
  ...

  void apply1(InternalRow i) {
    globalVars1 = 1;
    globalVars2 = 1;
    ...
    globalVars499 = 1;
    globalVars500 = 1;
  }

  void apply2(InternalRow i) {
    globalArrays1[0] = 1;
    globalArrays1[1] = 1;
    ...
  }

  void apply(InternalRow i) {
    apply0(i);
    apply1(i);
    apply2(i);
    ...
  }
}
```
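The hybrid layout above amounts to a simple slot-assignment rule. A hypothetical sketch (not the PR's code; `FLAT_LIMIT`, `ARRAY_CHUNK`, and `slot_for` are illustrative names, and the 500 threshold is meant to be configurable):

```python
# Sketch of the hybrid slot assignment: the first FLAT_LIMIT variables
# become flat fields, the rest become array elements, with each array
# capped at ARRAY_CHUNK so indices stay within the sipush range.
FLAT_LIMIT = 500
ARRAY_CHUNK = 32767

def slot_for(i):
    """Map the i-th (0-based) global variable to a generated-code reference."""
    if i < FLAT_LIMIT:
        return "globalVars%d" % (i + 1)
    j = i - FLAT_LIMIT
    return "globalArrays%d[%d]" % (j // ARRAY_CHUNK + 1, j % ARRAY_CHUNK)
```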
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518

I created and ran another synthetic benchmark program comparing flat global variables, inner global variables, and an array. In summary, the following are the performance results (**smaller is better**).

- 0.65: flat global variables
- 0.91: inner global variables
- 1: array

WDYT? Any comments are very appreciated. Here are [Test.java](https://gist.github.com/kiszk/63c2829488cb777d7ca78d45d20c021f) and [myInstance.py](https://gist.github.com/kiszk/049a62f5d1259481c400a86299bd0228) that I used.

```
$ cat /proc/cpuinfo | grep "model name" | uniq
model name : Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
$ python myInstance.py > MyInstance.java && javac Test.java && java Test
Result(us): Array
0: 462848.969 1: 461978.693 2: 463174.459 3: 461422.763 4: 460563.915 5: 460112.262 6: 460059.957 7: 460376.230 8: 460245.445 9: 460308.775
10: 460154.955 11: 460005.629 12: 460330.584 13: 460277.612 14: 460181.360 15: 460168.843 16: 459790.137 17: 460248.481 18: 460344.471 19: 460084.529
20: 459987.263 21: 459961.639 22: 459952.447 23: 460128.518 24: 460025.783 25: 459874.303 26: 459932.685 27: 460065.736 28: 459954.526 29: 459972.679
BEST: 459790.137000, AVG: 460417.788
Result(us): InnerVars
0: 421013.480 1: 420279.235 2: 419366.157 3: 421015.934 4: 419540.049 5: 420316.650 6: 419816.612 7: 420211.140 8: 420215.864 9: 421104.657
10: 421836.430 11: 420866.894 12: 421457.850 13: 421734.506 14: 420796.010 15: 419832.910 16: 420012.167 17: 420821.800 18: 420962.178 19: 421981.676
20: 421721.257 21: 419996.594 22: 419742.884 23: 420158.066 24: 420156.773 25: 420325.231 26: 420966.914 27: 420787.147 28: 420296.789 29: 420520.843
BEST: 419366.157, AVG: 420595.157
Result(us): Vars
0: 343490.797 1: 342849.079 2: 341990.967 3: 342844.044 4: 343484.681 5: 342586.419 6: 342468.883 7: 343113.300 8: 343516.875 9: 343002.395
10: 341499.538 11: 342192.102 12: 341847.383 13: 342533.215 14: 341376.556 15: 342018.111 16: 341316.445 17: 342043.378 18: 341969.932 19: 343415.854
20: 343103.133 21: 342084.686 22: 341555.293 23: 342984.355 24: 342302.336 25: 341994.372 26: 342475.639 27: 342281.214 28: 342205.175 29: 342462.032
BEST: 341316.445, AVG: 342433.606
```
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518

can we use `32767` as the array size upper bound? e.g.

```
class Foo {
  int[] globalVars1 = new int[32767];
  int[] globalVars2 = new int[32767];
  int[] globalVars3 = new int[32767];
  ...

  void apply0(InternalRow i) {
    globalVars1[0] = 1;
    globalVars1[1] = 1;
    ...
  }

  void apply1(InternalRow i) {
    globalVars2[0] = 1;
    globalVars2[1] = 1;
    ...
  }

  void apply(InternalRow i) {
    apply0(i);
    apply1(i);
    ...
  }
}
```
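This pure-array chunking reduces to an index split. A sketch with a hypothetical `chunk_ref` helper (capping each array at 32767 keeps every in-array index at most 32766, which still fits in a sipush immediate and so needs no constant-pool entry):

```python
CHUNK = 32767  # max per-array size; largest index used is 32766

def chunk_ref(i):
    """Map the i-th (0-based) global to an array reference under pure chunking."""
    return "globalVars%d[%d]" % (i // CHUNK + 1, i % CHUNK)
```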
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19518

@kiszk thanks for your great analysis. May I ask just a couple of additional questions?

1 - In all your tests, which compiler are you using? I see that you are linking to the Oracle docs, so maybe you are using `javac` for your tests, but in my tests (made for other cases) I realized that `janinoc` works a bit differently, and what is true for `javac` may not be for `janinoc`.

2 - If the problem with the array occurs when we go beyond 32767, what about creating many arrays with a max size of 32767? I see that this is not a definitive solution and we still have some limitations, but dividing the number of constant pool entries by 32767 looks like a very good achievement to me.
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518

> what do you mean by this? using array can't reduce constant pool size at all?

Not at all. However, when the array index of an access is greater than 32767, the access requires a constant pool entry. This is because an integer constant of 32768 or greater uses the `ldc` Java bytecode instruction [[ref]](https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.ldc)[[ref]](https://cs.au.dk/~mis/dOvs/jvmspec/ref-_ldc.html).
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518

> Each access to an array element requires one constant pool entry.

what do you mean by this? using an array can't reduce constant pool size at all?
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518

@cloud-fan you are right, I am updating the benchmark program and results. I realized one issue with the array approach: **limitation of constant pool entries due to the array access index**. When we use an array approach, a global variable will be accessed by `this.globalVar[12345]`. Here is the bytecode sequence. Each access to an array element (whose index is greater than 5, since `iconst_0` ... `iconst_5` do not use a constant pool entry) requires one constant pool entry. So, while we remove one constant pool entry for the global variable, we require one for the index. @bdrillard did your implementation (probably around [here](https://github.com/apache/spark/pull/19518/files#diff-8bcc5aea39c73d4bf38aef6f6951d42cR218)) avoid this issue? WDYT? cc @viirya @maropu

```
aload 0                        // load this
getfield [constant pool index] // load this.globalVar
ldc      [constant pool index] // load 12345 from constant pool and push it
iaload
```
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518

You are comparing array vs member variables; can we compare array vs inner class member variables? Also, too many classes will have overhead on the classloader; we should test some extreme cases, like 1 million variables.
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19518

I'd prefer inner class approach.
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518

@bdrillard @cloud-fan @maropu I created and ran a benchmark program. I think that using an array for compaction is slower than using scalar instance variables: in my case below, 20% slower in the best time. Thus, I would like to use an approach that creates inner classes to keep scalar instance variables. WDYT? Any comments are very appreciated. Here are [Test.java](https://gist.github.com/kiszk/63c2829488cb777d7ca78d45d20c021f) and [myInstance.py](https://gist.github.com/kiszk/049a62f5d1259481c400a86299bd0228) that I used.

```
$ cat /proc/cpuinfo | grep "model name" | uniq
model name : Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
$ python myInstance.py > MyInstance.java && javac Test.java && java Test
Result(us): Array
0: 145333.227 1: 144288.262 2: 144233.871 3: 144536.350 4: 144503.269 5: 144836.117 6: 18.053 7: 144744.725 8: 144688.652 9: 144727.823
10: 17.789 11: 144500.638 12: 144641.592 13: 144464.106 14: 144518.914 15: 144844.639 16: 144780.464 17: 144617.363 18: 144463.271 19: 144508.170
20: 144929.451 21: 144529.697 22: 144273.167 23: 144362.926 24: 144296.854 25: 144398.665 26: 144490.813 27: 144435.732 28: 144675.997 29: 144483.581
BEST: 144233.871000, AVG: 144566.806
Result(us): Vars
0: 120375.384 1: 119800.238 2: 119822.842 3: 119830.761 4: 119836.781 5: 120185.751 6: 120208.140 7: 120274.925 8: 120112.109 9: 120082.120
10: 120063.456 11: 120112.493 12: 120144.937 13: 119964.356 14: 119941.633 15: 119825.758 16: 119677.506 17: 119833.236 18: 119749.781 19: 119723.932
20: 120197.394 21: 120052.820 22: 120006.650 23: 119939.335 24: 119857.469 25: 120176.229 26: 120153.605 27: 120345.581 28: 120163.129 29: 120038.673
BEST: 119677.506, AVG: 120016.567
```

Small MyInstance.java (N = 16, M = 4)

```
class MyInstance {
  final int N = 16;
  int[] instance = new int[N];

  void accessArrays0() {
    instance[8] = instance[0];
    instance[9] = instance[1];
    instance[10] = instance[2];
    instance[11] = instance[3];
  }
  void accessArrays1() {
    instance[12] = instance[4];
    instance[13] = instance[5];
    instance[14] = instance[6];
    instance[15] = instance[7];
  }
  void accessArrays2() {
    instance[0] = instance[8];
    instance[1] = instance[9];
    instance[2] = instance[10];
    instance[3] = instance[11];
  }
  void accessArrays3() {
    instance[4] = instance[12];
    instance[5] = instance[13];
    instance[6] = instance[14];
    instance[7] = instance[15];
  }
  void accessArray() {
    accessArrays0();
    accessArrays1();
    accessArrays2();
    accessArrays3();
  }

  int instance0;
  int instance1;
  int instance2;
  int instance3;
  int instance4;
  int instance5;
  int instance6;
  int instance7;
  int instance8;
  int instance9;
  int instance00010;
  int instance00011;
  int instance00012;
  int instance00013;
  int instance00014;
  int instance00015;

  void accessVars0() {
    instance8 = instance0;
    instance9 = instance1;
    instance00010 = instance2;
    instance00011 = instance3;
  }
  void accessVars1() {
    instance00012 = instance4;
    instance00013 = instance5;
    instance00014 = instance6;
    instance00015 = instance7;
  }
  void accessVars2() {
    instance0 = instance8;
    instance1 = instance9;
    instance2 = instance00010;
    instance3 = instance00011;
  }
  void accessVars3() {
    instance4 = instance00012;
    instance5 = instance00013;
    instance6 = instance00014;
    instance7 = instance00015;
  }
  void accessVars() {
    accessVars0();
    accessVars1();
    accessVars2();
    accessVars3();
  }
}
```
[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518 @bdrillard thanks!
Github user bdrillard commented on the issue: https://github.com/apache/spark/pull/19518 Thanks for giving this the attention needed to shepherd it through. I haven't had the time in the last few weeks to do the additional coding work necessary to benchmark it properly. @kiszk, if any questions come up regarding my earlier implementation as you make/review the second PR, I'm happy to clarify and can respond quickly in writing.
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 OK, I will create a new PR
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518 let's create a new PR
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 ping @cloud-fan
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 @cloud-fan Is it better to use this PR, or to create a new one?
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19518 yea, ok @kiszk I'll review your work.
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 @cloud-fan I want to take this over if possible. cc @maropu
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518 @kiszk @maropu do either of you want to take this over? This patch has become important now that we split code more aggressively.
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19518 ping @bdrillard
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19518 @bdrillard since my PR and others got merged, there are now some conflicts; could you please fix them? Thanks.
Github user bdrillard commented on the issue: https://github.com/apache/spark/pull/19518 @kiszk Ah, thanks for the link back to that discussion. I'll make modifications to the trials for better data.
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 @bdrillard I remember that we had a similar discussion about benchmarking. Could you look at [this discussion](https://github.com/apache/spark/pull/16648#discussion_r118043056)?
Github user bdrillard commented on the issue: https://github.com/apache/spark/pull/19518 @kiszk You are correct that the current implementation compacts all mutable state (where the state does not have to be _explicitly_ inlined).

To your last question, I attempted some analysis of the JVM bytecode of array versus inlined state initialized either through method calls or in loops. I posted the experiment and results: https://github.com/bdrillard/bytecode-poc If Spark has its own benchmarking tools, I'd be happy to use those to compare Catalyst-generated classes further.

On the general question of _when_ we compact state, I think some kind of threshold still makes sense. It would be best to ensure that the typical code path (for typical Dataset schemas) remains unaffected by the changes (as was the aim when generating nested classes in #18075). I've found that trying to set a global threshold for when to compact mutable state can be hard. Some state _has_ to be inlined (state that uses parameterized constructors that can't easily be initialized in loops, like the `BufferHolder` and `UnsafeRowWriter`). I've found situations where, due to code-generator flow, we began by inlining an amount of state that _could have been_ compacted, then started compacting state after a set threshold, but then began inlining state again that _could not be_ compacted, forcing us over the constant-pool limit.

It's difficult to tell whether a given piece of state will be referenced frequently or infrequently. For example, we do know that some pieces of primitive mutable state, like global booleans that are part of conditional checks, are declared globally, assigned once in one method, and then referenced only once in a separate caller method. These are excellent candidates for compaction, since they proliferate very quickly and are, in a sense, "only used once" (declared, initialized, re-assigned in a method, accessed in another method, never used again).
Other pieces of state, like row objects and JavaBean objects, will be accessed a number of times relative to how many fields they have, which isn't necessarily easy information to retrieve during code generation (we'd have to reflect on or inspect the initialization code to know how many fields such an object has). But these items are probably still good candidates for compaction in general, because of how many instances of a given type there can be.

I'm inclined to use a threshold against the names/types of the state, rather than a global threshold. Since `freshName` is always monotonically increasing from 1 for a given variable prefix, we would know when the threshold for state of that type is reached, and could begin compacting that type of state independently of (and concurrently with) the other types of state. Such a scheme would keep the usual flow of code generation as it is now, with no state compaction for typical operations, and apply state compaction only in the more extreme cases that would threaten to blow past the constant-pool limit.
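[Editor's note] The per-prefix threshold idea can be sketched roughly as below. This is a hypothetical illustration, not the actual Catalyst API: the class `MutableStatePool`, method `addState`, and the threshold value are all invented for the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of name-prefix-based state compaction: the first
// COMPACT_THRESHOLD states of a given prefix become plain instance fields,
// and later ones are folded into a shared per-prefix array.
public class MutableStatePool {
    static final int COMPACT_THRESHOLD = 100;  // illustrative value

    private final Map<String, Integer> countPerPrefix = new HashMap<>();

    /** Returns the Java reference to use for a fresh piece of mutable state. */
    public String addState(String prefix) {
        // freshName-style counter: monotonically increasing per prefix
        int n = countPerPrefix.merge(prefix, 1, Integer::sum);
        if (n <= COMPACT_THRESHOLD) {
            // typical case: an ordinary instance variable, e.g. "isNull_1"
            return prefix + "_" + n;
        }
        // extreme case: a slot in this prefix's array, e.g. "isNullArr[0]"
        return prefix + "Arr[" + (n - COMPACT_THRESHOLD - 1) + "]";
    }
}
```

Each prefix crosses its threshold independently, so a schema with many `isNull` flags can start compacting those while, say, `value` states are still being inlined, matching the "independently/concurrently" behavior described above.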
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19518 Thank you for creating a PR against the latest Spark. I think it is great to reduce the number of constant-pool entries. I have one high-level comment. IIUC, this PR **always** performs mutable state compaction; in other words, all mutable states live in arrays. I am afraid of possible performance degradation due to the increased access cost of keeping states in arrays. What do you think about putting mutable states into arrays (i.e. performing mutable state compaction) only when there are many mutable states, or only for certain mutable states that are rarely accessed? Or can we say there is no performance degradation due to mutable state compaction? What do you think?
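[Editor's note] For readers unfamiliar with the trade-off being discussed, the two generated-code shapes look roughly like this. This is a simplified, hypothetical illustration (the class and field names are invented), not actual Catalyst output:

```java
// Without compaction: each piece of mutable state is its own field, and
// each field name/descriptor contributes its own constant-pool entries.
class GeneratedWithoutCompaction {
    boolean isNull_0;
    int value_0;

    int eval() { return isNull_0 ? 0 : value_0; }
}

// With compaction: many states share a few array fields, so far fewer
// constant-pool entries are needed, but every access pays an extra
// array load (and a bounds check) on top of the field load.
class GeneratedWithCompaction {
    final boolean[] boolStates = new boolean[16];
    final int[] intStates = new int[16];

    int eval() { return boolStates[0] ? 0 : intStates[0]; }
}
```

The question raised in the comment above is whether to pay that per-access cost unconditionally, or only once the number of states (or their access frequency) makes the constant-pool savings worth it.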
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19518 Can one of the admins verify this patch?