[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)

2016-10-20 Thread Aleksander Eskilson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15592000#comment-15592000
 ] 

Aleksander Eskilson commented on SPARK-17131:
-

Yeah, that makes sense. So far, the issue I documented and this one seem to be 
the only JIRAs that specifically exhibit the Constant Pool limit error. 
I'm trying to dig deeper to see whether it really marks its own class of 
error, but given that SPARK-17702 didn't resolve the error case I posted (even 
though it splits up sections of large generated code), I do suspect they are 
quite related but ultimately different issues. I think the splitExpressions 
technique that was used in SPARK-17702, and that also appears to be 
employed in SPARK-16845, could be useful for the range of different classes that 
can generate too many lines of code. Seeing the issues linked together is 
definitely useful.

To that end, I'll leave mine resolved as a duplicate of SPARK-16845 for now 
until I can make use of the patch it develops, so we can see more conclusively 
whether they're related issues or truly duplicates. And I'll link the two "0x" 
issues together as related.
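
As a rough illustration of the expression-splitting idea mentioned above, the sketch below groups generated statement snippets into helper methods whenever a chunk would exceed a character budget. It is not Spark's actual implementation; the names `SplitSketch`, `chunk`, `splitIntoMethods`, `maxChunkChars` and the `apply_N` helper naming are all hypothetical, and character count is only a stand-in for bytecode size.

```scala
// Hypothetical sketch of the general idea, not Spark's own code.
object SplitSketch {
  /** Groups snippets into chunks whose combined length stays within maxChunkChars.
    * A single oversized snippet still gets a chunk of its own. */
  def chunk(snippets: Seq[String], maxChunkChars: Int): Vector[Vector[String]] =
    snippets.foldLeft(Vector.empty[Vector[String]]) { (acc, s) =>
      acc.lastOption match {
        case Some(last) if last.map(_.length).sum + s.length <= maxChunkChars =>
          acc.init :+ (last :+ s)   // still fits: extend the current chunk
        case _ =>
          acc :+ Vector(s)          // start a new chunk
      }
    }

  /** Emits one private helper method per chunk, plus a driver that calls them in order. */
  def splitIntoMethods(snippets: Seq[String], maxChunkChars: Int): String = {
    val helpers = chunk(snippets, maxChunkChars).zipWithIndex.map { case (body, i) =>
      s"private void apply_$i(Object row) {\n${body.mkString("  ", "\n  ", "")}\n}"
    }
    val calls = helpers.indices.map(i => s"  apply_$i(row);").mkString("\n")
    helpers.mkString("\n") + s"\npublic void apply(Object row) {\n$calls\n}"
  }
}
```

Splitting this way keeps each individual method small, but every helper still lives in the same generated class and therefore shares that class's constant pool, which may be why method splitting alone would not help with a constant pool error.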

> Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
>
> Key: SPARK-17131
> URL: https://issues.apache.org/jira/browse/SPARK-17131
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Iaroslav Zeigerman
> Attachments: _SPARK_17131__add_a_test_case_with_1000_column_DF_where_describe___fails.patch
>
> When reading a CSV file that contains 1776 columns, Spark and Janino fail to 
> generate the code, with the message:
> {noformat}
> Constant pool has grown past JVM limit of 0x
> {noformat}
> Running a plain select over all columns works fine:
> {code}
>   import org.apache.spark.sql.functions.col
>   // Alias every column, then select them all
>   val allCols = df.columns.map(c => col(c).as(c + "_alias"))
>   val newDf = df.select(allCols: _*)
>   newDf.show()
> {code}
> But when I invoke the describe method:
> {code}
>   // describe takes column names (String*), not Column objects
>   newDf.describe(newDf.columns: _*)
> {code}
> it fails with the following stack trace:
> {noformat}
>   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:889)
>   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
>   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
>   at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>   at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>   ... 30 more
> Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM limit of 0x
>   at org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:402)
>   at org.codehaus.janino.util.ClassFile.addConstantIntegerInfo(ClassFile.java:300)
>   at org.codehaus.janino.UnitCompiler.addConstantIntegerInfo(UnitCompiler.java:10307)
>   at org.codehaus.janino.UnitCompiler.pushConstant(UnitCompiler.java:8868)
>   at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4346)
>   at org.codehaus.janino.UnitCompiler.access$7100(UnitCompiler.java:185)
>   at org.codehaus.janino.UnitCompiler$10.visitIntegerLiteral(UnitCompiler.java:3265)
>   at org.codehaus.janino.Java$IntegerLiteral.accept(Java.java:4321)
>   at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
>   at org.codehaus.janino.UnitCompiler.fakeCompile(UnitCompiler.java:2605)
>   at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4362)
>   at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3975)
>   at org.codehaus.janino.UnitCompiler.access$6900(UnitCompiler.java:185)
>   at org.codehaus.janino.UnitCompiler$10.visitMethodInvocation(UnitCompiler.java:3263)
>   at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
>   at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
>   at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4368)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2662)
>   at org.codehaus.janino.UnitCompiler.access$4400(UnitCompiler.java:185)
>   at org.codehaus.janino.UnitCompiler$7.visitMethodInvocation(UnitCompiler.java:2627)
>   at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2654)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1643)
> 
> {noformat}

[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)

2016-10-20 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15591953#comment-15591953
 ] 

Sean Owen commented on SPARK-17131:
---

OK, well, I think it's fine to leave one copy of the "0x" issue open if you 
have any good reason to suspect it's different, and just link the JIRAs. 
I suppose I was mostly saying this could just be reopened, and separately, 
there are a lot of real duplicates of similar issues out there too, making it 
hard to figure out what the underlying unique issues are.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)

2016-10-20 Thread Aleksander Eskilson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15591810#comment-15591810
 ] 

Aleksander Eskilson commented on SPARK-17131:
-

Sure, I apologize for that. I'll also mark it as a duplicate of SPARK-16845 and 
monitor its pull-request to see if it resolves the issue I opened.




[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)

2016-10-20 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15591067#comment-15591067
 ] 

Sean Owen commented on SPARK-17131:
---

It may or may not be, though again I suspect a common cause with one of several 
JIRAs. The point here is to join potentially related discussion without 
conflating issues. I don't think it's useful to just make another JIRA rather 
than reopening this one, but this seems to be a losing battle.




[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)

2016-10-19 Thread Aleksander Eskilson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590024#comment-15590024
 ] 

Aleksander Eskilson commented on SPARK-17131:
-

[~sowen], [~melentye]
I'm not so certain this error is the same as SPARK-16845. There have been 
several classes of errors all related to the sizes of individual methods 
growing beyond the 64 KB limit (SPARK-16845, SPARK-17702). I think this one is 
a different class of error, or at least "Constant pool has grown past JVM 
limit of 0x" marks a different class of error. I was able to produce an error 
similar to the one first documented here when trying to encode a Java object 
with a very wide and deeply nested schema. I've gone ahead and created a bug 
report for that, SPARK-18016, and in its description I've attached a small 
project that can reproduce the error.
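
To make the distinction drawn above concrete, here is a small toy model of the two limits: a per-method code size cap and a per-class constant pool cap. The case classes, the `splitMethod` helper and the numbers used in any call are hypothetical, and a real constant pool also holds class, field and method references that this model ignores.

```scala
// Illustrative model only; names and structure are assumptions, not Spark or Janino code.
object JvmLimitsSketch {
  val MaxMethodBytes = 65535   // JVM class file format: a method's code_length must be < 65536
  val MaxPoolCount   = 0xFFFF  // constant_pool_count is stored in a 16-bit field

  final case class GenMethod(codeBytes: Int, constants: Set[String])
  final case class GenClass(methods: Seq[GenMethod]) {
    // The constant pool is shared by every method of the class.
    def poolSize: Int = methods.flatMap(_.constants).distinct.size
  }

  /** Reports which of the two limits a modelled class would exceed. */
  def violations(c: GenClass): Seq[String] = {
    val tooBig   = c.methods.exists(_.codeBytes > MaxMethodBytes)
    val poolFull = c.poolSize > MaxPoolCount
    Seq(
      if (tooBig) Some("method exceeds 64 KB") else None,
      if (poolFull) Some("constant pool past limit") else None
    ).flatten
  }

  /** Splits one method into n parts; the class-wide set of constants is unchanged. */
  def splitMethod(m: GenMethod, n: Int): GenClass = {
    val per    = math.max(1, math.ceil(m.constants.size.toDouble / n).toInt)
    val groups = m.constants.toSeq.grouped(per).toSeq
    GenClass(groups.map(g => GenMethod(m.codeBytes / n, g.toSet)))
  }
}
```

In this model, splitting a huge method into several smaller ones clears the method-size violation but leaves the pool-size violation in place, since the same constants are still referenced from one class. Spreading the code over several classes would be the analogous fix for the pool limit, though whether that applies to the Spark case is not established in this thread.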




[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)

2016-10-06 Thread Andrey Melentyev (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15553101#comment-15553101
 ] 

Andrey Melentyev commented on SPARK-17131:
--

Looks similar to https://issues.apache.org/jira/browse/SPARK-17217 btw




[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)

2016-10-06 Thread Andrey Melentyev (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552994#comment-15552994
 ] 

Andrey Melentyev commented on SPARK-17131:
--

I tried wrapping the attached test in 
"withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false")"; it still fails 
in a nasty way, printing the content of the 300K LOC generated class in a 
seemingly endless loop. Running the code from spark-shell with --conf 
spark.sql.codegen.wholeStage=false fails as well.
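
For anyone repeating this check outside the test suite, a minimal sketch of turning whole-stage codegen off from application code follows. It assumes Spark 2.x on the classpath; the object name, app name and local master are placeholders, and only the `spark.sql.codegen.wholeStage` key comes from the comment above.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper; the app name and master are placeholders for illustration.
object DisableWholeStage {
  def session(): SparkSession = {
    val spark = SparkSession.builder()
      .appName("wide-describe-check")
      .master("local[*]")
      .config("spark.sql.codegen.wholeStage", "false") // set at startup
      .getOrCreate()

    // The same flag can also be changed on an already running session.
    spark.conf.set("spark.sql.codegen.wholeStage", "false")
    spark
  }
}
```

As reported above, this setting did not avoid the failure here, presumably because Spark still generates code for individual expressions and projections even when whole-stage codegen is disabled.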






[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)

2016-10-06 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552947#comment-15552947
 ] 

Sean Owen commented on SPARK-17131:
---

Yeah, I'm not 100% sure, though I strongly suspect a common cause. If it ends up 
being different we can reopen this. I thought it might be more productive to tie 
them together until it's clear they're not the same, but I don't mind much 
either way, whatever is most helpful.

Can you try disabling whole stage codegen to see if that works around it?







[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)

2016-10-06 Thread Andrey Melentyev (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552932#comment-15552932
 ] 

Andrey Melentyev commented on SPARK-17131:
--

[~srowen] are you sure it's a dup of SPARK-16845? The exceptions are a bit 
different. This one has 

```
Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool for class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection
 has grown past JVM limit of 0x
```

while SPARK-16845 says

```
Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method 
"(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I"
 of class 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" 
grows beyond 64 KB
```

Both are about something growing too large in a generated class, though.







[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)

2016-09-22 Thread Aris Vlasakakis (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514085#comment-15514085
 ] 

Aris Vlasakakis commented on SPARK-17131:
-

Hi there,

I discovered a bug that also pertains to code generation with many columns, 
although in my case the failures in Janino code generation in Catalyst start 
after several hundred columns. Are these somehow related?

My bug report was merged into this one: 
[https://issues.apache.org/jira/browse/SPARK-16845]







[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)

2016-08-18 Thread Iaroslav Zeigerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426801#comment-15426801
 ] 

Iaroslav Zeigerman commented on SPARK-17131:


I get a different exception when applying the mean function to all columns:
{code}
import org.apache.spark.sql.functions.mean

// Aggregate every column of the wide DataFrame with mean()
val allCols = df.columns.map(c => mean(c))
val newDf = df.select(allCols: _*)
newDf.show()
{code}

{noformat}
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1383)
at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:555)
at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:518)
at org.codehaus.janino.util.ClassFile.<init>(ClassFile.java:185)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:914)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:912)
at scala.collection.Iterator$class.foreach(Iterator.scala:742)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:912)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:884)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
at 
org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
...
{noformat}
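
A possible workaround, sketched here as an untested assumption rather than a confirmed fix: aggregate the columns in smaller batches so that each generated class covers fewer expressions. `columnBatches` and the batch size of 200 are hypothetical, not part of Spark.

```scala
// Hypothetical helper (not part of Spark): split a wide column list into
// fixed-size batches so each aggregation touches fewer columns.
def columnBatches(columns: Seq[String], batchSize: Int): Seq[Seq[String]] = {
  require(batchSize > 0, "batchSize must be positive")
  columns.grouped(batchSize).toSeq
}

// Sketch of the Spark side, assuming `df` is the wide DataFrame from above
// and org.apache.spark.sql.functions.mean is imported:
//   val perBatch = columnBatches(df.columns, 200).map { cols =>
//     df.select(cols.map(c => mean(c)): _*).first()
//   }
```

Each element of `perBatch` would then hold the means for one batch of columns. Whether a given batch size stays under the limit would need to be checked against the actual dataset.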

> Code generation fails when running SQL expressions against a wide dataset 
> (thousands of columns)
> 
>
> Key: SPARK-17131
> URL: https://issues.apache.org/jira/browse/SPARK-17131
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Iaroslav Zeigerman
>
> When reading the CSV file that contains 1776 columns Spark and Janino fail to 
> generate the code with message:
> {noformat}
> Constant pool has grown past JVM limit of 0x
> {noformat}
> When running a common select with all columns it's fine:
> {code}
>   val allCols = df.columns.map(c => col(c).as(c + "_alias"))
>   val newDf = df.select(allCols: _*)
>   newDf.show()
> {code}
> But when I invoke the describe method:
> {code}
> newDf.describe(allCols: _*)
> {code}
> it fails with the following stack trace:
> {noformat}
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:889)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
>   at 
> org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>   at 
> org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>   ... 30 more
> Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool has 
> grown past JVM limit of 0x
>   at 
> org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:402)
>   at 
> org.codehaus.janino.util.ClassFile.addConstantIntegerInfo(ClassFile.java:300)
>   at 
> org.codehaus.janino.UnitCompiler.addConstantIntegerInfo(UnitCompiler.java:10307)
>   at org.codehaus.janino.UnitCompiler.pushConstant(UnitCompiler.java:8868)
>   at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4346)
>   at org.codehaus.janino.UnitCompiler.access$7100(UnitCompiler.java:185)
>   at 
> org.codehaus.janino.UnitCompiler$10.visitIntegerLiteral(UnitCompiler.java:3265)
>   at org.codehaus.janino.Java$IntegerLiteral.accept(Java.java:4321)
>   at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
>   at org.codehaus.janino.UnitCompiler.fakeCompile(UnitCompiler.java:2605)
>   at 
> org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4362)
>   at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3975)
>   at org.codehaus.janino.UnitCompiler.access$6900(UnitCompiler.java:185)
>