[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
[ https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592000#comment-15592000 ]

Aleksander Eskilson commented on SPARK-17131:

Yeah, that makes sense. So far, what I documented and this one seem to have been the only JIRAs that exhibit specifically the Constant Pool limit error. I'm trying to dig deeper into it to see if it really marks its own class of error, but given that SPARK-17702 didn't resolve the error case I posted (even though it splits up sections of large generated code), I do suspect they are closely related but ultimately different issues.

I think the splitExpressions technique that was used in SPARK-17702, and that also appears to be employed in SPARK-16845, could be useful for the range of different classes that can generate too many lines of code. Seeing the issues linked together is definitely useful. To that end, I'll leave mine resolved as a duplicate of SPARK-16845 for now until I can make use of the patch it develops, so we can see more conclusively whether they're related issues or truly duplicates. And I'll link the two "0x" issues together as related.
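To make the splitting idea concrete, here is a minimal sketch in plain Scala. It is not Spark's splitExpressions implementation; the object name `SplitSketch`, the `maxChars` threshold, and the `apply_N` helper naming are all hypothetical and chosen only for illustration. It packs generated statements into several small helper methods plus one driver method that calls them in order.

```scala
// Hypothetical illustration only; Spark's real code generator differs.
object SplitSketch {

  /** Greedily packs statements into groups; a group is closed once adding
    * the next statement would push its character count above `maxChars`. */
  def chunk(statements: Seq[String], maxChars: Int): Seq[Seq[String]] = {
    val groups = scala.collection.mutable.ArrayBuffer.empty[Vector[String]]
    var current = Vector.empty[String]
    var size = 0
    for (s <- statements) {
      if (current.nonEmpty && size + s.length > maxChars) {
        groups += current
        current = Vector.empty
        size = 0
      }
      current :+= s
      size += s.length
    }
    if (current.nonEmpty) groups += current
    groups.toSeq
  }

  /** Renders one private helper method per group and a public driver
    * method that invokes the helpers in their original order. */
  def render(statements: Seq[String], maxChars: Int): String = {
    val helpers = chunk(statements, maxChars).zipWithIndex.map { case (group, i) =>
      val body = group.map(line => "  " + line).mkString("\n")
      s"private void apply_$i(InternalRow row) {\n$body\n}"
    }
    val calls = helpers.indices.map(i => s"  apply_$i(row);").mkString("\n")
    val driver = s"public void apply(InternalRow row) {\n$calls\n}"
    (helpers :+ driver).mkString("\n\n")
  }
}
```

Note that every helper produced this way still belongs to the same generated class, so all of them share a single constant pool. Splitting of this kind lowers the size of each method but does not by itself reduce the number of constants in the class, which would be consistent with the observation that SPARK-17702 did not clear the constant pool error.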
> Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
>
>                 Key: SPARK-17131
>                 URL: https://issues.apache.org/jira/browse/SPARK-17131
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Iaroslav Zeigerman
>         Attachments: _SPARK_17131__add_a_test_case_with_1000_column_DF_where_describe___fails.patch
>
> When reading the CSV file that contains 1776 columns Spark and Janino fail to generate the code with message:
> {noformat}
> Constant pool has grown past JVM limit of 0x
> {noformat}
> When running a common select with all columns it's fine:
> {code}
> val allCols = df.columns.map(c => col(c).as(c + "_alias"))
> val newDf = df.select(allCols: _*)
> newDf.show()
> {code}
> But when I invoke the describe method:
> {code}
> newDf.describe(allCols: _*)
> {code}
> it fails with the following stack trace:
> {noformat}
>         at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:889)
>         at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
>         at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
>         at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>         at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>         ... 30 more
> Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM limit of 0x
>         at org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:402)
>         at org.codehaus.janino.util.ClassFile.addConstantIntegerInfo(ClassFile.java:300)
>         at org.codehaus.janino.UnitCompiler.addConstantIntegerInfo(UnitCompiler.java:10307)
>         at org.codehaus.janino.UnitCompiler.pushConstant(UnitCompiler.java:8868)
>         at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4346)
>         at org.codehaus.janino.UnitCompiler.access$7100(UnitCompiler.java:185)
>         at org.codehaus.janino.UnitCompiler$10.visitIntegerLiteral(UnitCompiler.java:3265)
>         at org.codehaus.janino.Java$IntegerLiteral.accept(Java.java:4321)
>         at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
>         at org.codehaus.janino.UnitCompiler.fakeCompile(UnitCompiler.java:2605)
>         at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4362)
>         at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3975)
>         at org.codehaus.janino.UnitCompiler.access$6900(UnitCompiler.java:185)
>         at org.codehaus.janino.UnitCompiler$10.visitMethodInvocation(UnitCompiler.java:3263)
>         at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
>         at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
>         at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4368)
>         at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2662)
>         at org.codehaus.janino.UnitCompiler.access$4400(UnitCompiler.java:185)
>         at org.codehaus.janino.UnitCompiler$7.visitMethodInvocation(UnitCompiler.java:2627)
>         at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
>         at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2654)
>         at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1643)
> {noformat}
[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
[ https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591953#comment-15591953 ]

Sean Owen commented on SPARK-17131:

OK well I think it's fine to leave one copy of the "0x" issue open if you have any reasonable reason to suspect it's different, and just link the JIRAs. I suppose I was mostly saying this could just be reopened, and separately, there are a lot of real duplicates of similar issues out there too, making it hard to figure out what the underlying unique issues are.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
[ https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591810#comment-15591810 ]

Aleksander Eskilson commented on SPARK-17131:

Sure, I apologize for that. I'll also mark it as a duplicate of SPARK-16845 and monitor its pull-request to see if it resolves the issue I opened.
[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
[ https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591067#comment-15591067 ]

Sean Owen commented on SPARK-17131:

It may or may not be, though again I suspect a common cause with one of several JIRAs. The point here is to join potentially related discussion without conflating issues. I don't think it's useful to just make another JIRA vs reopening this one, but, this seems to be a losing battle.
[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
[ https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15590024#comment-15590024 ]

Aleksander Eskilson commented on SPARK-17131:

[~sowen], [~melentye] I'm not so certain this error is the same as SPARK-16845. It seems like there have been several classes of errors all related to the sizes of individual methods growing beyond the 64 KB limit (SPARK-16845, SPARK-17702). I think this one is of a different class of error, or at least {code}Constant pool has grown past JVM limit of 0x{code} marks a different class of error. I was able to produce an error similar to the one first documented when trying to encode a Java object with a very wide and deeply nested schema. I've gone ahead and created a bug report for that, SPARK-18016, and in its description I've attached a small project that can reproduce the error.
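When triaging which of the linked JIRAs a new report belongs to, the two error classes described above can be told apart by the exception message. The following is a small sketch in plain Scala, not part of Spark or Janino. The constant pool text is the one quoted in this comment; the "grows beyond 64 KB" substring is an assumption about how the per-method errors in SPARK-16845 are worded, and the object and case names are hypothetical.

```scala
// Hypothetical triage helper; the matched substrings are assumptions
// about message wording, not an API provided by Spark or Janino.
object CodegenLimitTriage {
  sealed trait LimitKind
  case object MethodTooLarge extends LimitKind   // a single method exceeds 64 KB
  case object ConstantPoolFull extends LimitKind // a class has too many constants
  case object Other extends LimitKind

  /** Classifies a compiler error message by the limit it appears to report. */
  def classify(message: String): LimitKind =
    if (message.contains("Constant pool has grown past JVM limit")) ConstantPoolFull
    else if (message.contains("grows beyond 64 KB")) MethodTooLarge
    else Other
}
```

The distinction matters because the two limits apply at different levels: the 64 KB limit is per method, while the constant pool is per class, so a fix aimed at one may leave the other untouched.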
[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
[ https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15553101#comment-15553101 ]

Andrey Melentyev commented on SPARK-17131:

Looks similar to https://issues.apache.org/jira/browse/SPARK-17217 btw
[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
[ https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15552994#comment-15552994 ]

Andrey Melentyev commented on SPARK-17131:

I tried wrapping the attached test into "withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> "false")" - it still fails in a nasty way, printing the content of the 300K LOC generated class in a seemingly endless loop. Running the code from spark-shell with --conf spark.sql.codegen.wholeStage=false fails as well.
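For anyone repeating this check outside the test suite, the same setting can be applied when building the session. This is a minimal sketch assuming a Spark 2.x dependency on the classpath; the application name and master URL are placeholders. According to the comment above the failure persists with this setting, so treat it as a diagnostic step rather than a workaround.

```scala
import org.apache.spark.sql.SparkSession

// Placeholder app name and master; adjust for the environment in use.
val spark = SparkSession.builder()
  .appName("wide-describe-check")
  .master("local[*]")
  .config("spark.sql.codegen.wholeStage", "false") // same key as the --conf flag
  .getOrCreate()

// Confirm the setting is in effect before re-running the failing describe() call.
println(spark.conf.get("spark.sql.codegen.wholeStage"))
```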
[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
[ https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15552947#comment-15552947 ]

Sean Owen commented on SPARK-17131:

Yeah, I'm not 100% sure, though I strongly suspect a common cause. If it ends up being different, we can reopen this. I thought it might be more productive to tie them together until it's clear they're not the same, but I don't mind much either way; whatever is most helpful. Can you try disabling whole-stage codegen to see if that works around it?

> Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
>
> Key: SPARK-17131
> URL: https://issues.apache.org/jira/browse/SPARK-17131
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Iaroslav Zeigerman
> Attachments: _SPARK_17131__add_a_test_case_with_1000_column_DF_where_describe___fails.patch
>
> When reading the CSV file that contains 1776 columns Spark and Janino fail to generate the code with message:
> {noformat}
> Constant pool has grown past JVM limit of 0x
> {noformat}
> When running a common select with all columns it's fine:
> {code}
> val allCols = df.columns.map(c => col(c).as(c + "_alias"))
> val newDf = df.select(allCols: _*)
> newDf.show()
> {code}
> But when I invoke the describe method:
> {code}
> newDf.describe(allCols: _*)
> {code}
> it fails with the following stack trace:
> {noformat}
>     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:889)
>     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
>     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
>     at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>     at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>     ... 30 more
> Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM limit of 0x
>     at org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:402)
>     at org.codehaus.janino.util.ClassFile.addConstantIntegerInfo(ClassFile.java:300)
>     at org.codehaus.janino.UnitCompiler.addConstantIntegerInfo(UnitCompiler.java:10307)
>     at org.codehaus.janino.UnitCompiler.pushConstant(UnitCompiler.java:8868)
>     at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4346)
>     at org.codehaus.janino.UnitCompiler.access$7100(UnitCompiler.java:185)
>     at org.codehaus.janino.UnitCompiler$10.visitIntegerLiteral(UnitCompiler.java:3265)
>     at org.codehaus.janino.Java$IntegerLiteral.accept(Java.java:4321)
>     at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
>     at org.codehaus.janino.UnitCompiler.fakeCompile(UnitCompiler.java:2605)
>     at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4362)
>     at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3975)
>     at org.codehaus.janino.UnitCompiler.access$6900(UnitCompiler.java:185)
>     at org.codehaus.janino.UnitCompiler$10.visitMethodInvocation(UnitCompiler.java:3263)
>     at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
>     at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
>     at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4368)
>     at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2662)
>     at org.codehaus.janino.UnitCompiler.access$4400(UnitCompiler.java:185)
>     at org.codehaus.janino.UnitCompiler$7.visitMethodInvocation(UnitCompiler.java:2627)
>     at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
>     at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2654)
>     at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1643)
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
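For readers who want to try the suggestion to disable whole-stage codegen, a minimal sketch follows. It assumes a local Spark 2.x session; the application name and object name are placeholders, and `spark.sql.codegen.wholeStage` is the SQL configuration key that toggles whole-stage code generation. Nothing in this thread confirms that the setting avoids the constant pool error, so treat it purely as a diagnostic step.

```scala
import org.apache.spark.sql.SparkSession

object DisableWholeStageCodegen {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session; appName and master are placeholders.
    val spark = SparkSession.builder()
      .appName("spark-17131-workaround-check")
      .master("local[*]")
      // Turn off whole-stage code generation for this session.
      .config("spark.sql.codegen.wholeStage", "false")
      .getOrCreate()

    // The same setting can also be changed on an existing session.
    spark.conf.set("spark.sql.codegen.wholeStage", "false")

    // Print the effective value before re-running the failing describe() call.
    println(spark.conf.get("spark.sql.codegen.wholeStage"))

    spark.stop()
  }
}
```

The same key can be passed on the command line as `--conf spark.sql.codegen.wholeStage=false` when the session is created by `spark-shell` or `spark-submit`.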
[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
[ https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15552932#comment-15552932 ]

Andrey Melentyev commented on SPARK-17131:

[~srowen] are you sure it's a dup of SPARK-16845? The exceptions are a bit different. This one has
```
Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool for class org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection has grown past JVM limit of 0x
```
while SPARK-16845 says
```
Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method "(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB
```
Both are about something growing too large in the source code of a generated class, though.
[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
[ https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514085#comment-15514085 ]

Aris Vlasakakis commented on SPARK-17131:

Hi there, I discovered a bug that also pertains to code generation with many columns, although in my case the failures within Janino code generation in Catalyst start after several hundred columns. Are these somehow related?

My bug report was merged into this one: [https://issues.apache.org/jira/browse/SPARK-16845]
[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
[ https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426801#comment-15426801 ]

Iaroslav Zeigerman commented on SPARK-17131:

I get a different exception when trying to apply the mean function to all columns:
{code}
val allCols = df.columns.map(c => mean(c))
val newDf = df.select(allCols: _*)
newDf.show()
{code}
{noformat}
java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:197)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1383)
    at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:555)
    at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:518)
    at org.codehaus.janino.util.ClassFile.(ClassFile.java:185)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:914)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:912)
    at scala.collection.Iterator$class.foreach(Iterator.scala:742)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:912)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:884)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
    at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
    at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
    ...
{noformat}

> Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
>
> Key: SPARK-17131
> URL: https://issues.apache.org/jira/browse/SPARK-17131
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Iaroslav Zeigerman
>
> When reading the CSV file that contains 1776 columns Spark and Janino fail to generate the code with message:
> {noformat}
> Constant pool has grown past JVM limit of 0x
> {noformat}
> When running a common select with all columns it's fine:
> {code}
> val allCols = df.columns.map(c => col(c).as(c + "_alias"))
> val newDf = df.select(allCols: _*)
> newDf.show()
> {code}
> But when I invoke the describe method:
> {code}
> newDf.describe(allCols: _*)
> {code}
> it fails with the following stack trace:
> {noformat}
>     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:889)
>     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
>     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
>     at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>     at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>     ...
30 more > Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool has > grown past JVM limit of 0x > at > org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:402) > at > org.codehaus.janino.util.ClassFile.addConstantIntegerInfo(ClassFile.java:300) > at > org.codehaus.janino.UnitCompiler.addConstantIntegerInfo(UnitCompiler.java:10307) > at org.codehaus.janino.UnitCompiler.pushConstant(UnitCompiler.java:8868) > at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4346) > at org.codehaus.janino.UnitCompiler.access$7100(UnitCompiler.java:185) > at > org.codehaus.janino.UnitCompiler$10.visitIntegerLiteral(UnitCompiler.java:3265) > at org.codehaus.janino.Java$IntegerLiteral.accept(Java.java:4321) > at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290) > at org.codehaus.janino.UnitCompiler.fakeCompile(UnitCompiler.java:2605) > at > org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4362) > at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3975) > at org.codehaus.janino.UnitCompiler.access$6900(UnitCompiler.ja