Hi Akash,

Please check this Stack Overflow thread:

https://stackoverflow.com/questions/41098953/codegen-grows-beyond-64-kb-error-when-normalizing-large-pyspark-dataframe
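
The usual workaround discussed there is to keep any single generated `processNext()` method under the JVM's 64 KB bytecode limit by periodically truncating the query lineage. A minimal PySpark sketch — the input path, column names, and batch size of 50 are all hypothetical placeholders, not from your pipeline:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
# Checkpointing needs a checkpoint directory (hypothetical path).
spark.sparkContext.setCheckpointDir("/path/to/checkpoints")

df = spark.read.parquet("/path/to/input")  # hypothetical input

# Derive many columns, but checkpoint every 50 so codegen restarts
# from a fresh, small plan instead of one huge generated method.
for i in range(200):
    df = df.withColumn(f"feat_{i}", F.col("some_col") * i)  # hypothetical feature
    if (i + 1) % 50 == 0:
        # checkpoint() materializes the data and breaks the lineage.
        df = df.checkpoint(eager=True)
```

Alternatively you can disable whole-stage codegen with `--conf spark.sql.codegen.wholeStage=false`, trading some performance for avoiding the compile attempt. Note the ERROR is non-fatal in any case: Spark falls back to the interpreted execution path when compilation fails.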

Regards,
Vaquar Khan

On Sat, Jun 16, 2018 at 3:27 PM, Aakash Basu <[email protected]>
wrote:

> Hi guys,
>
> I'm getting an error while feature-engineering on 30+ columns to create
> about 200+ columns. It is not failing the job, but the ERROR shows. I want
> to know how I can avoid this.
>
> Spark - 2.3.1
> Python - 3.6
>
> Cluster Config -
> 1 Master - 32 GB RAM, 16 Cores
> 4 Slaves - 16 GB RAM, 8 Cores
>
>
> Input data - 8 partitions of parquet file with snappy compression.
>
> My spark-submit ->
> spark-submit --master spark://192.168.60.20:7077 --num-executors 4
> --executor-cores 5 --executor-memory 10G --driver-cores 5
> --driver-memory 25G --conf spark.sql.shuffle.partitions=60
> --conf spark.driver.maxResultSize=2G
> --conf "spark.executor.extraJavaOptions=-XX:+UseParallelGC"
> --conf spark.scheduler.listenerbus.eventqueue.capacity=20000
> --conf spark.sql.codegen=true
> /appdata/bblite-codebase/pipeline_data_test_run.py
> > /appdata/bblite-data/logs/log_10_iter_pipeline_8_partitions_33_col.txt
>
> Stack-Trace below -
>
> ERROR CodeGenerator:91 - failed to compile: org.codehaus.janino.InternalCompilerException:
>> Compiling "GeneratedClass": Code of method "processNext()V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3426" grows beyond 64 KB
>> org.codehaus.janino.InternalCompilerException: Compiling "GeneratedClass": Code of method "processNext()V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3426" grows beyond 64 KB
>>     at org.codehaus.janino.UnitCompiler.compileUnit(UnitCompiler.java:361)
>>     at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:234)
>>     at org.codehaus.janino.SimpleCompiler.compileToClassLoader(SimpleCompiler.java:446)
>>     at org.codehaus.janino.ClassBodyEvaluator.compileToClass(ClassBodyEvaluator.java:313)
>>     at org.codehaus.janino.ClassBodyEvaluator.cook(ClassBodyEvaluator.java:235)
>>     at org.codehaus.janino.SimpleCompiler.cook(SimpleCompiler.java:204)
>>     at org.codehaus.commons.compiler.Cookable.cook(Cookable.java:80)
>>     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1417)
>>     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1493)
>>     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1490)
>>     at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
>>     at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
>>     at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>>     at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
>>     at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
>>     at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>>     at org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>>     at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1365)
>>     at org.apache.spark.sql.execution.WholeStageCodegenExec.liftedTree1$1(WholeStageCodegenExec.scala:579)
>>     at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:578)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>>     at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:92)
>>     at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:128)
>>     at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119)
>>     at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
>>     at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>>     at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:371)
>>     at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:121)
>>     at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>>     at org.apache.spark.sql.execution.joins.SortMergeJoinExec.doExecute(SortMergeJoinExec.scala:150)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>>     at org.apache.spark.sql.execution.ProjectExec.doExecute(basicPhysicalOperators.scala:70)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>>     at org.apache.spark.sql.execution.joins.SortMergeJoinExec.doExecute(SortMergeJoinExec.scala:150)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>>     at org.apache.spark.sql.execution.ProjectExec.doExecute(basicPhysicalOperators.scala:70)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>>     at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>>     at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>>     at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>>     at org.apache.spark.sql.execution.columnar.InMemoryRelation.buildBuffers(InMemoryRelation.scala:107)
>>     at org.apache.spark.sql.execution.columnar.InMemoryRelation.<init>(InMemoryRelation.scala:102)
>>     at org.apache.spark.sql.execution.columnar.InMemoryRelation$.apply(InMemoryRelation.scala:43)
>>     at org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:97)
>>     at org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:67)
>>     at org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:91)
>>     at org.apache.spark.sql.Dataset.persist(Dataset.scala:2924)
>>     at sun.reflect.GeneratedMethodAccessor78.invoke(Unknown Source)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:498)
>>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>>     at py4j.Gateway.invoke(Gateway.java:282)
>>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>     at py4j.GatewayConnection.run(GatewayConnection.java:238)
>>     at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.codehaus.janino.InternalCompilerException: Code of method "processNext()V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3426" grows beyond 64 KB
>>
>
> Thanks,
> Aakash.
>



-- 
Regards,
Vaquar Khan
+1 -224-436-0783
Greater Chicago
