Re: [Help] Codegen Stage grows beyond 64 KB

2018-06-20 Thread Kazuaki Ishizaki
2018/06/21 01:29 Subject: Re: [Help] Codegen Stage grows beyond 64 KB Hi Kazuaki, It would be really difficult to produce a small standalone program to reproduce this problem, because I'm running a big feature-engineering pipeline in which I derive a lot of variables based on the pr

Re: [Help] Codegen Stage grows beyond 64 KB

2018-06-20 Thread Aakash Basu
2018/06/18 01:57 Subject: Re: [Help] Codegen Stage grows beyond 64 KB Totally agreed with Eyal. The problem is that when Java programs generated using Catalyst from programs using DataFrame and Dataset are compi

Re: [Help] Codegen Stage grows beyond 64 KB

2018-06-20 Thread Kazuaki Ishizaki
that the community will address this problem. Best regards, Kazuaki Ishizaki From: vaquar khan To: Eyal Zituny Cc: Aakash Basu, user Date: 2018/06/18 01:57 Subject: Re: [Help] Codegen Stage grows beyond 64 KB Totally agreed with Eyal. The problem is that when Java

Re: [Help] Codegen Stage grows beyond 64 KB

2018-06-17 Thread vaquar khan
Totally agreed with Eyal. The problem is that when the Java programs that Catalyst generates from DataFrame and Dataset programs are compiled into Java bytecode, the bytecode of a single method must be smaller than 64 KB. Exceeding that size conflicts with the limitation of the Java class file, which is

Re: [Help] Codegen Stage grows beyond 64 KB

2018-06-17 Thread Eyal Zituny
Hi Akash, such errors can appear in large Spark pipelines; the root cause is a 64 KB JVM limitation. The reason your job isn't failing in the end is Spark's fallback: if code generation fails, Spark will try to build the flow without codegen (less optimized) if you do not
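The trade-off described above (generated code versus a slower, non-generated plan) can also be chosen up front. As a minimal sketch, and not necessarily what the truncated message goes on to suggest, one option is to switch whole-stage code generation off so that the oversized method is never produced, at some cost in speed. `spark.sql.codegen.wholeStage` is an internal Spark SQL setting, so treat its exact behaviour on Spark 2.3.1 as an assumption to verify; `CODEGEN_CONF` and `to_submit_args` are hypothetical names used only for illustration.

```python
# Hypothetical helper: renders Spark settings as spark-submit arguments.
# Assumption: disabling whole-stage codegen avoids the 64 KB error for this job;
# verify the key and its effect against your Spark version before relying on it.
CODEGEN_CONF = {
    "spark.sql.codegen.wholeStage": "false",
}


def to_submit_args(conf):
    """Turn a {key: value} dict into a flat list of --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    # Prints the flags to append to a spark-submit command line.
    print(" ".join(to_submit_args(CODEGEN_CONF)))
```

The same key and value can instead be passed to `SparkSession.builder.config(key, value)` when the session is created inside the application.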

Re: [Help] Codegen Stage grows beyond 64 KB

2018-06-16 Thread Aakash Basu
Hi, I already went through it; that's one use case. I have a complex and very big pipeline of multiple jobs under one Spark session. I don't see how to solve this, as it is happening with the Logistic Regression and Random Forest models, which I'm just using from the Spark ML package rather than doing

Re: [Help] Codegen Stage grows beyond 64 KB

2018-06-16 Thread vaquar khan
Hi Akash, Please check Stack Overflow: https://stackoverflow.com/questions/41098953/codegen-grows-beyond-64-kb-error-when-normalizing-large-pyspark-dataframe Regards, Vaquar khan On Sat, Jun 16, 2018 at 3:27 PM, Aakash Basu wrote: > Hi guys, > > I'm getting an error when I'm feature

[Help] Codegen Stage grows beyond 64 KB

2018-06-16 Thread Aakash Basu
Hi guys, I'm getting an error when I'm feature engineering on 30+ columns to create about 200+ columns. It is not failing the job, but the ERROR shows. I want to know how I can avoid this. Spark: 2.3.1. Python: 3.6. Cluster config: 1 master (32 GB RAM, 16 cores), 4 slaves (16 GB RAM, 8 cores)