Problem with Execution plan using loop

2017-04-15 Thread Javier Rey
Hi guys, I have this situation: 1. A data frame with 22 columns. 2. I need to add some columns (feature engineering) using the existing columns; 12 columns will be added for each column in a list. 3. I created a loop, but at the 5th item (column) of the loop the join part becomes very slow. I can …

Re: Memory problems with simple ETL in Pyspark

2017-04-15 Thread ayan guha
What I missed: try increasing the number of partitions using repartition. On Sun, 16 Apr 2017 at 11:06 am, ayan guha wrote: > It does not look like a Scala vs. Python thing. How big is your audience data > store? Can it be broadcasted? > > What is the memory footprint you are …

Re: Memory problems with simple ETL in Pyspark

2017-04-15 Thread ayan guha
It does not look like a Scala vs. Python thing. How big is your audience data store? Can it be broadcasted? What is the memory footprint you are seeing? At what point is YARN killing it? Depending on that, you may want to tweak the number of partitions of the input dataset and increase the number of …

Join streams Apache Spark

2017-04-15 Thread tencas
Hi everybody, I am using Apache Spark Streaming with a TCP connector to receive data. I have a Python application that connects to a sensor and creates a TCP server that waits for a connection from Apache Spark, and then sends JSON data through this socket. How can I manage to join many independent …