Re:Re:Re: Re:Re: Will the HiveContext cause memory leak ?

2016-05-12 Thread
Sorry, the bug link in my previous mail was wrong.

Here is the real link:

At 2016-05-13 09:49:05, "李明伟" wrote:

It seems we have hit the same issue.

There was a memory-leak bug in 1.5.1, but I am using 1.6.1.

Here is the link to the bug in 1.5.1:

At 2016-05-12 23:10:43, "Simon Schiff [via Apache Spark User List]" wrote:
I read from a port with Spark Streaming. The incoming data consists of
key-value pairs. Then I call foreachRDD on each window. There I create a
Dataset from the window and run some SQL queries on it. On the result I only
call show, to see the content. It works well, but the memory usage keeps
increasing. When it reaches the maximum, nothing works anymore. When I give it
more memory, the program runs somewhat longer, but the problem persists.
Because I run a program that writes to the port, I can control exactly how
much data Spark has to process. When I write one key-value pair every
millisecond, the problem is the same as when I write only one key-value pair
every second.

When I don't create a Dataset in foreachRDD and only count the elements in
the RDD, everything works fine. I also use groupBy agg functions in the

View this message in context:
Sent from the Apache Spark User List mailing list archive at

Re:Re: Will the HiveContext cause memory leak ?

2016-05-11 Thread
Hi Simon

Can you describe your problem in more detail?
I suspect my problem is caused by the window function (or maybe by the
groupBy agg functions).
If you are seeing the same thing, maybe we should report a bug.

At 2016-05-11 23:46:49, "Simon Schiff [via Apache Spark User List]" wrote:
I have the same problem with a Spark 2.0.0 snapshot with Streaming. There I
use Datasets instead of DataFrames. I hope you or someone else will find a
solution.


Re:Re: Will the HiveContext cause memory leak ?

2016-05-10 Thread 李明伟
Hi Ted

Spark version: spark-1.6.0-bin-hadoop2.6
I tried increasing the executor memory, but I still have the same problem.
I can use jmap to capture something, but the output is too difficult to

At 2016-05-11 11:50:14, "Ted Yu" wrote:

Which Spark release are you using?

I assume the executor crashed due to an OOME (OutOfMemoryError).

Did you have a chance to capture jmap output on the executor before it crashed?

Have you tried giving more memory to the executor?
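As a hedged aside on the jmap output being hard to read: rather than reading one large dump, a common way to look for a leak is to take two class histograms of the executor JVM some time apart (for example with `jmap -histo:live <pid>`, where `<pid>` is the executor's process id; the `live` option counts only reachable objects and forces a full GC first) and compare them. The sketch below assumes the JDK 8 text layout of `jmap -histo`; the helper names `parse_histo` and `growth` are hypothetical, and the two sample snapshots are made up.

```python
import re

# A data row of `jmap -histo` looks like "   3:   180853   4340472  java.lang.String".
# Header, separator and "Total" lines do not match and are skipped.
# Newer JDKs append a module name after the class; it is ignored here.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histo(text):
    """Return {class name: (instances, bytes)} from `jmap -histo` output."""
    histo = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        instances, nbytes, name = int(m.group(1)), int(m.group(2)), m.group(3)
        # The same class name can appear more than once (e.g. loaded by
        # different class loaders), so sum the entries.
        old_instances, old_bytes = histo.get(name, (0, 0))
        histo[name] = (old_instances + instances, old_bytes + nbytes)
    return histo


def growth(before, after, top=10):
    """Return (byte delta, instance delta, class) for the classes that grew most."""
    deltas = []
    for name, (instances, nbytes) in after.items():
        old_instances, old_bytes = before.get(name, (0, 0))
        deltas.append((nbytes - old_bytes, instances - old_instances, name))
    deltas.sort(reverse=True)  # largest byte growth first
    return [d for d in deltas if d[0] > 0][:top]


# Made-up snapshots in the JDK 8 layout, for illustration only.
before_txt = """ num     #instances         #bytes  class name
----------------------------------------------
   1:          1000         800000  [B
   2:           500          12000  java.lang.String
Total          1500         812000
"""
after_txt = """ num     #instances         #bytes  class name
----------------------------------------------
   1:          9000        7200000  [B
   2:           600          14400  java.lang.String
   3:           200           9600  java.util.HashMap$Node
Total          9800        7224000
"""

for delta_bytes, delta_instances, name in growth(parse_histo(before_txt), parse_histo(after_txt)):
    print(f"{name}: +{delta_bytes} bytes, +{delta_instances} instances")
```

Classes whose byte count keeps rising across several snapshots are the first candidates to inspect; a single pair of snapshots can also reflect normal buffering between batches.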


On Tue, May 10, 2016 at 8:25 PM, wrote:
I submitted my code to a Spark standalone cluster and found that the memory
usage of the executor process keeps growing, which causes the program to crash.

I modified the code and submitted it several times, and found that the 4 lines
below may be causing the issue:

dataframe =
windowSpec =
rank = func.dense_rank().over(windowSpec)
ret =['router'],dataframe['interface'],dataframe['bits'],

It looks a little complicated, but it is just a window function on a
DataFrame. I use HiveContext because SQLContext does not support window
functions yet. Without these 4 lines, my code can run all night; adding them
causes the memory leak, and the program crashes within a few hours.
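To make clear what `func.dense_rank().over(windowSpec)` computes, here is a plain-Python sketch of the same ranking without Spark. The definitions of `dataframe` and `windowSpec` are elided in the archive, so this assumes, hypothetically, that the window partitions by `router` and orders by `bits` descending; `dense_rank_by_partition` is a made-up helper, not a Spark API, and the sketch says nothing about the memory behaviour itself.

```python
from itertools import groupby


def dense_rank_by_partition(rows, partition_key, order_key, descending=True):
    """Rank rows within each partition, in the style of SQL dense_rank().

    Rows with equal order_key values share a rank and ranks have no gaps
    (1, 2, 2, 3), unlike rank(), which would give 1, 2, 2, 4.
    Hypothetical helper for illustration only; not a Spark API.
    """
    ranked = []
    # Sort by the partition key so groupby sees each partition as one run.
    by_partition = sorted(rows, key=lambda r: r[partition_key])
    for _, group in groupby(by_partition, key=lambda r: r[partition_key]):
        # Order rows inside the partition (descending by default; stable for ties).
        ordered = sorted(group, key=lambda r: r[order_key], reverse=descending)
        rank, previous = 0, object()  # sentinel never equals a real value
        for row in ordered:
            if row[order_key] != previous:
                rank += 1  # new distinct value -> next consecutive rank
                previous = row[order_key]
            ranked.append((row, rank))
    return ranked


# Made-up sample rows using the column names from the mail.
rows = [
    {"router": "r1", "interface": "eth0", "bits": 500},
    {"router": "r1", "interface": "eth1", "bits": 900},
    {"router": "r1", "interface": "eth2", "bits": 500},
    {"router": "r2", "interface": "eth0", "bits": 100},
]

ranked = dense_rank_by_partition(rows, "router", "bits")
for row, rank in ranked:
    print(row["router"], row["interface"], row["bits"], rank)
# prints:
# r1 eth1 900 1
# r1 eth0 500 2
# r1 eth2 500 2
# r2 eth0 100 1
```

In PySpark 1.6 the corresponding window would be built with `pyspark.sql.Window.partitionBy(...).orderBy(...)`, which at that time required a HiveContext, as the mail notes.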

I will provide the whole code (50 lines) here.

Please advise me whether this is a bug.

Also, here is the submit command:

nohup ./bin/spark-submit  \
--master spark://ES01:7077 \
--executor-memory 4G \
--num-executors 1 \
--total-executor-cores 1 \
--conf ""  \
./ 1>a.log 2>b.log &
