Re: Very long pause/hang at end of execution

2016-11-16 Thread Michael Johnson
On Wed, Nov 16, 2016 at 10:44 AM Aniket Bhatnagar wrote:
> Thanks for sharing the thread dumps. I had a look at them and couldn't find
> anything unusual. Is there anything in the logs (driver + executor) that
> suggests what's going on? Also, what does the Spark job do, and which
> versions of Spark and Hadoop are you using?

I haven't seen anything in the logs; when I observed it happening before, in 
local mode, the last output before the hang would be a log statement from my 
code (that is, I had a log4j logger and was calling info() on that logger). 
That was also the last line of my main() function. Then, I saw no more output, 
neither from the driver nor the executors. I have seen the pause be as short as 
a few minutes, or approaching an hour. As far as I can tell, when it continues, 
the log statements look more or less normal.
Locally, I'm using Spark 2.0.1 built for Hadoop 2.7, but without installing 
Hadoop. Remotely, I'm running on Google Cloud Dataproc, which also uses Spark 
2.0.1, along with Hadoop 2.7.3. I've had it happen both locally and remotely.
The job loads data from a text file (using SparkContext.textFile()), and then 
splits each line and converts it into an array of integers. From there, I do 
some sketching (the data encodes either a tree, a graph, or text, and I create 
a fixed-length sketch that probabilistically produces similar results for 
similar nodes in the tree/graph). I then do some lightweight clustering on the 
sketches, and save the cluster assignments to a text file.
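For readers following along, the parsing and sketching steps described above might look roughly like this in plain Java. This is only an illustrative sketch: the whitespace delimiter, the `SketchExample` class, its method names, and the choice of a MinHash-style sketch are all assumptions, since the thread does not say which sketching method the job actually uses.

```java
import java.util.Arrays;

// Hypothetical illustration; not the code from the job discussed in this thread.
final class SketchExample {

    // Splits a whitespace-delimited line (the delimiter is an assumption) into ints.
    static int[] parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        return Arrays.stream(trimmed.split("\\s+"))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    // Scrambles a value with a per-slot seed; the constants are arbitrary odd multipliers.
    static int mix(int value, int seed) {
        int h = (value ^ seed) * 0x9E3779B1;
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        return h;
    }

    // MinHash-style sketch: slot i keeps the smallest mixed value seen for seed i + 1,
    // so the result depends only on the set of values, not on their order.
    static int[] minHashSketch(int[] values, int length) {
        int[] sketch = new int[length];
        Arrays.fill(sketch, Integer.MAX_VALUE);
        for (int i = 0; i < length; i++) {
            for (int v : values) {
                sketch[i] = Math.min(sketch[i], mix(v, i + 1));
            }
        }
        return sketch;
    }

    // Fraction of slots on which two equal-length sketches agree.
    static double similarity(int[] a, int[] b) {
        int same = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) {
                same++;
            }
        }
        return (double) same / a.length;
    }

    public static void main(String[] args) {
        int[] x = minHashSketch(parseLine("1 2 3 4 5 6 7 8"), 64);
        int[] y = minHashSketch(parseLine("1 2 3 4 5 6 7 9"), 64);
        System.out.println("estimated similarity: " + similarity(x, y));
    }
}
```

Inputs that share most of their values should agree on most slots, which is the "similar results for similar nodes" property mentioned above; the real job may achieve it quite differently.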
For what it's worth, when I look at the GC stats from the UI, they seem a bit 
high (they can be as high as 1 minute GC for a 15 minute run). However, those 
stats do not change during the pause period.
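To put those numbers in proportion: 1 minute of GC in a 15-minute run is roughly 6.7% of the run time. A minimal Java sketch for doing the same check from inside a JVM is below. `GcShare` and its method names are hypothetical, while `GarbageCollectorMXBean` is the standard JDK interface. The Spark UI reports GC time per task and per executor, so this is only a rough analogue of that figure.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration only.
final class GcShare {

    // Fraction of wall-clock time spent in GC, e.g. 60 s of GC in a 900 s run.
    static double gcFraction(long gcMillis, long wallMillis) {
        if (wallMillis <= 0) {
            throw new IllegalArgumentException("wallMillis must be positive");
        }
        return (double) gcMillis / wallMillis;
    }

    // Sums the cumulative collection time reported by every collector in this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("reported case, GC fraction: " + gcFraction(60_000L, 900_000L));
        System.out.println("this JVM so far, GC millis: " + totalGcMillis());
    }
}
```

If `totalGcMillis()` is sampled twice during a pause and does not grow, no collections completed between the two samples, which matches the observation that the stats stay flat.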

On Wed, Nov 16, 2016 at 2:48 AM Aniket Bhatnagar wrote:
> Also, how are you launching the application? Through spark-submit, or by
> creating a Spark context in your app?


I'm calling spark-submit, and then within my app I call 
SparkContext.getOrCreate() to get a context. I then call sc.textFile() to load 
my data into an RDD, and then perform various actions on that. I tried adding a 
call to sc.stop() at the very end, after seeing some discussion that that might 
be necessary, but it didn't seem to make a difference.
The strange thing is that this behavior comes and goes. I tried opening the UI, 
as Pietro suggested, but that didn't seem to trigger it for me; I haven't 
figured out what, if anything, will make it happen every time.

On Wednesday, November 16, 2016 4:41 AM, Pietro Pugni wrote:

> I have the same issue with Spark 2.0.1, Java 1.8.x and pyspark. I also use
> SparkSQL and JDBC. My application runs locally. It happens only if I connect
> to the UI during Spark execution, and it still happens even if I close the
> browser before the execution ends. I observed this behaviour on both macOS
> Sierra and Red Hat 6.7.

It is interesting that you are seeing this too. I can't get it to happen by 
using the UI, but I am also having difficulty making it happen at all right 
now. (I'm only trying locally at the moment.)
   

Re: Very long pause/hang at end of execution

2016-11-16 Thread Aniket Bhatnagar
Also, how are you launching the application? Through spark-submit, or by
creating a Spark context in your app?

Thanks,
Aniket

On Wed, Nov 16, 2016 at 10:44 AM Aniket Bhatnagar <
aniket.bhatna...@gmail.com> wrote:

> Thanks for sharing the thread dumps. I had a look at them and couldn't find
> anything unusual. Is there anything in the logs (driver + executor) that
> suggests what's going on? Also, what does the Spark job do, and which
> versions of Spark and Hadoop are you using?
>
> Thanks,
> Aniket
>
>
> On Wed, Nov 16, 2016 at 2:07 AM Michael Johnson 
> wrote:
>
> The extremely long hang/pause has started happening again. I've been
> running on a small remote cluster, so I used the UI to grab thread dumps
> rather than doing it from the command line. There seems to be one executor
> still alive, along with the driver; I grabbed 4 thread dumps from each, a
> couple of seconds apart. I'd greatly appreciate any help tracking down
> what's going on! (I've attached them, but I can paste them somewhere if
> that's more convenient.)
>
> Thanks,
> Michael
>
>
>
>
> On Sunday, November 6, 2016 10:49 PM, Michael Johnson
>  wrote:
>
>
> Hm. Something must have changed, as it was happening quite consistently
> and now I can't get it to reproduce. Thank you for the offer, and if it
> happens again I will try grabbing thread dumps and I will see if I can
> figure out what is going on.
>
>
> On Sunday, November 6, 2016 10:02 AM, Aniket Bhatnagar <
> aniket.bhatna...@gmail.com> wrote:
>
>
> I doubt it's GC as you mentioned that the pause is several minutes. Since
> it's reproducible in local mode, can you run the spark application locally
> and once your job is complete (and application appears paused), can you
> take 5 thread dumps (using jstack or jcmd on the local spark JVM process)
> with 1 second delay between each dump and attach them? I can take a look.
>
> Thanks,
> Aniket
>
> On Sun, Nov 6, 2016 at 2:21 PM Michael Johnson 
> wrote:
>
> Thanks; I tried looking at the thread dumps for the driver and the one
> executor that had that option in the UI, but I'm afraid I don't know how to
> interpret what I saw...  I don't think it could be my code directly, since
> at this point my code has all completed? Could GC be taking that long?
>
> (I could also try grabbing the thread dumps and pasting them here, if that
> would help?)
>
> On Sunday, November 6, 2016 8:36 AM, Aniket Bhatnagar <
> aniket.bhatna...@gmail.com> wrote:
>
>
> In order to know what's going on, you can study the thread dumps either
> from spark UI or from any other thread dump analysis tool.
>
> Thanks,
> Aniket
>
> On Sun, Nov 6, 2016 at 1:31 PM Michael Johnson
>  wrote:
>
> I'm doing some processing and then clustering of a small dataset (~150
> MB). Everything seems to work fine, until the end; the last few lines of my
> program are log statements, but after printing those, nothing seems to
> happen for a long time...many minutes; I'm not usually patient enough to
> let it go, but I think one time when I did just wait, it took over an hour
> (and did eventually exit on its own). Any ideas on what's happening, or how
> to troubleshoot?
>
> (This happens both when running locally, using the localhost mode, as well
> as on a small cluster with four 4-processor nodes each with 15GB of RAM; in
> both cases the executors have 2GB+ of RAM, and none of the inputs/outputs
> on any of the stages is more than 75 MB...)
>
> Thanks,
> Michael


Re: Very long pause/hang at end of execution

2016-11-16 Thread Aniket Bhatnagar
Thanks for sharing the thread dumps. I had a look at them and couldn't find
anything unusual. Is there anything in the logs (driver + executor) that
suggests what's going on? Also, what does the Spark job do, and which
versions of Spark and Hadoop are you using?

Thanks,
Aniket



Re: Very long pause/hang at end of execution

2016-11-16 Thread Pietro Pugni
I have the same issue with Spark 2.0.1, Java 1.8.x and pyspark. I also use
SparkSQL and JDBC. My application runs locally. It happens only if I
connect to the UI during Spark execution, and it still happens even if I
close the browser before the execution ends. I observed this behaviour on
both macOS Sierra and Red Hat 6.7.



Re: Very long pause/hang at end of execution

2016-11-06 Thread Michael Johnson
Hm. Something must have changed, as it was happening quite consistently and now 
I can't get it to reproduce. Thank you for the offer, and if it happens again I 
will try grabbing thread dumps and I will see if I can figure out what is going 
on. 


Re: Very long pause/hang at end of execution

2016-11-06 Thread Gourav Sengupta
Hi,

If your process finishes only after a lag, please check whether you are
writing by converting to pandas, using coalesce (in which case all traffic
is directed to a single node), or writing to S3, in which case there can
be lags.

Regards,
Gourav



Re: Very long pause/hang at end of execution

2016-11-06 Thread Aniket Bhatnagar
I doubt it's GC, since you mentioned that the pause lasts several minutes.
Since it's reproducible in local mode, can you run the Spark application
locally and, once your job is complete (and the application appears paused),
take 5 thread dumps (using jstack or jcmd on the local Spark JVM process)
with a 1-second delay between each dump and attach them? I can take a look.
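
A minimal sketch of how those five dumps could be captured automatically, assuming `jstack` is on the PATH and the target JVM's process id is passed as the first argument. The `RepeatedJstack` class and the output file names are made up for illustration; `jcmd <pid> Thread.print` could be substituted for `jstack <pid>`.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
final class RepeatedJstack {

    // Command line for one dump of the given process.
    static List<String> command(long pid) {
        return Arrays.asList("jstack", Long.toString(pid));
    }

    // Output file for the index-th dump (the naming scheme is an assumption).
    static String fileName(long pid, int index) {
        return "threaddump-" + pid + "-" + index + ".txt";
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        long pid = Long.parseLong(args[0]);
        int dumps = 5;
        for (int i = 1; i <= dumps; i++) {
            Process process = new ProcessBuilder(command(pid))
                    .redirectErrorStream(true)                 // keep jstack errors in the same file
                    .redirectOutput(new File(fileName(pid, i)))
                    .start();
            process.waitFor();
            if (i < dumps) {
                Thread.sleep(1000);                            // 1-second delay between dumps
            }
        }
    }
}
```

Comparing the same thread across consecutive files shows whether it is stuck at one frame or making progress.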

Thanks,
Aniket



Re: Very long pause/hang at end of execution

2016-11-06 Thread Michael Johnson
Thanks; I tried looking at the thread dumps for the driver and the one executor 
that had that option in the UI, but I'm afraid I don't know how to interpret 
what I saw... I don't think it could be my code directly, since at this point 
all of my code has completed. Could GC be taking that long?

(I could also try grabbing the thread dumps and pasting them here, if that 
would help.)


Re: Very long pause/hang at end of execution

2016-11-06 Thread Aniket Bhatnagar
To find out what's going on, you can study the thread dumps, either from
the Spark UI or with any other thread dump analysis tool.
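
As one possible starting point when reading such a dump: a JVM does not exit on its own while any non-daemon thread is still alive, so it can help to list those threads first. The sketch below does this from inside the JVM using only standard JDK calls. The `LiveThreads` class name is an assumption for illustration, and nothing in this thread establishes that a lingering thread is the cause here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
final class LiveThreads {

    // Names of all live threads that are not daemon threads.
    static List<String> nonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            if (!thread.isDaemon()) {
                names.add(thread.getName());
            }
        }
        return names;
    }

    // One-line summary: thread name, state, and the top stack frame if there is one.
    static String describe(Thread thread, StackTraceElement[] stack) {
        String top = stack.length > 0 ? stack[0].toString() : "(no frames)";
        return thread.getName() + " [" + thread.getState() + "] at " + top;
    }

    public static void main(String[] args) {
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            if (!entry.getKey().isDaemon()) {
                System.out.println(describe(entry.getKey(), entry.getValue()));
            }
        }
    }
}
```

A non-daemon thread that shows up in every dump, parked at the same frame, is a reasonable first thing to look into.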

Thanks,
Aniket
