Re: "GC overhead limit exceeded": Deceptive message?

2017-08-09 Thread Nathan Smutz
Thanks @Paulus, @Gary and @Peter,

Rearranging the process to let go of the head is good advice.

I believe the problem (should I need to keep all elements in memory) may 
ultimately be lazy collections inside the maps I'm producing. 
I saved 1,917 of these elements to disk and they took only 3 megabytes, so 
the realized data itself is small.

An inner function creates a lot of lazy sequences that, I believe, close 
over large zipper structures.  
If that's the case, then I need to wrap those sequences in (doall) 
expressions or refactor to something more explicitly eager.

Best,
Nathan
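
A minimal sketch of the effect described above, assuming a large vector as a stand-in for a zipper. The names make-big-structure, lazy-summary, doall-summary and eager-summary are hypothetical and do not come from the thread. An unrealized lazy sequence keeps its mapping function, and that function keeps whatever it closed over; realizing the sequence fully (doall) or building a vector (mapv) lets go of it.

```clojure
;; Hypothetical stand-in for a large zipper structure.
(defn make-big-structure []
  (vec (range 100000)))

(defn lazy-summary
  "Returns a map whose :firsts value is an unrealized lazy seq.
  The anonymous fn closes over `big`, so the whole vector stays
  reachable for as long as the unrealized seq is reachable."
  [big]
  {:firsts (map #(nth big %) [0 1 2])})

(defn doall-summary
  "Same map, but doall walks the seq before returning it, so the
  seq no longer needs the fn (or `big`) to produce its values."
  [big]
  {:firsts (doall (map #(nth big %) [0 1 2]))})

(defn eager-summary
  "Explicitly eager variant: mapv builds a plain vector of results."
  [big]
  {:firsts (mapv #(nth big %) [0 1 2])})
```

Calling (realized? (:firsts (lazy-summary big))) is one way to check whether a value inside a result map is still pending.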






-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: "GC overhead limit exceeded": Deceptive message?

2017-08-08 Thread Paulus Esterhazy
For background on "holding onto the head of a sequence" type problems, see

https://stuartsierra.com/2015/04/26/clojure-donts-concat

and

https://stackoverflow.com/questions/15994316/clojure-head-retention



Re: "GC overhead limit exceeded": Deceptive message?

2017-08-08 Thread Gary Trakhman
@Nathan the top-level (def requirement-seq ..) is probably the thing
holding on to all the objects.  Try removing the def and calling (last
(sequence (comp ..))) and see if it returns.  The purpose of a lazy
sequence is to allow processing to happen one item or chunk at a time.  If
there are still problems, then maybe each element is too big, but that
top-level def is definitely a no-no.  I don't think transducers are
relevant here; you'd get the same problem with normal map/remove calls.
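
The suggestion above can be sketched as follows. The three steps in xf are hypothetical stand-ins for the thread's xml-zip-from-file, degree-complete? and student-and-requirements, chosen only so the example runs on its own; last-result and count-results are illustrative names. Defining the transducer at top level is harmless because it is a function, not the data; the point is that no var holds the head of the resulting sequence.

```clojure
;; Hypothetical stand-ins for the thread's pipeline steps.
(def xf
  (comp
    (map (fn [n] {:id n :payload (vec (range 100))}))  ; "parse" one input
    (remove (fn [m] (even? (:id m))))                   ; drop "complete" records
    (map (fn [m] (select-keys m [:id])))))              ; keep only what is needed

(defn last-result
  "The test proposed above: walk the whole sequence without a var
  holding its head, so elements already passed can be collected."
  [inputs]
  (last (sequence xf inputs)))

(defn count-results
  "Alternative: reduce straight to a small value, never building a seq."
  [inputs]
  (transduce xf (completing (fn [n _] (inc n))) 0 inputs))
```

If (last-result ...) finishes where the def'd version ran out of memory, head retention was the likely cause; if it still fails, the individual elements are probably too large.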



Re: "GC overhead limit exceeded": Deceptive message?

2017-08-08 Thread Nathan Smutz
The one thing I'm aware of holding on to is a filtered file-seq: 
(def the-files (filter #(s/ends-with? (.getName %) ".xml")
                       (rest (file-seq (io/file dw-path)))))
There are 7,000+ files; but I'm assuming the elements there are just 
file-references and shouldn't take much space.

The rest of the process is a transducer sequence:
(def requirement-seq (sequence 
 (comp
   (map xml-zip-from-file)
   (remove degree-complete?)
   (map student-and-requirements))
 the-files))

Those functions are admittedly space inefficient (lots of work with 
zippers); but are pure.  What comes out the other end is a sequence of 
Clojure maps.  Could holding on to the file references prevent all that 
processing effluvia from being collected?  

The original files add up to 1.3 gigs altogether.  I'd expect the gleaned 
data to be significantly smaller; but I'd better check into how close 
that's getting to the default heap-size.

Best,
Nathan
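
One way to check how close the process is getting to the heap limit, as a rough sketch: java.lang.Runtime reports the maximum, currently allocated, and free heap. The function name heap-stats and the map keys are illustrative, not from the thread.

```clojure
(defn heap-stats
  "Returns the JVM heap figures in megabytes, read from java.lang.Runtime."
  []
  (let [rt    (Runtime/getRuntime)
        mb    (fn [bytes] (quot bytes (* 1024 1024)))
        total (.totalMemory rt)   ; heap currently allocated by the JVM
        free  (.freeMemory rt)]   ; unused part of that allocation
    {:max-mb   (mb (.maxMemory rt))  ; ceiling the heap may grow to
     :total-mb (mb total)
     :used-mb  (mb (- total free))}))
```

Calling (heap-stats) at the REPL before and after realizing part of the sequence gives a rough idea of how much each element retains. The ceiling reported as :max-mb is what the standard JVM option -Xmx raises.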



Re: "GC overhead limit exceeded": Deceptive message?

2017-08-08 Thread Peter Hull

On Tuesday, 8 August 2017 06:20:56 UTC+1, Nathan Smutz wrote:

> Does this message sometimes present because the non-garbage data is 
> getting too big?
>
Yes, it's when most of your heap is non-garbage, so the GC has to keep 
running but doesn't succeed in freeing much memory each time.
See 
https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks002.html
 
You can increase the heap size, but that might only defer the problem.

As you process all your files, are you holding on to references to objects 
that you don't need any more?
