Re: Running out of memory when using filter?

2008-12-14 Thread Mark Engelberg
On Sat, Dec 13, 2008 at 5:51 AM, Rich Hickey richhic...@gmail.com wrote: No you can't, for the same reasons you can't for Iterator or Enumeration seqs. Again it comes down to abstractions, and the abstraction for (seq x) is one on persistent collections. It presumes that (seq x) is

Re: Running out of memory when using filter?

2008-12-13 Thread Rich Hickey
On Dec 13, 2008, at 2:18 AM, Mark Engelberg wrote: On Fri, Dec 12, 2008 at 9:28 PM, Rich Hickey richhic...@gmail.com wrote: I think it's very important not to conflate different notions of sequences. Clojure's model a very specific abstraction, the Lisp list, originally implemented

Re: Running out of memory when using filter?

2008-12-12 Thread Mark Engelberg
On Fri, Dec 12, 2008 at 6:37 AM, Rich Hickey richhic...@gmail.com wrote: I appreciate the time you and others have spent on this, and will improve filter, but I'm not sure where you are getting your presumptions about lazy sequences. They are not a magic bullet that makes working with data

Re: Running out of memory when using filter?

2008-12-12 Thread Randall R Schulz
On Friday 12 December 2008 15:15, Mark Engelberg wrote: ... --Mark Not being nearly sophisticated enough in Clojure, FP or the relevant concepts to say anything other than that all makes complete sense to me, I wonder only what would be the impact on existing programs were the default to

Re: Running out of memory when using filter?

2008-12-12 Thread Rich Hickey
On Dec 12, 6:15 pm, Mark Engelberg mark.engelb...@gmail.com wrote: On Fri, Dec 12, 2008 at 6:37 AM, Rich Hickey richhic...@gmail.com wrote: I appreciate the time you and others have spent on this, and will improve filter, but I'm not sure where you are getting your presumptions about

Re: Running out of memory when using filter?

2008-12-12 Thread Mark Engelberg
On Fri, Dec 12, 2008 at 5:28 PM, Paul Mooser taron...@gmail.com wrote: On Dec 12, 3:15 pm, Mark Engelberg mark.engelb...@gmail.com wrote: And in fact, it turns out that in those languages, uncached lazy lists end up rarely used. Did you mean that the cached lazy lists are rarely used? Or

Re: Running out of memory when using filter?

2008-12-12 Thread Mark Engelberg
On Fri, Dec 12, 2008 at 9:28 PM, Rich Hickey richhic...@gmail.com wrote: I think it's very important not to conflate different notions of sequences. Clojure's model a very specific abstraction, the Lisp list, originally implemented as a singly-linked list of cons cells. It is a persistent

Re: Running out of memory when using filter?

2008-12-11 Thread Mark Engelberg
On Mon, Dec 8, 2008 at 6:51 PM, Rich Hickey richhic...@gmail.com wrote: I don't have the latest build of Clojure with atoms, so I reimplemented Rich's filter solution using refs, turning: (defn filter [pred coll] (let [sa (atom (seq coll)) step (fn step []

Re: Running out of memory when using filter?

2008-12-08 Thread Mark Engelberg
Has anyone made progress on this bug? The simplest form of the bug was this: (defn splode [n] (doseq [i (filter #(= % 20) (map inc (range n)))])) This blows the heap, but it shouldn't. I find this deeply troubling, because if this doesn't work, it undermines my faith in the implementation
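For readers who want to try the reproduction case quoted above, here is a minimal sketch of it laid out on separate lines. The function body is the one from the post; the small argument in the call is an assumption chosen only so the example finishes quickly. The thread reports heap exhaustion for large n on the December 2008 builds, and this sketch makes no claim about how any other build behaves.

```clojure
;; Reproduction case from the thread, reformatted.
;; doseq realizes each element for effect and discards it, so nothing
;; in this function deliberately holds on to the head of the sequence.
(defn splode [n]
  (doseq [i (filter #(= % 20) (map inc (range n)))]))

;; Illustrative call with a small n (an assumption; the thread used
;; much larger values to exhaust the heap).
(splode 1000)
```

With an empty body, doseq returns nil, so the call is useful only for watching memory behaviour.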

Re: Running out of memory when using filter?

2008-12-08 Thread Paul Mooser
Is there a place we should file an official bug, so that Rich and the rest of the clojure people are aware of it? I imagine that they may have read this thread, but I'm not sure if there is an official process to make sure these things get addressed. As I said in a previous reply, it's not clear

Re: Running out of memory when using filter?

2008-12-08 Thread MikeM
I share your concern about the LazyCons problem - hopefully Rich and others are looking into this. I have continued to experiment to see if I can gain some understanding that might help with a solution. The following is something I thought of today, and I'd like to see if others get the same

Re: Running out of memory when using filter?

2008-12-08 Thread Stephen C. Gilardi
On Dec 8, 2008, at 7:45 PM, Mark Engelberg wrote: I have an idea to try, but I'm not set up to build the java sources on my computer, so maybe someone else can run with it: This looked very promising to me. For one thing, I remembered that the root of the big chain of LazyCons objects in

Re: Running out of memory when using filter?

2008-12-08 Thread Stephen C. Gilardi
On Dec 8, 2008, at 8:40 PM, Stephen C. Gilardi wrote: This looked very promising to me. For one thing, I remembered that the root of the big chain of LazyCons objects in memory (as displayed by the YourKit profiler) was f. Now it's r and is listed as a stack local. --Steve

Re: Running out of memory when using filter?

2008-12-08 Thread Stephen C. Gilardi
I think I finally see the problem. The rest expression in filter's call to lazy-cons has a reference to coll in it. That's all it takes for coll to be retained during the entire calculation of the rest. (defn filter Returns a lazy seq of the items in coll for which (pred item) returns
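To make the shape of the problem concrete, here is a rough sketch of a filter whose tail expression mentions the input sequence. It is not the definition quoted in the message, which is cut off above. The name filter-sketch is hypothetical, and the sketch is written with lazy-seq rather than lazy-cons because lazy-cons is not available in current Clojure releases.

```clojure
;; Hypothetical stand-in, NOT the 2008 clojure.core source.
(defn filter-sketch [pred coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (if (pred (first s))
        ;; The tail expression refers to s. In the lazy-cons form
        ;; discussed in the thread, the rest expression was delayed in
        ;; a closure, and that closure closed over coll in this way.
        (cons (first s) (filter-sketch pred (rest s)))
        ;; Non-matching element: move on to the rest of the input.
        (filter-sketch pred (rest s))))))

(take 3 (filter-sketch even? (range 10))) ; => (0 2 4)
```

The point of the sketch is only to show where the reference to the input lives; whether that reference is retained during evaluation of the tail depends on the compiler and runtime in use.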

Re: Running out of memory when using filter?

2008-12-08 Thread Mark Engelberg
On Mon, Dec 8, 2008 at 5:56 PM, Stephen C. Gilardi [EMAIL PROTECTED] wrote: I think I finally see the problem. The rest expression in filter's call to lazy-cons has a reference to coll in it. That's all it takes for coll to be retained during the entire calculation of the rest. Well, I had

Re: Running out of memory when using filter?

2008-12-08 Thread Rich Hickey
On Dec 8, 2008, at 8:56 PM, Stephen C. Gilardi wrote: I think I finally see the problem. The rest expression in filter's call to lazy-cons has a reference to coll in it. That's all it takes for coll to be retained during the entire calculation of the rest. (defn filter Returns a

Re: Running out of memory when using filter?

2008-12-07 Thread rzeze...@gmail.com
On Dec 7, 1:52 am, Chouser [EMAIL PROTECTED] wrote: On Sun, Dec 7, 2008 at 1:16 AM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I'm also running into, what I believe to be, the same problem. Every time I run the following code I get java.lang.OutOfMemoryError: Java heap space. (use

Re: Running out of memory when using filter?

2008-12-06 Thread Chouser
On Sat, Dec 6, 2008 at 4:52 AM, Paul Mooser [EMAIL PROTECTED] wrote: Ok, even after the change precisely as you described, I still get the same result. The following object is a GC root of a graph worth several hundreds of megabytes of memory, consisting basically of a giant chain of

Re: Running out of memory when using filter?

2008-12-06 Thread MikeM
A while back I posted an experimental patch to capture each form that is compiled into a fn. This might be useful in your case, where you have identified a function but don't have details on it. Here's the thread: http://groups.google.com/group/clojure/browse_frm/thread/e63e48a71935e31a?q= Also,

Re: Running out of memory when using filter?

2008-12-06 Thread Paul Mooser
I can, and what it has is the elements you might expect for something defined in the context of filter: coll, pred. coll is a lazy-cons, and is the start of a chain of a larger number of lazy conses. One thing that is interesting is that it does not appear to be the head of the sequence - it's

Re: Running out of memory when using filter?

2008-12-06 Thread Paul Mooser
On Dec 6, 5:45 am, Chouser [EMAIL PROTECTED] wrote: This may not be worth much, but can you see the data members of that object? It's not itself the head of a cons chain, presumably, so I'm wondering if the data member that *is* at the head has a useful name. I can, and it has the elements

Re: Running out of memory when using filter?

2008-12-06 Thread Paul Mooser
On Dec 6, 9:37 am, MikeM [EMAIL PROTECTED] wrote: A while back I posted an experimental patch to capture each form that is compiled into a fn. This might be useful in your case, where you have identified a function but don't have details on it. Here's the

Re: Running out of memory when using filter?

2008-12-06 Thread Stephen C. Gilardi
On Dec 6, 2008, at 6:21 PM, Paul Mooser wrote: (defn splode [index-path] (with-local-vars [doc-count 0] (doseq [document (filter my-filter-pred (document-seq index-path))] (var-set doc-count (inc @doc-count))) 'done)) The fn in question is likely the one in the macro expansion

Re: Running out of memory when using filter?

2008-12-06 Thread Paul Mooser
On Dec 6, 4:35 pm, Stephen C. Gilardi [EMAIL PROTECTED] wrote: The fn in question is likely the one in the macro expansion of the lazy-cons inside filter. As a next step, I would try replacing the call to doseq with the equivalent loop/recur: That's a good idea, Steve - I didn't totally
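The suggestion to replace doseq with loop/recur can be sketched as follows. This is not the code either poster wrote, since both messages are cut off here. count-matching is a hypothetical name, and the sketch uses next, which is an assumption about the Clojure version in use.

```clojure
;; Hypothetical illustration of walking a filtered seq with loop/recur
;; instead of doseq. Each iteration rebinds s to the remainder, so the
;; loop itself does not keep a reference to earlier elements.
(defn count-matching [pred coll]
  (loop [s (seq (filter pred coll))
         n 0]
    (if s
      (recur (next s) (inc n))
      n)))

(count-matching even? (range 10)) ; => 5
```

Counting in the loop's own binding also avoids the with-local-vars and var-set bookkeeping used in the original splode.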

Re: Running out of memory when using filter?

2008-12-06 Thread Stephen C. Gilardi
On Dec 6, 2008, at 8:28 PM, Paul Mooser wrote: That's a good idea, Steve - I didn't totally understand the code you included, but I do always forget that clojure has destructuring in its binding forms, so I rewrote it like this, which I believe should be fine (correct me if I am wrong):

Re: Running out of memory when using filter?

2008-12-06 Thread MikeM
I think I've been able to duplicate your results with a simpler example. I first tried this: (defn splode [n] (with-local-vars [doc-count 0] (doseq [document (filter #(= % 1) (range n))] (var-set doc-count (inc @doc-count))) 'done)) and it doesn't blow up with (splode

Re: Running out of memory when using filter?

2008-12-06 Thread Stephen C. Gilardi
This also fails: (defn splode [n] (doseq [document (filter #(= % 20) (map inc (range n)))])) Looking at the heap dump, I see that the first item for which the filter returns true is the root of the chain of lazy conses that's being kept. The filter is constructing a lazy-cons from the

Re: Running out of memory when using filter?

2008-12-06 Thread Paul Mooser
You reproduced it for sure -- On Dec 6, 5:58 pm, MikeM [EMAIL PROTECTED] wrote: Next, I tried this (since your app filters the seq from a map): (defn splode2 [n] (with-local-vars [doc-count 0] (doseq [document (filter #(= % 1) (map inc (range n)))] (var-set doc-count (inc

Re: Running out of memory when using filter?

2008-12-06 Thread Stephen C. Gilardi
On Dec 6, 2008, at 9:48 PM, Paul Mooser wrote: I also saw your subsequent example which uses a different anonymous function which does NOT blow up, and that's very interesting. I'm not sure why this would be, but it seems that filter ends up holding on to the collection it's filtering

Re: Running out of memory when using filter?

2008-12-06 Thread Paul Mooser
I think I understand. The lazy-cons that filter is constructing maintains a reference to the whole coll in its tail so that it can evaluate (rest coll) when it is forced. Hmmm!

Re: Running out of memory when using filter?

2008-12-06 Thread Stephen C. Gilardi
On Dec 6, 2008, at 10:27 PM, Paul Mooser wrote: I think I understand. The lazy-cons that filter is constructing maintains a reference to the whole coll in its tail so that it can evaluate (rest coll) when it is forced. Hmmm! In the current implementation of filter, coll is held the entire

Re: Running out of memory when using filter?

2008-12-06 Thread Mark Engelberg
Except your version of filter doesn't do any filtering on the rest in the case where the first satisfies the predicate. On Sat, Dec 6, 2008 at 7:43 PM, Stephen C. Gilardi [EMAIL PROTECTED] wrote: If you use a definition of filter like this in your test, I think it will succeed: (defn

Re: Running out of memory when using filter?

2008-12-06 Thread Stephen C. Gilardi
On Dec 6, 2008, at 10:43 PM, Stephen C. Gilardi wrote: The following separation into two functions appears to solve it. I'll be looking at simplifying it. Well it doesn't run out of memory, but it's not an implementation of filter either... ah well. --Steve

Re: Running out of memory when using filter?

2008-12-06 Thread Stephen C. Gilardi
On Dec 6, 2008, at 10:52 PM, Mark Engelberg wrote: Except your version of filter doesn't do any filtering on the rest in the case where the first satisfies the predicate. Right. It's looking to me now like this may have to be solved in Java. I don't see how to write this in Clojure without

Re: Running out of memory when using filter?

2008-12-06 Thread Mark Engelberg
Well, part of the puzzle is to figure out why filter works just fine on the output of the range function, but not on the output of the map function. I'm starting to wonder whether there might be a fundamental bug in the java implementation of LazyCons. Maybe it doesn't implement first

Re: Running out of memory when using filter?

2008-12-06 Thread puzzler
OK, I see what you're saying now. Range doesn't cause problems because it's not coded in a way that links to a bunch of other cells. So it's plausible that the problem is the way filter hangs on to the collection while generating the rest (since it's not a tail-recursive call in that case).

Re: Running out of memory when using filter?

2008-12-06 Thread Paul Mooser
On Dec 6, 8:38 pm, puzzler [EMAIL PROTECTED] wrote: Maybe LazyCons shouldn't cache. Make LazyCons something that executes its function every time. For most things, it's not a problem because sequences are often traversed only once. If a person wants to cache it for multiple traversals, he

Re: Running out of memory when using filter?

2008-12-06 Thread [EMAIL PROTECTED]
I'm also running into, what I believe to be, the same problem. Every time I run the following code I get java.lang.OutOfMemoryError: Java heap space. (use 'clojure.contrib.duck-streams) (count (line-seq (reader "big.csv"))) If I change count to dorun then it will return without problem. I
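As a rough illustration of counting lines without building up a reference to the whole sequence, the sketch below walks a line-seq with reduce. It swaps in clojure.java.io for the clojure.contrib.duck-streams library used in the post, which is an assumption about the reader's setup. count-lines and the temporary file are hypothetical stand-ins for the poster's code and big.csv.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical small file standing in for big.csv.
(def tmp (java.io.File/createTempFile "lines" ".csv"))
(spit tmp "a,1\nb,2\nc,3\n")

;; Count lines by folding over the line-seq. The only sequence
;; reference is the argument to reduce, and with-open closes the
;; reader once the count has been computed.
(defn count-lines [path]
  (with-open [r (io/reader path)]
    (reduce (fn [n _] (inc n)) 0 (line-seq r))))

(count-lines tmp) ; => 3
```

This does not explain why count failed where dorun succeeded in the 2008 build; it only shows one way to write the traversal so that the counting code itself keeps no reference to earlier lines.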

Re: Running out of memory when using filter?

2008-12-06 Thread Chouser
On Sun, Dec 7, 2008 at 1:16 AM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: I'm also running into, what I believe to be, the same problem. Every time I run the following code I get java.lang.OutOfMemoryError: Java heap space. (use 'clojure.contrib.duck-streams) (count (line-seq (reader

Re: Running out of memory when using filter?

2008-12-05 Thread Paul Mooser
I'm continuing to try to suss this out, so I decided to run with a memory profiler. I'm seeing tens of thousands of lazy conses accounting for hundreds of megabytes of memory, which perhaps implies I'm holding on to a reference to them somewhere, but I just don't see how, since as I showed above,

Re: Running out of memory when using filter?

2008-12-05 Thread Paul Mooser
My operating theory was that the anonymous function being used by filter was closing over both parameters to the enclosing function, but making a simple modification to avoid that didn't seem to address the problem.

Re: Running out of memory when using filter?

2008-12-05 Thread Paul Mooser
The memory profiler says that the following object is a GC root which is holding onto the collection being passed into the filter call: clojure.core$filter__3364$fn__3367 I'm not familiar enough with clojure's internals to speculate about what that means, beyond what I've already mentioned

Re: Running out of memory when using filter?

2008-12-05 Thread Stuart Sierra
On Dec 5, 4:59 pm, Paul Mooser [EMAIL PROTECTED] wrote: The memory profiler says that the following object is a GC root which is holding onto the collection being passed into the filter call: clojure.core$filter__3364$fn__3367 That class should be the instance of the anonymous fn you

Running out of memory when using filter?

2008-12-04 Thread Paul Mooser
I've been writing a few different functions that return and operate on some very long sequences. Everything I've been doing has been written in terms of underlying operations that are described as lazy, so I assumed everything would be fine as long as I don't retain the head of any sequences.

Re: Running out of memory when using filter?

2008-12-04 Thread Stuart Sierra
On Dec 4, 6:20 pm, Paul Mooser [EMAIL PROTECTED] wrote: However, I'm running out of heap space using the following function to filter my sequences: ... I know that filter is lazy, and I see that the documentation of lazy-cons says that lazy-conses are cached. Cached for what duration? Does