Re: Streaming a big file

2015-05-05 Thread Fluid Dynamics
On Tuesday, May 5, 2015 at 11:18:56 PM UTC-4, Sam Raker wrote:

 I've got two really big CSV files that I need to compare. Stay tuned for 
 the semi-inevitable "how do I optimize over this M x N space?" question, 
 but for now I'm still trying to get the data into a reasonable format--I'm 
 planning on converting each line into a map, with keys coming from either 
 the first line of the file, or a separate list I was given. Non-lazy 
 approaches run into memory limitations; lazy approaches run into "Stream 
 closed" exceptions while trying to coordinate `with-open` and `line-seq`. 
 Given that memory is already tight, I'd like to avoid leaving open 
 files/file descriptors/readers/whatever-the-term-in-clojure-is lying 
 around. I've tried writing a macro, I've tried transducers, I've tried 
 passing around the open reader along with the lazy seq, none successfully, 
 albeit none necessarily particularly well. Any suggestions on streaming 
 such big files?


Something like this didn't work?

(with-open [rdr1 ...
            rdr2 ...]
  (let [l1 (line-seq rdr1)
        l2 (line-seq rdr2)]
    (->> (map something l1 l2)
         (filter whatever)
         (first))))

For instance, to check whether two text files are the same, `something` would 
be `not=` and `whatever` would be `identity`; the result would be nil if every 
pair of lines matched, and `true` otherwise. (`map` stops at the end of the 
shorter sequence, so a file that is a strict prefix of the other would also 
give nil.) The call to `first` short-circuits as soon as the result is known, 
and neither line-seq's head should be held. It also ensures that the 
`with-open` scope is not left until as much of both line-seqs has been 
consumed as will be needed. So would `reduce`, or transducers/reducers, as 
long as the reduction actually runs inside the `with-open`.
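
A minimal runnable sketch of that pattern, in case it helps. The function
names (`first-mismatch`, `parse-row`, `count-matching-rows`) are made up for
illustration, and the CSV parsing is deliberately naive: it assumes a
non-empty file with a header row and no quoted fields or embedded commas.
The point is only that every lazy seq is fully consumed before `with-open`
closes the reader.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn first-mismatch
  "Returns the first pair of differing lines as [line-a line-b], or nil
  if no paired lines differ. `first` forces the lazy pipeline while both
  readers are still open, and the returned vector holds plain strings."
  [path-a path-b]
  (with-open [rdr-a (io/reader path-a)
              rdr-b (io/reader path-b)]
    (->> (map vector (line-seq rdr-a) (line-seq rdr-b))
         (filter (fn [[a b]] (not= a b)))
         (first))))

(defn parse-row
  "Hypothetical helper: turns one CSV line into a map keyed by `header`.
  Assumes no quoted fields or embedded commas."
  [header line]
  (zipmap header (str/split line #",")))

(defn count-matching-rows
  "Streams the file at `path`, converts each data line into a map keyed
  by the header row, and counts the rows for which `pred` is truthy.
  `reduce` is eager, so it finishes before the reader is closed.
  Assumes the file has at least a header line."
  [path pred]
  (with-open [rdr (io/reader path)]
    (let [lines  (line-seq rdr)
          header (mapv keyword (str/split (first lines) #","))]
      (reduce (fn [n line]
                (if (pred (parse-row header line)) (inc n) n))
              0
              (rest lines)))))
```

Something like `(count-matching-rows "a.csv" #(= "bob" (:name %)))` then 
returns a plain number, so nothing lazy escapes the `with-open`. Swapping 
`keyword` for a separately supplied key list would only change how `header` 
is built.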

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Streaming a big file

2015-05-05 Thread Sam Raker
I've got two really big CSV files that I need to compare. Stay tuned for 
the semi-inevitable "how do I optimize over this M x N space?" question, 
but for now I'm still trying to get the data into a reasonable format--I'm 
planning on converting each line into a map, with keys coming from either 
the first line of the file, or a separate list I was given. Non-lazy 
approaches run into memory limitations; lazy approaches run into "Stream 
closed" exceptions while trying to coordinate `with-open` and `line-seq`. 
Given that memory is already tight, I'd like to avoid leaving open 
files/file descriptors/readers/whatever-the-term-in-clojure-is lying 
around. I've tried writing a macro, I've tried transducers, I've tried 
passing around the open reader along with the lazy seq, none successfully, 
albeit none necessarily particularly well. Any suggestions on streaming 
such big files?



Thanks!



Re: Streaming a big file

2015-05-05 Thread Alan Busby
I wrote a library ( https://github.com/thebusby/iota ) to handle a very
similar issue, which I hope would be of some help to you here.

On Wed, May 6, 2015 at 12:18 PM, Sam Raker sam.ra...@gmail.com wrote:

 I've got two really big CSV files that I need to compare. Stay tuned for
 the semi-inevitable "how do I optimize over this M x N space?" question,
 but for now I'm still trying to get the data into a reasonable format--I'm
 planning on converting each line into a map, with keys coming from either
 the first line of the file, or a separate list I was given. Non-lazy
 approaches run into memory limitations; lazy approaches run into "Stream
 closed" exceptions while trying to coordinate `with-open` and `line-seq`.
 Given that memory is already tight, I'd like to avoid leaving open
 files/file descriptors/readers/whatever-the-term-in-clojure-is lying
 around. I've tried writing a macro, I've tried transducers, I've tried
 passing around the open reader along with the lazy seq, none successfully,
 albeit none necessarily particularly well. Any suggestions on streaming
 such big files?



 Thanks!


