[racket-users] Re: appending files
On Sunday, January 31, 2016 at 12:13:31 AM UTC-6, Scotty C wrote:
> > that's what i did. so new performance data. this is with bytes instead of
> > strings for data on the hard drive but bignums in the hash still.
> >
> > as a single large file and a hash with 203 buckets for 26.6 million
> > records the data rate is 98408/sec.
> >
> > when i split and go with 11 smaller files and hash with 59 buckets the
> > data rate is 106281/sec.
>
> hash is reworked, bytes based. same format though, vector of bytes. so time
> test results:
>
> single large file same # buckets as above data rate 175962/sec.
>
> 11 smaller files same # buckets as above data rate 205971/sec.

throughput update. i had to hand code some of the stuff (places are just not working for me) but i just managed to hack my way through running this in parallel. i copied the original 26.6 million records to a new file. ran two slightly reworked copies of my duplicate removal code at a shell prompt like this:

racket ddd-parallel.rkt & racket ddd-parallel1.rkt &

i'm not messing with the single large file anymore. so for twice the data the data rate is up to 356649/sec.

--
You received this message because you are subscribed to the Google Groups "Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
[racket-users] Re: appending files
> Yes. You probably do need to convert the files. Your original
> coding likely is not [easily] compatible with binary I/O.

that's what i did. so new performance data. this is with bytes instead of strings for data on the hard drive but bignums in the hash still.

as a single large file and a hash with 203 buckets for 26.6 million records the data rate is 98408/sec.

when i split and go with 11 smaller files and hash with 59 buckets the data rate is 106281/sec.

clearly it is quicker to read/write twice than to read/write once. but of course my laptop is pretty sad and my hash may end up being sad as well. with these revised and quicker rates, i'm ready to migrate the hash to bytes.
[racket-users] Re: appending files
just found a small mistake in the documentation: can you find it?

(numerator q) → integer?
  q : rational?
Coerces q to an exact number, finds the numerator of the number expressed in its simplest fractional form, and returns this number coerced to the exactness of q.

(denominator q) → integer?
  q : rational?
Coerces q to an exact number, finds the numerator of the number expressed in its simplest fractional form, and returns this number coerced to the exactness of q.
[racket-users] Re: appending files
> that's what i did. so new performance data. this is with bytes instead of
> strings for data on the hard drive but bignums in the hash still.
>
> as a single large file and a hash with 203 buckets for 26.6 million
> records the data rate is 98408/sec.
>
> when i split and go with 11 smaller files and hash with 59 buckets the
> data rate is 106281/sec.

hash is reworked, bytes based. same format though, vector of bytes. so time test results:

single large file same # buckets as above data rate 175962/sec.

11 smaller files same # buckets as above data rate 205971/sec.

i played around with the # buckets parameter but what worked for bignums worked for bytes too. overall speed has nearly doubled. very nice, thanks to all who contributed some ideas. and to think, all i wanted to do was paste some files together.
Re: [racket-users] Re: appending files
> > question for you all. right now i use modulo on my bignums. i know i
> > can't do that to a byte string. i'll figure something out. if any of
> > you know how to do this, can you post a method?
>
> I'm not sure what you're asking exactly.

i'm talking about getting the hash index of a key. see, my key is a bignum and i get the hash index with (modulo key 611). so i either need to turn the key (which will be a byte string) into a number and stuff that right in where i have key, or i replace modulo.
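one way this could go (a sketch, not anyone's actual code from the thread; `611` is just the bucket count mentioned above): fold over the byte string Horner-style, reducing modulo the bucket count at each step. that yields the same index as (modulo key 611) would on the bignum the bytes encode big-endian, without ever building the bignum.

```racket
#lang racket
;; sketch: hash index straight from a byte-string key, no bignum needed.
;; folding Horner-style with modulo at each step gives the same result
;; as (modulo key buckets) on the big-endian integer the bytes encode,
;; and the accumulator stays a small integer the whole time.
(define (bytes->index bs buckets)
  (for/fold ([acc 0]) ([b (in-bytes bs)])
    (modulo (+ (* acc 256) b) buckets)))

(bytes->index (bytes 1 0) 611) ; → 256, same as (modulo 256 611)
```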
Re: [racket-users] Re: appending files
> However, if you have implemented your own, you can still call
> `equal-hash-code`

yes, my own hash. i think the equal-hash-code will work.
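assuming that route, the glue might look like this (a sketch; `equal-hash-code` accepts byte strings directly, and the bucket count is a stand-in):

```racket
#lang racket
;; sketch: equal-hash-code on a byte-string key picks a bucket.
;; equal? byte strings are guaranteed to get the same code, and
;; modulo with a positive divisor keeps the index non-negative.
(define (hashfn key buckets)
  (modulo (equal-hash-code key) buckets))
```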
[racket-users] Re: appending files
> i get the feeling that i will need to read the entire file as i used to read
> it taking each record and doing the following:
> convert the string record to a bignum record
> convert the bignum record into a byte string
> write the byte string to a new data file
>
> does that seem right?

nevermind. this is indeed what i needed to do. the new file is 438.4 mb. the time to read, hash, write is now 317 seconds. processing rate is 83818/sec. the hash still uses bignums. the speed change is just from reading and writing bytes instead of strings.

the drop in file size is about 30%. the gain in speed is about 15%.
[racket-users] Re: appending files
> my plan right now is to rework my current hash so that it runs byte strings
> instead of bignums.

i have a new issue. i wrote my data as char and end records with 'return. i use (read-line x 'return) and the first record is 15 char. when i use (read-bytes-line x 'return) i get 23 bytes. i have to assume that my old assumption that an 8 bit char would write to disk as 8 bits is incorrect? from the documentation on read-char:

Reads a single character from in—which may involve reading several bytes to UTF-8-decode them into a character

i get the feeling that i will need to read the entire file as i used to read it, taking each record and doing the following:
convert the string record to a bignum record
convert the bignum record into a byte string
write the byte string to a new data file

does that seem right?
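a sketch of that one-shot conversion pass (`record->bytes` here is only a placeholder for whatever string-to-bignum-to-bytes conversion the records actually need; the real one would emit fixed-size byte records):

```racket
#lang racket
;; sketch of the one-time conversion pass: read the old char/'return
;; records, convert each one, write byte records to a new file.
;; record->bytes is a placeholder; a real version would produce
;; fixed-size records, so no separator is written between them.
(define (record->bytes rec)
  (string->bytes/utf-8 rec))

(define (convert-file! in-path out-path)
  (call-with-input-file in-path
    (lambda (in)
      (call-with-output-file out-path
        #:exists 'truncate
        (lambda (out)
          (let loop ()
            (define rec (read-line in 'return))
            (unless (eof-object? rec)
              (write-bytes (record->bytes rec) out)
              (loop))))))))
```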
[racket-users] Re: appending files
ok, had time to run my hash on my one test file.

'(611 1 1 19 24783208 4.19)

this means:
# buckets
% buckets empty
non empty bucket # keys least
non empty bucket # keys most
total number of keys
average number of keys per non empty bucket

it took 377 sec. original # records is 26570359 so 6.7% dupes. processing rate is 70478/sec.

my plan right now is to rework my current hash so that it runs byte strings instead of bignums. it will probably be tomorrow afternoon before i have more stats.

question for you all. right now i use modulo on my bignums. i know i can't do that to a byte string. i'll figure something out. if any of you know how to do this, can you post a method?
Re: [racket-users] Re: appending files
On Thursday, January 28, 2016 at 11:36:50 PM UTC-6, Brandon Thomas wrote:
> On Thu, 2016-01-28 at 20:32 -0800, Scotty C wrote:
> > > I think you understand perfectly.
> > i'm coming around
> >
> > > You said the keys are 128-bit (16 byte) values. You can store one
> > > key directly in a byte string of length 16.
> > yup
> >
> > > So instead of using a vector of pointers to individual byte
> > > strings, you would allocate a single byte string of length
> > > {buckets x chain length x 16}
> > > and index it directly as if it were a 3-dimensional array [using
> > > offset calculations as you would in C].
> > i'm going to actually run my code later (probably in the morning).
> > and then grab some stats from the hash. take a look at the stats.
> > give me some thoughts on the above implementation after looking at
> > the stats.
> >
> > > Can you explain why the bignums are important. For a simple task
> > > like filtering a file, they would seem to be both a waste of memory
> > > and a performance drain wrt storing the key values as byte strings.
> > ok, true story. about 10 years ago i'm a student taking an intro to
> > ai class. we have been assigned the 3x3 tile puzzle and 2 algorithms,
> > ida* and a*. also, try it on a 4x4. so i whip these out and am
> > appalled by how slow the fn things are but intrigued by what they can
> > do and i'm just a puzzle kind of guy. btw, the 4x4 wasn't solvable by
> > my stuff at the time. so i head to the chief's office with my little
> > bit of code. tell him it's way too slow. what do you think? he takes 5
> > seconds to peruse the item and says "you're probably making too much
> > memory". side note, later i graded this class for that prof and the
> > same project. not a single student, including the ms and phd types,
> > did anything better than a vector of vectors of ints. so back to me,
> > i'm thinking, how do i cram 4 bits together so that there is
> > absolutely no wasted space. i start digging through the documentation
> > and i find the bitwise stuff. i see arithmetic shift and having no
> > idea whether it will work or not i type into the interface
> > (arithmetic-shift 1 100). if we blow up, well we blow up big but
> > we didn't. i flipped everything in my project to bignum at that
> > point. the bignum stuff is massively faster than vector of vectors of
> > ints and faster than just a single vector of ints. lots of bignum is
> > easy to implement, like the hash. making a child state, not so much.
> > i'm not married to bignums despite all this.
> >
> > > I've been programming for over 30 years, professionally for 23.
> > i was a programmer. dot com bought us as a brick and mortar back in
> > '99 and the whole shebang blew up 2 years later. idiots. anyway, been
> > a stay at home dad for the most part since then.
> >
> > > have an MS in Computer Science
> > me too. that's the other piece of the most part from up above.
> >
> > > (current-memory-use)
> > yup, tried that a while back. didn't like what i saw. check this out:
> > > (current-memory-use)
> > 581753864
> > > (current-memory-use)
> > 586242568
> > > (current-memory-use)
> > 591181736
> > > (current-memory-use)
> > 595527064
> >
> > does that work for you?
>
> For whatever you're storing, I still suggest using a disk based
> structure (preferably using one that's already optimised and built for
> you). I've done a bit of work on cache aware algorithms, where reducing
> memory footprint is really big (along with the memory juggling). Yes,
> if you try to store something that takes only a few bits and store each
> one into an integer, you'll have wasted space. In theory, you could
> use a bignum, and only shift it as many bits as you need, which is what
> you have done. The issue with that is that bignums have extra overhead
> that's necessary for them to do arithmetic. Obviously, there needs to be
> a way to match or beat bignums with primitive structures, since bignums
> are implemented with primitive structures. So, if you want to beat
> bignum for storage, you'll want to use some contiguous memory with
> fixed sized elements (like byte strings, or arrays of uint32_t's in C)
> - but using bit manipulation on each byte, such that you have multiple
> stored values in each one, directly beside each other bitwise, like a
> bignum has, but without its overhead.
>
> Regards,
> Brandon Thomas
[racket-users] Re: appending files
> You claim you want filtering to be as fast as possible. If that were
> so, you would not pack multiple keys (or features thereof) into a
> bignum but rather would store the keys individually.

chasing pointers? no, you're thinking about doing some sort of byte-append and subbytes type of thing. only way that data in a hash would be small in memory and reasonably quick. care to elaborate?

> The 16-byte keys are 1/3 the size of even the _smallest_ bignum,

and where are you getting this? i've been digging all over the documentation and can't find a fn thing on how much space is required for any data type or its overhead. what i do is open up htop and check memory, then load drracket, a huge bignum, and recheck. close drracket, check memory, restart drracket, load a huge vector and its associated bignums and recheck. so 2 questions for you all:
1) where is this info about data type memory requirements?
2) what is a better way of checking actual memory sucked up by my data structures?

> comparing two small byte strings is faster than anything you can do
> with the bignum. With the right data structure can put a lot more
> keys into the same space you're using now and use them faster.

you knew this was coming, right? put this into your data structure of choice:

16 5 1 12 6 24 17 9 2 22 4 10 13 18 19 20 0 23 7 21 15 11 8 3 14

this is a particular 5x5 tile puzzle (#6 in www.aaai.org/Papers/AAAI/1996/AAAI96-178.pdf) with the blank in a position where 4 children can be made. make a child while extracting the value of the tile being swapped with the blank. compare child to parent for equality. repeat that for the other 3 children. time testing that won't matter because we have different hardware. if you (any of you) think you have something that will best what i'm doing (bignums), bring it on. show me something cool and fast. let me check out your approach.

i experimented a bit with the byte strings yesterday. what i'm doing with the bignums can't be done with byte strings. i'd have to rewrite just about everything i've got.
[racket-users] Re: appending files
what's been bothering me was trying to get the data into 16 bytes in a byte string of that length. i couldn't get that to work so i gave up and just shoved the data into 25 bytes. here's a bit of code. i think it's faster than my bignum stuff.

(define p (bytes 16 5 1 12 6 24 17 9 2 22 4 10 13 18 19 20 0 23 7 21 15 11 8 3 14))

(define (c1 p) ; move blank left
  (let ((x (bytes-ref p 15))  ; tile to the left of the blank
        (c (bytes-copy p)))
    (bytes-set! c 15 0)       ; blank moves into slot 15
    (bytes-set! c 16 x)       ; swapped tile moves into slot 16
    (bytes=? p c)             ; parent/child equality check from the benchmark; result unused
    c))
[racket-users] Re: appending files
> I think you understand perfectly.

i'm coming around

> You said the keys are 128-bit (16 byte) values. You can store one key
> directly in a byte string of length 16.

yup

> So instead of using a vector of pointers to individual byte strings,
> you would allocate a single byte string of length
> {buckets x chain length x 16}
> and index it directly as if it were a 3-dimensional array [using
> offset calculations as you would in C].

i'm going to actually run my code later (probably in the morning) and then grab some stats from the hash. take a look at the stats. give me some thoughts on the above implementation after looking at the stats.

> Can you explain why the bignums are important. For a simple task like
> filtering a file, they would seem to be both a waste of memory and a
> performance drain wrt storing the key values as byte strings.

ok, true story. about 10 years ago i'm a student taking an intro to ai class. we have been assigned the 3x3 tile puzzle and 2 algorithms, ida* and a*. also, try it on a 4x4. so i whip these out and am appalled by how slow the fn things are but intrigued by what they can do and i'm just a puzzle kind of guy. btw, the 4x4 wasn't solvable by my stuff at the time. so i head to the chief's office with my little bit of code. tell him it's way too slow. what do you think? he takes 5 seconds to peruse the item and says "you're probably making too much memory". side note, later i graded this class for that prof and the same project. not a single student, including the ms and phd types, did anything better than a vector of vectors of ints. so back to me, i'm thinking, how do i cram 4 bits together so that there is absolutely no wasted space. i start digging through the documentation and i find the bitwise stuff. i see arithmetic shift and, having no idea whether it will work or not, i type into the interface (arithmetic-shift 1 100). if we blow up, well we blow up big. but we didn't. i flipped everything in my project to bignum at that point. the bignum stuff is massively faster than vector of vectors of ints and faster than just a single vector of ints. lots of bignum is easy to implement, like the hash. making a child state, not so much. i'm not married to bignums despite all this.

> I've been programming for over 30 years, professionally for 23.

i was a programmer. dot com bought us as a brick and mortar back in '99 and the whole shebang blew up 2 years later. idiots. anyway, been a stay at home dad for the most part since then.

> have an MS in Computer Science

me too. that's the other piece of the most part from up above.

> (current-memory-use)

yup, tried that a while back. didn't like what i saw. check this out:

> (current-memory-use)
581753864
> (current-memory-use)
586242568
> (current-memory-use)
591181736
> (current-memory-use)
595527064

does that work for you?
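the "cram 4 bits together" idea from the story can be sketched like this (not the thread's actual code; works for boards whose tile values fit in 4 bits, so 3x3 and 4x4):

```racket
#lang racket
;; sketch: pack tile values (0..15, 4 bits each) into one bignum with
;; arithmetic-shift, and read a tile back out with a shift and a mask.
(define (tiles->bignum tiles)
  (for/fold ([n 0]) ([t (in-list tiles)])
    (bitwise-ior (arithmetic-shift n 4) t)))

(define (tile-ref n i len) ; i-th tile counting from the left
  (bitwise-and (arithmetic-shift n (* -4 (- len i 1))) 15))

(tile-ref (tiles->bignum '(1 2 3)) 1 3) ; → 2
```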
[racket-users] Re: appending files
> Way back in this thread you implied that you had extremely large FILES
> containing FIXED SIZE RECORDS, from which you needed
> to FILTER DUPLICATE records based on the value of a FIXED SIZE KEY
> field.

this is mostly correct. the data is state and state associated data on the fringe. hence the name of the algorithm: fringe search. states (keys) are fixed size. associated data (due to the operator sequence) is variable size. i didn't post that here. i sent you (george) an email directly wed at 9:49 am according to my sent email box. getting this piece of the algorithm to go faster, less memory, both? awesome.

my actual test file (just checked) is 633 mb. it is data from perhaps halfway through a search. the fringe for a 5x5 grows by about 9x each successive fringe. i say about 9x because as the fringes grow, the amount of redundancy will increase. when i hit the limits of my hardware and patience with this algorithm i was at 90% redundancy but that fringe file was huge. i still hadn't produced an answer for the problem and decided i needed to get the code to run in parallel. that was about 5 years ago. last spring i started reworking my old stuff to work with places and ran out of enthusiasm until about 2 weeks ago.

> Doesn't work for 6x6? Well 36 6-bit values fit neatly into 216 bits
> (27 bytes).

the guy (korf) who did the paper on the 24 puzzle has attempted the 6x6 and failed. notice in the 24 puzzle paper that he was unable to solve one of those 10 sample problems. 5x5 is what i'm after.
[racket-users] Re: appending files
On Wednesday, January 27, 2016 at 2:57:42 AM UTC-6, gneuner2 wrote:
> What is this other field on which the file is sorted?

this field is the cost in operators to arrive at the key value

> WRT a set of duplicates: are you throwing away all duplicates? Keeping
> the 1st one encountered? Something else?

keep first instance, chuck the rest

> This structure uses a lot more space than necessary. Where did you
> get the idea that a bignum is 10 bytes?

not sure about the 10 bytes. if i shove 5 128 bit keys into a bignum is that about 80 bytes plus some overhead? so 80 bytes times 6 million buckets is 480 mb not including overhead.

> In the worst case of every key being a bignum

no, every key is contained within a bignum which can contain many many keys.

> Since you are only comparing the hash entries for equality, you could
> save a lot of space [at the expense of complexity] by defining a
> {bucket x chain_size x 16} byte array and storing the 16-byte keys
> directly.

i must be able to grow the chains. i can't make it fixed size like that.

> > have another rather large bignum in memory that i use to reduce
> > but not eliminate record duplication of about .5 gb.
> ???

ha, ok. this is what this bignum is for: cycle elimination. a sequence of operators (2 bit per) when strung together is a number like 4126740 which represents the operator sequence (0 0 3 0 0 3 1 1 2 2 1). i change that bit in the bignum from 0 to 1. during data generation i look up my about to be applied operator sequence in the bignum. if i see a one, i skip data generation. i'm not really happy with the volume of memory this takes but it is an insanely fast lookup and keeps a ton of data off the hard drive.

> In the meantime, I would suggest you look up "merge sort" and

it's logarithmic? not happening
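the bignum-as-bit-set scheme described in that post might be sketched like this (the exact 2-bit packing order of operator sequences is an assumption, not taken from the real code):

```racket
#lang racket
;; sketch of the cycle-elimination structure: one big integer used as
;; a bit set, indexed by the operator sequence packed 2 bits per op.
;; the packing order here is assumed for illustration.
(define seen 0) ; the "rather large bignum"

(define (ops->index ops) ; e.g. '(0 0 3 1), each op in 0..3
  (for/fold ([n 0]) ([op (in-list ops)])
    (+ (arithmetic-shift n 2) op)))

(define (mark! ops) ; flip the sequence's bit from 0 to 1
  (set! seen (bitwise-ior seen (arithmetic-shift 1 (ops->index ops)))))

(define (seen? ops) ; the insanely fast lookup
  (bitwise-bit-set? seen (ops->index ops)))
```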
[racket-users] Re: appending files
> Is it important to retain that sorting? Or is it just informational?

it's important

> Then you're not using the hash in a conventional manner ... else the
> filter entries would be unique ... and we really have no clue what
> you're actually doing. So any suggestions we give you are shots in
> the dark.

using it conventionally? absolutely. it is a hash with separate chaining. will a bit of code help?

(define (put p)
  (define kh (hashfn p))
  (vector-set! tble kh
               (bitwise-ior (arithmetic-shift (vector-ref tble kh) shftl) p)))
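filling in around that snippet, a self-contained sketch of the scheme (bucket count and key width are stand-ins; assumes keys are nonzero so 0 can represent an empty chain):

```racket
#lang racket
;; self-contained sketch around `put` above: each bucket is one bignum
;; holding any number of fixed-width keys, chained by shifting.
(define shftl 128)   ; bits per key (stand-in)
(define buckets 611) ; bucket count (stand-in)
(define tble (make-vector buckets 0))
(define mask (sub1 (arithmetic-shift 1 shftl)))

(define (hashfn p) (modulo p buckets))

(define (put p)
  (define kh (hashfn p))
  (vector-set! tble kh
               (bitwise-ior (arithmetic-shift (vector-ref tble kh) shftl) p)))

(define (contains? p) ; walk the chain one key-width at a time
  (let loop ([chain (vector-ref tble (hashfn p))])
    (cond [(zero? chain) #f]
          [(= (bitwise-and chain mask) p) #t]
          [else (loop (arithmetic-shift chain (- shftl)))])))
```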
[racket-users] Re: appending files
ok brandon, that's a thought. build the hash on the hard drive at the time of data creation. you mention collision resolution. so let me build my hash on the hard drive using my 6 million buckets but increase the size of each bucket from 5 slots to 20, right? i can't exactly recreate my vector/bignum hash on the hard drive because i can't dynamically resize the buckets like i can the bignums. this gives me a 4 gb file whereas my original was 1 gb. i have enough space for that so that's not a problem.

so as my buckets fill up they head towards the average of 5 data items per bucket. so on average here's what happens with each hd hash record. i go to my hd hash and read 3.5 (think about it) items and 90% of the time i don't find my data so i do a write. in my process i do an initial write, then a read, a write, a read, a write. compare: 3.5 vs 2 reads; 1 vs 3 writes. the reads are more costly and if i exceed 20 items in a bucket the hd hash breaks.

what do you think? is it worth it?
[racket-users] Re: appending files
robby findler, you the man. i like the copy-port idea. i incorporated it and it is nice and fast and easily fit into the existing code.
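for anyone landing here later, the copy-port approach might look roughly like this (a sketch, not the actual code from the thread; file names invented):

```racket
#lang racket
;; sketch of the copy-port append that replaced `cat mytmp*.dat >>`:
;; stream each split file onto the end of the destination, no shell.
(define (append-files! dest srcs)
  (call-with-output-file dest
    #:exists 'append
    (lambda (out)
      (for ([src (in-list srcs)])
        (call-with-input-file src
          (lambda (in) (copy-port in out)))))))
```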
[racket-users] Re: appending files
neil van dyke, i have used the system function before but had forgotten what it was called and couldn't find it as a result in the documentation. my problem with using the system function is that i need 2 versions of it: windoz and linux. the copy-port function is a write-once, use-across-multiple-OS solution. sweet.
[racket-users] Re: appending files
gneuner2 (george), you are overthinking this thing. my test data of 1 gb is but a small sample file. i can't even hash that small 1 gb at the time of data creation. the hashed data won't fit in ram.

at the time i put the redundant data on the hard drive, i do some constant time sorting so that the redundant data on the hard drive is contained in roughly 200 usefully sorted files. some of these files will be small and can be hashed with a single read, hash and write. some will be massive (data won't fit in ram) and must be split further. this produces another type of single read, hash and write. these split files can now be fully hashed, which means a second read, hash and write. recombining the second level files is virtually instantaneous (copy-port) relative to the effort spent to get to that point.

all of these operations are constant time. it would be nice to cut into that big fat hard drive induced C but i can't do it with a single read and write on the larger files.
[racket-users] Re: appending files
alright george, i'm open to new ideas. here's what i've got going. running 64 bit linux mint OS on a 2 core laptop with 2 gb of ram. my key is 128 bits with ~256 bits per record. so my 1 gb file contains ~63 million records and ~32 million keys. about 8% will be dupes leaving me with ~30 million keys.

i run a custom built hash. i use separate chaining with a vector of bignums. i am willing to let my chains run up to 5 keys per chain so i need a vector of 6 million pointers. that's 48 mb for the array. another 480 mb for the bignums. let's round that sum to .5 gb. i have another rather large bignum in memory that i use to reduce but not eliminate record duplication of about .5 gb. i'm attempting to get this thing to run in 2 places so i need 2 hashes. add this up: .5+.5+.5 is 1.5 gb and that gets me to about my memory limit.

the generated keys are random but i use one of the associated fields for sorting during the initial write to the hard drive. what goes in each of those files is totally random but dupes do not run across files. also, the number of keys is >1e25.
[racket-users] appending files
here's what i'm doing. i make a large, say 1 gb, file with small records and there is some redundancy in the records. i will use a hash to identify duplicates by reading the file back in a record at a time, but the file is too large to hash so i split it. the resultant files (10) are about 100 mb and are easily hashed. the resultant files need to be appended and then renamed back to the original. i know that i can do this in a linux terminal window with the following:

cat mytmp*.dat >> myoriginal.dat

i'd like to accomplish this from within the program by shelling out. can't figure it out. other methodologies that are super fast will be entertained.

thanks, scott
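the shell-out being asked about can be done with `system`, which hands the command string to the platform shell (so on linux the glob is expanded by /bin/sh); a minimal sketch using the file names from the post:

```racket
#lang racket
;; sketch of the shell-out asked about: `system` runs the command via
;; the platform shell, so the glob works on linux. note this ties the
;; program to a unix shell, which is why copy-port won out later in
;; the thread.
(require racket/system)

(define (append-via-shell)
  (system "cat mytmp*.dat >> myoriginal.dat"))
```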