[racket-users] Re: appending files

2016-01-31 Thread Scotty C
On Sunday, January 31, 2016 at 12:13:31 AM UTC-6, Scotty C wrote: > > that's what i did. so new performance data. this is with bytes instead of > > strings for data on the hard drive but bignums in the hash still. > > > > as a single large file and a hash with 203 buckets for 26.6 million >

[racket-users] Re: appending files

2016-01-30 Thread Scotty C
> Yes. You probably do need to convert the files. Your original > coding likely is not [easily] compatible with binary I/O. that's what i did. so new performance data. this is with bytes instead of strings for data on the hard drive but bignums in the hash still. as a single large file and

Re: [racket-users] Re: appending files

2016-01-30 Thread Benjamin Greenman
Fixed, thanks for the report! On Sat, Jan 30, 2016 at 8:31 PM, Scotty C wrote: > just found a small mistake in the documentation: can you find it? > > (numerator q) → integer? > > q : rational? > > Coerces q to an exact number, finds the numerator of the

[racket-users] Re: appending files

2016-01-30 Thread Scotty C
just found a small mistake in the documentation: can you find it? (numerator q) → integer? q : rational? Coerces q to an exact number, finds the numerator of the number expressed in its simplest fractional form, and returns this number coerced to the exactness of q.

[racket-users] Re: appending files

2016-01-30 Thread Scotty C
> that's what i did. so new performance data. this is with bytes instead of > strings for data on the hard drive but bignums in the hash still. > > as a single large file and a hash with 203 buckets for 26.6 million > records the data rate is 98408/sec. > > when i split and go with 11

Re: [racket-users] Re: appending files

2016-01-29 Thread Scotty C
> > question for you all. right now i use modulo on my bignums. i know i > > can't do that to a byte string. i'll figure something out. if any of > > you know how to do this, can you post a method? > > > > I'm not sure what you're asking exactly. i'm talking about getting the hash index of a key.
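
A minimal sketch of one way to do it (not from the thread; bytes->bucket and the bucket count are illustrative): fold the bytes of the key into an integer, then take the modulo, which mirrors the modulo-on-bignums approach.

(define (bytes->bucket key n-buckets)
  ;; treat the byte string as a base-256 number and reduce it mod n-buckets
  (modulo (for/fold ([acc 0]) ([b (in-bytes key)])
            (+ (* acc 256) b))
          n-buckets))

(bytes->bucket (bytes 16 5 1 12 6 24) 6000000)  ; => a bucket index below 6000000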

Re: [racket-users] Re: appending files

2016-01-29 Thread Jon Zeppieri
On Fri, Jan 29, 2016 at 7:04 PM, Scotty C wrote: > > > question for you all. right now i use modulo on my bignums. i know i > > > can't do that to a byte string. i'll figure something out. if any of > > > you know how to do this, can you post a method? > > > > > > > I'm not

Re: [racket-users] Re: appending files

2016-01-29 Thread Scotty C
> However, if you have implemented your own, you can still call > `equal-hash-code` yes, my own hash. i think the equal-hash-code will work. -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving
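
For reference, equal-hash-code is provided by racket/base and returns a fixnum for any value, byte strings included, so the bucket index is one modulo away. A sketch (bucket-of is an illustrative name):

(define (bucket-of key n-buckets)
  ;; modulo with a positive divisor always yields a non-negative index
  (modulo (equal-hash-code key) n-buckets))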

Re: [racket-users] Re: appending files

2016-01-29 Thread Jon Zeppieri
On Fri, Jan 29, 2016 at 7:45 PM, Scotty C wrote: > > my plan right now is to rework my current hash so that it runs byte > strings instead of bignums. > > i have a new issue. i wrote my data as char and end records with 'return. > i use (read-line x 'return) and the first

Re: [racket-users] Re: appending files

2016-01-29 Thread Brandon Thomas
On Fri, 2016-01-29 at 13:00 -0800, Scotty C wrote: > ok, had time to run my hash on my one test file > '(611 1 1 19 24783208 4.19) > this means > # buckets > % buckets empty > non empty bucket # keys least > non empty bucket # keys most > total number of keys > average number of keys per non

[racket-users] Re: appending files

2016-01-29 Thread Scotty C
> i get the feeling that i will need to read the entire file as i used to read > it taking each record and doing the following: > convert the string record to a bignum record > convert the bignum record into a byte string > write the byte string to a new data file > > does that seem right?

[racket-users] Re: appending files

2016-01-29 Thread George Neuner
On Fri, 29 Jan 2016 16:45:40 -0800 (PST), Scotty C wrote: >i have a new issue. i wrote my data as char and end records with 'return. i >use (read-line x 'return) and the first record is 15 char. when i use > (read-bytes-line x 'return) i get 23 byte. i have to assume that
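
The 15-char/23-byte gap is consistent with UTF-8 encoding: any character with a code point of 128 or above is written as two (or more) bytes by character I/O. Writing and reading raw bytes sidesteps the encoding entirely. A sketch (file name and record contents are illustrative):

(call-with-output-file "states.dat" #:exists 'truncate
  (lambda (out)
    (write-bytes (bytes 200 17 42 255) out)))   ; exactly 4 bytes on disk

(call-with-input-file "states.dat"
  (lambda (in)
    (read-bytes 4 in)))   ; => the same 4 bytes back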

[racket-users] Re: appending files

2016-01-29 Thread Scotty C
> i get the feeling that i will need to read the entire file as i used to read > it taking each record and doing the following: > convert the string record to a bignum record > convert the bignum record into a byte string > write the byte string to a new data file > > does that seem right?

[racket-users] Re: appending files

2016-01-29 Thread George Neuner
On Thu, 28 Jan 2016 20:32:08 -0800 (PST), Scotty C wrote: >> (current-memory-use) >yup, tried that a while back didn't like what i saw. check this out: > >> (current-memory-use) >581753864 >> (current-memory-use) >586242568 >> (current-memory-use) >591181736 >>
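
A common idiom (not necessarily what the truncated reply goes on to say) is to force a collection first, so current-memory-use reports live data rather than garbage that simply hasn't been collected yet:

(collect-garbage)
(collect-garbage)   ; a second pass catches objects freed by the first
(current-memory-use)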

[racket-users] Re: appending files

2016-01-29 Thread Scotty C
> my plan right now is to rework my current hash so that it runs byte strings > instead of bignums. i have a new issue. i wrote my data as char and end records with 'return. i use (read-line x 'return) and the first record is 15 char. when i use (read-bytes-line x 'return) i get 23 byte. i

[racket-users] Re: appending files

2016-01-29 Thread Scotty C
ok, had time to run my hash on my one test file '(611 1 1 19 24783208 4.19) this means: # buckets = 611, % buckets empty = 1, non empty bucket # keys least = 1, non empty bucket # keys most = 19, total number of keys = 24783208, average number of keys per non empty bucket = 4.19. it took 377 sec. original # records is 26570359 so 6.7%

Re: [racket-users] Re: appending files

2016-01-28 Thread Scotty C
On Thursday, January 28, 2016 at 11:36:50 PM UTC-6, Brandon Thomas wrote: > On Thu, 2016-01-28 at 20:32 -0800, Scotty C wrote: > > > I think you understand perfectly. > > i'm coming around > > > > > You said the keys are 128-bit (16 byte) values.  You can store one > > > key > > > directly in a

[racket-users] Re: appending files

2016-01-28 Thread Scotty C
> You claim you want filtering to be as fast as possible. If that were > so, you would not pack multiple keys (or features thereof) into a > bignum but rather would store the keys individually. chasing pointers? no, you're thinking about doing some sort of byte-append and subbytes type of thing.

[racket-users] Re: appending files

2016-01-28 Thread George Neuner
On Thu, 28 Jan 2016 07:56:05 -0800 (PST), Scotty C wrote: >you knew this was coming, right? put this into your data structure of choice: > >16 5 1 12 6 24 17 9 2 22 4 10 13 18 19 20 0 23 7 21 15 11 8 3 14 > >this is a particular 5x5 tile puzzle >(#6 in

[racket-users] Re: appending files

2016-01-28 Thread George Neuner
On Thu, 28 Jan 2016 07:56:05 -0800 (PST), Scotty C wrote: >> You claim you want filtering to be as fast as possible. If that were >> so, you would not pack multiple keys (or features thereof) into a >> bignum but rather would store the keys individually. > >chasing

[racket-users] Re: appending files

2016-01-28 Thread Scotty C
what's been bothering me was trying to get the data into 16 bytes in a byte string of that length. i couldn't get that to work so gave up and just shoved the data into 25 bytes. here's a bit of code. i think it's faster than my bignum stuff. (define p (bytes 16 5 1 12 6 24 17 9 2 22 4 10 13 18
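
For what it's worth, 25 tile values in the range 0-24 do fit in 16 bytes at 5 bits apiece (125 bits total). A sketch of one such packing (pack-tiles is an illustrative name; the tile list is the puzzle quoted elsewhere in the thread):

(define (pack-tiles tiles)   ; tiles: a 25-byte byte string of values 0-24
  ;; fold the 5-bit fields into one big integer ...
  (define n (for/fold ([acc 0]) ([t (in-bytes tiles)])
              (bitwise-ior (arithmetic-shift acc 5) t)))
  ;; ... then peel it into 16 big-endian bytes
  (apply bytes (for/list ([i (in-range 16)])
                 (bitwise-and (arithmetic-shift n (* -8 (- 15 i))) 255))))

(define p (bytes 16 5 1 12 6 24 17 9 2 22 4 10 13 18 19 20 0 23 7 21 15 11 8 3 14))
(bytes-length (pack-tiles p))   ; => 16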

[racket-users] Re: appending files

2016-01-28 Thread George Neuner
On Thu, 28 Jan 2016 11:49:09 -0800 (PST), Scotty C wrote: >what's been bothering me was trying to get the data into 16 bytes in >a byte string of that length. i couldn't get that to work so gave up and >just shoved the data into 25 bytes. here's a bit of code. i think it's

[racket-users] Re: appending files

2016-01-28 Thread George Neuner
On Wed, 27 Jan 2016 19:43:49 -0800 (PST), Scotty C wrote: >> Then you're not using the hash in a conventional manner ... else the >> filter entries would be unique > >using it conventionally? absolutely. it is a hash with separate chaining. You snipped the part I was

[racket-users] Re: appending files

2016-01-28 Thread Scotty C
> I think you understand perfectly. i'm coming around > You said the keys are 128-bit (16 byte) values. You can store one key > directly in a byte string of length 16. yup > So instead of using a vector of pointers to individual byte strings, > you would allocate a single byte string of length
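
A sketch of the flat layout being described (the helper names are mine, not from the thread): one byte string of length (* 16 n) holding the keys side by side, indexed by offset arithmetic instead of pointers.

(define KEY-SIZE 16)
(define (make-keystore n) (make-bytes (* KEY-SIZE n)))
(define (store-key! ks i key)   ; key: a 16-byte byte string
  (bytes-copy! ks (* i KEY-SIZE) key))
(define (key=? ks i key)        ; compare slot i in place, no allocation
  (for/and ([j (in-range KEY-SIZE)])
    (= (bytes-ref ks (+ (* i KEY-SIZE) j)) (bytes-ref key j))))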

Re: [racket-users] Re: appending files

2016-01-28 Thread Brandon Thomas
On Thu, 2016-01-28 at 20:32 -0800, Scotty C wrote: > > I think you understand perfectly. > i'm coming around > > > You said the keys are 128-bit (16 byte) values.  You can store one > > key > > directly in a byte string of length 16. > yup > > > So instead of using a vector of pointers to

[racket-users] Re: appending files

2016-01-28 Thread Scotty C
> Way back in this thread you implied that you had extremely large FILES > containing FIXED SIZE RECORDS, from which you needed > to FILTER DUPLICATE records based on the value of a FIXED SIZE KEY > field. this is mostly correct. the data is state and state associated data on the fringe. hence

Re: [racket-users] Re: appending files

2016-01-27 Thread George Neuner
On 1/27/2016 10:50 AM, Brandon Thomas wrote: On Wed, 2016-01-27 at 04:01 -0500, George Neuner wrote: > On Tue, 26 Jan 2016 23:00:01 -0500, Brandon Thomas > wrote: > > > Is there anything stopping you from restructuring > > the data on disk and using the hash directly from

Re: [racket-users] Re: appending files

2016-01-27 Thread George Neuner
racket-users@googlegroups.com> on behalf of George Neuner <gneun...@comcast.net> Sent: Wednesday, January 27, 2016 4:28 AM To: racket-users@googlegroups.com Subject: Re: [racket-users] Re: appending files Sorry. I shouldn't do math at 4am. Ignore the numbers. However, it is still correct that th

Re: [racket-users] Re: appending files

2016-01-27 Thread Brandon Thomas
On Wed, 2016-01-27 at 17:49 -0500, George Neuner wrote: > On 1/27/2016 10:50 AM, Brandon Thomas wrote: > > On Wed, 2016-01-27 at 04:01 -0500, George Neuner wrote: > > > On Tue, 26 Jan 2016 23:00:01 -0500, Brandon Thomas > > > wrote: > > > > > > > Is there anything stopping

[racket-users] Re: appending files

2016-01-27 Thread George Neuner
On Wed, 27 Jan 2016 11:17:04 -0800 (PST), Scotty C wrote: >On Wednesday, January 27, 2016 at 2:57:42 AM UTC-6, gneuner2 wrote: > >> What is this other field on which the file is sorted? >this field is the cost in operators to arrive at the key value Is it important to

Re: [racket-users] Re: appending files

2016-01-27 Thread Brandon Thomas
On Wed, 2016-01-27 at 04:01 -0500, George Neuner wrote: > On Tue, 26 Jan 2016 23:00:01 -0500, Brandon Thomas > wrote: > > > Is there anything stopping you from restructuring > > the data on disk and using the hash directly from there > > Scotty's hash table is much larger

Re: [racket-users] Re: appending files

2016-01-27 Thread Brandon Thomas
On Tue, 2016-01-26 at 22:48 -0800, Scotty C wrote: > ok brandon, that's a thought. build the hash on the hard drive at the > time of data creation. you mention collision resolution. so let me > build my hash on the hard drive using my 6 million buckets but > increase the size of each bucket from 5

[racket-users] Re: appending files

2016-01-27 Thread Scotty C
On Wednesday, January 27, 2016 at 2:57:42 AM UTC-6, gneuner2 wrote: > What is this other field on which the file is sorted? this field is the cost in operators to arrive at the key value > WRT a set of duplicates: are you throwing away all duplicates? Keeping > the 1st one encountered?

[racket-users] Re: appending files

2016-01-27 Thread Scotty C
> Is it important to retain that sorting? Or is it just informational? it's important > Then you're not using the hash in a conventional manner ... else the > filter entries would be unique ... and we really have no clue what > you're actually doing. So any suggestions we give you are shots in

[racket-users] Re: appending files

2016-01-27 Thread George Neuner
Hi Scotty, I rearranged your message a bit for (my own) clarity. On Tue, 26 Jan 2016 18:40:28 -0800 (PST), Scotty C wrote: >running 64 bit linux mint OS on a 2 core laptop with 2 gb of ram. >the generated keys are random but i use one of the associated >fields for

Re: [racket-users] Re: appending files

2016-01-27 Thread George Neuner
Sorry. I shouldn't do math at 4am. Ignore the numbers. However, it is still correct that the byte array will use less space than an array of bignums. George On 1/27/2016 3:54 AM, George Neuner wrote: i run a custom built hash. i use separate chaining with a vector of bignums. i am

[racket-users] Re: appending files

2016-01-27 Thread George Neuner
On Tue, 26 Jan 2016 23:00:01 -0500, Brandon Thomas wrote: >Is there anything stopping you from restructuring >the data on disk and using the hash directly from there Scotty's hash table is much larger than he thinks it is and very likely is being paged to disk already.

[racket-users] Re: appending files

2016-01-26 Thread Scotty C
ok brandon, that's a thought. build the hash on the hard drive at the time of data creation. you mention collision resolution. so let me build my hash on the hard drive using my 6 million buckets but increase the size of each bucket from 5 slots to 20. right? i can't exactly recreate my
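
A sketch of the fixed-geometry on-disk layout this implies (sizes and names are illustrative, not Scotty's actual format): bucket i lives at a computable offset, so a lookup is one seek plus one read.

(define RECORD-SIZE 32)   ; ~256 bits per record
(define SLOTS 20)         ; slots per bucket
(define (bucket-offset i) (* i SLOTS RECORD-SIZE))
(define (read-bucket in i)
  ;; seek straight to bucket i and pull the whole bucket in one read
  (file-position in (bucket-offset i))
  (read-bytes (* SLOTS RECORD-SIZE) in))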

Re: [racket-users] Re: appending files

2016-01-26 Thread Brandon Thomas
On Tue, 2016-01-26 at 18:40 -0800, Scotty C wrote: > alright george, i'm open to new ideas. here's what i've got going. > running 64 bit linux mint OS on a 2 core laptop with 2 gb of ram. my > key is 128 bits with ~256 bits per record. so my 1 gb file contains > ~63 million records and ~32 million

[racket-users] Re: appending files

2016-01-26 Thread Scotty C
robby findler, you the man. i like the copy-port idea. i incorporated it and it is nice and fast and easily fit into the existing code. -- You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails
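
A sketch of the copy-port approach (append-file! is an illustrative name): stream src onto the end of dest without ever holding a whole file in memory.

(require racket/port)   ; copy-port

(define (append-file! dest src)
  (call-with-output-file dest #:exists 'append
    (lambda (out)
      (call-with-input-file src
        (lambda (in) (copy-port in out))))))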

[racket-users] Re: appending files

2016-01-26 Thread Scotty C
neil van dyke, i have used the system function before but had forgotten what it was called and as a result couldn't find it in the documentation. my problem with using the system function is that i need 2 versions of it: windoz and linux. the copy-port function is a write once use across
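
The two-version shell route being contrasted with copy-port would look something like this sketch (the commands are illustrative; system comes from racket/system and system-type from racket/base):

(require racket/system)

(define (append-via-shell dest src)
  (case (system-type 'os)
    [(windows) (system (format "type ~a >> ~a" src dest))]
    [else      (system (format "cat ~a >> ~a" src dest))]))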

[racket-users] Re: appending files

2016-01-26 Thread Scotty C
gneuner2 (george), you are over thinking this thing. my test data of 1 gb is but a small sample file. i can't even hash that small 1 gb at the time of data creation. the hashed data won't fit in ram. at the time i put the redundant data on the hard drive, i do some constant time sorting so that

Re: [racket-users] Re: appending files

2016-01-26 Thread George Neuner
On 1/26/2016 2:51 PM, Scotty C wrote: gneuner2 (george), you are over thinking this thing. my test data of 1 gb is but a small sample file. i can't even hash that small 1 gb at the time of data creation. the hashed data won't fit in ram. at the time i put the redundant data on the hard drive,

Re: [racket-users] Re: appending files

2016-01-26 Thread Neil Van Dyke
+1 on George Neuner's comments about how one can do smart processing of huge files in small space. (I almost said something about that myself, but didn't have time to get into that kind of discussion, so I stuck to only the simpler file concatenation question.) BTW, students who have 8GB RAM

[racket-users] Re: appending files

2016-01-26 Thread Scotty C
alright george, i'm open to new ideas. here's what i've got going. running 64 bit linux mint OS on a 2 core laptop with 2 gb of ram. my key is 128 bits with ~256 bits per record. so my 1 gb file contains ~63 million records and ~32 million keys. about 8% will be dupes leaving me with ~30