On Sunday, January 31, 2016 at 12:13:31 AM UTC-6, Scotty C wrote:
> > that's what i did. so new performance data. this is with bytes instead of
> > strings for data on the hard drive but bignums in the hash still.
> >
> > as a single large file and a hash with 203 buckets for 26.6 million
>
> Yes. You probably do need to convert the files. Your original
> coding likely is not [easily] compatible with binary I/O.
that's what i did. so new performance data. this is with bytes instead of
strings for data on the hard drive but bignums in the hash still.
as a single large file and
Fixed, thanks for the report!
On Sat, Jan 30, 2016 at 8:31 PM, Scotty C wrote:
> just found a small mistake in the documentation: can you find it?
>
> (numerator q) → integer?
>
> q : rational?
>
> Coerces q to an exact number, finds the numerator of the
just found a small mistake in the documentation: can you find it?
(numerator q) → integer?
q : rational?
Coerces q to an exact number, finds the numerator of the number expressed in
its simplest fractional form, and returns this number coerced to the exactness
of q.
> that's what i did. so new performance data. this is with bytes instead of
> strings for data on the hard drive but bignums in the hash still.
>
> as a single large file and a hash with 203 buckets for 26.6 million
> records the data rate is 98408/sec.
>
> when i split and go with 11
> > question for you all. right now i use modulo on my bignums. i know i
> > can't do that to a byte string. i'll figure something out. if any of
> > you know how to do this, can you post a method?
> >
>
> I'm not sure what you're asking exactly.
i'm talking about getting the hash index of a key.
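One way to get a bucket index straight from a byte string, without first converting the whole key back to a bignum, is to fold the bytes through modulo. This is a sketch, not the poster's code; `bytes->bucket` is a hypothetical helper:

```racket
#lang racket
;; Treat the byte string as a base-256 (big-endian) number and reduce
;; modulo the bucket count as we go, so the intermediate value stays
;; small instead of growing into a full 128-bit bignum.
(define (bytes->bucket bs num-buckets)
  (for/fold ([h 0]) ([b (in-bytes bs)])
    (modulo (+ (* h 256) b) num-buckets)))
```

This produces the same index as converting the whole byte string to an integer and taking the modulo once, but never allocates a bignum along the way.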
On Fri, Jan 29, 2016 at 7:04 PM, Scotty C wrote:
> > > question for you all. right now i use modulo on my bignums. i know i
> > > can't do that to a byte string. i'll figure something out. if any of
> > > you know how to do this, can you post a method?
> > >
> >
> > I'm not
> However, if you have implemented your own, you can still call
> `equal-hash-code`
yes, my own hash.
i think the equal-hash-code will work.
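Since `equal-hash-code` accepts byte strings, the bucket index can come straight from it. A sketch, using the 611-bucket figure mentioned elsewhere in the thread as a stand-in:

```racket
#lang racket
;; equal-hash-code works on any value compared with equal?, including
;; byte strings; reduce it modulo the bucket count for an index.
;; 611 buckets is just the count quoted earlier in the thread.
(define num-buckets 611)
(define (bucket-index key)            ; key : bytes?
  (modulo (equal-hash-code key) num-buckets))
```

Equal byte strings are guaranteed to produce equal hash codes, so two copies of the same key always land in the same bucket.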
--
You received this message because you are subscribed to the Google Groups
"Racket Users" group.
To unsubscribe from this group and stop receiving
On Fri, Jan 29, 2016 at 7:45 PM, Scotty C wrote:
> > my plan right now is to rework my current hash so that it runs byte
> strings instead of bignums.
>
> i have a new issue. i wrote my data as char and end records with 'return.
> i use (read-line x 'return) and the first
On Fri, 2016-01-29 at 13:00 -0800, Scotty C wrote:
> ok, had time to run my hash on my one test file
> '(611 1 1 19 24783208 4.19)
> this means
> # buckets
> % buckets empty
> non empty bucket # keys least
> non empty bucket # keys most
> total number of keys
> average number of keys per non
> i get the feeling that i will need to read the entire file as i used to read
> it taking each record and doing the following:
> convert the string record to a bignum record
> convert the bignum record into a byte string
> write the byte string to a new data file
>
> does that seem right?
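The three steps above could be sketched as below. The record format (one decimal number per `'return`-terminated line) and the 16-byte key width are assumptions from earlier in the thread, and `bignum->bytes` is a hypothetical helper, since `integer->integer-bytes` tops out at 8 bytes:

```racket
#lang racket
;; Hypothetical helper: fixed-width big-endian encoding of a
;; nonnegative integer into a fresh byte string.
(define (bignum->bytes n width)
  (define bs (make-bytes width 0))
  (for ([i (in-range width)])
    (bytes-set! bs (- width i 1)
                (bitwise-and (arithmetic-shift n (* -8 i)) 255)))
  bs)

;; Sketch of the conversion pass: read each text record, parse it as a
;; number, and write it out as a fixed 16-byte record.
(define (convert-file in-path out-path)
  (call-with-input-file in-path
    (lambda (in)
      (call-with-output-file out-path #:exists 'replace
        (lambda (out)
          (let loop ()
            (define line (read-line in 'return))
            (unless (eof-object? line)
              (write-bytes (bignum->bytes (string->number line) 16) out)
              (loop))))))))
```

Fixed-width records also mean the new file can later be read back with `read-bytes` instead of line-oriented parsing.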
On Fri, 29 Jan 2016 16:45:40 -0800 (PST), Scotty C
wrote:
>i have a new issue. i wrote my data as char and end records with 'return. i
>use (read-line x 'return) and the first record is 15 char. when i use
> (read-bytes-line x 'return) i get 23 bytes. i have to assume that
> i get the feeling that i will need to read the entire file as i used to read
> it taking each record and doing the following:
> convert the string record to a bignum record
> convert the bignum record into a byte string
> write the byte string to a new data file
>
> does that seem right?
On Thu, 28 Jan 2016 20:32:08 -0800 (PST), Scotty C
wrote:
>> (current-memory-use)
>yup, tried that a while back didn't like what i saw. check this out:
>
>> (current-memory-use)
>581753864
>> (current-memory-use)
>586242568
>> (current-memory-use)
>591181736
>>
> my plan right now is to rework my current hash so that it runs byte strings
> instead of bignums.
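The climbing numbers in that REPL transcript are mostly allocation that hasn't been collected yet: `current-memory-use` counts garbage too. Forcing a collection or two before sampling gives a steadier reading (a small sketch):

```racket
#lang racket
;; current-memory-use includes not-yet-collected garbage, so it creeps
;; upward between GCs; collect a few times before sampling.
(define (stable-memory-use)
  (for ([_ (in-range 3)]) (collect-garbage))
  (current-memory-use))
```

Successive calls to `stable-memory-use` should be much closer together than the raw numbers shown above.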
i have a new issue. i wrote my data as char and end records with 'return. i use
(read-line x 'return) and the first record is 15 char. when i use
(read-bytes-line x 'return) i get 23 bytes. i
ok, had time to run my hash on my one test file
'(611 1 1 19 24783208 4.19)
this means
# buckets
% buckets empty
non empty bucket # keys least
non empty bucket # keys most
total number of keys
average number of keys per non empty bucket
it took 377 sec.
original # records is 26570359 so 6.7%
On Thursday, January 28, 2016 at 11:36:50 PM UTC-6, Brandon Thomas wrote:
> On Thu, 2016-01-28 at 20:32 -0800, Scotty C wrote:
> > > I think you understand perfectly.
> > i'm coming around
> >
> > > You said the keys are 128-bit (16 byte) values. You can store one
> > > key
> > > directly in a
> You claim you want filtering to be as fast as possible. If that were
> so, you would not pack multiple keys (or features thereof) into a
> bignum but rather would store the keys individually.
chasing pointers? no, you're thinking about doing some sort of byte-append and
subbytes type of thing.
On Thu, 28 Jan 2016 07:56:05 -0800 (PST), Scotty C
wrote:
>you knew this was coming, right? put this into your data structure of choice:
>
>16 5 1 12 6 24 17 9 2 22 4 10 13 18 19 20 0 23 7 21 15 11 8 3 14
>
>this is a particular 5x5 tile puzzle
>(#6 in
On Thu, 28 Jan 2016 07:56:05 -0800 (PST), Scotty C
wrote:
>> You claim you want filtering to be as fast as possible. If that were
>> so, you would not pack multiple keys (or features thereof) into a
>> bignum but rather would store the keys individually.
>
>chasing
what's been bothering me was trying to get the data into 16 bytes in a byte
string of that length. i couldn't get that to work so gave up and just shoved
the data into 25 bytes. here's a bit of code. i think it's faster than my
bignum stuff.
(define p (bytes 16 5 1 12 6 24 17 9 2 22 4 10 13 18 19 20 0 23 7 21 15 11 8 3 14))
On Thu, 28 Jan 2016 11:49:09 -0800 (PST), Scotty C
wrote:
>what's been bothering me was trying to get the data into 16 bytes in
>a byte string of that length. i couldn't get that to work so gave up and
>just shoved the data into 25 bytes. here's a bit of code. i think it's
On Wed, 27 Jan 2016 19:43:49 -0800 (PST), Scotty C
wrote:
>> Then you're not using the hash in a conventional manner ... else the
>> filter entries would be unique
>
>using it conventionally? absolutely. it is a hash with separate chaining.
You snipped the part I was
> I think you understand perfectly.
i'm coming around
> You said the keys are 128-bit (16 byte) values. You can store one key
> directly in a byte string of length 16.
yup
> So instead of using a vector of pointers to individual byte strings,
> you would allocate a single byte string of length
On Thu, 2016-01-28 at 20:32 -0800, Scotty C wrote:
> > I think you understand perfectly.
> i'm coming around
>
> > You said the keys are 128-bit (16 byte) values. You can store one
> > key
> > directly in a byte string of length 16.
> yup
>
> > So instead of using a vector of pointers to
> Way back in this thread you implied that you had extremely large FILES
> containing FIXED SIZE RECORDS, from which you needed
> to FILTER DUPLICATE records based on the value of a FIXED SIZE KEY
> field.
this is mostly correct. the data is state and state associated data on the
fringe. hence
On 1/27/2016 10:50 AM, Brandon Thomas wrote:
On Wed, 2016-01-27 at 04:01 -0500, George Neuner wrote:
> On Tue, 26 Jan 2016 23:00:01 -0500, Brandon Thomas
> wrote:
>
> > Is there anything stopping you from restructuring
> > the data on disk and using the hash directly from
From: racket-users@googlegroups.com on behalf of
George Neuner <gneun...@comcast.net>
Sent: Wednesday, January 27, 2016 4:28 AM
To: racket-users@googlegroups.com
Subject: Re: [racket-users] Re: appending files
Sorry. I shouldn't do math at 4am. Ignore the numbers. However, it
is still correct that the byte array will use less space than an array
of bignums.
On Wed, 2016-01-27 at 17:49 -0500, George Neuner wrote:
> On 1/27/2016 10:50 AM, Brandon Thomas wrote:
> > On Wed, 2016-01-27 at 04:01 -0500, George Neuner wrote:
> > > On Tue, 26 Jan 2016 23:00:01 -0500, Brandon Thomas
> > > wrote:
> > >
> > > > Is there anything stopping
On Wed, 27 Jan 2016 11:17:04 -0800 (PST), Scotty C
wrote:
>On Wednesday, January 27, 2016 at 2:57:42 AM UTC-6, gneuner2 wrote:
>
>> What is this other field on which the file is sorted?
>this field is the cost in operators to arrive at the key value
Is it important to
On Wed, 2016-01-27 at 04:01 -0500, George Neuner wrote:
> On Tue, 26 Jan 2016 23:00:01 -0500, Brandon Thomas
> wrote:
>
> > Is there anything stopping you from restructuring
> > the data on disk and using the hash directly from there
>
> Scotty's hash table is much larger
On Tue, 2016-01-26 at 22:48 -0800, Scotty C wrote:
> ok brandon, that's a thought. build the hash on the hard drive at the
> time of data creation. you mention collision resolution. so let me
> build my hash on the hard drive using my 6 million buckets but
> increase the size of each bucket from 5
On Wednesday, January 27, 2016 at 2:57:42 AM UTC-6, gneuner2 wrote:
> What is this other field on which the file is sorted?
this field is the cost in operators to arrive at the key value
> WRT a set of duplicates: are you throwing away all duplicates? Keeping
> the 1st one encountered?
> Is it important to retain that sorting? Or is it just informational?
it's important
> Then you're not using the hash in a conventional manner ... else the
> filter entries would be unique ... and we really have no clue what
> you're actually doing. So any suggestions we give you are shots in
Hi Scotty,
I rearranged your message a bit for (my own) clarity.
On Tue, 26 Jan 2016 18:40:28 -0800 (PST), Scotty C
wrote:
>running 64 bit linux mint OS on a 2 core laptop with 2 gb of ram.
>the generated keys are random but i use one of the associated
>fields for
Sorry. I shouldn't do math at 4am. Ignore the numbers. However, it
is still correct that the byte array will use less space than an array
of bignums.
George
On 1/27/2016 3:54 AM, George Neuner wrote:
i run a custom built hash. i use separate chaining with a vector of
bignums. i am
On Tue, 26 Jan 2016 23:00:01 -0500, Brandon Thomas
wrote:
>Is there anything stopping you from restructuring
>the data on disk and using the hash directly from there
Scotty's hash table is much larger than he thinks it is and very
likely is being paged to disk already.
ok brandon, that's a thought. build the hash on the hard drive at the time of
data creation. you mention collision resolution. so let me build my hash on the
hard drive using my 6 million buckets but increase the size of each bucket from
5 slots to 20. right? i can't exactly recreate my
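For a fixed-slot on-disk layout like the one being discussed, the seek position of a bucket is plain arithmetic. The sizes below are assumptions for illustration:

```racket
#lang racket
;; Assumed layout: buckets stored contiguously in one file, each with a
;; fixed number of slots and a fixed record size.
(define record-size 32)        ; bytes per record (assumption)
(define slots-per-bucket 20)   ; enlarged bucket, per the message above
(define (bucket-offset idx)
  (* idx slots-per-bucket record-size))
```

Before reading or writing bucket `i`, the file port would be positioned with `(file-position port (bucket-offset i))`.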
On Tue, 2016-01-26 at 18:40 -0800, Scotty C wrote:
> alright george, i'm open to new ideas. here's what i've got going.
> running 64 bit linux mint OS on a 2 core laptop with 2 gb of ram. my
> key is 128 bits with ~256 bits per record. so my 1 gb file contains
> ~63 million records and ~32 million
robby findler, you the man. i like the copy-port idea. i incorporated it and it
is nice and fast and easily fit into the existing code.
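For reference, the `copy-port` approach to appending one file onto another looks roughly like this (the function name and filenames are placeholders, not the poster's code):

```racket
#lang racket
;; Append the bytes of src onto the end of dst using copy-port,
;; which streams in chunks without loading either file into memory.
(define (append-file! src dst)
  (call-with-output-file dst #:exists 'append
    (lambda (out)
      (call-with-input-file src
        (lambda (in)
          (copy-port in out))))))
```

Because `copy-port` works on raw ports, the same code handles text and binary files and is portable across platforms, unlike shelling out to `cat` or `copy`.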
neil van dyke, i have used the system function before but had forgotten what it
was called and couldn't find it as a result in the documentation. my problem
with using the system function is that i need 2 versions of it: windoz and
linux. the copy-port function is a write once use across
gneuner2 (george), you are over thinking this thing. my test data of 1 gb is
but a small sample file. i can't even hash that small 1 gb at the time of data
creation. the hashed data won't fit in ram. at the time i put the redundant
data on the hard drive, i do some constant time sorting so that
On 1/26/2016 2:51 PM, Scotty C wrote:
gneuner2 (george), you are over thinking this thing. my test data of 1 gb is
but a small sample file. i can't even hash that small 1 gb at the time of data
creation. the hashed data won't fit in ram. at the time i put the redundant
data on the hard drive,
+1 on George Neuner's comments about how one can do smart processing of
huge files in small space. (I almost said something about that myself,
but didn't have time to get into that kind of discussion, so I stuck to
only the simpler file concatenation question.)
BTW, students who have 8GB RAM
alright george, i'm open to new ideas. here's what i've got going. running 64
bit linux mint OS on a 2 core laptop with 2 gb of ram. my key is 128 bits with
~256 bits per record. so my 1 gb file contains ~63 million records and ~32
million keys. about 8% will be dupes leaving me with ~30