subject:"\[racket\-users\] Searching in data the Racket way..."

Re: [racket-users] Searching in data the Racket way...

2016-10-30 Thread Alex Knauth


> On Oct 31, 2016, at 12:31 AM, Alex Harsanyi  wrote:
> 
> While in terms of O-notation, all linear algorithms are equivalent, the 
> actual time to run a linear algorithm can significantly depend on the actual 
> data structure.  In the program below, I calculate the sum of the elements of 
> a vector or list in three ways, all of them linear algorithms.
> 
> * calculating the sum of the elements in a list (do-time sum1 10) takes 296 
> milliseconds on my machine
> * calculating the sum of the elements in the vector, (do-time sum2 10) takes 
> 374 milliseconds (slower than the list version!)
> * calculating the sum of the elements in the vector by explicitly referencing 
> elements, (do-time sum3 10) takes 62 milliseconds.
> 
> I guess my example shows two things:
> 
> * the generator for vectors appears to have a performance problem (1.25 times 
> slower)
> * it is 4.7 times faster to iterate over vectors than it is to do it over 
> lists.
> 
> Best Regards,
> Alex.
> 
> #lang racket
> 
> (define nitems 100)
> (define as-list (for/list ([x (in-range nitems)]) (random nitems)))
> (define as-vector (list->vector as-list)) ; same data as in the list
> 
> (define (sum1)
>  (for/fold ((sum 0))
>((x as-list))

Try using (in-list as-list) here

>(+ sum x)))
> 
> (define (sum2)
>  (for/fold ((sum 0))
>((x as-vector))

and try (in-vector as-vector) here. That will take away some of the time taken 
by the generators.

>(+ sum x)))
> 
> (define (sum3)
>  (for/fold ((sum 0))
>((x (in-range (vector-length as-vector
>(+ sum (vector-ref as-vector x
> 
> (define (do-time fn (iterations 1))
>  (collect-garbage 'major)
>  (define start (current-inexact-milliseconds))
>  (for ((n (in-range iterations)))
>(fn))
>  (define end (current-inexact-milliseconds))
>  (- end start))

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: [racket-users] Searching in data the Racket way...

2016-10-30 Thread Alex Harsanyi

While in terms of O-notation, all linear algorithms are equivalent, the actual 
time to run a linear algorithm can significantly depend on the actual data 
structure.  In the program below, I calculate the sum of the elements of a 
vector or list in three ways, all of them linear algorithms.

* calculating the sum of the elements in a list (do-time sum1 10) takes 296 
milliseconds on my machine
* calculating the sum of the elements in the vector, (do-time sum2 10) takes 
374 milliseconds (slower than the list version!)
* calculating the sum of the elements in the vector by explicitly referencing 
elements, (do-time sum3 10) takes 62 milliseconds.

I guess my example shows two things:

* the generator for vectors appears to have a performance problem (1.25 times 
slower)
* it is 4.7 times faster to iterate over vectors than it is to do it over lists.

Best Regards,
Alex.

#lang racket

(define nitems 100)
(define as-list (for/list ([x (in-range nitems)]) (random nitems)))
(define as-vector (list->vector as-list)) ; same data as in the list
  
(define (sum1)
  (for/fold ((sum 0))
((x as-list))
(+ sum x)))

(define (sum2)
  (for/fold ((sum 0))
((x as-vector))
(+ sum x)))

(define (sum3)
  (for/fold ((sum 0))
((x (in-range (vector-length as-vector
(+ sum (vector-ref as-vector x

(define (do-time fn (iterations 1))
  (collect-garbage 'major)
  (define start (current-inexact-milliseconds))
  (for ((n (in-range iterations)))
(fn))
  (define end (current-inexact-milliseconds))
  (- end start))

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: [racket-users] Searching in data the Racket way...

2016-10-30 Thread David Storrs

Hi Meino,

I think there might be some confusion about how the word "linear" is being
used here, so let me step back.  Apologies if you already know this.

When talking about how long an algorithm takes, computer people tend to
talk in categories instead of in seconds or minutes. The categories refer
to the change in the algorithm's performance as the number of elements you
put in changes. For example, if it takes 1 second to run the algorithm on
10 elements and 10 seconds to run it on 100 elements then your algorithm
runs in "linear time" -- the graph of time taken vs elements seen is a
straight line.

The notation for this is O (pronounced "big oh") and "n" is the number of
elements sent into the algorithm.  You typically see the following
categories:

O(1), aka constant time. The time an O(1) algorithm takes does not
appreciably change as n increases.  Examples would be retrieval of a
randomly-chosen value from a hash, or an item from a vector -- essentially
you're just following a pointer so it doesn't matter how many items are in
the hash / vector.

O(n), aka linear time.  As stated above, the time increases directly as n
increases.  n = 10x takes 10x more time than n = x. Traversing a Racket
list is O(n), because you have to follow the links from one item to the
next instead of jumping straight to a destination the way you can for a
vector.  Important caveat:  retrieving the *next* item in a list is O(1),
since you're just following a pointer.

O(n*log(n)):  You see this with most sorting algorithms.  It's a little
slower than linear time, but not by much.

O(n^2), aka "n-squared":  This gets really slow, really fast, and is
something you definitely want to avoid.  It would typically mean you have
two nested loops, like this C-style example:
(for i = 0; i < n; i++){
(for j = 0; j < n; j++) {
 ...

What you said about following links in a list vs jumping straight to a
memory location is exactly right.  Because you can jump straight to a given
location in a vector, locating a randomly-selected item in a vector is
O(1), constant time.  Locating a randomly-selected item in a vector is
O(n), linear time, because you need to perform multiple "follow link"
operations to get there.  Remember the caveat, though:  locating the *next*
item is a list is O(1), because you're just following a pointer, the same
as doing a vector lookup.  That's key: it means that accessing all the
elements in a list is still O(n), as long as you do it in order.

And that brings us back to the big picture: the program you're talking
about  doesn't depend on the data structure.  If you need to look at n
elements, then your algorithm cannot possibly run in better than O(n) time
-- no matter how they are stored, looking at 100 elements will take 10x
more time than looking at 10 elements.

So both you and Matthias are correct, but you're talking about different
things -- accessing a randomly-chosen element in a vector (what you're
talking about) is O(1) vs the O(n) of accessing a randomly-chosen element
in a vector.  Matthias, on the other hand, is talking about the actual time
that your code will take to process all the items in your file, and that's
dependent on how many items it has to process.  You basically need to do
the following (pseudo-code):

for 1 to n:
retrieve next element
do something to it

Whether your 'retrieve next element' is modeled as a vector lookup or a
link traversal (and remember, both of those are O(1) operations), you're
going to do that operation 10x more often when you have 10x more items.
That means the code he gave you runs in linear time, O(n), despite the fact
that it's using lists instead of vectors.

Does that all make sense?

Dave

On Sunday, October 30, 2016,  wrote:

>
> Hi Matthias,
>
> thanks for your reply ! :)
>
> H...your are asking a newbie, what not linear in processing
> lists...h... ;)
>
> I read somewhere on the net, that - in opposite to vectors --
> processing lists is not linear. I /think/ (read: dont know for
> sure), that searching needs extra time to jump along the
> links of the list (linked list, isn't it?) to the next element,
> while searching a vector needs (internally) only adding an index to a
> base pointer to acces the next element.
>
> So accessing the 1000th element of a list needs 1000 times
> "jump via link" plus inspecting the 1000th element and doing the
> same with a vector needs accessing the addr=(base address)+ (999* item
> size)
> and inspecting the element.
> Therefore the processing time of an element of a vector does not
> depend on the position of the element itsself.
>
> WARNING! THIS IS BASED ON "C" KNOWLEDGE!
> MAY BE TOTALLY WRONG!
>
> The code to implement the search is (hopefully, fingers crossed...)
> not the problem...I simply dont know data type/construct to choose
> to make it a little faster...
>
> Cheers,
> Meino
>
>
> Matthias Felleisen  [16-10-30 12:28]:
> >
> > > On

Re: [racket-users] Searching in data the Racket way...

2016-10-30 Thread Meino . Cramer


Hi Matthias,

thanks for your reply ! :)

H...your are asking a newbie, what not linear in processing
lists...h... ;)

I read somewhere on the net, that - in opposite to vectors --
processing lists is not linear. I /think/ (read: dont know for
sure), that searching needs extra time to jump along the
links of the list (linked list, isn't it?) to the next element,
while searching a vector needs (internally) only adding an index to a 
base pointer to acces the next element.

So accessing the 1000th element of a list needs 1000 times 
"jump via link" plus inspecting the 1000th element and doing the 
same with a vector needs accessing the addr=(base address)+ (999* item size)
and inspecting the element.
Therefore the processing time of an element of a vector does not
depend on the position of the element itsself.

WARNING! THIS IS BASED ON "C" KNOWLEDGE!
MAY BE TOTALLY WRONG!

The code to implement the search is (hopefully, fingers crossed...)
not the problem...I simply dont know data type/construct to choose
to make it a little faster...

Cheers,
Meino


Matthias Felleisen  [16-10-30 12:28]:
> 
> > On Oct 30, 2016, at 11:02 AM, meino.cra...@gmx.de wrote:
> > 
> > Hi,
> > 
> > (still fiddling around with those data of shortwave broadcasters and
> > their schedules...)
> > 
> > The data in question consists of 7400 lines of 16 fields each. Not all
> > fields are filled in eacht line. The already existing code reads the
> > data into lists of sublists of 16 fields/items.
> > 
> > I want to prepare those data in a way, that I will be able to do searches
> > like "give me all broadcaster, which broadcasts are in English.
> > Furthermore give me broadcasts on Thursdays only between 5:30
> > pm and 8:00 pm on bands between 9000Khz and 12000khz”.
> 
> This looks like a “one-liner” to me in Racket: 
> 
> (filter (lambda (r) (and (language-is r English) (broadcast-day-is Thursday) 
> (broadcast-time-is-between 5:30 8:00) (broadcast-frequencies-are-between 9000 
> 12000)) list-of-data)
> 
> What’s not linear about this? 
> 
> 
> — Matthias
> 
> 
> 
> > 
> 
> > For the learning effect I want to do this in racket onlu and (despite 
> > knowing that this may be a better solultion performance wise) dont
> > want to use tools like sqlite or such.
> > 
> > What I read/think is:
> > Lists processing is not linear: The longer the list, the slower the
> > data processing.
> > Hashes are somehow problematic due to the data:
> > There is no real unique key beside an artificial, additonally created
> > IDwhich is of no information to the user...
> > Vectors? Linear performance in opposite to lists...but searching
> > in a vector of vectors ,,, is that sane enough to be implementable? ;)
> > What else?
> > 
> > 
> > Cheers,
> > Meino
> > 
> > -- 
> > You received this message because you are subscribed to the Google Groups 
> > "Racket Users" group.
> > To unsubscribe from this group and stop receiving emails from it, send an 
> > email to racket-users+unsubscr...@googlegroups.com.
> > For more options, visit https://groups.google.com/d/optout.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
> 

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: [racket-users] Searching in data the Racket way...

2016-10-30 Thread Matthias Felleisen


> On Oct 30, 2016, at 11:02 AM, meino.cra...@gmx.de wrote:
> 
> Hi,
> 
> (still fiddling around with those data of shortwave broadcasters and
> their schedules...)
> 
> The data in question consists of 7400 lines of 16 fields each. Not all
> fields are filled in eacht line. The already existing code reads the
> data into lists of sublists of 16 fields/items.
> 
> I want to prepare those data in a way, that I will be able to do searches
> like "give me all broadcaster, which broadcasts are in English.
> Furthermore give me broadcasts on Thursdays only between 5:30
> pm and 8:00 pm on bands between 9000Khz and 12000khz”.

This looks like a “one-liner” to me in Racket: 

(filter (lambda (r) (and (language-is r English) (broadcast-day-is Thursday) 
(broadcast-time-is-between 5:30 8:00) (broadcast-frequencies-are-between 9000 
12000)) list-of-data)

What’s not linear about this? 


— Matthias



> 

> For the learning effect I want to do this in racket onlu and (despite 
> knowing that this may be a better solultion performance wise) dont
> want to use tools like sqlite or such.
> 
> What I read/think is:
> Lists processing is not linear: The longer the list, the slower the
> data processing.
> Hashes are somehow problematic due to the data:
> There is no real unique key beside an artificial, additonally created
> IDwhich is of no information to the user...
> Vectors? Linear performance in opposite to lists...but searching
> in a vector of vectors ,,, is that sane enough to be implementable? ;)
> What else?
> 
> 
> Cheers,
> Meino
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

[racket-users] Searching in data the Racket way...

2016-10-30 Thread Meino . Cramer

Hi,

(still fiddling around with those data of shortwave broadcasters and
their schedules...)

The data in question consists of 7400 lines of 16 fields each. Not all
fields are filled in eacht line. The already existing code reads the
data into lists of sublists of 16 fields/items.

I want to prepare those data in a way, that I will be able to do searches
like "give me all broadcaster, which broadcasts are in English.
Furthermore give me broadcasts on Thursdays only between 5:30
pm and 8:00 pm on bands between 9000Khz and 12000khz".

For the learning effect I want to do this in racket onlu and (despite 
knowing that this may be a better solultion performance wise) dont
want to use tools like sqlite or such.

What I read/think is:
Lists processing is not linear: The longer the list, the slower the
data processing.
Hashes are somehow problematic due to the data:
There is no real unique key beside an artificial, additonally created
IDwhich is of no information to the user...
Vectors? Linear performance in opposite to lists...but searching
in a vector of vectors ,,, is that sane enough to be implementable? ;)
What else?


Cheers,
Meino

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: [racket-users] Searching in data the Racket way...

Re: [racket-users] Searching in data the Racket way...

Re: [racket-users] Searching in data the Racket way...

Re: [racket-users] Searching in data the Racket way...

Re: [racket-users] Searching in data the Racket way...

[racket-users] Searching in data the Racket way...

6 matches

Site Navigation

Mail list logo

Footer information