More generally, this seems to be a raw translation of an algorithm stated in a 
language that has a rather lowlevel view of data. Perhaps this is needed for 
this world, but I doubt it. 

I teach trees and decision trees to freshman students who have never programmed 
before, and Racket’s forms of data and functions are extremely suitable to this 
domain. 




> On Jul 23, 2017, at 8:05 PM, Jon Zeppieri <zeppi...@gmail.com> wrote:
> 
> On Sun, Jul 23, 2017 at 7:30 PM, Zelphir Kaltstahl
> <zelphirkaltst...@gmail.com <mailto:zelphirkaltst...@gmail.com>> wrote:
>> Hi Racket Users,
>> 
>> The last few days I've been working on implementing decision trees in Racket 
>> and I've been following the following guide: 
>> http://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python/
>> 
>> Now I have the following code: https://github.com/ZelphirKaltstahl/racket-ml
>> 
>> I also wrote some tests, I think for every procedure so far.
>> 
>> However, my implementation seems very very slow. It seems each iteration of 
>> `iter-features` takes way too much time.
>> 
>> I've tried to stick to the guide and sometimes "outsourced" some procedure.
>> 
>> I started out with using vectors, as I thought I might gain better 
>> performance than from lists. In the code I introduced an abstraction layer, 
>> which provides things like `data-length`, so that I could in theory change 
>> the representation of data and only change those accessors/getters. In the 
>> test cases I sometimes did not use the abstraction though.
>> 
>> So far I am not having much side effects in the code and I'd like to avoid 
>> them and unsafe operations.
>> 
>> A small `TEST-DATA` set is in the code and another data set I downloaded 
>> from the data set repositories. When running with `TEST-DATA` to calculate 
>> the best split, it only takes a few milliseconds, while it takes minutes 
>> with the other `data-set`.
>> 
>> How can I make my code more efficient, without changing the basic logic of 
>> it?
>> Should I not use vectors (what else?)?
>> Would I gain anything from using typed Racket or flonums?
>> 
> 
> 
> I haven't taken a close enough look to evaluate the algorithm itself,
> but on a micro-scale:
> 
> - Using `vector-take-right` to get the tail of a vector in a loop is
> expensive. From what I can see, you'll be better off representing your
> data as a list of vectors. (I see you have a `data-get-row` function,
> suggesting a need for random access, but you don't appear to be using
> it.)
> - Use a struct instead of a hash to represent a split.
> - `split-data` could partition the data list in a single pass, instead
> of making two passes. (You can use the `partition` function from
> racket/list.)
> - If I'm reading this right, for a given data set, you should be able
> to memoize calls to `data-get-col`.
> - From a quick glance, it looks like you're already using floats,
> rather than exact rationals.
> - Are you running this code inside DrRacket? If so, have you timed the
> difference between running it with debugging enabled and with no
> debugging or profiling? (Language -> Choose Language... -> Details)
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to racket-users+unsubscr...@googlegroups.com 
> <mailto:racket-users+unsubscr...@googlegroups.com>.
> For more options, visit https://groups.google.com/d/optout 
> <https://groups.google.com/d/optout>.

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to