On Sun, Jul 23, 2017 at 8:05 PM, Jon Zeppieri <zeppi...@gmail.com> wrote:

> On Sun, Jul 23, 2017 at 7:30 PM, Zelphir Kaltstahl
> <zelphirkaltst...@gmail.com> wrote:
> > Hi Racket Users,
> >
> > The last few days I've been working on implementing decision trees in
> Racket and I've been following the following guide:
> http://machinelearningmastery.com/implement-decision-tree-
> algorithm-scratch-python/
> >
> > Now I have the following code: https://github.com/
> ZelphirKaltstahl/racket-ml
> >
> > I also wrote some tests, I think for every procedure so far.
> >
> > However, my implementation seems very very slow. It seems each iteration
> of `iter-features` takes way too much time.
> >
> > I've tried to stick to the guide and sometimes "outsourced" some
> procedure.
> >
> > I started out with using vectors, as I thought I might gain better
> performance than from lists. In the code I introduced an abstraction layer,
> which provides things like `data-length`, so that I could in theory change
> the representation of data and only change those accessors/getters. In the
> test cases I sometimes did not use the abstraction though.
> >
> > So far I am not having much side effects in the code and I'd like to
> avoid them and unsafe operations.
> >
> > A small `TEST-DATA` set is in the code and another data set I downloaded
> from the data set repositories. When running with `TEST-DATA` to calculate
> the best split, it only takes a few milliseconds, while it takes minutes
> with the other `data-set`.
> >
> > How can I make my code more efficient, without changing the basic logic
> of it?
> > Should I not use vectors (what else?)?
> > Would I gain anything from using typed Racket or flonums?
> >
> I haven't taken a close enough look to evaluate the algorithm itself,
> but on a micro-scale:
> - Using `vector-take-right` to get the tail of a vector in a loop is
> expensive. From what I can see, you'll be better off representing your
> data as a list of vectors. (I see you have a `data-get-row` function,
> suggesting a need for random access, but you don't appear to be using
> it.)
As a general rule, all of your code appears to be using 'generate a new
vector with <x> modification' functions.  I would probably look at creating
new vectors only for mutation while otherwise reusing the vectors and
instead manipulating a list of indices.

> - Use a struct instead of a hash to represent a split.

Oh, are structs faster than hashes?

You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to