Re: [mlpack] Hints for A3C/PPO

2018-02-20 Thread Shangtong Zhang
> So that was stupid of me, forward() in policy.hpp is just computing the
> softmaxes for the input (first param) and storing it in output (second param)
> -> does that mean policy has to be the last layer of my neural net?

See the comment here.
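For anyone skimming the archive, here is a minimal sketch of the behaviour being described: a Forward() that softmaxes its input into action probabilities, which is why such a policy layer sits at the end of the network. This is plain Armadillo written for illustration, not mlpack's actual policy.hpp code.

// Illustrative only: a softmax "forward" over column-wise inputs.
#include <armadillo>

void SoftmaxForward(const arma::mat& input, arma::mat& output)
{
  output = input;
  output.each_row() -= arma::max(input, 0);   // subtract column max for numerical stability
  output = arma::exp(output);
  output.each_row() /= arma::sum(output, 0);  // each column now sums to 1
}

int main()
{
  arma::mat logits = { { 1.0, 0.5 },
                       { 2.0, 0.1 },
                       { 0.3, 0.4 } };        // 3 actions, 2 samples (one per column)
  arma::mat probs;
  SoftmaxForward(logits, probs);
  probs.print("action probabilities");
  return 0;
}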

Re: [mlpack] Hints for A3C/PPO

2018-02-19 Thread Shangtong Zhang
Yes. First try the vanilla implementation; if it doesn't work, augment it with experience replay (ER). However, I would suggest not merging your vanilla implementation with ER, because it's theoretically wrong, as I mentioned before. I would also suggest not to merge your vanilla implementation …
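As a rough sketch of what the "vanilla implementation" (a REINFORCE-style policy gradient with no baseline) could look like, assuming a linear softmax policy: the Step struct and the ActionProbs/ReinforceUpdate names below are made up for illustration and are not mlpack API.

// Illustrative only: one REINFORCE update from a single finished episode.
#include <armadillo>
#include <vector>

struct Step { arma::vec state; arma::uword action; double reward; };

// theta has one column of weights per action; pi(a|s) = softmax(theta^T s).
arma::vec ActionProbs(const arma::mat& theta, const arma::vec& state)
{
  arma::vec logits = theta.t() * state;
  arma::vec p = arma::exp(logits - logits.max());
  return p / arma::accu(p);
}

void ReinforceUpdate(arma::mat& theta, const std::vector<Step>& episode,
                     double stepSize, double discount)
{
  arma::mat grad(theta.n_rows, theta.n_cols, arma::fill::zeros);
  double ret = 0.0;
  // Walk the episode backwards so the return G_t accumulates cheaply.
  for (auto it = episode.rbegin(); it != episode.rend(); ++it)
  {
    ret = it->reward + discount * ret;
    arma::vec p = ActionProbs(theta, it->state);
    arma::vec indicator(p.n_elem, arma::fill::zeros);
    indicator(it->action) = 1.0;
    // grad log pi(a|s) for a linear softmax policy is s * (1{a} - p)^T.
    grad += ret * (it->state * (indicator - p).t());
  }
  theta += stepSize * grad;
}

int main()
{
  arma::mat theta(4, 2, arma::fill::zeros);  // 4 state features, 2 actions
  std::vector<Step> episode = {
    { arma::vec{ 1.0, 0.0, 0.5, 0.2 }, 0, 1.0 },
    { arma::vec{ 0.3, 1.0, 0.1, 0.7 }, 1, 0.0 }
  };
  ReinforceUpdate(theta, episode, 0.01, 0.99);
  theta.print("theta after one update");
  return 0;
}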

Re: [mlpack] Hints for A3C/PPO

2018-02-19 Thread Shangtong Zhang
For TRPO you need to read the original paper; I don't have a better idea. Starting from a vanilla policy gradient is good. However, the main concern, from my experience, is that you need either experience replay or multiple workers to make a non-linear function approximator work (they can give you …
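For reference, a minimal sketch of the experience replay idea mentioned above: a fixed-size ring buffer of past transitions, sampled uniformly at random when updating, which decorrelates consecutive samples. The Transition and ReplayBuffer names are invented for illustration and this is not mlpack's replay implementation.

// Illustrative only: a uniform-sampling ring-buffer replay memory.
#include <cstddef>
#include <random>
#include <vector>

struct Transition
{
  std::vector<double> state, nextState;
  int action;
  double reward;
  bool terminal;
};

class ReplayBuffer
{
 public:
  explicit ReplayBuffer(std::size_t capacity) : capacity(capacity), next(0) {}

  // Overwrite the oldest transition once the buffer is full.
  void Store(const Transition& t)
  {
    if (buffer.size() < capacity)
      buffer.push_back(t);
    else
      buffer[next] = t;
    next = (next + 1) % capacity;
  }

  // Uniformly sample a minibatch of stored transitions (assumes Size() > 0).
  std::vector<Transition> Sample(std::size_t batchSize, std::mt19937& rng) const
  {
    std::uniform_int_distribution<std::size_t> pick(0, buffer.size() - 1);
    std::vector<Transition> batch;
    for (std::size_t i = 0; i < batchSize; ++i)
      batch.push_back(buffer[pick(rng)]);
    return batch;
  }

  std::size_t Size() const { return buffer.size(); }

 private:
  std::size_t capacity, next;
  std::vector<Transition> buffer;
};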