Recently I have been doing MNIST image classification with a ResNet, and I found something strange, or at least interesting. First, although it's usually said that we should use early stopping, I found it's always better to run more epochs at the initial learning rate (which I set to 0.1 or 0.01) and then scale the learning rate down quickly. For example, my strategy is to start at 0.1 and scale down by a factor of 0.1 at the 200th, 210th, and 220th epochs, with a batch size of 64 and 230 epochs in total (see the sketch below). I also found that the last downscaling of the learning rate usually degrades performance. Am I doing anything wrong? You are welcome to share your parameter-tuning experience.
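
Concretely, the schedule looks something like this (a minimal sketch in plain Python; the function and variable names are just illustrative, not tied to any particular framework):

def learning_rate(epoch):
    # Step schedule: 0.1 initially, scaled by 0.1 at epochs 200, 210, 220.
    lr = 0.1
    for milestone in (200, 210, 220):
        if epoch >= milestone:
            lr *= 0.1
    return lr

for epoch in range(230):      # 230 epochs in total
    lr = learning_rate(epoch)
    # ... train one epoch here with batch size 64, using this lr ...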
