It is probably a good idea to start by setting aside part of your training
data as a held-out development set, which you can use to plot learning
curves and estimate likely performance on
unseen data. I really recommend Andrew Ng's machine learning course
materials.

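
As a rough illustration of the held-out split and learning curve idea, here is a minimal sketch using only the Python standard library. The toy data, the threshold "model" and all function names (`split_train_dev`, `learning_curve`, `fit_threshold`, `accuracy`) are made up for the example; you would plug in your own data, training routine and metric.

```python
import random

def split_train_dev(examples, dev_fraction=0.2, seed=0):
    """Shuffle a copy of the data and hold out a fraction as a development set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]  # (train, dev)

def learning_curve(train, dev, fit, score, sizes):
    """Fit on growing prefixes of the training set; score each model on
    the data it was trained on and on the untouched development set."""
    curve = []
    for n in sizes:
        subset = train[:n]
        model = fit(subset)
        curve.append((n, score(model, subset), score(model, dev)))
    return curve

# Hypothetical stand-in for a real learner: a 1-D threshold classifier.
def fit_threshold(data):
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    if not pos or not neg:
        # Only one class seen: fall back to the mean of all inputs.
        return sum(x for x, _ in data) / len(data)
    # Midpoint between the two class means.
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, data):
    return sum(int(x > threshold) == y for x, y in data) / len(data)

# Toy data (an assumption for illustration): label is 1 when x > 0.5.
rng = random.Random(1)
examples = [(x, int(x > 0.5)) for x in (rng.random() for _ in range(200))]

train, dev = split_train_dev(examples, dev_fraction=0.2)
curve = learning_curve(train, dev, fit_threshold, accuracy, sizes=[10, 40, 160])
for n, train_acc, dev_acc in curve:
    print(f"n={n:4d}  train_acc={train_acc:.3f}  dev_acc={dev_acc:.3f}")
```

If the training score is much higher than the development score, more data or more regularisation may help; if both are low and close together, more data alone is unlikely to help much.
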
Experienced machine learning practitioners usually start by trying to
replicate exactly what the paper did, using exactly the same data, exactly
the same methods and, if possible, even exactly the same software. It is
very comforting
if you can do this, because you can then go ahead and make changes,