The last parameter to evaluate(), the "1.0", can be turned down to, say, 0.1 to use only 10% of the data for testing. That also makes the evaluation run faster.
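Here is a rough sketch of what a fraction like that usually controls. It assumes the parameter is a per-user sampling rate; `sample_users` and `evaluation_percentage` are hypothetical names, not the library's API:

```python
import random

def sample_users(user_ids, evaluation_percentage, seed=None):
    """Hypothetical helper: keep each user with probability evaluation_percentage."""
    if not 0.0 < evaluation_percentage <= 1.0:
        raise ValueError("evaluation_percentage must be in (0, 1]")
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so 1.0 keeps every user.
    return [u for u in user_ids if rng.random() < evaluation_percentage]

all_users = sample_users(range(100), 1.0)
some_users = sample_users(range(10_000), 0.1, seed=42)  # roughly 1,000 users
```

With 1.0 every user is evaluated. With 0.1 only about a tenth are, which is quicker but gives a noisier estimate.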
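Precision/recall tests are always problematic for recommenders. Only items the user actually rated can count as "relevant", so a good recommendation the user simply never rated is scored as a miss. I don't think this is going to give good results, but you're doing it right.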
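Here is a minimal sketch of precision and recall at k against a user's held-out items. It is not the library's own code, and `precision_recall_at_k` is a hypothetical name. It shows why the numbers tend to come out low:

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user.

    recommended: ranked list of item IDs from the recommender
    relevant: set of held-out item IDs the user actually rated highly
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# "a" may be an item the user would love but never rated,
# yet it is counted as a miss because it is not in the held-out set.
p, r = precision_recall_at_k(["a", "b", "c", "d"], {"b", "x"}, k=2)
```

Here `p` and `r` are both 0.5. Nothing in the test data can tell "a" apart from a genuinely bad recommendation, so precision underestimates quality.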
