I recall having problems with this before, using the non-Mahout Taste code. I have meaningful strings for content IDs and had mapped them systematically to pseudo-meaningful (but non-sequential) numbers. I remember that causing some problems a year or so back, ... but I'm trying it again now with the itemsimilarity Hadoop job. If I need to iterate through all rows in the log and generate consecutive counts to identify items and users I guess I could, though it doesn't seem very Hadoop-friendly. Or should I be OK with anything that's int-shaped?
Dan
