I recall having problems with this before, using the non-Mahout Taste
code. I have meaningful strings for content IDs and had mapped them
systematically to pseudo-meaningful (but non-sequential) numbers. I
remember that causing some problems a year or so back, ... but I'm
trying it again now with the itemsimilarity Hadoop job. If I need to
iterate through all rows in the log and generate consecutive counts to
identify items and users I guess I could, though it doesn't seem very
Hadoop-friendly. Or should I be OK with anything that's int-shaped?

Dan

Reply via email to