Hi, we are trying to calculate ItemSimilarity. Right now we have 2*10^7 input lines. I provide the input data as raw text each day to recalculate the item similarities. We get 100 to 1000 new items each day. We have two problems:
1. It takes too much time to prepare the input data.
2. It takes too much time to convert user_id and item_id to Mahout ids.
Is there any possibility to provide the data to Mahout's MapReduce ItemSimilarity in some binary format with compression?
