Hi all,

Hope someone can point me in the right direction. I'm very new to Mahout. Here's my scenario:

I have written a system that uses Scrapy to collect classifieds items (phones, cars, antiques and many more) from multiple websites. All the items are then ingested into Solr, roughly 3 million entries, and this is the backend for my search engine.

I want to be able to extract meaningful information from this data, for example to calculate a realistic average price. I need guidance, and perhaps examples, on accurate outlier detection, categorization, etc. I'm an extreme beginner in machine learning, so I also need to know whether that is what I should be using.

Part of my challenge is the broad range of items/categories and the different levels of skew in the data. For example, finding outliers in the "iphone" results is hard when many of those results are cheap iPhone accessories.

Basically it seems I need to cluster/classify, but I'm not sure exactly how to go about it. I do already have the categories for 500K of the entries, for example the category "Cell Phones & Accessories - Accessories".

And then there is actually connecting Mahout to Solr...

Many thanks!
David
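P.S. To make the "iphone" problem concrete, here is a minimal Python sketch of the effect I mean. The prices are made up purely for illustration, `mad_outliers` is a hypothetical helper, and the median/MAD (modified z-score) rule with a 3.5 cutoff is just one common robust choice assumed here; it is not something Mahout or Solr provides.

```python
from statistics import median

def mad_outliers(prices, threshold=3.5):
    """Return prices whose modified z-score (median/MAD based) exceeds threshold."""
    med = median(prices)
    # Median absolute deviation: a spread estimate that extreme values barely move.
    mad = median([abs(p - med) for p in prices])
    if mad == 0:
        return []  # no spread at all, so nothing can be flagged
    # 0.6745 rescales the MAD so the score is comparable to a normal z-score.
    return [p for p in prices if 0.6745 * abs(p - med) / mad > threshold]

# Made-up example prices, grouped by (hypothetical) category labels.
listings = {
    "Cell Phones & Accessories - Phones": [450, 480, 500, 520, 475, 5000],
    "Cell Phones & Accessories - Accessories": [5, 8, 10, 12, 9, 300],
}

# Outliers computed separately inside each category.
per_category = {cat: mad_outliers(prices) for cat, prices in listings.items()}

# Outliers computed over everything an "iphone" search would return.
mixed = mad_outliers([p for prices in listings.values() for p in prices])

print(per_category)
print(mixed)
```

With these made-up numbers, the per-category pass flags both the 5000 phone and the 300 accessory, while the mixed pass only flags the 5000 listing, because the cheap accessories widen the spread. That is why I suspect I need reliable categories before any outlier detection.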
