On Tue, Aug 31, 2010 at 10:55 PM, hdev ml <[email protected]> wrote:
> Per my understanding of hive, we can do some statistical reporting, like
> frequency of user sessions, which geographical region, which device he is
> using the most etc.

Yes, that's about what Hive is good for, if you're looking for some open-source libraries along those lines.

>
> But we also want to mine this data to get some predictive capabilities like
> what is the likelihood that the user will use the same device again or if we
> get sales/marketing data (on the roadmap for future), we want to possibly
> predict which region to put more marketing/sales efforts. What is the
> pattern for growth of user base, in which geographical regions etc. What is
> the pattern of user requests failing and a number of requirements like these
> from the business.

This is pretty broad, but I can try to give you the names of the problems this sounds like, to guide your search.

Predicting which device a user will use sounds like a classification problem, like developing a probabilistic model of behavior.

Deciding where to put marketing dollars sounds like a business problem, not machine learning. I don't think a computer can tell you that. Some techniques might help you identify trends in sales, but this is simple regression, not really machine learning.

Looking for patterns in failure sounds a bit like frequent pattern mining -- trying to find events that go together unusually often.
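To make "events that go together unusually often" a bit more concrete, here is a rough sketch in plain Python. It is only an illustration of the idea, not how any particular library does it: the function name `frequent_pairs`, the `min_support` threshold, and the sample failure records are all made up for the example. It counts how often each pair of events appears together and reports the lift, where a lift above 1 means the pair co-occurs more often than it would if the two events were independent.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support=2):
    """Return {(a, b): (count, lift)} for event pairs seen together
    in at least `min_support` transactions."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for events in transactions:
        unique = sorted(set(events))  # ignore duplicates within one record
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    results = {}
    for (a, b), count in pair_counts.items():
        if count >= min_support:
            # lift = P(a and b) / (P(a) * P(b))
            lift = (count / n) / ((item_counts[a] / n) * (item_counts[b] / n))
            results[(a, b)] = (count, lift)
    return results


# Hypothetical failed-request records, one list of attributes per failure.
failed_requests = [
    ["timeout", "device_x", "region_eu"],
    ["timeout", "device_x", "region_us"],
    ["auth_error", "device_y", "region_eu"],
    ["timeout", "device_x", "region_eu"],
]

pairs = frequent_pairs(failed_requests)
for pair, (count, lift) in sorted(pairs.items()):
    print(pair, count, round(lift, 2))
```

Counting every pair like this gets expensive quickly on real data, which is why dedicated frequent pattern mining algorithms exist; the sketch is only meant to show what the output of such a search looks like.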
