Hi all,

First of all, thank you for the great suggestions you gave me; you are simply great :)

Anyway, returning to my problem, I'll try to be as clear as possible. As far as I know (we are still collecting requirements and working out what kind of data we will have), we should have a situation like this:

- on street XYZ in spring without any events (an event can be a demonstration, a parade, etc.) the average speed is 50 km/h
- on street XYZ in spring with an event the average speed is 20 km/h
- on street XYZ in autumn without any events the average speed is 40 km/h
- on street XYZ in autumn with an event the average speed is 15 km/h
and so on for all the streets of interest (basically using the OpenStreetMap data). Note that we are not interested in the worst case, that is, the case with an accident (at least as far as I know).

Now my customer would like to offer this kind of functionality to clients: a client connects to the site (or downloads an app) and wants to go by car to restaurant W; he/she would like to know whether it's a good idea to take that street or to look for a different one. So, knowing the season (spring, summer, autumn or winter) and knowing whether there are any events (demonstrations, parades, etc.), I should tell him/her: if you take street XYZ you will probably travel at 50 km/h or 20 km/h (the best would be if I could suggest a different way... but this is another topic :) )

So, since I should use old data to tell the client the speed he/she may have on street XYZ, I was thinking of using Mahout... but maybe I was wrong (sadly I'm really new to this kind of world... though I'm finding it amazing).

Now by using the "old" data (the one I listed previously)

2013/10/15 Andrew Butkus <[email protected]>

> After giving some more thought, you could do something like this:
>
> Store:
>
> route
> {
>     road
>     {
>         timestamp,
>         time_to_run_road,
>     }
> }
>
> then build up a bigger model, which extracts the timestamp from the road on
> the route and the time it takes to run that road, and calculate an average
> on a per-day basis (for example, if you travel this route every Monday at
> 9am, then extract the timestamps which match every Monday at 9am, and
> average the time_to_run_road data you have collected on a Monday for that
> road).
> If you want to see how long it takes to run a road every Monday at
> 9am in January, then you extract all timestamps that match that road for
> January at 9am on Monday.
>
> Not entirely sure where Mahout fits in here, but this could be a potential
> way forward for you (assuming you can collect/have data about the road).
>
> Hope that helps
>
> Andy
>
> On 15 Oct 2013, at 13:09, Andrew Butkus <[email protected]> wrote:
>
> > Also, to add to this, you probably wouldn't want to do it by route, but
> > maybe break it down by road; this gives more coverage and greater
> > granularity.
> >
> > Sent from my Windows Phone
> > From: Andrew Butkus
> > Sent: 15/10/2013 13:07
> > To: Bertrand Dechoux; [email protected]
> > Subject: RE: Information
> >
> > I'm not sure; I think the last two can be predicted. For example, in
> > January in the UK we get bad weather which causes delays, and on average
> > it will take longer to run a route in this month because of that.
> >
> > Considering weather as a variable is probably not scalable; recording
> > the time to run a route with a timestamp should be good enough.
> >
> > Also consider that once a year there is a festival in Reading, so over
> > that weekend routes through Reading will always take longer.
> >
> > I'm not sure where Mahout can fit this problem, but if you can
> > train on route time and add a timestamp, this would give you something
> > scalable. Then figure out on average how long it takes to run a route
> > at a similar timestamp, for example by minute, hour, week, month, year.
> >
> > Sent from my Windows Phone
> > From: Bertrand Dechoux
> > Sent: 15/10/2013 08:33
> > To: [email protected]
> > Subject: Re: Information
> >
> > The biggest point is what data you have and what exactly your problem is.
> >
> > The maximum speed of the route can be easily known and in the best case
> > that would be your speed. From a very broad point of view, there are three
> > reasons for a slowdown.
> > 1) traffic jam
> > 2) accident
> > 3) bad weather
> >
> > But without up-to-date observations, those three points are non-trivial to
> > predict (especially the last two). Doing simple statistics (like averages)
> > can be a good start to see the variations and understand which factors
> > should be taken into account.
> >
> > In the end, you want to do a regression, but classification and clustering
> > might help before that. Hard to say more without knowing why the average
> > speed is important, for which area, at which time...
> >
> > Bertrand
> >
> > On Tue, Oct 15, 2013 at 9:14 AM, Pavan K Narayanan <
> > [email protected]> wrote:
> >
> >> Based on the information you have provided, street routing is potentially a
> >> Vehicle Routing Problem, which is based on TSPs. You can check out the link
> >> below:
> >> https://cwiki.apache.org/confluence/display/MAHOUT/Traveling+Salesman
> >> Secondly, if you want to use Mahout for forecasting, it is not possible yet,
> >> as the solution methodology for forecasting (LWR) is still an open problem.
> >> https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
> >>
> >> Bottom line: IMHO, you cannot use Mahout for forecasting at the moment; good
> >> luck with your project.
> >>
> >> Also, you can explore parallel computing paradigms if you have relatively
> >> high volumes of data.
> >>
> >>
> >> On 15 October 2013 12:19, Angelo Immediata <[email protected]> wrote:
> >>
> >>> Hi there
> >>>
> >>> I'm pretty new to machine learning and Apache Mahout as well, so pardon me
> >>> if this question is not quite correct :)
> >>>
> >>> I'm on a street routing project where, besides other functionality, we
> >>> have to make forecasts. Precisely, we should be able to forecast the
> >>> average speed on a street in a well-known season (e.g. we should be
> >>> able to answer this kind of question: on the American Route 66, what
> >>> will be the average speed in spring 2015?)
> >>> As far as I know, in order to offer this functionality we should use some
> >>> machine learning; this is the reason I'm checking Mahout (moreover, we need
> >>> to guarantee high performance, and since Mahout is based on Apache Hadoop
> >>> and uses MapReduce, it seems really amazing to me).
> >>> The first question I'd like to ask is: can I use Apache Mahout to
> >>> implement the functionality described above?
> >>> If I can use it, I'll surely need some data in order to "train" Mahout... can
> >>> I train Mahout at a different time from when I need the forecast? I
> >>> mean: can I run the training, let's say, every week at 10pm and then offer
> >>> the forecasting functionality only when a user is interested in it? Should I
> >>> store the training result in some way?
> >>> And last but not least :), still assuming I can use Mahout... which
> >>> algorithm should I use to implement my scenario?
> >>>
> >>> Thank you for the help and pardon me if I was not entirely correct
> >>>
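For illustration, the per-road averaging that Andy describes in the quoted thread could be sketched in plain Python roughly as follows. This is only a minimal sketch under assumptions: the sample observations and the names OBSERVATIONS, average_by_slot, weekday_hour and month_weekday_hour are all made up for the example, travel times are assumed to be in seconds, and nothing here uses Mahout.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample data: (road, timestamp, time_to_run_road in seconds).
OBSERVATIONS = [
    ("XYZ", datetime(2013, 1, 7, 9, 5), 120.0),    # a Monday in January
    ("XYZ", datetime(2013, 1, 14, 9, 40), 180.0),  # a Monday in January
    ("XYZ", datetime(2013, 1, 8, 9, 10), 90.0),    # a Tuesday in January
    ("XYZ", datetime(2013, 6, 3, 9, 15), 100.0),   # a Monday in June
]


def average_by_slot(observations, key):
    """Group travel times by key(road, timestamp) and average each group."""
    groups = defaultdict(list)
    for road, ts, seconds in observations:
        groups[key(road, ts)].append(seconds)
    return {slot: mean(values) for slot, values in groups.items()}


def weekday_hour(road, ts):
    # "Every Monday at 9am": weekday() returns 0 for Monday.
    return (road, ts.weekday(), ts.hour)


def month_weekday_hour(road, ts):
    # "Every Monday at 9am in January": also key on the month.
    return (road, ts.month, ts.weekday(), ts.hour)


by_weekday_hour = average_by_slot(OBSERVATIONS, weekday_hour)
by_month = average_by_slot(OBSERVATIONS, month_weekday_hour)

# Average time to run road XYZ on Mondays at 9am in January.
print(by_month[("XYZ", 1, 0, 9)])
```

Passing the grouping key as a function is just one possible design; it lets the same data be averaged per hour, per weekday or per month without changing how it is stored. The averages could be recomputed offline (for example once a week) and stored, then looked up when a user asks.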
