Thanks, and excellent points. I just wanted to know whether anyone is working this way and whether it is a common use case.

On Tue, Feb 19, 2013 at 7:39 PM, Mohammad Tariq <[email protected]> wrote:
> Good points, sir. Especially the second one. How will the splits get
> generated?
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Tue, Feb 19, 2013 at 11:04 PM, Robert Evans <[email protected]> wrote:
>
>> I don't know of any input format that will do this out of the box. But
>> it should not be that hard to write one. There are two big issues here.
>>
>> 1. The data you are reading from the API really needs to be static,
>>    or you could get some very odd inconsistencies. For example, a node
>>    dies after a map task has finished and not all of the reducers got
>>    the data, so the map task is rerun and some of the reducers have
>>    old data while others have new data. This is the main reason to
>>    download the data before processing it. You can work around this by
>>    using the input format to run a map-only job that writes the data
>>    out to a file before processing it the rest of the way.
>> 2. You need a good way to partition the data from the API. This can
>>    be difficult unless the REST API provides a logical way to split
>>    it up.
>>
>> --Bobby
>>
>> From: Yaron Gonen <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Tuesday, February 19, 2013 4:49 AM
>> To: "[email protected]" <[email protected]>
>> Subject: InputFormat for some REST api
>>
>> Hi,
>> Do you know of any InputFormat implemented for some REST api provider?
>> Usually when one needs to process data that is accessible only by REST,
>> one should try to download the data first somehow, but what if you cannot
>> download it?
>>
>> thanks
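
On the question of how the splits would get generated, here is a rough sketch of one possible approach. It is not an existing InputFormat. It assumes the REST API can report a total record count and supports offset/limit paging; neither is given in this thread. The names RestSplitPlanner, Range and planSplits are made up for illustration. The idea is only to carve the record range into contiguous, non-overlapping slices, one per map task.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: plans contiguous record ranges for a paged REST API.
// Assumes the API exposes a total count and offset/limit style paging.
class RestSplitPlanner {

    /** One slice of the remote collection: records [offset, offset + length). */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Divides totalRecords into at most numSplits ranges whose sizes differ
     * by no more than one record. Empty ranges are never emitted, so fewer
     * than numSplits ranges come back when totalRecords < numSplits.
     */
    static List<Range> planSplits(long totalRecords, int numSplits) {
        if (totalRecords < 0 || numSplits <= 0) {
            throw new IllegalArgumentException(
                "totalRecords must be >= 0 and numSplits must be > 0");
        }
        List<Range> ranges = new ArrayList<>();
        long base = totalRecords / numSplits;      // minimum size of each range
        long remainder = totalRecords % numSplits; // first 'remainder' ranges get one extra
        long offset = 0;
        for (int i = 0; i < numSplits && offset < totalRecords; i++) {
            long length = base + (i < remainder ? 1 : 0);
            ranges.add(new Range(offset, length));
            offset += length;
        }
        return ranges;
    }
}
```

In a real job, each Range would presumably back one InputSplit returned from a custom InputFormat's getSplits, with a RecordReader fetching only that offset/limit window. This does nothing about Bobby's first point: if the data behind the API changes between task attempts, the ranges will not line up, so snapshotting to a file first still looks like the safer route.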
