Are there examples detailing how to write input formats, record readers, and related classes? I was hoping to write one against a Redis database, and it seems that it shares similar issues with accessing data from a REST API.
Alex Thieme
[email protected]
508-361-2788

On Feb 19, 2013, at 1:34 PM, Robert Evans <[email protected]> wrote:

> I don't know of any input format that will do this out of the box. But it
> should not be that hard to write one. There are two big issues here.
>
> 1. The data you are reading from the API really needs to be static, or you
>    could get some very odd inconsistencies. For example, a node dies after a
>    map task has finished and not all of the reducers got the data, so the map
>    task is rerun and some of the reducers have some old data, and some of the
>    reducers have new data. This is the main reason to download the data
>    before processing it. You can work around this by using the input format
>    to run a map-only job that then writes the data out to a file before
>    processing it the rest of the way.
> 2. You need a good way to partition the data from the API. This can be
>    difficult unless the REST API provides a logical way to split this up.
>
> --Bobby
>
> From: Yaron Gonen <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, February 19, 2013 4:49 AM
> To: "[email protected]" <[email protected]>
> Subject: InputFormat for some REST api
>
> Hi,
> Do you know of any InputFormat implemented for some REST api provider?
> Usually when one needs to process data that is accessible only by REST, one
> should try to download the data first somehow, but what if you cannot
> download it?
>
> thanks
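
To make the partitioning point in Bobby's reply concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes the REST API is paginated and can report a total record count up front. Neither assumption comes from the thread, and the names PageRangeSplitter, PageRange, and split are hypothetical. It only shows one way to divide the pages into contiguous ranges that separate tasks could fetch independently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: divides a paginated REST resource into contiguous page ranges. */
final class PageRangeSplitter {

    /** One unit of work: pages firstPage..lastPage, both inclusive, zero-based. */
    static final class PageRange {
        final long firstPage;
        final long lastPage;

        PageRange(long firstPage, long lastPage) {
            this.firstPage = firstPage;
            this.lastPage = lastPage;
        }

        @Override
        public String toString() {
            return "[" + firstPage + ", " + lastPage + "]";
        }
    }

    /**
     * Splits the pages needed to cover totalRecords into at most maxSplits
     * ranges whose sizes differ by no more than one page.
     */
    static List<PageRange> split(long totalRecords, int pageSize, int maxSplits) {
        if (totalRecords < 0 || pageSize <= 0 || maxSplits <= 0) {
            throw new IllegalArgumentException("invalid split parameters");
        }
        // Ceiling division: a partially filled last page still counts as a page.
        long totalPages = (totalRecords + pageSize - 1) / pageSize;
        List<PageRange> ranges = new ArrayList<>();
        if (totalPages == 0) {
            return ranges; // nothing to read
        }
        long splits = Math.min(maxSplits, totalPages);
        long base = totalPages / splits;   // minimum pages per range
        long extra = totalPages % splits;  // the first 'extra' ranges get one more page
        long next = 0;
        for (long i = 0; i < splits; i++) {
            long size = base + (i < extra ? 1 : 0);
            ranges.add(new PageRange(next, next + size - 1));
            next += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example: 1050 records, 100 records per page, at most 4 ranges.
        for (PageRange range : split(1050, 100, 4)) {
            System.out.println(range);
        }
    }
}
```

In a real job, each range would presumably back one InputSplit returned from a custom InputFormat's getSplits method, with a RecordReader fetching the pages in its range. This sketch does nothing about the consistency issue: if the data behind the API changes between task attempts, a rerun map task can still see different records, which is why the map-only snapshot step is suggested.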
