I don't know of any input format that will do this out of the box.  But it 
should not be that hard to write one.  There are two big issues here.


 1.  The data you are reading from the API really needs to be static, or you 
could get some very odd inconsistencies. For example, suppose a node dies after 
a map task has finished but before all of the reducers have fetched its output. 
The map task is rerun, and some of the reducers end up with the old data while 
others get the new data.  This is the main reason to download the data before 
processing it.  You can work around this by using the input format in a 
map-only job that writes the data out to a file, and then running the rest of 
the processing against that file.
 2.  You need a good way to partition the data from the API.  This can be 
difficult unless the REST API provides a logical way to split the data up.
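
To make the second point a bit more concrete, here is a minimal sketch of how 
the partitioning could be planned, assuming the REST API is paginated and can 
report a total record count. Both of those are assumptions about the API, and 
RestPageSplitPlanner and PageRange are hypothetical names, not Hadoop classes. 
In a real input format, getSplits() would wrap each range in an InputSplit and 
the record reader would fetch only the pages in its range.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): divides the pages of a paginated
// REST resource into contiguous ranges, one range per map task.
final class RestPageSplitPlanner {

    // A half-open range of page indexes [firstPage, endPage).
    static final class PageRange {
        final long firstPage;
        final long endPage;

        PageRange(long firstPage, long endPage) {
            this.firstPage = firstPage;
            this.endPage = endPage;
        }

        long pageCount() {
            return endPage - firstPage;
        }

        @Override
        public String toString() {
            return "[" + firstPage + ", " + endPage + ")";
        }
    }

    // Assumes the API can report totalRecords and serves fixed-size pages.
    // Returns at most numSplits ranges and never returns an empty range.
    static List<PageRange> plan(long totalRecords, int pageSize, int numSplits) {
        if (totalRecords < 0 || pageSize <= 0 || numSplits <= 0) {
            throw new IllegalArgumentException("invalid split parameters");
        }
        // Round up so a final, partially filled page is still covered.
        long totalPages = (totalRecords + pageSize - 1) / pageSize;
        long base = totalPages / numSplits;   // pages every split gets
        long extra = totalPages % numSplits;  // first 'extra' splits get one more

        List<PageRange> ranges = new ArrayList<>();
        long start = 0;
        for (int i = 0; i < numSplits && start < totalPages; i++) {
            long count = base + (i < extra ? 1 : 0);
            ranges.add(new PageRange(start, start + count));
            start += count;
        }
        return ranges;
    }
}
```

For instance, RestPageSplitPlanner.plan(95, 10, 3) covers 10 pages with the 
ranges [0, 4), [4, 7) and [7, 10). Note that this only handles the partitioning; 
it does nothing about the first issue, so the data behind those pages still 
needs to stay static for the duration of the job.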

--Bobby

From: Yaron Gonen <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, February 19, 2013 4:49 AM
To: "[email protected]" <[email protected]>
Subject: InputFormat for some REST api

Hi,
Do you know of any InputFormat implemented for some REST api provider?
Usually when one needs to process data that is accessible only by REST, one 
should try to download the data first somehow, but what if you cannot download 
it?

thanks
