Hi Ravi,

Thanks a lot for your response and the code example!
I think this will help me a lot to get started. I am glad to see that my idea is not too exotic.
I will report back on whether I can adapt the solution to my problem.

best regards
Ralph


On 31.07.2017 22:05, Ravi Prakash wrote:
Hi Ralph!

Although it is not an exact match for your use case, DistCp may be the closest thing to what you want:
https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
The client builds a file list and then submits an MR job that copies all the files.
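
To illustrate the pattern (the client builds the file list first, then hands the list to parallel tasks), here is a minimal sketch in plain Java. It is not DistCp's code and uses no Hadoop classes; the class name FileListPartitioner and the round-robin split are assumptions made only for this illustration. In a real job, each chunk would be processed by one map task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop or DistCp.
class FileListPartitioner {

    // Splits the client-side file list round-robin into at most numTasks chunks,
    // one chunk per parallel copy task.
    static List<List<String>> partition(List<String> files, int numTasks) {
        if (numTasks < 1) {
            throw new IllegalArgumentException("numTasks must be >= 1");
        }
        // Never create more chunks than there are files (but at least one).
        int chunks = Math.min(numTasks, Math.max(files.size(), 1));
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < chunks; i++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < files.size(); i++) {
            result.get(i % chunks).add(files.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.xml", "b.xml", "c.xml", "d.xml", "e.xml");
        // prints [[a.xml, c.xml, e.xml], [b.xml, d.xml]]
        System.out.println(partition(files, 2));
    }
}
```

The point of building the list on the client is that the expensive part (copying) can then run in parallel, while the cheap part (listing) happens once.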

HTH
Ravi

On Sun, Jul 30, 2017 at 2:21 PM, Ralph Soika <ralph.so...@imixs.com> wrote:

    Hi,

    I want to ask what the best way is to implement a job that
    imports files into HDFS.

    I have an external system that offers data through a REST
    API. My goal is to have a job running in Hadoop that
    periodically (maybe started by cron?) checks the REST API for
    new data.

    It would also be nice if this job could run on multiple data
    nodes. But unlike all the MapReduce examples I have found, my
    job looks for new or changed data on an external interface and
    compares it with the data that already exists.

    This is a conceptual example of the job:

     1. The job asks the REST API whether there are new files.
     2. If so, the job takes the next file in the list.
     3. It checks whether the file already exists:
         1. if not, the job imports the file
         2. if yes, the job compares the data with the data already stored
             1. if the data has changed, the job updates the file
     4. If more files exist, the job continues with step 2.
     5. Otherwise the job ends.
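
The steps above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: two in-memory maps stand in for the REST API and for HDFS, and the names ImportSketch, Action and sync are hypothetical. In a real job, the existence check and the write would presumably go through the HDFS FileSystem API (for example FileSystem.exists(Path) and FileSystem.create(Path, boolean)), and the comparison might use checksums instead of full content.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: in-memory maps stand in for the REST API and for HDFS.
class ImportSketch {

    enum Action { IMPORTED, UPDATED, UNCHANGED }

    // Walks over every file offered by the remote side (steps 2 and 4)
    // and imports, updates or skips it (step 3).
    static Map<String, Action> sync(Map<String, byte[]> remote, Map<String, byte[]> store) {
        Map<String, Action> log = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> entry : remote.entrySet()) {
            String name = entry.getKey();
            byte[] incoming = entry.getValue();
            byte[] existing = store.get(name);
            if (existing == null) {
                // 3.1: file does not exist yet, import it
                store.put(name, incoming);
                log.put(name, Action.IMPORTED);
            } else if (!Arrays.equals(existing, incoming)) {
                // 3.2.1: file exists but the data has changed, update it
                store.put(name, incoming);
                log.put(name, Action.UPDATED);
            } else {
                // 3.2: file exists and is identical, nothing to do
                log.put(name, Action.UNCHANGED);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        Map<String, byte[]> store = new LinkedHashMap<>();
        store.put("a.xml", "old".getBytes(StandardCharsets.UTF_8));
        store.put("b.xml", "same".getBytes(StandardCharsets.UTF_8));

        Map<String, byte[]> remote = new LinkedHashMap<>();
        remote.put("a.xml", "new".getBytes(StandardCharsets.UTF_8));
        remote.put("b.xml", "same".getBytes(StandardCharsets.UTF_8));
        remote.put("c.xml", "fresh".getBytes(StandardCharsets.UTF_8));

        // prints {a.xml=UPDATED, b.xml=UNCHANGED, c.xml=IMPORTED}
        System.out.println(sync(remote, store));
    }
}
```

Because each file is handled independently of the others, the loop body is the part that could be distributed across several tasks.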


    Can anybody give me a little help on how to start? (It's the first
    job I am writing...)


    ===
    Ralph




--
*Imixs*...extends the way people work together
We are an open source company, read more at: www.imixs.org <http://www.imixs.org>
------------------------------------------------------------------------
Imixs Software Solutions GmbH
Agnes-Pockels-Bogen 1, 80992 München
*Web:* www.imixs.com <http://www.imixs.com>
*Office:* +49 (0)89-452136 16 *Mobile:* +49-177-4128245
Registergericht: Amtsgericht Muenchen, HRB 136045
Geschaeftsfuehrer: Gaby Heinle u. Ralph Soika
