Re: Uploading CSV data to Couchapp

Cliff Williams Mon, 04 Apr 2011 04:24:05 -0700

David,

This is my "snippet" for consuming a continuous _changes feed usingnode. Take a look and and see what you think. We can take it offline todiscuss in detail if you wish.


Best regards

cliff

var
  sys = require("sys"),
  http = require('http'),
  events = require("events"),
  host = "10.0.0.10",
  port = 5984,
  database="test",

changesurl = "http://"+ host +":"+ port.toString() +"/"+database+"/_changes?feed=continuous&include_docs=true&heartbeat=10000",

  infourl =  "http://"+ host +":"+port.toString()+"/"+database,
  lastsequence=0,
  stream = new process.EventEmitter(),
  timeout=0;

    sys.puts(changesurl);
// set up connection
    couchdbconnection = http.createClient(port, host);
// handlers to monitor the actual connection
    couchdbconnection.addListener('close', function() {
        couchdbconnection.destroy();
      return stream.emit('seq', info.update_seq);
    });
    couchdbconnection.addListener('error',function() {
        console.log("couchdbconnection listener ....error");
    });

// Send request
        sys.puts(infourl);
        request = couchdbconnection.request('GET', infourl);
        request.end();

// Listen for any responses
    request.addListener("response", function(res) {

/* "data" gets triggered on receipt of line end information so no needto do

 any special buffering ...... or at least I dont think so */
    res.addListener('data', function(infodata) {

// info comes in as a string
        info = JSON.parse(infodata);
     });
    });

// Wait until the code used to get the last sequence number completes
    stream.addListener('seq',function (lastseq) {
// set up connection
    couchdbconnection = http.createClient(port, host);
    couchdbconnection.setTimeout(timeout);
// handlers to monitor the actual connection
    couchdbconnection.addListener('close', function() {
    console.log("couchdbconnection Listener .....connection closed");
  });
    couchdbconnection.addListener('error',function() {
        sys.puts("couchdbconnection listener ....error");
    });
// Send request

request = couchdbconnection.request('GET',changesurl+"&since="+lastseq);

    request.end();

// Listen for any responses
    request.addListener("response", function(res) {

/* "data" gets triggered on receipt of line end information so no needto do

 any special buffering */
    res.addListener('data', function(changedata) {

// Couch sends an empty line as the "heartbeat"
        if (changedata == "\n") {
                  console.log("heartbeat");
                    }
                    else {
// info comes in as a string so create JSON object to add additional info
                    changes=JSON.parse(changedata);
                    changes.database=database;
                    changedata=JSON.stringify(changes);
                    console.log("Actual data Changed "+ changedata);

              };});});});


On 04/04/11 11:50, David Mitchell wrote:

Hi Cliff,

What you're describing doing using Node.js sounds exactly like what Iwant to do - process the uploads transparently in the background,entirely within the context of my Couchapp.

I've searched around for info on using Node.js and CouchDB like this,but only found links that describe the technique in very broad detail.Do you know of any links that describe it in reasonable detail?

I'm an experienced Python/Ruby/C/C#/... coder and an occasional Erlangcoder, but stuff like Node.js is completely new to me - I'm assumingthat if I could see how to do this sort of thing in a fairly conciseexample, it'd trigger the "aha" moment in my brain and then I'd be offand running...


Thanks again

Dave M.

On 4 April 2011 19:34, Cliff Williams <[email protected]<mailto:[email protected]>> wrote:


    David,

    I hope you are well.

    I think that you have covered your options pretty well.


    "- upload the data&  save it into a single "uploaded_csv" document in
    CouchDB.  Within CouchDB, detect the presence of a new "uploaded_csv"
    document, extract and process the content using Javascript and
    save it into
    multiple "data" records, with appropriate indexing, then dispose
    of the
    "uploaded_csv" document or mark it as "processed".  This seems
    reasonably
    straightforward, but I'm not sure how to detect the presence of a new
    "uploaded_csv" document"

    This is the approach that I would take.

    Couchdb has a quite excellent _changes feed which will notify you
    (or can be set up to notify) in real time on any changes made to
    specific databases.

    I personally would use Node.js to monitor the changes feed and
    process your csv files (Javascript and very fast) but you could of
    course use anything (erlang python (Ruby's CSV processing
    libraries are also quite good)).

    best regards

    Cliff

    On 04/04/11 08:58, David Mitchell wrote:

        Hello all,

        I'm just about to start on my first (wildly ambitious)
        Couchapp.  I've had
        quite a bit of Erlang experience, but not for the past couple
        of years so
        I'm a bit rusty.  I've had a tiny bit of experience with
        CouchDB via various
        Python scripts, but that's all been treating CouchDB as a
        "black box"
        database so I've currently got little knowledge of what it can
        do beyond
        being a document datastore.

        Initially, I'm trying to understand my options for uploading
        CSV files,
        parsing out the content and storing them in CouchDB (one
        CouchDB record per
        line of CSV content).  While it's reasonably straightforward
        to do this if I
        was using e.g. Python as a batch load tool, I don't want to go
        outside
        Javascript for this project if I can avoid it.  The CSV files
        are anywhere
        from 1k-30k records, with 8-10 fields in each that are
        straightforward
        timestamps and floating point numbers.

        For an old-school Web app with distinct database and app
        server layers,
        there's a straightforward option - upload the data to a file
        on the web
        server, then process the data out of the file and load it into
        your
        database.  Sure there's variations on this approach such as
        saving data as a
        database blob, but I'm looking for the best CouchApp-specific
        approach if
        one exists.

        Options I can see:
        - upload the data&  save it into a single "uploaded_csv"
        document in
        CouchDB.  Within CouchDB, detect the presence of a new
        "uploaded_csv"
        document, extract and process the content using Javascript and
        save it into
        multiple "data" records, with appropriate indexing, then
        dispose of the
        "uploaded_csv" document or mark it as "processed".  This seems
        reasonably
        straightforward, but I'm not sure how to detect the presence
        of a new
        "uploaded_csv" document (is there a cron equivalent in Couch?)
        and I'd have
        to track the progress of processing each uploaded CSV file to
        detect when
        they've been processed into "data" records
        - upload the data&  save it into a single "uploaded_csv"
        document in
        CouchDB.  Have CouchDB running embedded in an Erlang app, and
        use Erlang to
        read the "uploaded_csv" data, then send a series of e.g. HTTP
        PUTs to load
        the data into multiple "data" records in CouchDB.  This just
        seems ugly to
        me, but I'm pretty confident I could get it working pretty easily
        - upload the data and process it directly into "data" records
        from a web
        page served from CouchApp.  This seems like it could impact on
        scalability
        due to having long-running connections between client and
        server, but at
        least a user would know when their data has been uploaded and
        processed
        successfully with trivial extra work on my part
        - upload the data, convert it to JSON on the client using
        clientside
        Javascript, then send it as a set of document uploads (i.e.
        one document per
        CSV record) from the client to the Couch server.  This would
        let me parse
        out any bogus CSV content without sending it to the server,
        but I'll have
        users running browsers on mobile devices and I'm not sure I
        want to put that
        processing load onto the client

        Are there any "recommended" approaches for this type of task?
         I suspect
        this question and others I'll ask have probably already been
        considered and
        dealt with by various experts; if there's a "CouchApp
        cookbook" with
        recommended solutions for these and other common situations,
        I'd appreciate
        a pointer to it so I could start to answer my own questions.

        Thanks in advance

        Dave M.

Re: Uploading CSV data to Couchapp

Reply via email to