Hi Everyone,
  I've been working on a simple programming language to create large
data pipelines on Mesos. The language is called BDS which stands
for BigDataScript (yes, the name is kind of a joke for all jargon-lovers
out there) and here is the web page:

   http://pcingola.github.io/BigDataScript/

  Needless to say, it's open source and the code is available on GitHub.
At the moment I'm using BDS mostly for analysis of large genetic datasets
on our 25,000 core cluster, but it should scale to larger clusters as
well.

  BDS has a few interesting features:
    - Runs on Mesos (obviously) as well as SunGridEngine, Torque,
      MOAB, a large server or just your laptop.

    - You can develop on your laptop (without having to install Mesos or
      any cluster management system) and then deploy your script to a
      Mesos cluster/datacenter without modification.

    - It resolves task dependencies automatically and schedules tasks
      according to the implicit (or explicit) DAG.

    - It has lazy processing: it checks whether performing a task is
      necessary and skips tasks whose output does not need to be updated
      (make-style).

    - It performs automatic checkpointing and has absolute serialization,
      so you can copy the checkpoint file to another computer and continue
      running exactly where you left off.

    - It can handle several parallel pipeline branches (threads).

    - Allows you to define DAGs in a declarative form (using 'goals').

    - Cleans up stale files (and queued tasks in non-Mesos clusters).
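
  To give a rough idea of what "implicit DAG" and "make-style" mean in the
list above, here is a small Python sketch. It is not BDS code and it is not
how BDS is implemented internally; the function names (needs_update,
schedule) and the file names are made up for illustration. It assumes each
task declares its output and input files, derives the task order from those
files, and treats a task as stale when an output is missing or older than an
input.

```python
import os
from graphlib import TopologicalSorter  # Python 3.9+


def needs_update(outputs, inputs):
    """Make-style check: True if the task should run (hypothetical helper)."""
    # A missing output always forces the task to run.
    if not all(os.path.exists(p) for p in outputs):
        return True
    existing_inputs = [p for p in inputs if os.path.exists(p)]
    if not existing_inputs:
        return False
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    newest_input = max(os.path.getmtime(p) for p in existing_inputs)
    # Run only if some input changed after the oldest output was written.
    return newest_input > oldest_output


def schedule(tasks):
    """Order tasks by the DAG implied by their files (hypothetical helper).

    tasks maps a task name to a pair (outputs, inputs). Task B depends on
    task A whenever one of B's inputs is one of A's outputs.
    """
    producer = {out: name for name, (outs, _) in tasks.items() for out in outs}
    graph = {
        name: {producer[i] for i in ins if i in producer}
        for name, (_, ins) in tasks.items()
    }
    return list(TopologicalSorter(graph).static_order())
```

  A runner built on these two helpers would walk the names returned by
schedule() and execute only the tasks for which needs_update() is true.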

Other cool features:

     - Automatically parses command line options in your scripts (it also
       creates "help" for you)
     - Logs every process's stdout / stderr and exit status
     - It has a built-in debugger
     - It has a built-in unit testing framework

  You can read more about all these features here:

   http://pcingola.github.io/BigDataScript/bigDataScript_manual.html

  I hope you find it useful and please do send me any
feedback you have.
  Yours

      Pablo
