Not written in MPI. Each task is a stand-alone execution of a binary
program that takes 1-2 data file paths (GCS paths) as parameters and
writes its output to another GCS file (path passed as a flag).
Tasks do not need to communicate with each other; they only talk to GCS
to read and write their data.
For most tasks the biggest bottleneck is CPU. A few of the tasks do very
little processing, so for those the bottleneck is GCS latency.
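Concretely, a single task invocation might look like the following sketch
(the binary name and flag name are made up for illustration; the real
programs and flags are not shown in this thread):

```python
# Sketch of one stand-alone task: a binary that reads 1-2 GCS input
# paths and writes its result to a GCS output path given as a flag.
# "process_pair" and "--output" are hypothetical names.
import subprocess

def task_command(input_paths, output_path, binary="./process_pair"):
    """Build the argv for one task (all paths are gs:// URIs)."""
    return [binary] + list(input_paths) + ["--output=" + output_path]

cmd = task_command(
    ["gs://bucket/data/a.bin", "gs://bucket/data/b.bin"],
    "gs://bucket/out/a_b.result",
)
# subprocess.call(cmd) would run it; the task itself talks only to GCS.
```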

Our current solution is to run N services on each machine (N = number of
cores on the machine), with the main Python script sending commands to
available services (using sockets).
We are not happy with this solution because it requires us to deal with too
many low-level details, like tracking the status of the services,
restarting lost tasks, collecting logs, etc.
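For comparison, a minimal sketch of that per-core dispatch, replacing the
hand-rolled socket services with a multiprocessing pool (commands are
assumed to be plain argv lists; nothing here is our actual code):

```python
# Fan a list of shell-executable commands out over N worker processes,
# N defaulting to the machine's core count.
import multiprocessing
import subprocess

def run_task(cmd):
    """Run one pipeline command in its own process; return its exit code."""
    return subprocess.call(cmd)

def dispatch(commands, workers=None):
    """Run commands across `workers` processes (default: one per core)."""
    pool = multiprocessing.Pool(workers or multiprocessing.cpu_count())
    try:
        # map() blocks until every command finishes; failures surface as
        # non-zero exit codes rather than crashing the pool.
        return pool.map(run_task, commands)
    finally:
        pool.close()
        pool.join()
```

This covers the single-machine fan-out, but none of the parts we actually
struggle with (restarting lost tasks across machines, collecting logs),
which is exactly what we'd want a framework to provide.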


On Thu, Jul 24, 2014 at 11:37 AM, Tomas Barton <barton.to...@gmail.com>
wrote:

> Depends on the nature of your tasks. Is your code written in MPI? Do your
> tasks need to communicate with each other? Will one task operate on all
> files, some subset, or just one file? You might have:
>      - one task per machine running on as many cores as possible
>      - many smaller tasks starting in a dynamic manner depending on the
> data
>
> What is the biggest bottleneck you have? Disk read/write, network, CPU,
> or memory?
>
> Writing your own framework is possible if you can take advantage of some
> problem-specific property.
>
>
> On 24 July 2014 07:34, Itamar Ostricher <ita...@yowza3d.com> wrote:
>
>> many: we have a processing pipeline with ~10 stages (usually one C++
>> program per stage), batch-processing (almost) all pairs of files in the
>> dataset. The dataset contains >10K files at the moment, so a couple
>> hundred million program executions would be my definition of "many" in
>> this case :-)
>>
>> I'll start with a few machines, deploy scripts, and a small subset of
>> the dataset, just to get the hang of it.
>> It's a bit difficult to comprehend the stack, with all the possible
>> options and combinations, though.
>> If I have a main Python script that generates all the processing pipeline
>> commands (that can be simply executed via shell), should I use a specific
>> framework (like Hydra)? Or maybe use raw mesos? Or maybe I should write my
>> own framework?
>>
>>
>> On Wed, Jul 23, 2014 at 2:25 PM, Tomas Barton <barton.to...@gmail.com>
>> wrote:
>>
>>> Define many :) If you want to use some provisioning tools like Puppet,
>>> Chef, Ansible... there are quite a few modules to do this job:
>>>
>>> http://mesosphere.io/learn/#tools
>>>
>>> If you have only a few machines, you might be fine with deploy scripts.
>>>
>>> An example of MPI framework is here:
>>>
>>> https://github.com/mesosphere/mesos-hydra
>>>
>>>
>>>
>>>
>>> On 23 July 2014 12:26, Itamar Ostricher <ita...@yowza3d.com> wrote:
>>>
>>>> Thanks Tomas.
>>>>
>>>> ldconfig didn't change anything. make still failed.
>>>>
>>>> But the Debian package installed like a charm, so I'm good :-)
>>>> Now I just need to figure out how to use it...
>>>> (going to start with [1], unless anyone chimes in with a better
>>>> recommended starting point for a mesos-newbie who is trying to set up a
>>>> cluster of GCE instances in order to distribute execution of *many* C++
>>>> programs working on a large dataset that is currently stored in Google
>>>> Cloud Storage.)
>>>>
>>>> [1] http://mesos.apache.org/documentation/latest/deploy-scripts/
>>>>
>>>>
>>>> On Wed, Jul 23, 2014 at 11:55 AM, Tomas Barton <barton.to...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> that's quite strange. Try to run
>>>>>
>>>>> ldconfig
>>>>>
>>>>> and then run make again.
>>>>>
>>>>> You can find binary packages for Debian here:
>>>>> http://mesosphere.io/downloads/
>>>>>
>>>>> Tomas
>>>>>
>>>>>
>>>>> On 23 July 2014 10:09, Itamar Ostricher <ita...@yowza3d.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm trying to do a clean build of mesos for the 0.19.0 tarball.
>>>>>> I was following the instructions from
>>>>>> http://mesos.apache.org/gettingstarted/ step by step. Got to running
>>>>>> `make`, which ran for quite a while and exited with errors (see the
>>>>>> end of the output below).
>>>>>>
>>>>>> Extra env info: I'm trying to do this build on a 64-bit Debian GCE
>>>>>> instance:
>>>>>> itamar@mesos-test-1:/tmp/mesos-0.19.0/build$ uname -a
>>>>>> Linux mesos-test-1 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64
>>>>>> GNU/Linux
>>>>>>
>>>>>> Assistance will be much appreciated!
>>>>>> Alternatively, I don't mind using precompiled binaries, if anyone can
>>>>>> point me in the direction of such binaries for the GCE environment I
>>>>>> described :-)
>>>>>>
>>>>>> tail of make output:
>>>>>> ----------------------------
>>>>>>
>>>>>> libtool: link: warning:
>>>>>> `/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../lib/libgflags.la'
>>>>>> seems to be moved
>>>>>> *** Warning: Linking the shared library libmesos.la against the
>>>>>> *** static library ../3rdparty/leveldb/libleveldb.a is not portable!
>>>>>> libtool: link: warning:
>>>>>> `/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../lib/libgflags.la'
>>>>>> seems to be moved
>>>>>> libtool: link: g++  -fPIC -DPIC -shared -nostdlib
>>>>>> /usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/crti.o
>>>>>> /usr/lib/gcc/x86_64-linux-gnu/4.7/crtbeginS.o  -Wl,--whole-archive
>>>>>> ./.libs/libmesos_no_3rdparty.a ../3rdparty/libprocess/.libs/libprocess.a
>>>>>> ./.libs/libjava.a -Wl,--no-whole-archive
>>>>>>  ../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src/.libs/libprotobuf.a
>>>>>> ../3rdparty/libprocess/3rdparty/glog-0.3.3/.libs/libglog.a
>>>>>> -L/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../../lib
>>>>>> ../3rdparty/leveldb/libleveldb.a
>>>>>> ../3rdparty/zookeeper-3.4.5/src/c/.libs/libzookeeper_mt.a
>>>>>> /tmp/mesos-0.19.0/build/3rdparty/libprocess/3rdparty/glog-0.3.3/.libs/libglog.a
>>>>>> /usr/lib/libgflags.so -lpthread
>>>>>> /tmp/mesos-0.19.0/build/3rdparty/libprocess/3rdparty/libev-4.15/.libs/libev.a
>>>>>> -lsasl2 /usr/lib/x86_64-linux-gnu/libcurl-nss.so -lz -lrt
>>>>>> -L/usr/lib/gcc/x86_64-linux-gnu/4.7
>>>>>> -L/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu
>>>>>> -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu
>>>>>> -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/4.7/../../.. -lstdc++ 
>>>>>> -lm
>>>>>> -lc -lgcc_s /usr/lib/gcc/x86_64-linux-gnu/4.7/crtendS.o
>>>>>> /usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/crtn.o
>>>>>>  -pthread -Wl,-soname -Wl,libmesos-0.19.0.so -o .libs/
>>>>>> libmesos-0.19.0.so
>>>>>> libtool: link: (cd ".libs" && rm -f "libmesos.so" && ln -s "
>>>>>> libmesos-0.19.0.so" "libmesos.so")
>>>>>> libtool: link: ( cd ".libs" && rm -f "libmesos.la" && ln -s "../
>>>>>> libmesos.la" "libmesos.la" )
>>>>>> g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\"
>>>>>> -DPACKAGE_VERSION=\"0.19.0\" -DPACKAGE_STRING=\"mesos\ 0.19.0\"
>>>>>> -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\"
>>>>>> -DVERSION=\"0.19.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1
>>>>>> -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1
>>>>>> -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 
>>>>>> -DHAVE_UNISTD_H=1
>>>>>> -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD=1 
>>>>>> -DMESOS_HAS_JAVA=1
>>>>>> -DHAVE_PYTHON=\"2.7\" -DMESOS_HAS_PYTHON=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1
>>>>>> -DHAVE_LIBSASL2=1 -I. -I../../src   -Wall -Werror
>>>>>> -DLIBDIR=\"/usr/local/lib\" -DPKGLIBEXECDIR=\"/usr/local/libexec/mesos\"
>>>>>> -DPKGDATADIR=\"/usr/local/share/mesos\" -I../../include
>>>>>> -I../../3rdparty/libprocess/include
>>>>>> -I../../3rdparty/libprocess/3rdparty/stout/include -I../include
>>>>>> -I../3rdparty/libprocess/3rdparty/boost-1.53.0
>>>>>> -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src
>>>>>> -I../3rdparty/libprocess/3rdparty/picojson-4f93734
>>>>>> -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src
>>>>>> -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include
>>>>>> -I../3rdparty/zookeeper-3.4.5/src/c/generated   -pthread -g -g2 -O2 -MT
>>>>>> local/mesos_local-main.o -MD -MP -MF local/.deps/mesos_local-main.Tpo -c 
>>>>>> -o
>>>>>> local/mesos_local-main.o `test -f 'local/main.cpp' || echo
>>>>>> '../../src/'`local/main.cpp
>>>>>> mv -f local/.deps/mesos_local-main.Tpo local/.deps/mesos_local-main.Po
>>>>>> /bin/bash ../libtool  --tag=CXX   --mode=link g++ -pthread -g -g2 -O2
>>>>>>   -o mesos-local local/mesos_local-main.o libmesos.la -lsasl2 -lcurl
>>>>>> -lz  -lrt
>>>>>> libtool: link: g++ -pthread -g -g2 -O2 -o .libs/mesos-local
>>>>>> local/mesos_local-main.o  ./.libs/libmesos.so /usr/lib/libgflags.so
>>>>>> -lpthread -lsasl2 /usr/lib/x86_64-linux-gnu/libcurl-nss.so -lz -lrt 
>>>>>> -pthread
>>>>>> ./.libs/libmesos.so: error: undefined reference to 'dlopen'
>>>>>> ./.libs/libmesos.so: error: undefined reference to 'dlsym'
>>>>>> ./.libs/libmesos.so: error: undefined reference to 'dlerror'
>>>>>> collect2: error: ld returned 1 exit status
>>>>>> make[2]: *** [mesos-local] Error 1
>>>>>> make[2]: Leaving directory `/tmp/mesos-0.19.0/build/src'
>>>>>> make[1]: *** [all] Error 2
>>>>>> make[1]: Leaving directory `/tmp/mesos-0.19.0/build/src'
>>>>>> make: *** [all-recursive] Error 1
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
