Re: Image indexing/searching with Hadoop and MPI

2009-06-07 Thread Owen O'Malley
 OK, I can understand your point, but I am sure that some people have been
 trying to use the map-reduce programming model to do CFD or other
 scientific computing.
 Does anyone on the list have experience in this area?

I know of one project that assumes it has an entire Hadoop cluster to
itself. It generates the hostnames in the Mapper and uses those host
lists in the Reducer to launch an MPI job. They do it because it is
more efficient for very small data transfers; the alternative was a
long chain of map/reduce jobs with very small outputs from each phase.
I wouldn't recommend using MPI under map/reduce in general, since it
involves making a lot of assumptions about your application. In
particular, to avoid killing your cluster you shouldn't use
checkpoints in your application; just rerun the application from the
beginning on failures. That implies that the application can't run
very long (an upper bound of probably 30 minutes on 2000 nodes).
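
A minimal sketch of the pattern described above, not the project's actual code: each map task emits the hostname it runs on, and a single reducer turns the collected host list into an MPI launch command. The function names, the `./solver` binary, and the Open MPI style `-np`/`-host` arguments are assumptions for illustration. The command is only built and printed here; a real reducer would execute it and, on failure, simply rerun the whole job rather than checkpoint.

```python
import socket


def map_hostname(_record):
    """Mapper: ignore the input record and emit the hostname of this task."""
    return ("hosts", socket.gethostname())


def reduce_to_mpi_command(hostnames, program="./solver"):
    """Reducer: turn the collected hostnames into an mpirun command line.

    `program` is a hypothetical MPI binary; `-np` and `-host` follow
    Open MPI conventions and are an assumption about the MPI runtime.
    """
    hosts = sorted(set(hostnames))  # one MPI rank per distinct host
    return ["mpirun", "-np", str(len(hosts)), "-host", ",".join(hosts), program]


# Simulated shuffle: pretend four map tasks ran on three distinct nodes.
emitted = ["node03", "node01", "node02", "node01"]
command = reduce_to_mpi_command(emitted)
print(" ".join(command))
```

Deduplicating the hostnames matters because several map tasks may land on the same node.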

That said, if you want to run other styles of applications, you really
want a two-level scheduler, where the first-level scheduler allocates
nodes (or partial nodes) to jobs (or frameworks). Effectively, that is
what Hadoop On Demand (HOD) was doing with Torque, but I suspect there
will be a more performant solution than HOD within the next year.
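
To illustrate the two-level idea, here is a toy sketch under simple assumptions: the first level hands each framework a disjoint set of whole nodes, and the second level (inside each framework) places that framework's own tasks on the nodes it was given. All names are hypothetical, and this does not reflect how HOD or Torque are implemented.

```python
def allocate_nodes(nodes, requests):
    """First level: hand each framework a disjoint slice of the cluster."""
    free = list(nodes)
    allocation = {}
    for framework, wanted in requests.items():
        if wanted > len(free):
            raise ValueError(f"not enough free nodes for {framework}")
        allocation[framework], free = free[:wanted], free[wanted:]
    return allocation


def schedule_tasks(own_nodes, tasks):
    """Second level: a framework spreads its tasks round-robin over its nodes."""
    return {task: own_nodes[i % len(own_nodes)] for i, task in enumerate(tasks)}


cluster = [f"node{i:02d}" for i in range(1, 7)]
allocation = allocate_nodes(cluster, {"mapreduce": 4, "mpi": 2})
placement = schedule_tasks(allocation["mpi"], ["rank0", "rank1", "rank2"])
print(allocation)
print(placement)
```

The point of the split is that the first level never needs to know what a task is; it only deals in nodes, so a map/reduce framework and an MPI job can share one cluster.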

-- Owen


Image indexing/searching with Hadoop and MPI

2009-06-03 Thread tog
Hi there,

This is a kind of newbie question (at least as far as Hadoop is concerned).
I was wondering if there are any Hadoop-based projects dealing with
image indexing and searching? We are working in this area and would be
interested in having a look at such a project.
My second question is about scientific computing with Hadoop. Has
anyone tried to use Hadoop to parallelize a scientific application? I
know there is Hama, but it does not seem very active these days (I might be
wrong ;) )
Some time ago, I heard of an attempt to build an MPI implementation
on top of Hadoop. Was that really the plan, and is there any update?
Anyway, I would be interested in any paper/feedback on the performance of
scientific applications running on large clusters using Hadoop.

Best Regards
Guillaume


Re: Image indexing/searching with Hadoop and MPI

2009-06-03 Thread Edward J. Yoon
 This is a kind of newbie question (at least as far as Hadoop is concerned).
 I was wondering if there are any Hadoop-based projects dealing with
 image indexing and searching? We are working in this area and would be
 interested in having a look at such a project.

There is a text-search engine library called Lucene. See also the
Nutch project. Or did you mean something like content-based image
indexing and searching using image attributes such as color and
texture, rather than the text of the image tag?

 My second question is about scientific computing with Hadoop. Has
 anyone tried to use Hadoop to parallelize a scientific application? I
 know there is Hama, but it does not seem very active these days (I might be
 wrong ;) )
 Some time ago, I heard of an attempt to build an MPI implementation
 on top of Hadoop. Was that really the plan, and is there any update?
 Anyway, I would be interested in any paper/feedback on the performance of
 scientific applications running on large clusters using Hadoop.

I think MPI programming isn't a good fit for the distributed HDFS and
map/reduce programming model, since MPI requires heavy communication
among the nodes.

FYI, in Hama the basic matrix operations are currently implemented
on top of the map/reduce programming model: for example, the matrix
get/set methods, matrix norms, matrix-matrix
multiplication/addition, and matrix transpose. In the near future, SVD,
eigenvalue decomposition, and some graph algorithms will be
implemented. All the operations are executed sequentially.
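
As a rough illustration of how a matrix operation can be phrased in map/reduce terms, here is a self-contained sketch of matrix-matrix multiplication. It is not Hama's API or implementation; the function names are made up, and the "shuffle" is simulated in memory. The map phase emits one partial product per pair of matching entries, keyed by the output cell, and the reduce phase sums the values for each key.

```python
from collections import defaultdict


def map_products(a, b):
    """Map phase: emit ((row, col), partial product) for every a[i][j] * b[j][k]."""
    for i, row in enumerate(a):
        for j, a_ij in enumerate(row):
            for k, b_jk in enumerate(b[j]):
                yield (i, k), a_ij * b_jk


def reduce_sums(pairs):
    """Reduce phase: sum the partial products that share an output cell."""
    cells = defaultdict(int)
    for key, value in pairs:
        cells[key] += value
    return dict(cells)


def multiply(a, b):
    """Run both phases and lay the summed cells out as a dense matrix."""
    cells = reduce_sums(map_products(a, b))
    return [[cells[(i, k)] for k in range(len(b[0]))] for i in range(len(a))]


print(multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```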

Thanks.

-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardy...@apache.org
http://blog.udanax.org


Re: Image indexing/searching with Hadoop and MPI

2009-06-03 Thread tog
On Wed, Jun 3, 2009 at 5:17 PM, Edward J. Yoon edwardy...@apache.org wrote:

  This is a kind of newbie question (at least as far as Hadoop is concerned).
  I was wondering if there are any Hadoop-based projects dealing with
  image indexing and searching? We are working in this area and would be
  interested in having a look at such a project.

 There is a text-search engine library called Lucene. See also the
 Nutch project. Or did you mean something like content-based image
 indexing and searching using image attributes such as color and
 texture, rather than the text of the image tag?


Yes, this is exactly what I mean. I am looking for a project doing
content-based image indexing using, for example, GIST, BOF, ...
Does such a project exist?
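
To make the kind of job concrete, here is a minimal sketch of content-based indexing. It uses a coarse color histogram as a simple stand-in for descriptors such as GIST or BOF, which are considerably more involved. Images are assumed to be plain lists of (r, g, b) tuples, and all function and file names are hypothetical. The per-image histogram step is the part that would naturally run as a map task, since each image is processed independently.

```python
def color_histogram(pixels, bins=4):
    """Map step: reduce an image (list of 0-255 (r, g, b) tuples) to a normalized histogram."""
    step = 256 // bins
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
    total = len(pixels)
    return [count / total for count in hist]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(x, y) for x, y in zip(h1, h2))


def search(index, query_hist):
    """Rank indexed image names by similarity to the query, best first."""
    return sorted(index, key=lambda name: intersection(index[name], query_hist), reverse=True)


images = {
    "red.png": [(250, 10, 10)] * 4,
    "blue.png": [(10, 10, 250)] * 4,
    "mostly_red.png": [(240, 20, 20)] * 3 + [(10, 10, 250)],
}
index = {name: color_histogram(pixels) for name, pixels in images.items()}
print(search(index, color_histogram([(255, 0, 0)] * 4)))
```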




 I think MPI programming isn't a good fit for the distributed HDFS and
 map/reduce programming model, since MPI requires heavy communication
 among the nodes.


OK, I can understand your point, but I am sure that some people have been
trying to use the map-reduce programming model to do CFD or other
scientific computing.
Does anyone on the list have experience in this area?

Cheers
Guillaume