Serge Smertin created SPARK-4368:
------------------------------------

             Summary: Ceph integration?
                 Key: SPARK-4368
                 URL: https://issues.apache.org/jira/browse/SPARK-4368
             Project: Spark
          Issue Type: Bug
          Components: Input/Output
            Reporter: Serge Smertin


There is a use case of storing a large number of relatively small BLOB objects
(2-20 MB), which requires some ugly workarounds in HDFS environments. These
BLOBs need to be processed close to where the data is stored, which is why the
MapReduce paradigm is a good fit, as it is built around data locality.

Ceph seems to be one of the systems that supports both properties
(small files and data locality):
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-July/032119.html. I
already know that Spark supports GlusterFS:
http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3ccf657f2b.5b3a1%25ven...@yarcdata.com%3E

So I wonder: could Spark be integrated with this storage solution, and how
much effort would that take?



