Serge Smertin created SPARK-4368:
------------------------------------

             Summary: Ceph integration?
                 Key: SPARK-4368
                 URL: https://issues.apache.org/jira/browse/SPARK-4368
             Project: Spark
          Issue Type: Bug
          Components: Input/Output
            Reporter: Serge Smertin

There is a use case of storing a large number of relatively small BLOB objects (2-20 MB each), which requires some ugly workarounds in HDFS environments. These BLOBs need to be processed close to where the data is stored, which is why the MapReduce paradigm is a good fit, since it is built around data locality. Ceph seems to be one of the systems that offers both properties (small-file handling and data locality): http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-July/032119.html

I already know that Spark supports GlusterFS: http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3ccf657f2b.5b3a1%25ven...@yarcdata.com%3E

So I wonder: could there be an integration with this storage solution, and how much effort would it take?