You would probably have to write a consumer app that dumps the data in binary
form to GPFS or NFS, since the HDFS API is specific to HDFS and the existing
HDFS consumers won't write to other filesystems directly.
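Something along these lines could work as a starting point. This is only a
minimal sketch, assuming the Java consumer client that ships with current
Kafka releases; the broker list, topic name, group id, and output mount point
below are placeholders you would replace with your own:

import java.io.IOException;
import java.nio.file.*;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class FileSinkConsumer {
    public static void main(String[] args) throws IOException {
        // Placeholder values -- substitute your brokers, topic, and NFS/GPFS mount.
        String bootstrap = "broker1:9092";
        String topic = "microscope-frames";
        Path outDir = Paths.get("/gpfs/scratch/kafka-dump");

        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrap);
        props.put("group.id", "file-sink");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        Files.createDirectories(outDir);

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList(topic));
            while (true) {
                ConsumerRecords<byte[], byte[]> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // One file per partition; append the raw message bytes as-is.
                    Path out = outDir.resolve(topic + "-" + record.partition() + ".bin");
                    Files.write(out, record.value(),
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                }
            }
        }
    }
}

If you run a second copy of this with a different group.id and an output path
on your archive filesystem, each consumer group receives the full stream
independently, so the same data can land on archive storage and HPC scratch
at effectively the same time.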

Thanks,

Jun


On Fri, May 16, 2014 at 8:17 AM, Carlile, Ken <carli...@janelia.hhmi.org> wrote:

> Hi all,
>
> Sorry for the possible repost--hadn't seen this in the list after 18 hours
> and figured I'd try again....
>
> We are experimenting with using Kafka as a midpoint between microscopes and
> a Spark cluster for data analysis. Our microscopes almost universally use
> Windows machines for acquisition (as do most scientific instruments), and
> our compute cluster (which runs Spark among many other things) runs Linux.
> We use Isilon for file storage primarily, although we also have a GPFS
> cluster for HPC.
>
> We have a working http post system going into Kafka from the Windows
> acquisition machine, which is performing more reliably and faster than an
> SMB connection to the Isilon or GPFS clusters. Unfortunately, the Spark
> streaming consumer is much slower than reading from disk (Isilon or GPFS)
> on the Spark cluster.
>
> My proposal would be to not only improve the Spark streaming, but also to
> have a consumer (or multiple consumers!) that writes to disk, either over
> NFS or "locally" via a GPFS client.
>
> As I am a systems engineer, I'm not equipped to write this, so I'm
> wondering if anyone has done this sort of thing with Kafka before. I know
> there are HDFS consumers out there, and our Isilons can do HDFS, but the
> implementation on the Isilon is very limited at this time, and the ability
> to write to local filesystem or NFS would give us much more flexibility.
>
> Ideally, I would like to be able to use Kafka as a high speed transfer
> point between acquisition instruments (usually running Windows) and several
> kinds of storage, so that we could write virtually simultaneously to
> archive storage for the raw data and to HPC scratch for data analysis,
> thereby limiting the penalty incurred from data movement between storage
> tiers.
>
> Thanks for any input you have,
>
> --Ken
