Using HDFS (or any filesystem other than the local one) for intermediate
data is not supported yet. tmpfs would be your best bet in that case. We
have tested with it before, but it has capacity limitations, and mixing
tmpfs with regular disks does not provide a deterministic mechanism for
selecting memory as the intermediate storage.
I am not sure whether Tachyon has an NFS interface to access it; if it
does, that could be an option.

We have made simple changes in the past to use HDFS for shuffle, primarily
as experiments. None of that is available as patches, but IIRC the changes
were not very complicated. This would involve changing the fetcher to skip
HTTP and use a pre-determined path on a specified filesystem to fetch data,
and changing the producer to write out to a specific path on a non-local
FileSystem.
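
To make the idea concrete, here is a rough sketch of what "a pre-determined
path" could look like. This is not Tez code and not the experimental change
mentioned above: the class name, the method names and the directory layout
are all made up for illustration, and java.nio.file stands in for a shared
filesystem such as HDFS so that the example runs without Hadoop on the
classpath.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch only. Nothing here is part of Tez; java.nio.file.Path
// is used as a stand-in for a path on a shared, non-local filesystem.
final class SharedFsShuffle {

    // Producer and fetcher both derive the same location from identifiers
    // they already know, so no HTTP request is needed to locate the data.
    // The layout (app / vertex / task / partition) is an assumption.
    static Path partitionPath(Path root, String appId, String vertex,
                              int srcTask, int partition) {
        return root.resolve(appId)
                   .resolve(vertex)
                   .resolve("task_" + srcTask)
                   .resolve("part_" + partition + ".out");
    }

    // Producer side: write one partition of task output to its agreed path.
    static void write(Path root, String appId, String vertex,
                      int srcTask, int partition, byte[] data)
            throws IOException {
        Path target = partitionPath(root, appId, vertex, srcTask, partition);
        Files.createDirectories(target.getParent());
        // Write under a temporary name first, then rename, so that a
        // fetcher is less likely to pick up a half-written file.
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(tmp, data);
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Fetcher side: read the partition straight from the agreed path
    // instead of asking the producer's node for it over HTTP.
    static byte[] fetch(Path root, String appId, String vertex,
                        int srcTask, int partition) throws IOException {
        return Files.readAllBytes(
                partitionPath(root, appId, vertex, srcTask, partition));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("shuffle-root");
        byte[] payload = "k1\tv1\n".getBytes(StandardCharsets.UTF_8);
        write(root, "app_0001", "Map_1", 3, 0, payload);
        byte[] read = fetch(root, "app_0001", "Map_1", 3, 0);
        System.out.print(new String(read, StandardCharsets.UTF_8));
    }
}
```

In a real change the two sides would go through Hadoop's FileSystem API
rather than java.nio, and the root path would have to come from
configuration that both the producer and the fetcher can see.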

On Mon, Dec 7, 2015 at 11:57 AM, Raajay <[email protected]> wrote:

> I wish to set up a Tez data analysis framework where the data resides in
> memory. Currently, I have Tez (and also Hive) set up such that it can read
> from an in-memory filesystem like Tachyon.
>
> However, the intermediate data is still written to disk at each
> processing node. I considered writing to tmpfs; however, such a setup does
> not fall back to disk gracefully.
>
> Does Tez have an interface to write intermediate data to an HDFS-like
> filesystem? If yes, what are the settings?
>
> Does setting "yarn.nodemanager.local-dirs" to some HDFS or Tachyon URI
> suffice ?
>
> Thanks,
> Raajay
>
