Using HDFS (or any filesystem other than the local one) for intermediate data is not supported yet. tmpfs is probably your best bet in that case. We have tested with it before, but it has capacity limitations, and mixing tmpfs with regular disks gives no deterministic way of selecting memory as the intermediate storage. I am not sure whether Tachyon has an NFS interface; if it does, mounting it as a local directory could be an option.
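For reference, a tmpfs-only setup might look roughly like the yarn-site.xml fragment below. This is only a sketch, assuming intermediate data lands in the NodeManager local directories (the property you asked about). /mnt/tmpfs/yarn-local is a hypothetical mount point, not a path from our tests, and the tmpfs mount has to be sized to hold all the intermediate data.

```xml
<!-- yarn-site.xml: sketch only; /mnt/tmpfs/yarn-local is a hypothetical tmpfs mount -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <!-- List only the tmpfs directory. Adding regular disk directories to this
       comma-separated list makes it non-deterministic whether memory or disk
       ends up holding the intermediate data. -->
  <value>/mnt/tmpfs/yarn-local</value>
</property>
```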
We have made simple changes in the past to use HDFS for shuffle, primarily as experiments. None of that is available as patches but, IIRC, the changes were not very complicated. They would involve changing the fetcher to skip HTTP and fetch data from a pre-determined path on a specified filesystem, and changing the producer to write its output to a specific path on a non-local FileSystem.

On Mon, Dec 7, 2015 at 11:57 AM, Raajay <[email protected]> wrote:
> I wish to setup a Tez data analysis framework, where the data resides in
> memory. Currently, I have tez (and also Hive) setup such that it can read
> from an in-memory filesystem like Tachyon.
>
> However, the intermediate data is still written to disk at each
> processing node. I considered writing to tmpfs, however, such a setup does
> not fall back to disk gracefully.
>
> Does Tez have an interface to write intermediate data to HDFS like
> filesystem ? If yes, what are the settings ?
>
> Does setting "yarn.nodemanager.local-dirs" to some HDFS or Tachyon URI
> suffice ?
>
> Thanks,
> Raajay
>
