Re: IGFS YARN setup

Haithem Turki Thu, 26 May 2016 16:51:36 -0700

I also had to create a "default-config.xml" file, upload it to HDFS, and
point to it via "IGNITE_XML_CONFIG". Then I had to add the following
property to the "igfs-data" bean. I'm not sure if that's expected:


<property name="affinityMapper">
    <bean class="org.apache.ignite.igfs.IgfsGroupDataBlocksKeyMapper">
        <!-- How many sequential blocks will be stored on the same node. -->
        <constructor-arg value="512"/>
    </bean>
</property>
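For context, here is a sketch of where that property might sit inside the "igfs-data" cache bean. It assumes an Ignite 1.x Spring configuration in which IGFS stores file blocks in a separate cache named "igfs-data"; the cacheMode, atomicityMode, writeSynchronizationMode and backups values are assumptions modeled on typical IGFS setups, not something confirmed in this thread.

```xml
<!-- Sketch only: assumes Ignite 1.x, where IGFS data lives in its own cache. -->
<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="name" value="igfs-data"/>
    <!-- Assumed values: partitioned, transactional, no backups. -->
    <property name="cacheMode" value="PARTITIONED"/>
    <property name="atomicityMode" value="TRANSACTIONAL"/>
    <property name="writeSynchronizationMode" value="FULL_SYNC"/>
    <property name="backups" value="0"/>
    <!-- Keep groups of 512 sequential blocks of a file on the same node. -->
    <property name="affinityMapper">
        <bean class="org.apache.ignite.igfs.IgfsGroupDataBlocksKeyMapper">
            <constructor-arg value="512"/>
        </bean>
    </property>
</bean>
```

Grouping sequential blocks this way means a reader of a contiguous byte range usually hits one node per group instead of scattering block lookups across the whole cluster.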


On Thu, May 26, 2016 at 5:56 PM, Haithem Turki <[email protected]>
wrote:

> Hello,
>
> I'm interested in using IGFS as a Hadoop caching layer. The use case
> revolves largely around Spark jobs running on a YARN cluster that persist
> data to S3 (although I have some non-Spark jobs running too, so I would
> ideally integrate at the Hadoop filesystem layer). I'm excited about the
> potential speedups that this could bring :)
>
> I took a stab at deploying this for the first time, and had some questions:
>
> - I was ideally envisioning deploying nodes via YARN to take advantage of
> dynamic scaling and use any available memory on the cluster. I wanted to
> make sure that this is indeed a supported workflow (or on the roadmap), as I
> hit a few bumps along the way:
> * I ended up needing to dump pretty much all of my Hadoop-related jars to
> HDFS for my nodes to start up correctly (otherwise I was getting
> ClassNotFoundExceptions for classes ranging from Guava to Hadoop to ASM to
> Ignite). Am I doing something horribly wrong? Have you guys considered
> packaging a fat jar, at least for the non-Hadoop dependencies?
> * I couldn't specify the YARN queue despite attempting to set
> -Dmapreduce.job.queuename via the IGNITE_JVM_OPTS variable (related to
> https://issues.apache.org/jira/browse/IGNITE-2738?)
> * It seems like dynamic allocation isn't supported? I wanted to get a sense
> of whether this is on the roadmap.
> * Since YARN allocates containers on arbitrary hosts, it's pretty onerous
> to figure out which hostnames have Ignite nodes running on them and to
> specify those in the URL. For now I have TCP enabled (Ignite doesn't seem
> to die on port conflicts if multiple nodes are running on the same
> machine), and I guess I could set up a reverse proxy so that I can point
> at a stable URL, but that's not great and doesn't scale well. Are there
> other suggestions on how to configure discovery (maybe spin up a local
> node outside of YARN that joins the cluster through its discovery)?
> * I also wasn't clear on how cluster routing/balancing works. If I point
> my Hadoop jobs at host1:10500 via TCP, will all reads/writes route through
> that node, or do the reads/writes somehow get balanced across the
> cluster?
>
> Or is this completely crazy, and should I just deploy IGFS outside of YARN?
>
> - Is there a way to configure the local filesystem as a tiered storage
> layer (or is it on the roadmap)? The use case is that even reading from an
> SSD is much faster than reading from S3.
>
> Thanks in advance!
> - Haithem
>
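On the discovery question above: one way to avoid hard-coding the hostnames of YARN-launched nodes is a shared IP finder, so that every node, including a local node started outside YARN, registers itself in and looks up peers from one well-known place. The sketch below assumes a ZooKeeper ensemble reachable from all hosts and the ignite-zookeeper module on the classpath; the connection string is hypothetical. The YARN-launched nodes would need the same discoverySpi section in the file referenced by IGNITE_XML_CONFIG, without clientMode. Whether Hadoop jobs can then reach IGFS through such a client node is not shown here.

```xml
<!-- Sketch only: config for a local node started outside YARN. -->
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Join the cluster as a client rather than as a data-holding server. -->
    <property name="clientMode" value="true"/>
    <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <property name="ipFinder">
                <!-- Requires the ignite-zookeeper module. -->
                <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.zk.TcpDiscoveryZookeeperIpFinder">
                    <!-- Hypothetical ensemble; replace with your own hosts. -->
                    <property name="zkConnectionString" value="zk1:2181,zk2:2181,zk3:2181"/>
                </bean>
            </property>
        </bean>
    </property>
</bean>
```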
