Mich,

We used Isilon for a POC of Splice Machine (Spark for analytics, HBase for real-time). We were concerned initially, and the initial setup took a bit longer than expected, but it performed well on both low-latency and high-throughput use cases at scale (our POC was ~100 TB).
Just a data point.

Regards,
John Leach

> On Jun 5, 2017, at 9:11 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> I am concerned about the use case of tools like Isilon or Panasas to create a layer on top of HDFS, essentially an HCFS on top of HDFS, with the usual 3x replication moved into the tool itself.
>
> There is interest in pushing Isilon forward as the solution, but my caution is about the scalability and future-proofing of such tools. So I was wondering if anyone else has tried such a solution.
>
> Thanks
>
> Dr Mich Talebzadeh
>
> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> On 2 June 2017 at 19:09, Gene Pang <gene.p...@gmail.com> wrote:
> As Vincent mentioned earlier, I think Alluxio can work for this. You can mount your (potentially remote) storage systems to Alluxio <http://www.alluxio.org/docs/master/en/Unified-and-Transparent-Namespace.html> and deploy Alluxio co-located with the compute cluster. The computation framework will still achieve data locality, since the Alluxio workers are co-located, even though the existing storage systems may be remote. You can also use tiered storage <http://www.alluxio.org/docs/master/en/Tiered-Storage-on-Alluxio.html> to deploy using only memory and/or other physical media.
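(A minimal sketch of the unified-namespace setup Gene describes, assuming an Alluxio deployment; all host names, ports, bucket names, and paths below are illustrative placeholders, and under-store credentials are assumed to be configured in `alluxio-site.properties`:)

```shell
# Mount a remote HDFS endpoint (e.g. an Isilon HCFS interface) into the
# Alluxio namespace. Host, port and paths are placeholders.
./bin/alluxio fs mount /mnt/hdfs hdfs://namenode.example.com:8020/data

# Mount an object store alongside it, so both appear under one namespace.
./bin/alluxio fs mount /mnt/s3 s3a://example-bucket/data

# Spark (or any HCFS-compatible client) then reads through the co-located
# Alluxio workers, e.g.:
#   spark.read.parquet("alluxio://alluxio-master.example.com:19998/mnt/hdfs/table")
```

These are deployment commands against a running cluster, so treat them as a configuration sketch rather than something to copy verbatim; the exact mount options depend on the Alluxio version in use.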
> Here are some blogs (Alluxio with Minio <https://www.alluxio.com/blog/scalable-genomics-data-processing-pipeline-with-alluxio-mesos-and-minio>, Alluxio with HDFS <https://www.alluxio.com/blog/qunar-performs-real-time-data-analytics-up-to-300x-faster-with-alluxio>, Alluxio with S3 <https://www.alluxio.com/blog/accelerating-on-demand-data-analytics-with-alluxio>) which use a similar architecture.
>
> Hope that helps,
> Gene
>
> On Thu, Jun 1, 2017 at 1:45 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> As a matter of interest, what is the best way of creating virtualised clusters all pointing to the same physical data?
>
> Thanks
>
> Dr Mich Talebzadeh
>
> On 1 June 2017 at 09:27, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
> If mandatory, you can use a local cache like Alluxio.
>
> On 1 June 2017 at 10:23 AM, "Mich Talebzadeh" <mich.talebza...@gmail.com> wrote:
> Thanks Vincent. I assume that by physical data locality you mean going through Isilon and HCFS rather than through direct HDFS.
>
> Also, I agree with you that the shared network could be an issue as well. However, it allows you to reduce data redundancy (you no longer need 3x replication in HDFS) and you can build virtual clusters on the same data.
> One cluster for read/writes and another for reads? That is what has been suggested!
>
> Regards
>
> Dr Mich Talebzadeh
>
> On 1 June 2017 at 08:55, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
> I don't recommend this kind of design, because you lose physical data locality and you will be affected by "bad neighbours" that are also using the network storage... We have one similar design, but restricted to small clusters (more for experiments than production).
>
> On 2017-06-01 at 9:47 GMT+02:00, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Thanks Jorn,
>
> This was a proposal made by someone, as the firm is already using this tool on other SAN-based storage and would extend it to Big Data.
>
> On paper it seems like a good idea; in practice it may be a WANdisco scenario again. Of course, as ever, one needs to ask EMC for reference calls and whether anyone is using this product in anger.
>
> At the end of the day it's not HDFS. It is OneFS with an HCFS API. However, that may suit our needs. But we would need to PoC it and test it thoroughly!
> Cheers
>
> Dr Mich Talebzadeh
>
> On 1 June 2017 at 08:21, Jörn Franke <jornfra...@gmail.com> wrote:
> Hi,
>
> I have done this (not with Isilon, but with another storage system). It can be efficient for small clusters, depending on how you design the network.
>
> What I have also seen is the microservice approach with object stores (e.g. S3 in the cloud, Swift on premise), which is somewhat similar.
>
> If you want additional performance, you could fetch the data from the object stores and store it temporarily in a local HDFS. I am not sure to what extent this affects regulatory requirements, though.
>
> Best regards
>
> On 31 May 2017, at 18:07, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Hi,
>>
>> I realize this may not have direct relevance to Spark, but has anyone tried to create virtualized HDFS clusters using tools like Isilon or similar?
>>
>> The prime motive behind this approach is to minimize the propagation or copying of data, which has regulatory implications. In short, you want your data to be in one place regardless of the artefacts used against it, such as Spark?
>>
>> Thanks,
>>
>> Dr Mich Talebzadeh
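(As a footnote to Jörn's suggestion of fetching object-store data into a temporary local HDFS for performance, a hedged sketch using the standard Hadoop DistCp tool; the bucket name and paths are placeholders, and S3 credentials are assumed to be set in `core-site.xml` via `fs.s3a.access.key` / `fs.s3a.secret.key`:)

```shell
# One-off staging copy from an object store into a cluster-local HDFS
# cache area. Bucket and paths are illustrative placeholders.
hadoop distcp s3a://example-bucket/raw/events hdfs:///cache/events

# Jobs then read the local copy with full data locality, e.g.:
#   spark.read.parquet("hdfs:///cache/events")

# Remove the staged copy when done, if regulatory constraints require
# the object store to remain the single authoritative location.
hdfs dfs -rm -r /cache/events
```

This is a configuration/operations sketch against a running Hadoop cluster, not something runnable standalone; whether a temporary local copy is acceptable depends on the regulatory point Mich raises.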