I'd guess that if the resources are broadcast, Spark would put them into Tachyon...
> On Jan 12, 2016, at 7:04 AM, Dmitry Goldenberg <dgoldenberg...@gmail.com> wrote:
>
> Would it make sense to load them into Tachyon and read and broadcast them
> from there, since Tachyon is already a part of the Spark stack?
>
> If so, I wonder if I could do that Tachyon read/write via a Spark API?
>
>> On Jan 12, 2016, at 2:21 AM, Sabarish Sasidharan
>> <sabarish.sasidha...@manthan.com> wrote:
>>
>> One option could be to store them as blobs in a cache like Redis and then
>> read + broadcast them from the driver. Or you could store them in HDFS and
>> read + broadcast from the driver.
>>
>> Regards
>> Sab
>>
>>> On Tue, Jan 12, 2016 at 1:44 AM, Dmitry Goldenberg
>>> <dgoldenberg...@gmail.com> wrote:
>>>
>>> We have a bunch of Spark jobs deployed and a few large resource files, such
>>> as a dictionary for lookups or a statistical model.
>>>
>>> Right now, these are deployed as part of the Spark jobs, which will
>>> eventually make the mongo-jars too bloated for deployments.
>>>
>>> What are some of the best practices to consider for maintaining and sharing
>>> large resource files like these?
>>>
>>> Thanks.
>>
>> --
>> Architect - Big Data
>> Ph: +91 99805 99458
>> Manthan Systems | Company of the year - Analytics (2014 Frost and Sullivan India ICT)
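The "store in HDFS, read + broadcast from the driver" pattern Sab suggests can be sketched roughly as below. This is a minimal sketch, not a definitive implementation: the HDFS paths, the tab-separated dictionary format, and the lookup logic are all hypothetical placeholders, and it assumes a resource small enough to collect into driver memory.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastSharedResource {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("resource-broadcast")
    val sc = new SparkContext(conf)

    // Read the shared resource from HDFS on the driver.
    // The path and TSV format are assumptions for illustration; collect()
    // pulls the whole file to the driver, so this only works for resources
    // that fit comfortably in driver memory.
    val dictionary: Map[String, String] =
      sc.textFile("hdfs:///shared/resources/lookup-dictionary.tsv")
        .map { line =>
          val Array(key, value) = line.split("\t", 2)
          key -> value
        }
        .collect()
        .toMap

    // Broadcast once: each executor then holds a single read-only copy,
    // instead of shipping the dictionary with every task closure (or
    // bundling it into the job jar).
    val dictBc = sc.broadcast(dictionary)

    // Hypothetical usage: enrich input records via the broadcast lookup.
    sc.textFile("hdfs:///input/records.txt")
      .map(record => dictBc.value.getOrElse(record, "UNKNOWN"))
      .saveAsTextFile("hdfs:///output/enriched")

    sc.stop()
  }
}
```

The same shape works if the blob lives in Redis instead of HDFS: fetch it in driver code with a Redis client, deserialize it, then hand the result to `sc.broadcast` exactly as above. Either way, the resource file is versioned and updated independently of the application jar.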