Would it make sense to load them into Tachyon and read and broadcast them from there since Tachyon is already a part of the Spark stack?
If so, I wonder if I could do that Tachyon read/write via a Spark API?

> On Jan 12, 2016, at 2:21 AM, Sabarish Sasidharan
> <sabarish.sasidha...@manthan.com> wrote:
>
> One option could be to store them as blobs in a cache like Redis and then
> read + broadcast them from the driver. Or you could store them in HDFS and
> read + broadcast from the driver.
>
> Regards
> Sab
>
>> On Tue, Jan 12, 2016 at 1:44 AM, Dmitry Goldenberg
>> <dgoldenberg...@gmail.com> wrote:
>>
>> We have a bunch of Spark jobs deployed and a few large resource files,
>> such as e.g. a dictionary for lookups or a statistical model.
>>
>> Right now, these are deployed as part of the Spark jobs, which will
>> eventually make the mongo-jars too bloated for deployments.
>>
>> What are some of the best practices to consider for maintaining and
>> sharing large resource files like these?
>>
>> Thanks.
>
> --
> Architect - Big Data
> Ph: +91 99805 99458
>
> Manthan Systems | Company of the year - Analytics (2014 Frost and Sullivan
> India ICT)
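To make the "store externally, then read + broadcast from the driver" suggestion concrete, here is a minimal PySpark-style sketch. The path, the use of pickle for serialization, and the `load_model_bytes`/`broadcast_model` helper names are all illustrative assumptions, not anything from the thread; `SparkContext.broadcast` itself is the standard Spark API for shipping a read-only value to executors once per job.

```python
import pickle

def load_model_bytes(path):
    # In a real deployment this read would go against HDFS, Redis, or
    # Tachyon rather than the local filesystem; the local read here is
    # just for illustration. With HDFS you might instead pull the bytes
    # via sc.binaryFiles(path) and collect them on the driver.
    with open(path, "rb") as f:
        return f.read()

def broadcast_model(sc, path):
    # Deserialize on the driver, then broadcast so each executor gets
    # one copy of the model instead of one per task.
    model = pickle.loads(load_model_bytes(path))
    return sc.broadcast(model)

# Illustrative usage inside a job (sc is an existing SparkContext):
#   bc = broadcast_model(sc, "hdfs:///models/lookup_dict.pkl")
#   rdd.map(lambda rec: bc.value.get(rec.key))
```

On the Tachyon question: since Tachyon exposes a Hadoop-compatible filesystem interface, the same pattern should work by pointing the read at a `tachyon://host:port/...` URI through the usual Spark/Hadoop file APIs, though that detail is my understanding rather than something confirmed in this thread.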