Would it make sense to load them into Tachyon and read and broadcast them from there since Tachyon is already a part of the Spark stack?
If so, I wonder if I could do that Tachyon read/write via a Spark API?

> On Jan 12, 2016, at 2:21 AM, Sabarish Sasidharan
> <sabarish.sasidha...@manthan.com> wrote:
>
> One option could be to store them as blobs in a cache like Redis and then
> read + broadcast them from the driver. Or you could store them in HDFS and
> read + broadcast from the driver.
>
> Regards
> Sab
>
>> On Tue, Jan 12, 2016 at 1:44 AM, Dmitry Goldenberg
>> <dgoldenberg...@gmail.com> wrote:
>>
>> We have a bunch of Spark jobs deployed and a few large resource files,
>> such as e.g. a dictionary for lookups or a statistical model.
>>
>> Right now, these are deployed as part of the Spark jobs, which will
>> eventually make the mongo-jars too bloated for deployments.
>>
>> What are some of the best practices to consider for maintaining and
>> sharing large resource files like these?
>>
>> Thanks.
>
> --
> Architect - Big Data
> Ph: +91 99805 99458
>
> Manthan Systems | Company of the year - Analytics (2014 Frost and Sullivan
> India ICT)
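To make the "store externally, then read + broadcast from the driver" suggestion concrete, here is a minimal PySpark-style sketch. The path, the use of pickle for serialization, and the `load_model_bytes`/`broadcast_model` helper names are all illustrative assumptions, not anything from the thread; `SparkContext.broadcast` itself is the standard Spark API for shipping a read-only value to executors once per job.

```python
import pickle

def load_model_bytes(path):
    # In a real deployment this read would go against HDFS, Redis, or
    # Tachyon rather than the local filesystem; the local read here is
    # just for illustration. With HDFS you might instead pull the bytes
    # via sc.binaryFiles(path) and collect them on the driver.
    with open(path, "rb") as f:
        return f.read()

def broadcast_model(sc, path):
    # Deserialize on the driver, then broadcast so each executor gets
    # one copy of the model instead of one per task.
    model = pickle.loads(load_model_bytes(path))
    return sc.broadcast(model)

# Illustrative usage inside a job (sc is an existing SparkContext):
#   bc = broadcast_model(sc, "hdfs:///models/lookup_dict.pkl")
#   rdd.map(lambda rec: bc.value.get(rec.key))
```

On the Tachyon question: since Tachyon exposes a Hadoop-compatible filesystem interface, the same pattern should work by pointing the read at a `tachyon://host:port/...` URI through the usual Spark/Hadoop file APIs, though that detail is my understanding rather than something confirmed in this thread.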