Re: Maximum Size of Reference Look Up Table in Spark
Hi,

I have never worked on a project that would require it.

Jacek

On 15 Jul 2016 5:31 p.m., "Saravanan Subramanian" wrote:
> Hello Jacek,
>
> Have you seen any practical limitations or performance degradation
> when using more than 10 GB of broadcast cache?
>
> Thanks,
> Saravanan S.
Re: Maximum Size of Reference Look Up Table in Spark
Hello Jacek,

Have you seen any practical limitations or performance degradation when using more than 10 GB of broadcast cache?

Thanks,
Saravanan S.

On Thursday, 14 July 2016 8:06 PM, Jacek Laskowski wrote:
> Hi,
>
> My understanding is that the maximum size of a broadcast is
> Long.MAX_VALUE (plus some more, since the data is going to be
> encoded to save space, especially for Catalyst-driven Datasets).
>
> Regarding 2: before the tasks can access the broadcast variable, it has
> to be sent across the network, which may be too slow to be acceptable.
Re: Maximum Size of Reference Look Up Table in Spark
Hi,

My understanding is that the maximum size of a broadcast is Long.MAX_VALUE (plus some more, since the data is going to be encoded to save space, especially for Catalyst-driven Datasets).

Regarding 2: before the tasks can access the broadcast variable, it has to be sent across the network, which may be too slow to be acceptable.

Regards,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Thu, Jul 14, 2016 at 11:32 PM, Saravanan Subramanian wrote:
> 1) What is the maximum size of a lookup table / variable that can be
> stored as a broadcast variable?
> 2) What is the impact on cluster performance if I store 10 GB of data
> in a broadcast variable?
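To put rough numbers on the two points above, here is a back-of-envelope sketch in plain Python (no Spark required). It is only an illustration: the 1 Gbit/s link speed and the 20 executors are assumed values, not figures from this thread, and the helper names are hypothetical. It reads Long.MAX_VALUE as a byte count, estimates how long a single executor would need to pull a full 10 GB copy over one link, and adds up the memory used when every executor keeps its own copy. The actual distribution mechanism, serialization, and compression are not modelled, so treat the results as orders of magnitude only.

```python
# Back-of-envelope estimate. Link speed and executor count are ASSUMPTIONS
# chosen for illustration; they do not come from the thread.

# Java's Long.MAX_VALUE, read here as a number of bytes.
LONG_MAX_VALUE = 2**63 - 1


def transfer_seconds(size_bytes: int, bandwidth_bits_per_s: float) -> float:
    """Time to move size_bytes over a single link at the given bandwidth."""
    return size_bytes * 8 / bandwidth_bits_per_s


def cluster_memory_bytes(size_bytes: int, executors: int) -> int:
    """Total memory used when every executor keeps its own full copy."""
    return size_bytes * executors


if __name__ == "__main__":
    ten_gb = 10 * 1024**3   # the 10 GB lookup table from the question
    link = 1e9              # assumed: 1 Gbit/s per executor
    executors = 20          # assumed cluster size

    print(f"Long.MAX_VALUE is about {LONG_MAX_VALUE / 1024**6:.0f} EiB")
    print(f"One executor fetch: about {transfer_seconds(ten_gb, link):.0f} s")
    total_gib = cluster_memory_bytes(ten_gb, executors) / 1024**3
    print(f"Memory across {executors} executors: {total_gib:.0f} GiB")
```

Under these assumptions the theoretical ceiling is far beyond any real lookup table, so the practical limits are more likely to be executor memory and the time spent shipping the data than the size bound itself.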
Maximum Size of Reference Look Up Table in Spark
Hello All,

I am in the middle of designing real-time data enhancement services using Spark Streaming. As part of this, I have to look up some reference data while processing the incoming stream.

I have the following questions:

1) What is the maximum size of a lookup table / variable that can be stored as a broadcast variable?
2) What is the impact on cluster performance if I store 10 GB of data in a broadcast variable?

Any suggestions and thoughts are welcome.

Thanks,
Saravanan S.