Re: spark broadcast unavailable

十六夜涙 Wed, 10 Dec 2014 00:32:06 -0800

Hi All,
I've read the official Tachyon docs, and it doesn't seem to fit my use case. As
I understand it, Tachyon just caches files in memory. But I have a file of over
a million lines (about 70 MB), and reading it and mapping the data into a Map
variable takes several minutes, which I don't want to repeat on every call to
the map function. Tachyon also gave me another problem: it raises an exception
when I run ./bin/tachyon format
The exception:
Exception in thread "main" java.lang.RuntimeException: 
org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate 
with client version 4


It seems there's a compatibility problem with Hadoop, but even if I solve it,
there's still the efficiency issue I described above.
Could somebody tell me how to persist the data in memory? For now I just
broadcast it, and re-submit the Spark application whenever the broadcast value
becomes unavailable.
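
One possible way to keep the parsed map in memory without a broadcast is a lazily initialized singleton, which each executor JVM builds once on first use. The following is only a minimal sketch under assumptions: the object name `IpLibCache`, the `Map[String, Array[String]]` layout, and the made-up sample entry are all hypothetical, and `load()` stands in for whatever actually parses the 70 MB file.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical per-JVM cache; the name, types, and sample data are assumptions.
object IpLibCache {
  // Counts how often the expensive load ran (for demonstration only).
  val loadCount = new AtomicInteger(0)

  // Stand-in for reading and parsing the real file; returns made-up sample data.
  private def load(): Map[String, Array[String]] = {
    loadCount.incrementAndGet()
    Map("1.2.3.4" -> Array("CN", "Beijing"))
  }

  // A lazy val is initialized once, thread-safely, on first access in each JVM.
  lazy val table: Map[String, Array[String]] = load()

  // Look up an IP; returns an empty array when the IP is unknown.
  def getIpInfo(ip: String): Array[String] =
    table.getOrElse(ip, Array.empty[String])
}
```

Inside a map function one would call `IpLibCache.getIpInfo(client_ip)` directly, so nothing needs to be serialized with the task. The trade-off is that the file has to be readable from every executor (for example shipped with `--files` or placed on shared storage), and each executor pays the load cost once.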






------------------ Original Message ------------------
From: "Akhil Das";<ak...@sigmoidanalytics.com>;
Sent: Tuesday, December 9, 2014, 3:42 PM
To: "十六夜涙"<cr...@qq.com>; 
Cc: "user"<u...@spark.incubator.apache.org>; 
Subject: Re: spark broadcast unavailable



You cannot pass the sc object (val b = Utils.load(sc,ip_lib_path)) inside a map 
function, and that's why the serialization exception is popping up (since sc is 
not serializable). You can try Tachyon's cache if you want to persist the data 
in memory kind of forever.


Thanks
Best Regards
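
To illustrate the point about sc: a re-broadcast has to happen on the driver, not inside the map closure. A rough sketch of one way to do that is a driver-side holder that creates the broadcast lazily and is consulted from `transform`, whose function body runs on the driver for each batch. The names `IpLibBroadcast`, `Enrich`, `enrich`, and `loadIpLib`, and the `Map[String, Array[String]]` type, are assumptions for illustration; they are not taken from the poster's `Utils` code.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.streaming.dstream.DStream

// Hypothetical driver-side holder for the broadcast lookup table.
object IpLibBroadcast {
  @volatile private var instance: Broadcast[Map[String, Array[String]]] = null

  // Call on the driver only; `load` is evaluated just when a new broadcast is needed.
  def getOrCreate(sc: SparkContext)(
      load: => Map[String, Array[String]]): Broadcast[Map[String, Array[String]]] = {
    if (instance == null) {
      synchronized {
        if (instance == null) instance = sc.broadcast(load)
      }
    }
    instance
  }

  // Drop the current broadcast so the next getOrCreate call rebuilds it.
  def reset(): Unit = synchronized {
    if (instance != null) {
      instance.unpersist()
      instance = null
    }
  }
}

object Enrich {
  // `loadIpLib` is an assumed loader supplied by the application.
  def enrich(
      ips: DStream[String],
      loadIpLib: () => Map[String, Array[String]]): DStream[(String, Array[String])] =
    ips.transform { rdd =>
      // This block runs on the driver, so the SparkContext is available here.
      val b = IpLibBroadcast.getOrCreate(rdd.sparkContext)(loadIpLib())
      // Only the serializable Broadcast handle is captured by the task closure.
      rdd.map(ip => (ip, b.value.getOrElse(ip, Array.empty[String])))
    }
}
```

When and whether to call `reset()` is left to the application; this sketch only shows where the SparkContext can safely be used.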



 
On Tue, Dec 9, 2014 at 12:12 PM, 十六夜涙 <cr...@qq.com> wrote:
Hi all,
In my Spark application, I load a CSV file on the driver node, map the data to
a Map variable for later use, and then broadcast it. Everything works fine
until a java.io.FileNotFoundException occurs. The console log shows that the
broadcast is unavailable. I googled the problem and found that Spark cleans up
broadcasts. There is a solution: the author mentioned re-broadcasting. I
followed that approach and wrote some exception-handling code with `try` and
`catch`. After compiling and submitting the jar, I hit another problem: it
shows "task not serializable".
So here I have three options:
1. Find the right way to persist the broadcast.
2. Solve the "task not serializable" problem when re-broadcasting the variable.
3. Save the data to some kind of database, although I prefer to keep the data in memory.


Here are some code snippets:
  val esRdd = kafkaDStreams.flatMap(_.split("\\n"))
      .map{
      case esregex(datetime, time_request) =>


        var ipInfo:Array[String]=Array.empty
        try{
            ipInfo = Utils.getIpInfo(client_ip,b.value)
        }catch{
          case e:java.io.FileNotFoundException =>{
            val b = Utils.load(sc,ip_lib_path)
            ipInfo = Utils.getIpInfo(client_ip,b.value)
          }
        }
‍
