Re: Using Log4j for logging messages inside lambda functions
Hello! Thank you all for your answers. Akhil's proposed solution works fine. Thanks.

Florin

On Tue, May 26, 2015 at 3:08 AM, Wesley Miao <wesley.mi...@gmail.com> wrote:

The reason it didn't work for you is that the function you registered with someRdd.map runs on the worker/executor side, not in your driver program. [...]
Re: Using Log4j for logging messages inside lambda functions
Try this way:

    object Holder extends Serializable {
      @transient lazy val log = Logger.getLogger(getClass.getName)
    }

    val someRdd = spark.parallelize(List(1, 2, 3))
    someRdd.map { element =>
      Holder.log.info(s"$element will be processed")
      element + 1
    }

Thanks
Best Regards

On Mon, May 25, 2015 at 8:35 PM, Spico Florin <spicoflo...@gmail.com> wrote:

Hello! I would like to use the logging mechanism provided by log4j, but I'm getting "Task not serializable", caused by java.io.NotSerializableException: org.apache.log4j.Logger. [...]
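For reference, a minimal self-contained sketch of the Holder pattern Akhil suggests above, which should compile against Spark 1.x and log4j 1.x; the LogTest wrapper object and the collect() call are illustrative additions, not part of Akhil's snippet:

    import org.apache.log4j.Logger
    import org.apache.spark.{SparkConf, SparkContext}

    // @transient keeps the Logger out of any serialized copy of Holder, and
    // lazy val re-creates it in each executor JVM the first time it is used there.
    object Holder extends Serializable {
      @transient lazy val log = Logger.getLogger(getClass.getName)
    }

    object LogTest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[4]").setAppName("LogTest")
        val spark = new SparkContext(conf)

        val incremented = spark.parallelize(List(1, 2, 3)).map { element =>
          Holder.log.info(s"$element will be processed")
          element + 1
        }

        // map is lazy; an action such as collect() is what actually runs the
        // closure (and therefore the logging) on the executors.
        println(incremented.collect().mkString(", "))
        spark.stop()
      }
    }

The key point is that the task closure only refers to the Holder singleton rather than capturing a Logger instance, so nothing non-serializable has to cross the driver/executor boundary.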
Re: Using Log4j for logging messages inside lambda functions
The reason it didn't work for you is that the function you registered with someRdd.map runs on the worker/executor side, not in your driver program. You therefore need to be careful not to accidentally close over objects instantiated in your driver program, such as the log object in your sample code above. You can read more about the concept of closures to understand fully why it didn't work for you in the first place.

The usual solution to this type of problem is to instantiate the objects you want to use in your map functions from within the map functions themselves. You can define a factory object from which you create your log object.

On Mon, May 25, 2015 at 11:05 PM, Spico Florin <spicoflo...@gmail.com> wrote:

Hello! I would like to use the logging mechanism provided by log4j, but I'm getting:

    Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    Caused by: java.io.NotSerializableException: org.apache.log4j.Logger

The code (and the problem) I'm using resembles the one here: http://stackoverflow.com/questions/29208844/apache-spark-logging-within-scala, meaning:

    val log = Logger.getLogger(getClass.getName)

    def doTest() {
      val conf = new SparkConf().setMaster("local[4]").setAppName("LogTest")
      val spark = new SparkContext(conf)
      val someRdd = spark.parallelize(List(1, 2, 3))
      someRdd.map { element =>
        log.info(s"$element will be processed")
        element + 1
      }
    }

I'm posting the same problem because the one on Stack Overflow didn't get any answer. In this case, can you please tell us what is the best way to use logging? Is there any solution that does not use rdd.foreachPartition?

I look forward to your answers.

Regards,
Florin
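A minimal sketch of the factory-object approach Wesley describes above, creating the logger from inside the map function rather than closing over one from the driver. The LogFactory name is a hypothetical illustration, and the snippet assumes log4j 1.x and an already-created SparkContext named spark (as in spark-shell):

    import org.apache.log4j.Logger

    // Hypothetical factory object: the Logger is created where it is needed,
    // so nothing log4j-related is ever captured in the task closure.
    object LogFactory {
      def getLog(name: String): Logger = Logger.getLogger(name)
    }

    val someRdd = spark.parallelize(List(1, 2, 3))
    someRdd.map { element =>
      // Obtained inside the map function, i.e. on the executor, rather than
      // closed over from the driver. Logger.getLogger caches instances per
      // name, so the repeated lookup per element is cheap.
      val log = LogFactory.getLog("LogTest")
      log.info(s"$element will be processed")
      element + 1
    }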