I assume you use Scala to implement your UDFs.
In that case, the Scala language itself already gives you some options. If you want to control what happens when a UDF initializes, you can define a Scala object and define your UDF as a member of it; a Scala object behaves like the Singleton pattern, so the object's constructor logic can serve the same role as the initialize()/configure() contract in Hive. It runs once per JVM, when the object is first initialized. That should meet your requirement.

The only tricky part is the context reference passed to configure(), which lets you pass configuration to your UDF dynamically at runtime. Since a Scala object is fixed at compile time, you cannot pass any parameters to its constructor. But nothing stops you from building a Scala class (or a class plus companion object) that accepts parameters at construction/init time, which can then control your UDF's behavior. A minimal sketch of both approaches is appended after the quoted thread below.

If you have a concrete example of something you cannot do with a Spark Scala UDF, you can post it here.

Yong

________________________________
From: RD <rdsr...@gmail.com>
Sent: Friday, June 16, 2017 11:37 AM
To: Georg Heiler
Cc: user@spark.apache.org
Subject: Re: [Spark Sql/ UDFs] Spark and Hive UDFs parity

Thanks Georg. But I'm not sure how mapPartitions is relevant here. Can you elaborate?

On Thu, Jun 15, 2017 at 4:18 AM, Georg Heiler <georg.kf.hei...@gmail.com> wrote:

What about using mapPartitions instead?

RD <rdsr...@gmail.com> schrieb am Do. 15. Juni 2017 um 06:52:

Hi Spark folks,

Is there any plan to support the richer UDF API that Hive supports for Spark UDFs? Hive's GenericUDF API has, among others, methods like initialize() and configure() (called once on the cluster), which a lot of our users rely on. We now have many Hive UDFs that make use of these methods. We plan to port them to Spark UDFs but are limited by not having similar lifecycle methods. Are there plans to address this? Or do people usually adopt some sort of workaround?

If we use the Hive UDFs directly in Spark we pay a performance penalty. I think Spark does a conversion from InternalRow to Row and back to InternalRow for native Spark UDFs, and for Hive it goes from InternalRow to Hive Object and back to InternalRow, but somehow the conversion for native UDFs is more performant.

-Best,
R.
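
[Sketch referenced above] Here is a minimal sketch of what I mean, assuming the standard org.apache.spark.sql.functions.udf API. All of the names and data here (GeoLookupUdfs, ConfiguredLookup, lookupCountry, the IP-to-country table) are hypothetical placeholders for your real initialization logic, not anything from Hive or Spark itself:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// A Scala object is a per-JVM singleton: its body and lazy vals run once per
// executor JVM, playing roughly the role of Hive's initialize().
object GeoLookupUdfs extends Serializable {
  // Hypothetical expensive resource; stands in for whatever your real init does.
  lazy val lookupTable: Map[String, String] = Map("10.0.0.1" -> "US")

  val lookupCountry = udf { (ip: String) => lookupTable.getOrElse(ip, "UNKNOWN") }
}

// For the configure() use case (runtime parameters), wrap the state in a
// serializable class so parameters can be passed at construction time on the driver.
class ConfiguredLookup(defaultValue: String) extends Serializable {
  // @transient + lazy so the table is rebuilt on each executor instead of shipped.
  @transient lazy val lookupTable: Map[String, String] = Map("10.0.0.1" -> "US")
  def apply(ip: String): String = lookupTable.getOrElse(ip, defaultValue)
}

object UdfInitExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-init-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("10.0.0.1", "10.0.0.2").toDF("ip")

    // Object-based UDF: GeoLookupUdfs initializes once per executor JVM.
    df.select(GeoLookupUdfs.lookupCountry($"ip").as("country")).show()

    // Class-based UDF: the parameter is fixed on the driver and shipped with the closure.
    val configured = new ConfiguredLookup(defaultValue = "N/A")
    val lookupWithDefault = udf((ip: String) => configured(ip))
    df.select(lookupWithDefault($"ip").as("country")).show()

    spark.stop()
  }
}

The object-based version covers the "run once per JVM" initialization, and the class-based version covers the "pass configuration at init time" case. As I understand Georg's suggestion, mapPartitions addresses the same need from a different angle: you can do per-partition setup at the top of the mapPartitions closure, but that means working through the Dataset/RDD API rather than a SQL-registered UDF.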