Re: [Spark Sql/ UDFs] Spark and Hive UDFs parity

2017-06-16 Thread RD
Thanks Georg. But I'm not sure how mapPartitions is relevant here. Can you elaborate? On Thu, Jun 15, 2017 at 4:18 AM, Georg Heiler wrote: > What about using mapPartitions instead? > > RD wrote on Thu, Jun 15, 2017 at 06:52: > >> Hi Spark folks, >> >>
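One possible reading of the mapPartitions suggestion, offered as a sketch rather than as what Georg had in mind: a function passed to mapPartitions receives an iterator over a whole partition, so any setup placed before the loop runs once per partition instead of once per row. That is loosely similar to the one-time setup a Hive GenericUDF does in initialize(), although it is per partition, not once on the cluster. The names ExpensiveResource and process_partition below are hypothetical, and the example runs without Spark by feeding plain lists in place of partitions.

```python
from typing import Iterable, Iterator


class ExpensiveResource:
    """Hypothetical stand-in for state that is costly to build (e.g. a lookup table)."""

    instances_created = 0  # counts constructions so the one-time setup is observable

    def __init__(self) -> None:
        ExpensiveResource.instances_created += 1
        self.prefix = "user_"

    def transform(self, value: str) -> str:
        # Normalise the value and tag it with the prefix built during setup.
        return self.prefix + value.strip().lower()


def process_partition(rows: Iterable[str]) -> Iterator[str]:
    # Setup happens once per call, i.e. once per partition when this
    # function is handed to mapPartitions, not once per row.
    resource = ExpensiveResource()
    for row in rows:
        yield resource.transform(row)


if __name__ == "__main__":
    # Local stand-in for two partitions (assumption: no Spark installed here).
    partitions = [[" Alice", "BOB "], ["Carol"]]
    for part in partitions:
        print(list(process_partition(iter(part))))
```

With PySpark, the same function could be passed as rdd.mapPartitions(process_partition). This only covers the one-time-setup aspect; it does not reproduce the rest of the GenericUDF API discussed in the thread.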

[Spark Sql/ UDFs] Spark and Hive UDFs parity

2017-06-14 Thread RD
Hi Spark folks, Is there any plan to support the richer UDF API that Hive supports for Spark UDFs? Hive supports the GenericUDF API, which has, among others, methods like initialize() and configure() (called once on the cluster), which a lot of our users use. We now have a lot of UDFs in Hive

many 'activity' job are pending

2016-07-15 Thread 陆巍|Wei Lu(RD
Hi there, I ran into a “many active jobs” issue when using direct Kafka streaming on YARN (Spark 1.5, Hadoop 2.6, CDH5.5.1). The problem happens when Kafka has almost no traffic. In the application UI, I see many ‘active’ jobs that stay pending for hours. And finally the driver “Requesting 4 new exec