Thanks Georg. But I'm not sure how mapPartitions is relevant here. Can you
elaborate?
On Thu, Jun 15, 2017 at 4:18 AM, Georg Heiler wrote:
> What about using map partitions instead?
>
> RD wrote on Thu, Jun 15, 2017 at 06:52:
>
>> Hi Spark folks,
>>
>>
Hi Spark folks,
Is there any plan to support the richer UDF API that Hive supports for
Spark UDFs? Hive supports the GenericUDF API, which has, among others,
methods like initialize() and configure() (called once on the cluster),
which a lot of our users use. We now have a lot of UDFs in Hive
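
One way to read the mapPartitions suggestion from this thread is as a per-partition stand-in for initialize(): do the expensive setup once at the start of each partition, then evaluate row by row. The sketch below shows only that lifecycle in plain Java, without Spark or Hive on the classpath. RichFunction, PrefixFunction and processPartition are hypothetical names made up for illustration; they are not Hive's GenericUDF and not a Spark API. In a real job, the body of processPartition would presumably be what gets passed to mapPartitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class PerPartitionInit {

    /** Hypothetical interface for illustration; NOT Hive's GenericUDF and NOT a Spark API. */
    interface RichFunction<I, O> {
        void initialize();    // one-time setup, e.g. loading a lookup table
        O evaluate(I input);  // called once per row
    }

    /** Example function that records how often initialize() was called. */
    static class PrefixFunction implements RichFunction<String, String> {
        int initCalls = 0;
        private String prefix;

        @Override
        public void initialize() {
            initCalls++;
            prefix = "udf:";
        }

        @Override
        public String evaluate(String input) {
            return prefix + input;
        }
    }

    /**
     * Stand-in for the body of a mapPartitions call (assumption: one call per partition).
     * Initializes the function once, then applies it to every row of the partition.
     */
    static <I, O> Iterator<O> processPartition(Iterator<I> rows, RichFunction<I, O> fn) {
        fn.initialize();
        List<O> out = new ArrayList<>();
        while (rows.hasNext()) {
            out.add(fn.evaluate(rows.next()));
        }
        return out.iterator();
    }

    public static void main(String[] args) {
        PrefixFunction fn = new PrefixFunction();
        Iterator<String> result =
            processPartition(Arrays.asList("a", "b", "c").iterator(), fn);
        while (result.hasNext()) {
            System.out.println(result.next());
        }
        System.out.println("initialize() calls: " + fn.initCalls);
    }
}
```

Note that this gives once-per-partition setup, not the once-per-cluster configure() semantics described above, so it may only cover part of what the Hive API offers.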
Hi there,
I am running into a “many active jobs” issue when using direct Kafka streaming on
YARN (Spark 1.5, Hadoop 2.6, CDH 5.5.1).
The problem happens when Kafka has almost NO traffic.
In the application UI, I see many ‘active’ jobs that stay pending for hours. And
finally the driver “Requesting 4 new exec