Hi All,
What is the best way to instrument metrics for a Spark application from both
the Driver and the Executors?
I am trying to send my Spark application metrics into Kafka. I found two
approaches.
*Approach 1:* Implement a custom Source and Sink, and use the Source for
instrumenting from both the Driver and the Executors
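A minimal sketch of what the reporting side of such a custom Sink could look like, in plain Python. All names here are hypothetical (`KafkaMetricsSink`, `send_fn`), and `send_fn` merely stands in for a real Kafka producer's send call so the sketch stays self-contained; in an actual Spark Sink this logic would live behind Spark's metrics Source/Sink interfaces.

```python
import json
import time

class KafkaMetricsSink:
    """Hypothetical sketch: collect named counters and serialize a
    snapshot into a JSON record that a Kafka producer could publish."""

    def __init__(self, topic, send_fn):
        # send_fn stands in for a real producer's send(topic, value);
        # injected here so the sketch has no Kafka dependency.
        self.topic = topic
        self.send_fn = send_fn
        self.counters = {}

    def inc(self, name, delta=1):
        # Increment a named counter (Driver or Executor side).
        self.counters[name] = self.counters.get(name, 0) + delta

    def report(self):
        # Flush a snapshot of all counters as one JSON record.
        record = {"ts": int(time.time()), "counters": dict(self.counters)}
        self.send_fn(self.topic, json.dumps(record).encode("utf-8"))
        return record

# Usage with a stub producer that just records what would be sent:
sent = []
sink = KafkaMetricsSink("spark-app-metrics",
                        lambda topic, value: sent.append((topic, value)))
sink.inc("records_processed", 5)
snapshot = sink.report()
```

With a real producer, `send_fn` would be replaced by the producer's send call and `report()` scheduled periodically.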
Hi,
In some of our data science use cases, such as predictions, we are using
Spark. Most of the time, we face data skewness issues, and we have
redistributed the data using Murmur hashing or round-robin assignment, which
fixed the skewness across partitions/tasks.
But still, some of the tasks are
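When hash partitioning alone leaves a few hot keys dominating single tasks, one common follow-up is key salting. The sketch below is plain Python, using MD5 as a stand-in for the Murmur hash Spark uses internally, and the key names are made up; it only illustrates the technique, not Spark's actual partitioner.

```python
import hashlib
import random
from collections import Counter

def hash_partition(key, num_partitions):
    # MD5 as a stand-in for Murmur: deterministic key -> partition.
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

def salt_hot_key(key, hot_keys, salt_buckets=8, rng=random):
    # Append a random salt to known hot keys so their rows spread
    # over several partitions instead of landing in a single task.
    if key in hot_keys:
        return f"{key}#{rng.randrange(salt_buckets)}"
    return key

# 90% of rows share one hot key: classic skew.
rows = ["hot"] * 900 + [f"k{i}" for i in range(100)]
plain = Counter(hash_partition(k, 10) for k in rows)
salted = Counter(hash_partition(salt_hot_key(k, {"hot"}), 10)
                 for k in rows)
# Without salting, one partition receives at least the 900 hot rows.
```

For joins, the other side of the join must be replicated across the same salt buckets so salted keys still match.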
Hi
For my use case, I need to call a third-party function (which is in-memory
based) on each complete partition's data. So I am partitioning the RDD
logically using repartition on an index column and applying function f via
mapPartitions(f).
When I iterate through the mapPartitions iterator, can I assume one
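For reference, mapPartitions hands the function a single iterator over all rows of one partition, so a per-partition third-party call can materialize and process the whole partition at once. Here is a plain-Python sketch of that contract (the function names and data are hypothetical, not Spark's implementation):

```python
def map_partitions(partitions, f):
    # Mimic RDD.mapPartitions: call f once per partition, passing an
    # iterator over that partition's rows; f yields the results.
    return [list(f(iter(part))) for part in partitions]

def third_party_batch(rows_iter):
    # Stand-in for an in-memory third-party function that needs the
    # complete partition: consume the iterator once, process as a batch.
    batch = list(rows_iter)
    yield {"count": len(batch), "total": sum(batch)}

partitions = [[1, 2, 3], [10, 20], [5]]
results = map_partitions(partitions, third_party_batch)
# results: [[{'count': 3, 'total': 6}],
#           [{'count': 2, 'total': 30}],
#           [{'count': 1, 'total': 5}]]
```

Note the iterator can only be consumed once, so materializing it (as `third_party_batch` does) requires the whole partition to fit in executor memory.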
not valid. Please ignore it.
Thanks & Regards,
B Anil Kumar.
On Tue, Oct 25, 2016 at 2:35 PM, AnilKumar B wrote:
> Hi,
>
> I am using Spark SQL to transform data. My Source is ORACLE, In general, I
> am extracting multiple tables and joining them and then doing some other
>
Hi,
I am using Spark SQL to transform data. My source is Oracle. In general, I
am extracting multiple tables, joining them, and then doing some other
transformations in Spark.
Is there any possibility of pushing down the join operator to Oracle using
Spark SQL, instead of fetching and joining in Spark
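One common way to get the database to execute the join is to pass the join as a parenthesized subquery in the JDBC source's `dbtable` option, so Oracle runs the join and Spark fetches only the result. The sketch below just builds the options dict in plain Python; the table and column names (`orders`, `customers`, etc.) are made up for illustration.

```python
def jdbc_pushdown_options(url, user, password, pushdown_sql, alias="pushed"):
    # Spark's JDBC source accepts a parenthesized query with an alias
    # anywhere a table name is expected, so the database executes the
    # join and Spark reads only the joined result.
    return {
        "url": url,
        "user": user,
        "password": password,
        "dbtable": f"({pushdown_sql}) {alias}",
    }

# Hypothetical tables; Oracle performs the join server-side.
opts = jdbc_pushdown_options(
    "jdbc:oracle:thin:@//db-host:1521/ORCL",
    "scott", "tiger",
    "SELECT o.order_id, c.name "
    "FROM orders o JOIN customers c ON o.cust_id = c.cust_id",
)
# In Spark:  spark.read.format("jdbc").options(**opts).load()
```

This trades Spark-side flexibility for reduced data transfer: only the join result crosses the wire, but Spark can no longer parallelize the scan per source table.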