Implementing Spark metric Source and Sink for custom application metrics

2018-04-18 Thread AnilKumar B
Hi All, what is the best way to instrument metrics of a Spark application from both the driver and the executors? I am trying to send my Spark application metrics to Kafka. I found two approaches. *Approach 1:* Implement a custom Source and Sink, and use the Source for instrumenting from both driver and executor …
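
A minimal sketch of Approach 1 in Scala, assuming Spark's Dropwizard-based metrics system; the class name, source name, and counter are illustrative, not from the thread. Because the Source trait is private[spark] in most Spark versions, the usual workaround is to declare the implementation inside an org.apache.spark package:

    package org.apache.spark.metrics.source

    import com.codahale.metrics.{Counter, MetricRegistry}
    import org.apache.spark.SparkEnv

    // Declared inside org.apache.spark.* so it can extend the
    // private[spark] Source trait and reach SparkEnv's MetricsSystem.
    class AppMetricsSource extends Source {
      override val sourceName: String = "myApp"
      override val metricRegistry: MetricRegistry = new MetricRegistry
      val recordsProcessed: Counter = metricRegistry.counter("recordsProcessed")
    }

    object AppMetricsSource {
      // Registered lazily, once per JVM; reference this on the driver
      // and inside task code so both sides report the same metrics.
      lazy val instance: AppMetricsSource = {
        val src = new AppMetricsSource
        SparkEnv.get.metricsSystem.registerSource(src)
        src
      }
    }

Task code would then call AppMetricsSource.instance.recordsProcessed.inc(). The Kafka side would be a matching custom Sink (the Sink trait is private[spark] as well) wired up through metrics.properties, since Spark ships no Kafka sink out of the box.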

Configurable task-level timeouts and task failures

2017-06-14 Thread AnilKumar B
Hi, we are using Spark in some data science use cases such as predictions. Most of the time we faced data-skewness issues, and we redistributed the data using Murmur hashing or round-robin assignment, which fixed the skewness across the partitions/tasks. But still, some of the tasks are …
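
A sketch of the redistribution described above, assuming Scala and Spark SQL; the data and the config values are illustrative. Dataset.repartition(n, col) uses Murmur3-based hash partitioning internally, and since Spark has no configurable per-task timeout, speculative execution is the usual lever for the remaining slow tasks:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder
      .appName("skew-demo")
      // There is no task-level timeout in Spark; speculation
      // re-launches tasks running much slower than their peers.
      .config("spark.speculation", "true")
      .config("spark.speculation.multiplier", "4")
      .config("spark.speculation.quantile", "0.9")
      .getOrCreate()
    import spark.implicits._

    // Illustrative skewed input; "userId" is the hot key.
    val df = Seq(("u1", 10), ("u1", 20), ("u2", 5)).toDF("userId", "value")

    // repartition(n, col) applies Murmur3-based HashPartitioning,
    // matching the Murmur-hash redistribution mentioned above.
    val redistributed = df.repartition(64, $"userId")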

mapPartition iterator

2017-01-16 Thread AnilKumar B
Hi, for my use case I need to call a third-party function (which is in-memory based) on each complete partition's data. So I am partitioning the RDD logically using repartition on an index column and applying a function f via mapPartitions(f). When I iterate through the mapPartitions iterator, can I assume one …
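
The truncated question appears to ask whether each mapPartitions call sees exactly one partition's records; it does, as a single lazy Iterator. A small sketch under that reading, with a hypothetical thirdPartyFn standing in for the third-party call:

    import org.apache.spark.HashPartitioner
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("partition-demo").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical stand-in for the in-memory third-party function
    // that needs a whole partition's rows at once.
    def thirdPartyFn(rows: Seq[(Int, String)]): Seq[String] =
      rows.map { case (idx, v) => s"$idx:$v" }

    val byIndex = sc.parallelize(Seq((1, "a"), (1, "b"), (2, "c")))
      // Co-locate records that share an index value in one partition.
      .partitionBy(new HashPartitioner(2))

    val result = byIndex.mapPartitions { it =>
      // `it` streams exactly one partition's records; it.toSeq
      // materializes the whole partition, which is only safe if a
      // single partition fits in one executor's memory.
      thirdPartyFn(it.toSeq).iterator
    }

Note that plain RDD.repartition(n) redistributes rows round-robin rather than by key; co-locating rows by the index column requires a keyed RDD with partitionBy (or a DataFrame repartition on that column).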

Re: Operator push down through JDBC driver

2016-10-25 Thread AnilKumar B
Not valid, please ignore it. Thanks & Regards, B Anil Kumar. On Tue, Oct 25, 2016 at 2:35 PM, AnilKumar B wrote: > Hi, I am using Spark SQL to transform data. My source is Oracle; in general, I am extracting multiple tables and joining them and then doing some other …

Operator push down through JDBC driver

2016-10-25 Thread AnilKumar B
Hi, I am using Spark SQL to transform data. My source is Oracle; in general, I am extracting multiple tables, joining them, and then doing some other transformations in Spark. Is there any possibility of pushing the join operator down to Oracle using Spark SQL, instead of fetching and joining in Spark …
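
Spark's JDBC source pushes down column pruning and filters but not joins, so the usual workaround is to hand Oracle the join itself as a subquery through the dbtable option. A sketch in Scala, with placeholder connection details and table names:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("jdbc-pushdown").getOrCreate()

    val joined = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
      .option("driver", "oracle.jdbc.OracleDriver")
      // Oracle executes the join; Spark only sees the joined result.
      .option("dbtable",
        "(SELECT o.id, o.amount, c.name FROM orders o JOIN customers c ON o.cust_id = c.id)")
      .option("user", "scott")
      .option("password", "********")
      .load()

On Spark 2.4 and later, the query option accepts the same SQL without the wrapping parentheses; for a 2016-era Spark, the dbtable subquery form above is the available route.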