Spark structured streaming - join static dataset with streaming dataset

2017-10-05 Thread jithinpt
I'm using Spark structured streaming to process records read from Kafka. Here's what I'm trying to achieve: (a) Each record is a Tuple2 of type (Timestamp, DeviceId). (b) I've created a static Dataset[DeviceId] which contains the set of all valid device IDs (of type DeviceId) that are expected to
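The core of the stream/static join described above is a filter: keep each incoming (Timestamp, DeviceId) record only if its device ID appears in the static set. In Spark this would be an equi-join between the streaming Dataset and the static Dataset[DeviceId]; the pure-Python sketch below (with made-up device IDs) only illustrates that semantics.

```python
# Minimal sketch of joining a stream against a static ID set.
# valid_device_ids stands in for the static Dataset[DeviceId]; in Spark
# this would be streamingDS.join(staticDS, "deviceId") -- names illustrative.

valid_device_ids = {"dev-1", "dev-2", "dev-3"}

def join_with_static(records, valid_ids):
    """Keep only (timestamp, device_id) records whose ID is in the static set."""
    return [(ts, dev) for ts, dev in records if dev in valid_ids]

incoming = [("2017-10-05T10:00:00", "dev-1"),
            ("2017-10-05T10:00:01", "dev-9"),   # unknown device, dropped
            ("2017-10-05T10:00:02", "dev-3")]

matched = join_with_static(incoming, valid_device_ids)
print(matched)
# [('2017-10-05T10:00:00', 'dev-1'), ('2017-10-05T10:00:02', 'dev-3')]
```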

Re: Accumulators not available in Task.taskMetrics

2017-10-05 Thread Tarun Kumar
Any response on this one? Thanks in advance! On Thu, Oct 5, 2017 at 1:44 PM, Tarun Kumar wrote: > Hi, I registered an accumulator in driver via > sparkContext.register(myCustomAccumulator, > "accumulator-name"). But this accumulator is not available in > task.metrics.accumulators() > list. Acc

Re: How to merge fragmented IDs into one cluster if one/more IDs are shared

2017-10-05 Thread 孫澤恩
Hi there. Regarding GraphX: I think the graph-processing step parses your data into (VertexA) - [Edge1] - (VertexB). As we can see, the Graph class of GraphX contains edges and vertices, so the first line of your data would be parsed into uuid_3_1,uuid_3_2,uuid_3_3,uuid_3_4 as vertices. (uuid_3_
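The parsing step this reply describes can be sketched without GraphX: each comma-separated line yields its IDs as vertices, and consecutive IDs are chained into edges, giving the (VertexA) - [Edge] - (VertexB) shape. A minimal pure-Python sketch (the chaining choice is an assumption; GraphX itself takes RDDs of vertices and Edge objects):

```python
# Turn one line of the input data into vertices and consecutive-pair edges.
def line_to_graph(line):
    ids = line.split(",")
    vertices = set(ids)
    edges = list(zip(ids, ids[1:]))  # chain consecutive IDs into edges
    return vertices, edges

v, e = line_to_graph("uuid_3_1,uuid_3_2,uuid_3_3,uuid_3_4")
print(sorted(v))  # ['uuid_3_1', 'uuid_3_2', 'uuid_3_3', 'uuid_3_4']
print(e)          # [('uuid_3_1', 'uuid_3_2'), ('uuid_3_2', 'uuid_3_3'), ('uuid_3_3', 'uuid_3_4')]
```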

Spark ML - LogisticRegression extract words with highest weights

2017-10-05 Thread pun
I am using Spark ML's pipeline to classify text documents with the following steps: Tokenizer -> CountVectorizer -> LogisticRegression I want to be able to print the words with the highest weights. Can this be done? So far I have been able to extract the LR coefficients, but can those be tied up t
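Yes, this can be done: zip the CountVectorizer vocabulary with the LR coefficients (they share the same index order) and sort by weight. In Spark ML the vocabulary comes from CountVectorizerModel.vocabulary and the weights from LogisticRegressionModel.coefficients; the plain lists below stand in for both.

```python
# Pair each word with its coefficient and return the top-weighted words.
def top_weighted_words(vocabulary, coefficients, n=3):
    pairs = sorted(zip(vocabulary, coefficients), key=lambda p: p[1], reverse=True)
    return pairs[:n]

# Illustrative stand-ins for model.vocabulary and model.coefficients:
vocab = ["spark", "kafka", "the", "streaming", "a"]
coefs = [2.1, 1.7, -0.3, 1.9, -0.5]

print(top_weighted_words(vocab, coefs))
# [('spark', 2.1), ('streaming', 1.9), ('kafka', 1.7)]
```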

Any rabbit mq connect for spark structured streaming ?

2017-10-05 Thread Darshan Pandya
-- Sincerely, Darshan

How to merge fragmented IDs into one cluster if one/more IDs are shared

2017-10-05 Thread Tushar Sudake
Hello Sparkans, I want to merge the following clusters / sets of IDs into one if they have shared IDs. For example: uuid_3_1,uuid_3_2,uuid_3_3,uuid_3_4 uuid_3_2,uuid_3_5,uuid_3_6 uuid_3_5,uuid_3_7,uuid_3_8,uuid_3_9 into a single cluster: uuid_3_1,uuid_3_2,uuid_3_3,uuid_3_4,uuid_3_5,uuid_3_6,uuid_3_7,uuid_3_8,
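Merging sets that share members is a connected-components problem, which GraphX (suggested later in this thread) solves at scale. As a minimal local sketch of the same idea, a union-find over the example rows produces the single merged cluster the question asks for:

```python
# Union-find sketch: rows sharing any ID end up in the same component.
def merge_clusters(clusters):
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Link consecutive IDs in each row, so every row forms one component.
    for ids in clusters:
        for a, b in zip(ids, ids[1:]):
            union(a, b)

    # Group all IDs by their component root.
    merged = {}
    for x in parent:
        merged.setdefault(find(x), set()).add(x)
    return list(merged.values())

rows = [["uuid_3_1", "uuid_3_2", "uuid_3_3", "uuid_3_4"],
        ["uuid_3_2", "uuid_3_5", "uuid_3_6"],
        ["uuid_3_5", "uuid_3_7", "uuid_3_8", "uuid_3_9"]]

print(merge_clusters(rows))  # one set containing all nine IDs
```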

Accumulators not available in Task.taskMetrics

2017-10-05 Thread Tarun Kumar
Hi, I registered an accumulator in driver via sparkContext.register(myCustomAccumulator, "accumulator-name"). But this accumulator is not available in task.metrics.accumulators() list. Accumulator is not visible in spark UI as well. Does spark need different configuration to make accumulator visib
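For context on what registration buys you: Spark's AccumulatorV2 contract is that each task adds to its own copy and the driver merges the copies back. The pure-Python stand-in below illustrates only that add/merge semantics; whether the accumulator shows up in task.metrics.accumulators() or the UI is a Spark-internal concern this sketch cannot reproduce.

```python
# Stand-in for the AccumulatorV2 add/merge contract (not a Spark API).
class SumAccumulator:
    def __init__(self):
        self.value = 0

    def add(self, v):                 # called on an executor's task-local copy
        self.value += v

    def merge(self, other):           # called on the driver with task results
        self.value += other.value

    def copy_and_reset(self):         # fresh zeroed copy shipped to each task
        return SumAccumulator()

driver_acc = SumAccumulator()
task_copies = [driver_acc.copy_and_reset() for _ in range(3)]
for i, acc in enumerate(task_copies):  # each "task" adds its partition count
    acc.add(i + 1)
for acc in task_copies:
    driver_acc.merge(acc)

print(driver_acc.value)  # 6
```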