I'm using Spark structured streaming to process records read from Kafka.
Here's what I'm trying to achieve:
(a) Each record is a Tuple2 of type (Timestamp, DeviceId).
(b) I've created a static Dataset[DeviceId] which contains the set of all
valid device IDs (of type DeviceId) that are expected to
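The question is cut off above, but assuming the goal is to keep only streaming records whose DeviceId appears in the static set, a stream-static inner join is the usual approach. A sketch — all dataset, topic, and column names here are assumptions, not from the original mail:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("device-filter").getOrCreate()
import spark.implicits._

case class DeviceId(id: String)

// Static Dataset[DeviceId] of valid IDs (source path is a placeholder).
val validDeviceIds = spark.read.parquet("/path/to/valid-ids").as[DeviceId]

// Streaming (Timestamp, DeviceId) records from Kafka.
val records = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")
  .option("subscribe", "devices")
  .load()
  .selectExpr("CAST(timestamp AS TIMESTAMP) AS ts",
              "CAST(value AS STRING) AS id")

// Stream-static inner join drops records with unknown device IDs.
val validRecords = records.join(validDeviceIds, Seq("id"), "inner")
```

The static side is re-read per micro-batch only if it is a streaming-aware source; a plain batch Dataset is joined as a snapshot.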
Any response on this one? Thanks in advance!
On Thu, Oct 5, 2017 at 1:44 PM, Tarun Kumar wrote:
> Hi, I registered an accumulator in driver via
> sparkContext.register(myCustomAccumulator,
> "accumulator-name"). But this accumulator is not available in
> task.metrics.accumulators()
> list. Accumulator is not visible in spark UI as well.
Hi there,
About GraphX, I think graph processing parses your data into (VertexA) -
[Edge1] - (VertexB).
As we can see, the Graph class of GraphX contains edges and vertices.
So the first line of your data would be parsed to
uuid_3_1, uuid_3_2, uuid_3_3, uuid_3_4 as vertices.
(uuid_3_
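A minimal sketch of the parsing the reply describes, turning one CSV line of UUIDs into GraphX vertices and edges. `sc` is an assumed SparkContext, and hashing UUID strings to Long vertex IDs is an assumption of this sketch, not something from the thread:

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// GraphX vertex IDs must be Long, so hash each UUID string (assumption).
def toId(s: String): VertexId = s.hashCode.toLong

val line = "uuid_3_1,uuid_3_2,uuid_3_3,uuid_3_4"
val ids = line.split(",")

val vertices: RDD[(VertexId, String)] =
  sc.parallelize(ids.map(s => (toId(s), s)).toSeq)

// Link consecutive IDs in the line: (uuid_3_1)-(uuid_3_2), and so on.
// Consecutive links are enough to make the whole line one component.
val edges: RDD[Edge[Int]] =
  sc.parallelize(ids.sliding(2).map(p => Edge(toId(p(0)), toId(p(1)), 1)).toSeq)

val graph = Graph(vertices, edges)
```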
I am using Spark ML's pipeline to classify text documents with the following
steps:
Tokenizer -> CountVectorizer -> LogisticRegression
I want to be able to print the words with the highest weights. Can this be
done?
So far I have been able to extract the LR coefficients, but can those be
tied up to the words?
--
Sincerely,
Darshan
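For the pipeline question above: one way to tie the coefficients back to words is through the fitted CountVectorizerModel's vocabulary, where vocabulary index i corresponds to coefficient i. A sketch, assuming a fitted PipelineModel named `model` with the three stages in the order listed (both the name and the stage indices are assumptions):

```scala
import org.apache.spark.ml.classification.LogisticRegressionModel
import org.apache.spark.ml.feature.CountVectorizerModel

// Assumption: model.stages = Tokenizer(0), CountVectorizer(1), LR(2).
val cv = model.stages(1).asInstanceOf[CountVectorizerModel]
val lr = model.stages(2).asInstanceOf[LogisticRegressionModel]

// vocabulary(i) is the word for feature index i, so it lines up with
// coefficients(i) (binary LR; multiclass exposes coefficientMatrix instead).
val weightedWords = cv.vocabulary.zip(lr.coefficients.toArray)

// Print the 20 words with the largest absolute weight.
weightedWords.sortBy { case (_, w) => -math.abs(w) }.take(20).foreach(println)
```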
Hello Sparkans,
I want to merge the following clusters / sets of IDs into one if they have
shared IDs.
For example:
uuid_3_1,uuid_3_2,uuid_3_3,uuid_3_4
uuid_3_2,uuid_3_5,uuid_3_6
uuid_3_5,uuid_3_7,uuid_3_8,uuid_3_9
into single:
uuid_3_1,uuid_3_2,uuid_3_3,uuid_3_4,uuid_3_5,uuid_3_6,uuid_3_7,uuid_3_8,uuid_3_9
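Merging sets that share members is a connected-components problem. Before reaching for GraphX, the same merge can be sketched with a plain union-find in local Scala, which is handy for validating the expected output on small data (the function name is mine, not from the thread):

```scala
import scala.collection.mutable

// Merge ID lines that share any member, via union-find with path compression.
def mergeClusters(clusters: Seq[Seq[String]]): Seq[Set[String]] = {
  val parent = mutable.Map[String, String]()

  def find(x: String): String = {
    val p = parent.getOrElseUpdate(x, x)
    if (p == x) x else { val r = find(p); parent(x) = r; r }
  }
  def union(a: String, b: String): Unit = parent(find(a)) = find(b)

  for (c <- clusters) {
    find(c.head)                          // register singleton lines too
    c.tail.foreach(id => union(c.head, id))
  }

  // Group every seen ID by its root representative.
  parent.keys.toSeq.groupBy(find).values.map(_.toSet).toSeq
}
```

On real data sizes the same result comes from GraphX's `connectedComponents()` over a graph built from these lines, run distributed instead of on the driver.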
Hi, I registered an accumulator in driver via
sparkContext.register(myCustomAccumulator, "accumulator-name"). But this
accumulator is not available in the task.metrics.accumulators() list.
The accumulator is not visible in the Spark UI either.
Does Spark need a different configuration to make the accumulator visible?
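For reference, a minimal registration sketch with a custom AccumulatorV2. The class is hypothetical; the point is that only accumulators registered with a name appear in the web UI, and only after they are actually updated inside a task that has run. `sc` is an assumed SparkContext:

```scala
import org.apache.spark.util.AccumulatorV2

// Hypothetical custom accumulator that collects distinct strings.
class SetAccumulator extends AccumulatorV2[String, Set[String]] {
  private var set = Set.empty[String]
  def isZero: Boolean = set.isEmpty
  def copy(): SetAccumulator = { val a = new SetAccumulator; a.set = set; a }
  def reset(): Unit = set = Set.empty
  def add(v: String): Unit = set += v
  def merge(other: AccumulatorV2[String, Set[String]]): Unit = set ++= other.value
  def value: Set[String] = set
}

val acc = new SetAccumulator
// The name passed here is what the stage pages of the web UI display;
// unnamed accumulators never show up there.
sc.register(acc, "accumulator-name")

sc.parallelize(Seq("a", "b", "a")).foreach(acc.add)  // updated inside tasks
println(acc.value)
```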