I am new to PySpark and I am learning it in order to complete my thesis project
at university.
I am trying to create a DataFrame by reading from a PostgreSQL database table,
but I am facing a problem when I try to connect my PySpark application to the
PostgreSQL server. Could you please
Hi Saulo,
I meant using this to save:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/8_streaming.md#writing-to-cassandra-from-a-stream
But it might be slow in other respects.
Another point is that Cassandra and Spark running on the same machine might
compete for
Hi Javier,
I removed the transform and used map directly, but the kafkaStream is created
with KafkaUtils, which does not have a method to save to Cassandra directly.
Do you know any workaround for this?
Thank you for the suggestion.
Best Regards,
On 29/04/2018
Hi Saulo,
I'm no expert, but I will give it a try.
I would remove the rdd2.count(); I can't see the point of it, and you will gain
performance right away. For the same reason, I would not use a transform, just
the map directly.
I have not used Python, but in Scala the spark-cassandra-connector can save
The transactions probably describe from which counterparty assets are
transferred to another counterparty at the different stages of the transaction.
You could use GraphX for that if the algorithms there are suitable for your
needs.
Still trying to understand what you mean by "evolve over time". E.g.,
what is the use case you are trying to solve?
You want to load graph data from a streaming window into separate graphs ->
possible, but probably requires a lot of memory.
You want to update an existing graph with new streaming data and then fully
rerun an algorithm -> look at JanusGraph.
You want
Maybe not necessarily what you want, but you could, based on transaction
attributes, find out the initial state and end state and give them to a
decision tree to figure out whether, based on these attributes, you can
predict the final stage.
Again, not what you asked, but an idea for using ML on your data?
Kr
On Sun,
Hi All,
just like maxOffsetsPerTrigger, is there a minOffsetsPerTrigger in Spark
Structured Streaming 2.3.0?
Thanks!
Hi Nick,
Thanks for that idea!! Just to be clearer: the problem I am trying to solve is
that, when a bunch of financial transaction data is thrown at me, I am trying
to identify all possible relationships and lineage among them without
explicitly specifying what the relationships are among
Do GraphFrames support streaming?
One potential approach could be to construct a transition matrix showing the
probability of moving from each state to every other state. This can be
visualized with a "heat map" encoding (I think matshow in matplotlib does
this).
On Sat, 28 Apr 2018 at 21:34, kant kodali