Connect to postgresql with pyspark

2018-04-29 Thread dimitris plakas
I am new to PySpark and am learning it to complete my thesis project at university. I am trying to create a DataFrame by reading from a PostgreSQL database table, but I run into a problem when I try to connect my PySpark application to the PostgreSQL server. Could you please
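A common way to do this is Spark's built-in JDBC data source. The sketch below assumes a few things. The PostgreSQL JDBC driver jar must already be on the classpath, for example passed to `spark-submit` with `--jars` or set through the `spark.jars` configuration. The host, database, table and credentials are placeholders you replace with your own. The helper names `postgres_jdbc_options` and `read_postgres_table` are hypothetical. The reader call itself, `spark.read.format("jdbc")` with the `url`, `dbtable`, `user`, `password` and `driver` options, is Spark's standard JDBC reader.

```python
from typing import Dict


def postgres_jdbc_options(host: str, port: int, database: str, table: str,
                          user: str, password: str) -> Dict[str, str]:
    """Build the option map for Spark's JDBC data source (hypothetical helper)."""
    return {
        # PostgreSQL JDBC URLs have the form jdbc:postgresql://host:port/database
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        # A table name, or a parenthesised subquery with an alias
        "dbtable": table,
        "user": user,
        "password": password,
        # Driver class shipped inside the PostgreSQL JDBC jar
        "driver": "org.postgresql.Driver",
    }


def read_postgres_table(spark, options: Dict[str, str]):
    """Load a PostgreSQL table as a DataFrame; `spark` is an active SparkSession."""
    return spark.read.format("jdbc").options(**options).load()
```

In a job started with something like `spark-submit --jars /path/to/postgresql.jar app.py` (the jar path is a placeholder), usage would be `df = read_postgres_table(spark, postgres_jdbc_options("localhost", 5432, "mydb", "public.mytable", "user", "secret"))`. If the connection still fails, the error usually says whether the driver class was not found or whether the server refused the connection.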

Re: [Spark2.1] SparkStreaming to Cassandra performance problem

2018-04-29 Thread Javier Pareja
Hi Saulo, I meant using this to save: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/8_streaming.md#writing-to-cassandra-from-a-stream But it might be slow in a different area. Another point is that Cassandra and Spark running on the same machine might compete for

Re: [Spark2.1] SparkStreaming to Cassandra performance problem

2018-04-29 Thread Saulo Sobreiro
Hi Javier, I removed the map and used "map" directly instead of using transform, but the kafkaStream is created with KafkaUtils, which does not have a method to save to Cassandra directly. Do you know any workaround for this? Thank you for the suggestion. Best Regards, On 29/04/2018
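One possible workaround from Python is to handle each micro-batch with `foreachRDD`, turn it into a DataFrame, and save it through the spark-cassandra-connector's DataFrame source, `org.apache.spark.sql.cassandra`, which the connector supports from PySpark. This is a sketch that has not been run against a cluster. It assumes the connector package is on the classpath, that each stream record is already a tuple matching the target table's columns, and that `spark` is an active SparkSession. The function names, keyspace, table and column names are hypothetical. It uses `rdd.isEmpty()` rather than a full `count()` to skip empty batches, in line with the advice to drop the extra count.

```python
def save_batch_to_cassandra(spark, rdd, keyspace, table, columns):
    """Append one micro-batch (an RDD of tuples) to a Cassandra table."""
    # isEmpty() only needs to find a single element, so it is cheaper
    # than running a full count() job on every batch.
    if rdd.isEmpty():
        return
    # Column names are applied positionally to the tuples in the RDD.
    df = spark.createDataFrame(rdd, schema=columns)
    (df.write
       .format("org.apache.spark.sql.cassandra")
       .options(keyspace=keyspace, table=table)
       .mode("append")
       .save())


def attach_cassandra_sink(spark, dstream, keyspace, table, columns):
    """Register the writer on a DStream, e.g. the mapped Kafka stream."""
    # foreachRDD calls the function on the driver once per batch interval.
    dstream.foreachRDD(
        lambda rdd: save_batch_to_cassandra(spark, rdd, keyspace, table, columns))
```

With a hypothetical parser, usage would look like `parsed = kafkaStream.map(parse_record)` followed by `attach_cassandra_sink(spark, parsed, "my_keyspace", "events", ["id", "ts", "value"])`. Going through DataFrames trades the Scala `saveToCassandra` convenience for an API that is available from Python.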

Re: [Spark2.1] SparkStreaming to Cassandra performance problem

2018-04-29 Thread Javier Pareja
Hi Saulo, I'm no expert, but I will give it a try. I would remove the rdd2.count(); I can't see the point of it, and you will gain performance right away. Because of this, I would not use a transform, just the map directly. I have not used Python, but in Scala the Cassandra-Spark connector can save

Re: A naive ML question

2018-04-29 Thread Jörn Franke
The transactions probably describe from which counterparty assets are transferred to another counterparty at the different stages of the transaction. You could use GraphX for that if the algorithms there are suitable for your needs. I am still trying to understand: what do you mean by "evolve over time"? E.g.
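GraphX itself has no Python API; from PySpark, GraphFrames is the usual DataFrame-based option. The underlying idea can still be shown on a small example. The sketch below is plain Python standing in for a graph library. It treats each transaction as a directed edge from the paying counterparty to the receiving one, then uses breadth-first search to find every counterparty that assets could have flowed to from a given start. The transaction tuples are made-up illustrative data, and the function names are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(transactions):
    """Adjacency list: counterparty -> set of counterparties it transferred to."""
    graph = defaultdict(set)
    for sender, receiver, _amount in transactions:
        graph[sender].add(receiver)
    return graph


def downstream(graph, start):
    """All counterparties reachable from `start` by following transfers (BFS)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # .get avoids inserting new keys into the defaultdict
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # a cycle back to the origin is not "downstream"
    return seen
```

For the made-up data `[("A", "B", 100), ("B", "C", 40), ("C", "D", 10), ("E", "F", 5)]`, `downstream(build_graph(tx), "A")` gives `{"B", "C", "D"}`, and the unrelated `E -> F` transfer is excluded. On a cluster, the same reachability question maps onto GraphFrames' BFS or connected-components algorithms.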

Re: Do GraphFrames support streaming?

2018-04-29 Thread Jörn Franke
What is the use case you are trying to solve? You want to load graph data from a streaming window into separate graphs -> possible, but it probably requires a lot of memory. You want to update an existing graph with new streaming data and then fully rerun an algorithm -> look at JanusGraph. You want

Re: A naive ML question

2018-04-29 Thread Marco Mistroni
Maybe not necessarily what you want, but you could, based on the transaction attributes, find the initial state and end state and give them to a decision tree to figure out whether, based on these attributes, you can predict the final stage. Again, not what you asked, but an idea for using ML on your data? Kr On Sun,

is there a minOffsetsTrigger in spark structured streaming 2.3.0?

2018-04-29 Thread kant kodali
Hi all, just like maxOffsetsTrigger, is there a minOffsetsTrigger in Spark Structured Streaming 2.3.0? Thanks!
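On the Kafka source, the actual option is spelled `maxOffsetsPerTrigger`. As far as I can tell, the 2.3 Structured Streaming Kafka integration guide documents no minimum counterpart. The sketch below shows where the cap is set. It also shows a longer processing-time trigger, which is the closest available lever for larger batches: more data can arrive between triggers, although this does not guarantee any minimum batch size. Broker address, topic, and helper names are placeholders. `spark` is assumed to be an active SparkSession, and `df` a streaming DataFrame.

```python
def kafka_stream(spark, bootstrap_servers, topic, max_offsets_per_trigger):
    """Streaming DataFrame over a Kafka topic with an upper bound per micro-batch."""
    return (spark.readStream
            .format("kafka")
            .option("kafka.bootstrap.servers", bootstrap_servers)
            .option("subscribe", topic)
            # Upper bound on the total offsets read per trigger,
            # split across the topic's partitions
            .option("maxOffsetsPerTrigger", max_offsets_per_trigger)
            .load())


def start_console_query(df, interval="30 seconds"):
    """Start a query whose micro-batches are triggered every `interval`."""
    # A longer interval lets more records accumulate before each batch,
    # but it is not a guaranteed minimum batch size.
    return (df.writeStream
            .format("console")
            .trigger(processingTime=interval)
            .start())
```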

Re: A naive ML question

2018-04-29 Thread kant kodali
Hi Nick, Thanks for that idea! Just to be clearer: the problem I am trying to solve is that when a large set of financial transaction data is thrown at me, I am trying to identify all possible relationships and lineage among them without explicitly specifying what the relationships are among

Do GraphFrames support streaming?

2018-04-29 Thread kant kodali
Do GraphFrames support streaming?

Re: A naive ML question

2018-04-29 Thread Nick Pentreath
One potential approach could be to construct a transition matrix showing the probability of moving from each state to another state. This can be visualized with a “heat map” encoding (I think matshow in numpy/matplotlib does this). On Sat, 28 Apr 2018 at 21:34, kant kodali
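Here is a minimal sketch of that idea in plain Python. It assumes each transaction's history is available as an ordered list of state names; the state labels below are made up. The code counts consecutive state pairs and then normalises each row so it sums to 1. The resulting matrix is what you could pass to matplotlib's `matshow` for the heat-map view.

```python
from collections import Counter


def transition_matrix(sequences):
    """Row-normalised transition probabilities from observed state sequences.

    Returns (states, matrix) where matrix[i][j] estimates the probability
    of moving from states[i] to states[j].
    """
    states = sorted({s for seq in sequences for s in seq})
    index = {s: i for i, s in enumerate(states)}
    counts = Counter()
    for seq in sequences:
        # Count each consecutive (current state, next state) pair
        for cur, nxt in zip(seq, seq[1:]):
            counts[(cur, nxt)] += 1
    n = len(states)
    matrix = [[0.0] * n for _ in range(n)]
    for (cur, nxt), c in counts.items():
        matrix[index[cur]][index[nxt]] = float(c)
    for row in matrix:
        total = sum(row)
        if total:  # states that are never left (terminal) keep an all-zero row
            for j in range(n):
                row[j] /= total
    return states, matrix
```

Take the three made-up sequences `[["created", "approved", "settled"], ["created", "rejected"], ["created", "approved", "settled"]]`. The `created` row gives 2/3 for `approved` and 1/3 for `rejected`, and `approved` moves to `settled` with probability 1. `rejected` and `settled` are terminal, so their rows stay zero.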