Accessing Scala RDD from pyspark

2018-03-15 Thread Shahab Yunus
Hi there. I am calling custom Scala code from pyspark (interpreter). The custom Scala code is simple: it just reads a textFile using sparkContext.textFile and returns RDD[String]. In pyspark, I am using sc._jvm to make the call to the Scala code: *s_rdd =

Re: [PySpark SQL] sql function to_date and to_timestamp return the same data type

2018-03-15 Thread Nicholas Sharkey
unsubscribe On Thu, Mar 15, 2018 at 8:00 PM, Alan Featherston Lago wrote: > I'm a pretty new user of spark and I've run into this issue with the > pyspark docs: > > The functions pyspark.sql.functions.to_date && > pyspark.sql.functions.to_timestamp > behave in the same way.

[PySpark SQL] sql function to_date and to_timestamp return the same data type

2018-03-15 Thread Alan Featherston Lago
I'm a pretty new user of spark and I've run into this issue with the pyspark docs: The functions pyspark.sql.functions.to_date && pyspark.sql.functions.to_timestamp behave in the same way. As in both functions convert a Column of pyspark.sql.types.StringType or pyspark.sql.types.TimestampType

How can I launch a thread in the background on all worker nodes before the data processing actually starts?

2018-03-15 Thread ravidspark
*Environment:* Spark 2.2.0 *Kafka:* 0.10.0 *Language:* Java *UseCase:* Streaming data from Kafka using JavaDStreams and storing into a downstream database. *Issue:* I have a use case wherein I have to launch a thread in the background that would connect to a DB and cache the retrieved
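The poster's code is Java, but the usual answer is language-independent: a lazily initialized per-executor singleton that starts its background thread the first time any task on that executor touches it (e.g., inside foreachPartition/mapPartitions). A Python sketch of the pattern, with the DB call replaced by a hypothetical refresh:

```python
import threading
import time

class DBCache:
    """Per-process singleton: one background refresh thread per executor."""
    _lock = threading.Lock()
    _thread = None
    data = {}

    @classmethod
    def _refresh_loop(cls):
        while True:
            # Hypothetical stand-in for "connect to a DB and cache the rows".
            cls.data = {"refreshed_at": time.time()}
            time.sleep(60)

    @classmethod
    def ensure_started(cls):
        # Idempotent: only the first caller in this process starts the thread.
        with cls._lock:
            if cls._thread is None:
                cls._thread = threading.Thread(target=cls._refresh_loop, daemon=True)
                cls._thread.start()

def process_partition(rows):
    DBCache.ensure_started()  # no-op after the first partition on this executor
    for row in rows:
        yield (row, DBCache.data.get("refreshed_at"))
```

In a job this would run as `rdd.mapPartitions(process_partition)`; the Java equivalent is a static holder class with the same double-checked start.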

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-15 Thread Aakash Basu
Awesome, thanks for detailing! I was thinking the same: we have to split by comma for CSV while casting inside. Cool! Shall try it and revert back tomorrow. Thanks a ton! On 15-Mar-2018 11:50 PM, "Bowden, Chris" wrote: > To remain generic, the KafkaSource can only offer

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-15 Thread Aakash Basu
Hey Chris, You got it right. I'm reading a *csv* file from local as mentioned above, with a console producer on the Kafka side. So, as it is csv data with headers, shall I then use from_csv on the Spark side and provide a StructType to shape it up with a schema, and then cast it to string as TD

Sparklyr and idle executors

2018-03-15 Thread Florian Dewes
Hi all, I am currently trying to enable dynamic resource allocation for a small YARN-managed Spark cluster. We are using sparklyr to access Spark from R and have multiple jobs that should run in parallel, because some of them take several days to complete or are in development. Everything
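Dynamic allocation on YARN needs the external shuffle service plus a handful of properties. A minimal spark-defaults.conf sketch (the values are illustrative; sparklyr can set the same keys through `spark_config()`):

```
spark.dynamicAllocation.enabled              true
spark.shuffle.service.enabled                true
spark.dynamicAllocation.minExecutors         1
spark.dynamicAllocation.maxExecutors         16
spark.dynamicAllocation.executorIdleTimeout  60s
```

The external shuffle service must also be enabled on the YARN NodeManagers themselves (via `yarn.nodemanager.aux-services`), otherwise idle executors cannot be released safely.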

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-15 Thread Tathagata Das
Chris identified the problem correctly. You need to parse out the json text from Kafka into separate columns before you can join them up. I walk through an example of this in my slides -

Re: Spark Conf

2018-03-15 Thread Neil Jonkers
Hi "In general, configuration values explicitly set on a SparkConf take the highest precedence, then flags passed to spark-submit, then values in the defaults file." https://spark.apache.org/docs/latest/submitting-applications.html Perhaps this will help Vinyas: Look at args.sparkProperties in

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-15 Thread Aakash Basu
Hi, And if I run this below piece of code -

from pyspark.sql import SparkSession
import time

class test:
    spark = SparkSession.builder \
        .appName("DirectKafka_Spark_Stream_Stream_Join") \
        .getOrCreate()
    # ssc = StreamingContext(spark, 20)
    table1_stream =

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-15 Thread Aakash Basu
Any help on the above? On Thu, Mar 15, 2018 at 3:53 PM, Aakash Basu wrote: > Hi, > > I progressed a bit in the above mentioned topic - > > 1) I am feeding a CSV file into the Kafka topic. > 2) Feeding the Kafka topic as readStream as TD's article suggests. > 3) Then,

Re: What's the best way to have Spark a service?

2018-03-15 Thread Jean Georges Perrin
Hi David, I ended up building my own. Livy sounded great on paper, but heavy to manipulate. I found out about Jobserver too late. We did not find it too complicated to build ours, with a small Spring Boot app that held the session (we did not need more than one session). jg > On Mar 15,

Re: What's the best way to have Spark a service?

2018-03-15 Thread Liana Napalkova
Hi David, Which type of incompatibility problems do you have with Apache Livy? BR, Liana From: David Espinosa Sent: 15 March 2018 12:06:20 To: user@spark.apache.org Subject: What's the best way to have Spark a service? Hi all, I'm quite

What's the best way to have Spark a service?

2018-03-15 Thread David Espinosa
Hi all, I'm quite new to Spark, and I would like to ask what's the best way to have Spark as a service; by that I mean being able to include the response of a Scala app/job running in Spark in a common RESTful request. Up to now I have read about Apache Livy (which I tried and found

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-15 Thread Aakash Basu
Hi, I progressed a bit in the above mentioned topic - 1) I am feeding a CSV file into the Kafka topic. 2) Feeding the Kafka topic as readStream as TD's article suggests. 3) Then, simply trying to do a show on the streaming dataframe, using queryName('XYZ') in the writeStream and writing a sql