Re: Spark Streaming

2018-11-26 Thread Siva Samraj
My joindf is taking 14 seconds on the first run, and I have commented out the withColumn call, but it is still taking more time. On Tue, Nov 27, 2018 at 12:08 PM Jungtaek Lim wrote: > You may need to put effort into triaging how much time is spent on each part. > Without such information you are only able to get general tips and tricks.

Re: Spark Streaming

2018-11-26 Thread Jungtaek Lim
You may need to put effort into triaging how much time is spent on each part. Without such information you are only able to get general tips and tricks. Please check the SQL tab and look at the DAG graph as well as the details (logical plan, physical plan) to see whether you're happy with these plans. General tip …
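
For readers following along, here is a minimal sketch of how to print those plans, with toy data standing in for the poster's streams:

    import org.apache.spark.sql.SparkSession

    object PlanInspection {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("plan-inspection")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        val left  = Seq((1, "a"), (2, "b")).toDF("id", "l")
        val right = Seq((1, "x"), (3, "y")).toDF("id", "r")

        // explain(true) prints the parsed, analyzed, and optimized logical
        // plans plus the physical plan, the same detail shown in the SQL tab.
        left.join(right, "id").explain(true)

        spark.stop()
      }
    }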

Spark Streaming

2018-11-26 Thread Siva Samraj
Hello all, I am using Spark 2.3 and I am trying to write a Spark Streaming join. It is a basic join, yet it is taking a long time to join the stream data. I am not sure what configuration we need to set on Spark. Code: import org.apache.spark.sql.SparkSession …
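
Since the code in the archive is cut off, below is a minimal sketch of a Spark 2.3 stream-stream inner join; the rate source, column names, and watermark values are assumptions for illustration, not the poster's code:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object StreamJoinSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("stream-stream-join")
          .master("local[*]")
          .getOrCreate()

        // The built-in rate source stands in for the real input streams.
        val impressions = spark.readStream.format("rate")
          .option("rowsPerSecond", 5).load()
          .select(col("timestamp").as("impressionTime"), col("value").as("impressionAdId"))
          .withWatermark("impressionTime", "10 seconds")

        val clicks = spark.readStream.format("rate")
          .option("rowsPerSecond", 5).load()
          .select(col("timestamp").as("clickTime"), col("value").as("clickAdId"))
          .withWatermark("clickTime", "10 seconds")

        // A time-range condition plus watermarks lets Spark discard old
        // join state; without them the state store grows without bound.
        val joined = impressions.join(clicks, expr(
          "impressionAdId = clickAdId AND " +
          "clickTime BETWEEN impressionTime AND impressionTime + interval 1 minute"))

        joined.writeStream
          .format("console")
          .outputMode("append")
          .start()
          .awaitTermination()
      }
    }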

Spark column combinations and combining multiple dataframes (pyspark)

2018-11-26 Thread Christopher Petrino
Hi all, I'm working on a problem where it is necessary to find all combinations of columns for a dataframe. THE PROBLEM: Let's say there is a dataframe with columns: [ col_a, col_b, col_c, col_d, col_e, result ]. The number of combinations can vary between 1 and 5, but let's say 3 for this case. …
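
A hedged sketch of one approach: enumerate the k-sized column subsets, aggregate per subset, and union the results over a shared schema. It is written in Scala for consistency with the rest of this digest (Python's itertools.combinations plays the same role as Scala's combinations for PySpark); the sample data and the avg aggregation are assumptions.

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object ColumnCombinations {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("column-combinations")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        val df = Seq(
          ("a1", "b1", "c1", "d1", "e1", 1.0),
          ("a1", "b2", "c1", "d2", "e1", 2.0)
        ).toDF("col_a", "col_b", "col_c", "col_d", "col_e", "result")

        val cols = Seq("col_a", "col_b", "col_c", "col_d", "col_e")
        val k = 3 // size of each column combination

        // combinations(k) yields each k-subset of the column list exactly once.
        val perCombo: Seq[DataFrame] = cols.combinations(k).map { combo =>
          df.groupBy(combo.map(col): _*)
            .agg(avg("result").as("avg_result"))
            // Collapse the grouping columns into one key so every result
            // shares a schema and the frames can be unioned.
            .select(
              lit(combo.mkString("+")).as("columns"),
              concat_ws("|", combo.map(col): _*).as("key"),
              $"avg_result")
        }.toSeq

        perCombo.reduce(_ union _).show(truncate = false)
      }
    }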

Re: [Spark SQL]: Does Spark SQL 2.3+ support UDT?

2018-11-26 Thread Suny Tyagi
Thanks and regards, Suny Tyagi. On Mon, Nov 26, 2018 at 10:31 PM Suny Tyagi wrote: > Hi Team, > > I was going through this ticket > https://issues.apache.org/jira/browse/SPARK-7768?jql=text%20~%20%22public%20udt%22 > and could not understand whether Spark supports UDT in 2.3+ …
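
For context, SPARK-7768 tracks making the UserDefinedType API public again; it has been private since Spark 2.0. A common workaround for carrying arbitrary classes in a Dataset is a Kryo encoder. A minimal sketch, assuming a plain Point class (Kryo stores it as opaque binary, so you lose columnar access to its fields):

    import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

    // A plain (non-case) class has no built-in encoder, so Kryo is needed.
    class Point(val x: Double, val y: Double)

    object KryoEncoderSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kryo-encoder")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        implicit val pointEnc: Encoder[Point] = Encoders.kryo[Point]

        val ds = spark.createDataset(Seq(new Point(1.0, 2.0), new Point(3.0, 4.0)))
        ds.map(p => p.x + p.y).show() // map works; the fields are not columns

        spark.stop()
      }
    }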

Encoding not working when using a map / mapPartitions call

2018-11-26 Thread ccaspanello
Attached you will find a project with unit tests showing the issue at hand. If I read in an ISO-8859-1 encoded file and simply write out what was read, the contents of the part file match what was read, which is great. However, the second I use a map / mapPartitions function, it looks like the encoding …
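
A hedged sketch of one way to rule out charset issues: decode the bytes explicitly rather than relying on textFile, which assumes UTF-8. The paths are placeholders, and this is a guess at the cause rather than a confirmed diagnosis of the attached project:

    import java.nio.charset.StandardCharsets

    import org.apache.spark.sql.SparkSession

    object ExplicitEncoding {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("explicit-encoding")
          .master("local[*]")
          .getOrCreate()
        val sc = spark.sparkContext

        // textFile decodes as UTF-8 and silently mangles ISO-8859-1 bytes,
        // so read the raw bytes and decode them with the actual charset.
        val lines = sc.binaryFiles("/path/to/input") // placeholder path
          .flatMap { case (_, stream) =>
            new String(stream.toArray, StandardCharsets.ISO_8859_1).split("\r?\n")
          }

        // map/mapPartitions now operate on correctly decoded Strings; the
        // text writer emits UTF-8 on the way out.
        lines.map(_.toUpperCase).saveAsTextFile("/path/to/output")

        spark.stop()
      }
    }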

Re: Zookeeper and Spark deployment for standby master

2018-11-26 Thread Jörn Franke
I guess it is the usual thing: if the non-ZooKeeper processes take too much memory, disk space, etc., it will negatively affect ZooKeeper and thus your whole running cluster. You will have to make a risk assessment for your specific architectural setting to decide whether this is acceptable. > On 26.11.2018 …
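
For reference, standby masters in standalone mode are enabled through ZooKeeper recovery settings in spark-env.sh on every master node; the ZooKeeper host names below are placeholders:

    # spark-env.sh on each master node
    SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
      -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
      -Dspark.deploy.zookeeper.dir=/spark"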