Best Regards,
Vamshi T
Nirav,
Spark does not create a duplicate join column when you pass the join
expression as a list of column names, as in the example below; note that this
requires the column name to be the same in both DataFrames.
Example: df1.join(df2, ['a'])
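For instance, a quick PySpark sketch (the DataFrames and column names below
are made up for illustration):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()

    df1 = spark.createDataFrame([(1, "x")], ["a", "v1"])
    df2 = spark.createDataFrame([(1, "y")], ["a", "v2"])

    # Joining on a list of column names keeps a single 'a' column
    df1.join(df2, ["a"]).printSchema()   # columns: a, v1, v2

    # Joining on a column expression keeps 'a' from both sides
    df1.join(df2, df1["a"] == df2["a"]).printSchema()  # columns: a, v1, a, v2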
Thanks.
Vamshi Talla
On Jul 6, 2018, at 4:47 PM, Gokula Krishnan D
Hi Ravi,
RDDs are immutable, so you cannot change them; instead, you create new RDDs
by transforming existing ones. repartition is a transformation, so it is
lazily evaluated and computed only when you call an action on the resulting
RDD.
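A small sketch of what I mean (a local session with made-up numbers):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    rdd = spark.sparkContext.parallelize(range(100), 4)

    repartitioned = rdd.repartition(8)       # transformation: no job runs yet
    print(repartitioned.getNumPartitions())  # 8 (metadata only, still no job)
    print(repartitioned.count())             # action: the shuffle executes now

Note that rdd itself still has 4 partitions afterwards; repartition returned
a new RDD instead of changing the original.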
Thanks.
Vamshi Talla
On Jul 8, 2018, at 12:26 PM, ryanda
Raymond,
Is your SPARK_HOME set? In your .bash_profile, try setting the below:
export SPARK_HOME=/home/Downloads/spark (or wherever your Spark is downloaded)
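You will likely also need Spark's bin directory on your PATH so the shell can
find spark-shell (assuming the standard layout of a Spark download):

export PATH=$SPARK_HOME/bin:$PATH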
Once done, source your .bash_profile (or restart the shell) and try
spark-shell again.
Best Regards,
Vamshi T
___
Hi Raymond,
I see that your spark-submit command needs a small correction. It should be
of the form:
spark-submit --master local --class <package name>.<class name> <jar location
and jar name>
Example:
spark-submit --master local \
--class retail_db.GetRevenuePerOrder \
C:\RXIE\Learning\Scal
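One caveat (an assumption on my part, since I can't see which shell you're
using): the trailing \ continues a line only in Unix-like shells such as Git
Bash; in the Windows command prompt use ^ instead, or put the whole command
on one line.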
Aakash,
Are you able to run your code in the pyspark shell without issues?
Best Regards,
Vamshi T
From: Hyukjin Kwon
Sent: Friday, June 15, 2018 10:18 AM
To: Marcelo Vanzin
Cc: aakash.spark@gmail.com; user@spark
Subject: Re: Issue upgrading to Spark 2.3.1 (M
Aakash,
Like Jorn suggested, did you increase your test data set? If so, did you also
update your executor-memory setting? It seems like you might be exceeding the
executor memory threshold.
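If it helps, here is a sketch of where those settings go; the 4g/2g values
and the your_job.py name are placeholders, so tune them to your job and
cluster:

    spark-submit --executor-memory 4g --driver-memory 2g your_job.py

or, equivalently, when building the session in code:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .config("spark.executor.memory", "4g")  # placeholder value
             .getOrCreate())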
Thanks
Vamshi Talla
Sent from my iPhone
On Jun 11, 2018, at 8:54 AM, Aakash Basu <aakash.spark@gmail.com>