This is more of a Scala concept question than a Spark one. I have this Spark
initialization code:
object EntryPoint {
  val spark = SparkFactory.createSparkSession(...
  val funcsSingleton = ContextSingleton[CustomFunctions] { new CustomFunctions(Some(hashConf)) }
  lazy val funcs = fu
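The snippet does not show how ContextSingleton is implemented. A minimal sketch, assuming it is a helper that builds its value lazily, once per JVM (i.e., once per executor), instead of serializing the value with the task closure. This implementation is a guess for illustration, not the actual library code:

```scala
import java.util.concurrent.ConcurrentHashMap

object ContextSingleton {
  // One shared registry per JVM (i.e., per executor in a Spark job).
  private val instances = new ConcurrentHashMap[String, Any]()

  def apply[T](factory: => T): ContextSingleton[T] =
    new ContextSingleton[T](() => factory)

  private def getOrCreate(id: String, factory: () => Any): Any =
    instances.computeIfAbsent(id, (_: String) => factory())
}

// Serializable handle: only the id travels with the closure; the value
// itself is created locally on whichever JVM first calls get.
class ContextSingleton[T](factory: () => T) extends Serializable {
  private val id = java.util.UUID.randomUUID().toString
  def get: T = ContextSingleton.getOrCreate(id, () => factory()).asInstanceOf[T]
}
```

With a helper like this, `new CustomFunctions(Some(hashConf))` would run at most once per executor JVM, even though the handle is captured in serialized closures.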
We are trying to run a job that previously ran on Spark 1.3 on a
different cluster. The job was converted to Spark 2.3, and this is a
new cluster.
The job dies after completing about a half dozen stages with
java.io.IOException: No space left on device
It appears that the nodes are us
I assume you are using RDDs? What are you doing after the repartitioning +
sorting, if anything?
On Aug 20, 2018 11:22, "周浥尘" wrote:
In addition to my previous email,
Environment: Spark 2.1.2, Hadoop 2.6.0-cdh5.11, Java 1.8, CentOS 6.6
周浥尘 wrote on Monday, Aug 20, 2018 at 8:52 PM:
> Hi team,
>
> I found the Spark method *repartitionAndSortWithinPartitions* spends
> twice as much time as using MapReduce in some cases.
> I want to repartition th
You can pass hive-site.xml to the spark-submit command using the --files
option to make sure that the Spark job refers to the Hive metastore you
are interested in:
spark-submit --files /path/to/hive-site.xml
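A fuller invocation might look like the following; the class name, master, and jar are placeholders for illustration, not taken from the thread:

```shell
# Ship a specific hive-site.xml with the job so the driver and executors
# resolve the intended Hive metastore. Class/jar/master are placeholders.
spark-submit \
  --class com.example.MyJob \
  --master yarn \
  --files /path/to/hive-site.xml \
  my-job.jar
```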
On Sat, Aug 18, 2018 at 1:59 AM Patrick Alwell wrote:
> You probably need to take
Hi team,
I found the Spark method *repartitionAndSortWithinPartitions* spends twice
as much time as using MapReduce in some cases.
I want to repartition the dataset according to split keys and save them to
files in ascending order. As the doc says, repartitionAndSortWithinPartitions “is
more efficient
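For context on what the method guarantees, here is a plain-Scala sketch (no Spark dependency) of its semantics: records are bucketed by a hash-style partitioner, then each bucket is sorted by key. The modulo partitioning below is a simplified stand-in for Spark's HashPartitioner, for illustration only:

```scala
object RepartitionSortSketch {
  // Simplified stand-in for Spark's HashPartitioner: bucket by key modulo
  // the partition count (kept non-negative for negative keys).
  def partition(key: Int, numPartitions: Int): Int =
    ((key % numPartitions) + numPartitions) % numPartitions

  // Models repartitionAndSortWithinPartitions on plain collections:
  // group records into partitions, then sort each partition by key.
  def repartitionAndSort(
      records: Seq[(Int, String)],
      numPartitions: Int): Map[Int, Seq[(Int, String)]] =
    records
      .groupBy { case (k, _) => partition(k, numPartitions) }
      .map { case (p, recs) => p -> recs.sortBy(_._1) }
}
```

In Spark itself, the shuffle performs the bucketing and the sort happens while writing each output partition, which is why it can beat a separate `repartition` followed by `sortBy`; the sketch only mirrors the observable result.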