This is more of a Scala concept question than a Spark one. I have this Spark
initialization code:
object EntryPoint {
  val spark = SparkFactory.createSparkSession(...
  val funcsSingleton = ContextSingleton[CustomFunctions] { new CustomFunctions(Some(hashConf)) }
  lazy val funcs = fu
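The snippet does not show how ContextSingleton is implemented. A minimal sketch, assuming it is a helper that builds its value lazily, once per JVM (i.e., once per executor), instead of serializing the value with the task closure. This implementation is a guess for illustration, not the actual library code:

```scala
import java.util.concurrent.ConcurrentHashMap

object ContextSingleton {
  // One shared registry per JVM (i.e., per executor in a Spark job).
  private val instances = new ConcurrentHashMap[String, Any]()

  def apply[T](factory: => T): ContextSingleton[T] =
    new ContextSingleton[T](() => factory)

  private def getOrCreate(id: String, factory: () => Any): Any =
    instances.computeIfAbsent(id, (_: String) => factory())
}

// Serializable handle: only the id travels with the closure; the value
// itself is created locally on whichever JVM first calls get.
class ContextSingleton[T](factory: () => T) extends Serializable {
  private val id = java.util.UUID.randomUUID().toString
  def get: T = ContextSingleton.getOrCreate(id, () => factory()).asInstanceOf[T]
}
```

With a helper like this, `new CustomFunctions(Some(hashConf))` would run at most once per executor JVM, even though the handle is captured in serialized closures.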
We are trying to run a job that previously ran on Spark 1.3 on a
different cluster. The job was converted to Spark 2.3, and this is a
new cluster.
The job dies after completing about a half dozen stages with
java.io.IOException: No space left on device
It appears that the nodes are us
I assume you are using RDDs? What are you doing after the repartitioning +
sorting, if anything?
On Aug 20, 2018 11:22, "周浥尘" wrote:
In addition to my previous email,
Environment: Spark 2.1.2, Hadoop 2.6.0-cdh5.11, Java 1.8, CentOS 6.6
周浥尘 wrote on Monday, Aug 20, 2018 at 8:52 PM:
> Hi team,
>
> I found the Spark method *repartitionAndSortWithinPartitions* spends
> twice as much time as using MapReduce in some cases.
> I want to repartition th
You can pass hive-site.xml to the spark-submit command using the --files
option to make sure that the Spark job refers to the Hive metastore you
are interested in:
spark-submit --files /path/to/hive-site.xml
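A fuller invocation might look like the following; the class name, master, and jar are placeholders for illustration, not taken from the thread:

```shell
# Ship a specific hive-site.xml with the job so the driver and executors
# resolve the intended Hive metastore. Class/jar/master are placeholders.
spark-submit \
  --class com.example.MyJob \
  --master yarn \
  --files /path/to/hive-site.xml \
  my-job.jar
```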
On Sat, Aug 18, 2018 at 1:59 AM Patrick Alwell wrote:
> You probably need to take
Hi team,
I found the Spark method *repartitionAndSortWithinPartitions* spends twice
as much time as using MapReduce in some cases.
I want to repartition the dataset according to split keys and save them to
files in ascending order. As the doc says, repartitionAndSortWithinPartitions “is
more efficient
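For context on what the method guarantees, here is a plain-Scala sketch (no Spark dependency) of its semantics: records are bucketed by a hash-style partitioner, then each bucket is sorted by key. The modulo partitioning below is a simplified stand-in for Spark's HashPartitioner, for illustration only:

```scala
object RepartitionSortSketch {
  // Simplified stand-in for Spark's HashPartitioner: bucket by key modulo
  // the partition count (kept non-negative for negative keys).
  def partition(key: Int, numPartitions: Int): Int =
    ((key % numPartitions) + numPartitions) % numPartitions

  // Models repartitionAndSortWithinPartitions on plain collections:
  // group records into partitions, then sort each partition by key.
  def repartitionAndSort(
      records: Seq[(Int, String)],
      numPartitions: Int): Map[Int, Seq[(Int, String)]] =
    records
      .groupBy { case (k, _) => partition(k, numPartitions) }
      .map { case (p, recs) => p -> recs.sortBy(_._1) }
}
```

In Spark itself, the shuffle performs the bucketing and the sort happens while writing each output partition, which is why it can beat a separate `repartition` followed by `sortBy`; the sketch only mirrors the observable result.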