Re: Spark on Yarn with Dynamic Resource Allocation. Container always marked as failed

2016-03-02 Thread Xiaoye Sun
Hi Jeff and Prabhu, thanks for your help. I looked deeper into the nodemanager log and found an error message like this: 2016-03-02 03:13:59,692 ERROR org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error opening leveldb file file:/data/yarn/cache/yarn/nm-local-dir/registere
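For context, that error comes from the external shuffle service that dynamic allocation depends on. A minimal configuration sketch, assuming a standard YARN setup (values are illustrative, not taken from the thread):

  // Dynamic allocation on YARN needs the external shuffle service, which
  // keeps executor registrations in a leveldb file under the NodeManager
  // local dirs (the file the error above refers to).
  val conf = new org.apache.spark.SparkConf()
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.shuffle.service.enabled", "true")
  // On the YARN side, yarn.nodemanager.aux-services must include spark_shuffle
  // and the Spark YARN shuffle jar must be on the NodeManager classpath.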

Re: select count(*) return wrong row counts

2016-03-02 Thread Mich Talebzadeh
This works fine: scala> sql("use oraclehadoop") res1: org.apache.spark.sql.DataFrame = [result: string] scala> sql("select count(1) from sales").show +---+ |_c0| +---+ |4991761| +---+ You can do "select count(*) from tablename" as it is not dynamic SQL. Does it actually work? Sin
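A short sketch of the same check in spark-shell (Spark 1.x style, where sql comes from the pre-built sqlContext; the database and table names are the ones quoted above):

  sql("use oraclehadoop")
  sql("select count(*) from sales").show()   // count(*) and count(1) should give the same row count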

Sorting the RDD

2016-03-02 Thread Angel Angel
Hello Sir/Madam, I am trying to sort an RDD using the *sortByKey* function but I am getting the following error. My code: 1) convert the RDD array into key-value pairs; 2) after that, sort by key, but I get the error *No implicit Ordering defined for any * [image: Inline image 1] thanks
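A common cause is that the key type of the pair RDD has no Ordering (for example an Array, or a key inferred as Any). A minimal sketch with made-up data showing the two usual fixes:

  // Works out of the box: Int (and String, Long, ...) already have an Ordering
  val pairs = sc.parallelize(Seq((3, "c"), (1, "a"), (2, "b")))
  pairs.sortByKey().collect()

  // For a custom key type, bring an Ordering into scope explicitly
  case class MyKey(id: Int)
  implicit val myKeyOrdering: Ordering[MyKey] = Ordering.by(_.id)
  sc.parallelize(Seq((MyKey(2), "b"), (MyKey(1), "a"))).sortByKey().collect()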

RE: Converting array to DF

2016-03-02 Thread Mao, Wei
A “Seq” will be implicitly converted to “DataFrameHolder”, and the “toDF” method is defined on “DataFrameHolder”. There is no such implicit conversion for Array, so the user has to convert explicitly. implicit def localSeqToDataFrameHolder[A <: Product : TypeTag](data: Seq[A]): DataFrameHolder = { Da
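A small sketch of the explicit conversion (Spark 1.x sqlContext implicits assumed; Person is a made-up case class):

  import sqlContext.implicits._
  case class Person(name: String, age: Int)
  val arr = Array(Person("a", 1), Person("b", 2))
  val df1 = arr.toSeq.toDF()             // goes through localSeqToDataFrameHolder
  val df2 = sc.parallelize(arr).toDF()   // or convert via an RDD instead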

Spark Mllib kmeans execution

2016-03-02 Thread Priya Ch
Hi All, I am running the k-means clustering algorithm. Now, when I am running the algorithm as - val conf = new SparkConf val sc = new SparkContext(conf) . . val kmeans = new KMeans() val model = kmeans.run(RDD[Vector]) . . . The 'kmeans' object gets created on the driver. Now does *kmeans.run()* get e
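A minimal sketch of the same pattern with the MLlib API (file path and parameters are hypothetical): the KMeans object lives on the driver, but run() schedules distributed jobs over the RDD's partitions on the executors, and only the fitted model comes back to the driver.

  import org.apache.spark.mllib.clustering.KMeans
  import org.apache.spark.mllib.linalg.Vectors

  val points = sc.textFile("hdfs:///path/to/points.txt")                  // assumed path
    .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
    .cache()

  val kmeans = new KMeans().setK(3).setMaxIterations(20)
  val model = kmeans.run(points)   // distributed computation happens here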

Re: Spark Mllib kmeans execution

2016-03-02 Thread Sonal Goyal
It will run distributed On Mar 2, 2016 3:00 PM, "Priya Ch" wrote: > Hi All, > > I am running k-means clustering algorithm. Now, when I am running the > algorithm as - > > val conf = new SparkConf > val sc = new SparkContext(conf) > . > . > val kmeans = new KMeans() > val model = kmeans.run(RDD[

rdd cache name

2016-03-02 Thread charles li
Hi there, I feel a little confused about *cache* in Spark. First, is there any way to *customize the cached RDD name*? It is not convenient when looking at the Storage page: the RDD Name column only shows the kind of RDD, and I would like it to show my own name instead of names like 'rdd 1', 'rrd
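A small sketch of the usual answer (input path is hypothetical): RDD.setName sets the label shown on the Storage page, so name the RDD before caching it.

  val words = sc.textFile("hdfs:///path/to/input")   // assumed path
    .flatMap(_.split(" "))
    .setName("words-from-input")                     // appears under RDD Name
    .cache()
  words.count()                                      // materializes the cache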

Re: How to control the number of parquet files getting created under a partition ?

2016-03-02 Thread James Hammerton
Hi, Based on the behaviour I've seen using Parquet, the number of partitions in the DataFrame determines the number of files in each Parquet partition. I.e. when you use "PARTITION BY" you are actually partitioning twice: once via the partitions Spark has created internally, and then again with
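A minimal sketch of the usual way to cap the file count (column names and paths are made up): reduce the number of DataFrame partitions before writing, since each Spark partition can emit one file per partitionBy directory.

  df.repartition(4)                      // at most 4 files per output directory
    .write
    .partitionBy("year", "month")        // assumed partition columns
    .parquet("hdfs:///path/to/output")   // assumed output path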
