Hi Jeff and Prabhu,
Thanks for your help.
I looked deeper into the nodemanager log and found an error message
like this:
2016-03-02 03:13:59,692 ERROR
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error
opening leveldb file
file:/data/yarn/cache/yarn/nm-local-dir/registere
This works fine:
scala> sql("use oraclehadoop")
res1: org.apache.spark.sql.DataFrame = [result: string]
scala> sql("select count(1) from sales").show
+---+
|_c0|
+---+
|4991761|
+---+
You can do "select count(*) from tablename" since it is not dynamic SQL. Does
it actually work?
Sin
Hello Sir/Madam,
I am trying to sort an RDD using the *sortByKey* function, but I am getting
the following error.
My code does the following:
1) convert the RDD of arrays into key-value pairs
2) then sort by key
At that point I get the error *No implicit Ordering defined for Any*.
[image: Inline image 1]
thanks
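That error usually means the key type of the pair RDD ended up as Any (or some
other type with no Ordering in scope), so sortByKey cannot find an implicit
ordering. A minimal sketch under that assumption; the sample data and field
positions are hypothetical, not the original poster's code:

import org.apache.spark.{SparkConf, SparkContext}

object SortByKeyExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sortByKey-example"))

    // Hypothetical input: each record is an Array[Any], e.g. Array(3, "c").
    val rows = sc.parallelize(Seq(Array[Any](3, "c"), Array[Any](1, "a"), Array[Any](2, "b")))

    // rows.map(r => (r(0), r(1))).sortByKey()   // does NOT compile:
    // the key is typed as Any, and there is no implicit Ordering[Any].

    // Give the key a concrete type that has an Ordering (Int here) and it works.
    val pairs = rows.map(r => (r(0).asInstanceOf[Int], r(1)))
    pairs.sortByKey().collect().foreach(println)

    sc.stop()
  }
}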
“Seq” will be implicitly converted to “DataFrameHolder”, and the “toDF” method
is defined on “DataFrameHolder”. There is no such conversion for Array, so the
user has to convert explicitly himself.
implicit def localSeqToDataFrameHolder[A <: Product : TypeTag](data: Seq[A]):
DataFrameHolder =
{
Da
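For illustration, a minimal sketch of the explicit conversion; the Person case
class and values are hypothetical, and I'm assuming the Spark 1.x SQLContext
imports (on later versions, import spark.implicits._ from a SparkSession
instead):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

object ToDFExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("toDF-example"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val people = Array(Person("alice", 30), Person("bob", 25))

    // people.toDF()              // does not compile: no implicit for Array
    val df = people.toSeq.toDF()  // convert the Array to a Seq first
    df.show()

    sc.stop()
  }
}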
Hi All,
I am running the k-means clustering algorithm. Now, when I am running the
algorithm as -
val conf = new SparkConf
val sc = new SparkContext(conf)
.
.
val kmeans = new KMeans()
val model = kmeans.run(RDD[Vector])
.
.
.
The 'kmeans' object gets created on the driver. Now does *kmeans.run()* get
e
It will run distributed
On Mar 2, 2016 3:00 PM, "Priya Ch" wrote:
> Hi All,
>
> I am running k-means clustering algorithm. Now, when I am running the
> algorithm as -
>
> val conf = new SparkConf
> val sc = new SparkContext(conf)
> .
> .
> val kmeans = new KMeans()
> val model = kmeans.run(RDD[
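To make the "distributed" part concrete, here is a minimal sketch of the usual
MLlib pattern; the input path, k, and iteration count are placeholders, not
from the original post:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("kmeans-example"))

    // Hypothetical input: one point per line, space-separated doubles.
    val data = sc.textFile("hdfs:///path/to/points.txt")
      .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
      .cache()

    // The KMeans object itself is just configuration and lives on the driver ...
    val kmeans = new KMeans().setK(10).setMaxIterations(20)

    // ... but run() iterates over the RDD, so the per-point work (distance
    // computations, assignments to centers) happens on the executors. Only the
    // aggregated cluster centers come back to the driver inside the model.
    val model = kmeans.run(data)

    model.clusterCenters.foreach(println)
    sc.stop()
  }
}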
Hi there, I am a little confused about the *cache* in Spark.
First, is there any way to *customize the cached RDD name*? It's not
convenient when looking at the Storage page: the RDD Name column only shows
the kind of RDD, and I would like it to show my own customized name instead
of names like 'rdd 1', 'rrd
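For what it's worth, RDD.setName lets you label an RDD before caching, and as
far as I know that label is what the Storage page then displays; a minimal
sketch with placeholder names and paths:

import org.apache.spark.{SparkConf, SparkContext}

object CacheNameExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cache-name-example"))

    // setName labels the RDD, so the Storage tab shows "my-sales-data"
    // instead of a default name derived from the RDD type.
    val sales = sc.textFile("hdfs:///path/to/sales")
      .setName("my-sales-data")
      .cache()

    println(sales.count())
    sc.stop()
  }
}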
Hi,
Based on the behaviour I've seen using Parquet, the number of partitions in
the DataFrame determines the number of files in each Parquet partition.
I.e. when you use "PARTITION BY" you're actually partitioning twice: once
via the partitions Spark has created internally and then again with
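A minimal sketch of that interaction, assuming the DataFrameWriter partitionBy
API; the column name and paths are placeholders:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ParquetPartitionExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("parquet-partition-example"))
    val sqlContext = new SQLContext(sc)

    // Hypothetical DataFrame with a "country" column.
    val df = sqlContext.read.json("hdfs:///path/to/events.json")

    // Two levels of partitioning are in play:
    //  1. partitionBy("country") creates one directory per country value
    //     (country=US/, country=DE/, ...).
    //  2. Within each directory, every RDD partition that holds rows for that
    //     country writes its own file, so the DataFrame's partition count caps
    //     the number of files per directory.
    // Repartitioning first is one way to control the file count per directory.
    df.repartition(8)
      .write
      .partitionBy("country")
      .parquet("hdfs:///path/to/output")

    sc.stop()
  }
}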