Hi,
org.apache.spark.mllib.linalg.Vector =
(1048576,[35587,884670],[3.458767233,3.458767233])
It is the sparse vector representation of the terms: the first value (1048576) is the length of the vector, [35587,884670] are the indices of the terms, and [3.458767233,3.458767233] are their TF-IDF values.
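If it helps, the same vector can be constructed by hand with the MLlib API. A minimal sketch in Java (1048576 is 2^20, the default HashingTF feature dimension, taken from the output above):

    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.linalg.Vectors;

    // size, indices of the non-zero terms, and their TF-IDF weights
    Vector v = Vectors.sparse(
        1048576,
        new int[] {35587, 884670},
        new double[] {3.458767233, 3.458767233});
    v.apply(35587);   // 3.458767233; every index not listed yields 0.0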
Thanks
S
Try the Scala Eclipse plugin to Eclipsify the Spark project and import Spark as an Eclipse project.
-Somnath
-Original Message-
From: Nan Xiao [mailto:xiaonan830...@gmail.com]
Sent: Thursday, May 28, 2015 12:32 PM
To: user@spark.apache.org
Subject: How to use Eclipse on Windows to build Spark envir
Hi Akhil,
I am running my program standalone. I am getting a NullPointerException when I run the Spark program locally and try to save my RDD as a text file.
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Tuesday, April 14, 2015 12:41 PM
To: Somnath Pandeya
Cc: user
JavaRDD<String> lineswithoutStopWords = nonEmptylines
    .map(new Function<String, String>() {
        private static final long serialVersionUID = 1L;

        @Override
        public String call(String line) {
            // body truncated in the archive; presumably the stop words are stripped here
            return line;
        }
    });
Hi All,
I want to find near-duplicate items in a given dataset. For example, consider this data set:
1. Cricket,bat,ball,stumps
2. Cricket,bowler,ball,stumps,
3. Football,goalie,midfielder,goal
4. Football,refree,midfielder,goal,
Here 1 and 2 are near duplicates (only field 2 is
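The thread is cut off here, but one common way to score such near duplicates is Jaccard similarity over the comma-separated fields. A small self-contained sketch (class and method names are mine, not from the thread):

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    public class NearDup {
        // |intersection| / |union| of the two field sets
        static double jaccard(String a, String b) {
            Set<String> sa = new HashSet<String>(Arrays.asList(a.split(",")));
            Set<String> sb = new HashSet<String>(Arrays.asList(b.split(",")));
            Set<String> inter = new HashSet<String>(sa);
            inter.retainAll(sb);
            Set<String> union = new HashSet<String>(sa);
            union.addAll(sb);
            return (double) inter.size() / union.size();
        }

        public static void main(String[] args) {
            // records 1 and 2 share 3 of 5 distinct fields -> 0.6
            System.out.println(jaccard("Cricket,bat,ball,stumps",
                                       "Cricket,bowler,ball,stumps"));
        }
    }

Pairs scoring above a chosen threshold (say 0.5) would then be flagged as near duplicates.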
Thanks Akhil, it was a simple fix, as you pointed out. I missed it. ☺
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Wednesday, February 25, 2015 12:48 PM
To: Somnath Pandeya
Cc: user@spark.apache.org
Subject: Re: used cores are less than total no. of cores
You can set the following in
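(The reply is truncated in the archive. On a standalone cluster the usual knob for controlling an application's core usage is spark.cores.max; whether that is what was suggested here is my assumption, not a quote from the thread:)

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    SparkConf conf = new SparkConf()
        .setAppName("WordCount")
        .set("spark.cores.max", "32");   // assumed value; total cores the app may claim
    JavaSparkContext sc = new JavaSparkContext(conf);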
Hi All,
I am running a simple word count example on Spark (standalone cluster). In the UI, each worker shows 32 available cores, but while running the jobs only 5 cores are being used.
What should I do to increase the number of cores used, or is this chosen based on the jobs?
Thanks
Maybe you can use the wholeTextFiles method, which returns the filename and the content of the file as a PairRDD; then you can remove the first line from each file.
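A minimal sketch of that approach with the Spark 1.x Java API (the input path and variable names are placeholders):

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.FlatMapFunction;
    import scala.Tuple2;

    // (filename, content) pairs, one per file
    JavaPairRDD<String, String> files = sc.wholeTextFiles("hdfs:///path/to/input");
    JavaRDD<String> withoutHeader = files.flatMap(
        new FlatMapFunction<Tuple2<String, String>, String>() {
            @Override
            public Iterable<String> call(Tuple2<String, String> file) {
                String[] lines = file._2().split("\n");
                // drop the first line of each file, keep the rest
                return Arrays.asList(lines).subList(1, lines.length);
            }
        });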
-Original Message-
From: Hafiz Mujadid [mailto:hafizmujadi...@gmail.com]
Sent: Friday, January 09, 2015 11:48 AM
To: user@spark.apache.
You can also follow the link below. It works on a standalone Spark cluster.
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
thanks
Somnath
From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Thursday, January 08, 2015 2:21 AM
To: jamborta
Cc: user
Su
Hi,
I have set up a Spark 1.2 standalone cluster and am trying to run Hive on Spark by following the link below.
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
I got the latest build of Hive on Spark from git and was trying to run a few queries. Queries are runn
Hi,
You can also try reduceByKey, something like this:
JavaPairRDD<String, Integer> ones = lines
    .mapToPair(new PairFunction<String, String, Integer>() {
        @Override
        public Tuple2<String, Integer> call(String s) {
            String[] fields = s.split(",");
            // the archive truncates here; keying on the first field is a guess
            return new Tuple2<String, Integer>(fields[0], 1);
        }
    });
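The snippet is cut off in the archive; the reduceByKey step mentioned above would then look something like this (summing per key is my guess at the intent):

    import org.apache.spark.api.java.function.Function2;

    JavaPairRDD<String, Integer> counts = ones.reduceByKey(
        new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer a, Integer b) {
                return a + b;   // combine the counts for each key
            }
        });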
I think you should use a minimum of 2 GB of memory when building it with Maven.
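(If I remember the Spark 1.x build docs correctly, the recommended way to give Maven that memory is along these lines; exact values may differ:)

    export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"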
-Somnath
-Original Message-
From: Vladimir Protsenko [mailto:protsenk...@gmail.com]
Sent: Tuesday, December 23, 2014 8:28 PM
To: user@spark.apache.org
Subject: Spark Installation Maven PermGen OutOfMemoryExceptio