Hi all,
Recently I've run into a scenario where I need to conduct two-sample tests between all pairwise combinations of columns of an RDD. But the network load and the generation of the pairwise computations are too time-consuming, which has puzzled me for a long time. I want to conduct the Wilcoxon rank-sum test
(http://en
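For a single column pair, the statistic itself is cheap to compute; here is a minimal plain-Scala sketch of the Wilcoxon rank-sum (Mann-Whitney U) statistic, runnable without Spark. The object name, method name, and sample data are all invented for illustration:

```scala
// Minimal sketch of the Wilcoxon rank-sum (Mann-Whitney U) statistic in
// plain Scala. Names and data are made up for illustration.
object RankSum {
  // Returns the Mann-Whitney U statistic for sample x against sample y.
  def mannWhitneyU(x: Seq[Double], y: Seq[Double]): Double = {
    val pooled = (x.map((_, 'x')) ++ y.map((_, 'y'))).sortBy(_._1)
    // Assign average ranks to tied values (ranks are 1-based).
    val ranks = new Array[Double](pooled.length)
    var i = 0
    while (i < pooled.length) {
      var j = i
      while (j + 1 < pooled.length && pooled(j + 1)._1 == pooled(i)._1) j += 1
      val avg = (i + j + 2) / 2.0            // mean of ranks i+1 .. j+1
      (i to j).foreach(k => ranks(k) = avg)
      i = j + 1
    }
    // W = rank sum of sample x; U = W - n1(n1+1)/2
    val wX = pooled.indices.collect { case k if pooled(k)._2 == 'x' => ranks(k) }.sum
    wX - x.length * (x.length + 1) / 2.0
  }

  def main(args: Array[String]): Unit = {
    val u = mannWhitneyU(Seq(1.0, 2.0, 3.0), Seq(4.0, 5.0, 6.0))
    println(u)  // 0.0: every x value ranks below every y value
  }
}
```

The expensive part in the question is not this statistic but generating and shuffling all column pairs; the per-pair computation itself is linearithmic in the pooled sample size.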
Hi all,
Recently in our project, we need to update an RDD using data regularly
received from a DStream. I plan to use the "foreachRDD" API to achieve this:
var MyRDD = ...
dstream.foreachRDD { rdd =>
  MyRDD = MyRDD.join(rdd)...
  ...
}
Is this usage correct? My concern is, as I am repeatedly
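The state-update semantics of that pattern can be sketched in plain Scala, without Spark: each micro-batch is joined against the running state, mirroring what `MyRDD = MyRDD.join(rdd)` does per batch. The types, keys, and the "append the batch's value" reading of the update are all invented for illustration:

```scala
// Plain-Scala sketch of joining each micro-batch into a running state,
// mirroring MyRDD = MyRDD.join(rdd) inside foreachRDD. Data is invented.
object StreamJoinSketch {
  type State = Map[String, List[Int]]

  // Inner-join a batch into the state: keep only keys present in both,
  // appending the batch's value -- one possible reading of "update".
  def step(state: State, batch: Map[String, Int]): State =
    state.collect {
      case (k, vs) if batch.contains(k) => k -> (vs :+ batch(k))
    }

  def main(args: Array[String]): Unit = {
    val s0: State = Map("a" -> List(1), "b" -> List(2))
    val s1 = step(s0, Map("a" -> 10))  // "b" drops out of the inner join
    println(s1)                        // Map(a -> List(1, 10))
  }
}
```

One caveat worth noting for the real Spark version: each reassignment extends the RDD's lineage by one join, so the lineage grows without bound across batches; calling `MyRDD.checkpoint()` periodically (with a checkpoint directory set via `sc.setCheckpointDir`) keeps it bounded.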
Hi all,
I am using Spark 1.3.1 to write a Spectral Clustering algorithm. This really
confused me today. At first I thought my implementation was wrong. It turns
out it's an issue in MLlib. Fortunately, I've figured it out.
I suggest adding a hint to the MLlib user documentation (as far as I know, ther
On-line Collaborative Filtering (CF) has been widely used and studied.
Re-training a CF model from scratch every time new data comes in is very
inefficient
(http://stackoverflow.com/questions/27734329/apache-spark-incremental-training-of-als-model).
However, in the Spark community we see few discus
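The core idea of incremental CF can be sketched in plain Scala: instead of re-running the full factorization, fold one new (user, item, rating) observation into the existing latent factors with a single SGD step. This is a hypothetical sketch, not Spark's ALS; all names, learning rate, and sample values are invented:

```scala
// Hypothetical sketch of an incremental matrix-factorization update: one
// SGD step folds a new rating into existing factors instead of re-training
// from scratch. Names and constants are invented for illustration.
object IncrementalMF {
  // One SGD step on latent factors u (user) and v (item) for rating r.
  def sgdStep(u: Array[Double], v: Array[Double], r: Double,
              lr: Double = 0.1, reg: Double = 0.01): (Array[Double], Array[Double]) = {
    val pred = u.zip(v).map { case (a, b) => a * b }.sum
    val err  = r - pred
    // Gradient of (r - u.v)^2 with L2 regularization, applied to both factors.
    val u2 = u.indices.map(k => u(k) + lr * (err * v(k) - reg * u(k))).toArray
    val v2 = v.indices.map(k => v(k) + lr * (err * u(k) - reg * v(k))).toArray
    (u2, v2)
  }

  def main(args: Array[String]): Unit = {
    var (u, v) = (Array(0.1, 0.1), Array(0.1, 0.1))
    val before = math.abs(1.0 - u.zip(v).map { case (a, b) => a * b }.sum)
    // Repeated steps on the same observation shrink the prediction error.
    for (_ <- 1 to 50) { val (nu, nv) = sgdStep(u, v, 1.0); u = nu; v = nv }
    val after = math.abs(1.0 - u.zip(v).map { case (a, b) => a * b }.sum)
    println(after < before) // true: error decreases
  }
}
```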
Hi everyone!
I am currently digging into MLlib in Spark 1.2.1. While reading the code of
MLlib.stat.test, in the file ChiSqTest.scala under
/spark/mllib/src/main/scala/org/apache/spark/mllib/stat/test, I am confused
by the usage of the mapPartitions API in the function
def chiSquaredFeatures(data: RDD[La
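The mapPartitions idiom there can be illustrated in plain Scala: instead of emitting one record per input point, each partition's iterator is folded into a single map of (feature value, label) counts, so only one small record per partition crosses the network. The types, data, and helper names below are invented; this mirrors the shape of the idiom, not ChiSqTest's actual code:

```scala
// Plain-Scala sketch of the mapPartitions aggregation idiom: fold a whole
// partition's Iterator into one counts record. Data is invented.
object MapPartitionsSketch {
  type Counts = Map[(Double, Double), Long]

  // Per-partition pass over Iterator[(label, feature)], as a mapPartitions
  // closure would receive it; returns a one-element Iterator of counts.
  def countPartition(it: Iterator[(Double, Double)]): Iterator[Counts] = {
    val m = it.foldLeft(Map.empty[(Double, Double), Long]) {
      case (acc, (label, feature)) =>
        val k = (feature, label)
        acc.updated(k, acc.getOrElse(k, 0L) + 1L)
    }
    Iterator(m)  // one aggregated record for the whole partition
  }

  // Combine step, playing the role of a reduce over partition results.
  def merge(a: Counts, b: Counts): Counts =
    (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))).toMap

  def main(args: Array[String]): Unit = {
    val part1 = Iterator((1.0, 0.0), (1.0, 0.0), (0.0, 1.0))
    val part2 = Iterator((1.0, 0.0))
    val total = Seq(part1, part2).flatMap(countPartition).reduce(merge)
    println(total((0.0, 1.0)))  // 3: feature 0.0 with label 1.0 seen 3 times
  }
}
```

The payoff is that the shuffle carries one contingency-count map per partition rather than one record per data point.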
Nice!
-
Feel the sparking Spark!
--
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
---
Below is the discussion between Imran and me.
2015-01-18 4:12 GMT+08:00 Chunnan Yao :
> Thank you for your patience! I'm not yet familiar with the mailing list.
> I just clicked "reply" in Gmail, thinking it would be automatically
> attached to the list. I will la
I followed the procedure described at
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-IntelliJ.
But problems still occur, which has made me a bit annoyed.
My environment settings are: Java 1.7.0, Scala 2.10.4, Spark 1.2.0, IntelliJ
IDEA 14.0.2.