Dear Sujit,
Since you are experienced with Spark, I wonder whether you would be willing to
comment on my dilemma in using Spark for an R-based application...
Thank you very much!
Zhiliang
On Tuesday, September 22, 2015 1:45 AM, Zhiliang Zhu wrote:
Hi Romi,
I sincerely appreciate your kind and helpful replies.
One more question: I am currently using Spark for financial data analysis, which
constantly calls operations on R data.frame/matrix objects and R's
stat/regression functions. However, SparkR is currently not that capable; most
of its functions come from Spark SQL and MLlib. Spark SQL and DataFrames are not
as flexible and easy to use as R's operations on data.frame/matrix, and I have
not yet determined how many MLlib functions could replace R's specific
stat/regression routines.
I have also considered operating on the data only through the Spark Java API,
but it is quite hard to work with data the way R does with data.frame/matrix. I
think I have taken on considerable risk with this approach.
Would you comment on my points?
Thank you very much!
Zhiliang
On Tuesday, September 22, 2015 1:21 AM, Sujit Pal wrote:
Hi Zhiliang,
I haven't used the Java API, but I found this Javadoc page, which may be helpful:
https://spark.apache.org/docs/1.3.1/api/java/org/apache/spark/mllib/rdd/RDDFunctions.html
I think the equivalent Java code snippet might go something like this:
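// assumes rdd1 is a JavaRDD<Integer> and scala.reflect.ClassTag$ is imported
RDDFunctions.fromRDD(rdd1.rdd(), ClassTag$.MODULE$.apply(Integer.class)).sliding(2)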
(the second parameter of fromRDD comes from this discussion thread:
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-construct-a-ClassTag-object-as-a-method-parameter-in-Java-td6768.html)
There is also the SlidingRDD decorator:
https://spark.apache.org/docs/1.3.1/api/java/org/apache/spark/mllib/rdd/SlidingRDD.html
So maybe something like this:
new SlidingRDD<Integer>(rdd1.rdd(), 2, ClassTag$.MODULE$.apply(Integer.class))
-sujit
On Mon, Sep 21, 2015 at 9:16 AM, Zhiliang Zhu wrote:
Hi Sujit,
Thank you very much for your kind help.
It seems to work. However, do you know how to achieve the same thing with the
Spark Java API? Is there a Java API equivalent to Scala's sliding? I could not
find sliding in Spark's Scala docs either...
Thank you very much!
Zhiliang
On Monday, September 21, 2015 11:48 PM, Sujit Pal wrote:
Hi Zhiliang,
Would something like this work?
import org.apache.spark.mllib.rdd.RDDFunctions._ // adds sliding() to RDDs
val rdd2 = rdd1.sliding(2).map(v => v(1) - v(0))
-sujit
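For readers less familiar with sliding: as far as I understand the RDDFunctions Javadoc linked earlier in this thread, sliding(2) emits every pair of consecutive elements, ordered first by partition index and then by position within each partition, so rdd2 ends up one element shorter than rdd1. The plain-Java sketch below involves no Spark at all (the class and method names are made up for illustration); it only mimics the window-then-subtract logic on a local list, which can be handy for checking expected results before running on a cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SlidingDiff {
    // Mimics sliding(2): every window of two consecutive elements, in order.
    static List<int[]> sliding2(List<Integer> xs) {
        List<int[]> windows = new ArrayList<>();
        for (int i = 1; i < xs.size(); i++) {
            windows.add(new int[] {xs.get(i - 1), xs.get(i)});
        }
        return windows;
    }

    // Mimics .map(v => v(1) - v(0)): later element minus earlier element.
    static List<Integer> diffs(List<Integer> xs) {
        List<Integer> out = new ArrayList<>();
        for (int[] v : sliding2(xs)) {
            out.add(v[1] - v[0]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rdd1 = Arrays.asList(3, 7, 12, 10);
        System.out.println(diffs(rdd1)); // prints [4, 5, -2]
    }
}
```

A list with fewer than two elements yields no windows and therefore an empty result.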
On Mon, Sep 21, 2015 at 7:58 AM, Zhiliang Zhu wrote:
Hi Romi,
Thank you very much for your helpful comment.
In fact, there is a real background to this application: it comes from an R
data analysis....
# fund_nav_daily is an M x N (or M x 1) matrix or data.frame;
# each column holds one fund's daily values, each row is a date.
# fund_return_daily is each fund's value on a date minus its value on the
# previous date (on the log scale)
fund_return_daily <- diff(log(fund_nav_daily))
# the first row is all 0, since there is no row before the first one
fund_return_daily <- rbind(matrix(0, ncol = ncol(fund_return_daily)),
                           fund_return_daily) ...
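A minimal local sketch of what these two R lines compute, written in plain Java without Spark and assuming the data fits in a double[][] with one row per date and one column per fund (the class and method names are hypothetical): row 0 is all zeros, and every later row holds log(value on this date) - log(value on the previous date) per column, which is what diff(log(...)) followed by rbind of a zero row produces.

```java
import java.util.Arrays;

public class LogReturns {
    // nav[t][j] is fund j's value on date t (rows = dates, columns = funds),
    // mirroring the M x N layout of fund_nav_daily.
    static double[][] logDiffWithZeroRow(double[][] nav) {
        int m = nav.length;
        int n = (m == 0) ? 0 : nav[0].length;
        // Java zero-initialises the array, so row 0 stays all 0 (the rbind step).
        double[][] out = new double[m][n];
        for (int t = 1; t < m; t++) {
            for (int j = 0; j < n; j++) {
                // diff(log(x)): log of this date's value minus log of the previous date's.
                out[t][j] = Math.log(nav[t][j]) - Math.log(nav[t - 1][j]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] nav = { {1.0, 2.0}, {1.1, 2.0}, {1.21, 1.8} };
        for (double[] row : logDiffWithZeroRow(nav)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The output keeps all M rows, matching the R result after the rbind.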
I need to reproduce this R program exactly in Spark, using RDDs/DataFrames in
place of R data.frames. However, I have found it VERY difficult to write Spark
programs that describe and transform R-based applications flexibly. I think I
have put myself at serious risk with this...
Could you give me some direction on the coding issue above, and on the risks of
using Spark for R applications?
My sincere thanks for your kind help.
P.S. SparkR in Spark 1.4.1 currently has many bugs in the
createDataFrame/except/unionAll APIs, and the Spark Java API seems to offer more
functions than SparkR. Also, SparkR does not include any R-specific regression
algorithms.
Best Regards,
Zhiliang
On Monday, September 21, 2015 7:36 PM, Romi Kuntsman wrote:
An RDD is a set of data rows (in your case, numbers); the order of the items has
no meaning.
What exactly are you trying to accomplish?
Romi Kuntsman, Big Data Engineer
http://www.totango.com
On Mon, Sep 21, 2015 at 2:29 PM, Zhiliang Zhu wrote:
Dear ,
I have spent many days thinking about this issue, without any success... I would
appreciate any help you can give.
I have an RDD<int> rdd1, and I would like to get a new RDD<int> rdd2 in which
each row is rdd2[i] = rdd1[i] - rdd1[i - 1]. Which API or function should I use?
Thanks very much!
John
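One commonly used alternative to sliding is to attach an explicit index to each element and join every element with its predecessor. In Spark's Java API this would go through JavaRDD.zipWithIndex() (indices follow partition order, then order within each partition), a mapToPair that shifts the index by one, and JavaPairRDD.join. The sketch below only emulates those steps with plain Java collections so that it runs without a cluster; the class and method names are illustrative, and a real Spark version would still need a sortByKey if the output order matters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class IndexedDiff {
    static List<Integer> diffByIndex(List<Integer> rdd1) {
        // Step 1 (like zipWithIndex): key each element by its position i.
        Map<Long, Integer> byIndex = new HashMap<>();
        for (int i = 0; i < rdd1.size(); i++) {
            byIndex.put((long) i, rdd1.get(i));
        }
        // Step 2 (like a mapToPair that shifts the key): element i-1 is re-keyed to i.
        Map<Long, Integer> previous = new HashMap<>();
        for (Map.Entry<Long, Integer> e : byIndex.entrySet()) {
            previous.put(e.getKey() + 1, e.getValue());
        }
        // Step 3 (like join, then sortByKey): keep keys present in both maps.
        // Index 0 has no predecessor, so it drops out of the inner join.
        TreeMap<Long, Integer> joined = new TreeMap<>();
        for (Map.Entry<Long, Integer> e : byIndex.entrySet()) {
            Integer prev = previous.get(e.getKey());
            if (prev != null) {
                joined.put(e.getKey(), e.getValue() - prev); // rdd1[i] - rdd1[i - 1]
            }
        }
        return new ArrayList<>(joined.values());
    }

    public static void main(String[] args) {
        List<Integer> rdd1 = Arrays.asList(3, 7, 12, 10);
        System.out.println(diffByIndex(rdd1)); // prints [4, 5, -2]
    }
}
```

The result matches the sliding(2) version; the join-based route costs a shuffle on a cluster but does not depend on the MLlib helper.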