I tried to run the tests in 'GeneralizedLinearRegressionSuite', and all tests
passed except for test("read/write"), which yielded the following error message.
Any suggestions on why this happened and how to fix it? Thanks. BTW, I ran the
test in IntelliJ.
The default jsonEncode only supports
Thanks for the advice. I figured out a way to solve this problem by avoiding
the matrix representation.
Wayne
From: Sean Owen <so...@cloudera.com>
Sent: Thursday, December 29, 2016 1:52 PM
To: Yanwei Wayne Zhang; user
Subject: Re: Invert large matrix
I
Hi all,
I have a matrix X stored as RDD[SparseVector] that is high-dimensional, say 800
million rows and 2 million columns, and more than 95% of the entries are zero.
Is there a way to invert (X'X + eye) efficiently, where X' is the transpose of
X and eye is the identity matrix? I am thinking of
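One possible way to sidestep the explicit inverse, sketched here only as an illustration and not as the approach anyone in this thread actually used: if what is ultimately needed is (X'X + eye)^{-1} b for some vector b, conjugate gradient can solve (X'X + eye) v = b using only products with the sparse rows of X, so the 2 million by 2 million matrix is never formed. The sketch below is plain Python on in-memory rows; the names SparseRow, apply_operator and conjugate_gradient are hypothetical, and in Spark the loop over rows would presumably become a distributed aggregation over the RDD.

```python
from typing import Dict, List

# Hypothetical sparse row representation: column index -> nonzero value.
SparseRow = Dict[int, float]


def apply_operator(rows: List[SparseRow], v: List[float]) -> List[float]:
    """Return (X'X + I) v without ever forming X'X."""
    out = list(v)  # the identity term contributes v itself
    for row in rows:
        # (x_i . v) is a scalar; each row adds (x_i . v) * x_i to X'X v.
        dot = sum(val * v[j] for j, val in row.items())
        for j, val in row.items():
            out[j] += dot * val
    return out


def conjugate_gradient(rows: List[SparseRow], b: List[float],
                       tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Solve (X'X + I) v = b; the operator is symmetric positive definite."""
    v = [0.0] * len(b)
    r = list(b)  # residual b - A v for the starting guess v = 0
    p = list(r)  # first search direction
    rs_old = sum(x * x for x in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break  # residual norm is small enough
        ap = apply_operator(rows, p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        v = [vi + alpha * pi for vi, pi in zip(v, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(x * x for x in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return v
```

Each iteration costs one pass over the nonzero entries of X plus a few dense vectors of length equal to the number of columns, which is why this kind of matrix-free solve is often considered when the inverse itself would be too large to store.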
I would like to use some matrix operations in the BLAS object defined in
ml.linalg. But for some reason, spark shell complains it cannot locate this
object. I have constructed an example below to illustrate the issue. Please
advise how to fix this. Thanks.
import
Is it possible to retrieve a specific partition (e.g., the first partition) of
a DataFrame and apply some function there? My data is too large, and I just
want to get some approximate measures using the first few partitions in the
data. I'll illustrate what I want to accomplish using the
anybody shed some light for me?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/spark-log-field-clarification-tp22892p22904.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
I am trying to extract the *output data size* information for *each task*.
What *field(s)* should I look for, given the json-format log?
Also, what does Result Size stand for?
Thanks a lot in advance!
-Yanwei
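
For what it is worth, here is a minimal sketch of how the per-task output size could be pulled out of a JSON event log. It assumes the log has one JSON object per line, that task completions appear as "SparkListenerTaskEnd" events, and that the bytes written sit under "Task Metrics" -> "Output Metrics" -> "Bytes Written"; those field names should be checked against the log produced by the Spark version in use. The function name output_bytes_per_task is hypothetical. As far as I understand, "Result Size" is the size in bytes of the serialized task result sent back to the driver, not the size of data written to storage.

```python
import json
from typing import Dict, Iterable


def output_bytes_per_task(lines: Iterable[str]) -> Dict[int, int]:
    """Map task ID -> bytes written, for tasks that report output metrics."""
    sizes: Dict[int, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # Only task-end events carry the final task metrics (assumed name).
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        metrics = event.get("Task Metrics") or {}
        output = metrics.get("Output Metrics")
        if output is None:
            continue  # this task reported no output metrics
        task_id = event["Task Info"]["Task ID"]
        sizes[task_id] = output["Bytes Written"]
    return sizes
```

A typical call would pass an open file handle for the event log, for example output_bytes_per_task(open(path)) with path pointing at the log file.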