Github user ghoto commented on a diff in the pull request:
https://github.com/apache/spark/pull/21086#discussion_r188701408
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
---
@@ -351,12 +338,26 @@ class
Github user ghoto commented on a diff in the pull request:
https://github.com/apache/spark/pull/21086#discussion_r188473831
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
---
@@ -351,12 +338,26 @@ class
Github user ghoto commented on the issue:
https://github.com/apache/spark/pull/21086
I'm hitting this issue after upgrading from 2.0.2 to 2.3.0. Please backport
this PR to Spark 2.3.0
---
Github user ghoto commented on a diff in the pull request:
https://github.com/apache/spark/pull/17940#discussion_r117619161
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -992,7 +992,16 @@ object Matrices {
new DenseMatrix(dm.rows
Github user ghoto commented on a diff in the pull request:
https://github.com/apache/spark/pull/17940#discussion_r117535893
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -992,7 +992,24 @@ object Matrices {
new DenseMatrix(dm.rows
Github user ghoto commented on a diff in the pull request:
https://github.com/apache/spark/pull/17940#discussion_r116155652
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
---
@@ -992,7 +992,20 @@ object Matrices {
new DenseMatrix(dm.rows
Github user ghoto commented on the issue:
https://github.com/apache/spark/pull/17907
I think I need some time to run benchmarks. Originally the driver was set to
3 GB, but since I was hitting this OutOfMemory error in the driver I decided to
give it a try and increase the size.
For
Github user ghoto commented on the issue:
https://github.com/apache/spark/pull/17940
Need to fix a line in the test because it's too long.
---
Github user ghoto commented on the issue:
https://github.com/apache/spark/pull/17940
Sorry about that. I added more context in the description and updated the
title.
---
Github user ghoto commented on the issue:
https://github.com/apache/spark/pull/17907
With classic Spark PCA, an approx. 55Kx15K matrix, and 10 GB in the driver I run
out of memory. I chopped the matrix down to 55Kx3K and then I can get the PCA.
With the distributed SVD approach I could compute PCA
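A rough back-of-envelope (my own illustration, not taken from the PR) of why the classic PCA path can exhaust the driver: it materializes an n x n covariance matrix locally and runs a local SVD on it, so for the 15K-column case above even the single covariance copy is large. The factor-of-4 workspace multiplier below is an assumption, not a measured figure.

```python
# Illustrative driver-memory estimate for classic (non-distributed) PCA.
# Assumptions: the driver holds one dense n x n covariance matrix of
# 8-byte doubles, and a local SVD needs a few extra copies (U, Vt, work).
def dense_matrix_bytes(rows: int, cols: int) -> int:
    return rows * cols * 8  # one double = 8 bytes

n_cols = 15_000                      # 55K x 15K input: covariance is 15K x 15K
covariance = dense_matrix_bytes(n_cols, n_cols)
with_svd_copies = covariance * 4     # + U, Vt, workspace (rough assumption)

print(f"covariance alone: {covariance / 1e9:.1f} GB")        # 1.8 GB
print(f"with local SVD copies: ~{with_svd_copies / 1e9:.1f} GB")  # ~7.2 GB
```

At ~7 GB of transient allocations plus normal driver overhead, an OOM with a 10 GB driver is plausible, which is consistent with the 55Kx3K matrix (covariance only ~72 MB) succeeding.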
GitHub user ghoto opened a pull request:
https://github.com/apache/spark/pull/17940
Bug fix/spark 20687
## What changes were proposed in this pull request?
Bugfix for https://issues.apache.org/jira/browse/SPARK-20687
Before converting a CSCMatrix to a Matrix, the
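The SPARK-20687 report concerns converting a Breeze CSCMatrix into a Spark Matrix. A hedged sketch of the underlying CSC pitfall (my own minimal illustration, not the actual Spark or Breeze code): a sparse matrix's backing arrays can be over-allocated relative to the number of active entries, so a converter that copies the raw arrays verbatim must first truncate ("compact") them.

```python
# Illustration of compacting over-allocated CSC (compressed sparse column)
# buffers before handing them to another library. The helper name and the
# plain-list representation are hypothetical, for demonstration only.
def compact_csc(col_ptrs, row_indices, data):
    """Trim CSC arrays down to the active (actually stored) entries."""
    active = col_ptrs[-1]  # total number of stored entries
    return col_ptrs, row_indices[:active], data[:active]

# 2x2 matrix [[1, 0], [0, 2]] stored with over-allocated (capacity-4) buffers;
# the last two slots of row_indices/data are unused capacity, not entries.
col_ptrs = [0, 1, 2]
row_indices = [0, 1, 0, 0]
data = [1.0, 2.0, 0.0, 0.0]

_, rows, vals = compact_csc(col_ptrs, row_indices, data)
print(rows, vals)  # [0, 1] [1.0, 2.0]
```

Copying the raw length-4 arrays without compacting would describe a matrix with phantom entries, which is the kind of inconsistency a conversion routine has to guard against.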
Github user ghoto commented on the issue:
https://github.com/apache/spark/pull/17907
My understanding is that RowMatrix computes the SVD locally when the data is
small enough, to improve performance, and distributes it otherwise. So the
suggested implementation does NOT always rely on a
Github user ghoto commented on a diff in the pull request:
https://github.com/apache/spark/pull/17907#discussion_r115502692
--- Diff:
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
---
@@ -384,19 +384,23 @@ class RowMatrix @Since("
GitHub user ghoto opened a pull request:
https://github.com/apache/spark/pull/17907
SPARK-7856 Principal components and variance using computeSVD()
## What changes were proposed in this pull request?
The current implementation of