Github user uzadude commented on a diff in the pull request:
https://github.com/apache/spark/pull/23042#discussion_r234431689
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
---
@@ -138,6 +138,11 @@ object TypeCoercion
Github user uzadude commented on a diff in the pull request:
https://github.com/apache/spark/pull/23042#discussion_r233972629
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
---
@@ -138,6 +138,11 @@ object TypeCoercion
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/23042
@cloud-fan - could you please take a look?
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional
GitHub user uzadude opened a pull request:
https://github.com/apache/spark/pull/23042
[SPARK-26070] add rule for implicit type coercion for decimal(x,0)
## What changes were proposed in this pull request?
Adding another exception rule for a popular case where ID columns
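The PR description above is truncated, but the general idea behind a coercion rule for decimal(x, 0) columns can be sketched in plain Python (this is an illustrative stand-in, not Catalyst's actual `TypeCoercion` code; the type encoding and precision widths are simplified assumptions): when a scale-0 decimal is compared with an integral type, both can be widened to an exact wider decimal instead of falling back to a lossy double comparison.

```python
# Illustrative sketch only (plain Python, not Catalyst's TypeCoercion).
# Types are modeled as either a (precision, scale) tuple for decimals
# or the strings "int" / "long" for integral types.

def as_exact_decimal(t):
    """Model a type as (precision, scale) if it is exactly representable."""
    if isinstance(t, tuple):          # already a decimal
        return t
    if t == "int":                    # a 32-bit int fits in decimal(10, 0)
        return (10, 0)
    if t == "long":                   # a 64-bit long fits in decimal(20, 0)
        return (20, 0)
    return None

def common_type(a, b):
    """Pick a common comparison type for two column types."""
    da, db = as_exact_decimal(a), as_exact_decimal(b)
    if da and db and da[1] == 0 and db[1] == 0:
        # both sides are scale-0: compare as a wider exact decimal
        return (max(da[0], db[0]), 0)
    return "double"                   # lossy fallback, as without the rule
```

For example, comparing a `decimal(18, 0)` ID column with a `long` would widen both to `decimal(20, 0)` rather than `double`, so large IDs keep their exact values.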
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/16732
After running some more experiments I was able to reduce the runtime by
another 1.5x factor. So currently the
"toCoordinateMatrix().toIndexedRowMatrix()" is better by a bit only in th
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/22964
this is the original query. we can see the explode followed by the shuffle:
```
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
val N = 1
```
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/22964
The whole idea is that we'll get one shuffle and it will be before the
explode as the window's partition is contained in the repartition.
I'll show the physical plan
GitHub user uzadude opened a pull request:
https://github.com/apache/spark/pull/22964
[SPARK-25963] Optimize generate followed by window
## What changes were proposed in this pull request?
I've added an optimizer rule to add Repartition operator when we have a
Generate operator
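The core condition behind this rule, as described in the comments above, is that the window's partition keys are a subset of the columns available before the explode, so one repartition placed below the Generate satisfies the window and only a single shuffle (on the smaller, pre-explode data) is needed. A loose plain-Python sketch of that rewrite, using a hypothetical mini plan representation rather than Catalyst's:

```python
# Illustrative sketch only (plain Python, not the actual Catalyst rule).
# Plans are nested dicts: Window(keys) -> Generate -> Leaf(cols).

def optimize(plan):
    """Push a repartition on the window keys below the Generate node
    when those keys exist before the explode."""
    if (plan.get("op") == "window"
            and plan["child"].get("op") == "generate"
            and set(plan["keys"]) <= set(plan["child"]["child"]["cols"])):
        leaf = plan["child"]["child"]
        repart = {"op": "repartition", "keys": plan["keys"], "child": leaf}
        return {"op": "window", "keys": plan["keys"],
                "child": {"op": "generate", "child": repart}}
    return plan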
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/15599
> @uzadude sir I am getting same error in spark-sql 2.3.1 version
I don't think you meant to approach
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/19683
this timeout in
"org.apache.spark.ml.regression.LinearRegressionSuite.linear regression with
intercept without regularization" doesn't seem related
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/19683
retest this please
Github user uzadude commented on a diff in the pull request:
https://github.com/apache/spark/pull/19683#discussion_r159033662
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala ---
@@ -85,11 +84,19 @@ case class GenerateExec(
val
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/19683
seems reasonable, let's do that.
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/19683
not sure why this build fails; it passes when I run it locally.
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/19683
retest this please.
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/19683
looks like when the Optimizer is running, at some temporary point it makes
a copy of the Generate operator and fails on this cast. If we move the
resolved check inside:
override lazy val
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/19683
Have you seen the test failures? It looks like they fail on the assertion.
Github user uzadude commented on a diff in the pull request:
https://github.com/apache/spark/pull/19683#discussion_r158906785
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
---
@@ -73,8 +73,10 @@ case class Project
Github user uzadude commented on a diff in the pull request:
https://github.com/apache/spark/pull/19683#discussion_r158906660
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala ---
@@ -359,12 +359,12 @@ package object dsl {
def
Github user uzadude commented on a diff in the pull request:
https://github.com/apache/spark/pull/19683#discussion_r158854831
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
---
@@ -83,15 +85,17 @@ case class Project
Github user uzadude commented on a diff in the pull request:
https://github.com/apache/spark/pull/19683#discussion_r158854751
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -444,12 +444,22 @@ object ColumnPruning extends Rule
Github user uzadude commented on a diff in the pull request:
https://github.com/apache/spark/pull/19683#discussion_r158814175
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
---
@@ -73,8 +73,10 @@ case class Project
Github user uzadude commented on a diff in the pull request:
https://github.com/apache/spark/pull/19683#discussion_r158737578
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -846,12 +846,13 @@ class Analyzer
Github user uzadude commented on a diff in the pull request:
https://github.com/apache/spark/pull/19683#discussion_r158737560
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -695,7 +695,7 @@ class Analyzer
Github user uzadude commented on a diff in the pull request:
https://github.com/apache/spark/pull/19683#discussion_r158552882
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -451,6 +451,11 @@ object ColumnPruning extends Rule
Github user uzadude commented on a diff in the pull request:
https://github.com/apache/spark/pull/19683#discussion_r158488942
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -451,6 +451,11 @@ object ColumnPruning extends Rule
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/19683
@henryr I understand what you're saying. I'm not sure why there is the
UnsafeProject at the end of the function, but it is mentioned without much
elaboration in the PR that fixes [SPARK-13476]
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/19683
could you please retest this?
Github user uzadude commented on a diff in the pull request:
https://github.com/apache/spark/pull/19683#discussion_r153998263
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -450,6 +450,11 @@ object ColumnPruning extends Rule
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/19683
Hi, has anybody had a chance to look at this PR?
I think it's a pretty useful optimization.
Github user uzadude commented on a diff in the pull request:
https://github.com/apache/spark/pull/19683#discussion_r151868042
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala ---
@@ -59,15 +61,23 @@ case class GenerateExec(
generator
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/19683
Do you understand this failure?
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/19683
I've fixed more styling issues.
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/19683
I've fixed the styling issue.
GitHub user uzadude opened a pull request:
https://github.com/apache/spark/pull/19683
[SPARK-21657][SQL] optimize explode quadratic memory consumption
## What changes were proposed in this pull request?
The issue has been raised in two Jira tickets:
[SPARK-21657](https
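The memory behavior targeted by this PR can be sketched loosely in plain Python (Spark's `GenerateExec` operates on rows and generated code, not Python lists, so this is only an analogy): eagerly pairing a wide parent row with every array element materializes all output rows at once, while a generator yields them one at a time, so peak memory no longer grows with the array length times the row width.

```python
# Loose plain-Python analogy of the explode memory issue, not Spark's code.

def explode_eager(wide_row, arr):
    # materializes all len(arr) output rows up front
    return [(wide_row, e) for e in arr]

def explode_lazy(wide_row, arr):
    # yields output rows one at a time, holding only one in flight
    for e in arr:
        yield (wide_row, e)
```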
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/16732
not sure I understand, matSparse has 1/10th nnz as needed for line 282:
> if (numberNonZeroPerRow <= 0.1) { // Sparse at 1/10th nnz
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/16732
Hi,
I removed the .asBreeze call for the sparse case.
about the performance issue for this sample code:
```scala
val n = 2
val rndEntryList
```
GitHub user uzadude opened a pull request:
https://github.com/apache/spark/pull/16732
[SPARK-19368][MLlib] BlockMatrix.toIndexedRowMatrix() optimization for
sparse matrices
## What changes were proposed in this pull request?
Optimization [SPARK-12869] was made for dense
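The idea of the sparse path can be sketched in plain Python (an illustrative stand-in, not the MLlib implementation): convert sparse blocks to indexed rows by emitting only the nonzero entries keyed by global row index, instead of densifying every block first as the dense-oriented path does.

```python
# Illustrative sketch only (plain Python, not MLlib's BlockMatrix code).
# A block is ((block_row, block_col), [(local_row, local_col, value), ...]).

def to_indexed_rows(blocks, rows_per_block, cols_per_block):
    """Group nonzero entries by global row, keeping them sparse."""
    rows = {}
    for (bi, bj), entries in blocks:
        for i, j, v in entries:
            g_row = bi * rows_per_block + i      # global row index
            g_col = bj * cols_per_block + j      # global column index
            rows.setdefault(g_row, []).append((g_col, v))
    return {r: sorted(cols) for r, cols in rows.items()}
```

Memory stays proportional to the number of nonzeros, which is the point of avoiding the dense conversion for very sparse matrices.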
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/14565
I also opened a Jira feature request: [SPARK-16976]
GitHub user uzadude opened a pull request:
https://github.com/apache/spark/pull/14565
KCore algorithm
## What changes were proposed in this pull request?
Added [KCore]
(https://en.wikipedia.org/wiki/Degeneracy_(graph_theory)#k-Cores) algorithm
implementation
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/14068
Sure, what size are you talking about? We had to make some internal code fixes
to do that. Right now the support for sparse matrices is pretty poor, mainly
because Breeze doesn't support it.
I'm
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/14068
Have done the requested nit changes.
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/14068
I have opened SPARK-16469.
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/14068
Sure.
The current method for multiplying distributed block matrices starts by
deciding which block should be shuffled to which partition to do the actual
multiplications. This stage
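That "simulate multiply" stage can be sketched in plain Python (illustrative only, not the MLlib implementation): before any data moves, compute for each block of A which result-block columns it actually contributes to. With sparse matrices many blocks of B are simply absent, so many blocks of A need not be shuffled at all.

```python
# Illustrative sketch of the simulate-multiply idea, not MLlib's code.

def destinations_of_a(a_blocks, b_blocks):
    """a_blocks: set of (row_block, inner_block) present in A;
    b_blocks: set of (inner_block, col_block) present in B.
    Returns, per block of A, the result columns it feeds."""
    return {
        # A(i, p) contributes to C(i, j) only if B(p, j) actually exists
        (i, p): {j for (q, j) in b_blocks if q == p}
        for (i, p) in a_blocks
    }
```

A block of A whose destination set comes back empty can be skipped entirely, which is where the savings come from for very sparse inputs.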
Github user uzadude commented on the issue:
https://github.com/apache/spark/pull/14068
Hi srowen,
I have read the "how to contribute" wiki. I thought it was too small an
enhancement to open a JIRA for, and it passes the tests.
GitHub user uzadude opened a pull request:
https://github.com/apache/spark/pull/14068
enhanced simulate multiply
## What changes were proposed in this pull request?
We have a use case of multiplying very big sparse matrices. We have about
1000x1000 distributed block