[GitHub] spark pull request #23042: [SPARK-26070][SQL] add rule for implicit type coe...

2018-11-17 Thread uzadude
Github user uzadude commented on a diff in the pull request: https://github.com/apache/spark/pull/23042#discussion_r234431689 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -138,6 +138,11 @@ object TypeCoercion

[GitHub] spark pull request #23042: [SPARK-26070][SQL] add rule for implicit type coe...

2018-11-15 Thread uzadude
Github user uzadude commented on a diff in the pull request: https://github.com/apache/spark/pull/23042#discussion_r233972629 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -138,6 +138,11 @@ object TypeCoercion

[GitHub] spark issue #23042: [SPARK-26070] add rule for implicit type coercion for de...

2018-11-15 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/23042 @cloud-fan - could you please take a look? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #23042: [SPARK-26070] add rule for implicit type coercion...

2018-11-15 Thread uzadude
GitHub user uzadude opened a pull request: https://github.com/apache/spark/pull/23042 [SPARK-26070] add rule for implicit type coercion for decimal(x,0) ## What changes were proposed in this pull request? Adding another exception rule for a popular case where ID columns

[GitHub] spark issue #16732: [SPARK-19368][MLlib] BlockMatrix.toIndexedRowMatrix() op...

2018-11-14 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/16732 After running some more experiments I was able to reduce the runtime by another 1.5x factor. So currently the "toCoordinateMatrix().toIndexedRowMatrix()" is better by a bit only in th

[GitHub] spark issue #22964: [SPARK-25963] Optimize generate followed by window

2018-11-07 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/22964 this is the original query. we can see the explode followed by the shuffle: ``` import org.apache.spark.sql.functions._ import org.apache.spark.sql.expressions._ val N = 1

[GitHub] spark issue #22964: [SPARK-25963] Optimize generate followed by window

2018-11-07 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/22964 The whole idea is that we'll get one shuffle and it will be before the explode as the window's partition is contained in the repartition. I'll show the physical plan

[GitHub] spark pull request #22964: [SPARK-25963] Optimize generate followed by windo...

2018-11-07 Thread uzadude
GitHub user uzadude opened a pull request: https://github.com/apache/spark/pull/22964 [SPARK-25963] Optimize generate followed by window ## What changes were proposed in this pull request? I've added an optimizer rule to add Repartition operator when we have a Generate operator

[GitHub] spark issue #15599: [SPARK-18022][SQL] java.lang.NullPointerException instea...

2018-10-31 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/15599 > @uzadude sir I am getting same error in spark-sql 2.3.1 version I don't think you meant to approach

[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-12-29 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/19683 this timeout in "org.apache.spark.ml.regression.LinearRegressionSuite.linear regression with intercept without regularization" doesn't seem related

[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-12-29 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/19683 retest this please

[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...

2017-12-28 Thread uzadude
Github user uzadude commented on a diff in the pull request: https://github.com/apache/spark/pull/19683#discussion_r159033662 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala --- @@ -85,11 +84,19 @@ case class GenerateExec( val

[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-12-28 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/19683 seems reasonable, let's do that.

[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-12-28 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/19683 not sure why this build fails. it passes when I run it locally.

[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-12-28 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/19683 retest this please.

[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-12-28 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/19683 looks like when the Optimizer is running, at some intermediate point it makes a copy of the Generate operator and fails on this cast. if we move the resolve check inside: override lazy val

[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-12-28 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/19683 have you seen the test failures? it looks like they fail on the assertion.

[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...

2017-12-27 Thread uzadude
Github user uzadude commented on a diff in the pull request: https://github.com/apache/spark/pull/19683#discussion_r158906785 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -73,8 +73,10 @@ case class Project

[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...

2017-12-27 Thread uzadude
Github user uzadude commented on a diff in the pull request: https://github.com/apache/spark/pull/19683#discussion_r158906660 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala --- @@ -359,12 +359,12 @@ package object dsl { def

[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...

2017-12-27 Thread uzadude
Github user uzadude commented on a diff in the pull request: https://github.com/apache/spark/pull/19683#discussion_r158854831 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -83,15 +85,17 @@ case class Project

[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...

2017-12-27 Thread uzadude
Github user uzadude commented on a diff in the pull request: https://github.com/apache/spark/pull/19683#discussion_r158854751 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -444,12 +444,22 @@ object ColumnPruning extends Rule

[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...

2017-12-27 Thread uzadude
Github user uzadude commented on a diff in the pull request: https://github.com/apache/spark/pull/19683#discussion_r158814175 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -73,8 +73,10 @@ case class Project

[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...

2017-12-26 Thread uzadude
Github user uzadude commented on a diff in the pull request: https://github.com/apache/spark/pull/19683#discussion_r158737578 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -846,12 +846,13 @@ class Analyzer

[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...

2017-12-26 Thread uzadude
Github user uzadude commented on a diff in the pull request: https://github.com/apache/spark/pull/19683#discussion_r158737560 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -695,7 +695,7 @@ class Analyzer

[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...

2017-12-22 Thread uzadude
Github user uzadude commented on a diff in the pull request: https://github.com/apache/spark/pull/19683#discussion_r158552882 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -451,6 +451,11 @@ object ColumnPruning extends Rule

[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...

2017-12-22 Thread uzadude
Github user uzadude commented on a diff in the pull request: https://github.com/apache/spark/pull/19683#discussion_r158488942 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -451,6 +451,11 @@ object ColumnPruning extends Rule

[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-12-07 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/19683 @henryr I understand what you're saying. I'm not sure why there is the UnsafeProject at the end of the function, but it's commented in this PR that fixes [SPARK-13476] without much elaboration

[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-12-07 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/19683 could you please retest this?

[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...

2017-11-29 Thread uzadude
Github user uzadude commented on a diff in the pull request: https://github.com/apache/spark/pull/19683#discussion_r153998263 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -450,6 +450,11 @@ object ColumnPruning extends Rule

[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-11-28 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/19683 Hi, did somebody have a chance to look at this PR? I think it's a pretty useful optimization.

[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...

2017-11-19 Thread uzadude
Github user uzadude commented on a diff in the pull request: https://github.com/apache/spark/pull/19683#discussion_r151868042 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala --- @@ -59,15 +61,23 @@ case class GenerateExec( generator

[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-11-08 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/19683 Do you understand this failure?

[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-11-07 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/19683 I've fixed more styling issues.

[GitHub] spark issue #19683: [SPARK-21657][SQL] optimize explode quadratic memory con...

2017-11-07 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/19683 I've fixed the styling issue.

[GitHub] spark pull request #19683: [SPARK-21657][SQL] optimize explode quadratic mem...

2017-11-07 Thread uzadude
GitHub user uzadude opened a pull request: https://github.com/apache/spark/pull/19683 [SPARK-21657][SQL] optimize explode quadratic memory consumption ## What changes were proposed in this pull request? The issue has been raised in two Jira tickets: [SPARK-21657](https

[GitHub] spark issue #16732: [SPARK-19368][MLlib] BlockMatrix.toIndexedRowMatrix() op...

2017-01-30 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/16732 not sure I understand, matSparse has 1/10th nnz as needed for line 282: > if (numberNonZeroPerRow <= 0.1) { // Sparse at 1/10th nnz --- If your project is set up for it, you can

[GitHub] spark issue #16732: [SPARK-19368][MLlib] BlockMatrix.toIndexedRowMatrix() op...

2017-01-29 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/16732 Hi, I removed the .asBreeze call for the sparse case. about the performance issue for this sample code: ```scala val n = 2 val rndEntryList

[GitHub] spark pull request #16732: [SPARK-19368][MLlib] BlockMatrix.toIndexedRowMatr...

2017-01-28 Thread uzadude
GitHub user uzadude opened a pull request: https://github.com/apache/spark/pull/16732 [SPARK-19368][MLlib] BlockMatrix.toIndexedRowMatrix() optimization for sparse matrices ## What changes were proposed in this pull request? Optimization [SPARK-12869] was made for dense

[GitHub] spark issue #14565: KCore algorithm

2016-08-09 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/14565 I also opened a Jira feature request [[SPARK-16976]]

[GitHub] spark pull request #14565: KCore algorithm

2016-08-09 Thread uzadude
GitHub user uzadude opened a pull request: https://github.com/apache/spark/pull/14565 KCore algorithm ## What changes were proposed in this pull request? Added [KCore](https://en.wikipedia.org/wiki/Degeneracy_(graph_theory)#k-Cores) algorithm implementation

[GitHub] spark issue #14068: [SPARK-16469] enhanced simulate multiply

2016-07-26 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/14068 Sure, what size are you talking about? we had to make some internal code fixes to do that. right now the support for sparse matrices is pretty poor - mainly because Breeze doesn't support it. I'm

[GitHub] spark issue #14068: [SPARK-16469] enhanced simulate multiply

2016-07-12 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/14068 Have done the requested nit changes.

[GitHub] spark issue #14068: [SPARK-16469] enhanced simulate multiply

2016-07-10 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/14068 I have opened SPARK-16469.

[GitHub] spark issue #14068: enhanced simulate multiply

2016-07-07 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/14068 Sure. The current method for multiplying distributed block matrices starts by deciding which block should be shuffled to which partition to do the actual multiplications. This stage

[GitHub] spark issue #14068: enhanced simulate multiply

2016-07-06 Thread uzadude
Github user uzadude commented on the issue: https://github.com/apache/spark/pull/14068 Hi srowen, I have read the "how to contribute" wiki. I thought that it was too small an enhancement to open a Jira for, and it passes the tests.

[GitHub] spark pull request #14068: enhanced simulate multiply

2016-07-06 Thread uzadude
GitHub user uzadude opened a pull request: https://github.com/apache/spark/pull/14068 enhanced simulate multiply ## What changes were proposed in this pull request? We have a use case of multiplying very big sparse matrices. we have about 1000x1000 distributed block