[GitHub] spark issue #14817: [SPARK-17247][SQL]: when calcualting size of a relation ...

2017-05-23 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14817 Thanks for reporting it! After CBO, the relation size is not only used for deciding whether a table can be broadcasted. Maybe we can close this PR now?

[GitHub] spark issue #14817: [SPARK-17247][SQL]: when calcualting size of a relation ...

2016-09-28 Thread Parth-Brahmbhatt
Github user Parth-Brahmbhatt commented on the issue: https://github.com/apache/spark/pull/14817 @hvanhovell We have tables with 5-6 partition columns and data going back 4-5 years, and given that our data is stored in S3, the listing is paginated. If you want to wait till CBO work

[GitHub] spark issue #14817: [SPARK-17247][SQL]: when calcualting size of a relation ...

2016-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14817 Can one of the admins verify this patch?

[GitHub] spark issue #14817: [SPARK-17247][SQL]: when calcualting size of a relation ...

2016-09-14 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14817 @Parth-Brahmbhatt I am very curious why you have millions of partitions. What is the use case? You will be in a world of hurt as soon as you do any listing. I am not going to merge this

[GitHub] spark issue #14817: [SPARK-17247][SQL]: when calcualting size of a relation ...

2016-09-14 Thread Parth-Brahmbhatt
Github user Parth-Brahmbhatt commented on the issue: https://github.com/apache/spark/pull/14817 Request for review one more time.

[GitHub] spark issue #14817: [SPARK-17247][SQL]: when calcualting size of a relation ...

2016-09-08 Thread Parth-Brahmbhatt
Github user Parth-Brahmbhatt commented on the issue: https://github.com/apache/spark/pull/14817 Can someone please review this PR? Thanks.

[GitHub] spark issue #14817: [SPARK-17247][SQL]: when calcualting size of a relation ...

2016-09-01 Thread Parth-Brahmbhatt
Github user Parth-Brahmbhatt commented on the issue: https://github.com/apache/spark/pull/14817 @hvanhovell I looked at `AlterTableRecoverPartitionsCommand`, and while the parallelism in listing could help, it will still cause a huge perf penalty. We have tables with millions of partitions and

[GitHub] spark issue #14817: [SPARK-17247][SQL]: when calcualting size of a relation ...

2016-08-31 Thread Parth-Brahmbhatt
Github user Parth-Brahmbhatt commented on the issue: https://github.com/apache/spark/pull/14817 @hvanhovell I will take a look at it and update this PR.

[GitHub] spark issue #14817: [SPARK-17247][SQL]: when calcualting size of a relation ...

2016-08-31 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14817 @Parth-Brahmbhatt would the approach taken in `AlterTableRecoverPartitionsCommand` help?

[GitHub] spark issue #14817: [SPARK-17247][SQL]: when calcualting size of a relation ...

2016-08-31 Thread Parth-Brahmbhatt
Github user Parth-Brahmbhatt commented on the issue: https://github.com/apache/spark/pull/14817 @hvanhovell It's because of listing, and it gets worse as the amount increases.

[GitHub] spark issue #14817: [SPARK-17247][SQL]: when calcualting size of a relation ...

2016-08-31 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14817 @Parth-Brahmbhatt here is the CBO ticket: https://issues.apache.org/jira/browse/SPARK-16026 Could you explain why this is so slow? Is this because of listing the files? Or because of

[GitHub] spark issue #14817: [SPARK-17247][SQL]: when calcualting size of a relation ...

2016-08-31 Thread Parth-Brahmbhatt
Github user Parth-Brahmbhatt commented on the issue: https://github.com/apache/spark/pull/14817 Can one of the committers take a look at this PR?

[GitHub] spark issue #14817: [SPARK-17247][SQL]: when calcualting size of a relation ...

2016-08-26 Thread Parth-Brahmbhatt
Github user Parth-Brahmbhatt commented on the issue: https://github.com/apache/spark/pull/14817 @hvanhovell can you also point me at the design doc/discuss thread for CBO work? Thanks.

[GitHub] spark issue #14817: [SPARK-17247][SQL]: when calcualting size of a relation ...

2016-08-25 Thread Parth-Brahmbhatt
Github user Parth-Brahmbhatt commented on the issue: https://github.com/apache/spark/pull/14817 @hvanhovell The behavior when fallbackToHdfs is not enabled (and by default it is not enabled, for performance reasons) is to return the value specified via

[GitHub] spark issue #14817: [SPARK-17247][SQL]: when calcualting size of a relation ...

2016-08-25 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14817 @Parth-Brahmbhatt we are currently working on Cost Based Optimization in Spark. An important input will be the actual size of the table. Having partial statistics (what you are suggesting) will not

[GitHub] spark issue #14817: [SPARK-17247][SQL]: when calcualting size of a relation ...

2016-08-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14817 Can one of the admins verify this patch?