[jira] [Created] (SPARK-21127) Update statistics after data changing commands

2017-06-17 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-21127: Summary: Update statistics after data changing commands Key: SPARK-21127 URL: https://issues.apache.org/jira/browse/SPARK-21127 Project: Spark Issue Type: Su

[jira] [Commented] (SPARK-17129) Support statistics collection and cardinality estimation for partitioned tables

2017-06-20 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057030#comment-16057030 ] Zhenhua Wang commented on SPARK-17129: -- Sure, that'll be great~ > Support statistic

[jira] [Issue Comment Deleted] (SPARK-17129) Support statistics collection and cardinality estimation for partitioned tables

2017-06-20 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17129: - Comment: was deleted (was: Sure, that'll be great~) > Support statistics collection and cardinal

[jira] [Commented] (SPARK-17129) Support statistics collection and cardinality estimation for partitioned tables

2017-06-20 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057033#comment-16057033 ] Zhenhua Wang commented on SPARK-17129: -- [~mbasmanova] Sure, that'll be great~ > Sup

[jira] [Created] (SPARK-21180) Remove conf from stats functions since now we have conf in LogicalPlan

2017-06-22 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-21180: Summary: Remove conf from stats functions since now we have conf in LogicalPlan Key: SPARK-21180 URL: https://issues.apache.org/jira/browse/SPARK-21180 Project: Spark

[jira] [Commented] (SPARK-17129) Support statistics collection and cardinality estimation for partitioned tables

2017-06-26 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064062#comment-16064062 ] Zhenhua Wang commented on SPARK-17129: -- [~mbasmanova] Thanks for working on it~ I'll

[jira] [Created] (SPARK-21237) Invalidate stats once table data is changed

2017-06-28 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-21237: Summary: Invalidate stats once table data is changed Key: SPARK-21237 URL: https://issues.apache.org/jira/browse/SPARK-21237 Project: Spark Issue Type: Sub-t

[jira] [Created] (SPARK-21324) Improve statistics test suites

2017-07-05 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-21324: Summary: Improve statistics test suites Key: SPARK-21324 URL: https://issues.apache.org/jira/browse/SPARK-21324 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-21083) Consider staleness when collecting column stats

2017-07-07 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-21083: - Description: 1. When we first analyze without `noscan` and then analyze with `noscan`, the table

[jira] [Updated] (SPARK-21083) Do not remove correct stats when re-analyze

2017-07-07 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-21083: - Summary: Do not remove correct stats when re-analyze (was: Consider staleness when collecting co

[jira] [Updated] (SPARK-21083) Store zero size and row count after analyzing empty table

2017-07-08 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-21083: - Description: (was: 1. When we first analyze without `noscan` and then analyze with `noscan`,

[jira] [Updated] (SPARK-21083) Store zero size and row count after analyzing empty table

2017-07-08 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-21083: - Summary: Store zero size and row count after analyzing empty table (was: Do not remove correct s

[jira] [Comment Edited] (SPARK-17074) generate histogram information for column

2016-10-22 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570366#comment-15570366 ] Zhenhua Wang edited comment on SPARK-17074 at 10/22/16 12:52 PM: --

[jira] [Updated] (SPARK-18000) Aggregation function for computing endpoints for histograms

2016-10-25 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-18000: - Summary: Aggregation function for computing endpoints for histograms (was: Aggregation function

[jira] [Updated] (SPARK-18000) Aggregation function for computing endpoints for histograms

2016-10-25 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-18000: - Description: For a column, we will generate an equi-width or equi-height histogram, depending on

[jira] [Updated] (SPARK-17074) generate histogram information for column

2016-10-25 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17074: - Description: We support two kinds of histograms: - Equi-width histogram: We have a fixed w

[jira] [Commented] (SPARK-18000) Aggregation function for computing endpoints for histograms

2016-10-25 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15607306#comment-15607306 ] Zhenhua Wang commented on SPARK-18000: -- This issue is included in another issue SPAR

[jira] [Issue Comment Deleted] (SPARK-18000) Aggregation function for computing endpoints for histograms

2016-10-25 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-18000: - Comment: was deleted (was: This issue is included in another issue SPARK-17881, so I'll close thi

[jira] [Commented] (SPARK-17881) Aggregation function for generating string histograms

2016-10-25 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15607308#comment-15607308 ] Zhenhua Wang commented on SPARK-17881: -- This issue is included in another issue SPAR

[jira] [Closed] (SPARK-17881) Aggregation function for generating string histograms

2016-10-25 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang closed SPARK-17881. Resolution: Duplicate > Aggregation function for generating string histograms > ---

[jira] [Created] (SPARK-18111) Wrong ApproximatePercentile answer when multiple records have the minimum value

2016-10-25 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-18111: Summary: Wrong ApproximatePercentile answer when multiple records have the minimum value Key: SPARK-18111 URL: https://issues.apache.org/jira/browse/SPARK-18111 Proje

[jira] [Updated] (SPARK-18111) Wrong ApproximatePercentile answer when multiple records have the minimum value

2016-10-25 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-18111: - Description: When multiple records have the minimum value, the answer of ApproximatePercentile i

[jira] [Updated] (SPARK-18111) Wrong ApproximatePercentile answer when multiple records have the minimum value

2016-10-26 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-18111: - Description: When multiple records have the minimum value, the answer of ApproximatePercentile i

[jira] [Commented] (SPARK-18111) Wrong ApproximatePercentile answer when multiple records have the minimum value

2016-10-26 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610192#comment-15610192 ] Zhenhua Wang commented on SPARK-18111: -- [~srowen] The minimum is not only skipped on

[jira] [Comment Edited] (SPARK-18111) Wrong ApproximatePercentile answer when multiple records have the minimum value

2016-10-26 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610192#comment-15610192 ] Zhenhua Wang edited comment on SPARK-18111 at 10/27/16 12:46 AM: --

[jira] [Comment Edited] (SPARK-18111) Wrong ApproximatePercentile answer when multiple records have the minimum value

2016-10-26 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610192#comment-15610192 ] Zhenhua Wang edited comment on SPARK-18111 at 10/27/16 12:46 AM: --

[jira] [Updated] (SPARK-18111) Wrong ApproximatePercentile answer when multiple records have the minimum value

2016-10-27 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-18111: - Description: When multiple records have the minimum value, the answer of ApproximatePercentile i

[jira] [Commented] (SPARK-18111) Wrong ApproximatePercentile answer when multiple records have the minimum value

2016-10-27 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15611160#comment-15611160 ] Zhenhua Wang commented on SPARK-18111: -- [~srowen] Sorry for the ambiguous example. I

[jira] [Comment Edited] (SPARK-18111) Wrong ApproximatePercentile answer when multiple records have the minimum value

2016-10-27 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15611160#comment-15611160 ] Zhenhua Wang edited comment on SPARK-18111 at 10/27/16 8:31 AM: ---

[jira] [Comment Edited] (SPARK-18111) Wrong ApproximatePercentile answer when multiple records have the minimum value

2016-10-27 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15611160#comment-15611160 ] Zhenhua Wang edited comment on SPARK-18111 at 10/27/16 8:35 AM: ---

[jira] [Commented] (SPARK-18111) Wrong ApproximatePercentile answer when multiple records have the minimum value

2016-10-27 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15611229#comment-15611229 ] Zhenhua Wang commented on SPARK-18111: -- Yes, it says '5'. Because the samples in Qua

[jira] [Commented] (SPARK-18111) Wrong ApproximatePercentile answer when multiple records have the minimum value

2016-10-27 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15611236#comment-15611236 ] Zhenhua Wang commented on SPARK-18111: -- Every time it calls compress(), it will lose

[jira] [Updated] (SPARK-17079) broadcast decision based on cbo

2016-10-27 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17079: - Description: We decide if broadcast join should be used based on the cardinality and size of join

[jira] [Created] (SPARK-18149) build side decision based on cbo

2016-10-27 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-18149: Summary: build side decision based on cbo Key: SPARK-18149 URL: https://issues.apache.org/jira/browse/SPARK-18149 Project: Spark Issue Type: Sub-task

[jira] [Commented] (SPARK-17791) Join reordering using star schema detection

2016-10-30 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15621070#comment-15621070 ] Zhenhua Wang commented on SPARK-17791: -- Hi Ioana, The current implementation is NOT

[jira] [Created] (SPARK-18221) Wrong ApproximatePercentile answer when multiple records have the minimum value(for branch 2.0)

2016-11-02 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-18221: Summary: Wrong ApproximatePercentile answer when multiple records have the minimum value(for branch 2.0) Key: SPARK-18221 URL: https://issues.apache.org/jira/browse/SPARK-18221

[jira] [Commented] (SPARK-18221) Wrong ApproximatePercentile answer when multiple records have the minimum value(for branch 2.0)

2016-11-02 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15628306#comment-15628306 ] Zhenhua Wang commented on SPARK-18221: -- Oh, sorry, should I close this one? > Wrong

[jira] [Updated] (SPARK-18000) Aggregation function for computing endpoints for histograms

2016-11-04 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-18000: - Description: The MapAggregate function computes frequency for each distinct non-null value of a

[jira] [Updated] (SPARK-18000) Aggregation function for computing endpoints for histograms

2016-11-04 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-18000: - Description: This function computes frequency for each distinct non-null value of a column. It re

[jira] [Updated] (SPARK-18000) Aggregation function for computing bins (distinct value, count) pairs for equi-width histograms

2016-11-04 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-18000: - Summary: Aggregation function for computing bins (distinct value, count) pairs for equi-width his

[jira] [Updated] (SPARK-18000) Aggregation function for computing bins (distinct value, count) pairs for equi-width histograms

2016-11-04 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-18000: - Description: This function computes the count for each distinct non-null value of a column. It re

[jira] [Commented] (SPARK-17446) no total size for data source tables in InMemoryCatalog

2016-11-08 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15646930#comment-15646930 ] Zhenhua Wang commented on SPARK-17446: -- This issue is resolved after [SPARK-17470|h

[jira] [Resolved] (SPARK-17446) no total size for data source tables in InMemoryCatalog

2016-11-08 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang resolved SPARK-17446. -- Resolution: Fixed Fix Version/s: 2.1.0 > no total size for data source tables in InMemor

[jira] [Created] (SPARK-18429) implement a new Aggregate for CountMinSketch

2016-11-13 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-18429: Summary: implement a new Aggregate for CountMinSketch Key: SPARK-18429 URL: https://issues.apache.org/jira/browse/SPARK-18429 Project: Spark Issue Type: New

[jira] [Created] (SPARK-18559) Also restrict the lower bound of relativeSD in HLL++

2016-11-23 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-18559: Summary: Also restrict the lower bound of relativeSD in HLL++ Key: SPARK-18559 URL: https://issues.apache.org/jira/browse/SPARK-18559 Project: Spark Issue Ty

[jira] [Updated] (SPARK-18559) Restrict the lower bound of relativeSD in HLL++

2016-11-23 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-18559: - Summary: Restrict the lower bound of relativeSD in HLL++ (was: Also restrict the lower bound of

[jira] [Updated] (SPARK-18559) Restrict the lower bound of relativeSD in HLL++

2016-11-23 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-18559: - Description: In HyperLogLogPlusPlus, if the relative error is so small that p >= 19, it will caus

[jira] [Updated] (SPARK-18559) Fix HLL++ with small relative error

2016-11-23 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-18559: - Summary: Fix HLL++ with small relative error (was: Restrict the lower bound of relativeSD in HLL

[jira] [Created] (SPARK-37846) TaskContext is used at wrong place in BlockManagerDecommissionIntegrationSuite

2022-01-08 Thread Zhenhua Wang (Jira)
Zhenhua Wang created SPARK-37846: Summary: TaskContext is used at wrong place in BlockManagerDecommissionIntegrationSuite Key: SPARK-37846 URL: https://issues.apache.org/jira/browse/SPARK-37846 Projec

[jira] [Created] (SPARK-38140) Column stats (min, max) for timestamp type is not consistent with the value due to time zone difference

2022-02-08 Thread Zhenhua Wang (Jira)
Zhenhua Wang created SPARK-38140: Summary: Column stats (min, max) for timestamp type is not consistent with the value due to time zone difference Key: SPARK-38140 URL: https://issues.apache.org/jira/browse/SPARK-

[jira] [Updated] (SPARK-38140) Column stats (min, max) for timestamp type is not consistent with the value due to time zone difference

2022-02-08 Thread Zhenhua Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-38140: - Description: Currently timestamp column's stats (min/max) are stored in UTC in metastore, and w

[jira] [Updated] (SPARK-38140) Column stats (min, max) for timestamp type is not consistent with the value due to time zone difference

2022-02-08 Thread Zhenhua Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-38140: - Description: Currently timestamp column's stats (min/max) are stored in UTC in metastore, and w

[jira] [Updated] (SPARK-38140) Desc column stats (min, max) for timestamp type is not consistent with the value due to time zone difference

2022-02-08 Thread Zhenhua Wang (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-38140: - Summary: Desc column stats (min, max) for timestamp type is not consistent with the value due to

[jira] [Created] (SPARK-22208) Improve percentile_approx by not rounding up targetError and starting from index 0

2017-10-05 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-22208: Summary: Improve percentile_approx by not rounding up targetError and starting from index 0 Key: SPARK-22208 URL: https://issues.apache.org/jira/browse/SPARK-22208 Pr

[jira] [Commented] (SPARK-22179) percentile_approx should choose the first element if it already reaches the percentage

2017-10-06 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194680#comment-16194680 ] Zhenhua Wang commented on SPARK-22179: -- [~srowen] Yes, I planned to close this after

[jira] [Commented] (SPARK-22164) support histogram in estimating the cardinality of aggregate (or group-by) operator

2017-10-12 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202081#comment-16202081 ] Zhenhua Wang commented on SPARK-22164: -- [~ron8hu] I don't think histogram can help w

[jira] [Closed] (SPARK-22164) support histogram in estimating the cardinality of aggregate (or group-by) operator

2017-10-12 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang closed SPARK-22164. > support histogram in estimating the cardinality of aggregate (or group-by) > operator >

[jira] [Resolved] (SPARK-22164) support histogram in estimating the cardinality of aggregate (or group-by) operator

2017-10-12 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang resolved SPARK-22164. -- Resolution: Won't Fix Target Version/s: (was: 2.3.0) > support histogram in estimat

[jira] [Comment Edited] (SPARK-22164) support histogram in estimating the cardinality of aggregate (or group-by) operator

2017-10-12 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202081#comment-16202081 ] Zhenhua Wang edited comment on SPARK-22164 at 10/12/17 3:21 PM: ---

[jira] [Created] (SPARK-22285) Change implementation of ApproxCountDistinctForIntervals to TypedImperativeAggregate

2017-10-16 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-22285: Summary: Change implementation of ApproxCountDistinctForIntervals to TypedImperativeAggregate Key: SPARK-22285 URL: https://issues.apache.org/jira/browse/SPARK-22285

[jira] [Updated] (SPARK-22285) Change implementation of ApproxCountDistinctForIntervals to TypedImperativeAggregate

2017-10-16 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-22285: - Description: The current implementation of `ApproxCountDistinctForIntervals` is `ImperativeAggre

[jira] [Created] (SPARK-22310) Refactor join estimation to incorporate estimation logic for different kinds of statistics

2017-10-18 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-22310: Summary: Refactor join estimation to incorporate estimation logic for different kinds of statistics Key: SPARK-22310 URL: https://issues.apache.org/jira/browse/SPARK-22310

[jira] [Created] (SPARK-22326) Remove unnecessary hashCode and equals methods

2017-10-20 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-22326: Summary: Remove unnecessary hashCode and equals methods Key: SPARK-22326 URL: https://issues.apache.org/jira/browse/SPARK-22326 Project: Spark Issue Type: Bu

[jira] [Updated] (SPARK-17074) generate equi-height histogram for column

2017-10-24 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17074: - Affects Version/s: (was: 2.0.0) 2.3.0 > generate equi-height histogram

[jira] [Updated] (SPARK-22310) Refactor join estimation to incorporate estimation logic for different kinds of statistics

2017-10-27 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-22310: - Issue Type: Sub-task (was: Improvement) Parent: SPARK-21975 > Refactor join estimation t

[jira] [Updated] (SPARK-22310) Refactor join estimation to incorporate estimation logic for different kinds of statistics

2017-10-27 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-22310: - Issue Type: Improvement (was: Sub-task) Parent: (was: SPARK-16026) > Refactor join e

[jira] [Created] (SPARK-22394) Redundant synchronization for metastore access

2017-10-29 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-22394: Summary: Redundant synchronization for metastore access Key: SPARK-22394 URL: https://issues.apache.org/jira/browse/SPARK-22394 Project: Spark Issue Type: Im

[jira] [Commented] (SPARK-22394) Redundant synchronization for metastore access

2017-10-29 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16224028#comment-16224028 ] Zhenhua Wang commented on SPARK-22394: -- [~cloud_fan] [~smilegator] [~rxin] Do I unde

[jira] [Created] (SPARK-22400) rename some APIs and classes to make their meaning clearer

2017-10-30 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-22400: Summary: rename some APIs and classes to make their meaning clearer Key: SPARK-22400 URL: https://issues.apache.org/jira/browse/SPARK-22400 Project: Spark Is

[jira] [Updated] (SPARK-22400) rename some APIs and classes to make their meaning clearer

2017-10-30 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-22400: - Description: Both `ReadSupport` and `ReadTask` have a method called `createReader`, but they cre

[jira] [Created] (SPARK-22475) show histogram in DESC COLUMN command

2017-11-08 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-22475: Summary: show histogram in DESC COLUMN command Key: SPARK-22475 URL: https://issues.apache.org/jira/browse/SPARK-22475 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-22515) Estimation relation size based on numRows * rowSize

2017-11-13 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-22515: Summary: Estimation relation size based on numRows * rowSize Key: SPARK-22515 URL: https://issues.apache.org/jira/browse/SPARK-22515 Project: Spark Issue Typ

[jira] [Updated] (SPARK-22515) Estimation relation size based on numRows * rowSize

2017-11-14 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-22515: - Description: Currently, relation size is computed as the sum of file size, which is error-prone b

[jira] [Created] (SPARK-22529) No need to propagate catalog stats when cbo is disabled

2017-11-15 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-22529: Summary: No need to propagate catalog stats when cbo is disabled Key: SPARK-22529 URL: https://issues.apache.org/jira/browse/SPARK-22529 Project: Spark Issue

[jira] [Updated] (SPARK-22529) Only sizeInBytes in catalog stats needs to be propagated when cbo is disabled

2017-11-15 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-22529: - Summary: Only sizeInBytes in catalog stats needs to be propagated when cbo is disabled (was: No

[jira] [Updated] (SPARK-22529) Relation stats should be consistent with other plans based on cbo config

2017-11-15 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-22529: - Description: Currently, relation stats are the same whether cbo is enabled or not. While relation

[jira] [Updated] (SPARK-22529) Relation stats should be consistent with other plans based on cbo config

2017-11-15 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-22529: - Summary: Relation stats should be consistent with other plans based on cbo config (was: Only siz

[jira] [Updated] (SPARK-17129) Support statistics collection and cardinality estimation for partitioned tables

2017-11-17 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17129: - Description: Support statistics collection and cardinality estimation for partitioned tables. (w

[jira] [Created] (SPARK-22662) Failed to prune columns after rewriting predicate subquery

2017-11-30 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-22662: Summary: Failed to prune columns after rewriting predicate subquery Key: SPARK-22662 URL: https://issues.apache.org/jira/browse/SPARK-22662 Project: Spark Is

[jira] [Created] (SPARK-19271) Non-cbo estimation of aggregate

2017-01-18 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-19271: Summary: Non-cbo estimation of aggregate Key: SPARK-19271 URL: https://issues.apache.org/jira/browse/SPARK-19271 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-19271) Change non-cbo estimation of aggregate

2017-01-18 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-19271: - Summary: Change non-cbo estimation of aggregate (was: Non-cbo estimation of aggregate) > Change

[jira] [Updated] (SPARK-19350) Cardinality estimation of Limit and Sample

2017-01-24 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-19350: - Summary: Cardinality estimation of Limit and Sample (was: Improve cardinality estimation of Limi

[jira] [Updated] (SPARK-19350) Cardinality estimation of Limit and Sample

2017-01-24 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-19350: - Description: Currently, LocalLimit/GlobalLimit/Sample propagates the same row count and column s

[jira] [Updated] (SPARK-17079) broadcast decision based on cbo

2016-09-16 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17079: - Description: We decide if broadcast join should be used based on the cardinality and size of join

[jira] [Updated] (SPARK-17077) Cardinality estimation of group-by, project, union, etc.

2016-09-16 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17077: - Summary: Cardinality estimation of group-by, project, union, etc. (was: cardinality estimates of

[jira] [Created] (SPARK-17625) expectedOutputAttributes should be set when converting SimpleCatalogRelation to LogicalRelation

2016-09-21 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-17625: Summary: expectedOutputAttributes should be set when converting SimpleCatalogRelation to LogicalRelation Key: SPARK-17625 URL: https://issues.apache.org/jira/browse/SPARK-17625

[jira] [Created] (SPARK-17642) support DESC FORMATTED TABLE COLUMN command to show column-level statistics

2016-09-22 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-17642: Summary: support DESC FORMATTED TABLE COLUMN command to show column-level statistics Key: SPARK-17642 URL: https://issues.apache.org/jira/browse/SPARK-17642 Project:

[jira] [Updated] (SPARK-17642) support DESC FORMATTED TABLE COLUMN command to show column-level statistics

2016-09-22 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17642: - Description: Support DESC FORMATTED TABLE COLUMN command to show column-level statistics. We shou

[jira] [Commented] (SPARK-17074) generate histogram information for column

2016-09-30 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15537248#comment-15537248 ] Zhenhua Wang commented on SPARK-17074: -- Hi, there's something I want to discuss here

[jira] [Commented] (SPARK-17074) generate histogram information for column

2016-10-05 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15550228#comment-15550228 ] Zhenhua Wang commented on SPARK-17074: -- OK, I'll try to extend QuantileSummaries to

[jira] [Commented] (SPARK-17626) TPC-DS performance improvements using star-schema heuristics

2016-10-08 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557867#comment-15557867 ] Zhenhua Wang commented on SPARK-17626: -- This is pretty good. We can use star schema

[jira] [Comment Edited] (SPARK-17626) TPC-DS performance improvements using star-schema heuristics

2016-10-08 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557867#comment-15557867 ] Zhenhua Wang edited comment on SPARK-17626 at 10/8/16 12:19 PM: ---

[jira] [Created] (SPARK-17881) Aggregation function for generating string histograms

2016-10-11 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-17881: Summary: Aggregation function for generating string histograms Key: SPARK-17881 URL: https://issues.apache.org/jira/browse/SPARK-17881 Project: Spark Issue T

[jira] [Commented] (SPARK-17074) generate histogram information for column

2016-10-12 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570366#comment-15570366 ] Zhenhua Wang commented on SPARK-17074: -- Well, I've got stuck here for a few days. I

[jira] [Comment Edited] (SPARK-17074) generate histogram information for column

2016-10-12 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570366#comment-15570366 ] Zhenhua Wang edited comment on SPARK-17074 at 10/13/16 12:29 AM: --

[jira] [Comment Edited] (SPARK-17074) generate histogram information for column

2016-10-12 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570366#comment-15570366 ] Zhenhua Wang edited comment on SPARK-17074 at 10/13/16 12:55 AM: --

[jira] [Commented] (SPARK-17827) StatisticsColumnSuite failures on big endian platforms

2016-10-13 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571881#comment-15571881 ] Zhenhua Wang commented on SPARK-17827: -- [~srowen] Thanks for notification. [~robbins

[jira] [Created] (SPARK-17997) Aggregation function for counting distinct values for multiple intervals

2016-10-18 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-17997: Summary: Aggregation function for counting distinct values for multiple intervals Key: SPARK-17997 URL: https://issues.apache.org/jira/browse/SPARK-17997 Project: Spa

[jira] [Created] (SPARK-18000) Aggregation function for computing endpoints for numeric histograms

2016-10-18 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-18000: Summary: Aggregation function for computing endpoints for numeric histograms Key: SPARK-18000 URL: https://issues.apache.org/jira/browse/SPARK-18000 Project: Spark

[jira] [Updated] (SPARK-17074) generate histogram information for column

2016-10-18 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-17074: - Description: We support two kinds of histograms: - Equi-width histogram: We have a fixed w
