[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061702#comment-16061702 ]

Bryan Pendleton commented on DERBY-6940:

Your approach to addressing the DERBY-3219 problems seems fine to me. I think it is quite clever and should get us the behavior that we need.

Can you clarify why the exception handling is needed in the writeExternal method? That is, why isn't it more like:

{code:java}
writeBoolean(min == max);
if (min != max)
    writeObject(maxVal);
writeObject(minVal);
{code}

Regarding additional statistics, I don't have any to add at this time. I'm a big fan of incremental improvement, so I'm happy to add only the statistics that we need now. If we later determine that additional statistics would be valuable, we can address those needs as follow-on projects.

That said, as you continue to study the code and think about the approaches and possibilities, please do let us know which additional statistics you think would be helpful. That is much appreciated!

> Enhance derby statistics for more accurate selectivity estimates.
> -
>
> Key: DERBY-6940
> URL: https://issues.apache.org/jira/browse/DERBY-6940
> Project: Derby
> Issue Type: Sub-task
> Components: SQL
> Reporter: Harshvardhan Gupta
> Assignee: Harshvardhan Gupta
> Priority: Minor
> Attachments: DERBY-6940_2.diff, DERBY-6940_3.diff, derby-6940.diff, EOFException_derby.log, EOFException.txt
>
>
> Derby should collect extra statistics at index build time and statistics refresh time, which will help the optimizer make more precise selectivity estimates and choose better execution paths.
> We eventually want to utilize the new statistics to make better selectivity estimates / cost estimates that will help find the best query plan. Currently Derby keeps two types of stats: the total row count and the number of unique values.
> We are initially extending the stats to include the null count, the minimum value and the maximum value associated with each of the columns of an index. This would be useful in selectivity estimates for operators such as [IS NULL, <, <=, >, >=], all of which currently rely on hardwired selectivity estimates.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
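To make the description above concrete, here is a minimal sketch of the textbook way row count, unique count, null count, min and max can replace hardwired selectivities for IS NULL, =, <, <=, > and >=. It assumes a numeric column whose non-NULL values are spread uniformly between min and max, so sel(col < v) is roughly (1 - nullCount/numRows) * (v - min) / (max - min). The class `SelectivitySketch`, the holder `ColumnStats` and all method names are hypothetical; this is not Derby's optimizer code.

```java
/**
 * Hypothetical sketch, not Derby's optimizer: selectivity estimates from
 * the per-column statistics listed in DERBY-6940. Assumes a numeric column
 * whose non-NULL values are spread uniformly over [minVal, maxVal].
 */
public class SelectivitySketch {

    /** Hypothetical holder for one column's statistics. */
    static final class ColumnStats {
        final long numRows;
        final long numUnique;
        final long nullCount;
        final double minVal;
        final double maxVal;

        ColumnStats(long numRows, long numUnique, long nullCount,
                    double minVal, double maxVal) {
            this.numRows = numRows;
            this.numUnique = numUnique;
            this.nullCount = nullCount;
            this.minVal = minVal;
            this.maxVal = maxVal;
        }

        /** Fraction of rows that are not NULL (NULLs never satisfy =, <, >). */
        double nonNullFraction() {
            return numRows == 0 ? 0.0 : 1.0 - (double) nullCount / numRows;
        }
    }

    /** col IS NULL */
    static double isNull(ColumnStats s) {
        return s.numRows == 0 ? 0.0 : (double) s.nullCount / s.numRows;
    }

    /** col = constant, assuming every distinct value occurs equally often. */
    static double equalTo(ColumnStats s) {
        return s.numUnique == 0 ? 0.0 : s.nonNullFraction() / s.numUnique;
    }

    /** col < v; this sketch uses the same estimate for col <= v. */
    static double lessThan(ColumnStats s, double v) {
        if (v <= s.minVal) return 0.0;
        if (v > s.maxVal) return s.nonNullFraction();
        // Here minVal < v <= maxVal, so the range width is positive.
        return s.nonNullFraction() * (v - s.minVal) / (s.maxVal - s.minVal);
    }

    /** col > v; this sketch uses the same estimate for col >= v. */
    static double greaterThan(ColumnStats s, double v) {
        if (v >= s.maxVal) return 0.0;
        if (v < s.minVal) return s.nonNullFraction();
        // Here minVal <= v < maxVal, so the range width is positive.
        return s.nonNullFraction() * (s.maxVal - v) / (s.maxVal - s.minVal);
    }

    public static void main(String[] args) {
        // 1000 rows, 100 distinct values, 100 NULLs, values between 0 and 100.
        ColumnStats s = new ColumnStats(1000, 100, 100, 0, 100);
        System.out.println(isNull(s));           // 0.1
        System.out.println(lessThan(s, 25));     // 0.225
        System.out.println(greaterThan(s, 75));  // 0.225
    }
}
```

The uniform-spread assumption is the weak point of this approach: on skewed columns the interpolated estimate can be far off, which is where distribution statistics such as histograms would help.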
[jira] [Created] (DERBY-6942) Utilise additional statistics for selectivity estimates.
Harshvardhan Gupta created DERBY-6942:
-
Summary: Utilise additional statistics for selectivity estimates.
Key: DERBY-6942
URL: https://issues.apache.org/jira/browse/DERBY-6942
Project: Derby
Issue Type: Sub-task
Reporter: Harshvardhan Gupta
Priority: Minor
[jira] [Assigned] (DERBY-6942) Utilise additional statistics for selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harshvardhan Gupta reassigned DERBY-6942:
-
Assignee: Harshvardhan Gupta

> Utilise additional statistics for selectivity estimates.
>
> Key: DERBY-6942
> URL: https://issues.apache.org/jira/browse/DERBY-6942
> Project: Derby
> Issue Type: Sub-task
> Components: SQL
> Reporter: Harshvardhan Gupta
> Assignee: Harshvardhan Gupta
> Priority: Minor
[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061215#comment-16061215 ]

Harshvardhan Gupta commented on DERBY-6940:
---
Also, I am creating another issue to track the progress of integrating the new statistics into selectivity estimates.

Meanwhile, I would like to invite your thoughts on other statistics we should consider. Most common values and distribution buckets are at the top of my mind right now; I'd appreciate your thoughts on these.

The average size of a row written to disk is important for cost estimation, and a quick examination of the Derby optimizer's code shows that Derby already uses it in some form. Nevertheless, I will revisit it during my analysis of cost estimation over the course of this project.
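For the two candidate statistics named above, here is a minimal sketch of what each could contain: the k most common values of a column, and the bucket upper bounds of an equi-depth histogram (each bucket holds roughly the same number of rows). It assumes an integer column whose values fit in memory. `DistributionSketch`, `mostCommonValues` and `equiDepthBounds` are hypothetical names, and equi-depth bucketing is one common choice, not something decided in this thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch (not Derby code) of most-common-value and bucket statistics. */
public class DistributionSketch {

    /** Up to k most common values, most frequent first; ties go to the smaller value. */
    static List<Integer> mostCommonValues(int[] values, int k) {
        Map<Integer, Long> counts = new HashMap<>();
        for (int v : values) {
            counts.merge(v, 1L, Long::sum);
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<Integer, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<Integer, Long>comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /**
     * Upper bound of each bucket of an equi-depth histogram.
     * Requires values.length > 0 and buckets > 0.
     */
    static int[] equiDepthBounds(int[] values, int buckets) {
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        int[] bounds = new int[buckets];
        for (int b = 1; b <= buckets; b++) {
            // Index of the last row that falls into bucket b.
            int last = (int) Math.ceil((double) b * sorted.length / buckets) - 1;
            bounds[b - 1] = sorted[last];
        }
        return bounds;
    }

    public static void main(String[] args) {
        int[] col = {5, 1, 2, 1, 7, 3, 1, 2, 6, 4};
        System.out.println(mostCommonValues(col, 2));                  // [1, 2]
        System.out.println(Arrays.toString(equiDepthBounds(col, 5))); // [1, 2, 3, 5, 7]
    }
}
```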
[jira] [Commented] (DERBY-6940) Enhance derby statistics for more accurate selectivity estimates.
[ https://issues.apache.org/jira/browse/DERBY-6940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16061206#comment-16061206 ]

Harshvardhan Gupta commented on DERBY-6940:
---
Hi Bryan,

I thought of a workaround, and it was successful. I compare maxVal and minVal; if they are equal, I first write an indicator boolean and then write only one DataValueDescriptor object. In all other cases, I first write maxVal and then minVal. This way, the problematic object is always the last one written, and it is written only once.

{code:java}
public void writeExternal(ObjectOutput out) throws IOException {
    FormatableHashtable fh = new FormatableHashtable();
    fh.putLong("numRows", numRows);
    fh.putLong("numUnique", numUnique);
    fh.putLong("nullCount", nullCount);
    out.writeObject(fh);

    boolean sameValue = false;
    try {
        sameValue = maxVal.equals(maxVal, minVal).getBoolean();
    } catch (StandardException e) {
        // Comparison failed: treat the values as different.
    }

    if (sameValue) {
        // Indicator true: only one value follows.
        out.writeBoolean(true);
        out.writeObject(minVal);
    } else {
        // Indicator false: maxVal, then minVal, follow.
        out.writeBoolean(false);
        out.writeObject(maxVal);
        out.writeObject(minVal);
    }
}

public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
    FormatableHashtable fh = (FormatableHashtable) in.readObject();
    numRows = fh.getLong("numRows");
    numUnique = fh.getLong("numUnique");
    nullCount = fh.getLong("nullCount");

    if (in.readBoolean()) {
        // Only one value was written: min and max are equal.
        maxVal = (DataValueDescriptor) in.readObject();
        minVal = maxVal.cloneValue(true);
    } else {
        maxVal = (DataValueDescriptor) in.readObject();
        minVal = (DataValueDescriptor) in.readObject();
    }
}
{code}
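The write/read format described above (an indicator boolean, then either one value or maxVal followed by minVal, so that minVal is always the last object written) can be exercised on its own with plain Java serialization. This is a minimal sketch, assuming ordinary `Serializable` values stand in for Derby's DataValueDescriptor and `Objects.equals` stands in for `DataValueDescriptor.equals(left, right)`; since `Objects.equals` throws no checked exception, no try/catch is needed here. `MinMaxSketch` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

/** Hypothetical sketch of the min/max indicator format, outside Derby. */
public class MinMaxSketch {

    static void writeMinMax(ObjectOutput out, Serializable minVal, Serializable maxVal)
            throws IOException {
        boolean same = Objects.equals(minVal, maxVal);
        out.writeBoolean(same);       // indicator is read back first
        if (!same) {
            out.writeObject(maxVal);  // max only when it differs from min
        }
        out.writeObject(minVal);      // min is always the last object written
    }

    /** Returns {minVal, maxVal}. */
    static Object[] readMinMax(ObjectInput in) throws IOException, ClassNotFoundException {
        if (in.readBoolean()) {
            Object only = in.readObject();
            return new Object[] {only, only};  // Derby's version clones here
        }
        Object maxVal = in.readObject();
        Object minVal = in.readObject();
        return new Object[] {minVal, maxVal};
    }

    static byte[] serialize(Serializable minVal, Serializable maxVal) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            writeMinMax(out, minVal, maxVal);
        }
        return bytes.toByteArray();
    }

    static Object[] deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return readMinMax(in);
        }
    }

    public static void main(String[] args) throws Exception {
        Object[] r = deserialize(serialize("apple", "pear"));
        System.out.println(r[0] + " .. " + r[1]);  // apple .. pear
        r = deserialize(serialize(42, 42));
        System.out.println(r[0] + " .. " + r[1]);  // 42 .. 42
    }
}
```

Because the reader consumes exactly the objects the writer produced, in the same order, a round trip through a byte array is a quick way to catch a mismatch between writeExternal and readExternal.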