[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892343#action_12892343 ] Mamta A. Satoor commented on DERBY-3788: Just wanted to summarize what has been tried for this jira so far and a proposal for a possible solution worth trying 1)First I tried to see if the missing stats could be created inline when the SELECT query compilation finds that missing but that failed because SELECT query compilation marked the data dictionary in read only mode where as update statistics needs to update the data dictionary(this marking of read only for data dictionary happens in GenericStatement.prepMinion). Because of this read-only marking, we can't create the statistics inline of the compilation phase of the SELECT query. 2)Second approach was during the SELECT compile phase, if we come across missing stats, then schedule a task for later to update the stats. Finish the SELECT compilation and execution after the update stat task is scheduled on the queue (the scheduling mechanism used is only available with jdk 1.5 and higher). When the scheduled task was later fired, it would run no current connection exception. The reason for that was that there was no context set for update statistics to run. Tried resolving this by creating a connection context through following call InternalDriver.activeDriver().connect(url, info) I had the SELECT compilation pass the needed information to the background task so I can create the connection. This is pretty crude way of getting a connection to update the statistics. Along with the fact that the code to create the connection context was not very modular, some of the existing tests failed with this changes because of locking issues. This probably is because the locks held by the background thread for update stats interfered with locks required by the rest of the test. 3)The third approach I tried was when the SELECT query detects missing stats, abort the query compilation, (thus removing the data dictionary from read-only mode), create the stats and then restart the original SELECT query compilation and execution. The hope with this approach was that it will avoid running into locking issues. If we decide to pursue this approach more, we need to make sure that we look at the possibility of stats still missing when the original query is compile the 2nd time. This can happen say because the user does not have privileges to update the stats, the update stat ran into locking issues. If we won't account for possibility of missing stats during the 2nd time through the original query, we can end up in an infinite loop. Another problem with this approach was that update stats were happening in the same transaction as the original SELECT query. Which means that the locks acquired by the update stat will not be released until the transaction is committed. Came across locking issues with this when I ran the existing junit and derbyall tests. One way to fix this could be to start a nested user transaction and do the update stat in that transaction and commit that transaction(I think something like this is done for identity columns. If yes, then we might find pointers there to use for update stats in nested user transaction). Thus any locks acquired by the update stat work will be released. And after that, go back to the original query again for compilation and execution. 4)With any approach we take, we should detect the case where the table is empty and hence the stats might be missing. For such a case, we should not try to fire stats creation task. to rephrase, during the compile phase of the SELECT query, when we look for statistics, first check if the table is empty and if yes then skip the code for collecting stats. From what I recall, one can check if the table is empty by issuing open scan controller and asking it for the number of rows in the table. If we do not check for empty table, then we will end up in infinite loop. 5)For read-only db we should detect that we are readonly and don't try to do update statistics 6)Also, we probably want to create statistics only for user tables? Haven't thought enough about it. 7)What if the user executing the query is not the owner of the table. should statistics still be created? I think if the user does not have enough privileges, then skip the step of creating stats since it's going to fail anyways. 8)Once statistics are created, notify tabledescriptor about it because tabledescriptor cahces available statistics and it will never know of the new statistics since tabledescriptor holds on to that cache. 9)Once we have update stats working, another improvement could be to identify the cached compiled queries that may benefit from updated stats and have those queries recompile when they are executed next time. In other words, the relevant queries shou
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680484#action_12680484 ] Bryan Pendleton commented on DERBY-3788: Yes, that's the technique I remembered. Thanks for tracking it down! That seems fine to me. > Provide a zero-admin way of updating the statisitcs of an index > --- > > Key: DERBY-3788 > URL: https://issues.apache.org/jira/browse/DERBY-3788 > Project: Derby > Issue Type: New Feature > Components: Performance >Affects Versions: 10.5.0.0 >Reporter: Mamta A. Satoor >Assignee: Mamta A. Satoor > Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, > DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, > DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, > DERBY3788_update_all_missing_stats_after_bind_patch4_diff.txt, > DERBY3788_update_all_missing_stats_after_bind_patch4_stat.txt, > DERBY_3788_Mgr.java, DERBY_3788_Repro.java > > > DERBY-269 provided a manual way of updating the statistics using the new > system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good > for Derby to provide an automatic way of updating the statistics without > requiring to run the stored procedure manually. There was some discussion on > DERBY-269 about providing the 0-admin way. I have copied it here for > reference. > * > Kathey Marsden - 22/May/05 03:53 PM > Some sort of zero admin solution for updating statistics would be prefferable > to the manual 'update statistics' > * > * > Mike Matrigali - 11/Jun/08 12:37 PM > I have not seen any other suggestions, how about the following zero admin > solution? It is not perfect - suggestions welcome. > Along with the statistics storing, save how many rows were in the table when > exact statistics were calculated. This number is 0 if none have been > calculated because index creation happened on an empty table. At query > compile time when we look up statistics we automatically recalculate the > statistics at certain threshholds - say something like row count growing past > next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere > around how many rows we can process in some small amount of time - like 1 > second on a modern laptop. If we are worried about response time, maybe we > background queue the stat gathering rather than waiting with maybe some quick > load if no stat has ever been gathered. The background gathering could be > optimized to not interfere with locks by using read uncommitted. > I think it would be useful to also have the manual call just to make it easy > to support customers and debug issues in the field. There is proably always > some dynamic data distribution change that in some case won't be picked up by > the automatic algorithm. Also just very useful for those who have complete > control of the create ddl, load data, run stats, deliver application process. > * -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680478#action_12680478
]
Mamta A. Satoor commented on DERBY-3788:
Bryan, thanks for taking the time for this jira entry. Yes, I was planning on
using nowait so that we do not wait and eventually possibly locking issues with
parent. This is exactly what we do in InsertResultSet.getSetAutoincrementValue
as shown below
try
{
/* If tcToUse == tc, then we are using parent xaction-- this
can happen if for some reason we couldn't start a nested
transaction
*/
newValue = dd.getSetAutoincrementValue(
constants.autoincRowLocation[index],
tcToUse, true, aiCache[index], (tcToUse == tc));
}
catch (StandardException se)
{
if (tcToUse == tc)
{
/* we've using the parent xaction and we've timed out; just
throw an error and exit.
*/
throw se;
}
if (se.getMessageId().equals(SQLState.LOCK_TIMEOUT))
{
// if we couldn't do this with a nested xaction, retry with
// parent-- we need to wait this time!
newValue = dd.getSetAutoincrementValue(
constants.autoincRowLocation[index],
tc, true, aiCache[index], true);
}
> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
> Issue Type: New Feature
> Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
> Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt,
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt,
> DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt,
> DERBY3788_update_all_missing_stats_after_bind_patch4_diff.txt,
> DERBY3788_update_all_missing_stats_after_bind_patch4_stat.txt,
> DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good
> for Derby to provide an automatic way of updating the statistics without
> requiring to run the stored procedure manually. There was some discussion on
> DERBY-269 about providing the 0-admin way. I have copied it here for
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM
> Some sort of zero admin solution for updating statistics would be prefferable
> to the manual 'update statistics'
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM
> I have not seen any other suggestions, how about the following zero admin
> solution? It is not perfect - suggestions welcome.
> Along with the statistics storing, save how many rows were in the table when
> exact statistics were calculated. This number is 0 if none have been
> calculated because index creation happened on an empty table. At query
> compile time when we look up statistics we automatically recalculate the
> statistics at certain threshholds - say something like row count growing past
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere
> around how many rows we can process in some small amount of time - like 1
> second on a modern laptop. If we are worried about response time, maybe we
> background queue the stat gathering rather than waiting with maybe some quick
> load if no stat has ever been gathered. The background gathering could be
> optimized to not interfere with locks by using read uncommitted.
> I think it would be useful to also have the manual call just to make it easy
> to support customers and debug issues in the field. There is proably always
> some dynamic data distribution change that in some case won't be picked up by
> the automatic algorithm. Also just very useful for those who have complete
> control of the create ddl, load data, run stats, deliver application process.
> *
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680349#action_12680349 ] Bryan Pendleton commented on DERBY-3788: Seems like the nested transactions can have locking problems, too; if the nested transaction's lock requests block, you can get an un-detected deadlock, I believe. I think we recently had to change some code which took locks in a nested transaction to take those locks "nowait", and if they couldn't be immediately satisfied, to retry the locks in the user transaction. Unfortunately I don't recall exactly which issue had this multi-step lock retry behavior. > Provide a zero-admin way of updating the statisitcs of an index > --- > > Key: DERBY-3788 > URL: https://issues.apache.org/jira/browse/DERBY-3788 > Project: Derby > Issue Type: New Feature > Components: Performance >Affects Versions: 10.5.0.0 >Reporter: Mamta A. Satoor >Assignee: Mamta A. Satoor > Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, > DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, > DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, > DERBY3788_update_all_missing_stats_after_bind_patch4_diff.txt, > DERBY3788_update_all_missing_stats_after_bind_patch4_stat.txt, > DERBY_3788_Mgr.java, DERBY_3788_Repro.java > > > DERBY-269 provided a manual way of updating the statistics using the new > system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good > for Derby to provide an automatic way of updating the statistics without > requiring to run the stored procedure manually. There was some discussion on > DERBY-269 about providing the 0-admin way. I have copied it here for > reference. > * > Kathey Marsden - 22/May/05 03:53 PM > Some sort of zero admin solution for updating statistics would be prefferable > to the manual 'update statistics' > * > * > Mike Matrigali - 11/Jun/08 12:37 PM > I have not seen any other suggestions, how about the following zero admin > solution? It is not perfect - suggestions welcome. > Along with the statistics storing, save how many rows were in the table when > exact statistics were calculated. This number is 0 if none have been > calculated because index creation happened on an empty table. At query > compile time when we look up statistics we automatically recalculate the > statistics at certain threshholds - say something like row count growing past > next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere > around how many rows we can process in some small amount of time - like 1 > second on a modern laptop. If we are worried about response time, maybe we > background queue the stat gathering rather than waiting with maybe some quick > load if no stat has ever been gathered. The background gathering could be > optimized to not interfere with locks by using read uncommitted. > I think it would be useful to also have the manual call just to make it easy > to support customers and debug issues in the field. There is proably always > some dynamic data distribution change that in some case won't be picked up by > the automatic algorithm. Also just very useful for those who have complete > control of the create ddl, load data, run stats, deliver application process. > * -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671349#action_12671349 ] Mamta A. Satoor commented on DERBY-3788: Currently, I am pursuing running the missing statistics inline in the compile phase of a SELECT query. The compile phase code is driven by GenericStatement.prepMinion. My first attempt was to run the update statistics inline in the optimize phase of a SELECT sql (because in the optimize phase of a query, we look for statistics for determining the query plan for execution and that is when we know that the statistics are missing) but there I ran into data dictionary being in read-only mode. In GenericStatement.prepMinion method, at line 307, we mark the data dictionary(DD) as read-only for the duration of bind and optimize phase and hence statistics can't be updated because DD has been marked readonly at this time. At line 422, after the optimize phase, we mark the DD as done reading and hence DD is not in read-only mode during the generate phase. Based on this, I tried running the statistics inline in the generate phase of SELECT query to establish the fact that the statistics can indeed be run in-;ine. I ran into only one coding issues DERBY-4048 (about AlterTableConstantAction not using execute transaction for updating the statistics). After resolving DERBY-4048, I am able to run the update statistics inline in the generate phase of a SELECT query for a simple test case as shown below create table t1 (c11 int, c12 char); create index i1 on t1(c11); insert into t1 values (1,'1'),(2,'2'),(3,'3'); -- no statistics will be found for t1.i1 because table was empty when index i1 was created select * from sys.sysstatistics; -- during the generate phase of following query, we will run update statistics inline for t1.i1 select * from t1 where c11=1; -- this time, there will be statistics for t1.i1 select * from sys.sysstatistics; Note, that there were no locking issues/privilege issues involved in the simple test case above. But when update statistics runs into some problem, the question is how do we just revert back the update statistics work and let the original SELECT query proceed with execution. When I ran junit, towards the very beginning, I saw lot of problems related to not having enough privileges, context stack issue etc because of update statistics. I think the right approach would be to start with the compile phase of SELECT query and during optimization, when we find that the statistics are missing, then quit the compilation of SELECT query there, run the update statistics and refire the compilation of the SELECT query. This way, update statistics will not run into any locking issues with SELECT query. I am going to try to spend some time working on this approach and see how it goes. Would appreicate if anyone has any feedback. > Provide a zero-admin way of updating the statisitcs of an index > --- > > Key: DERBY-3788 > URL: https://issues.apache.org/jira/browse/DERBY-3788 > Project: Derby > Issue Type: New Feature > Components: Performance >Affects Versions: 10.5.0.0 >Reporter: Mamta A. Satoor >Assignee: Mamta A. Satoor > Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, > DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, > DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, DERBY_3788_Mgr.java, > DERBY_3788_Repro.java > > > DERBY-269 provided a manual way of updating the statistics using the new > system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good > for Derby to provide an automatic way of updating the statistics without > requiring to run the stored procedure manually. There was some discussion on > DERBY-269 about providing the 0-admin way. I have copied it here for > reference. > * > Kathey Marsden - 22/May/05 03:53 PM > Some sort of zero admin solution for updating statistics would be prefferable > to the manual 'update statistics' > * > * > Mike Matrigali - 11/Jun/08 12:37 PM > I have not seen any other suggestions, how about the following zero admin > solution? It is not perfect - suggestions welcome. > Along with the statistics storing, save how many rows were in the table when > exact statistics were calculated. This number is 0 if none have been > calculated because index creation happened on an empty table. At query > compile time when we look up statistics we automatically recalculate the > statistics at certain threshholds - say something like row count growing past > next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere > around how many rows we can process in some small amount of time - like 1 > s
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652427#action_12652427
]
Mamta A. Satoor commented on DERBY-3788:
Knut, couple more responses to some of your comments.
3) Creating an EmbedConnection30 object directly breaks the
modularity. Unless the calls to the internal methods are necessary, it
may be better to use InternalDriver.activeDriver().connect(url, info)
instead.
There was no specific need to be directly creating EmbedConnection30 object, so
I have used your recommendation to get the Connection object.
5) There's a comment in DDImpl5.updateStatisticsInBackGround() saying
that "cm is null the very first time, and whenever we aren't actually
nested." I'm not sure I understand that comment. Why is it null the
first time? And isn't the method always called in a nested context?
And if it is null, wouldn't that cause a NullPointerException in
EmbedConnection's constructor when url=null is passed in?
I used the current model to get the ContextManager similar to how we get it in
jdbc,InternalDriver:getConnectionContext and there we do a check for null
ContextManager. Looking at that code, I thought there might be a case where
ContextManager could be null. To address the NullPointerException that may
result because of url being null in case of null ContextManager, I have changed
the code in DDImpl5.updateStatisticsInBackGround() to fire the background
update statistics only if ContextManager is not null. So the new code looks as
follows
public void updateStatisticsInBackGround(String schemaName,
String tableName, String indexName) throws
StandardException{
String url = null;
Properties info = null;
if (executorForUpdateStatistics==null)
{
executorForUpdateStatistics = new
ThreadPoolExecutor(5,5,0L,
TimeUnit.MILLISECONDS,
new LinkedBlockingQueue(5));
executorForUpdateStatistics.setRejectedExecutionHandler(
new
ThreadPoolExecutor.CallerRunsPolicy());
}
ContextService csf = ContextService.getFactory();
ContextManager cm = csf.getCurrentContextManager();
ConnectionContext localCC = null;
/*
cm is null the very first time, and whenever
we aren't actually nested.
*/
if (cm != null) {
localCC = (ConnectionContext)
(cm.getContext(ConnectionContext.CONTEXT_ID));
TransactionResourceImpl tr = localCC.getTR();
url = tr.getUrl();
info = tr.getInfo();
executorForUpdateStatistics.execute(new
BackgroundUpdateStatisticTask
(schemaName,
tableName, indexName,
url, info));
}
return;
}
}
Will work on addressing the remaining comments.
> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
> Issue Type: New Feature
> Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
> Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt,
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, DERBY_3788_Mgr.java,
> DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good
> for Derby to provide an automatic way of updating the statistics without
> requiring to run the stored procedure manually. There was some discussion on
> DERBY-269 about providing the 0-admin way. I have copied it here for
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM
> Some sort of zero admin solution for updating statistics would be prefferable
> to the manual 'update statistics'
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM
> I have not seen any other suggestions, how about the following zero admin
> solution? It is not perfect - suggestions welcome.
> Along with the statistics storing, save how many rows were in the table when
> exact statistics were calculated. This number is 0 if none have been
> calculated because index creation happened on an empty table. At query
>
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652114#action_12652114 ] Mamta A. Satoor commented on DERBY-3788: My understanding of DERBY-3892 is as follows 1)A query is exectued and the best plan for it is picked up based on the current data in the database 2)The data changes quite a bit which makes the plan chosen for the query not an optimal plan but during the next execution of the query, we continue to use the plan chosen in step 1). Based on the above understanding of DERBY-3892, the current work going on for DERBY-3788 is not going to fix DERBY-3892. This jira entry will go and build the statistics if during the query compilation it is found that the required statistics do not exist. Hope this answers your question about DERBY-3892, Dag. > Provide a zero-admin way of updating the statisitcs of an index > --- > > Key: DERBY-3788 > URL: https://issues.apache.org/jira/browse/DERBY-3788 > Project: Derby > Issue Type: New Feature > Components: Performance >Affects Versions: 10.5.0.0 >Reporter: Mamta A. Satoor >Assignee: Mamta A. Satoor > Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, > DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, DERBY_3788_Mgr.java, > DERBY_3788_Repro.java > > > DERBY-269 provided a manual way of updating the statistics using the new > system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good > for Derby to provide an automatic way of updating the statistics without > requiring to run the stored procedure manually. There was some discussion on > DERBY-269 about providing the 0-admin way. I have copied it here for > reference. > * > Kathey Marsden - 22/May/05 03:53 PM > Some sort of zero admin solution for updating statistics would be prefferable > to the manual 'update statistics' > * > * > Mike Matrigali - 11/Jun/08 12:37 PM > I have not seen any other suggestions, how about the following zero admin > solution? It is not perfect - suggestions welcome. > Along with the statistics storing, save how many rows were in the table when > exact statistics were calculated. This number is 0 if none have been > calculated because index creation happened on an empty table. At query > compile time when we look up statistics we automatically recalculate the > statistics at certain threshholds - say something like row count growing past > next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere > around how many rows we can process in some small amount of time - like 1 > second on a modern laptop. If we are worried about response time, maybe we > background queue the stat gathering rather than waiting with maybe some quick > load if no stat has ever been gathered. The background gathering could be > optimized to not interfere with locks by using read uncommitted. > I think it would be useful to also have the manual call just to make it easy > to support customers and debug issues in the field. There is proably always > some dynamic data distribution change that in some case won't be picked up by > the automatic algorithm. Also just very useful for those who have complete > control of the create ddl, load data, run stats, deliver application process. > * -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649882#action_12649882 ] Dag H. Wanvik commented on DERBY-3788: -- Would a solution to this issue solve DERBY-3892? If so, it might be good to say so there and link it to this issue. > Provide a zero-admin way of updating the statisitcs of an index > --- > > Key: DERBY-3788 > URL: https://issues.apache.org/jira/browse/DERBY-3788 > Project: Derby > Issue Type: New Feature > Components: Performance >Affects Versions: 10.5.0.0 >Reporter: Mamta A. Satoor >Assignee: Mamta A. Satoor > Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, > DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, DERBY_3788_Mgr.java, > DERBY_3788_Repro.java > > > DERBY-269 provided a manual way of updating the statistics using the new > system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good > for Derby to provide an automatic way of updating the statistics without > requiring to run the stored procedure manually. There was some discussion on > DERBY-269 about providing the 0-admin way. I have copied it here for > reference. > * > Kathey Marsden - 22/May/05 03:53 PM > Some sort of zero admin solution for updating statistics would be prefferable > to the manual 'update statistics' > * > * > Mike Matrigali - 11/Jun/08 12:37 PM > I have not seen any other suggestions, how about the following zero admin > solution? It is not perfect - suggestions welcome. > Along with the statistics storing, save how many rows were in the table when > exact statistics were calculated. This number is 0 if none have been > calculated because index creation happened on an empty table. At query > compile time when we look up statistics we automatically recalculate the > statistics at certain threshholds - say something like row count growing past > next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere > around how many rows we can process in some small amount of time - like 1 > second on a modern laptop. If we are worried about response time, maybe we > background queue the stat gathering rather than waiting with maybe some quick > load if no stat has ever been gathered. The background gathering could be > optimized to not interfere with locks by using read uncommitted. > I think it would be useful to also have the manual call just to make it easy > to support customers and debug issues in the field. There is proably always > some dynamic data distribution change that in some case won't be picked up by > the automatic algorithm. Also just very useful for those who have complete > control of the create ddl, load data, run stats, deliver application process. > * -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649820#action_12649820 ] Mamta A. Satoor commented on DERBY-3788: Thanks again for your time, Knut. I will try to address some of the questions in this comment. I am planning on taking next week as vacation so will get back on this task after that. 1)In this jira entry, missing statisitcs will be detected when the user executes a select. I had originally just for the sake of trying attempted to create the statistics in the select thread when I find that the statistics are missing. But executing an update inside of a select processing caused problem because data dictionary was not in write mode. Because of that, I couldn't update the statistics during the select processing. One very simple example where one will find missing statistics is as follows create table t1 (c11 int, c12 char); create index i1 on t1(c11); insert into t1 values (1,'1'),(2,'2'),(3,'3'); select * from sys.sysstatistics; My goal is to fire the statistics in background when user executes say following sql select * from t1 where c11=1; 2)As for the calls to setting the context and unsetting, I think you are right. I will remove them and see how it goes. > Provide a zero-admin way of updating the statisitcs of an index > --- > > Key: DERBY-3788 > URL: https://issues.apache.org/jira/browse/DERBY-3788 > Project: Derby > Issue Type: New Feature > Components: Performance >Affects Versions: 10.5.0.0 >Reporter: Mamta A. Satoor >Assignee: Mamta A. Satoor > Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, > DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, DERBY_3788_Mgr.java, > DERBY_3788_Repro.java > > > DERBY-269 provided a manual way of updating the statistics using the new > system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good > for Derby to provide an automatic way of updating the statistics without > requiring to run the stored procedure manually. There was some discussion on > DERBY-269 about providing the 0-admin way. I have copied it here for > reference. > * > Kathey Marsden - 22/May/05 03:53 PM > Some sort of zero admin solution for updating statistics would be prefferable > to the manual 'update statistics' > * > * > Mike Matrigali - 11/Jun/08 12:37 PM > I have not seen any other suggestions, how about the following zero admin > solution? It is not perfect - suggestions welcome. > Along with the statistics storing, save how many rows were in the table when > exact statistics were calculated. This number is 0 if none have been > calculated because index creation happened on an empty table. At query > compile time when we look up statistics we automatically recalculate the > statistics at certain threshholds - say something like row count growing past > next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere > around how many rows we can process in some small amount of time - like 1 > second on a modern laptop. If we are worried about response time, maybe we > background queue the stat gathering rather than waiting with maybe some quick > load if no stat has ever been gathered. The background gathering could be > optimized to not interfere with locks by using read uncommitted. > I think it would be useful to also have the manual call just to make it easy > to support customers and debug issues in the field. There is proably always > some dynamic data distribution change that in some case won't be picked up by > the automatic algorithm. Also just very useful for those who have complete > control of the create ddl, load data, run stats, deliver application process. > * -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648558#action_12648558 ] Knut Anders Hatlen commented on DERBY-3788: --- Thanks for the updated patch, Mamta. I'm afraid I cannot offer very much guidance, but I have some questions and comments: 1) This patch addresses the problem with non-existing statistics, not the problem with outdated statistics. In which situations don't the statistics exist? If it doesn't happen very often, it might be fine to update the statistics in the same thread. 2) Are the calls to EmbedConnection30.setupContextStack() and restoreContextStack() needed around the call to execute()? I thought execute() would call setup/restoreContextStack() itself. 3) Creating an EmbedConnection30 object directly breaks the modularity. Unless the calls to the internal methods are necessary, it may be better to use InternalDriver.activeDriver().connect(url, info) instead. 4) I think that the creation of a new connection will reboot the database if it has been shut down in the user thread, which may lead to unpredictable behaviour. It also seems like it will preserve all connection attributes, like attributes to reencrypt the database or to start replication master. 5) There's a comment in DDImpl5.updateStatisticsInBackGround() saying that "cm is null the very first time, and whenever we aren't actually nested." I'm not sure I understand that comment. Why is it null the first time? And isn't the method always called in a nested context? And if it is null, wouldn't that cause a NullPointerException in EmbedConnection's constructor when url=null is passed in? 6) DDImpl5.updateStatisticsInBackGround() updates the shared variable executorForUpdateStatistics if it is null. But it is not protected by synchronization, so race conditions are possible. 7) DDImpl5.stop() should call super.stop(). 8) In BackgroundUpdateStatisticsTask, using a prepared statement with the table name and the index name parametrized would be better because it would handle quoting special characters correctly (not handled in the current patch) and it would reduce the number of entries in the statement cache. 9) As to the locking issues, I would have tried to call Connection.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED) in the background thread to see if that solved/reduced the issues. > Provide a zero-admin way of updating the statisitcs of an index > --- > > Key: DERBY-3788 > URL: https://issues.apache.org/jira/browse/DERBY-3788 > Project: Derby > Issue Type: New Feature > Components: Performance >Affects Versions: 10.5.0.0 >Reporter: Mamta A. Satoor >Assignee: Mamta A. Satoor > Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, > DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, DERBY_3788_Mgr.java, > DERBY_3788_Repro.java > > > DERBY-269 provided a manual way of updating the statistics using the new > system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good > for Derby to provide an automatic way of updating the statistics without > requiring to run the stored procedure manually. There was some discussion on > DERBY-269 about providing the 0-admin way. I have copied it here for > reference. > * > Kathey Marsden - 22/May/05 03:53 PM > Some sort of zero admin solution for updating statistics would be prefferable > to the manual 'update statistics' > * > * > Mike Matrigali - 11/Jun/08 12:37 PM > I have not seen any other suggestions, how about the following zero admin > solution? It is not perfect - suggestions welcome. > Along with the statistics storing, save how many rows were in the table when > exact statistics were calculated. This number is 0 if none have been > calculated because index creation happened on an empty table. At query > compile time when we look up statistics we automatically recalculate the > statistics at certain threshholds - say something like row count growing past > next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere > around how many rows we can process in some small amount of time - like 1 > second on a modern laptop. If we are worried about response time, maybe we > background queue the stat gathering rather than waiting with maybe some quick > load if no stat has ever been gathered. The background gathering could be > optimized to not interfere with locks by using read uncommitted. > I think it would be useful to also have the manual call just to make it easy > to support customers and debug issues in the field. There is proably always > some dynamic data distribution change that in some case won't be picked up by > the automatic a
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647253#action_12647253 ] Knut Anders Hatlen commented on DERBY-3788: --- Hi Mamta, Sorry for posting the comments so late. I don't see the DataDictionaryImpl5 class in the patch. Did you forget to do svn add? Which isolation level does the background thread use when it updates the statistics? I see that Mike has suggested read uncommitted in order to minimize the impact on locking. > Provide a zero-admin way of updating the statisitcs of an index > --- > > Key: DERBY-3788 > URL: https://issues.apache.org/jira/browse/DERBY-3788 > Project: Derby > Issue Type: New Feature > Components: Performance >Affects Versions: 10.5.0.0 >Reporter: Mamta A. Satoor >Assignee: Mamta A. Satoor > Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, > DERBY_3788_Mgr.java, DERBY_3788_Repro.java > > > DERBY-269 provided a manual way of updating the statistics using the new > system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good > for Derby to provide an automatic way of updating the statistics without > requiring to run the stored procedure manually. There was some discussion on > DERBY-269 about providing the 0-admin way. I have copied it here for > reference. > * > Kathey Marsden - 22/May/05 03:53 PM > Some sort of zero admin solution for updating statistics would be prefferable > to the manual 'update statistics' > * > * > Mike Matrigali - 11/Jun/08 12:37 PM > I have not seen any other suggestions, how about the following zero admin > solution? It is not perfect - suggestions welcome. > Along with the statistics storing, save how many rows were in the table when > exact statistics were calculated. This number is 0 if none have been > calculated because index creation happened on an empty table. At query > compile time when we look up statistics we automatically recalculate the > statistics at certain threshholds - say something like row count growing past > next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere > around how many rows we can process in some small amount of time - like 1 > second on a modern laptop. If we are worried about response time, maybe we > background queue the stat gathering rather than waiting with maybe some quick > load if no stat has ever been gathered. The background gathering could be > optimized to not interfere with locks by using read uncommitted. > I think it would be useful to also have the manual call just to make it easy > to support customers and debug issues in the field. There is proably always > some dynamic data distribution change that in some case won't be picked up by > the automatic algorithm. Also just very useful for those who have complete > control of the create ddl, load data, run stats, deliver application process. > * -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646924#action_12646924 ] Mamta A. Satoor commented on DERBY-3788: I was wondering if anyone got a chance to look at the patch and have any comments about how I am getting an EmbedConnection in the background thread to update the statistics. Thanks. > Provide a zero-admin way of updating the statisitcs of an index > --- > > Key: DERBY-3788 > URL: https://issues.apache.org/jira/browse/DERBY-3788 > Project: Derby > Issue Type: New Feature > Components: Performance >Affects Versions: 10.5.0.0 >Reporter: Mamta A. Satoor >Assignee: Mamta A. Satoor > Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, > DERBY_3788_Mgr.java, DERBY_3788_Repro.java > > > DERBY-269 provided a manual way of updating the statistics using the new > system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good > for Derby to provide an automatic way of updating the statistics without > requiring to run the stored procedure manually. There was some discussion on > DERBY-269 about providing the 0-admin way. I have copied it here for > reference. > * > Kathey Marsden - 22/May/05 03:53 PM > Some sort of zero admin solution for updating statistics would be prefferable > to the manual 'update statistics' > * > * > Mike Matrigali - 11/Jun/08 12:37 PM > I have not seen any other suggestions, how about the following zero admin > solution? It is not perfect - suggestions welcome. > Along with the statistics storing, save how many rows were in the table when > exact statistics were calculated. This number is 0 if none have been > calculated because index creation happened on an empty table. At query > compile time when we look up statistics we automatically recalculate the > statistics at certain threshholds - say something like row count growing past > next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere > around how many rows we can process in some small amount of time - like 1 > second on a modern laptop. If we are worried about response time, maybe we > background queue the stat gathering rather than waiting with maybe some quick > load if no stat has ever been gathered. The background gathering could be > optimized to not interfere with locks by using read uncommitted. > I think it would be useful to also have the manual call just to make it easy > to support customers and debug issues in the field. There is proably always > some dynamic data distribution change that in some case won't be picked up by > the automatic algorithm. Also just very useful for those who have complete > control of the create ddl, load data, run stats, deliver application process. > * -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629192#action_12629192
]
Mamta A. Satoor commented on DERBY-3788:
I now do have the basic framework of using the ThreadPoolExecutor in new
extended class DataDictionary5(DD5). I have called the method in DD5 as
updateStatisticsInBackGround rather than scheduleUpdateStatisticstask. This
method in DD5 right now just queues println tasks everytime an update
statistics request is sent its ways. This is what the method code in DD5 looks
like right now
public void updateStatisticsInBackGround(String schemaName,
String tableName, String indexName) {
System.out.println("came to jdk 1.5 implementation");
if (executorForUpdateStatistics==null)
{
executorForUpdateStatistics = new
ThreadPoolExecutor(5,5,0L,
TimeUnit.MILLISECONDS,
new LinkedBlockingQueue(5));
executorForUpdateStatistics.setRejectedExecutionHandler(
new
ThreadPoolExecutor.CallerRunsPolicy());
}
executorForUpdateStatistics.execute(new
BackgroundUpdateStatisticTask (schemaName,
tableName, indexName));
return;
}
The new class BackgroundUpdateStatisticTask right now just does println as
shown below
class BackgroundUpdateStatisticTask implements Runnable{
private String schemaName;
private String tableName;
private String indexName;
public BackgroundUpdateStatisticTask(String schemaName,
String tableName, String indexName) {
this.schemaName=schemaName;
this.tableName=tableName;
this.indexName=indexName;
}
public void run()
{
System.out.println("Hello World : Updating statistics for "
+ schemaName + ":" + tableName + ":" +
indexName);
}
}
As the next step, I now want to actually do update statistics in
BackgroundUpdateStatisticTask rather than just println. I thought I would be
able to do something like following in the run method in
BackgroundUpdateStatisticTask
public void run()
{
System.out.println("Hello World : Updating statistics for "
+ schemaName + ":" + tableName + ":" +
indexName);
try {
SystemProcedures.SYSCS_UPDATE_STATISTICS(schemaName,
tableName,
indexName);
} catch (SQLException e)
{
System.out.println("got exception");
e.printStackTrace();
}
}
But a call to SystemProcedures.SYSCS_UPDATE_STATISTICS inside of the run method
results in
java.sql.SQLException: No current connection.
at
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:87)
at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:103)
at org.apache.derby.impl.jdbc.Util.noCurrentConnection(Util.java:207)
at
org.apache.derby.catalog.SystemProcedures.getDefaultConn(SystemProcedures.java:185)
at
org.apache.derby.catalog.SystemProcedures.SYSCS_UPDATE_STATISTICS(SystemProcedures.java:737)
at
org.apache.derby.impl.sql.catalog.BackgroundUpdateStatisticTask.run(DataDictionaryImpl5.java:90)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)
So, it appears that I do not have the context setting done correctly in order
to run the stored procedure implementation in
SystemProcedures.SYSCS_UPDATE_STATISTICS. I am currently trying to see how we
do the required setup when a JDBC connection is made but at this point, I am
not clear on how might be able to use some of that code from JDBC connection
time into BackgroundUpdateStatisticTask.run method. I will appreciate any help
I can get to do the context setting so I can run the update statistics. thanks.
> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
> Issue Type: New Feature
> Components: Performa
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626713#action_12626713
]
Mamta A. Satoor commented on DERBY-3788:
As per my comment on this jira entry yesterday, I am trying to use
ThreadPoolExecutor in Derby codeline so it can be used to fire the update
statistics tasks in the background. ThreadPoolExecutor was added as part of
jdk1.5 When I include the import of this class into DataDictionary.java(I will
refer to it as DD), I ofcourse need to make sure that the DD.java gets compiled
with 1.5 and higher. But at run time, requiring the user to have 1.5 or higher
is not going to work.
At this point, I am thinking of addressing this by having a subclass of DD,
called say DD5 which will be loaded by the monitor code if we know that we are
dealing with jdk 1.5 or higher. For the earlier versions, *for now*, trying to
schedule update statistics tasks in the background will be a no-op. This way, I
will be able to have the new code run(rather do no-op for jdk1.4 and lower) in
all supported versions of jdks. If anyone has any feedback on my approach,
please let me know.
Some background information : iapi.sql.dictionary.TableDescriptor has an
existing method called statisticsExist which will return true if the statistics
exist. This method gets called by the query optimization phase along with
AlterTableConstantAction. For the query optimization phase, we want to be able
to schedule update statistics tasks in the background if
TableDescriptor.statisticsExist () returns false. Following is the pseudo-code
I have in mind:
during the query optimization
if (TableDescriptor.statisticsExist == false) {
TableDescriptor.createStatisticsInBackGround(schemaname, tablename,
indexname) <- new method in TableDescriptor
}
TableDescriptor with new method
createStatisticsInBackGround(.) {
DD.scheduleUpdateStatisticstask(...) <-- new method in DD.
}
DD5 will do the real work of scheduling the taks in background. In DD(ie when
running with jdk1.4 and lower), this method will be no-op.
DD5 with new method (for jdk1.5 and higher)
scheduleUpdateStatisticstask(...) {
create ThreadPoolExecutor if does not already exist
queue update statistics task on the ThreadPoolExecutor
}
DD with new method (for jdk1.4 and lower)
scheduleUpdateStatisticstask(...) {
no-op
}
> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
> Issue Type: New Feature
> Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
> Attachments: DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good
> for Derby to provide an automatic way of updating the statistics without
> requiring to run the stored procedure manually. There was some discussion on
> DERBY-269 about providing the 0-admin way. I have copied it here for
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM
> Some sort of zero admin solution for updating statistics would be prefferable
> to the manual 'update statistics'
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM
> I have not seen any other suggestions, how about the following zero admin
> solution? It is not perfect - suggestions welcome.
> Along with the statistics storing, save how many rows were in the table when
> exact statistics were calculated. This number is 0 if none have been
> calculated because index creation happened on an empty table. At query
> compile time when we look up statistics we automatically recalculate the
> statistics at certain threshholds - say something like row count growing past
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere
> around how many rows we can process in some small amount of time - like 1
> second on a modern laptop. If we are worried about response time, maybe we
> background queue the stat gathering rather than waiting with maybe some quick
> load if no stat has ever been gathered. The background gathering could be
> optimized to not interfere with locks by using read uncommitted.
> I think it would be useful to also have the manual call just to make it easy
> to support customers and debug issues in the field. There is proably always
> some dynamic data distribution change that in some case won't be picked up by
> the automatic algorithm. Also just very useful for those who
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621910#action_12621910
]
Mamta A. Satoor commented on DERBY-3788:
I am looking to see if the statistics do not exist for an index during a query
compile phase, then try to create them while compiling the query. But trying to
create statistics during query compilation throw following exception.
ERROR XCL21: You are trying to execute a Data Definition statement (CREATE,
DROP, or ALTER) while preparing a different statement. This is not allowed. It
can happen if you execute a Data Definition statement from within a static
initializer of a Java class that is being used from within a SQL statement.
The exception is being thrown from
DataDictionaryImpl.startWriting(LanguageConnectionContext) through following
piece of code
/*
** Don't allow DDL if we're binding a SQL statement.
*/
if (lcc.getBindCount() != 0)
{
throw StandardException.newException(SQLState.LANG_DDL_IN_BIND);
}
My question is can we disable this say when statistics are getting updated
while a query is getting executed? Is there any other way of achieving creating
statistics on fly when we know that we are going to need them during the
execution of the current query? Appreciate any feedack.
> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
> Issue Type: New Feature
> Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
>
> DERBY-269 provided a manual way of updating the statistics using the new
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good
> for Derby to provide an automatic way of updating the statistics without
> requiring to run the stored procedure manually. There was some discussion on
> DERBY-269 about providing the 0-admin way. I have copied it here for
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM
> Some sort of zero admin solution for updating statistics would be prefferable
> to the manual 'update statistics'
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM
> I have not seen any other suggestions, how about the following zero admin
> solution? It is not perfect - suggestions welcome.
> Along with the statistics storing, save how many rows were in the table when
> exact statistics were calculated. This number is 0 if none have been
> calculated because index creation happened on an empty table. At query
> compile time when we look up statistics we automatically recalculate the
> statistics at certain threshholds - say something like row count growing past
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere
> around how many rows we can process in some small amount of time - like 1
> second on a modern laptop. If we are worried about response time, maybe we
> background queue the stat gathering rather than waiting with maybe some quick
> load if no stat has ever been gathered. The background gathering could be
> optimized to not interfere with locks by using read uncommitted.
> I think it would be useful to also have the manual call just to make it easy
> to support customers and debug issues in the field. There is proably always
> some dynamic data distribution change that in some case won't be picked up by
> the automatic algorithm. Also just very useful for those who have complete
> control of the create ddl, load data, run stats, deliver application process.
> *
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618494#action_12618494 ] Mamta A. Satoor commented on DERBY-3788: Currently, iapi.sql.dictionary.TableDescriptor has a method called statisticsExist which checks for the passed conglomerate if there is statistics availble or not. This method gets called 1)during the optimization phase of a query by impl.sql.compile.FromBaseTable.estimateCost 2)during the optimization phase of a query by impl.sql.compile.PredicateList.selectivity 3)by impl.sql.execute.AlterTableConstantAction to determine if the statistics already exist and if yes, then drop and recreate it. This happens during the compress table request(not sure if anything other than compress table in AlterTableConstantAction calls it but we do not need to worry about it because we only want to create the statistics if they don't already exist during the compile phase of a sql query) What I am considering doing is adding another method to TableDescriptor called say createStatistics which will be called during the optimization phase of a query by the 2 classes impl.sql.compile.FromBaseTable and impl.sql.compile.PredicateList. The only awkward thing I am finding is the code required to update the statistics need many objects like LanguageConnectionContext, DataDictionary, TransactionController, etc (there may be more) and they are not available to TableDescriptor class so I will have to pass these objects when a call is made to the new method in TableDescriptor from impl.sql.compile. Does this sound like not a smooth way of getting the objects? Maybe this new method should be defined in DataDictionary rather than TableDescriptor. I will work more on what is the right place for the new method. In the mean time, if anyone has any thoughts, please post them here. Once we have this new method to create the statistics, we can hopefully remove the redudant code that already exists to create the statistics in 3 different classes, namely, AlterTableConstantAction, CreateIndexConstantAction and InsetResultSet. > Provide a zero-admin way of updating the statisitcs of an index > --- > > Key: DERBY-3788 > URL: https://issues.apache.org/jira/browse/DERBY-3788 > Project: Derby > Issue Type: New Feature > Components: Performance >Affects Versions: 10.5.0.0 >Reporter: Mamta A. Satoor >Assignee: Mamta A. Satoor > > DERBY-269 provided a manual way of updating the statistics using the new > system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good > for Derby to provide an automatic way of updating the statistics without > requiring to run the stored procedure manually. There was some discussion on > DERBY-269 about providing the 0-admin way. I have copied it here for > reference. > * > Kathey Marsden - 22/May/05 03:53 PM > Some sort of zero admin solution for updating statistics would be prefferable > to the manual 'update statistics' > * > * > Mike Matrigali - 11/Jun/08 12:37 PM > I have not seen any other suggestions, how about the following zero admin > solution? It is not perfect - suggestions welcome. > Along with the statistics storing, save how many rows were in the table when > exact statistics were calculated. This number is 0 if none have been > calculated because index creation happened on an empty table. At query > compile time when we look up statistics we automatically recalculate the > statistics at certain threshholds - say something like row count growing past > next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere > around how many rows we can process in some small amount of time - like 1 > second on a modern laptop. If we are worried about response time, maybe we > background queue the stat gathering rather than waiting with maybe some quick > load if no stat has ever been gathered. The background gathering could be > optimized to not interfere with locks by using read uncommitted. > I think it would be useful to also have the manual call just to make it easy > to support customers and debug issues in the field. There is proably always > some dynamic data distribution change that in some case won't be picked up by > the automatic algorithm. Also just very useful for those who have complete > control of the create ddl, load data, run stats, deliver application process. > * -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618461#action_12618461 ] Mamta A. Satoor commented on DERBY-3788: I am looking at creating the statistics automatically if they are found to be absent when needed. As a next step later, we can look at updating the statistics automatically under certain threshold if they appear to be out of sync. > Provide a zero-admin way of updating the statisitcs of an index > --- > > Key: DERBY-3788 > URL: https://issues.apache.org/jira/browse/DERBY-3788 > Project: Derby > Issue Type: New Feature > Components: Performance >Affects Versions: 10.5.0.0 >Reporter: Mamta A. Satoor > > DERBY-269 provided a manual way of updating the statistics using the new > system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good > for Derby to provide an automatic way of updating the statistics without > requiring to run the stored procedure manually. There was some discussion on > DERBY-269 about providing the 0-admin way. I have copied it here for > reference. > * > Kathey Marsden - 22/May/05 03:53 PM > Some sort of zero admin solution for updating statistics would be prefferable > to the manual 'update statistics' > * > * > Mike Matrigali - 11/Jun/08 12:37 PM > I have not seen any other suggestions, how about the following zero admin > solution? It is not perfect - suggestions welcome. > Along with the statistics storing, save how many rows were in the table when > exact statistics were calculated. This number is 0 if none have been > calculated because index creation happened on an empty table. At query > compile time when we look up statistics we automatically recalculate the > statistics at certain threshholds - say something like row count growing past > next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere > around how many rows we can process in some small amount of time - like 1 > second on a modern laptop. If we are worried about response time, maybe we > background queue the stat gathering rather than waiting with maybe some quick > load if no stat has ever been gathered. The background gathering could be > optimized to not interfere with locks by using read uncommitted. > I think it would be useful to also have the manual call just to make it easy > to support customers and debug issues in the field. There is proably always > some dynamic data distribution change that in some case won't be picked up by > the automatic algorithm. Also just very useful for those who have complete > control of the create ddl, load data, run stats, deliver application process. > * -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index
[ https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618394#action_12618394 ] Mamta A. Satoor commented on DERBY-3788: Added a new test case with revision 681085 which shows that updating the statistics will make a query pickup better index compare to prior to statistics availability > Provide a zero-admin way of updating the statisitcs of an index > --- > > Key: DERBY-3788 > URL: https://issues.apache.org/jira/browse/DERBY-3788 > Project: Derby > Issue Type: New Feature > Components: Performance >Affects Versions: 10.5.0.0 >Reporter: Mamta A. Satoor > > DERBY-269 provided a manual way of updating the statistics using the new > system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good > for Derby to provide an automatic way of updating the statistics without > requiring to run the stored procedure manually. There was some discussion on > DERBY-269 about providing the 0-admin way. I have copied it here for > reference. > * > Kathey Marsden - 22/May/05 03:53 PM > Some sort of zero admin solution for updating statistics would be prefferable > to the manual 'update statistics' > * > * > Mike Matrigali - 11/Jun/08 12:37 PM > I have not seen any other suggestions, how about the following zero admin > solution? It is not perfect - suggestions welcome. > Along with the statistics storing, save how many rows were in the table when > exact statistics were calculated. This number is 0 if none have been > calculated because index creation happened on an empty table. At query > compile time when we look up statistics we automatically recalculate the > statistics at certain threshholds - say something like row count growing past > next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere > around how many rows we can process in some small amount of time - like 1 > second on a modern laptop. If we are worried about response time, maybe we > background queue the stat gathering rather than waiting with maybe some quick > load if no stat has ever been gathered. The background gathering could be > optimized to not interfere with locks by using read uncommitted. > I think it would be useful to also have the manual call just to make it easy > to support customers and debug issues in the field. There is proably always > some dynamic data distribution change that in some case won't be picked up by > the automatic algorithm. Also just very useful for those who have complete > control of the create ddl, load data, run stats, deliver application process. > * -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
