[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2010-07-26 Thread Mamta A. Satoor (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892343#action_12892343
 ] 

Mamta A. Satoor commented on DERBY-3788:


Just wanted to summarize what has been tried for this jira so far and a 
proposal for a possible solution worth trying
1)First I tried to see if the missing stats could be created inline when the 
SELECT query compilation finds that missing but that failed because SELECT 
query compilation marked the data dictionary in read only mode where as update 
statistics needs to update the data dictionary(this marking of read only for 
data dictionary happens in GenericStatement.prepMinion). Because of this 
read-only marking, we can't create the statistics inline of the compilation 
phase of the SELECT query. 
2)Second approach was during the SELECT compile phase, if we come across 
missing stats, then schedule a task for later to update the stats. Finish the 
SELECT compilation and execution after the update stat task is scheduled on the 
queue (the scheduling mechanism used is only available with jdk 1.5 and 
higher). When the scheduled task was later fired, it would run no current 
connection exception. The reason for that was that there was no context set for 
update statistics to run. Tried resolving this by creating a connection context 
through following call InternalDriver.activeDriver().connect(url, info) I had 
the SELECT compilation pass the needed information to the background task so I 
can create the connection. This is pretty crude way of getting a connection to 
update the statistics. Along with the fact that the code to create the 
connection context was not very modular, some of the existing tests failed with 
this changes because of locking issues. This probably is because the locks held 
by the background thread for update stats interfered with locks required by the 
rest of the test. 
3)The third approach I tried was when the SELECT query detects missing stats, 
abort the query compilation, (thus removing the data dictionary from read-only 
mode), create the stats and then restart the original SELECT query compilation 
and execution. The hope with this approach was that it will avoid running into 
locking issues. If we decide to pursue this approach more, we need to make sure 
that we look at the possibility of stats still missing when the original query 
is compile the 2nd time. This can happen say because the user does not have 
privileges to update the stats, the update stat ran into locking issues. If we 
won't account for possibility of missing stats during the 2nd time through the 
original query, we can end up in an infinite loop. Another problem with this 
approach was that update stats were happening in the same transaction as the 
original SELECT query. Which means that the locks acquired by the update stat 
will not be released until the transaction is committed. Came across locking 
issues with this when I ran the existing junit and derbyall tests. One way to 
fix this could be to start a nested user transaction and do the update stat in 
that transaction and commit that transaction(I think something like this is 
done for identity columns. If yes, then we might find pointers there to use for 
update stats in nested user transaction). Thus any locks acquired by the update 
stat work will be released. And after that, go back to the original query again 
for compilation and execution. 
4)With any approach we take, we should detect the case where the table is empty 
and hence the stats might be missing. For such a case, we should not try to 
fire stats creation task. to rephrase, during the compile phase of the SELECT 
query, when we look for statistics, first check if the table is empty and if 
yes then skip the code for collecting stats. From what I recall, one can check 
if the table is empty by issuing open scan controller and asking it for the 
number of rows in the table. If we do not check for empty table, then we will 
end up in infinite loop. 
5)For read-only db we should detect that we are readonly and don't try to do 
update statistics 
6)Also, we probably want to create statistics only for user tables? Haven't 
thought enough about it. 
7)What if the user executing the query is not the owner of the table. should 
statistics still be created? I think if the user does not have enough 
privileges, then skip the step of creating stats since it's going to fail 
anyways. 
8)Once statistics are created, notify tabledescriptor about it because 
tabledescriptor  cahces available statistics and it will never know of the new 
statistics since tabledescriptor holds on to that cache. 
9)Once we have update stats working, another improvement could be to identify 
the cached compiled queries that may benefit from updated stats and have those 
queries recompile when they are executed next time. In other words, the 
relevant queries shou

[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2009-03-10 Thread Bryan Pendleton (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680484#action_12680484
 ] 

Bryan Pendleton commented on DERBY-3788:


Yes, that's the technique I remembered. Thanks for tracking it down! That seems 
fine to me.


> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
> Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, 
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, 
> DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, 
> DERBY3788_update_all_missing_stats_after_bind_patch4_diff.txt, 
> DERBY3788_update_all_missing_stats_after_bind_patch4_stat.txt, 
> DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2009-03-10 Thread Mamta A. Satoor (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680478#action_12680478
 ] 

Mamta A. Satoor commented on DERBY-3788:


Bryan, thanks for taking the time for this jira entry. Yes, I was planning on 
using nowait so that we do not wait and eventually possibly locking issues with 
parent. This is exactly what we do in InsertResultSet.getSetAutoincrementValue 
as shown below
try 
{
/* If tcToUse == tc, then we are using parent xaction-- this
   can happen if for some reason we couldn't start a nested
   transaction
   */
newValue = dd.getSetAutoincrementValue(
   constants.autoincRowLocation[index],
   tcToUse, true, aiCache[index], (tcToUse == tc));
}
catch (StandardException se)
{
if (tcToUse == tc)
{
/* we've using the parent xaction and we've timed out; just
   throw an error and exit.
*/
throw se;
}
if (se.getMessageId().equals(SQLState.LOCK_TIMEOUT))
{
// if we couldn't do this with a nested xaction, retry with
// parent-- we need to wait this time!
newValue = dd.getSetAutoincrementValue(
constants.autoincRowLocation[index],
tc, true, aiCache[index], true);
}


> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
> Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, 
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, 
> DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, 
> DERBY3788_update_all_missing_stats_after_bind_patch4_diff.txt, 
> DERBY3788_update_all_missing_stats_after_bind_patch4_stat.txt, 
> DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2009-03-09 Thread Bryan Pendleton (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680349#action_12680349
 ] 

Bryan Pendleton commented on DERBY-3788:


Seems like the nested transactions can have locking problems, too; if the 
nested transaction's
lock requests block, you can get an un-detected deadlock, I believe. I think we 
recently
had to change some code which took locks in a nested transaction to take those 
locks
"nowait", and if they couldn't be immediately satisfied, to retry the locks in 
the user transaction.
Unfortunately I don't recall exactly which issue had this multi-step lock retry 
behavior.

> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
> Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, 
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, 
> DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, 
> DERBY3788_update_all_missing_stats_after_bind_patch4_diff.txt, 
> DERBY3788_update_all_missing_stats_after_bind_patch4_stat.txt, 
> DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2009-02-06 Thread Mamta A. Satoor (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671349#action_12671349
 ] 

Mamta A. Satoor commented on DERBY-3788:


Currently, I am pursuing running the missing statistics inline in the compile 
phase of a SELECT query. The compile phase code is driven by 
GenericStatement.prepMinion. 

My first attempt was to run the update statistics inline in the optimize phase 
of a SELECT sql (because in the optimize phase of a query, we look for 
statistics for determining the query plan for execution and that is when we 
know that the statistics are missing) but there I ran into data dictionary 
being in read-only mode. In GenericStatement.prepMinion method, at line 307, we 
mark the data dictionary(DD) as read-only for the duration of bind and optimize 
phase and hence statistics can't be updated because DD has been marked readonly 
at this time. 

At line 422, after the optimize phase, we mark the DD as done reading and hence 
DD is not in read-only mode during the generate phase. Based on this, I tried 
running the statistics inline in the generate phase of SELECT query to 
establish the fact that the statistics can indeed be run in-;ine. I ran into 
only one coding issues DERBY-4048 (about AlterTableConstantAction not using 
execute transaction for updating the statistics). After resolving DERBY-4048, I 
am able to run the update statistics inline in the generate phase of a SELECT 
query for a simple test case as shown below
create table t1 (c11 int, c12 char);
create index i1 on t1(c11);
insert into t1 values (1,'1'),(2,'2'),(3,'3');
-- no statistics will be found for t1.i1 because table was empty when index i1 
was created
select * from sys.sysstatistics;
-- during the generate phase of following query, we will run update statistics 
inline for t1.i1
select * from t1 where c11=1;
-- this time, there will be statistics for t1.i1
select * from sys.sysstatistics;

Note, that there were no locking issues/privilege issues involved in the simple 
test case above. But when update statistics runs into some problem, the 
question is how do we just revert back the update statistics work and let the 
original SELECT query proceed with execution. When I ran junit, towards the 
very beginning, I saw lot of problems related to not having enough privileges, 
context stack issue etc because of update statistics.

I think the right approach would be to start with the compile phase of SELECT 
query and during optimization, when we find that the statistics are missing, 
then quit the compilation of SELECT query there, run the update statistics and 
refire the compilation of the SELECT query. This way, update statistics will 
not run into any locking issues with SELECT query. I am going to try to spend 
some time working on this approach and see how it goes. Would appreicate if 
anyone has any feedback.

> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
> Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, 
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, 
> DERBY3788_patch3_diff.txt, DERBY3788_patch3_stat.txt, DERBY_3788_Mgr.java, 
> DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> s

[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2008-12-02 Thread Mamta A. Satoor (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652427#action_12652427
 ] 

Mamta A. Satoor commented on DERBY-3788:


Knut, couple more responses to some of your comments.
3) Creating an EmbedConnection30 object directly breaks the 
modularity. Unless the calls to the internal methods are necessary, it 
may be better to use InternalDriver.activeDriver().connect(url, info) 
instead. 
There was no specific need to be directly creating EmbedConnection30 object, so 
I have used your recommendation to get the Connection object.

5) There's a comment in DDImpl5.updateStatisticsInBackGround() saying 
that "cm is null the very first time, and whenever we aren't actually 
nested." I'm not sure I understand that comment. Why is it null the 
first time? And isn't the method always called in a nested context? 
And if it is null, wouldn't that cause a NullPointerException in 
EmbedConnection's constructor when url=null is passed in? 
I used the current model to get the ContextManager similar to how we get it in 
jdbc,InternalDriver:getConnectionContext and there we do a check for null 
ContextManager. Looking at that code, I thought there might be a case where 
ContextManager could be null. To address the NullPointerException that may 
result because of url being null in case of null ContextManager, I have changed 
the code in DDImpl5.updateStatisticsInBackGround() to fire the background 
update statistics only if ContextManager is not null. So the new code looks as 
follows
public void updateStatisticsInBackGround(String schemaName,
String tableName, String indexName) throws 
StandardException{
String url = null;
Properties info = null;
if (executorForUpdateStatistics==null)
{
executorForUpdateStatistics = new 
ThreadPoolExecutor(5,5,0L,
TimeUnit.MILLISECONDS,
  new LinkedBlockingQueue(5));
executorForUpdateStatistics.setRejectedExecutionHandler(
new 
ThreadPoolExecutor.CallerRunsPolicy()); 
}
ContextService csf = ContextService.getFactory();

ContextManager cm = csf.getCurrentContextManager();
ConnectionContext localCC = null;

/*
cm is null the very first time, and whenever
we aren't actually nested.
 */
if (cm != null) {
localCC = (ConnectionContext)
(cm.getContext(ConnectionContext.CONTEXT_ID));
TransactionResourceImpl tr = localCC.getTR();
url = tr.getUrl();
info = tr.getInfo();
executorForUpdateStatistics.execute(new 
BackgroundUpdateStatisticTask 
(schemaName,
tableName, indexName, 
url, info));
}

return;
}
}

Will work on addressing the remaining comments. 

> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
> Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, 
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, DERBY_3788_Mgr.java, 
> DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> 

[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2008-12-01 Thread Mamta A. Satoor (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652114#action_12652114
 ] 

Mamta A. Satoor commented on DERBY-3788:


My understanding of DERBY-3892 is as follows
1)A query is exectued and the best plan for it is picked up based on the 
current data in the database
2)The data changes quite a bit which makes the plan chosen for the query not an 
optimal plan but during the next execution of the query, we continue to use the 
plan chosen in step 1). 

Based on the above understanding of DERBY-3892, the current work going on for 
DERBY-3788 is not going to fix DERBY-3892. This jira entry will go and build 
the statistics if during the query compilation it is found that the required 
statistics do not exist. Hope this answers your question about DERBY-3892, Dag.

> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
> Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, 
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, DERBY_3788_Mgr.java, 
> DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2008-11-21 Thread Dag H. Wanvik (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649882#action_12649882
 ] 

Dag H. Wanvik commented on DERBY-3788:
--

Would a solution to this issue solve DERBY-3892? If so, it might be good to say 
so there and link it to this issue.

> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
> Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, 
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, DERBY_3788_Mgr.java, 
> DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2008-11-21 Thread Mamta A. Satoor (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649820#action_12649820
 ] 

Mamta A. Satoor commented on DERBY-3788:


Thanks again for your time, Knut. I will try to address some of the questions 
in this comment. I am planning on taking next week as vacation so will get back 
on this task after that.
1)In this jira entry, missing statisitcs will be detected when the user 
executes a select. I had originally just for the sake of trying attempted to 
create the statistics in the select thread when I find that the statistics are 
missing. But executing an update inside of a select processing caused problem 
because data dictionary was not in write mode. Because of that, I couldn't 
update the statistics during the select processing. One very simple example 
where one will find missing statistics is as follows
create table t1 (c11 int, c12 char);
create index i1 on t1(c11);
insert into t1 values (1,'1'),(2,'2'),(3,'3');
select * from sys.sysstatistics;

My goal is to fire the statistics in background when user executes say 
following sql
select * from t1 where c11=1;

2)As for the calls to setting the context and unsetting, I think you are right. 
I will remove them and see how it goes.

> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
> Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, 
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, DERBY_3788_Mgr.java, 
> DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2008-11-18 Thread Knut Anders Hatlen (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648558#action_12648558
 ] 

Knut Anders Hatlen commented on DERBY-3788:
---

Thanks for the updated patch, Mamta.

I'm afraid I cannot offer very much guidance, but I have some
questions and comments:

1) This patch addresses the problem with non-existing statistics, not
the problem with outdated statistics. In which situations don't the
statistics exist? If it doesn't happen very often, it might be fine to
update the statistics in the same thread.

2) Are the calls to EmbedConnection30.setupContextStack() and
restoreContextStack() needed around the call to execute()? I thought
execute() would call setup/restoreContextStack() itself.

3) Creating an EmbedConnection30 object directly breaks the
modularity. Unless the calls to the internal methods are necessary, it
may be better to use InternalDriver.activeDriver().connect(url, info)
instead.

4) I think that the creation of a new connection will reboot the
database if it has been shut down in the user thread, which may lead
to unpredictable behaviour. It also seems like it will preserve all
connection attributes, like attributes to reencrypt the database or to
start replication master.

5) There's a comment in DDImpl5.updateStatisticsInBackGround() saying
that "cm is null the very first time, and whenever we aren't actually
nested." I'm not sure I understand that comment. Why is it null the
first time? And isn't the method always called in a nested context?
And if it is null, wouldn't that cause a NullPointerException in
EmbedConnection's constructor when url=null is passed in?

6) DDImpl5.updateStatisticsInBackGround() updates the shared variable
executorForUpdateStatistics if it is null. But it is not protected by
synchronization, so race conditions are possible.

7) DDImpl5.stop() should call super.stop().

8) In BackgroundUpdateStatisticsTask, using a prepared statement with
the table name and the index name parametrized would be better because
it would handle quoting special characters correctly (not handled in
the current patch) and it would reduce the number of entries in the
statement cache.

9) As to the locking issues, I would have tried to call
Connection.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED)
in the background thread to see if that solved/reduced the issues.

> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
> Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, 
> DERBY3788_patch2_diff.txt, DERBY3788_patch2_stat.txt, DERBY_3788_Mgr.java, 
> DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic a

[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2008-11-13 Thread Knut Anders Hatlen (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647253#action_12647253
 ] 

Knut Anders Hatlen commented on DERBY-3788:
---

Hi Mamta,

Sorry for posting the comments so late. I don't see the DataDictionaryImpl5 
class in the patch. Did you forget to do svn add?

Which isolation level does the background thread use when it updates the 
statistics? I see that Mike has suggested read uncommitted in order to minimize 
the impact on locking.

> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
> Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, 
> DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2008-11-12 Thread Mamta A. Satoor (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646924#action_12646924
 ] 

Mamta A. Satoor commented on DERBY-3788:


I was wondering if anyone got a chance to look at the patch and have any 
comments about how I am getting an EmbedConnection in the background thread to 
update the statistics. Thanks.

> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
> Attachments: DERBY3788_patch1_diff.txt, DERBY3788_patch1_stat.txt, 
> DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2008-09-08 Thread Mamta A. Satoor (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629192#action_12629192
 ] 

Mamta A. Satoor commented on DERBY-3788:


I now do have the basic framework of using the ThreadPoolExecutor in new 
extended class DataDictionary5(DD5). I have called the method in DD5 as 
updateStatisticsInBackGround rather than scheduleUpdateStatisticstask. This 
method in DD5 right now just queues println tasks everytime an update 
statistics request is sent its ways. This is what the method code in DD5 looks 
like right now

public void updateStatisticsInBackGround(String schemaName,
String tableName, String indexName) {
System.out.println("came to jdk 1.5 implementation");
if (executorForUpdateStatistics==null)
{
executorForUpdateStatistics = new 
ThreadPoolExecutor(5,5,0L,
TimeUnit.MILLISECONDS,
new LinkedBlockingQueue(5));
executorForUpdateStatistics.setRejectedExecutionHandler(
new 
ThreadPoolExecutor.CallerRunsPolicy()); 
}
executorForUpdateStatistics.execute(new 
BackgroundUpdateStatisticTask (schemaName,
tableName, indexName));
return;
}

The new class BackgroundUpdateStatisticTask right now just does println as 
shown below
class BackgroundUpdateStatisticTask implements Runnable{
private String schemaName;
private String tableName;
private String indexName;

public BackgroundUpdateStatisticTask(String schemaName,
String tableName, String indexName) {
this.schemaName=schemaName;
this.tableName=tableName;
this.indexName=indexName;
}
public void run()
{
System.out.println("Hello World : Updating statistics for "
+ schemaName + ":" + tableName + ":" + 
indexName);
}
}

As the next step, I now want to actually do update statistics in 
BackgroundUpdateStatisticTask rather than just println. I thought I would be 
able to do something like following in the run method in 
BackgroundUpdateStatisticTask 
public void run()
{
System.out.println("Hello World : Updating statistics for "
+ schemaName + ":" + tableName + ":" + 
indexName);
try {
SystemProcedures.SYSCS_UPDATE_STATISTICS(schemaName, 
tableName,
indexName);
} catch (SQLException e)
{
System.out.println("got exception");
e.printStackTrace();
}
}

But a call to SystemProcedures.SYSCS_UPDATE_STATISTICS inside of the run method 
results in 

java.sql.SQLException: No current connection.
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(SQLExceptionFactory.java:45)
at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:87)
at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Util.java:103)
at org.apache.derby.impl.jdbc.Util.noCurrentConnection(Util.java:207)
at 
org.apache.derby.catalog.SystemProcedures.getDefaultConn(SystemProcedures.java:185)
at 
org.apache.derby.catalog.SystemProcedures.SYSCS_UPDATE_STATISTICS(SystemProcedures.java:737)
at 
org.apache.derby.impl.sql.catalog.BackgroundUpdateStatisticTask.run(DataDictionaryImpl5.java:90)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
at java.lang.Thread.run(Thread.java:595)

So, it appears that I do not have the context setting done correctly in order 
to run the stored procedure implementation in 
SystemProcedures.SYSCS_UPDATE_STATISTICS. I am currently trying to see how we 
do the required setup when a JDBC connection is made but at this point, I am 
not clear on how might be able to use some of that code from JDBC connection 
time into BackgroundUpdateStatisticTask.run method. I will appreciate any help 
I can get to do the context setting so I can run the update statistics. thanks.

> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performa

[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2008-08-28 Thread Mamta A. Satoor (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626713#action_12626713
 ] 

Mamta A. Satoor commented on DERBY-3788:


As per my comment on this jira entry yesterday, I am trying to use 
ThreadPoolExecutor in Derby codeline so it can be used to fire the update 
statistics tasks in the background. ThreadPoolExecutor was added as part of 
jdk1.5 When I include the import of this class into DataDictionary.java(I will 
refer to it as DD), I ofcourse need to make sure that the DD.java gets compiled 
with 1.5 and higher. But at run time, requiring the user to have 1.5 or higher 
is not going to work. 

At this point, I am thinking of addressing this by having a subclass of DD, 
called say DD5 which will be loaded by the monitor code if we know that we are 
dealing with jdk 1.5 or higher. For the earlier versions, *for now*, trying to 
schedule update statistics tasks in the background will be a no-op. This way, I 
will be able to have the new code run(rather do no-op for jdk1.4 and lower) in 
all supported versions of jdks. If anyone has any feedback on my approach, 
please let me know.

Some background information : iapi.sql.dictionary.TableDescriptor has an 
existing method called statisticsExist which will return true if the statistics 
exist. This method gets called by the query optimization phase along with 
AlterTableConstantAction. For the query optimization phase, we want to be able 
to schedule update statistics tasks in the background if 
TableDescriptor.statisticsExist () returns false. Following is the pseudo-code 
I have in mind:

during the query optimization 
if (TableDescriptor.statisticsExist == false) {
TableDescriptor.createStatisticsInBackGround(schemaname, tablename, 
indexname) <- new method in TableDescriptor
}

TableDescriptor with new method
createStatisticsInBackGround(.) {
  DD.scheduleUpdateStatisticstask(...) <-- new method in DD. 
 }

DD5 will do the real work of scheduling the taks in background. In DD(ie when 
running with jdk1.4 and lower), this method will be no-op.

DD5 with new method (for jdk1.5 and higher)
scheduleUpdateStatisticstask(...) {
   create ThreadPoolExecutor if does not already exist
   queue update statistics task on the ThreadPoolExecutor
}

DD with new method (for jdk1.4 and lower)
scheduleUpdateStatisticstask(...) {
no-op
}


> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
> Attachments: DERBY_3788_Mgr.java, DERBY_3788_Repro.java
>
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who 

[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2008-08-12 Thread Mamta A. Satoor (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621910#action_12621910
 ] 

Mamta A. Satoor commented on DERBY-3788:


I am looking to see if the statistics do not exist for an index during a query 
compile phase, then try to create them while compiling the query. But trying to 
create statistics during query compilation throw following exception. 
ERROR XCL21: You are trying to execute a Data Definition statement (CREATE, 
DROP, or ALTER) while preparing a different statement. This is not allowed. It 
can happen if you execute a Data Definition statement from within a static 
initializer of a Java class that is being used from within a SQL statement.

The exception is being thrown from 
DataDictionaryImpl.startWriting(LanguageConnectionContext) through following 
piece of code
/*
** Don't allow DDL if we're binding a SQL statement.
*/
if (lcc.getBindCount() != 0)
{
throw StandardException.newException(SQLState.LANG_DDL_IN_BIND);
}

My question is can we disable this say when statistics are getting updated 
while a query is getting executed? Is there any other way of achieving creating 
statistics on fly when we know that we are going to need them during the 
execution of the current query? Appreciate any feedack.

> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2008-07-30 Thread Mamta A. Satoor (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618494#action_12618494
 ] 

Mamta A. Satoor commented on DERBY-3788:


Currently, iapi.sql.dictionary.TableDescriptor has a method called 
statisticsExist which checks for the passed conglomerate if there is statistics 
availble or not. This method gets called 
1)during the optimization phase of a query by 
impl.sql.compile.FromBaseTable.estimateCost
2)during the optimization phase of a query by 
impl.sql.compile.PredicateList.selectivity
3)by impl.sql.execute.AlterTableConstantAction to determine if the statistics 
already exist and if yes, then drop and recreate it. This happens during the 
compress table request(not sure if anything other than compress table in 
AlterTableConstantAction calls it but we do not need to worry about it because 
we only want to create the statistics if they don't already exist during the 
compile phase of a sql query)

What I am considering doing is adding another method to TableDescriptor called 
say createStatistics which will be called during the optimization phase of a 
query by the 2 classes impl.sql.compile.FromBaseTable and 
impl.sql.compile.PredicateList. The only awkward thing I am finding is the code 
required to update the statistics need many objects like 
LanguageConnectionContext, DataDictionary, TransactionController, etc (there 
may be more) and they are not available to TableDescriptor class so I will have 
to pass these objects when a call is made to the new method in TableDescriptor 
from impl.sql.compile. Does this sound like not a smooth way of getting the 
objects? Maybe this new method should be defined in DataDictionary rather than 
TableDescriptor. I will work more on what is the right place for the new 
method. In the mean time, if anyone has any thoughts, please post them here.

Once we have this new method to create the statistics, we can hopefully remove 
the redudant code that already exists to create the statistics in 3 different 
classes, namely, AlterTableConstantAction, CreateIndexConstantAction and 
InsetResultSet.

> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>Assignee: Mamta A. Satoor
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2008-07-30 Thread Mamta A. Satoor (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618461#action_12618461
 ] 

Mamta A. Satoor commented on DERBY-3788:


I am looking at creating the statistics automatically if they are found to be 
absent when needed. As a next step later, we can look at updating the 
statistics automatically under certain threshold if they appear to be out of 
sync.

> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (DERBY-3788) Provide a zero-admin way of updating the statisitcs of an index

2008-07-30 Thread Mamta A. Satoor (JIRA)

[ 
https://issues.apache.org/jira/browse/DERBY-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12618394#action_12618394
 ] 

Mamta A. Satoor commented on DERBY-3788:


Added a new test case with revision 681085 which shows that updating the 
statistics will make a query pickup better index compare to prior to statistics 
availability


> Provide a zero-admin way of updating the statisitcs of an index
> ---
>
> Key: DERBY-3788
> URL: https://issues.apache.org/jira/browse/DERBY-3788
> Project: Derby
>  Issue Type: New Feature
>  Components: Performance
>Affects Versions: 10.5.0.0
>Reporter: Mamta A. Satoor
>
> DERBY-269 provided a manual way of updating the statistics using the new 
> system stored procedure SYSCS_UTIL.SYSCS_UPDATE_STATISTICS. It will be good 
> for Derby to provide an automatic way of updating the statistics without 
> requiring to run the stored procedure manually. There was some discussion on 
> DERBY-269 about providing the 0-admin way. I have copied it here for 
> reference.
> *
> Kathey Marsden - 22/May/05 03:53 PM 
> Some sort of zero admin solution for updating statistics would be prefferable 
> to the manual 'update statistics' 
> *
> *
> Mike Matrigali - 11/Jun/08 12:37 PM 
> I have not seen any other suggestions, how about the following zero admin 
> solution? It is not perfect - suggestions welcome. 
> Along with the statistics storing, save how many rows were in the table when 
> exact statistics were calculated. This number is 0 if none have been 
> calculated because index creation happened on an empty table. At query 
> compile time when we look up statistics we automatically recalculate the 
> statistics at certain threshholds - say something like row count growing past 
> next threshhold : 10, 100, 1000, 10 - with upper limit being somewhere 
> around how many rows we can process in some small amount of time - like 1 
> second on a modern laptop. If we are worried about response time, maybe we 
> background queue the stat gathering rather than waiting with maybe some quick 
> load if no stat has ever been gathered. The background gathering could be 
> optimized to not interfere with locks by using read uncommitted. 
> I think it would be useful to also have the manual call just to make it easy 
> to support customers and debug issues in the field. There is proably always 
> some dynamic data distribution change that in some case won't be picked up by 
> the automatic algorithm. Also just very useful for those who have complete 
> control of the create ddl, load data, run stats, deliver application process. 
> *

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.