Re: [fpc-pascal] TSQLQuery and buffering.
On 2017-03-24 14:42, Gary Doades wrote: Hi everyone, Firstly, I realise this is an old subject, but I still can't find any easy answer. Really, the question is simple: Is there ANY way of TSQLQuery NOT reading the entire result set into memory? What about making multiple queries and setting sql's LIMIT? Should you really be writing sql queries that return millions of results? Is this a design choice that could be a bad one? For example in many web programs, I only retrieve 10-100 results back by using SQL's LIMIT feature... Example: SELECT * FROM Orders LIMIT 15, 10 SQLDB likely has a similar setting/feature so that you don't have to write actual SQL code to achieve it. And in your case, you may actually really need to retrieve millions of results of the database, I don't know your use case. But generally in my web programs where I have millions of results, I use SQL to my advantage and it's language features, such as LIMIT, instead of retrieving all of the results at once which kind of defeats the purpose of an sql database since it's job is to give you only the data you need. Are you using pascal to do the work that the database could already do for you? I could be misunderstanding your situation, and again your use case could be different. This is really killing me. I've got quite a lot of web code using sqldb and it works really well. The classes for connection, transaction, query etc. are perfect for general use and make it so easy to write database applications. I find it excellent that I only need to change the connection class in my code to use a different database. For web code (mine included), fetching a few dozen, hundred or even a few thousand rows is no real problem. The sqldb classes are *VERY* fast and very easy to use. The problem comes when I need to write some apps to either analyse or move a lot of data. Initially all I need to do is analyse some data by simply reading through rows, doing some calculations and other analysis. For even 1 million rows this is very fast (10 seconds using MySQL) so no major problems. However, it does use quite a lot of memory. If you need a temporary work around simply use SQL Limit feature.. then your data that comes back into your memory on your end is only what you have limited the result set to... If you need to combine all this data into one big data set to do analysis though, maybe sql LIMIT is the wrong tool for the job, I don't know. But it's a big warning sign if you have a program that retrieves millions of results from a database and you only need to display 10 of those items to the end user - in that case LIMIT is really useful. But, you are analyzing these millions of results maybe monolithically so your case may be different. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
On Mon, 27 Mar 2017, Gary Doades wrote: >> The problem therefore lies in either the Pascal layer on top of the native >> client libs or in the way the client libs themselves work :( I suspect the >> latter. > As far as I know, the DB-Specific pascal layer does not buffer anything, it > just fetches the result. > There is of course little to no control over how the client lib fetches the > result. > If memory serves well, mySQL has 2 separate calls: mysql_use_result and > mysql_store_result where the difference is exactly how it fetches the result > set. > Please report your findings, if any :) Yeah, exactly right. The Pascal code in mysqlconn.inc calls mysql_store_result which does indeed fetch the entire result set into memory. It could be changed (perhaps) to use mysql_use_result instead which does row-at-a-time streaming, but I'm not sure I have the expertise to make that change or what any other implications of that change might be. I may hack it for my own purposes in the short term to see what happens! Thanks, Gary. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
On Mon, 27 Mar 2017, Gary Doades wrote: Indeed, that's why I can't currently see where the problem lies. It shouldn't buffer the rows/records, but it does... or at least something does and I'm pretty sure it's not my program. The DB-client library maybe? Yup! I was running some other tests and copied the table contents over to both postgres and MS SQL Server. Postgres has the same issue, all rows are (seemingly) buffered on the TSQLQuery.Open and huge amounts of RAM used. However, using MS SQL Server almost no memory is used at all! All program results are identical and all I did in the program was swap out the TSQLConnection with different "drivers". The problem therefore lies in either the Pascal layer on top of the native client libs or in the way the client libs themselves work :( I suspect the latter. As far as I know, the DB-Specific pascal layer does not buffer anything, it just fetches the result. There is of course little to no control over how the client lib fetches the result. If memory serves well, mySQL has 2 separate calls: mysql_use_result and mysql_store_result where the difference is exactly how it fetches the result set. Postgres probably fetches everything; I don't see how PQGetValue can work index-based otherwise. Fore interbase/firebird, I don't know. I will need to do some more investigation in a different way if I am to work round this one. I apologise for potentially wasting people's time. It might be good for people to know the above issues (limitations?) anyway. Please report your findings, if any :) Michael. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
> >> Indeed, that's why I can't currently see where the problem lies. It >> shouldn't buffer the rows/records, but it does... or at least >> something does and I'm pretty sure it's not my program. > The DB-client library maybe? Yup! I was running some other tests and copied the table contents over to both postgres and MS SQL Server. Postgres has the same issue, all rows are (seemingly) buffered on the TSQLQuery.Open and huge amounts of RAM used. However, using MS SQL Server almost no memory is used at all! All program results are identical and all I did in the program was swap out the TSQLConnection with different "drivers". The problem therefore lies in either the Pascal layer on top of the native client libs or in the way the client libs themselves work :( I suspect the latter. I will need to do some more investigation in a different way if I am to work round this one. I apologise for potentially wasting people's time. It might be good for people to know the above issues (limitations?) anyway. Thanks, Gary. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
On Monday 27 March 2017 10:20:20 Gary Doades wrote: > >> It may be that UniDirectional is meant to not buffer all rows, but at > >> the moment it certainly seems to. > > > > Strange, looking at source code it seems to me, that buffering should not > > happen. As far as TUniDirectionalBufIndex should be used and his > > AddRecord method does not allocate new memory. > > Indeed, that's why I can't currently see where the problem lies. It > shouldn't buffer the rows/records, but it does... or at least something > does and I'm pretty sure it's not my program. The DB-client library maybe? Martin ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
It may be that UniDirectional is meant to not buffer all rows, but at the moment it certainly seems to. Strange, looking at source code it seems to me, that buffering should not happen. As far as TUniDirectionalBufIndex should be used and his AddRecord method does not allocate new memory. Indeed, that's why I can't currently see where the problem lies. It shouldn't buffer the rows/records, but it does... or at least something does and I'm pretty sure it's not my program. I've pared my test program down to the bare minimum of just looping over the result set, but all the actual fetching from the DB and buffering occurs on just the Open call on the TSQLQuery. Can you please attach your test program ? (most simplified version, which I can use to test) L. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
> >> It may be that UniDirectional is meant to not buffer all rows, but at the >> moment it certainly seems to. > Strange, looking at source code it seems to me, that buffering should not > happen. > As far as TUniDirectionalBufIndex should be used and his AddRecord method > does not allocate new memory. Indeed, that's why I can't currently see where the problem lies. It shouldn't buffer the rows/records, but it does... or at least something does and I'm pretty sure it's not my program. I've pared my test program down to the bare minimum of just looping over the result set, but all the actual fetching from the DB and buffering occurs on just the Open call on the TSQLQuery. Thanks, Gary. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
It may be that UniDirectional is meant to not buffer all rows, but at the moment it certainly seems to. Strange, looking at source code it seems to me, that buffering should not happen. As far as TUniDirectionalBufIndex should be used and his AddRecord method does not allocate new memory. L. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
> From your original message one could think that you were using > PacketRecords:=-1 which means fetch all records at once, but if you are using > the standard setting, which is 10 and yet it is still fetching everything at > once, it > sounds like a bug to me. As far as I can tell setting UniDirectional to true also sets PacketRecords to -1. Even if PacketRecords were still 10 though it still buffers the entire result set, but it just fetches 10 rows at a time from the DB server until it has them all. Thanks, Gary. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
On Sunday 26 March 2017 18:37:36 Michael Van Canneyt wrote: > > > Looking at the above unit, I can only presume you load the field directly > from the cursor returned by the native library. > Correct. > > As far as I can make out, TSQLResult is usable only for scanning a result > set. TDataset has a lot more functions, hence has a lot more overhead. > Correct. Therefore MSEgui has TmseBufdataset/TmseSQLQuery when the whole shebang is needed and TSQLResult when it is not. There is also the descendant TifiSQLResult which can be used as source for TConnectedifiDatasource in order to feed ifi-datapoints with DB-data. TSQLResult also can be used as source for TSQLLookupBuffer. Martin ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
En Sat, 25 Mar 2017 02:32:33 -0600, Gary Doadesescribió: Many thanks for your reply. I had read about UniDirectional and I have indeed tried this. It doesn't seem to make any significant difference. Looking through the source code for TBufDataset it looks like UniDirectional just turns off building various indexes/structures etc. and fetches the result set all in one go instead of 10 row chunks. It still buffers everything in memory. From your original message one could think that you were using PacketRecords:=-1 which means fetch all records at once, but if you are using the standard setting, which is 10 and yet it is still fetching everything at once, it sounds like a bug to me. Jesus Reyes A. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
r On Sat, 25 Mar 2017, Martin Schreiber wrote: On Saturday 25 March 2017 10:28:37 Michael Van Canneyt wrote: On Sat, 25 Mar 2017, Martin Schreiber wrote: > On Saturday 25 March 2017 09:32:33 Gary Doades wrote: >> I Understand about the "normal" use of TSQLQuery and data sets and I >> also use this for small result set that need the data changing some way >> and writing back to the DB. What I need now is an equally convenient and >> powerful way of processing "big data". > > MSEgui has TSQLResult for that purpose without the TDataset overhead. > https://gitlab.com/mseide-msegui/mseide-msegui/blob/master/lib/common/db/ >msesqlresult.pas Can you explain in 2 words how this differs from TDataset ? At first sight it is not fundamentally different. The structure looks almost exactly the same, just the names are different. Please follow the datatransfer from database to the destination in application in T*Dataset and compare it with TSQLResult and you probably will see the difference. Looking at the above unit, I can only presume you load the field directly from the cursor returned by the native library. Makes sense. And while you're at it, please explain what the 'overhead' is of TDataset. That TBufDataset introduces overhead is a design decision. But TDataset itself introduces no "overhead". TDataset/TDataSource/TDatalink/TField is a complex machine, don't you think? Follow the code of a TDataset.Next() call step by step and you will see what I mean. As far as I can make out, TSQLResult is usable only for scanning a result set. TDataset has a lot more functions, hence has a lot more overhead. But TSQLResult is an interesting idea; I've long been looking for something more lightweight for use in web apps, where the typical use case is just scan a result set and transfer it to the browser. I had thought to reuse TSQLStatement and keep TField/TFieldDef so as not to invent too much new classes. I'll maybe use your code as inspiration. Thanks! Michael. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
On Sat, 25 Mar 2017, Gary Doades wrote: I had read about UniDirectional and I have indeed tried this. It doesn't seem to make any significant difference. Looking through the source code for TBufDataset it looks like UniDirectional just turns off building various indexes/structures etc. and fetches the result set all in one go instead of 10 row chunks. It still buffers everything in memory. In that case, it is a bug. The very purpose of UniDirectional is exactly NOT to buffer anything, just to keep the current record in memory. Ah, OK. I might have another trawl through the sqldb source then to see if it is a simple mistake I can fix or a bigger job. I guess I should file a bug report? Please do. And if you find the cause, please attach a patch :) Michael. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
On Saturday 25 March 2017 10:28:37 Michael Van Canneyt wrote: > On Sat, 25 Mar 2017, Martin Schreiber wrote: > > On Saturday 25 March 2017 09:32:33 Gary Doades wrote: > >> I Understand about the "normal" use of TSQLQuery and data sets and I > >> also use this for small result set that need the data changing some way > >> and writing back to the DB. What I need now is an equally convenient and > >> powerful way of processing "big data". > > > > MSEgui has TSQLResult for that purpose without the TDataset overhead. > > https://gitlab.com/mseide-msegui/mseide-msegui/blob/master/lib/common/db/ > >msesqlresult.pas > > Can you explain in 2 words how this differs from TDataset ? > At first sight it is not fundamentally different. > The structure looks almost exactly the same, just the names are different. > Please follow the datatransfer from database to the destination in application in T*Dataset and compare it with TSQLResult and you probably will see the difference. > And while you're at it, please explain what the 'overhead' is of TDataset. > That TBufDataset introduces overhead is a design decision. But TDataset > itself introduces no "overhead". > TDataset/TDataSource/TDatalink/TField is a complex machine, don't you think? Follow the code of a TDataset.Next() call step by step and you will see what I mean. Martin ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
>> I had read about UniDirectional and I have indeed tried this. It doesn't >> seem to make any significant difference. Looking through the source code for >> TBufDataset it looks like UniDirectional just turns off building various >> indexes/structures etc. and fetches the result set all in one go instead of >> 10 row chunks. >> It still buffers everything in memory. > In that case, it is a bug. The very purpose of UniDirectional is exactly NOT > to buffer anything, just to keep the current record in memory. Ah, OK. I might have another trawl through the sqldb source then to see if it is a simple mistake I can fix or a bigger job. I guess I should file a bug report? Thanks, Gary. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
> > MSEgui has TSQLResult for that purpose without the TDataset overhead. > https://gitlab.com/mseide-msegui/mseide-msegui/blob/master/lib/common/db/msesqlresult.pas Thanks. Although it looks interesting I'd rather not get into another set of classes etc. at this point. I may come back to it later though! ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
On Sat, 25 Mar 2017, Gary Doades wrote: On Fri, 24 Mar 2017, Gary Doades wrote: Really, the question is simple: Is there ANY way of TSQLQuery NOT reading the entire result set into memory? Set Unidirectional to True, and it will keep only 1 row in memory. When you are simply scanning through the result set, this is all you need. Hi Michael, Many thanks for your reply. I had read about UniDirectional and I have indeed tried this. It doesn't seem to make any significant difference. Looking through the source code for TBufDataset it looks like UniDirectional just turns off building various indexes/structures etc. and fetches the result set all in one go instead of 10 row chunks. It still buffers everything in memory. In that case, it is a bug. The very purpose of UniDirectional is exactly NOT to buffer anything, just to keep the current record in memory. Michael. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
On Sat, 25 Mar 2017, Martin Schreiber wrote: On Saturday 25 March 2017 09:32:33 Gary Doades wrote: I Understand about the "normal" use of TSQLQuery and data sets and I also use this for small result set that need the data changing some way and writing back to the DB. What I need now is an equally convenient and powerful way of processing "big data". MSEgui has TSQLResult for that purpose without the TDataset overhead. https://gitlab.com/mseide-msegui/mseide-msegui/blob/master/lib/common/db/msesqlresult.pas Can you explain in 2 words how this differs from TDataset ? At first sight it is not fundamentally different. The structure looks almost exactly the same, just the names are different. And while you're at it, please explain what the 'overhead' is of TDataset. That TBufDataset introduces overhead is a design decision. But TDataset itself introduces no "overhead". Michael. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
On Saturday 25 March 2017 09:32:33 Gary Doades wrote: > > I Understand about the "normal" use of TSQLQuery and data sets and I also > use this for small result set that need the data changing some way and > writing back to the DB. What I need now is an equally convenient and > powerful way of processing "big data". > MSEgui has TSQLResult for that purpose without the TDataset overhead. https://gitlab.com/mseide-msegui/mseide-msegui/blob/master/lib/common/db/msesqlresult.pas Martin ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
On Fri, 24 Mar 2017, Gary Doades wrote: > >> Really, the question is simple: Is there ANY way of TSQLQuery NOT >> reading the entire result set into memory? > Set Unidirectional to True, and it will keep only 1 row in memory. > When you are simply scanning through the result set, this is all you need. Hi Michael, Many thanks for your reply. I had read about UniDirectional and I have indeed tried this. It doesn't seem to make any significant difference. Looking through the source code for TBufDataset it looks like UniDirectional just turns off building various indexes/structures etc. and fetches the result set all in one go instead of 10 row chunks. It still buffers everything in memory. I can see this from running a test program. When I call the SQL query .Open method then I can see from Task Manager that it is fetching all rows from the database. My test program reaches about 1.7GB of ram fetching 4.8 million rows. Only after the complete fetching does the next line of code execute and my processing loop start. Clearly this is looping over the now buffered dataset. The memory is finally freed when the Query's Close method is called. It may be that UniDirectional is meant to not buffer all rows, but at the moment it certainly seems to. This is FPC 3.0.2 BTW. I've also tried setting ReadOnly to true on the SQL Query. I Understand about the "normal" use of TSQLQuery and data sets and I also use this for small result set that need the data changing some way and writing back to the DB. What I need now is an equally convenient and powerful way of processing "big data". Thanks, Gary. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] TSQLQuery and buffering.
On Fri, 24 Mar 2017, Gary Doades wrote: Hi everyone, Firstly, I realise this is an old subject, but I still can't find any easy answer. Really, the question is simple: Is there ANY way of TSQLQuery NOT reading the entire result set into memory? Set Unidirectional to True, and it will keep only 1 row in memory. When you are simply scanning through the result set, this is all you need. The problem is that most drivers also do not support arbitrary navigation in the result set, so when we want to support navigating backwards, we have no choice but to keep everything in memory. (barring re-running the query and refetching everything, I'm sure you don't want that either) If you need to move back and forth in the result set, then you 'll need to set unidirectional to true, and store the needed records yourself. You can copy the structure of the dataset really easy to a TBufDataset using copyfromdataset). in the TBufDataset you can keep as many records as you like, and just delete the old ones (you need to call mergechangelog or somesuch). We could maybe try to find a way to copy the buffer from one bufdataset to another (since TSQLQuery is just a TBufDataset Descendent) Michael. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
[fpc-pascal] TSQLQuery and buffering.
Hi everyone, Firstly, I realise this is an old subject, but I still can't find any easy answer. Really, the question is simple: Is there ANY way of TSQLQuery NOT reading the entire result set into memory? This is really killing me. I've got quite a lot of web code using sqldb and it works really well. The classes for connection, transaction, query etc. are perfect for general use and make it so easy to write database applications. I find it excellent that I only need to change the connection class in my code to use a different database. For web code (mine included), fetching a few dozen, hundred or even a few thousand rows is no real problem. The sqldb classes are *very* fast and very easy to use. The problem comes when I need to write some apps to either analyse or move a lot of data. Initially all I need to do is analyse some data by simply reading through rows, doing some calculations and other analysis. For even 1 million rows this is very fast (10 seconds using MySQL) so no major problems. However, it does use quite a lot of memory. Further use of this simply explodes. I have tables for analysis (and later extracting/moving) that are over 40 million rows. This currently is simply not possible in fpc, at least using the excellent sqldb classes. After blowing 16GB of RAM+swap it takes a while to recover. To be fair I've tried looking through the code and see how it's implemented, but I can't see any easy way of avoiding the memory buffering of the entire result set. Apart from re-implementing a lot of classes or going down to the driver level I can't currently see any easy way round this. I know there is some old discussion on this, but I'm kind of hoping that there may have been a few changes or something recently that I've missed. Any help appreciated. Regards, Gary. ___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal