Re: [libdbi-users] libdbi+mysql speed issues

2013-01-08 Thread markus . hoenicka
Markus Hoenicka writes:
 > Hi,
 > 
 > I'll have to read the code again a little more thoroughly, but to the  
 > best of my knowledge libdbi emulates MySQL's approach to retrieving  
 > rows from result sets. In order to walk through the rows of e.g. a  
 > PostgreSQL result set you have to retrieve the rows by index  
 > sequentially, so libdbi has to maintain an internal pointer anyway. We  
 > do not have to add one, so there are no extra changes. Also, libdbi  
 > internally already mixes the cursor style and next_row style calls,  
 > because we have to cater for database engines which use either of  
 > these methods without exposing these differences to the libdbi user.
 > 
 > As for the API change, we have extensive driver API changes between  
 > 0.8.x and the upcoming(TM) 0.9 release anyway, think of the recent  
 > addition of the transaction stuff. You won't be able to keep your  
 > 0.8.x drivers once you switch to libdbi 0.9. You'll probably notice  
 > problems only if you build from cvs regularly (and only if you update  
 > one but not the other), but I expect those users to know what they're  
 > doing.
 > 

Hi,

I've implemented the suggested changes and checked in the updated
files. If you want to speed test the code, please check out, build,
and install the current cvs revisions of both libdbi and
libdbi-drivers. I've updated the mysql, pgsql, and sqlite3 drivers at
this time, but making the remaining drivers compile is trivial. I've
added the suggested check to speed up sequential row fetching in
mysql. I do see a small but significant decrease in the time required
to run gmake check (11.78 vs. 12.51 s), although the test code does
not test retrieving boatloads of rows specifically. I'd appreciate if
someone with a nice testcase (Olivier?) could give the changes a try
and report some numbers.

regards,
Markus

-- 
Markus Hoenicka
http://www.mhoenicka.de
AQ score 38

--
Master Java SE, Java EE, Eclipse, Spring, Hibernate, JavaScript, jQuery
and much more. Keep your Java skills current with LearnJavaNow -
200+ hours of step-by-step video tutorials by Java experts.
SALE $49.99 this month only -- learn more at:
http://p.sf.net/sfu/learnmore_122612 
___
libdbi-users mailing list
libdbi-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libdbi-users


Re: [libdbi-users] libdbi+mysql speed issues

2013-01-08 Thread Markus Hoenicka

Mike Rylander  was heard to say:

> Markus,
>
> Would it be worth the ugliness of a module-global variable to bury the
> value of the row index used in the previous call to  
> dbd_mysql.c::dbd_goto_row()
> inside the MySQL driver itself, side-stepping the function signature change
> for other drivers?  The benefit of avoiding the API change may not be
> outweighed by the potential fragility of naively tracking the state
> internally, of course, since in practice folks install and upgrade to new
> versions of drivers and the libdbi core at the same time, but then again,
> it may.  I'm a little nervous about the potential for problems when mixing
> direct goto_row() (cursor style) and next_row() calls, but I haven't looked
> at the code to see if there's actually an issue there...
>

Hi,

I'll have to read the code again a little more thoroughly, but to the  
best of my knowledge libdbi emulates MySQL's approach to retrieving  
rows from result sets. In order to walk through the rows of e.g. a  
PostgreSQL result set you have to retrieve the rows by index  
sequentially, so libdbi has to maintain an internal pointer anyway. We  
do not have to add one, so there are no extra changes. Also, libdbi  
internally already mixes the cursor style and next_row style calls,  
because we have to cater for database engines which use either of  
these methods without exposing these differences to the libdbi user.

As for the API change, we have extensive driver API changes between  
0.8.x and the upcoming(TM) 0.9 release anyway, think of the recent  
addition of the transaction stuff. You won't be able to keep your  
0.8.x drivers once you switch to libdbi 0.9. You'll probably notice  
problems only if you build from cvs regularly (and only if you update  
one but not the other), but I expect those users to know what they're  
doing.

regards,
Markus

-- 
Markus Hoenicka
http://www.mhoenicka.de
AQ score 38



--
Master SQL Server Development, Administration, T-SQL, SSAS, SSIS, SSRS
and more. Get SQL Server skills now (including 2012) with LearnDevNow -
200+ hours of step-by-step video tutorials by Microsoft MVPs and experts.
SALE $99.99 this month only - learn more at:
http://p.sf.net/sfu/learnmore_122512
___
libdbi-users mailing list
libdbi-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libdbi-users


Re: [libdbi-users] libdbi+mysql speed issues

2013-01-08 Thread Mike Rylander
Markus,

Would it be worth the ugliness of a module-global variable to bury the
value of the row index used in the previous call to dbd_mysql.c::dbd_goto_row()
inside the MySQL driver itself, side-stepping the function signature change
for other drivers?  The benefit of avoiding the API change may not be
outweighed by the potential fragility of naively tracking the state
internally, of course, since in practice folks install and upgrade to new
versions of drivers and the libdbi core at the same time, but then again,
it may.  I'm a little nervous about the potential for problems when mixing
direct goto_row() (cursor style) and next_row() calls, but I haven't looked
at the code to see if there's actually an issue there...

--miker


On Tue, Jan 8, 2013 at 4:46 AM, Markus Hoenicka <
markus.hoeni...@mhoenicka.de> wrote:

> Olivier Doucet  was heard to say:
>
> > Hello everyone,
> >
> > I'm following a quite old topic about libdbi speed issues.
> > I was able to track the cause of these issues : The major problem is
> > how libdbi goes from one row to another.
> >
> > RRDTool (the tool that used libdbi and that I was inspecting) is using
> > dbi_result_next_row() function (as stated in libdbi documentation
> > btw).
> >
> > This function moves from one row to another with function
> > dbi_result_seek_row(), incrementing currentRow index each time. This
> > gives a call to dbd_mysql.c::dbd_goto_row() that uses
> > mysql_data_seek() each time...
> >
> > That's why for a query result of 34k rows (yes it happens. No it is
> > not a problem in the query itself), we have tens of thousands of call
> > to this function (which is very low), and this is definitely not
> > needed, because as we use fetch_row(), we automatically move from one
> > row to another. Seeking is just a useless task (as internal driver
> > does not know where we are, and needs to start from row 0 and seek to
> > the given row - where we already were).
> >
> > I'm absolutely not a libdbi user, and I don't know what could be done
> > outside libdbi to not use dbi_result_next_row() and use directly
> > RESULT->onn->driver->functions->fetch_row() directly. Is it possible ?
> >
> > And/or patching dbi_result.c :
> > just check RESULT->currowidx near line 102 before calling doing
> > goto_row() function and call it only if we are not on the good row. Am
> > I right ?
> >
>
> Hi,
>
> your analysis is pretty much correct. If you look at the comments in
> dbd_mysql.c::dbd_goto_row(), the original author of the mysql driver
> was well aware of the limitations of his implementation. The reason is
> that other database APIs, e.g. PostgreSQL, allow to fetch rows from a
> result set by index, whereas the MySQL API assumes that you step
> through the rows sequentially. The original design of libdbi appears
> to somewhat favor PostgreSQL in this respect.
>
> Anyway, without having thought about the issue in too much detail, one
> possible solution comes to mind. We could modify the driver function
> dbd_goto_row() by passing both the wanted row index rowidx and the
> current row index currowidx(which libdbi keeps track of anyway). This
> would allow drivers to decide whether they have to actually seek the
> position. pgsql doesn't have to anyway, and mysql doesn't have to if
> rowidx = currowidx+1. This API change would not mandate changes to
> existing drivers as they may ignore the additional parameter and keep
> working as before, but it may offer options to speed up queries in
> some drivers.
>
> regards,
> Markus
>
>
>
> --
> Markus Hoenicka
> http://www.mhoenicka.de
> AQ score 38
>
>
>
>
> --
> Master SQL Server Development, Administration, T-SQL, SSAS, SSIS, SSRS
> and more. Get SQL Server skills now (including 2012) with LearnDevNow -
> 200+ hours of step-by-step video tutorials by Microsoft MVPs and experts.
> SALE $99.99 this month only - learn more at:
> http://p.sf.net/sfu/learnmore_122512
> ___
> libdbi-users mailing list
> libdbi-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/libdbi-users
>



-- 
Mike Rylander
 | Director of Research and Development
 | Equinox Software, Inc. / Your Library's Guide to Open Source
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com
--
Master SQL Server Development, Administration, T-SQL, SSAS, SSIS, SSRS
and more. Get SQL Server skills now (including 2012) with LearnDevNow -
200+ hours of step-by-step video tutorials by Microsoft MVPs and experts.
SALE $99.99 this month only - learn more at:
http://p.sf.net/sfu/learnmore_122512___
libdbi-users mailing list
libdbi-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libdbi-users


Re: [libdbi-users] libdbi+mysql speed issues

2013-01-08 Thread Markus Hoenicka
Olivier Doucet  was heard to say:

> Hi Markus,
>
> 2013/1/8 Markus Hoenicka :
>> We could modify the driver function
>> dbd_goto_row() by passing both the wanted row index rowidx and the
>> current row index currowidx(which libdbi keeps track of anyway).
>
> This is one way to fix the problem, I agree. Unfortunately my level in
> C is too low to make such huge changes without breaking everything
> else :) Anyone willing to create the patch for this ?
>

I'll commit these changes asap (may take a day or two). Can you build  
libdbi and libdbi-drivers from the cvs sources? You seem to have a  
good test case to see if these changes help.

> Is there a way, outside libdbi, to fix this problem ? For example, go
> over dbd_goto_row() and call fetch_row directly() ? Or maybe the
> behaviour is different between database engines ?
>

I'm afraid you can't do that except if you bypass the abstraction  
layer altogether and use libmysqlclient natively.

regards,
Markus


-- 
Markus Hoenicka
http://www.mhoenicka.de
AQ score 38



--
Master SQL Server Development, Administration, T-SQL, SSAS, SSIS, SSRS
and more. Get SQL Server skills now (including 2012) with LearnDevNow -
200+ hours of step-by-step video tutorials by Microsoft MVPs and experts.
SALE $99.99 this month only - learn more at:
http://p.sf.net/sfu/learnmore_122512
___
libdbi-users mailing list
libdbi-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libdbi-users


Re: [libdbi-users] libdbi+mysql speed issues

2013-01-08 Thread Olivier Doucet
Hi Markus,

2013/1/8 Markus Hoenicka :
> We could modify the driver function
> dbd_goto_row() by passing both the wanted row index rowidx and the
> current row index currowidx(which libdbi keeps track of anyway).

This is one way to fix the problem, I agree. Unfortunately my level in
C is too low to make such huge changes without breaking everything
else :) Anyone willing to create the patch for this ?

Is there a way, outside libdbi, to fix this problem ? For example, go
over dbd_goto_row() and call fetch_row directly() ? Or maybe the
behaviour is different between database engines ?

Olivier

--
Master SQL Server Development, Administration, T-SQL, SSAS, SSIS, SSRS
and more. Get SQL Server skills now (including 2012) with LearnDevNow -
200+ hours of step-by-step video tutorials by Microsoft MVPs and experts.
SALE $99.99 this month only - learn more at:
http://p.sf.net/sfu/learnmore_122512
___
libdbi-users mailing list
libdbi-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libdbi-users


Re: [libdbi-users] libdbi+mysql speed issues

2013-01-08 Thread Markus Hoenicka
Olivier Doucet  was heard to say:

> Hello everyone,
>
> I'm following a quite old topic about libdbi speed issues.
> I was able to track the cause of these issues : The major problem is
> how libdbi goes from one row to another.
>
> RRDTool (the tool that used libdbi and that I was inspecting) is using
> dbi_result_next_row() function (as stated in libdbi documentation
> btw).
>
> This function moves from one row to another with function
> dbi_result_seek_row(), incrementing currentRow index each time. This
> gives a call to dbd_mysql.c::dbd_goto_row() that uses
> mysql_data_seek() each time...
>
> That's why for a query result of 34k rows (yes it happens. No it is
> not a problem in the query itself), we have tens of thousands of call
> to this function (which is very low), and this is definitely not
> needed, because as we use fetch_row(), we automatically move from one
> row to another. Seeking is just a useless task (as internal driver
> does not know where we are, and needs to start from row 0 and seek to
> the given row - where we already were).
>
> I'm absolutely not a libdbi user, and I don't know what could be done
> outside libdbi to not use dbi_result_next_row() and use directly
> RESULT->onn->driver->functions->fetch_row() directly. Is it possible ?
>
> And/or patching dbi_result.c :
> just check RESULT->currowidx near line 102 before calling doing
> goto_row() function and call it only if we are not on the good row. Am
> I right ?
>

Hi,

your analysis is pretty much correct. If you look at the comments in  
dbd_mysql.c::dbd_goto_row(), the original author of the mysql driver  
was well aware of the limitations of his implementation. The reason is  
that other database APIs, e.g. PostgreSQL, allow to fetch rows from a  
result set by index, whereas the MySQL API assumes that you step  
through the rows sequentially. The original design of libdbi appears  
to somewhat favor PostgreSQL in this respect.

Anyway, without having thought about the issue in too much detail, one  
possible solution comes to mind. We could modify the driver function  
dbd_goto_row() by passing both the wanted row index rowidx and the  
current row index currowidx(which libdbi keeps track of anyway). This  
would allow drivers to decide whether they have to actually seek the  
position. pgsql doesn't have to anyway, and mysql doesn't have to if  
rowidx = currowidx+1. This API change would not mandate changes to  
existing drivers as they may ignore the additional parameter and keep  
working as before, but it may offer options to speed up queries in  
some drivers.

regards,
Markus



-- 
Markus Hoenicka
http://www.mhoenicka.de
AQ score 38



--
Master SQL Server Development, Administration, T-SQL, SSAS, SSIS, SSRS
and more. Get SQL Server skills now (including 2012) with LearnDevNow -
200+ hours of step-by-step video tutorials by Microsoft MVPs and experts.
SALE $99.99 this month only - learn more at:
http://p.sf.net/sfu/learnmore_122512
___
libdbi-users mailing list
libdbi-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/libdbi-users