Re: [base] DataQuery.order(...) 2.x API question
On 2012-10-09 17:24, Bob MacCallum wrote: > Just for closure, I am now in business. > > [mysqld] > tmp_table_size = 5G > > > Background for whoever cares: > > I'm running something reporter-wise in parallel (4 threads) and wanted > roughly equal workload per "chunk" of reporters (some reporters have > more experiments+data than others) so having them in random order > before chunking does the trick. > > The tmp tables that the SELECT * FROM Reporters ORDER BY RAND(123) > creates are 3.2G each (times 4) and while doing this there's some > serious I/O clogging the machine but it pulls through. If only gnu > parallel had a -stagger [secs] option! ORDER BY RAND() is known for performance issues once the number of rows grows large. I guess it is because MySQL need to create a temporary table with *all* the original data plus the random number for each row before the data can be sorted. It might be more efficient to first do 'select id from Reporters' (you'll need to use the Reporter.getDynamicQuery() for this), put all id's in an array, randomize that in memory and then bring in the actual reporter data one by one (or 100 by 100 or something). That should at least avoid the temporary table issue since the id's can be pulled in directly from the primary key index. In theory it should be quicker, but I guess it has to be verified in practice as well. /Nicklas -- Don't let slow site performance ruin your business. Deploy New Relic APM Deploy New Relic app performance management and know exactly what is happening inside your Ruby, Python, PHP, Java, and .NET app Try New Relic at no cost today and get our sweet Data Nerd shirt too! http://p.sf.net/sfu/newrelic-dev2dev ___ The BASE general discussion mailing list basedb-users@lists.sourceforge.net unsubscribe: send a mail with subject "unsubscribe" to basedb-users-requ...@lists.sourceforge.net
Re: [base] DataQuery.order(...) 2.x API question
Just for closure, I am now in business. [mysqld] tmp_table_size = 5G Background for whoever cares: I'm running something reporter-wise in parallel (4 threads) and wanted roughly equal workload per "chunk" of reporters (some reporters have more experiments+data than others) so having them in random order before chunking does the trick. The tmp tables that the SELECT * FROM Reporters ORDER BY RAND(123) creates are 3.2G each (times 4) and while doing this there's some serious I/O clogging the machine but it pulls through. If only gnu parallel had a -stagger [secs] option! On Tue, Oct 9, 2012 at 3:46 PM, Bob MacCallum wrote: > Getting closer - I have set up a custom dialect: > > package org.vectorbase.funcgen.hibernate; > import org.hibernate.Hibernate; > import org.hibernate.dialect.MySQLInnoDBDialect; > import org.hibernate.dialect.function.StandardSQLFunction; > > public class CustomDialect extends MySQLInnoDBDialect { > public CustomDialect() { > super(); > registerFunction("srand", new StandardSQLFunction("rand", > Hibernate.DOUBLE) ); > } > } > > configure this in base.config > > And it all goes fine until some tmp tables run out of disk space or something > (We have around 400k reporters FYI.) > > Looks like Nicklas has seen this before :-) > https://forum.hibernate.org/viewtopic.php?f=1&t=994753 > > I reckon I'll get it to work sooner or later. > > cheers, > Bob. > > On Tue, Oct 9, 2012 at 2:15 PM, Bob MacCallum > wrote: >> Hi Nicklas, >> Thanks very much for the quick reply. I am *so* close to getting this >> working - it seems that the argument-less rand() is available in HQL >> but I will have to jump through a few hoops to provide the seeded >> version. >> I'll post an update later... >> cheers, >> Bob. >> >> On Tue, Oct 9, 2012 at 12:50 PM, Nicklas Nordborg wrote: >>> On 2012-10-09 13:11, Bob MacCallum wrote: Hi all, I'm trying to get a Reporter query from the BASE2.x API which is basically "ORDER BY RAND(123)" - to get reproducible "random" order (I can explain *why* in another email if anyone cares to know). I tried this DataQuery repQuery = net.sf.basedb.core.Reporter.getQuery(); repQuery.order(Orders.asc(new HqlExpression("RAND(123)", "xxx"))); But it seems that HqlExpression isn't for public use. I can't find something like "Expressions.hql". >>> >>> This can be found in net.sf.basedb.core.query.Hql.expression(...) >>> Could somebody please save me a ton of time and tell me how to put some arbitrary SQL into the ORDER BY clause of a DataQuery? >>> >>> Note that it is not SQL that is needed but HQL (Hibernate Query >>> Language). Don't know if RAND() is available in HQL. If not, it should >>> be possible to create a custom Hibernate dialect and use that in the >>> 'db.dialect' setting in base.config. I have never done that so I don't >>> know any details or how hard/easy it is. >>> >>> Another possibility is to use Reporter.getDynamicQuery() instead which >>> lets you work with SQL, but code-wise it's a bit harder to work with >>> since you have to process the result row-by-row and column-by-column >>> instead of as ReporterData objects. >>> >>> /Nicklas >>> >>> >>> -- >>> Don't let slow site performance ruin your business. Deploy New Relic APM >>> Deploy New Relic app performance management and know exactly >>> what is happening inside your Ruby, Python, PHP, Java, and .NET app >>> Try New Relic at no cost today and get our sweet Data Nerd shirt too! >>> http://p.sf.net/sfu/newrelic-dev2dev >>> ___ >>> The BASE general discussion mailing list >>> basedb-users@lists.sourceforge.net >>> unsubscribe: send a mail with subject "unsubscribe" to >>> basedb-users-requ...@lists.sourceforge.net -- Don't let slow site performance ruin your business. Deploy New Relic APM Deploy New Relic app performance management and know exactly what is happening inside your Ruby, Python, PHP, Java, and .NET app Try New Relic at no cost today and get our sweet Data Nerd shirt too! http://p.sf.net/sfu/newrelic-dev2dev ___ The BASE general discussion mailing list basedb-users@lists.sourceforge.net unsubscribe: send a mail with subject "unsubscribe" to basedb-users-requ...@lists.sourceforge.net
Re: [base] DataQuery.order(...) 2.x API question
Getting closer - I have set up a custom dialect: package org.vectorbase.funcgen.hibernate; import org.hibernate.Hibernate; import org.hibernate.dialect.MySQLInnoDBDialect; import org.hibernate.dialect.function.StandardSQLFunction; public class CustomDialect extends MySQLInnoDBDialect { public CustomDialect() { super(); registerFunction("srand", new StandardSQLFunction("rand", Hibernate.DOUBLE) ); } } configure this in base.config And it all goes fine until some tmp tables run out of disk space or something (We have around 400k reporters FYI.) Looks like Nicklas has seen this before :-) https://forum.hibernate.org/viewtopic.php?f=1&t=994753 I reckon I'll get it to work sooner or later. cheers, Bob. On Tue, Oct 9, 2012 at 2:15 PM, Bob MacCallum wrote: > Hi Nicklas, > Thanks very much for the quick reply. I am *so* close to getting this > working - it seems that the argument-less rand() is available in HQL > but I will have to jump through a few hoops to provide the seeded > version. > I'll post an update later... > cheers, > Bob. > > On Tue, Oct 9, 2012 at 12:50 PM, Nicklas Nordborg wrote: >> On 2012-10-09 13:11, Bob MacCallum wrote: >>> Hi all, >>> >>> I'm trying to get a Reporter query from the BASE2.x API which is >>> basically "ORDER BY RAND(123)" - to get reproducible "random" order (I >>> can explain *why* in another email if anyone cares to know). >>> >>> I tried this >>> >>> DataQuery repQuery = >>> net.sf.basedb.core.Reporter.getQuery(); >>> >>> repQuery.order(Orders.asc(new HqlExpression("RAND(123)", "xxx"))); >>> >>> But it seems that HqlExpression isn't for public use. I can't find >>> something like "Expressions.hql". >> >> This can be found in net.sf.basedb.core.query.Hql.expression(...) >> >>> Could somebody please save me a ton of time and tell me how to put >>> some arbitrary SQL into the ORDER BY clause of a DataQuery? >> >> Note that it is not SQL that is needed but HQL (Hibernate Query >> Language). Don't know if RAND() is available in HQL. If not, it should >> be possible to create a custom Hibernate dialect and use that in the >> 'db.dialect' setting in base.config. I have never done that so I don't >> know any details or how hard/easy it is. >> >> Another possibility is to use Reporter.getDynamicQuery() instead which >> lets you work with SQL, but code-wise it's a bit harder to work with >> since you have to process the result row-by-row and column-by-column >> instead of as ReporterData objects. >> >> /Nicklas >> >> >> -- >> Don't let slow site performance ruin your business. Deploy New Relic APM >> Deploy New Relic app performance management and know exactly >> what is happening inside your Ruby, Python, PHP, Java, and .NET app >> Try New Relic at no cost today and get our sweet Data Nerd shirt too! >> http://p.sf.net/sfu/newrelic-dev2dev >> ___ >> The BASE general discussion mailing list >> basedb-users@lists.sourceforge.net >> unsubscribe: send a mail with subject "unsubscribe" to >> basedb-users-requ...@lists.sourceforge.net -- Don't let slow site performance ruin your business. Deploy New Relic APM Deploy New Relic app performance management and know exactly what is happening inside your Ruby, Python, PHP, Java, and .NET app Try New Relic at no cost today and get our sweet Data Nerd shirt too! http://p.sf.net/sfu/newrelic-dev2dev ___ The BASE general discussion mailing list basedb-users@lists.sourceforge.net unsubscribe: send a mail with subject "unsubscribe" to basedb-users-requ...@lists.sourceforge.net
Re: [base] DataQuery.order(...) 2.x API question
Hi Nicklas, Thanks very much for the quick reply. I am *so* close to getting this working - it seems that the argument-less rand() is available in HQL but I will have to jump through a few hoops to provide the seeded version. I'll post an update later... cheers, Bob. On Tue, Oct 9, 2012 at 12:50 PM, Nicklas Nordborg wrote: > On 2012-10-09 13:11, Bob MacCallum wrote: >> Hi all, >> >> I'm trying to get a Reporter query from the BASE2.x API which is >> basically "ORDER BY RAND(123)" - to get reproducible "random" order (I >> can explain *why* in another email if anyone cares to know). >> >> I tried this >> >> DataQuery repQuery = >> net.sf.basedb.core.Reporter.getQuery(); >> >> repQuery.order(Orders.asc(new HqlExpression("RAND(123)", "xxx"))); >> >> But it seems that HqlExpression isn't for public use. I can't find >> something like "Expressions.hql". > > This can be found in net.sf.basedb.core.query.Hql.expression(...) > >> Could somebody please save me a ton of time and tell me how to put >> some arbitrary SQL into the ORDER BY clause of a DataQuery? > > Note that it is not SQL that is needed but HQL (Hibernate Query > Language). Don't know if RAND() is available in HQL. If not, it should > be possible to create a custom Hibernate dialect and use that in the > 'db.dialect' setting in base.config. I have never done that so I don't > know any details or how hard/easy it is. > > Another possibility is to use Reporter.getDynamicQuery() instead which > lets you work with SQL, but code-wise it's a bit harder to work with > since you have to process the result row-by-row and column-by-column > instead of as ReporterData objects. > > /Nicklas > > > -- > Don't let slow site performance ruin your business. Deploy New Relic APM > Deploy New Relic app performance management and know exactly > what is happening inside your Ruby, Python, PHP, Java, and .NET app > Try New Relic at no cost today and get our sweet Data Nerd shirt too! > http://p.sf.net/sfu/newrelic-dev2dev > ___ > The BASE general discussion mailing list > basedb-users@lists.sourceforge.net > unsubscribe: send a mail with subject "unsubscribe" to > basedb-users-requ...@lists.sourceforge.net -- Don't let slow site performance ruin your business. Deploy New Relic APM Deploy New Relic app performance management and know exactly what is happening inside your Ruby, Python, PHP, Java, and .NET app Try New Relic at no cost today and get our sweet Data Nerd shirt too! http://p.sf.net/sfu/newrelic-dev2dev ___ The BASE general discussion mailing list basedb-users@lists.sourceforge.net unsubscribe: send a mail with subject "unsubscribe" to basedb-users-requ...@lists.sourceforge.net
Re: [base] DataQuery.order(...) 2.x API question
On 2012-10-09 13:11, Bob MacCallum wrote: > Hi all, > > I'm trying to get a Reporter query from the BASE2.x API which is > basically "ORDER BY RAND(123)" - to get reproducible "random" order (I > can explain *why* in another email if anyone cares to know). > > I tried this > > DataQuery repQuery = > net.sf.basedb.core.Reporter.getQuery(); > > repQuery.order(Orders.asc(new HqlExpression("RAND(123)", "xxx"))); > > But it seems that HqlExpression isn't for public use. I can't find > something like "Expressions.hql". This can be found in net.sf.basedb.core.query.Hql.expression(...) > Could somebody please save me a ton of time and tell me how to put > some arbitrary SQL into the ORDER BY clause of a DataQuery? Note that it is not SQL that is needed but HQL (Hibernate Query Language). Don't know if RAND() is available in HQL. If not, it should be possible to create a custom Hibernate dialect and use that in the 'db.dialect' setting in base.config. I have never done that so I don't know any details or how hard/easy it is. Another possibility is to use Reporter.getDynamicQuery() instead which lets you work with SQL, but code-wise it's a bit harder to work with since you have to process the result row-by-row and column-by-column instead of as ReporterData objects. /Nicklas -- Don't let slow site performance ruin your business. Deploy New Relic APM Deploy New Relic app performance management and know exactly what is happening inside your Ruby, Python, PHP, Java, and .NET app Try New Relic at no cost today and get our sweet Data Nerd shirt too! http://p.sf.net/sfu/newrelic-dev2dev ___ The BASE general discussion mailing list basedb-users@lists.sourceforge.net unsubscribe: send a mail with subject "unsubscribe" to basedb-users-requ...@lists.sourceforge.net
[base] DataQuery.order(...) 2.x API question
Hi all, I'm trying to get a Reporter query from the BASE2.x API which is basically "ORDER BY RAND(123)" - to get reproducible "random" order (I can explain *why* in another email if anyone cares to know). I tried this DataQuery repQuery = net.sf.basedb.core.Reporter.getQuery(); repQuery.order(Orders.asc(new HqlExpression("RAND(123)", "xxx"))); But it seems that HqlExpression isn't for public use. I can't find something like "Expressions.hql". Could somebody please save me a ton of time and tell me how to put some arbitrary SQL into the ORDER BY clause of a DataQuery? many thanks, Bob. -- Don't let slow site performance ruin your business. Deploy New Relic APM Deploy New Relic app performance management and know exactly what is happening inside your Ruby, Python, PHP, Java, and .NET app Try New Relic at no cost today and get our sweet Data Nerd shirt too! http://p.sf.net/sfu/newrelic-dev2dev ___ The BASE general discussion mailing list basedb-users@lists.sourceforge.net unsubscribe: send a mail with subject "unsubscribe" to basedb-users-requ...@lists.sourceforge.net