Re: [base] DataQuery.order(...) 2.x API question

2012-10-09 Thread Nicklas Nordborg
On 2012-10-09 17:24, Bob MacCallum wrote:
> Just for closure, I am now in business.
>
> [mysqld]
> tmp_table_size = 5G
>
>
> Background for whoever cares:
>
> I'm running something reporter-wise in parallel (4 threads) and wanted
> roughly equal workload per "chunk" of reporters (some reporters have
> more experiments+data than others) so having them in random order
> before chunking does the trick.
>
> The tmp tables that the SELECT * FROM Reporters ORDER BY RAND(123)
> creates are 3.2G each (times 4) and while doing this there's some
> serious I/O clogging the machine but it pulls through.  If only gnu
> parallel had a -stagger [secs] option!


ORDER BY RAND() is known for performance issues once the number of rows 
grows large. I guess it is because MySQL need to create a temporary 
table with *all* the original data plus the random number for each row 
before the data can be sorted.

It might be more efficient to first do 'select id from Reporters' 
(you'll need to use the Reporter.getDynamicQuery() for this), put all 
id's in an array, randomize that in memory and then bring in the actual 
reporter data one by one (or 100 by 100 or something). That should at 
least avoid the temporary table issue since the id's can be pulled in 
directly from the primary key index. In theory it should be quicker, but 
I guess it has to be verified in practice as well.

/Nicklas


--
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
___
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject "unsubscribe" to
basedb-users-requ...@lists.sourceforge.net


Re: [base] DataQuery.order(...) 2.x API question

2012-10-09 Thread Bob MacCallum
Just for closure, I am now in business.

[mysqld]
tmp_table_size = 5G


Background for whoever cares:

I'm running something reporter-wise in parallel (4 threads) and wanted
roughly equal workload per "chunk" of reporters (some reporters have
more experiments+data than others) so having them in random order
before chunking does the trick.

The tmp tables that the SELECT * FROM Reporters ORDER BY RAND(123)
creates are 3.2G each (times 4) and while doing this there's some
serious I/O clogging the machine but it pulls through.  If only gnu
parallel had a -stagger [secs] option!




On Tue, Oct 9, 2012 at 3:46 PM, Bob MacCallum
 wrote:
> Getting closer - I have set up a custom dialect:
>
> package org.vectorbase.funcgen.hibernate;
> import org.hibernate.Hibernate;
> import org.hibernate.dialect.MySQLInnoDBDialect;
> import org.hibernate.dialect.function.StandardSQLFunction;
>
> public class CustomDialect extends MySQLInnoDBDialect {
>   public CustomDialect()  {
> super();
> registerFunction("srand", new StandardSQLFunction("rand",
> Hibernate.DOUBLE) );
>   }
> }
>
> configure this in base.config
>
> And it all goes fine until some tmp tables run out of disk space or something
> (We have around 400k reporters FYI.)
>
> Looks like Nicklas has seen this before :-)
> https://forum.hibernate.org/viewtopic.php?f=1&t=994753
>
> I reckon I'll get it to work sooner or later.
>
> cheers,
> Bob.
>
> On Tue, Oct 9, 2012 at 2:15 PM, Bob MacCallum
>  wrote:
>> Hi Nicklas,
>> Thanks very much for the quick reply.  I am *so* close to getting this
>> working - it seems that the argument-less rand() is available in HQL
>> but I will have to jump through a few hoops to provide the seeded
>> version.
>> I'll post an update later...
>> cheers,
>> Bob.
>>
>> On Tue, Oct 9, 2012 at 12:50 PM, Nicklas Nordborg  wrote:
>>> On 2012-10-09 13:11, Bob MacCallum wrote:
 Hi all,

 I'm trying to get a Reporter query from the BASE2.x API which is
 basically "ORDER BY RAND(123)" - to get reproducible "random" order (I
 can explain *why* in another email if anyone cares to know).

 I tried this

 DataQuery repQuery =
 net.sf.basedb.core.Reporter.getQuery();

 repQuery.order(Orders.asc(new HqlExpression("RAND(123)", "xxx")));

 But it seems that HqlExpression isn't for public use.  I can't find
 something like "Expressions.hql".
>>>
>>> This can be found in net.sf.basedb.core.query.Hql.expression(...)
>>>
 Could somebody please save me a ton of time and tell me how to put
 some arbitrary SQL into the ORDER BY clause of a DataQuery?
>>>
>>> Note that it is not SQL that is needed but HQL (Hibernate Query
>>> Language). Don't know if RAND() is available in HQL. If not, it should
>>> be possible to create a custom Hibernate dialect and use that in the
>>> 'db.dialect' setting in base.config. I have never done that so I don't
>>> know any details or how hard/easy it is.
>>>
>>> Another possibility is to use Reporter.getDynamicQuery() instead which
>>> lets you work with SQL, but code-wise it's a bit harder to work with
>>> since you have to process the result row-by-row and column-by-column
>>> instead of as ReporterData objects.
>>>
>>> /Nicklas
>>>
>>>
>>> --
>>> Don't let slow site performance ruin your business. Deploy New Relic APM
>>> Deploy New Relic app performance management and know exactly
>>> what is happening inside your Ruby, Python, PHP, Java, and .NET app
>>> Try New Relic at no cost today and get our sweet Data Nerd shirt too!
>>> http://p.sf.net/sfu/newrelic-dev2dev
>>> ___
>>> The BASE general discussion mailing list
>>> basedb-users@lists.sourceforge.net
>>> unsubscribe: send a mail with subject "unsubscribe" to
>>> basedb-users-requ...@lists.sourceforge.net

--
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
___
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject "unsubscribe" to
basedb-users-requ...@lists.sourceforge.net


Re: [base] DataQuery.order(...) 2.x API question

2012-10-09 Thread Bob MacCallum
Getting closer - I have set up a custom dialect:

package org.vectorbase.funcgen.hibernate;
import org.hibernate.Hibernate;
import org.hibernate.dialect.MySQLInnoDBDialect;
import org.hibernate.dialect.function.StandardSQLFunction;

public class CustomDialect extends MySQLInnoDBDialect {
  public CustomDialect()  {
super();
registerFunction("srand", new StandardSQLFunction("rand",
Hibernate.DOUBLE) );
  }
}

configure this in base.config

And it all goes fine until some tmp tables run out of disk space or something
(We have around 400k reporters FYI.)

Looks like Nicklas has seen this before :-)
https://forum.hibernate.org/viewtopic.php?f=1&t=994753

I reckon I'll get it to work sooner or later.

cheers,
Bob.

On Tue, Oct 9, 2012 at 2:15 PM, Bob MacCallum
 wrote:
> Hi Nicklas,
> Thanks very much for the quick reply.  I am *so* close to getting this
> working - it seems that the argument-less rand() is available in HQL
> but I will have to jump through a few hoops to provide the seeded
> version.
> I'll post an update later...
> cheers,
> Bob.
>
> On Tue, Oct 9, 2012 at 12:50 PM, Nicklas Nordborg  wrote:
>> On 2012-10-09 13:11, Bob MacCallum wrote:
>>> Hi all,
>>>
>>> I'm trying to get a Reporter query from the BASE2.x API which is
>>> basically "ORDER BY RAND(123)" - to get reproducible "random" order (I
>>> can explain *why* in another email if anyone cares to know).
>>>
>>> I tried this
>>>
>>> DataQuery repQuery =
>>> net.sf.basedb.core.Reporter.getQuery();
>>>
>>> repQuery.order(Orders.asc(new HqlExpression("RAND(123)", "xxx")));
>>>
>>> But it seems that HqlExpression isn't for public use.  I can't find
>>> something like "Expressions.hql".
>>
>> This can be found in net.sf.basedb.core.query.Hql.expression(...)
>>
>>> Could somebody please save me a ton of time and tell me how to put
>>> some arbitrary SQL into the ORDER BY clause of a DataQuery?
>>
>> Note that it is not SQL that is needed but HQL (Hibernate Query
>> Language). Don't know if RAND() is available in HQL. If not, it should
>> be possible to create a custom Hibernate dialect and use that in the
>> 'db.dialect' setting in base.config. I have never done that so I don't
>> know any details or how hard/easy it is.
>>
>> Another possibility is to use Reporter.getDynamicQuery() instead which
>> lets you work with SQL, but code-wise it's a bit harder to work with
>> since you have to process the result row-by-row and column-by-column
>> instead of as ReporterData objects.
>>
>> /Nicklas
>>
>>
>> --
>> Don't let slow site performance ruin your business. Deploy New Relic APM
>> Deploy New Relic app performance management and know exactly
>> what is happening inside your Ruby, Python, PHP, Java, and .NET app
>> Try New Relic at no cost today and get our sweet Data Nerd shirt too!
>> http://p.sf.net/sfu/newrelic-dev2dev
>> ___
>> The BASE general discussion mailing list
>> basedb-users@lists.sourceforge.net
>> unsubscribe: send a mail with subject "unsubscribe" to
>> basedb-users-requ...@lists.sourceforge.net

--
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
___
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject "unsubscribe" to
basedb-users-requ...@lists.sourceforge.net


Re: [base] DataQuery.order(...) 2.x API question

2012-10-09 Thread Bob MacCallum
Hi Nicklas,
Thanks very much for the quick reply.  I am *so* close to getting this
working - it seems that the argument-less rand() is available in HQL
but I will have to jump through a few hoops to provide the seeded
version.
I'll post an update later...
cheers,
Bob.

On Tue, Oct 9, 2012 at 12:50 PM, Nicklas Nordborg  wrote:
> On 2012-10-09 13:11, Bob MacCallum wrote:
>> Hi all,
>>
>> I'm trying to get a Reporter query from the BASE2.x API which is
>> basically "ORDER BY RAND(123)" - to get reproducible "random" order (I
>> can explain *why* in another email if anyone cares to know).
>>
>> I tried this
>>
>> DataQuery repQuery =
>> net.sf.basedb.core.Reporter.getQuery();
>>
>> repQuery.order(Orders.asc(new HqlExpression("RAND(123)", "xxx")));
>>
>> But it seems that HqlExpression isn't for public use.  I can't find
>> something like "Expressions.hql".
>
> This can be found in net.sf.basedb.core.query.Hql.expression(...)
>
>> Could somebody please save me a ton of time and tell me how to put
>> some arbitrary SQL into the ORDER BY clause of a DataQuery?
>
> Note that it is not SQL that is needed but HQL (Hibernate Query
> Language). Don't know if RAND() is available in HQL. If not, it should
> be possible to create a custom Hibernate dialect and use that in the
> 'db.dialect' setting in base.config. I have never done that so I don't
> know any details or how hard/easy it is.
>
> Another possibility is to use Reporter.getDynamicQuery() instead which
> lets you work with SQL, but code-wise it's a bit harder to work with
> since you have to process the result row-by-row and column-by-column
> instead of as ReporterData objects.
>
> /Nicklas
>
>
> --
> Don't let slow site performance ruin your business. Deploy New Relic APM
> Deploy New Relic app performance management and know exactly
> what is happening inside your Ruby, Python, PHP, Java, and .NET app
> Try New Relic at no cost today and get our sweet Data Nerd shirt too!
> http://p.sf.net/sfu/newrelic-dev2dev
> ___
> The BASE general discussion mailing list
> basedb-users@lists.sourceforge.net
> unsubscribe: send a mail with subject "unsubscribe" to
> basedb-users-requ...@lists.sourceforge.net

--
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
___
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject "unsubscribe" to
basedb-users-requ...@lists.sourceforge.net


Re: [base] DataQuery.order(...) 2.x API question

2012-10-09 Thread Nicklas Nordborg
On 2012-10-09 13:11, Bob MacCallum wrote:
> Hi all,
>
> I'm trying to get a Reporter query from the BASE2.x API which is
> basically "ORDER BY RAND(123)" - to get reproducible "random" order (I
> can explain *why* in another email if anyone cares to know).
>
> I tried this
>
> DataQuery repQuery =
> net.sf.basedb.core.Reporter.getQuery();
>
> repQuery.order(Orders.asc(new HqlExpression("RAND(123)", "xxx")));
>
> But it seems that HqlExpression isn't for public use.  I can't find
> something like "Expressions.hql".

This can be found in net.sf.basedb.core.query.Hql.expression(...)

> Could somebody please save me a ton of time and tell me how to put
> some arbitrary SQL into the ORDER BY clause of a DataQuery?

Note that it is not SQL that is needed but HQL (Hibernate Query 
Language). Don't know if RAND() is available in HQL. If not, it should 
be possible to create a custom Hibernate dialect and use that in the 
'db.dialect' setting in base.config. I have never done that so I don't 
know any details or how hard/easy it is.

Another possibility is to use Reporter.getDynamicQuery() instead which 
lets you work with SQL, but code-wise it's a bit harder to work with 
since you have to process the result row-by-row and column-by-column 
instead of as ReporterData objects.

/Nicklas


--
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
___
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject "unsubscribe" to
basedb-users-requ...@lists.sourceforge.net


[base] DataQuery.order(...) 2.x API question

2012-10-09 Thread Bob MacCallum
Hi all,

I'm trying to get a Reporter query from the BASE2.x API which is
basically "ORDER BY RAND(123)" - to get reproducible "random" order (I
can explain *why* in another email if anyone cares to know).

I tried this

DataQuery repQuery =
net.sf.basedb.core.Reporter.getQuery();

repQuery.order(Orders.asc(new HqlExpression("RAND(123)", "xxx")));

But it seems that HqlExpression isn't for public use.  I can't find
something like "Expressions.hql".
Could somebody please save me a ton of time and tell me how to put
some arbitrary SQL into the ORDER BY clause of a DataQuery?

many thanks,
Bob.

--
Don't let slow site performance ruin your business. Deploy New Relic APM
Deploy New Relic app performance management and know exactly
what is happening inside your Ruby, Python, PHP, Java, and .NET app
Try New Relic at no cost today and get our sweet Data Nerd shirt too!
http://p.sf.net/sfu/newrelic-dev2dev
___
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject "unsubscribe" to
basedb-users-requ...@lists.sourceforge.net