On Mon, 22 Aug 2005, Kevin Burton wrote:
assume other databases) this has a significant performance advantage.
Every SQL call costs about 1ms so for a list of 1000 items this can
save 999ms.
I'd like to know more about this. What exactly does it do ? How is it
implemented ?
If you have a list of 1000x items, and you call getFoo() on each of
them, and FOO_ID is a FK then Torque will do a SELECT * FROM FOO WHERE
ID = N and for each item.
We simply do a SELECT * FROM FOO WHERE ID IN (...) and then load this
into a HashMap and then do a join on the client.
Very fast...
This is definitely very useful and should be included into Torque in my
opinion.
I admit I was confused at first at the name you gave it. Is it already
used elswhere ? (I could not get to the meaning of "pre" in this context,
but it means "before accessing the getters" in the context, in the
line of "prefetch"), but still to me a more speaking name of the method
would have been "readJoined". But this may be a matter of taste.
I wondered what the API was, but now I have just seen the example that you
give at the bottom of your mail:
FooPeer.prejoin( FooPeer.RESOURCE_ID, BarPeer.VALUE )
.p()
.prejoin( FooPeer.FEED_ID )
.prejoin( CatPeer.RESOURCE_ID, BarPeer.VALUE )
and I do not understand it. I thought you would have to pass a list of Foo
objects in somewhere ? So I would have guessed that instead of having
prejoin(String, String), you would need to have prejoin(List, String,
String) somewhere, where the List contains a List of Foo objects ?
Also, what kind of Objects do you return with the resulting call so that
you can do prejoin() on it again ? And what does the p() method and the
prejoin(String) method do ?
? support for prepared statements (which are MUCH faster in MySQL)
Torque is using prepared statements for updates and deletes, and there is
some support for prepared statements on selects also. However, the support
of prepared statements for selects is quite limited at the moment, so if
you can do better, why not ? But merging this will be a real mess.
Yeah.. true. And we developed a new query API too which I'm not too
sure about.
The API just generates SQL and supports prepared statemetns. The
Criteria API couldn't do what I wanted and I needed something I could
extend easily.
If you ask for my opinion, Torque will stick with Criteria at the moment,
just because people are used to it. But it would interest me what problems
you had with Criteria and how you tackled them. Perhaps some of this can
be used in Criteria ?
? when performing an UPDATE/INSERT only send columns that have been
modified
...
Maybe if some people are interested, one could make it
configurable via a generator option.
Yes. That's what we did actually. It will depend on the implementor I think.
This would also solve the problem of
backward compatibility.
Yeah.. totally. This is very important.
Ok, if its configurable, my personal opinion is we should take it in.
Maybe some other developers could let us know what they think ?
? client-side join for IN clauses by specifying a List of BaseObjects
which have their getId() method called.
This sounds like a rather specialized thing to me. It sounds as if it
would only work on primary keys. Is that correct ?
I have an API that I can use to as an anonymous fuction basically
called a KeyMapper you can use for non--PKs. The default just uses
the PK.
It's very handy actually. I've used it for significant performance updates.
Hm, still not convinced about that. How standalone is it ? Meaning, do you
have to change much in Torque, or is it rather an add-on ? I feel I would
object if it means many changes in main Torque, but why not give the user
the possibility to use it or not if its quite standalone ?
? lazy result set processing. For example if you have a table with
10000 columns and you only call getColumnA() then we don't have to
fetch the remaining columns. This has a HUGE performance advantage at
runtime.
I'd like to know more about this. I would guess that the object would have
to keep the database connection open either until all of its getters are
called or until its destructor is called. Is that correct ?
Ah! yeah.. ha.. So it turns out that MySQL only returns the result as
a bytearray so you have all the data you need. you can parse out the
results ANYTIME you want. The trick was that the JDBC spec doesn't
allow this because once close() is called you're goin to get an
Exception.
I wrote a patch to the MySQL JDBC driver which is standard so that you
can use holdOpenResults=true and then use lazy parsing of the results.
That said though this may apply to other DBs as well.
Regardless it's a BIG performance update. 8-10x from my benchmarks.
I am sorry to say that, but in my opinion it sounds like a big hack, which
will not be transferable to other databases. It may be worth while if you
want extreme performance, but I think Torque should not go into the
business of patching db drivers.
? New database connection pool called BDCP which is simple to
debug/understand, FAST, and reliable. We had countless problems with
DBCP.
I am against having a db connection pool inside Torque. Reason is that a
db connection pool is a tool which is too general to be embedded in a
project like Torque. It should be a standalone project.
Sure.. maybe. I could put BDCP somewhere else I guess. I'd like to
include the adapter though.
If you find a place to host it so that it is publicly accessible, we'll do
that. Maybe you could even ask the dbcp people if they are interested ?
? save() should update the identity of the object with ID that was
generated on the server.
Not sure if I grasp what you mean by this. You mean if an id is
autoincrement, and you save a new object, then the id should be updated in
memory ? This is already the case. But maybe I misunderstood you. Please
correct me if I'm wrong.
Yeah.. that way you can update FK references for that objects ID. Is
that already in the current code?
Yes, the id of the objects change in memory when they have been assigned
on a doInsert. Also the ForeignKeys in the collections stored by this
object are assigned.
? BETA support for memcached. This means you can call retrieveById and
the prejoin API and have the results come from memcache and NOT from a
SQL box. This should theoretically be much faster but I'm still working
on it. The results were actually 4x slower but I'm optimistic that I
can have this fixed.
There is an experimental interface for using JCS as memory cache in Torque
right now. You have to set the generator option useManagers=true for it. I
have no big experience in this part of Torque, maybe someone else who is
using the managers should comment on it.
OK. Maybe I can look at hte implementation when I need memcached
again. I don't need it right now.
? REPLACE support for MySQL
As a non-mysql user, what does REPLACE do ? Is this standard SQL ? Does it
also work on other databases ?
Hm.. I don't know if it's standard but honestly I need an API to use
non-standard SQL. MySQL has a bunch of extensions that are just
required for building a decent DB app.
That's another reason I did the custom query API I was talking about
because with the Criteria API I couldn't do anything nonstandard.
On the query side, you can always use criteria.custom for nonstandard
queries. On the update side, never thought about it. The only problem that
I see at the moment is that it is problematic with prepared statements
because there is no setFunction() or the like in prepared statements (at
least I did not find it).
It would be nice to have a hook on the update side to hook in custom SQL.
Maybe something exists in the depths of Torque, I never digged for it.
In my opinion, this would be much preferable over suppurting custom stuff
for mysql.
? Support for working in MASTER/SLAVE environments when working
with MySQL. save() and doUpdate, doInsert, etc work on a dedicated
connection to the master while SELECTs perform on another dedicated
connection. The SLAVE connection could also be used with a load balancer
if necessary.
Again, is this mysql-specific ?
It would be for any DB that's MASTER/SLAVE oriented. I just altered
the generator to have another DB pool and then I used that on save().
It's not too much work to add this and keep it non-mysql specific.
If it is made configurable with default=off (so the standard user
does not see it), I think it should be added.
? doSelectFirstRecord which returns the object type instead of just
Object when you know there will only ever be one result. This is just
syntactic sugar and prevents casting.
Personally, I am undecided on that. I do not mind casting too much, and it
adds still another method to the Peer class, making them even more
stuffed. On the other hand, it might come in handy for some users.
Then again, it is not difficult to implement this in the templates, so if
someone has a real need for it, no problem to implement it. Anyone else
cares to comment ?
Hm. Wonder if there' s a way to extend the generators to support
plugins. I was thinking about this all along as altering the
generator to add app-specific code in the Base* peers in an easy
manner would be pretty sweet.
The same thought lingered in the back of my mind. It should not be
problematic for appending stuff, only a convention is needed how add-on
templates should be named and where to put them. This would be a great
feature.
? Support for a HashList mechanism where I can take a left join and make
it into a Map where the keys are the left most column and the values are
a list of all the items in that key. This really helps for tree
structures since SQL can't return hierarchical structures. This
mechansim only supports on level of nesting right now.
I do not understand this right now. Can you give an example ?
Sure.. Let's say you want to return a folder structure in a result set:
foldera | foo
foldera | bar
folderb | cat
folderb | dog
Then you'd get a HashList like:
foldera: (foo, bar)
folderb: (cat,dog)
Then you get an API on top of a list.
? doSelectAsHashMap which can take a column,
and a query, and return the result as a HashMap instead of just a
regular List.
Same comment as on doSelectFirstRecord. Might come in handy some time, but
adds yet another method. What happens if the column is not the promary
key, so that you might have multiple objects for one key ?
You can specific a column and I've done that. I acutally needed this
code for the prejoin support which provided a great deal of extra
flexibility.
Why do you use a HashMap to store this ? Why don't you use a list of
Folder objects and store the retrieved objects in the appropriate collections ?
Some organizing stuff at the end:
As you might have noticed from the news and status list, we are currently
close to a 3.2 release of Torque. Your Stuff will definitely not be in
3.2. Also, we have moved from CVS to SVN a few weeks ago and the structure
of the project was changed during that project, too. Make sure that you
use the new structure, but do not rely on anything you read about the
project structure in the source repository except what you find in
http://db.apache.org/torque/subversion.html
We are in the process of updating the web site, but this may take some
days before it is completed.
Then, as your contributions are very significant contributions, we would
need to have a CLA from you and, if this is work done for your company,
from your company as well. See
http://www.apache.org/licenses/#clas
for information.
Thomas
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]