David,
Thanks for the insights.
> The main argument is that programs written in C are faster than
> programs written in Java. I'm not an expert in this, but years ago
> when this was a big debate and a lot of work in this area was being
> done on Java (back when JRockit actually had a market because the
> Sun JVMs were pretty slow) I did quite a bit of reading and
> following the news.
I do C for machines big and small. Many flavors of C for several OSes out
there, including Windows.
I've done Java from the time it was first introduced, when it was new-fangled. I think we started
playing with Java at about the same time?
My frustrations with C, and Java, have a long history. It may not be obvious here, but I have
always been a strong proponent of C++ over C (or Java over faster specialized languages). I
actually do agree with you that C isn't faster in many respects.
> The idea that C is faster than Java may have been true 5-6 years
> ago, but it is not true any more. Java has WAY more efficient memory
> management with asynchronous and multi-threaded de-allocation
> combined with efficient real-time allocation.
Multi-threaded stuff in Java (including for garbage collection) existed many
years ago, since 1.2.
As you had implied before, stale cache entries (and also "wrong" cache clearing) will impact GC
performance a lot. Multi-threaded de-allocation amounts to little or nothing on its own. The
improvements come from better heuristic algos (95-99% accuracy within a reasonable timeframe).
It's the field of non-deterministic algos, which solve complex problems (to 95% optimality) much
faster than deterministic algos do (to 100% optimality). Note that there must be a "critical
mass" or "critical level of complexity" for such GCs to work well; for simple problems it's
still best to use simple algos with a deterministic approach.
The idea that C is faster than Java is mainly due to the fact that programmers can control memory
allocation in C. In Java, programmers are at the mercy of a non-deterministic factor. Even today,
the fact that C programmers can clearly (deterministically) plan out memory usage makes for
extremely fast and specialized code. Small and specialized code has an insufficient level of
complexity for the heuristic approach to pay off.
However, though C is technically faster "at the denominator components", the entire software may
not seem faster to the end-user (fallacy of composition?). The misconception that C is
faster/slower cuts both ways: some think that Java is now faster than C when it still isn't;
others think that faster C makes for faster software when it doesn't always.
Concurrency in software is what gives end-users a good (fast) experience. Concurrency is possible
in C, but cumbersome to code. To be in the business of software development, we need something
easier to cook, to lower costs drastically. Today, I would program phones (and many small
machines) with Java. I gave up C for C++ long ago (except for cheaper machines with OSes that
still need C).
> Most tight loop code in Java is just as fast as in C because of
> Just-In-Time (JIT) compilation features.
The speed of a C program (or any software) depends on the quality of the compiler. C speed varies
a great deal among different compilers for the same OS, true too for different OSes. True for Java
as well, but let's just say "Java" to mean the language plus its fastest VMs.
While Java has reduced its loop inefficiencies, loops are the easiest things to predictably
improve. Your idea that stale caches are possibly one of the biggest stumbling blocks is correct.
Even the term Just-In-Time gives you a hint about how it compares to All-Compiled-Prior.
Really, I do wish we could program games and mission-critical stuff with Java; it would cut lots
of costs for me.
> Just search around the internet... LOTS of work has been done on
> this and there are many side-by-side tests.
Some tests are made for certain situations, if you get what I mean. But then, situations are what
software usage patterns are all about. So I do agree with you that C is not necessarily faster.
> However, C versus Java is somewhat irrelevant because getting data
> locally from memory no matter the language (even a scripting
> language) is 1-3 orders of magnitude (10-1000 times) faster than
> communication over a network.
You were discussing caching in C and in Java. A C program that accesses its cache is faster than a
Java program that accesses its Java cache.
I know, you're still thinking about distributed app server and database.
> Even round-trips to a local hard disk are usually faster (especially
> for random access to small amounts of data where no streaming or
> larger blocks are possible).
Faster than over wire? Yes.
Faster than Java accessing cache in memory? Strangely, perhaps (though unlikely I think). Caches
are increasingly becoming the deciding factor in computing speeds today. Hard disks have cache now.
> So, yeah, all down to basic hardware stuff for the biggest performance
> impact.
True, especially when hardware is (is still?) getting cheaper at a faster rate than software.
Which is why I don't find "distributing" the app server and RDBMS apart a very common solution
nowadays. Unless the geographical locations of both are truly far apart, I'd rather not separate
the 2 for the sake of separating.
> There are some entity engine performance tests in the WebTools
> webapp that compare caching to database performance. If you want to
> test database caching versus entity engine caching...
As I said, the main plus of Entity Engine caching for me is the ability to programmatically
specify caching strategies to a very flexible degree. I wouldn't do the above test, since I can't
tweak database caching in similarly useful ways. Well, we theoretically can, but why would I want
to learn every RDBMS's caching trick out there when I can just learn a single caching engine in
OFBiz? Don't know if you missed this point in my original post to Ritz123.
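To make "programmatically specify caching strategies" concrete, here is a minimal sketch of
per-entity cache opt-in with a size limit. It is NOT the OFBiz Entity Engine API; the class
EntityCacheSketch, its methods, and the LRU eviction policy are all hypothetical, chosen only
to show that the application (not the RDBMS) decides which entities get cached and how.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-entity cache registry; not the OFBiz Entity Engine API. */
class EntityCacheSketch {
    /** Bounded LRU map: evicts the least recently accessed entry once maxSize is exceeded. */
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;

        LruCache(int maxSize) {
            super(16, 0.75f, true); // accessOrder = true keeps entries in LRU order
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSize;
        }
    }

    private final Map<String, LruCache<String, Object>> cachesByEntity = new HashMap<>();

    /** Opt an entity in to caching, with its own size limit. */
    void enableCache(String entityName, int maxSize) {
        cachesByEntity.put(entityName, new LruCache<>(maxSize));
    }

    /** Stores a value only if the entity was opted in; other entities are never cached. */
    void put(String entityName, String primaryKey, Object value) {
        LruCache<String, Object> cache = cachesByEntity.get(entityName);
        if (cache != null) {
            cache.put(primaryKey, value);
        }
    }

    /** Returns the cached value, or null on a miss or for an entity that is not cached. */
    Object get(String entityName, String primaryKey) {
        LruCache<String, Object> cache = cachesByEntity.get(entityName);
        return cache == null ? null : cache.get(primaryKey);
    }
}
```

With enableCache("Product", 2), only Product rows are cached and the least recently used row
is dropped when a third one arrives; an entity that was never enabled always misses.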
About "network" in terms of localhost, it's really accessing memory (the database cache). Given
that OFBiz will be accessing memory in both cases, it boils down to which caching algo is more
efficient. (Which reminds me, I might want to play with OFBiz's caching algo at some point.)
Actually, I see your point. OFBiz will still need to use its network code (JDBC) to talk to the
local database cache.
> BTW, the session management in OFBiz is handled by Tomcat, and it is
> in memory on the app server (or depending on configuration it may
> save sessions locally to the hard disk for larger session data
> loads).
Yeah, I forgot about that. Thanks.
Shouldn't session stuff be in the database? We currently have server hit histories (in the
database) linked to sessions, yet sessions are in Tomcat files. You just reminded me to handle
sessions with JDBC. Thanks.
I counted about 8 times (or just about every paragraph or section) where I agreed with you, or saw
your point. Maybe it's a habit to see the other side of things, to meet ideas at least halfway, to
pick out the best in people or answers, to second people's observations/opinions often (started
out seconding Chris). I don't seem to get that from you a whole lot. Or maybe I just need to
express myself better.
I hope the above clears up the common misconceptions (on the ML) about C being faster/slower than
Java, and clarifies my stance on C vs Java for you.
As a means to an end, I often simply aim to spell out the truth about OFBiz, while emphasizing
the most useful parts.
Jonathon
David E Jones wrote:
Jonathon,
Now that you've explained how you got to the conclusion that caching in
the database is faster than caching in the app server, your comments make
more sense, and it is only now possible to answer your question in any
appropriate detail.
The main argument is that programs written in C are faster than programs
written in Java. I'm not an expert in this, but years ago when this was
a big debate and a lot of work in this area was being done on Java (back
when JRockit actually had a market because the Sun JVMs were pretty
slow) I did quite a bit of reading and following the news. The idea that
C is faster than Java may have been true 5-6 years ago, but it is not
true any more. Java has WAY more efficient memory management with
asynchronous and multi-threaded de-allocation combined with efficient
real-time allocation. Most tight loop code in Java is just as fast as in
C because of Just-In-Time (JIT) compilation features. Just search around
the internet... LOTS of work has been done on this and there are many
side-by-side tests.
However, C versus Java is somewhat irrelevant because getting data
locally from memory no matter the language (even a scripting language)
is 1-3 orders of magnitude (10-1000 times) faster than communication
over a network. Even round-trips to a local hard disk are usually faster
(especially for random access to small amounts of data where no
streaming or larger blocks are possible).
So, yeah, all down to basic hardware stuff for the biggest performance
impact.
There are some entity engine performance tests in the WebTools webapp
that compare caching to database performance. If you want to test
database caching versus entity engine caching, just set up your database
to turn on caching for a certain table and then from the WebTools main
page click on the "Entity Engine" link under the "Performance Tests"
heading.
BTW, the session management in OFBiz is handled by Tomcat, and it is in
memory on the app server (or depending on configuration it may save
sessions locally to the hard disk for larger session data loads).
-David
On Apr 19, 2008, at 4:25 AM, Jonathon -- Improov wrote:
> The only point of caching in any case is performance. There are bad
> side-effects of caching, so performance must be more important than
> other things (like possibility of stale cache, etc).
I didn't suggest that OFBiz's caching makes things worse. Stale caches
must be handled correctly, or it becomes a bug, no compromise there.
In case you did mean "handled stale caches", then stale caches will
invariably mean less cache performance. Ranking performance much
higher in importance than stale caches isn't gonna make cache
performance any better in the face of stale caches.
But I get what you mean. Performance could be more important than high
RAM prices, if they are high.
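A minimal sketch of what "handling" stale caches costs, assuming a simple read-through cache
that drops an entry whenever it is written. The class InvalidatingCache and its methods are
hypothetical (this is not how OFBiz does it); the plain Map stands in for the database.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical read-through cache over a backing store; names are illustrative only. */
class InvalidatingCache {
    private final Map<String, String> store;                   // stands in for the database
    private final Map<String, String> cache = new HashMap<>();
    private int storeReads = 0;

    InvalidatingCache(Map<String, String> store) {
        this.store = store;
    }

    /** Returns the cached value if present, otherwise reads the store and caches the result. */
    String read(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;                                      // cache hit
        }
        storeReads++;                                           // cache miss: go to the store
        String value = store.get(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    /** Writes to the store and drops the cached copy so it can never be served stale. */
    void write(String key, String value) {
        store.put(key, value);
        cache.remove(key);
    }

    /** Number of reads that had to go to the backing store. */
    int storeReads() {
        return storeReads;
    }
}
```

Every write forces the next read of that key back to the store, so correctness is bought with
extra misses. That is the "less cache performance" mentioned above.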
> What I'm baffled by, and the only reason I responded, was the idea
> that caching in the database is _faster_ than local caching on the
> app server. Any time you do a database round trip you're talking
> about serialization of data, network communication, and
> deserialization... and all of that in both directions. Even a simple
> lookup where the database has it cached will take a few
> milliseconds.
First, DBCP removes a huge load of that "round trip". DBCP is
implemented correctly in OFBiz, yes?
Second, caching written in C is definitely faster than that written in
Java (OFBiz). That's the usual argument thrown to the ML, yes? I was
addressing that common misconception (calling it what it is).
> This really has nothing to do with OFBiz, it's all in the network
> and hardware that this performance impact comes in. These are
> usually huge factors in designing distributed systems.
Ritz123 didn't ask about a distributed system. Or maybe I should be
expelled from geekdom for not assuming all possibilities.
OFBiz's session management is in the database anyway, so it makes
sense to bundle the database together with OFBiz in the same machine.
Must be a really huge business that requires specialization of
hardware. Unless, of course, we are looking to under-utilize now in
the name of "future projected growth".
> So, I guess the point of my questions is how could remote caching
> ever be faster than local caching? In other words, I was asking
> because I must be misunderstanding something as this doesn't seem to
> make any sense.
That misconception really has nothing to do with network overheads.
Given Java's garbage collection, and given C's sleeker memory
management, accessing a local Java-based cache may not be much faster
(if at all) than hitting a local port for data served by a C program.
No, I haven't tested this recently. Hopefully my original answer was
clear enough to suggest that I wouldn't try to test it, since tweaking
an RDBMS's cache is not nearly as useful as working with OFBiz's cache.
> Not sure what you mean by "Oh my goodness. Not again." It's really
> not that sensational, just basic computer stuff, really quite boring
> and plain actually.
Then I'm really baffled why you wouldn't spell out that boring, basic
and plain answer for us all, rather than asking "have you tested it
110%? are you doubly sure? is that your final answer? have you called
your lifeline? do you need more hints?".
Try to look for the good parts in other people's answers. I'm trying
to offer as much help as I can on OFBiz, without misleading anyone
into thinking OFBiz is all roses and no thorns. If you spot holes, you
could help out by filling in for me?
Jonathon
David E Jones wrote:
The only point of caching in any case is performance. There are bad
side-effects of caching, so performance must be more important than
other things (like possibility of stale cache, etc).
What I'm baffled by, and the only reason I responded, was the idea
that caching in the database is _faster_ than local caching on the
app server. Any time you do a database round trip you're talking
about serialization of data, network communication, and
deserialization... and all of that in both directions. Even a simple
lookup where the database has it cached will take a few milliseconds.
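A rough sketch of the extra work described above, with the network hop left out entirely. It
assumes Java object serialization as a stand-in for the wire format; a real JDBC driver uses
its own protocol, so treat this only as an illustration. RoundTripSketch and both method names
are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical comparison of a local cache hit with a simulated remote one. */
class RoundTripSketch {
    /** Local cache hit: one in-memory map lookup that returns the same object, no copying. */
    static HashMap<String, String> localLookup(
            Map<String, HashMap<String, String>> cache, String primaryKey) {
        return cache.get(primaryKey);
    }

    /** Simulated remote hit: the row is serialized and deserialized (network hop omitted). */
    @SuppressWarnings("unchecked")
    static HashMap<String, String> simulatedRemoteLookup(
            Map<String, HashMap<String, String>> cache, String primaryKey) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(cache.get(primaryKey));        // "server" side: serialize
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (HashMap<String, String>) in.readObject(); // "client" side: deserialize
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```

Both calls return equal data, but the second one builds a byte array and a fresh copy of the
row on every call, before any socket is involved. Wrapping each call in a System.nanoTime()
loop is one way to compare them on a given machine.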
This really has nothing to do with OFBiz, it's all in the network and
hardware that this performance impact comes in. These are usually
huge factors in designing distributed systems.
So, I guess the point of my questions is how could remote caching
ever be faster than local caching? In other words, I was asking
because I must be misunderstanding something as this doesn't seem to
make any sense.
Not sure what you mean by "Oh my goodness. Not again." It's really
not that sensational, just basic computer stuff, really quite boring
and plain actually.
-David
On Apr 19, 2008, at 2:37 AM, Jonathon -- Improov wrote:
Oh my goodness. Not again. I thought someone else posted his tests
of retrieval speeds regarding this?
Call it what it is, and we'll always be alert to necessities for
change. Side note: No point telling everybody "it's all there, you
just can't find it", when they might have the ability to actually
confirm without doubt that parts of the functionality are not there.
I don't have time to write every reason there is for OFBiz's
caching, please. Of course it's good to be able to specifically
spell out to OFBiz which entities you want cached; an RDBMS does not
always know (if at all) which entities should be cached more. The
designer of the software will know the usage pattern the software is
designed for. So how many points do I get for that answer? :) Sigh.
So would you mind listing all the reasons for OFBiz's caching of the
database entries? :) You designed it, I believe. You would know.
Thanks!
Jonathon
David E Jones wrote:
On Apr 19, 2008, at 2:04 AM, Jonathon -- Improov wrote:
Chris is right that dynamic view entities are exactly like view
entities. Both run through the same Entity Engine. Caching by
OFBiz is also good, though much slower than RDBMS-native caches;
Really? What would be the point of local caching on the app server
then? (hint: it definitely isn't for database independence)
Have you actually done tests on this to see what typical times look
like, even for a database running on the same machine?
-David