Here are the exact numbers for loading the raw data from the Lucene EntityStore:

Total number of Entity states: 9792
Total number of Properties: 691602

The Lucene index uses ~33.5MB on disk.

Loading all states and accessing each property (deserializing to the actual Java types, including Geometries):

  first run: 2151ms
  second run (warm Lucene cache): 1809ms

and Qi4j:

  first run: 35619ms
  second run (entities in UoW cache): 1544ms

(The Qi4j run touches a few more properties (707676), since the raw store does
not load null values.)

OpenJDK 1.7.0_01 on 32-bit Linux.
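
The measurement loop boils down to something like the following. This is a
minimal sketch against the Lucene 3.x API; the directory path is made up and
the actual type deserialization is only hinted at in a comment:

    import java.io.File;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Fieldable;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.FSDirectory;

    public class RawLoadBenchmark {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open(FSDirectory.open(new File("entitystore")));
            long start = System.currentTimeMillis();
            int states = 0, properties = 0;
            for (int i = 0; i < reader.maxDoc(); i++) {
                if (reader.isDeleted(i))
                    continue;                       // skip deleted documents
                Document doc = reader.document(i);  // one document per entity state
                states++;
                for (Fieldable field : doc.getFields()) {
                    properties++;
                    // here the stored string would be deserialized to the
                    // actual Java type (including Geometries); omitted
                }
            }
            System.out.println(states + " states, " + properties + " properties, "
                    + (System.currentTimeMillis() - start) + "ms");
            reader.close();
        }
    }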

-Falko


Falko Bräutigam wrote:
Hi Niclas, thanks for the response.

Niclas Hedhman wrote:
2011/12/12 Falko Bräutigam <[email protected]>:

It takes ~30s to load 10,000 entities (consisting of ~100k ValueComposites of different types). After loading, all this eats up ~100MB of RAM. This is
too slow and too much.

1. Which store are you using? Neo4j *may* use less RAM than, for
instance, JDBM. Loading speed is highly EntityStore dependent, and also
depends on whether you are using indexing to retrieve those 10,000
entities. The "raw" speed of Qi4j is probably ~10k entities per second,
slowing down as serialization and I/O become factors.

I'm using Lucene as EntityStore. Each entity is a document. I will test the raw performance of the store and send numbers.

No indexing is involved here. The query loads the entire database.
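
For illustration, the entity-per-document mapping could look like this (a
sketch against the Lucene 3.x API; the field names and string-valued
properties are assumptions, not the actual store schema):

    import java.util.Map;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    // Sketch: one entity state stored as one Lucene document.
    public class EntityDocuments {
        static Document toDocument(String identity, Map<String, String> properties) {
            Document doc = new Document();
            doc.add(new Field("identity", identity,
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            for (Map.Entry<String, String> prop : properties.entrySet()) {
                if (prop.getValue() == null)
                    continue;  // null values are not stored, which is why the
                               // raw store loads fewer properties than Qi4j
                doc.add(new Field(prop.getKey(), prop.getValue(),
                        Field.Store.YES, Field.Index.NO));
            }
            return doc;
        }
    }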

I'm using Qi4j 1.0. (Yes, I know it's old, but I don't have the time to port my patches and test everything together for every Qi4j release.) Anyhow, do you think load speed differs that much between v1.0 and 1.4?

2. Not sure what you are saying. 100,000 values in total? So on
average, each value taking 1kB is "too much"? Well, what is the
composition of those values? Otherwise it is hard to analyze.

100,000 values in total. Each entity consists of ~10 values. Each entity takes 10kB and each value takes 1kB on average.

But I think that the general problem is deep inside Qi4j's data
management, which relies on 4 HashMaps that probably occupy a few kB
each, quickly eating away at the available memory. Perhaps some
"hinting" system could improve the memory consumption. Perhaps even more
clever lookup mechanisms, especially when the number of properties per
entity is low.

10K entities are not much for a GIS application. Given the current memory
footprint (and a 1GB Java heap), not even 10 users can work with the
application concurrently.

Where does the 1GB heap come from in a multi-user application? It is
hard to discuss in the abstract when your argument is based on
concrete examples.

Sorry, I just wanted to give an example. 1GB is just the "usual" heap size of our deployments. This is fairly big for our customers but not a problem. A problem *would* arise if the application required 10GB+ of heap.

Usually a GIS application works in a pipelined mode when rendering features (entities). Memory is never a problem with that architecture. Unfortunately, Qi4j holds all entities of a UoW in memory. We discussed this earlier on this list. So I added a cache SPI to UnitOfWorkInstance. This works, but it does not actually cure the memory consumption problem because of the time
needed to re-instantiate the entities.
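
A hypothetical shape of such a cache SPI, just to illustrate the idea (this
is not the actual patch; the UoW delegates instance caching to a pluggable
strategy instead of an unbounded internal map):

    public interface UowCache {
        Object get(String identity);                 // cached instance, or null
        void put(String identity, Object instance);
        void remove(String identity);
        void clear();                                // called when the UoW completes
    }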

Interestingly enough, I was sketching on another architecture today,
where I observed the "read-only" case being very separated (well, it
is CQRS related) and UnitOfWork isolation not being an issue. If 2.x
would move towards a "read-only"-mode of UoWs, could that solve
"streaming", i.e. as soon as you are done "rendering" you simply drop
the Entity and it is not held in memory?

Exactly. As long as the Entity is not modified, it can be GCed at any time.

I'm not quite sure the UoW API needs to be changed (or maybe I don't get the idea of a "read-only" UoW). Using a copy-on-write cache to handle instances internally, the UoW API would not need to change, and all the memory (and loading) problems should go away. Am I missing something here?
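
A minimal sketch of that copy-on-write idea, assuming clean entities are
held via SoftReference so the GC may reclaim them under pressure, while
modified entities are pinned strongly (all names here are made up, not
Qi4j's):

    import java.lang.ref.SoftReference;
    import java.util.HashMap;
    import java.util.Map;

    public class CowUowCache<T> {
        private final Map<String, SoftReference<T>> clean = new HashMap<String, SoftReference<T>>();
        private final Map<String, T> dirty = new HashMap<String, T>();

        public T get(String identity) {
            T entity = dirty.get(identity);
            if (entity != null)
                return entity;
            SoftReference<T> ref = clean.get(identity);
            return ref == null ? null : ref.get();  // null -> reload from store
        }

        public void putClean(String identity, T entity) {
            clean.put(identity, new SoftReference<T>(entity));
        }

        public void markDirty(String identity, T entity) {
            clean.remove(identity);
            dirty.put(identity, entity);  // pinned until the UoW completes
        }
    }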

If so (to help you in 1.4), does that mean that your "render task"
could be broken up into a series of smaller chunks rendered one at a
time, or are there other constraints preventing this?

This is exactly how rendering is done (for other, non-Qi4j data stores). Features are fetched in chunks from the underlying store. They pass through a pipeline of chained processors; the renderer is the last processor. As soon as a feature is rendered, it is eligible for GC.
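
In code, the pipeline looks roughly like this (an illustrative sketch, not
the actual implementation; all names are made up):

    import java.util.Iterator;

    // Features flow through chained processors, the renderer being the last
    // stage. No stage keeps a reference, so each feature can be collected as
    // soon as the last processor is done with it.
    interface FeatureProcessor {
        void process(Object feature);
    }

    class RenderPipeline {
        static void run(Iterator<?> features, FeatureProcessor... stages) {
            while (features.hasNext()) {
                Object feature = features.next();
                for (FeatureProcessor stage : stages)
                    stage.process(feature);   // e.g. reproject, filter, render
                // 'feature' goes out of scope here and becomes collectable
            }
        }
    }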

The problem is that, for this use case, I don't need the domain-specific layer (on top of the raw data states) at all. Or at least I don't need the Entity/Mixin/Concern stuff in each and every case; it depends on the processors in the pipeline. The rendering itself is not domain specific, but a Concern might be changing a Property, which *could* influence rendering. So I was thinking that detaching the modelling layer of Qi4j from the raw data could be great. Then one could re-use the same Composite instance to access several (many) entity states. Sort of a flyweight.
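
The flyweight idea could look roughly like this (purely illustrative;
EntityState here is a stand-in interface, not the real Qi4j SPI type):

    interface EntityState {
        Object getProperty(String name);
    }

    // A single composite facade is re-pointed at successive raw entity
    // states instead of instantiating one composite per entity.
    class FlyweightComposite {
        private EntityState state;

        FlyweightComposite bind(EntityState next) {
            this.state = next;   // reuse the same instance for another state
            return this;
        }

        Object property(String name) {
            return state.getProperty(name);   // read straight from the raw state
        }
    }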

-Falko

--
Falko Bräutigam
http://polymap.org/polymap3

_______________________________________________
qi4j-dev mailing list
[email protected]
http://lists.ops4j.org/mailman/listinfo/qi4j-dev