Here are the exact numbers for loading the raw data from the Lucene EntityStore:

Total number of Entity states: 9792
Total number of Properties: 691602

The Lucene index uses ~33.5MB on disk.

Loading all states and accessing each property (deserializing to the actual Java types, including Geometries):

  first run: 2151ms
  second run (warm Lucene cache): 1809ms

and Qi4j:

  first run: 35619ms
  second run (entities in UoW cache): 1544ms

(The Qi4j run touches a few more properties (707676), since the raw store does
not load null values.)

OpenJDK 1.7.0_01 on 32-bit Linux.
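
The measurement loop boils down to something like the following. This is a
minimal sketch against the Lucene 3.x API; the directory path is made up and
the actual type deserialization is only hinted at in a comment:

    import java.io.File;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Fieldable;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.store.FSDirectory;

    public class RawLoadBenchmark {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open(FSDirectory.open(new File("entitystore")));
            long start = System.currentTimeMillis();
            int states = 0, properties = 0;
            for (int i = 0; i < reader.maxDoc(); i++) {
                if (reader.isDeleted(i))
                    continue;                       // skip deleted documents
                Document doc = reader.document(i);  // one document per entity state
                states++;
                for (Fieldable field : doc.getFields()) {
                    properties++;
                    // here the stored string would be deserialized to the
                    // actual Java type (including Geometries); omitted
                }
            }
            System.out.println(states + " states, " + properties + " properties, "
                    + (System.currentTimeMillis() - start) + "ms");
            reader.close();
        }
    }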

-Falko


Falko Bräutigam wrote:
Hi Niclas, thanks for the response.

Niclas Hedhman wrote:
2011/12/12 Falko Bräutigam <[email protected]>:

It takes ~30s to load 10,000 entities (consisting of ~100k ValueComposites of different types). After loading, all this eats up ~100MB of RAM. This is
too slow and too much.

1. Which store are you using? Neo4j *may* use less RAM than, for
instance, JDBM. Loading speed is highly EntityStore dependent, and also
depends on whether you are using indexing to retrieve those 10,000
entities. The "raw" speed of Qi4j is probably ~10k entities per second,
slowing down as serialization and I/O become factors.

I'm using Lucene as EntityStore. Each entity is a document. I will test the raw performance of the store and send numbers.

No indexing is involved here. The query loads the entire database.
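
For illustration, the entity-per-document mapping could look like this (a
sketch against the Lucene 3.x API; the field names and string-valued
properties are assumptions, not the actual store schema):

    import java.util.Map;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    // Sketch: one entity state stored as one Lucene document.
    public class EntityDocuments {
        static Document toDocument(String identity, Map<String, String> properties) {
            Document doc = new Document();
            doc.add(new Field("identity", identity,
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            for (Map.Entry<String, String> prop : properties.entrySet()) {
                if (prop.getValue() == null)
                    continue;  // null values are not stored, which is why the
                               // raw store loads fewer properties than Qi4j
                doc.add(new Field(prop.getKey(), prop.getValue(),
                        Field.Store.YES, Field.Index.NO));
            }
            return doc;
        }
    }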

I'm using Qi4j 1.0. (Yes, I know it's old, but I don't have the time to port my patches and test everything together for every Qi4j release.) Anyhow, do you think load speed differs that much between v1.0 and 1.4?

2. Not sure what you are saying. 100,000 values in total? So on
average, each value taking 1kB is "too much"? Well, what is the
composition of those values? Otherwise it is hard to analyze.

100,000 values in total. Each entity consists of ~10 values. Each entity takes 10kB and each value takes 1kB on average.

But I think that the general problem is deep inside Qi4j's data
management, which relies on 4 HashMaps that probably occupy a few kB
each, quickly eating away at the available memory. Perhaps some
"hinting" system could improve the memory consumption. Perhaps even more
clever lookup mechanisms, especially when the number of properties per
entity is low.

10K entities are not much for a GIS application. Given the current memory
footprint (and a 1GB Java heap), not even 10 users can work with the
application concurrently.

Where does the 1GB heap come from in a multi-user application? It is
hard to discuss in the abstract when your argument is based on
concrete examples.

Sorry, I just wanted to give an example. 1GB is just the "usual" heap size of our deployments. This is fairly big for our customers but not a problem. A problem *would* arise if the application required 10GB+ of heap.

Usually a GIS application works in a pipelined mode when rendering features (entities). Memory is never a problem with that architecture. Unfortunately, Qi4j holds all entities of a UoW in memory. We discussed this earlier on this list. So I added a cache SPI to UnitOfWorkInstance. This works, but it does not actually cure the memory consumption problem because of the time
needed to re-instantiate the entities.
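
A hypothetical shape of such a cache SPI, just to illustrate the idea (this
is not the actual patch; the UoW delegates instance caching to a pluggable
strategy instead of an unbounded internal map):

    public interface UowCache {
        Object get(String identity);                 // cached instance, or null
        void put(String identity, Object instance);
        void remove(String identity);
        void clear();                                // called when the UoW completes
    }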

Interestingly enough, I was sketching on another architecture today,
where I observed the "read-only" case being very separated (well, it
is CQRS related) and UnitOfWork isolation not being an issue. If 2.x
would move towards a "read-only"-mode of UoWs, could that solve
"streaming", i.e. as soon as you are done "rendering" you simply drop
the Entity and it is not held in memory?

Exactly. As long as the Entity is not modified, it can be GCed at any time.

I'm not quite sure the UoW API needs to be changed (or maybe I don't get the idea of a "read-only" UoW). Using a copy-on-write cache to handle instances internally, the UoW API would not need to change, and all the memory (and loading) problems should go away. Am I missing something here?
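
A minimal sketch of that copy-on-write idea, assuming clean entities are
held via SoftReference so the GC may reclaim them under pressure, while
modified entities are pinned strongly (all names here are made up, not
Qi4j's):

    import java.lang.ref.SoftReference;
    import java.util.HashMap;
    import java.util.Map;

    public class CowUowCache<T> {
        private final Map<String, SoftReference<T>> clean = new HashMap<String, SoftReference<T>>();
        private final Map<String, T> dirty = new HashMap<String, T>();

        public T get(String identity) {
            T entity = dirty.get(identity);
            if (entity != null)
                return entity;
            SoftReference<T> ref = clean.get(identity);
            return ref == null ? null : ref.get();  // null -> reload from store
        }

        public void putClean(String identity, T entity) {
            clean.put(identity, new SoftReference<T>(entity));
        }

        public void markDirty(String identity, T entity) {
            clean.remove(identity);
            dirty.put(identity, entity);  // pinned until the UoW completes
        }
    }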

If so (to help you in 1.4), does that mean that your "render task"
could be broken up into a series of smaller chunks rendered one at a
time, or are there other constraints preventing this?

This is exactly how rendering is done (for other, non-Qi4j data stores). Features are fetched in chunks from the underlying store. They pass through a pipeline of chained processors; the renderer is the last processor. As soon as a feature is rendered, it is eligible for GC.
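
In code, the pipeline looks roughly like this (an illustrative sketch, not
the actual implementation; all names are made up):

    import java.util.Iterator;

    // Features flow through chained processors, the renderer being the last
    // stage. No stage keeps a reference, so each feature can be collected as
    // soon as the last processor is done with it.
    interface FeatureProcessor {
        void process(Object feature);
    }

    class RenderPipeline {
        static void run(Iterator<?> features, FeatureProcessor... stages) {
            while (features.hasNext()) {
                Object feature = features.next();
                for (FeatureProcessor stage : stages)
                    stage.process(feature);   // e.g. reproject, filter, render
                // 'feature' goes out of scope here and becomes collectable
            }
        }
    }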

The problem is that, for this use case, I don't need the domain-specific layer (on top of the raw data states) at all. Or at least I don't need the Entity/Mixin/Concern stuff in each and every case; it depends on the processors in the pipeline. The rendering itself is not domain specific, but a Concern might be changing a Property, which *could* influence rendering. So I was thinking that detaching the modelling layer of Qi4j from the raw data could be great. Then one could re-use the same Composite instance to access several (many) entity states. Sort of a flyweight.
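
The flyweight idea could look roughly like this (purely illustrative;
EntityState here is a stand-in interface, not the real Qi4j SPI type):

    interface EntityState {
        Object getProperty(String name);
    }

    // A single composite facade is re-pointed at successive raw entity
    // states instead of instantiating one composite per entity.
    class FlyweightComposite {
        private EntityState state;

        FlyweightComposite bind(EntityState next) {
            this.state = next;   // reuse the same instance for another state
            return this;
        }

        Object property(String name) {
            return state.getProperty(name);   // read straight from the raw state
        }
    }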

-Falko

--
Falko Bräutigam
http://polymap.org/polymap3

_______________________________________________
qi4j-dev mailing list
[email protected]
http://lists.ops4j.org/mailman/listinfo/qi4j-dev