Here are the exact numbers for loading the raw data from Lucene EntityStore:
Total number of Entity states: 9792
Total number of Properties: 691602
The Lucene index uses ~33.5MB on disk.
Loading all states and accessing each property (doing deserialization of the
actual Java types, including Geometries):
first run: 2151ms
second run (warm Lucene cache): 1809ms
and Qi4J:
first run: 35619ms
second run (entities in UoW cache): 1544ms
(this loads a few more properties (707676) since the store does not load
null values)
OpenJDK 1.7.0_01 on 32-bit Linux.
-Falko
Falko Bräutigam schrieb:
Hi Niclas, thanks for the response.
Niclas Hedhman schrieb:
2011/12/12 Falko Bräutigam <[email protected]>:
It takes ~30s to load 10.000 entities (consisting of ~100k
ValueComposites of different types). After loading, all this eats up
~100MB of RAM. This is too slow and too much.
1. Which store are you using? Neo4j *may* use less RAM than for
instance JDBM. Loading speed is highly EntityStore dependent, and
whether you are using indexing to retrieve those 10,000 entities. The
"raw" speed of Qi4j is probably ~10k entities per second, slowing down
as serialization and I/O become factors.
I'm using Lucene as EntityStore. Each entity is a document. I will test
the raw performance of the store and send numbers.
No indexing is involved here. The query loads the entire database.
I'm using Qi4j 1.0. (Yes, I know it's old but I don't have the time to
port my patches and test all together for every Qi4j release). Anyhow,
do you think load speed differs that much between v1.0 and 1.4?
2. Not sure what you are saying. 100,000 values in total? So on
average, each value taking 1kB is "too much"? Well, what is the
composition of those values? Otherwise it is hard to analyze.
100,000 values in total. Each entity consists of ~10 values. Each entity
takes 10kB and each value takes 1kB on average.
But I think that the general problem is deep inside Qi4j's data
management, which relies on 4 HashMaps that probably occupy a few kB
each, quickly eating away at the memory consumption. Perhaps some
"hinting" system could reduce the footprint. Perhaps even more
clever lookup mechanisms, especially when the number of properties per
entity is low.
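Just to illustrate the "more clever lookup" idea for entities with few properties (this is only a sketch, not Qi4j code): with ~10 properties per entity, two parallel arrays and a linear scan need far less memory per entity than a HashMap while staying competitive on lookup speed.

```java
// Sketch only: a minimal property map for entities with few properties.
// Parallel arrays avoid the per-HashMap overhead (table, Entry objects)
// at the cost of an O(n) scan, which is cheap for n ~ 10.
class SmallPropertyMap {
    private final String[] names;
    private final Object[] values;
    private int size;

    SmallPropertyMap(int capacity) {
        names = new String[capacity];
        values = new Object[capacity];
    }

    void put(String name, Object value) {
        for (int i = 0; i < size; i++) {
            if (names[i].equals(name)) {   // overwrite existing property
                values[i] = value;
                return;
            }
        }
        names[size] = name;                // append new property
        values[size] = value;
        size++;
    }

    Object get(String name) {
        for (int i = 0; i < size; i++)
            if (names[i].equals(name))
                return values[i];
        return null;                       // property not present
    }
}
```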
10K entities are not much for a GIS application. Given the current
memory footprint (and a 1GB Java heap), not even 10 users can work with
the application concurrently.
Where does the 1GB heap come from in a multi-user application? It is
hard to discuss in the abstract when your argument is based on
concrete examples.
Sorry, I just wanted to give an example. 1GB is just the "usual" heap
size of our deployments. This is fairly big for our customers but no
problem. A problem *would* arise if the application required 10GB+ of
heap.
Usually a GIS application works in a pipelined mode when rendering
features (entities). Memory is never a problem with that architecture.
Unfortunately, Qi4j holds all entities of a UoW in memory. We discussed
this earlier on this list. So I added a cache SPI to UnitOfWorkInstance.
This works, but it does not actually cure the problem of memory
consumption because of the time needed to re-instantiate the entities.
Interestingly enough, I was sketching out another architecture today,
where I observed the "read-only" case being very separate (well, it
is CQRS related) and UnitOfWork isolation not being an issue. If 2.x
moved towards a "read-only" mode of UoWs, could that solve
"streaming", i.e. as soon as you are done "rendering" you simply drop
the Entity and it is not held in memory?
Exactly. As long as the Entity is not modified, it is subject to be
GCed at any time.
I'm not quite sure if the UoW API needs to be changed (or maybe I don't
get the idea of a "read-only" UoW). Using a copy-on-write cache to handle
instances internally, the UoW API would not need to be changed, and all
the memory (and loading) problems should go away. Am I missing something here?
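A minimal sketch of such a copy-on-write cache (all names hypothetical, not the actual Qi4j SPI): unmodified entity states are held via WeakReference, so the GC can drop them at any time and the UoW reloads them from the store on demand; a state is pinned with a strong reference only once it is modified, since those are the ones the UoW must flush on complete().

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a UoW-internal copy-on-write entity cache.
// Clean states are weakly referenced (GC-able anytime); dirty states
// are pinned strongly until the UoW flushes them.
class CowEntityCache<K, V> {
    private final Map<K, WeakReference<V>> clean = new HashMap<>();
    private final Map<K, V> dirty = new HashMap<>();

    void putClean(K key, V state) {
        clean.put(key, new WeakReference<>(state));
    }

    // Called when an entity is about to be modified: pin it strongly.
    void markDirty(K key, V state) {
        clean.remove(key);
        dirty.put(key, state);
    }

    // Returns null if the state was never cached or was reclaimed by
    // the GC -- the caller then reloads it from the EntityStore.
    V get(K key) {
        V pinned = dirty.get(key);
        if (pinned != null)
            return pinned;
        WeakReference<V> ref = clean.get(key);
        return ref != null ? ref.get() : null;
    }

    // What the UoW must write back on complete().
    Iterable<V> modified() {
        return dirty.values();
    }
}
```

With this shape the public UoW API stays untouched; only the internal instance bookkeeping changes.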
If so (to help you in 1.4), does that mean that your "render task"
could be broken up into a series of smaller chunks rendered one at a
time, or are there other constraints preventing this?
This is exactly how rendering is done (for other, non-Qi4j data stores).
Features are fetched in chunks from the underlying store. They pass
through a pipeline of chained processors; the renderer is the last
processor. As soon as a feature is rendered, it is subject to be GCed.
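The pipeline described here could be sketched roughly like this (illustrative names only, not an actual GIS or Qi4j API): features are pulled lazily one at a time, each processor transforms the feature, the renderer consumes it, and no stage retains a reference, so the rendered feature is immediately eligible for GC.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Sketch of a pull-based rendering pipeline: features flow one at a
// time through chained processors into a terminal renderer.
class Pipeline<T> {
    private final Iterator<T> source;                 // lazy feature source
    private final List<Function<T, T>> processors;    // chained processors

    Pipeline(Iterator<T> source, List<Function<T, T>> processors) {
        this.source = source;
        this.processors = processors;
    }

    int render(Consumer<T> renderer) {
        int count = 0;
        while (source.hasNext()) {
            T feature = source.next();       // fetched on demand
            for (Function<T, T> p : processors)
                feature = p.apply(feature);
            renderer.accept(feature);        // last stage of the chain
            count++;                         // reference dropped here -> GC-able
        }
        return count;
    }
}
```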
The problem is that, for this use case, I don't need the domain-specific
layer (on top of the raw data states) at all. Or at least I don't need
the Entity/Mixin/Concern stuff in each and every case. It depends
on the processors in the pipeline. The rendering itself is not domain
specific. But a Concern might be changing a Property, which *could*
influence rendering. So I was thinking that detaching the
modelling layer of Qi4j from the raw data could be great. Then one could
re-use the same Composite instance to access several (many) entity
states. Sort of a flyweight.
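The flyweight idea might look roughly like this (hypothetical code, not the Qi4j API): a single accessor instance is re-bound to successive raw entity states, so no Entity/Mixin/Concern stack is instantiated per entity.

```java
import java.util.Map;

// Hypothetical flyweight: one reusable accessor re-pointed at
// successive raw entity states instead of one composite per entity.
class FlyweightFeature {
    private Map<String, Object> state;      // current raw property map

    // Re-binding is O(1): no composite wiring, no mixin instantiation.
    FlyweightFeature bind(Map<String, Object> entityState) {
        this.state = entityState;
        return this;
    }

    Object property(String name) {
        return state.get(name);
    }
}
```

In a rendering loop one instance would then be bound to each of the 10K raw states in turn, keeping allocation constant regardless of the number of entities.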
-Falko
--
Falko Bräutigam
http://polymap.org/polymap3
_______________________________________________
qi4j-dev mailing list
[email protected]
http://lists.ops4j.org/mailman/listinfo/qi4j-dev