PartitionRegionHelper.getLocalPrimaryData does not apply in this case
because this is a replicated region, not a partitioned one.

I would guess that the "frozen thread" is not interesting. It looks like
just a thread waiting to read from the network. Geode can have many threads
waiting for network messages and if the other side never sends one then it
could appear to be frozen.

You can configure your own implementation
of org.apache.geode.cache.util.ObjectSizer on your region (you should have
an attribute for it on gfe:replicated-region; I'm not sure what it is named
but look for one with "sizer" in its name). You could try a really simple
ObjectSizer (just have it return 1024 for example) and see if it takes care
of this performance problem.
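
The following is a rough, standalone sketch of why a trivial sizer could help. It is not Geode code. `Sizer` is a hypothetical stand-in with the same single method, `int sizeof(Object o)`, as `org.apache.geode.cache.util.ObjectSizer`, so the example compiles without Geode on the classpath. `ReachableGraphSizer` is a simplified model of reflection-based sizing that walks every object reachable through instance fields. That is roughly what the ObjectGraphSizer frames in the quoted stack trace are doing, according to the Javadoc quoted there. `ConstantSizer` returns 1024 for every value, as suggested above. `Account`, `Bank` and `SizerDemo` are hypothetical classes loosely modelled on the example in the quoted message below.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical stand-in with the same single method as Geode's ObjectSizer.
interface Sizer {
  int sizeof(Object o);
}

// The "really simple" sizer: every value is reported as 1024 bytes, no traversal.
class ConstantSizer implements Sizer {
  @Override
  public int sizeof(Object o) {
    return 1024;
  }
}

// Simplified model (not Geode's code) of reflective graph sizing: a breadth-first
// walk over every object reachable through non-static, non-primitive fields.
class ReachableGraphSizer implements Sizer {
  int lastVisited; // distinct objects visited by the most recent sizeof call

  @Override
  public int sizeof(Object root) {
    Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    ArrayDeque<Object> queue = new ArrayDeque<>();
    seen.add(root);
    queue.add(root);
    while (!queue.isEmpty()) {
      Object current = queue.poll();
      for (Class<?> c = current.getClass(); c != null; c = c.getSuperclass()) {
        for (Field f : c.getDeclaredFields()) {
          if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) {
            continue;
          }
          f.setAccessible(true);
          try {
            Object child = f.get(current);
            if (child != null && seen.add(child)) {
              queue.add(child);
            }
          } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
          }
        }
      }
    }
    lastVisited = seen.size();
    return lastVisited * 16; // crude fixed per-object estimate, illustration only
  }
}

// Hypothetical domain classes loosely modelled on the Account/Bank example.
class Bank {
  long id;
  Account primaryAccount;
}

class Account {
  long id;
  Bank bank;
  Account next; // another resolved reference, so every account is reachable
}

class SizerDemo {
  // Builds n accounts chained through 'next', all sharing one Bank.
  static Account buildChain(int n) {
    Bank bank = new Bank();
    Account head = null;
    for (int i = 0; i < n; i++) {
      Account a = new Account();
      a.id = i;
      a.bank = bank;
      a.next = head;
      head = a;
    }
    bank.primaryAccount = head;
    return head;
  }

  public static void main(String[] args) {
    Account root = buildChain(10_000);
    ReachableGraphSizer graphSizer = new ReachableGraphSizer();
    graphSizer.sizeof(root);
    System.out.println("graph sizer visited " + graphSizer.lastVisited + " objects");
    System.out.println("constant sizer reports " + new ConstantSizer().sizeof(root));
  }
}
```

In this model, sizing one account walks all 10,000 accounts plus the shared bank. The cost therefore grows with the whole connected graph rather than with the entry being read. The constant sizer does no work at all, at the price that the recorded entry sizes no longer reflect the real heap footprint.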

I think what is happening in this case is that your data is stored in the
cache in serialized form. When your function calls "get", Geode needs to
deserialize the data, and because the region uses LRU eviction, Geode
recalculates the size of the entry, since the deserialized form can differ
in size from the serialized form. When a function scans a region like this,
all the data on the server where the scan runs ends up stored in
deserialized form.
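
The following is a toy model of this effect, written in plain Java with no Geode dependency. `ModelEntry` is hypothetical and is not Geode's region entry. It starts out holding serialized bytes, with Java serialization standing in for Geode's own. On the first local get it switches to the deserialized object and calls the sizer at that moment. `ScanDemo` (also hypothetical) scans a region of such entries twice and counts the sizer calls.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.ToIntFunction;

// Toy model of an LRU entry (not Geode's RegionEntry): it starts out serialized and
// switches to the deserialized form on the first local get, re-sizing itself then.
class ModelEntry {
  private Object value; // byte[] while serialized, the domain object afterwards
  private int recordedSize; // size used for LRU accounting in this model
  private final ToIntFunction<Object> sizer;

  ModelEntry(Serializable domainObject, ToIntFunction<Object> sizer) {
    this.value = serialize(domainObject);
    this.recordedSize = ((byte[]) value).length; // serialized form: just the byte count
    this.sizer = sizer;
  }

  Object get() {
    if (value instanceof byte[]) {
      value = deserialize((byte[]) value); // the stored form changes on this local read,
      recordedSize = sizer.applyAsInt(value); // so the entry has to be sized again
    }
    return value;
  }

  boolean isStoredSerialized() {
    return value instanceof byte[];
  }

  int recordedSize() {
    return recordedSize;
  }

  static byte[] serialize(Serializable o) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(o);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static Object deserialize(byte[] data) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}

class ScanDemo {
  // Scans a region of n entries twice and returns how many times the sizer ran.
  static int sizerCallsForTwoScans(int n) {
    AtomicInteger calls = new AtomicInteger();
    List<ModelEntry> entries = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      entries.add(new ModelEntry("value-" + i, v -> {
        calls.incrementAndGet();
        return 1024;
      }));
    }
    for (ModelEntry e : entries) e.get(); // first scan: every entry changes form
    for (ModelEntry e : entries) e.get(); // second scan: already deserialized
    return calls.get();
  }

  public static void main(String[] args) {
    System.out.println("sizer calls: " + sizerCallsForTwoScans(1000));
  }
}
```

In the model, the first scan re-sizes every entry once and leaves everything stored deserialized, and a second scan triggers no further sizing. If the sizer walks a large connected graph, that first pass is where the time goes.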

If your values were serialized with PDX and you set
"pdx-read-serialized=true" on your cache, then doing the get will not change
the form in which the value is stored. The get will return a PdxInstance,
but you can then call "getObject" on it to get your domain class.
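
The following is a standalone model of that contract, with plain Java serialization standing in for PDX. `SerializedView` and `ReadSerializedEntry` are hypothetical. The view only imitates the role a PdxInstance plays here, and the `readSerialized` flag models the pdx-read-serialized setting. With the flag on, get hands back a view over the stored bytes and leaves them in place. `getObject()` then builds the domain object from those bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Plain Java serialization, standing in for PDX in this sketch.
final class JavaSerialization {
  static byte[] toBytes(Serializable o) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(o);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static Object fromBytes(byte[] data) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}

// Hypothetical stand-in for the role a PdxInstance plays: a view over the stored bytes.
final class SerializedView {
  private final byte[] bytes;

  SerializedView(byte[] bytes) {
    this.bytes = bytes;
  }

  // Model of getObject(): builds the domain object without touching the stored entry.
  Object getObject() {
    return JavaSerialization.fromBytes(bytes);
  }
}

// Toy entry; the readSerialized flag models the pdx-read-serialized setting.
final class ReadSerializedEntry {
  private Object stored; // byte[] until a plain get converts it
  private final boolean readSerialized;

  ReadSerializedEntry(Serializable value, boolean readSerialized) {
    this.stored = JavaSerialization.toBytes(value);
    this.readSerialized = readSerialized;
  }

  Object get() {
    if (stored instanceof byte[]) {
      if (readSerialized) {
        return new SerializedView((byte[]) stored); // stored form left as is
      }
      stored = JavaSerialization.fromBytes((byte[]) stored); // stored form changes
    }
    return stored;
  }

  boolean isStoredSerialized() {
    return stored instanceof byte[];
  }

  public static void main(String[] args) {
    ReadSerializedEntry entry = new ReadSerializedEntry("account-42", true);
    Object value = entry.get();
    // Caller side: unwrap the view if that is what came back.
    Object domain = value instanceof SerializedView ? ((SerializedView) value).getObject() : value;
    System.out.println(domain + " (entry still serialized: " + entry.isStoredSerialized() + ")");
  }
}
```

The caller-side pattern in `main` is the same one described above: check whether the returned value is the serialized wrapper, and if it is, call getObject to obtain the domain object.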

Note that if your get were done from a different member, it would not
deserialize the data on the server. Instead, the server processes the
remote get, finds the serialized form stored on that server, and sends it
back to the client, which can then deserialize it. If most of your reads
will be done remotely, you want your data stored on the server in
serialized form; otherwise each remote read has to serialize the data to
send it back. But if most of your reads are done locally (for example, from
a function), it can be optimal to store the data in deserialized form if
that is what the local read ends up needing.
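
The trade-off in this paragraph can be expressed as simple counting. This is a back-of-the-envelope model, not a measurement. It assumes that each remote read of a value stored deserialized costs one serialization on the server, and that each local read of a value stored serialized costs one deserialization. It also ignores the one-time form change an LRU region makes on a local get. `StoredForm` and `ReadMixModel` are hypothetical names.

```java
// Hypothetical names; the per-read costs below are simplifying assumptions.
enum StoredForm { SERIALIZED, DESERIALIZED }

final class ReadMixModel {
  // Server-side conversions for a read mix under the assumptions in the lead-in.
  static long conversions(StoredForm form, long remoteReads, long localReads) {
    switch (form) {
      case SERIALIZED:
        return localReads; // each local read needs a deserialization
      case DESERIALIZED:
        return remoteReads; // each remote read needs a serialization before sending
      default:
        throw new IllegalArgumentException("unknown form: " + form);
    }
  }

  // Picks the form with fewer conversions; ties go to SERIALIZED.
  static StoredForm cheaperForm(long remoteReads, long localReads) {
    long serialized = conversions(StoredForm.SERIALIZED, remoteReads, localReads);
    long deserialized = conversions(StoredForm.DESERIALIZED, remoteReads, localReads);
    return serialized <= deserialized ? StoredForm.SERIALIZED : StoredForm.DESERIALIZED;
  }

  public static void main(String[] args) {
    System.out.println(cheaperForm(900, 100)); // mostly remote (client) reads
    System.out.println(cheaperForm(0, 1_000_000)); // a function scanning locally
  }
}
```

This only restates the rule of thumb above in numbers. Real costs also depend on value size, the serialization format, and whether a local read changes the stored form.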


On Thu, Aug 16, 2018 at 10:28 AM Anthony Baker wrote:

> Hi Pieter!  Just to double-check, do you have any GC issues?  How big are
> your “big” objects?  What serialization approach are you using (Java /
> DataSerializable / PDX)?
>
> Anthony
>
>
> On Aug 16, 2018, at 7:09 AM, Michael Stolz wrote:
>
> One thing to make sure of is that the function is only accessing data that
> is local to each of the nodes where it is running.
> To do this you must do something like this:
> Region<String, String> localPrimaryData =
> PartitionRegionHelper.getLocalPrimaryData(exampleRegion);
> Then you can iterate over the entries in this local Region.
>
> --
> Mike Stolz
> Principal Engineer, GemFire Product Lead
>
> On Thu, Aug 16, 2018 at 4:37 AM, Pieter van Zyl wrote:
>
>> Good morning.
>>
>> We are busy with a prototype to evaluate the use of Geode in our company.
>> Now we are trying to go through all our regions to perform some form of
>> validation. We are using a function to perform the validation.
>>
>> While iterating through the regions, it seems to slow down dramatically.
>>
>> The total database has about 98 million objects. We fly through about 24
>> million in 1.5 minutes.
>>
>> Then we hit certain objects in a Region that are large, and everything
>> slows down. We then process about 10 000 entries every 1.5 hours.
>> We needed to set the server and locator timeouts so that we don't get
>> kicked off.
>>
>> The objects can be quite large.
>>
>> Using YourKit I can see the following:
>>
>> ValidationThread0  Runnable CPU usage on sample: 1s
>>   it.unimi.dsi.fastutil.objects.ReferenceOpenHashSet.rehash(int) ReferenceOpenHashSet.java:578
>>   it.unimi.dsi.fastutil.objects.ReferenceOpenHashSet.add(Object) ReferenceOpenHashSet.java:279
>>   org.apache.geode.internal.size.ObjectTraverser$VisitStack.add(Object, Object) ObjectTraverser.java:159
>>   org.apache.geode.internal.size.ObjectTraverser.doSearch(Object, ObjectTraverser$VisitStack) ObjectTraverser.java:83
>>   org.apache.geode.internal.size.ObjectTraverser.breadthFirstSearch(Object, ObjectTraverser$Visitor, boolean) ObjectTraverser.java:50
>>   *org.apache.geode.internal.size.ObjectGraphSizer.size(Object, ObjectGraphSizer$ObjectFilter, boolean) ObjectGraphSizer.java:98*
>>   *org.apache.geode.internal.size.ReflectionObjectSizer.sizeof(Object) ReflectionObjectSizer.java:66*
>>   org.apache.geode.internal.size.SizeClassOnceObjectSizer.sizeof(Object) SizeClassOnceObjectSizer.java:60
>>   org.apache.geode.internal.cache.eviction.SizeLRUController.sizeof(Object) SizeLRUController.java:68
>>   org.apache.geode.internal.cache.eviction.HeapLRUController.entrySize(Object, Object) HeapLRUController.java:92
>>   org.apache.geode.internal.cache.entries.VersionedStatsDiskLRURegionEntryHeapLongKey.updateEntrySize(EvictionController, Object) VersionedStatsDiskLRURegionEntryHeapLongKey.java:207
>>   org.apache.geode.internal.cache.VMLRURegionMap.beginChangeValueForm(EvictableEntry, CachedDeserializable, Object) VMLRURegionMap.java:178
>>   org.apache.geode.internal.cache.VMCachedDeserializable.getDeserializedValue(Region, RegionEntry) VMCachedDeserializable.java:119
>>   org.apache.geode.internal.cache.LocalRegion.getDeserialized(RegionEntry, boolean, boolean, boolean, boolean) LocalRegion.java:1293
>>   org.apache.geode.internal.cache.LocalRegion.getDeserializedValue(RegionEntry, KeyInfo, boolean, boolean, boolean, EntryEventImpl, boolean, boolean) LocalRegion.java:1232
>>   org.apache.geode.internal.cache.LocalRegionDataView.getDeserializedValue(KeyInfo, LocalRegion, boolean, boolean, boolean, EntryEventImpl, boolean, boolean) LocalRegionDataView.java:43
>>   org.apache.geode.internal.cache.LocalRegion.get(Object, Object, boolean, boolean, boolean, ClientProxyMembershipID, EntryEventImpl, boolean, boolean, boolean) LocalRegion.java:1384
>>   org.apache.geode.internal.cache.LocalRegion.get(Object, Object, boolean, boolean, boolean, ClientProxyMembershipID, EntryEventImpl, boolean) LocalRegion.java:1334
>>   org.apache.geode.internal.cache.LocalRegion.get(Object, Object, boolean, EntryEventImpl) LocalRegion.java:1319
>>   org.apache.geode.internal.cache.AbstractRegion.get(Object) AbstractRegion.java:408
>>   org.rdb.geode.session.GeodeDatabaseSessionObject.lazyLoadField(String) GeodeDatabaseSessionObject.java:240
>>   net.lautus.gls.domain.life.accounting.AccountingTransaction.lazyLoadField(String) AccountingTransaction.java:1
>>   org.rdb.internal.aspect.PersistenceAspect.getField(JoinPoint, Object) PersistenceAspect.java:68
>>   net.lautus.gls.domain.life.accounting.AccountingTransaction.thoroughValidate() AccountingTransaction.java:33
>>   net.lautus.gls.tools.validation.ValidateDomainObjectScript.run(DatabaseSession, PersistentDomainObject) ValidateDomainObjectScript.java:36
>>   net.lautus.gls.tools.validation.ValidateDomainObjectScript.run(DatabaseSession, Object) ValidateDomainObjectScript.java:13
>>   org.rdb.util.validator.internal.geode.GeodeValidationRunnable.validateInstance(Object, InstanceScript, DatabaseSession) GeodeValidationRunnable.java:100
>>   org.rdb.util.validator.internal.geode.GeodeValidationRunnable.operation(TransactionStrategy, OrderedObject) GeodeValidationRunnable.java:84
>>   org.rdb.util.validator.internal.geode.GeodeValidationRunnable.operation(TransactionStrategy, Object) GeodeValidationRunnable.java:22
>>   org.rdb.util.finder.WorkerRunnable.execute() WorkerRunnable.java:39
>>   org.rdb.util.finder.ThreadRunnable.run() ThreadRunnable.java:45
>>   java.lang.Thread.run() Thread.java:748
>>
>> My worry is this logic:
>> *ObjectGraphSizer.size*
>>
>>>
>>> Find the size of an object and all objects reachable from it using
>>> breadth first search. This
>>> method will include objects reachable from static fields
>>
>>
>> We have tried to use the size logic, and we found that we have a lot of
>> connected graphs/objects and a root object whose reported size was 19 GB.
>>
>> Our objects have a lot of fields.
>>
>> While our objects do use IDs to refer to other objects in one-to-one and
>> one-to-many relationships, we actually resolve these IDs and build up a
>> tree in memory:
>> Account {
>> Bank bank
>>
>> }
>>
>> Transform for storage on disk as:
>> Account {
>> long bankId
>>
>> }
>>
>>
>> Read from disk:
>> Account {
>> long bankId
>>
>> }
>> Then, on first access, transform to:
>> Account {
>> Bank bank
>>
>> }
>>
>> This means that we could build up the whole connected tree in memory.
>>
>> I know Geode is not a graph database or an object database, so we might
>> not be using it for the correct use case... maybe that is our fundamental
>> problem.
>>
>> But even so, isn't the size check performed during LRU eviction (shown in
>> the stack trace) an expensive calculation?
>> Is there a way to turn it off?
>> Is it trying to find all connected objects so that all of them can be
>> evicted?
>>
>> Some information on the environment:
>>
>> The database size on disk is around 47 GB.
>>
>> The VM has 16 cores and 102 GB of memory.
>>
>> VM settings
>>
>>     -agentpath:/home/r2d2/yourkit/bin/linux-x86-64/libyjpagent.so
>>     -javaagent:lib/aspectj/lib/aspectjweaver.jar
>>     -Dgemfire.EXPIRY_THREADS=16
>>     -Dgemfire.PREFER_SERIALIZED=false
>>     -Dgemfire.enable.network.partition.detection=false
>>     -Dgemfire.autopdx.ignoreConstructor=true
>>     -Dgemfire.ALLOW_PERSISTENT_TRANSACTIONS=true
>>     -Dgemfire.member-timeout=600000
>>     -Xms75g
>>     -Xmx75g
>>     -XX:+UseConcMarkSweepGC
>>     -XX:+UseParNewGC
>>     -XX:+CMSParallelRemarkEnabled
>>     -XX:+UseCMSInitiatingOccupancyOnly
>>     -XX:CMSInitiatingOccupancyFraction=70
>>     -XX:+DisableExplicitGC
>>     -XX:NewSize=21g
>>     -XX:MaxNewSize=21g
>>     -XX:+PrintGCDetails
>>     -XX:+PrintTenuringDistribution
>>     -XX:+PrintGCTimeStamps
>>     -XX:+PrintGCApplicationStoppedTime
>>     -verbose:gc
>>     -Xloggc:/home/r2d2/rdb-geode-server/gc/gc.log
>>     -Djava.rmi.server.hostname=localhost
>>     -Dcom.sun.management.jmxremote.port=9010
>>     -Dcom.sun.management.jmxremote.rmi.port=9010
>>     -Dcom.sun.management.jmxremote.local.only=false
>>     -Dcom.sun.management.jmxremote.authenticate=false
>>     -Dcom.sun.management.jmxremote.ssl=false
>>     -XX:+UseGCLogFileRotation
>>     -XX:NumberOfGCLogFiles=10
>>     -XX:GCLogFileSize=1M
>>
>> <!-- copy-on-read:
>> https://gemfire.docs.pivotal.io/geode/basic_config/data_entries_custom_classes/managing_data_entries.html -->
>>     <gfe:cache properties-ref="gemfire-props"
>>                pdx-serializer-ref="pdxSerializer" pdx-persistent="true"
>>                pdx-disk-store="pdx-disk-store"
>>                eviction-heap-percentage="80" critical-heap-percentage="90"
>>                id="gemfireCache" copy-on-read="false"
>>                enable-auto-reconnect="true">
>>
>>     </gfe:cache>
>>
>>
>> Kindly
>> Pieter
>>
>>
>
>
