The PartitionRegionHelper.getLocalPrimaryData does not apply in this case because it is a replicate region.
I would guess that the "frozen thread" is not interesting. It looks like just a thread waiting to read from the network. Geode can have many threads waiting for network messages, and if the other side never sends one, the thread can appear to be frozen.

You can configure your own implementation of org.apache.geode.cache.util.ObjectSizer on your region (there should be an attribute for it on gfe:replicated-region; I'm not sure what it is named, but look for one with "sizer" in its name). You could try a really simple ObjectSizer (for example, one that just returns 1024) and see if it takes care of this performance problem.

I think what is happening in this case is that your data is stored in the cache in serialized form. When your function calls "get", the data has to be deserialized, and because the region uses LRU eviction, Geode recalculates the size of the entry, since the deserialized form can differ in size from the serialized form. When a function does a scan like this, all the data on the server where the scan runs ends up stored in deserialized form.

If your values were serialized with PDX and you set "pdx-read-serialized=true" on your cache, then doing the get will not change the form the value is stored in. The get will return a PdxInstance, but you can then call "getObject" on it to get your domain class.

Note that if your get were done from a different member, it would not deserialize the data on the server. Instead the server processes the remote get, finds the serialized form stored on that server, and sends it back to the client. The client can then deserialize it.

If most of your reads will be done remotely, then you want your data stored on the server in serialized form; otherwise each remote read has to serialize the data to send it back. But if most of your reads are done locally (for example from a function), then it can be optimal to have the data stored in deserialized form, if that is what the local read ends up needing.
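To make the "really simple ObjectSizer" suggestion concrete, here is a minimal sketch. So that it compiles without Geode on the classpath, it declares a local stand-in interface with the same single method as org.apache.geode.cache.util.ObjectSizer; in a real server you would implement the Geode interface directly. The names FixedObjectSizer, ASSUMED_ENTRY_BYTES and FixedObjectSizerDemo are hypothetical, and how the sizer is attached to gfe:replicated-region is not shown because I am not sure of the attribute name.

```java
// Stand-in for org.apache.geode.cache.util.ObjectSizer so this sketch compiles
// without Geode on the classpath. In a real server, implement the Geode
// interface instead of this local copy.
interface ObjectSizer {
    int sizeof(Object o);
}

// Hypothetical sizer: reports the same size for every value, so no object
// graph is traversed when an entry changes form.
class FixedObjectSizer implements ObjectSizer {
    // Assumed flat size per value, in bytes. 1024 is only the example figure
    // from the text above, not a measured number.
    static final int ASSUMED_ENTRY_BYTES = 1024;

    @Override
    public int sizeof(Object o) {
        return ASSUMED_ENTRY_BYTES;
    }
}

class FixedObjectSizerDemo {
    public static void main(String[] args) {
        ObjectSizer sizer = new FixedObjectSizer();
        // Both calls return the same constant, however large the argument is.
        System.out.println(sizer.sizeof(new int[1_000_000])); // prints 1024
        System.out.println(sizer.sizeof("small"));            // prints 1024
    }
}
```

The trade-off is that the sizes Geode records for entries are no longer accurate, so this is best treated as an experiment to confirm that the sizing is the bottleneck.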
On Thu, Aug 16, 2018 at 10:28 AM Anthony Baker <[email protected]> wrote:

> Hi Pieter! Just to double-check, do you have any GC issues? How big are
> your “big” objects? What serialization approach are you using (Java /
> DataSerializable / PDX)?
>
> Anthony
>
>
> On Aug 16, 2018, at 7:09 AM, Michael Stolz <[email protected]> wrote:
>
> One thing to make sure of is that the function is only accessing data that
> is local to each of the nodes where it is running.
> To do this you must do something like this:
> Region<String, String> localPrimaryData =
> PartitionRegionHelper.getLocalPrimaryData(exampleRegion);
> Then you can iterate over the entries in this local Region.
>
> --
> Mike Stolz
> Principal Engineer, GemFire Product Lead
> Mobile: +1-631-835-4771
> Download the GemFire book here.
> <https://content.pivotal.io/ebooks/scaling-data-services-with-pivotal-gemfire>
>
> On Thu, Aug 16, 2018 at 4:37 AM, Pieter van Zyl <[email protected]> wrote:
>
>> Good morning.
>>
>> We are busy with a prototype to evaluate the use of Geode in our company.
>> Now we are trying to go through all our regions to perform some form of
>> validations. We are using a function to perform the validation.
>>
>> While iterating through the regions it seems to slow down dramatically.
>>
>> The total database has about 98 million objects. We fly through about 24
>> million in 1.5 minutes.
>>
>> Then we hit certain objects in a Region that are large and everything
>> slows down. We then process about 10 000 entries every 1.5 hours.
>> We needed to set the server and locator timeouts so that we don't get
>> kicked off.
>>
>> The objects can be quite large.
>>
>> Using YourKit I can see the following:
>>
>> ValidationThread0 Runnable CPU usage on sample: 1s
>> it.unimi.dsi.fastutil.objects.ReferenceOpenHashSet.rehash(int) ReferenceOpenHashSet.java:578
>> it.unimi.dsi.fastutil.objects.ReferenceOpenHashSet.add(Object) ReferenceOpenHashSet.java:279
>> org.apache.geode.internal.size.ObjectTraverser$VisitStack.add(Object, Object) ObjectTraverser.java:159
>> org.apache.geode.internal.size.ObjectTraverser.doSearch(Object, ObjectTraverser$VisitStack) ObjectTraverser.java:83
>> org.apache.geode.internal.size.ObjectTraverser.breadthFirstSearch(Object, ObjectTraverser$Visitor, boolean) ObjectTraverser.java:50
>> *org.apache.geode.internal.size.ObjectGraphSizer.size(Object, ObjectGraphSizer$ObjectFilter, boolean) ObjectGraphSizer.java:98
>> org.apache.geode.internal.size.ReflectionObjectSizer.sizeof(Object) ReflectionObjectSizer.java:66*
>> org.apache.geode.internal.size.SizeClassOnceObjectSizer.sizeof(Object) SizeClassOnceObjectSizer.java:60
>> org.apache.geode.internal.cache.eviction.SizeLRUController.sizeof(Object) SizeLRUController.java:68
>> org.apache.geode.internal.cache.eviction.HeapLRUController.entrySize(Object, Object) HeapLRUController.java:92
>> org.apache.geode.internal.cache.entries.VersionedStatsDiskLRURegionEntryHeapLongKey.updateEntrySize(EvictionController, Object) VersionedStatsDiskLRURegionEntryHeapLongKey.java:207
>> org.apache.geode.internal.cache.VMLRURegionMap.beginChangeValueForm(EvictableEntry, CachedDeserializable, Object) VMLRURegionMap.java:178
>> org.apache.geode.internal.cache.VMCachedDeserializable.getDeserializedValue(Region, RegionEntry) VMCachedDeserializable.java:119
>> org.apache.geode.internal.cache.LocalRegion.getDeserialized(RegionEntry, boolean, boolean, boolean, boolean) LocalRegion.java:1293
>> org.apache.geode.internal.cache.LocalRegion.getDeserializedValue(RegionEntry, KeyInfo, boolean, boolean, boolean, EntryEventImpl, boolean, boolean) LocalRegion.java:1232
>> org.apache.geode.internal.cache.LocalRegionDataView.getDeserializedValue(KeyInfo, LocalRegion, boolean, boolean, boolean, EntryEventImpl, boolean, boolean) LocalRegionDataView.java:43
>> org.apache.geode.internal.cache.LocalRegion.get(Object, Object, boolean, boolean, boolean, ClientProxyMembershipID, EntryEventImpl, boolean, boolean, boolean) LocalRegion.java:1384
>> org.apache.geode.internal.cache.LocalRegion.get(Object, Object, boolean, boolean, boolean, ClientProxyMembershipID, EntryEventImpl, boolean) LocalRegion.java:1334
>> org.apache.geode.internal.cache.LocalRegion.get(Object, Object, boolean, EntryEventImpl) LocalRegion.java:1319
>> org.apache.geode.internal.cache.AbstractRegion.get(Object) AbstractRegion.java:408
>> org.rdb.geode.session.GeodeDatabaseSessionObject.lazyLoadField(String) GeodeDatabaseSessionObject.java:240
>> net.lautus.gls.domain.life.accounting.AccountingTransaction.lazyLoadField(String) AccountingTransaction.java:1
>> org.rdb.internal.aspect.PersistenceAspect.getField(JoinPoint, Object) PersistenceAspect.java:68
>> net.lautus.gls.domain.life.accounting.AccountingTransaction.thoroughValidate() AccountingTransaction.java:33
>> net.lautus.gls.tools.validation.ValidateDomainObjectScript.run(DatabaseSession, PersistentDomainObject) ValidateDomainObjectScript.java:36
>> net.lautus.gls.tools.validation.ValidateDomainObjectScript.run(DatabaseSession, Object) ValidateDomainObjectScript.java:13
>> org.rdb.util.validator.internal.geode.GeodeValidationRunnable.validateInstance(Object, InstanceScript, DatabaseSession) GeodeValidationRunnable.java:100
>> org.rdb.util.validator.internal.geode.GeodeValidationRunnable.operation(TransactionStrategy, OrderedObject) GeodeValidationRunnable.java:84
>> org.rdb.util.validator.internal.geode.GeodeValidationRunnable.operation(TransactionStrategy, Object) GeodeValidationRunnable.java:22
>> org.rdb.util.finder.WorkerRunnable.execute() WorkerRunnable.java:39
>> org.rdb.util.finder.ThreadRunnable.run() ThreadRunnable.java:45
>> java.lang.Thread.run() Thread.java:748
>>
>> My worry is this logic:
>> *ObjectGraphSizer.size*
>>
>>>
>>> Find the size of an object and all objects reachable from it using
>>> breadth first search. This
>>> method will include objects reachable from static fields
>>
>>
>> We have tried to use the size logic and we found that we have a lot of
>> connected graphs/objects and a root object that reported 19gig.
>>
>> Our objects have a lot of fields.
>>
>> While our objects do use IDs to refer to other objects for one-to-one and
>> one-to-many relationships, we actually resolve these IDs and build up a
>> tree in memory
>> Account {
>> Bank bank
>>
>> }
>>
>> Transform for storage on disk as:
>> Account {
>> long bankId
>>
>> }
>>
>>
>> Read from disk:
>> Account {
>> long bankId
>>
>> }
>> then on first access transform to:
>> Account {
>> Bank bank
>>
>> }
>>
>> This means that we could build up the whole connected tree in memory.
>>
>> I know Geode is not a Graph database or Object database and so we might
>> not be using it for the correct use case.....maybe that is our fundamental
>> problem.
>>
>> But even so....isn't this size check that is being performed during LRU
>> eviction shown in the stack trace a big calculation?
>> Is there a possibility to turn it off?
>> Is it trying to see all connected objects so that all of them can be
>> evicted?
>>
>> Some information on the environment:
>>
>> The database size on disk is around 47Gig
>>
>> The VM has 16 cores and 102 gig memory
>>
>> VM settings
>>
>> -agentpath:/home/r2d2/yourkit/bin/linux-x86-64/libyjpagent.so
>> -javaagent:lib/aspectj/lib/aspectjweaver.jar
>> -Dgemfire.EXPIRY_THREADS=16
>> -Dgemfire.PREFER_SERIALIZED=false
>> -Dgemfire.enable.network.partition.detection=false
>> -Dgemfire.autopdx.ignoreConstructor=true
>> -Dgemfire.ALLOW_PERSISTENT_TRANSACTIONS=true
>> -Dgemfire.member-timeout=600000
>> -Xms75g
>> -Xmx75g
>> -XX:+UseConcMarkSweepGC
>> -XX:+UseParNewGC
>> -XX:+CMSParallelRemarkEnabled
>> -XX:+UseCMSInitiatingOccupancyOnly
>> -XX:CMSInitiatingOccupancyFraction=70
>> -XX:+DisableExplicitGC
>> -XX:NewSize=21g
>> -XX:MaxNewSize=21g
>> -XX:+PrintGCDetails
>> -XX:+PrintTenuringDistribution
>> -XX:+PrintGCTimeStamps
>> -XX:+PrintGCApplicationStoppedTime
>> -verbose:gc
>> -Xloggc:/home/r2d2/rdb-geode-server/gc/gc.log
>> -Djava.rmi.server.hostname=localhost
>> -Dcom.sun.management.jmxremote.port=9010
>> -Dcom.sun.management.jmxremote.rmi.port=9010
>> -Dcom.sun.management.jmxremote.local.only=false
>> -Dcom.sun.management.jmxremote.authenticate=false
>> -Dcom.sun.management.jmxremote.ssl=false
>> -XX:+UseGCLogFileRotation
>> -XX:NumberOfGCLogFiles=10
>> -XX:GCLogFileSize=1M
>>
>> <!-- copy-on-read:
>> https://gemfire.docs.pivotal.io/geode/basic_config/data_entries_custom_classes/managing_data_entries.html-->
>> <gfe:cache properties-ref="gemfire-props"
>> pdx-serializer-ref="pdxSerializer" pdx-persistent="true"
>> pdx-disk-store="pdx-disk-store"
>> eviction-heap-percentage="80" critical-heap-percentage="90"
>> id="gemfireCache" copy-on-read="false"
>> enable-auto-reconnect="true">
>>
>> </gfe:cache>
>>
>>
>> Kindly
>> Pieter
>>
>>
>
>
