Hi Arseny,

Have you had any success reproducing the issue and getting a stack trace? Do you observe the same behavior on Oracle JDK?

On Tue, Dec 26, 2017 at 2:43 PM, Andrey Mashenkov <[email protected]> wrote:

> Hi Arseny,
>
> This looks like a known issue that is still unresolved [1],
> but we can't be sure it is the same issue, as there is no stack trace in
> the attached logs.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-7278
>
> On Tue, Dec 26, 2017 at 12:54 PM, Arseny Kovalchuk <[email protected]> wrote:
>
>> Hi guys.
>>
>> We've successfully tested Ignite as an in-memory solution, and it showed
>> acceptable performance. But we cannot get an Ignite cluster to work stably
>> with native persistence enabled. The first error we got is a segmentation
>> fault (JVM crash) while restoring memory on start.
>>
>> [2017-12-22 11:11:51,992] INFO [exchange-worker-#46%ignite-instance-0%] org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager: - Read checkpoint status [startMarker=/ignite-work-directory/db/ignite_instance_0/cp/1513938154201-8c574131-763d-4cfa-99b6-0ce0321d61ab-START.bin, endMarker=/ignite-work-directory/db/ignite_instance_0/cp/1513932413840-55ea1713-8e9e-44cd-b51a-fcad8fb94de1-END.bin]
>> [2017-12-22 11:11:51,993] INFO [exchange-worker-#46%ignite-instance-0%] org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager: - Checking memory state [lastValidPos=FileWALPointer [idx=391, fileOffset=220593830, len=19573, forceFlush=false], lastMarked=FileWALPointer [idx=394, fileOffset=38532201, len=19573, forceFlush=false], lastCheckpointId=8c574131-763d-4cfa-99b6-0ce0321d61ab]
>> [2017-12-22 11:11:51,993] WARN [exchange-worker-#46%ignite-instance-0%] org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager: - Ignite node stopped in the middle of checkpoint. Will restore memory state and finish checkpoint on node start.
>> [CodeBlob (0x00007f9b58f24110)]
>> Framesize: 0
>> BufferBlob (0x00007f9b58f24110) used for StubRoutines (2)
>> #
>> # A fatal error has been detected by the Java Runtime Environment:
>> #
>> # Internal Error (sharedRuntime.cpp:842), pid=221, tid=0x00007f9b473c1ae8
>> # fatal error: exception happened outside interpreter, nmethods and vtable stubs at pc 0x00007f9b58f248f6
>> #
>> # JRE version: OpenJDK Runtime Environment (8.0_151-b12) (build 1.8.0_151-b12)
>> # Java VM: OpenJDK 64-Bit Server VM (25.151-b12 mixed mode linux-amd64 compressed oops)
>> # Derivative: IcedTea 3.6.0
>> # Distribution: Custom build (Tue Nov 21 11:22:36 GMT 2017)
>> # Core dump written. Default location: /opt/ignite/core or core.221
>> #
>> # An error report file with more information is saved as:
>> # /ignite-work-directory/core_dump_221.log
>> #
>> # If you would like to submit a bug report, please include
>> # instructions on how to reproduce the bug and visit:
>> # http://icedtea.classpath.org/bugzilla
>> #
>>
>> Please find logs and configs attached.
>>
>> We deploy Ignite along with our services in Kubernetes (v 1.8) on
>> premises. The Ignite cluster is a StatefulSet of 5 Pods (5 instances) of
>> Ignite version 2.3. Each Pod mounts a PersistentVolume backed by CEPH RBD.
>>
>> We put about 230 events/second into Ignite; 70% of events are ~200KB in
>> size and 30% are 5000KB. Smaller events have indexed fields and we query
>> them via SQL.
>>
>> The cluster is activated from a client node which also streams events
>> into Ignite from Kafka. We use a custom streamer implementation which uses
>> the cache.putAll() API.
>>
>> We got the error when we stopped the cluster and started it again. It
>> happened on only one instance.
>>
>> The general question is:
>>
>> *Is it possible to tune (or implement) native persistence so that it
>> just reports an error in the data or corrupted data, then skips it
>> and continues to work without the corrupted part? That would let the
>> cluster continue operating regardless of errors on storage.*
>>
>> Arseny Kovalchuk
>>
>> Senior Software Engineer at Synesis
>> skype: arseny.kovalchuk
>> mobile: +375 (29) 666-16-16
>> LinkedIn Profile <http://www.linkedin.com/in/arsenykovalchuk/en>
>
> --
> Best regards,
> Andrey V. Mashenkov

--
Best regards,
Andrey V. Mashenkov
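
As a rough back-of-the-envelope check of the load described in the original message (about 230 events/second, 70% at ~200KB and 30% at 5000KB), the sketch below estimates the average event size and the aggregate ingest rate. It assumes the stated sizes are exact and that 1 MB = 1000 KB; the function and variable names are illustrative only and come neither from the thread nor from Ignite.

```python
# Event mix as reported in the thread: (percent of events, size in KB).
# Assumption: sizes are taken at face value, with no serialization overhead.
EVENT_MIX = [(70, 200), (30, 5000)]
EVENTS_PER_SECOND = 230


def average_event_kb(mix):
    """Return the weighted average event size in KB for (percent, size_kb) pairs."""
    total_percent = sum(percent for percent, _ in mix)
    if total_percent != 100:
        raise ValueError("percentages must sum to 100")
    return sum(percent * size_kb for percent, size_kb in mix) / 100


def ingest_kb_per_second(events_per_second, mix):
    """Return the aggregate payload rate in KB/s for the given event rate and mix."""
    return events_per_second * average_event_kb(mix)


if __name__ == "__main__":
    avg_kb = average_event_kb(EVENT_MIX)
    rate_kb = ingest_kb_per_second(EVENTS_PER_SECOND, EVENT_MIX)
    print(f"average event size: {avg_kb:.0f} KB")
    print(f"aggregate ingest rate: {rate_kb:.0f} KB/s (~{rate_kb / 1000:.1f} MB/s)")
```

With the figures from the message this works out to an average of 1640 KB per event and roughly 377 MB/s of payload across the cluster, not counting any additional writes made by the persistence layer itself.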
