Please share the whole log file. Something may be going wrong with the
volumes attached to the Ignite pods.

-
Denis


On Thu, Aug 22, 2019 at 8:07 AM Shiva Kumar <shivakumar....@gmail.com>
wrote:

> Hi Denis,
>
> Thanks for your response.
> Yes, in our tests we have also seen OOM errors and pod crashes, so we will
> follow the recommendation for RAM requirements. I was also checking the
> Ignite documentation on the disk space required for the WAL + WAL archive.
> Here is the link:
> https://apacheignite.readme.io/docs/write-ahead-log#section-wal-archive
>
> It says the archive size is defined as 4 times the size of the
> checkpointing buffer, and the checkpointing buffer is a function of the
> data region size (
> https://apacheignite.readme.io/docs/durable-memory-tuning#section-checkpointing-buffer-size
> )
>
> But this link
> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood#IgnitePersistentStore-underthehood-SubfoldersGeneration
>
> under the *Estimating disk space* section explains how to estimate the
> disk space required for the WAL, but it is not clear to me. Can you please
> point me to the correct recommendation for calculating the disk space
> required for the WAL + WAL archive?
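>
> Working the numbers from those two pages, this is the estimate I arrive at
> (assuming the Ignite 2.x defaults; please correct me if I am misreading
> them):
>
>   checkpointing buffer for a 4GB region = 4GB / 4 = 1GB
>       (the default for regions between 1GB and 8GB)
>   WAL archive   = 4 x checkpointing buffer = 4GB
>   active WAL    = walSegments x walSegmentSize = 10 x 64MB = 640MB
>   WAL + archive = roughly 4.6GB, plus headroom for checkpointing spikes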
>
> In one of my tests, I configured 4GB for the data region and 10GB for the
> WAL + WAL archive, but our pods are crashing because the disk mounted for
> the WAL + WAL archive runs out of space (df output and the error are
> below, followed by a rough sketch of the storage settings).
>
> [ignite@ignite-cluster-ignite-node-2 ignite]$ df -h
> Filesystem      Size  Used Avail Use% Mounted on
> overlay         158G   39G  112G  26% /
> tmpfs            63G     0   63G   0% /dev
> tmpfs            63G     0   63G   0% /sys/fs/cgroup
> /dev/vda1       158G   39G  112G  26% /etc/hosts
> shm              64M     0   64M   0% /dev/shm
> */dev/vdq        9.8G  9.7G   44M 100% /opt/ignite/wal*
> /dev/vdr         50G  1.4G   48G   3% /opt/ignite/persistence
> tmpfs            63G   12K   63G   1% /run/secrets/kubernetes.io/serviceaccount
> tmpfs            63G     0   63G   0% /proc/acpi
> tmpfs            63G     0   63G   0% /proc/scsi
> tmpfs            63G     0   63G   0% /sys/firmware
>
>
> And this is the error message on the Ignite node:
>
> "ERROR","JVM will be halted immediately due to the failure:
> [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class
> o.a.i.IgniteCheckedException: Failed to archive WAL segment
> [srcFile=/opt/ignite/wal/node00-37ea8ba6-3198-46a1-9e9e-38aff27ed9c9/0000000000000006.wal,
> dstFile=/opt/ignite/wal/archive/node00-37ea8ba6-3198-46a1-9e9e-38aff27ed9c9/0000000000000236.wal.tmp]]]"
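>
> For reference, this is roughly how such a setup could be expressed with
> Ignite's Java API (a simplified sketch only, assuming Ignite 2.7+ for the
> archive-size cap; the paths and region size mirror the numbers above,
> while the explicit 6GB archive cap and 1GB buffer are illustrative
> assumptions, not our exact configuration):
>
> import org.apache.ignite.Ignition;
> import org.apache.ignite.configuration.DataRegionConfiguration;
> import org.apache.ignite.configuration.DataStorageConfiguration;
> import org.apache.ignite.configuration.IgniteConfiguration;
>
> public class WalSizingSketch {
>     public static void main(String[] args) {
>         DataStorageConfiguration storageCfg = new DataStorageConfiguration();
>
>         // Persistence, WAL and WAL archive live on the mounts shown in df -h above.
>         storageCfg.setStoragePath("/opt/ignite/persistence");
>         storageCfg.setWalPath("/opt/ignite/wal");
>         storageCfg.setWalArchivePath("/opt/ignite/wal/archive");
>
>         // Cap the archive explicitly so WAL + archive stays inside the 10GB volume
>         // (6GB is an assumed value, not what we currently run with).
>         storageCfg.setMaxWalArchiveSize(6L * 1024 * 1024 * 1024);
>         storageCfg.setWalSegmentSize(64 * 1024 * 1024); // 64MB, the default segment size
>
>         // 4GB persistent data region with an explicit 1GB checkpointing buffer.
>         DataRegionConfiguration regionCfg = new DataRegionConfiguration();
>         regionCfg.setName("Default_Region");
>         regionCfg.setPersistenceEnabled(true);
>         regionCfg.setMaxSize(4L * 1024 * 1024 * 1024);
>         regionCfg.setCheckpointPageBufferSize(1L * 1024 * 1024 * 1024);
>         storageCfg.setDefaultDataRegionConfiguration(regionCfg);
>
>         IgniteConfiguration cfg = new IgniteConfiguration();
>         cfg.setDataStorageConfiguration(storageCfg);
>
>         Ignition.start(cfg);
>     }
> }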
>
>
> On Thu, Aug 22, 2019 at 8:04 PM Denis Mekhanikov <dmekhani...@gmail.com>
> wrote:
>
>> Shivakumar,
>>
>> Such an allocation doesn't allow full memory utilization, so it's
>> possible that nodes will crash because of out-of-memory errors.
>> It's better to follow the given recommendation.
>>
>> If you want us to investigate the reasons for the failures, please
>> provide the logs and configuration of the failed nodes.
>>
>> Denis
>> On 21 Aug 2019, 16:17 +0300, Shiva Kumar <shivakumar....@gmail.com>,
>> wrote:
>>
>> Hi all,
>> We are testing a field use case before deploying in the field, and we
>> want to know whether the resource limits below are suitable for
>> production. There are 3 nodes (3 pods on Kubernetes) running, each with
>> the configuration below:
>>
>>     DefaultDataRegion: 60GB
>>     JVM: 32GB
>>     Resources allocated to each container: 64GB
>>
>> The Ignite documentation says that (JVM + all data regions) should not
>> exceed 70% of the total RAM allocated to each node (container).
>> We started testing with the above configuration; the Ignite cluster ran
>> successfully for up to 9 days with some data ingestion, but then the pods
>> suddenly crashed and were unable to recover from the crash.
>> Is the above resource configuration not good for node recovery?
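>>
>> Spelling out the numbers: 70% of the 64GB container is roughly 45GB,
>> while 60GB (data region) + 32GB (JVM heap) = 92GB, which exceeds even the
>> full container size, let alone the 70% limit.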
>>
>>
