Re: Apache Ignite 3.0.0 beta 1 RELEASE [Time, Scope, Manager]

2022-11-02 Thread Igor Sapego
+1 from me

Best Regards,
Igor


On Wed, Nov 2, 2022 at 3:48 AM Stanislav Lukyanov 
wrote:

> Igniters,
>
> The initial code freeze date for 3.0.0 beta 1 was missed, so we need to
> pick a new timeline.
>
> There are currently 5 tickets in progress or in review that are in the
> scope, with significant progress in each of them.
>
> Let's set the following dates:
>
> Scope Freeze: October 12, 2022
> Code Freeze: November 4, 2022
> Voting Date: November 7, 2022
> Release Date: November 11, 2022
>
> WDYT?
>
> Thanks,
> Stan
>
> > On 13 Oct 2022, at 16:10, Andrey Gura  wrote:
> >
> > Igniters,
> >
> > I removed the "3.0.0-alpha6" version and created the "3.0.0-beta1"
> > version. All issues were rescheduled to the "3.0.0-beta1" version.
> >
> > Despite the fact that the scope freeze formally was on October 12th,
> > there is still a possibility to add issues to the release scope. The
> > deadline is October 14th. Tomorrow I will announce the scope freeze
> > officially. It means that any issue can be added to the release only
> > after a discussion with the community and the release manager.
> >
> > On Mon, Oct 10, 2022 at 7:41 AM Aleksandr Pakhomov 
> wrote:
> >>
> >> +1
> >>
> >>> On 7 Oct 2022, at 23:05, Andrey Gura  wrote:
> >>>
> >>> Hi, Igniters!
> >>>
> >>> It's time for a new release of Apache Ignite 3 beta 1. The expected
> >>> feature list consists of:
> >>>
> >>> - RPM and DEB packages: simplified installation and node management
> >>> with system services.
> >>> - Client's Partition Awareness: clients are now aware of data
> >>> distribution over the cluster nodes, which helps avoid additional
> >>> network transmissions and lowers operation latency.
> >>> - C++ client: a basic C++ client, able to perform operations on data.
> >>> - Autogenerated values: a function can now be specified as a default
> >>> value generator during table creation. Currently only
> >>> gen_random_uuid is supported.
> >>> - SQL Transactions.
> >>> - Transactional Protocol: improved locking model, multi-version based
> >>> lock-free read-only transactions.
> >>> - Storage: A number of improvements to memory-only and on-disk engines
> >>> based on Page Memory.
> >>> - Indexes: Basic functionality, hash and sorted indexes.
> >>> - Client logging: A LoggerFactory may be provided during client
> >>> creation to specify a custom logger for logs generated by the client.
> >>> - Metrics framework: Collection and export of cluster metrics.
> >>>
> >>> I want to propose myself to be the release manager of the Apache
> >>> Ignite 3 beta 1.
> >>>
> >>> Also I propose the following milestones for the release:
> >>>
> >>> Scope Freeze: October 12, 2022
> >>> Code Freeze: October 20, 2022
> >>> Voting Date: October 31, 2022
> >>> Release Date: November 5, 2022
> >>>
> >>> WDYT?
> >>
>
>


Re: [DISCUSSION] Add DataStreamer's default per-node-batch-setting for PDS.

2022-11-02 Thread Vladimir Steshin

    Hi, Stan.


    Thank you for the answer.

>>>  "your data streamer queue size is something like"
You are right about the write queue on the primary node. It has a fixed 
size, but one based on the number of CPUs (x8). Even on my laptop I get 
16x8=128 batches. I wonder why the default is so large for persistence.
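For reference, the default sizing described above works out as simple arithmetic. This is an illustrative sketch only (the helper name is made up, not Ignite's actual API), assuming the CPU-count-times-8 default mentioned in this thread:

```java
public class StreamerDefaultsSketch {
    // Default per-node parallel operations as described above: CPU count * 8.
    static int defaultPerNodeParallelOps(int cpuCount) {
        return cpuCount * 8;
    }

    public static void main(String[] args) {
        // A 16-CPU laptop ends up with 16 * 8 = 128 in-flight batches per node.
        System.out.println(defaultPerNodeParallelOps(16)); // prints 128
    }
}
```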


>>> "Can you check the heap dump in your tests to see what actually 
occupies most of the heap?"
The backup nodes collect `GridDhtAtomicSingleUpdateRequest` with the key/data 
`byte[]`. That is what we do not wait for in this case.


    I thought we might slightly adjust the default setting, at least to 
make a simple test more reliable. As a user, I wouldn't like to take a 
tool/product just to try it out and have it fail quickly. But yes, the 
user still has the related setting `perNodeParallelOperations()`.


WDYT?

On 30.10.2022 21:24, Stanislav Lukyanov wrote:

Hi Vladimir,

I think this is potentially an issue but I don't think this is about PDS at all.

The description is a bit vague, I have to say. AFAIU, what you see is that when 
the caches are persistent the streamer writes data faster than the nodes 
(especially backup nodes) can process the writes.
Therefore, the nodes accumulate the writes in queues, the queues grow, and 
then you might go OOM.

The solution of simply having smaller queues when persistence is enabled (and 
therefore the queues are more likely to reach their max size) is not the best 
one, in my opinion.
If the default max queue size is too large, it should always be smaller, 
regardless of why the queues grow.

Furthermore, I have a feeling that what gives you OOM isn't the data streamer 
queue... AFAIR your data streamer queue size is something like (entrySize * 
bufferSize * perNodeParallelOperations),
which for 1 KB entries and 16 threads gives (1 KB * 512 * 16 * 8) = 64 MB, 
which is usually peanuts for server Java.
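This back-of-the-envelope estimate can be replayed in a few lines. The constants (1 KB entries, buffer size 512, 16 * 8 parallel operations) come from the email above; the helper is illustrative, not Ignite code:

```java
public class QueueSizeEstimate {
    // Rough data streamer queue footprint:
    // entrySize * bufferSize * perNodeParallelOperations.
    static long queueBytes(long entryBytes, int bufferSize, int parallelOps) {
        return entryBytes * bufferSize * parallelOps;
    }

    public static void main(String[] args) {
        // 1 KB entries, buffer of 512, 16 CPUs with the x8 multiplier.
        long bytes = queueBytes(1024, 512, 16 * 8);
        System.out.println(bytes / (1024 * 1024) + " MB"); // prints "64 MB"
    }
}
```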

Can you check the heap dump in your tests to see what actually occupies most of 
the heap?

Thanks,
Stan


On 28 Oct 2022, at 11:54, Vladimir Steshin  wrote:

 Hi Folks,

 I found that the DataStreamer may consume a large amount of heap 
when loading into a persistent cache.
This may happen when the streamer's 'allowOverwrite' == true and the cache is 
in PRIMARY_SYNC mode.

 What I don't like here is that the case looks simple: with just the 
defaults, a user might hit the issue in a trivial test while trying out or 
researching the streamer.

 The streamer has the related 'perNodeParallelOperations()' setting, which 
helps. But an additional DFLT_PARALLEL_PERSISTENT_OPS_MULTIPLIER might be 
introduced for PDS.

 My questions are:
1) Is it an issue at all? Does it need a fix? Is it minor?
2) Should we introduce an additional default, DFLT_PARALLEL_PERSISTENT_OPS_MULTIPLIER, 
for PDS, since it reduces heap consumption?
3) A better solution would be backpressure. But is it worth it for this case?

Ticket: https://issues.apache.org/jira/browse/IGNITE-17735
PR: https://github.com/apache/ignite/pull/10343
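To illustrate what a separate persistence multiplier could change, here is a rough sketch. The persistent multiplier value of 2 is purely hypothetical (the actual value, if any, would be decided in the ticket/PR above); the in-memory multiplier of 8 and the queue-size formula follow the figures quoted in this thread:

```java
public class PersistentMultiplierSketch {
    static final int DFLT_PARALLEL_OPS_MULTIPLIER = 8;            // current default, per the thread
    static final int DFLT_PARALLEL_PERSISTENT_OPS_MULTIPLIER = 2; // hypothetical, for illustration only

    // Per-node parallel operations: CPU count times the applicable multiplier.
    static int perNodeParallelOps(int cpus, boolean persistent) {
        return cpus * (persistent ? DFLT_PARALLEL_PERSISTENT_OPS_MULTIPLIER
                                  : DFLT_PARALLEL_OPS_MULTIPLIER);
    }

    // Rough per-node queue footprint: entrySize * bufferSize * parallelOps.
    static long queueBytes(long entryBytes, int bufferSize, int parallelOps) {
        return entryBytes * bufferSize * parallelOps;
    }

    public static void main(String[] args) {
        int cpus = 16;
        long inMemory = queueBytes(1024, 512, perNodeParallelOps(cpus, false));
        long withPds = queueBytes(1024, 512, perNodeParallelOps(cpus, true));
        // 64 MB with the current default vs 16 MB with the hypothetical PDS multiplier.
        System.out.println((inMemory >> 20) + " MB -> " + (withPds >> 20) + " MB");
    }
}
```

A smaller multiplier would only shrink the default in-flight window for persistent caches; users could still raise it explicitly via `perNodeParallelOperations()`.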