Re: [VOTE] Release Apache Ignite 3.0.0-beta1 RC2

2022-11-15 Thread Vladislav Pyatkov
+1

On Tue, Nov 15, 2022 at 3:35 PM Denis C  wrote:
>
> +1
>
> > Tue, 15 Nov 2022 at 13:33, Alexander Lapin :
>
> > +1
> >
> > > Tue, 15 Nov 2022 at 08:48, Pavel Tupitsyn :
> >
> > > +1 (binding)
> > >
> > > On Mon, Nov 14, 2022 at 9:05 PM Вячеслав Коптилин <
> > > slava.kopti...@gmail.com>
> > > wrote:
> > >
> > > > Dear Community,
> > > >
> > > > Ignite 3 is moving forward and I think we're in a good spot to release
> > > the
> > > > first beta version. In the last few months the following major features
> > > > have been added:
> > > > - RPM and DEB packages: simplified installation and node management
> > with
> > > > system services.
> > > > - Client's Partition Awareness: Clients are now aware of data
> > > > distribution over the cluster nodes, which helps avoid additional
> > > > network transmissions and lowers operation latency.
> > > > - C++ client: Basic C++ client, able to perform operations on data.
> > > > - Autogenerated values: a function can now be specified as a default
> > > > value generator during table creation. Currently only gen_random_uuid
> > > > is supported.
> > > > - SQL Transactions.
> > > > - Transactional Protocol: improved locking model, multi-version based
> > > > lock-free read-only transactions.
> > > > - Storage: A number of improvements to memory-only and on-disk engines
> > > > based on Page Memory.
> > > > - Indexes: Basic functionality, hash and sorted indexes.
> > > > - Client logging: A LoggerFactory may be provided during client
> > > > creation to specify a custom logger for logs generated by the client.
> > > > - Metrics framework: Collection and export of cluster metrics.
> > > >
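As an illustration of the autogenerated-values item above, DDL with a default-value generator could be used roughly as follows over JDBC; the connection URL and exact syntax are assumptions to be checked against the beta documentation:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Illustrative sketch only: a column default generated by gen_random_uuid().
public class DefaultUuidExample {
    public static void main(String[] args) throws Exception {
        // Assumed thin JDBC URL; adjust host/port for your cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1:10800");
             Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE person ("
                + "id UUID DEFAULT gen_random_uuid() PRIMARY KEY, "
                + "name VARCHAR)");
        }
    }
}
{code}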
> > > > I propose to release 3.0.0-beta1 with the features listed above.
> > > >
> > > > Release Candidate:
> > > > https://dist.apache.org/repos/dist/dev/ignite/3.0.0-beta1-rc2/
> > > > Maven Staging:
> > > > https://repository.apache.org/content/repositories/orgapacheignite-1556/
> > > > Tag: https://github.com/apache/ignite-3/tree/3.0.0-beta1-rc2
> > > >
> > > > +1 - accept Apache Ignite 3.0.0-beta1 RC2
> > > >  0 - don't care either way
> > > > -1 - DO NOT accept Apache Ignite 3.0.0-beta1 RC2 (explain why)
> > > >
> > > > Voting guidelines: https://www.apache.org/foundation/voting.html
> > > > How to verify the release:
> > > > https://www.apache.org/info/verification.html
> > > >
> > > > The vote will be closed on Wednesday, 16 November 2022, 18:00:00
> > > > (UTC time)
> > > >
> > > > https://www.timeanddate.com/countdown/generic?iso=20221116T18=1440=Apache+Ignite+3.0.0-beta1+RC2=cursive=1
> > > >
> > > > Thanks,
> > > > S.
> > > >
> > >
> >



-- 
Vladislav Pyatkov


Re: [ANNOUNCE] SCOPE FREEZE for Apache Ignite 3.0.0 beta 1 RELEASE

2022-10-28 Thread Vladislav Pyatkov
https://issues.apache.org/jira/browse/IGNITE-17816
> >> >>>>
> >> >>>> So, I propose adding them into the release scope.
> >> >>>>
> >> >>>> Wed, 19 Oct 2022 at 15:53, Вячеслав Коптилин <
> >> >> slava.kopti...@gmail.com>:
> >> >>>>
> >> >>>>> Hi Yuriy,
> >> >>>>>
> >> >>>>> I agree, let's add them to the scope.
> >> >>>>>
> >> >>>>> Thanks,
> >> >>>>> S.
> >> >>>>>
> >> >>>>>
> >> >>>>> Wed, 19 Oct 2022 at 15:20, Юрий :
> >> >>>>>
> >> >>>>>> Dear Release managers and Igniters,
> >> >>>>>>
> >> >>>>>> I would like to add the following tickets to Ignite 3.0.0 beta1:
> >> >>>>>>
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-17820 - SQL
> >> >>>>>> improvement, required for the next ticket
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-17748 - related to
> >> >>>>>> index support
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-17612 - fixes an
> >> >>>>>> issue where some queries could not be executed
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-17330 - support of
> >> >>>>>> RO transactions in SQL
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-17859 - index filling
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-17813 - related to
> >> >>>>>> index support in SQL
> >> >>>>>> https://issues.apache.org/jira/browse/IGNITE-17655 - related to
> >> >>>>>> index support in SQL
> >> >>>>>>
> >> >>>>>> Wed, 19 Oct 2022 at 12:11, Вячеслав Коптилин <
> >> >>>>>> slava.kopti...@gmail.com>:
> >> >>>>>>
> >> >>>>>>> Hello Alexander,
> >> >>>>>>>
> >> >>>>>>> Thank you for pointing this out. I fully support including RO
> >> >>>>>> transactions
> >> >>>>>>> into the scope of Ignite 3.0.0-beta1 release.
> >> >>>>>>>
> >> >>>>>>> Thanks,
> >> >>>>>>> S.
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> Wed, 19 Oct 2022 at 11:42, Alexander Lapin <
> >> >>>>>>> lapin1...@gmail.com>:
> >> >>>>>>>
> >> >>>>>>>> Igniters,
> >> >>>>>>>>
> >> >>>>>>>> I would like to add the following tickets to ignite-3.0.0-beta1:
> >> >>>>>>>> https://issues.apache.org/jira/browse/IGNITE-17806
> >> >>>>>>>> https://issues.apache.org/jira/browse/IGNITE-17759
> >> >>>>>>>> https://issues.apache.org/jira/browse/IGNITE-17637
> >> >>>>>>>> https://issues.apache.org/jira/browse/IGNITE-17263
> >> >>>>>>>> https://issues.apache.org/jira/browse/IGNITE-17260
> >> >>>>>>>>
> >> >>>>>>>> It's all about read-only transactions.
> >> >>>>>>>>
> >> >>>>>>>> Best regards,
> >> >>>>>>>> Alexander
> >> >>>>>>>>
> >> >>>>>>>> Fri, 14 Oct 2022 at 19:45, Andrey Gura :
> >> >>>>>>>>
> >> >>>>>>>>> Igniters,
> >> >>>>>>>>>
> >> >>>>>>>>> The 'ignite-3.0.0-beta1' branch was created (the latest commit
> >> is
> >> >>>>>>>>> 8160ef31ecf8d49f227562b6f0ab090c6b4438c1).
> >> >>>>>>>>>
> >> >>>>>>>>> The scope for the release is frozen.
> >> >>>>>>>>>
> >> >>>>>>>>> It means the following:
> >> >>>>>>>>>
> >> >>>>>>>>> - An issue can be added to the release (fixVersion ==
> >> >>>>>>>>> 3.0.0-beta1) only after discussion with the community and a
> >> >>>>>>>>> release manager in this thread.
> >> >>>>>>>>> - Any commit to the release branch must also be applied to the
> >> >>>>>>>>> 'main' branch.
> >> >>>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> --
> >> >>>>>> Live with a smile! :D
> >> >>>>>>
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>> Live with a smile! :D
> >> >>>>
> >> >>
> >> >>
> >>
> >>
>


-- 
Vladislav Pyatkov
Architect-Consultant "GridGain Rus" Llc.
+7-929-537-79-60


Re: [VOTE] @Nullable/@NotNull annotation usage in Ignite 3

2022-01-13 Thread Vladislav Pyatkov
I am sure the shorter, the better.
+1 to option 4.

On Thu, Jan 13, 2022 at 2:06 PM ткаленко кирилл 
wrote:

> +1 to option 2
>


-- 
Vladislav Pyatkov


Re: [DISCUSSION] Error handling in Ignite 3

2021-04-15 Thread Vladislav Pyatkov
eptions).
> >
>
> Nested exceptions are not forbidden. They can provide additional details
> on the error for debugging purposes, but they are not strictly required,
> because the error code plus message should give the user enough information.
>
>
> >- For async methods returning a Future we may have a universal rule on
> >how to handle exceptions. For example, we may specify that any async
> >method can throw only invalid argument exceptions. All other errors are
> >reported via the exceptionally(IgniteException -> {}) callback even if
> >the async method was executed synchronously.
> >
>
> This is ok to me.
>
>
> >
> >
> > Tue, 13 Apr 2021 at 12:08, Alexei Scherbakov <
> > alexey.scherbak...@gmail.com
> > >:
> >
> > > Igniters,
> > >
> > > I would like to start the discussion about error handling in Ignite 3
> > > and how we can improve it compared to Ignite 2.
> > >
> > > The error handling in Ignite 2 was not very good because of the generic
> > > CacheException thrown on almost any occasion, having a deeply nested
> > > root cause and often containing no useful information on further steps
> > > to fix the issue.
> > >
> > > I aim to fix it by introducing some rules on error handling.
> > >
> > > *Public exception structure.*
> > >
> > > A public exception must have an error code, a cause, and an action.
> > >
> > > * The code - the combination of a 2-byte scope id and a 2-byte error
> > > number within the module. This allows up to 2^16 errors for each scope,
> > > which should be enough. The error code string representation can look
> > > like RFT-0001 or TBL-0001.
> > > * The cause - a short, user-readable description of the issue. It can
> > > have dynamic parameters depending on the error type for better user
> > > experience, like "Can't write a snapshot, no space left on device {0}".
> > > * The action - steps for the user to resolve the error situation,
> > > described in the corresponding error section of the documentation, for
> > > example "Clean up disk space and retry the operation".
> > >
> > > Common errors should have their own scope, for example IGN-0001
> > >
> > > All public methods throw only unchecked
> > > org.apache.ignite.lang.IgniteException containing the aforementioned
> > > fields. Each public method must have a section in the javadoc with a
> > > list of all possible error codes for this method.
> > >
> > > A good example with similar structure can be found here [1]
> > >
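As an illustration of the structure above, a minimal sketch of composing and rendering such a code; the class and method names here are hypothetical, not the actual Ignite 3 API:

{code:java}
// Hypothetical sketch: a 2-byte scope id and a 2-byte error number packed
// into one int code, rendered as a "TBL-0001"-style string.
public final class ErrorCode {
    private final short scopeId;     // e.g. the TBL scope
    private final short errorNumber; // error number within the scope

    public ErrorCode(short scopeId, short errorNumber) {
        this.scopeId = scopeId;
        this.errorNumber = errorNumber;
    }

    /** Packs both parts into one int: high 16 bits scope, low 16 bits number. */
    public int code() {
        return (scopeId << 16) | (errorNumber & 0xFFFF);
    }

    /** Renders the code with a scope prefix, e.g. render("TBL") -> "TBL-0001". */
    public String render(String scopePrefix) {
        return String.format("%s-%04d", scopePrefix, errorNumber);
    }
}
{code}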
> > > *Async timeouts.*
> > >
> > > Because almost all API methods in Ignite 3 are async, they will all
> > > have a configurable default timeout and can complete with a timeout
> > > error if a computation is not finished in time, for example if a
> > > response has not yet been received.
> > > I suggest completing the async op future with TimeoutException in this
> > > case to make it on par with synchronous execution using future.get,
> > > which throws java.util.concurrent.TimeoutException on timeout.
> > > For reference, see java.util.concurrent.CompletableFuture#orTimeout.
> > > No special error code should be used for this scenario.
> > >
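A small self-contained sketch of the suggested timeout semantics, using only the standard CompletableFuture API; the operation itself is a stand-in:

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AsyncTimeoutExample {
    public static void main(String[] args) {
        // Stand-in for an async Ignite operation that never completes in time.
        CompletableFuture<String> op = new CompletableFuture<>();

        // The API would apply a configurable default timeout, as proposed above.
        op.orTimeout(100, TimeUnit.MILLISECONDS)
            .exceptionally(err -> err instanceof TimeoutException
                ? "operation timed out" // same semantics as future.get(timeout)
                : "failed: " + err.getMessage())
            .thenAccept(System.out::println)
            .join();
    }
}
{code}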
> > > *Internal exceptions hierarchy.*
> > >
> > > All internal exceptions should extend
> > > org.apache.ignite.internal.lang.IgniteInternalException for unchecked
> > > exceptions and
> > > org.apache.ignite.internal.lang.IgniteInternalCheckedException for
> > > checked exceptions.
> > >
> > > Thoughts ?
> > >
> > > [1] https://docs.oracle.com/cd/B10501_01/server.920/a96525/preface.htm
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> > >
> >
> >
> > --
> > Best regards,
> > Alexey
> >
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>


-- 
Vladislav Pyatkov


Re: Access to Apache Ignite Wiki

2021-04-12 Thread Vladislav Pyatkov
Hi Ilya,

Thanks for the note.
I created an account with the same login (v.pyatkov) in the Apache wiki.
Could you please grant me permissions?

On Mon, Apr 12, 2021 at 5:15 PM Ilya Kasnacheev 
wrote:

> Hello!
>
> You need to create an account on https://cwiki.apache.org/ first.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> Mon, 12 Apr 2021 at 11:44, Vladislav Pyatkov :
>
> > Hi, igniters!
> > Please grant me access to Ignite Wiki to help with editing and updating
> > info.
> > My JIRA account is v.pyatkov.
> >
> > --
> > Vladislav Pyatkov
> >
>


-- 
Vladislav Pyatkov


Access to Apache Ignite Wiki

2021-04-12 Thread Vladislav Pyatkov
Hi, igniters!
Please grant me access to Ignite Wiki to help with editing and updating
info.
My JIRA account is v.pyatkov.

-- 
Vladislav Pyatkov


[jira] [Created] (IGNITE-14476) Get rid of using storage implementation explicitly in ConfigurationRoot annotation

2021-04-05 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-14476:
--

 Summary: Get rid of using storage implementation explicitly in 
ConfigurationRoot annotation
 Key: IGNITE-14476
 URL: https://issues.apache.org/jira/browse/IGNITE-14476
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov
 Fix For: 3.0.0-alpha2


Today we are using generated schema classes in the public API, but we don't 
want to expose an implementation there.
For example:
{code:java}
@ConfigurationRoot(rootName = "rest", storage = 
InMemoryConfigurationStorage.class)
public class RestConfigurationSchema {
...
{code}
The mention of InMemoryConfigurationStorage should be replaced with a specific 
constant:
{code:java}
@ConfigurationRoot(rootName = "rest", storage = 
IgniteConsts.MEMORY_CONFIGURATION_STORAGE)
public class RestConfigurationSchema {
...
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14419) SPI suite hangs sporadically on private TC

2021-03-25 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-14419:
--

 Summary: SPI suite hangs sporadically on private TC
 Key: IGNITE-14419
 URL: https://issues.apache.org/jira/browse/IGNITE-14419
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


[SPI|https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_Spi?branch=%3Cdefault%3E=overview=builds]
 suite times out on master branch from time to time.

Hangs and successful runs happen without changes or with unrelated changes.

Logs from three last timeouts are attached.

From analysis of the last three timed-out runs:
 # The suite hangs either on the testJoinErrorMissedAddFinishedMessage2 or 
testClientConnectToCluster test.

 # In both cases there are no obvious exceptions or assertions in the logs, but 
internal components output warnings about a hanging exchange: *Still waiting for 
initial partition map exchange*.

Most likely these hangs are caused by the same problem with a hanging PME. The 
tests need to be investigated further.
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14185) Synchronous checkpoints on several nodes greatly increase the latency of distributed transactions

2021-02-15 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-14185:
--

 Summary: Synchronous checkpoints on several nodes greatly increase 
the latency of distributed transactions
 Key: IGNITE-14185
 URL: https://issues.apache.org/jira/browse/IGNITE-14185
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


If we have several nodes where checkpoints are configured identically (with the 
same frequency), we can get a distributed lag in transaction processing, even 
if each of them separately holds the exclusive lock for a very short time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14140) Checkpointer thread holds write lock too long

2021-02-08 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-14140:
--

 Summary: Checkpointer thread holds write lock too long
 Key: IGNITE-14140
 URL: https://issues.apache.org/jira/browse/IGNITE-14140
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Reporter: Vladislav Pyatkov
Assignee: Vladislav Pyatkov


The free-list flushing optimization can block the db-checkpoint-thread while it 
holds the write lock. This can block all transactions for several hundred 
milliseconds.

{noformat}
"db-checkpoint-thread-#334%DPL_GRID%DplGridNodeName%" #667 daemon prio=5 
os_prio=0 tid=0x7e765c123800 nid=0xee0b8 runnable [0x7e767f535000] 
java.lang.Thread.State: RUNNABLE at sun.misc.Unsafe.getObjectVolatile(Native 
Method) at 
java.util.concurrent.atomic.AtomicReferenceArray.getRaw(AtomicReferenceArray.java:130)
 at 
java.util.concurrent.atomic.AtomicReferenceArray.get(AtomicReferenceArray.java:125)
 at 
org.apache.ignite.internal.processors.cache.persistence.freelist.AbstractFreeList.getBucketCache(AbstractFreeList.java:690)
 at 
org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.flushBucketsCache(PagesList.java:374)
 at 
org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.saveMetadata(PagesList.java:343)
 at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:373)
 at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.syncMetadata(GridCacheOffheapManager.java:336)
 at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.syncMetadata(GridCacheOffheapManager.java:322)
 at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.onMarkCheckpointBegin(GridCacheOffheapManager.java:247)
 at 
org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointWorkflow.markCheckpointBegin(CheckpointWorkflow.java:281)
 at 
org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.doCheckpoint(Checkpointer.java:388)
 at 
org.apache.ignite.internal.processors.cache.persistence.checkpoint.Checkpointer.body(Checkpointer.java:264)
 at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119) 
at java.lang.Thread.run(Thread.java:748)
{noformat}

We can reduce the time spent under the write lock by switching the optimization 
off before the lock is taken and enabling it again after the lock is released.
The attached image confirms that almost all of the time is consumed by storing 
the metadata cache.
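A minimal sketch of that idea, assuming a hypothetical caching switch on the free list; the method names are illustrative, not the actual AbstractFreeList API:

{code:java}
import java.util.concurrent.locks.ReadWriteLock;

public class CheckpointLockSketch {
    /** Hypothetical view of the free-list caching switch. */
    interface FreeList {
        void stopCachingAndFlush(); // flush the onheap bucket cache to pages
        void startCaching();        // resume onheap caching
    }

    void markCheckpointBegin(FreeList freeList, ReadWriteLock cpLock) {
        // Flush the bucket cache BEFORE taking the write lock...
        freeList.stopCachingAndFlush();

        cpLock.writeLock().lock();
        try {
            // ...so only the cheap part of the checkpoint runs under the lock.
        }
        finally {
            cpLock.writeLock().unlock();
        }

        // Re-enable caching after the lock is released.
        freeList.startCaching();
    }
}
{code}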



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14139) Incorrect initialization of checkpoint-runner-cpu thread pool

2021-02-08 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-14139:
--

 Summary: Incorrect initialization of checkpoint-runner-cpu thread pool
 Key: IGNITE-14139
 URL: https://issues.apache.org/jira/browse/IGNITE-14139
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov
Assignee: Vladislav Pyatkov


The first initialization of the checkpoint CPU thread pool is incorrect.
Look at the constructor of {{CheckpointWorkflow}}:
At start, we initialize the pool:
{code:java}
this.checkpointCollectPagesInfoPool = initializeCheckpointPool();
{code}
and only after that do we set the pool size:
{code:java}
this.checkpointCollectInfoThreads = checkpointCollectInfoThreads;
{code}
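A sketch of the fix, assuming initializeCheckpointPool() reads the pool size from the field; this is a hypothetical fragment, not the final patch:

{code:java}
// Hypothetical constructor fragment: assign the size field first, so that
// initializeCheckpointPool() sees the configured value instead of 0.
this.checkpointCollectInfoThreads = checkpointCollectInfoThreads;
this.checkpointCollectPagesInfoPool = initializeCheckpointPool();
{code}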




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-14138) Historical rebalance kills cluster

2021-02-08 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-14138:
--

 Summary: Historical rebalance kills cluster
 Key: IGNITE-14138
 URL: https://issues.apache.org/jira/browse/IGNITE-14138
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


{noformat}
[2021-01-12T05:11:02,142][ERROR][rebalance-#508%---%][] Critical system error 
detected. Will be handled accordingly to configured handler 
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, 
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet 
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], 
failureCtx=FailureContext [type=CRITICAL_ERROR, err=class 
o.a.i.IgniteCheckedException: Failed to continue supplying [grp=SQL_USAGES_EPE, 
demander=48254935-7aa9-4ab5-b398-fdaec334fab7, topVer=AffinityTopologyVersion 
[topVer=3, minorTopVer=1
org.apache.ignite.IgniteCheckedException: Failed to continue supplying 
[grp=SQL_1, demander=48254935-7aa9-4ab5-b398-fdaec334fab7, 
topVer=AffinityTopologyVersion [topVer=3, minorTopVer=1]]
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:571)
 [ignite-core.jar]
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleDemandMessage(GridDhtPreloader.java:398)
 [ignite-core.jar]
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:489)
 [ignite-core.jar]
at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:474)
 [ignite-core.jar]
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142)
 [ignite-core.jar]
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
 [ignite-core.jar]
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$800(GridCacheIoManager.java:109)
 [ignite-core.jar]
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1707)
 [ignite-core.jar]
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1721)
 [ignite-core.jar]
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4300(GridIoManager.java:157)
 [ignite-core.jar]
at 
org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:3011)
 [ignite-core.jar]
at 
org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1662)
 [ignite-core.jar]
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4900(GridIoManager.java:157)
 [ignite-core.jar]
at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1629)
 [ignite-core.jar]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
[?:?]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
[?:?]
at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: org.apache.ignite.IgniteCheckedException: Could not find start 
pointer for partition [part=4, partCntrSince=1115]
at 
org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointHistory.searchEarliestWalPointer(CheckpointHistory.java:557)
 ~[ignite-core.jar]
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.historicalIterator(GridCacheOffheapManager.java:1121)
 ~[ignite-core.jar]
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.rebalanceIterator(IgniteCacheOffheapManagerImpl.java:1195)
 ~[ignite-core.jar]
at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionSupplier.handleDemandMessage(GridDhtPartitionSupplier.java:322)
 ~[ignite-core.jar]
... 16 more
{noformat}
I believe it should throw IgniteHistoricalIteratorException instead of 
IgniteCheckedException, so it can be handled properly and the rebalance can 
fall back to full rebalance instead of killing nodes.
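A sketch of the suggested handling (simplified; the real signatures in GridCacheOffheapManager differ, and the surrounding fields here are approximations):

{code:java}
// Sketch: wrap the "start pointer not found" error into a dedicated
// exception so the demander can be switched to full rebalance instead of
// failing the node. Method names approximate the real code.
try {
    ptr = checkpointHistory.searchEarliestWalPointer(grpId, partCounters);
}
catch (IgniteCheckedException e) {
    // Recoverable for the supplier: triggers fallback to full rebalance.
    throw new IgniteHistoricalIteratorException(e);
}
{code}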



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Re[2]: [DISCUSSION] Request for thread unsafe Compute functionality deprecation.

2021-02-05 Thread Vladislav Pyatkov
Hi Zhenya,

I don't fully understand your proposal without a patch.
I see that you want to mark three methods of the IgniteCompute interface as
@Deprecated, but what will you offer instead?

As I understand it, this interface can invoke a job without having the class
available locally.
How will this be supported in your proposal?

On Thu, Jan 28, 2021 at 6:58 PM Ilya Kasnacheev 
wrote:

> Hello!
>
> Please publish it. I don't see why not.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> чт, 28 янв. 2021 г. в 14:28, Zhenya Stanilovsky  >:
>
> >
> >
> > Hi Ilya, of course it is contained in my PR (I don't know if it should be
> > published before this discussion is finished).
> > Small changes from single-threaded to multi-threaded usage, for example
> > here [1], will highlight the problem, or I can just publish my PR.
> >
> > [1]
> >
> https://github.com/apache/ignite/blob/master/modules/core/src/test/java/org/apache/ignite/internal/IgniteExplicitImplicitDeploymentSelfTest.java#L221
> >
> > >
> > >>
> > >>>Hello!
> > >>>
> > >>>Do you have some kind of reproducer which demonstrates the issue?
> > >>>
> > >>>Regards,
> > >>>--
> > >>>Ilya Kasnacheev
> > >>>
> > >>>
> > >>>чт, 28 янв. 2021 г. в 10:32, Zhenya Stanilovsky <
> > arzamas...@mail.ru.invalid
> > >>>>:
> > >>>
> > >>>>
> > >>>> Hello Igniters !
> > >>>> While using Ignite I found that part of the Compute functionality is
> > >>>> thread-unsafe and seems to have been designed with such limitations
> > >>>> initially.
> > >>>> Example: one instance (a client, but it doesn't matter at all) is
> > >>>> shared between numerous threads, all of which call something like:
> > >>>> IgniteCompute#execute(ComputeTask, T)
> > >>>> or
> > >>>> IgniteCompute#execute(java.lang.Class, T)
> > >>>> and the corresponding "async" methods — which instance will be called
> > >>>> is nondeterministic for now, and as a confirmation of my words I
> > >>>> found no tests covering multi-threaded usage of Compute; I also found
> > >>>> nothing on the documentation page [1].
> > >>>> We have all the necessary info for correct processing of such cases:
> > >>>> on the initiator (ignite.compute(...) starter) side we have the Class
> > >>>> or its instance and the appropriate class loader, which will be wired
> > >>>> by class loader id on the execution side.
> > >>>> I created a fix and it seems everything works perfectly well besides
> > >>>> one place, this functionality:
> > >>>> /**
> > >>>>  * Executes given task within the cluster group. For step-by-step
> > >>>>  * explanation of task execution process refer to
> > >>>>  * {@link ComputeTask} documentation.
> > >>>>  *
> > >>>>  * If task for given name has not been deployed yet, then
> > >>>>  * {@code taskName} will be used as task class name to auto-deploy
> > >>>>  * the task (see {@link #localDeployTask(Class, ClassLoader)} method).
> > >>>>  */
> > >>>> public <T, R> R execute(String taskName, T arg) throws
> > >>>> IgniteException;
> > >>>> and the related
> > >>>> /**
> > >>>>  * Finds class loader for the given class.
> > >>>>  *
> > >>>>  * @param rsrcName Class name or class alias to find class loader for.
> > >>>>  * @return Deployed class loader, or {@code null} if not deployed.
> > >>>>  */
> > >>>> public DeploymentResource findResource(String rsrcName);
> > >>>> which is thread-unsafe by default: there is no guarantee that a
> > >>>> concurrent call of localDeployTask and execute will bring the
> > >>>> expected result.
> > >>>> My proposal is to deprecate these methods (or probably annotate them
> > >>>> [2], and as a minimum — additionally document them) and to add an
> > >>>> additional one:
> > >>>> public DeploymentResource findResource(String rsrcName, ClassLoader
> > >>>> clsLdr);
> > >>>> The only problem I can observe here: if someone creates new class
> > >>>> loaders and corresponding class instances in a loop (I don't know to
> > >>>> what purpose) and doesn't undeploy them, then he can possibly get an
> > >>>> OOM here.
> > >>>>
> > >>>> Such an approach will make it possible to use compute in concurrent
> > >>>> scenarios. If there are no objections here I will mark these methods
> > >>>> and publish my PR, of course with additional tests.
> > >>>>
> > >>>> What do you think ?
> > >>>>
> > >>>>
> > >>>> [1]
> > >>>>
> > https://ignite.apache.org/docs/latest/code-deployment/peer-class-loading
> > >>>> [2]
> > >>>>
> > https://jcip.net/annotations/doc/net/jcip/annotations/NotThreadSafe.html
> > >>>>
> > >>>>
> > >>
> > >>
> > >>
> > >>
>


-- 
Vladislav Pyatkov


[jira] [Created] (IGNITE-14073) False alarm about losing all transaction nodes

2021-01-27 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-14073:
--

 Summary: False alarm about losing all transaction nodes
 Key: IGNITE-14073
 URL: https://issues.apache.org/jira/browse/IGNITE-14073
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov
Assignee: Vladislav Pyatkov


This exception happens when a primary and one other node are lost during the 
transaction.
But it may not be true, because the transaction can continue on the backups 
(if they are still alive).

{noformat}
 [2021-01-23 
22:32:50,584][ERROR][test-runner-#1%near.IgniteTxExceptionNodeFailTest%][root] 
Transaction was not committed.
class org.apache.ignite.IgniteException: Failed to commit a transaction (all 
partition owners have left the grid, partition data has been lost) 
[cacheName=default, partition=3, key=386050343]
at 
org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:1096)
at 
org.apache.ignite.internal.processors.cache.transactions.TransactionProxyImpl.commit(TransactionProxyImpl.java:323)
at 
org.apache.ignite.internal.processors.cache.distributed.near.IgniteTxExceptionNodeFailTest.cacheWithBackups(IgniteTxExceptionNodeFailTest.java:280)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.ignite.testframework.junits.GridAbstractTest$7.run(GridAbstractTest.java:2367)
at java.lang.Thread.run(Thread.java:748)
Caused by: class 
org.apache.ignite.internal.processors.cache.CacheInvalidStateException: Failed 
to commit a transaction (all partition owners have left the grid, partition 
data has been lost) [cacheName=default, partition=3, key=386050343]
at 
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture$FinishMiniFuture.onNodeLeft(GridNearTxFinishFuture.java:993)
at 
org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxFinishFuture.onNodeLeft(GridNearTxFinishFuture.java:167)
at 
org.apache.ignite.internal.processors.cache.GridCacheMvccManager$4.onEvent(GridCacheMvccManager.java:265)
at 
org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager$LocalListenerWrapper.onEvent(GridEventStorageManager.java:1393)
at 
org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:888)
at 
org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.notifyListeners(GridEventStorageManager.java:873)
at 
org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record0(GridEventStorageManager.java:349)
at 
org.apache.ignite.internal.managers.eventstorage.GridEventStorageManager.record(GridEventStorageManager.java:312)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.recordEvent(GridDiscoveryManager.java:2948)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body0(GridDiscoveryManager.java:3164)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2968)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
... 1 more
{noformat}
It will frighten clients, because it looks like data loss.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13977) Code enhancement after review of encryption persistent storage

2021-01-11 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13977:
--

 Summary: Code enhancement after review of encryption persistent 
storage
 Key: IGNITE-13977
 URL: https://issues.apache.org/jira/browse/IGNITE-13977
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


# There are a lot of complicated code snippets in `GridCacheOffheapManager` 
where the page type is chosen.
 # The CacheGroupReencryptionTest.testPhysicalRecoveryWithUpdates test is flaky 
when a checkpoint is triggered earlier than expected.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13866) Validate index does not stop after the control.sh process is interrupted

2020-12-16 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13866:
--

 Summary: Validate index does not stop after the control.sh process 
is interrupted
 Key: IGNITE-13866
 URL: https://issues.apache.org/jira/browse/IGNITE-13866
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


The validate index command may continue running in the cluster even after the 
command is terminated abnormally.

For example, we can press CTRL+C in the console to terminate an incorrect 
command invocation, but the command is not terminated in the cluster. In the 
end we have several processes whose results are no longer needed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13864) Assertion error happens on a stale latch acknowledge

2020-12-16 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13864:
--

 Summary: Assertion error happens on a stale latch acknowledge
 Key: IGNITE-13864
 URL: https://issues.apache.org/jira/browse/IGNITE-13864
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


There are several assertion errors in the TC logs; they are related to the 
exchange latch.
It seems to happen because the latch manager does not handle a stale 
acknowledge.

{noformat}
[18:39:26]W: [org.gridgain:ignite-core] [2020-03-26 
18:39:26,680][ERROR][sys-#53190%distributed.CacheLoadingConcurrentGridStartSelfTest2%][GridIoManager]
 An error occurred processing the message [msg=GridIoMessage [plc=2, 
topic=TOPIC_EXCHANGE, topicOrd=31, or
dered=false, timeout=0, skipOnTimeout=false, 
msg=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.LatchAckMessage@779ce9da],
 nodeId=5bd19ec1-da96-41a1-a3e0-ebb55321].
[18:39:26]W: [org.gridgain:ignite-core] java.lang.AssertionError
[18:39:26]W: [org.gridgain:ignite-core] at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.processAck(ExchangeLatchManager.java:399)
[18:39:26]W: [org.gridgain:ignite-core] at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.latch.ExchangeLatchManager.lambda$new$0(ExchangeLatchManager.java:119)
[18:39:26]W: [org.gridgain:ignite-core] at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1654)
[18:39:26]W: [org.gridgain:ignite-core] at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1274)
[18:39:26]W: [org.gridgain:ignite-core] at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4500(GridIoManager.java:145)
[18:39:26]W: [org.gridgain:ignite-core] at 
org.apache.ignite.internal.managers.communication.GridIoManager$8.execute(GridIoManager.java:1159)
[18:39:26]W: [org.gridgain:ignite-core] at 
org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:50)
[18:39:26]W: [org.gridgain:ignite-core] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[18:39:26]W: [org.gridgain:ignite-core] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[18:39:26]W: [org.gridgain:ignite-core] at 
java.lang.Thread.run(Thread.java:748)

{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13594) Model classes require manual deserialization if used inside Job loaded by p2p

2020-10-19 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13594:
--

 Summary: Model classes require manual deserialization if used 
inside Job loaded by p2p
 Key: IGNITE-13594
 URL: https://issues.apache.org/jira/browse/IGNITE-13594
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


After the fix in IGNITE-5038, users can now use model classes inside 
ComputeJobs, but they still need to change their code and add manual 
deserialization if they want to use them, like:
Object personVal = binaryVal.deserialize(testClsLdr);

I believe we can do this under the hood, and the proper classloader can be 
chosen automatically.
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13593) IgniteClientCacheStartFailoverTest.testRebalanceStateConcurrentStart (Cache 2) is flaky

2020-10-19 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13593:
--

 Summary: 
IgniteClientCacheStartFailoverTest.testRebalanceStateConcurrentStart (Cache 2) 
is flaky
 Key: IGNITE-13593
 URL: https://issues.apache.org/jira/browse/IGNITE-13593
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


[https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=749390831986783178=testDetails_IgniteTests24Java8=%3Cdefault%3E]

Flaky rate is 14%

 

There are two kinds of failures in this test (according to TC):
 # An exception on the MVCC cache, because the test adds identical keys at the 
same moment.
 This exception will be fixed here.
 # An assertion error, because the cache size differs from the expected one.
 This behavior is difficult to reproduce and happens very rarely on TC. It will 
be fixed in another ticket if it appears again after this issue is closed.

The reason for the flakiness of this test is an exception on the MVCC cache:

{noformat}

javax.cache.CacheException: class 
org.apache.ignite.transactions.TransactionSerializationException: Cannot 
serialize transaction due to write conflict (transaction is marked for 
rollback) at 
org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1265)
 at 
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.cacheException(IgniteCacheProxyImpl.java:2077)
 at 
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(IgniteCacheProxyImpl.java:1313)
 at 
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(GatewayProtectedCacheProxy.java:817)
 at 
org.apache.ignite.internal.processors.cache.IgniteClientCacheStartFailoverTest$8.call(IgniteClientCacheStartFailoverTest.java:399)
 at 
org.apache.ignite.internal.processors.cache.IgniteClientCacheStartFailoverTest$8.call(IgniteClientCacheStartFailoverTest.java:375)
 at org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:87) 
Caused by: class 
org.apache.ignite.transactions.TransactionSerializationException: Cannot 
serialize transaction due to write conflict (transaction is marked for 
rollback) at 
org.apache.ignite.internal.util.IgniteUtils$16.apply(IgniteUtils.java:1011) at 
org.apache.ignite.internal.util.IgniteUtils$16.apply(IgniteUtils.java:1009) ... 
7 more Caused by: class 
org.apache.ignite.internal.transactions.IgniteTxSerializationCheckedException: 
Cannot serialize transaction due to write conflict (transaction is marked for 
rollback) at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.serializationError(GridCacheMapEntry.java:7123)
 at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.access$700(GridCacheMapEntry.java:136)
 at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry$MvccUpdateLockListener.apply(GridCacheMapEntry.java:5629)
 at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry$MvccUpdateLockListener.apply(GridCacheMapEntry.java:5482)
 at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:407)
 at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:355)
 at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:343)
 at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:520)
 at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:498)
 at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:464)
 at 
org.apache.ignite.internal.processors.cache.mvcc.MvccProcessorImpl$LockFuture.run(MvccProcessorImpl.java:1952)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
at java.lang.Thread.run(Thread.java:748)

{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Re[2]: Apache Ignite 2.9.0 RELEASE [Time, Scope, Manager]

2020-10-12 Thread Vladislav Pyatkov
> >> >>>>>>>>>>>>>>>> long
> >> >>>>>>>>>>>>>>>> term: register classes and generate a dynamic message
> >> >> factory
> >> >>>>>>>> with
> >> >>>>>>>>>>>> a switch
> >> >>>>>>>>>>>>>>>> statement once all messages are registered (not in 2.9
> >> >>>> though,
> >> >>>>>>>>>>>> obviously).
> >> >>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>> ср, 9 сент. 2020 г. в 14:53, Alex Plehanov <
> >> >>>>>>>>>  plehanov.a...@gmail.com
> >> >>>>>>>>>>>>> :
> >> >>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>> Hello guys,
> >> >>>>>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>>>>> I've tried to optimize tracing implementation (ticket
> >> [1]),
> >> >>>> it
> >> >>>>>>>>>>>> reduced the
> >> >>>>>>>>>>>>>>>>> drop, but not completely removed it.
> >> >>>>>>>>>>>>>>>>> Ivan Rakov, Alexander Lapin, can you please review the
> >> >>>> patch?
> >> >>>>>>>>>>>>>>>>> Ivan Artiukhov, can you please benchmark the patch [2]
> >> >>>> against
> >> >>>>>>>>>>>> 2.8.1
> >> >>>>>>>>>>>>>>>>> release on your environment?
> >> >>>>>>>>>>>>>>>>> With this patch on our environment, it's about a 3%
> drop
> >> >>>> left,
> >> >>>>>>>>>>>> it's close
> >> >>>>>>>>>>>>>>>>> to measurement error and I think such a drop is not a
> >> >>>>>>>>>>>> showstopper. Guys,
> >> >>>>>>>>>>>>>>>>> WDYT?

[jira] [Created] (IGNITE-13501) AssertionError: Invalid value in testMergeServersFail1_8

2020-09-30 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13501:
--

 Summary: AssertionError: Invalid value in testMergeServersFail1_8
 Key: IGNITE-13501
 URL: https://issues.apache.org/jira/browse/IGNITE-13501
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


java.lang.AssertionError: Invalid value 
[node=distributed.CacheExchangeMergeTest0, client=false, order=1, cache=c6] 
expected:<1> but was:

Reproduced by [1] [2]

[1] 
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-3187056670751319047=testDetails

This covers the case of a one-phase-commit transaction with zero backups. The 
reason for this issue is that the primary node fails while the near node is 
sending a request.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13417) Cache Interceptors deserialization on client nodes

2020-09-09 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13417:
--

 Summary: Cache Interceptors deserialization on client nodes
 Key: IGNITE-13417
 URL: https://issues.apache.org/jira/browse/IGNITE-13417
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


After fix https://issues.apache.org/jira/browse/IGNITE-1903, Cache Interceptors 
still don't work

Looks like we need to add @SerializeSeparately to this field too



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13402) [Suite] PDS 3 flaky failed on TC

2020-09-03 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13402:
--

 Summary: [Suite] PDS 3 flaky failed on TC
 Key: IGNITE-13402
 URL: https://issues.apache.org/jira/browse/IGNITE-13402
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


{noformat}
 java.lang.AssertionError: Invalid topology version 
[topVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], group=Group1]
at 
org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.readyTopologyVersion(GridDhtPartitionTopologyImpl.java:317)
at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.nextVersion(GridCacheAdapter.java:3663)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpiredInternal(GridCacheOffheapManager.java:2821)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.purgeExpired(GridCacheOffheapManager.java:2747)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.expire(GridCacheOffheapManager.java:1090)
at 
org.apache.ignite.internal.processors.cache.GridCacheTtlManager.expire(GridCacheTtlManager.java:242)
at 
org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.lambda$body$0(GridCacheSharedTtlCleanupManager.java:178)
at 
java.util.concurrent.ConcurrentHashMap.computeIfPresent(ConcurrentHashMap.java:1769)
 [2020-07-28 06:36:24,540][INFO 
][exchange-worker-#38244%persistence.IgnitePdsContinuousRestartTestWithExpiryPolicy2%][FileWriteAheadLogManager]
 Resuming logging to WAL segment 
[file=/opt/buildagent/work/bde9b45ddb020b34/incubator-ignite/work/db/wal/persistence_IgnitePdsContinuousRestartTestWithExpiryPolicy2/.wal,
 offset=3573451, ver=2]
at 
org.apache.ignite.internal.processors.cache.GridCacheSharedTtlCleanupManager$CleanupWorker.body(GridCacheSharedTtlCleanupManager.java:177)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at java.lang.Thread.run(Thread.java:748)
 [2020-07-28 
06:36:24,544][ERROR][ttl-cleanup-worker-#38230%persistence.IgnitePdsContinuousRestartTestWithExpiryPolicy2%][IgniteTestResources]
 Critical system error detected. Will be handled accordingly to configured 
handler [hnd=NoOpFailureHandler [super=AbstractFailureHandler 
[ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, 
SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext 
[type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteException: GridWorker 
[name=ttl-cleanup-worker, 
igniteInstanceName=persistence.IgnitePdsContinuousRestartTestWithExpiryPolicy2, 
finished=true, heartbeatTs=1595907384509]]]
 class org.apache.ignite.IgniteException: GridWorker [name=ttl-cleanup-worker, 
igniteInstanceName=persistence.IgnitePdsContinuousRestartTestWithExpiryPolicy2, 
finished=true, heartbeatTs=1595907384509]
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1859)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$2.apply(IgnitionEx.java:1854)
at 
org.apache.ignite.internal.worker.WorkersRegistry.onStopped(WorkersRegistry.java:168)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:152)
at java.lang.Thread.run(Thread.java:748)

{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSSION] Maintenance Mode feature

2020-08-31 Thread Vladislav Pyatkov
not fail nor be stopped for manual
> > >> > cleanup.
> > >> > Manual cleanup is not always an option (e.g. restricted access to
> file
> > >> > system); in managed environments failed node will be restarted
> > >> > automatically so user won't have time for performing necessary
> > >> operations.
> > >> > Thus node needs to function in a special mode allowing user to
> connect
> > >> > to
> > >> > it and perform necessary actions.
> > >> >
> > >> > Another example is described in IEP-47 [2] where defragmentation is
> > >> > being
> > >> > developed. Node defragmenting its PDS should not join the cluster
> > until
> > >> the
> > >> > process is finished so it needs to enter Maintenance Mode as well.
> > >> >
> > >> > *Suggested design*
> > >> > I suggest MM to work as follows:
> > >> > 1. Node enters MM if special markers are found on disk. These
> markers
> > >> > called Maintenance Records could be created automatically (e.g. when
> > >> > storage component detects corrupted storage) or by user request
> (when
> > >> user
> > >> > requests defragmentation of some caches). So entering MM requires
> node
> > >> > restart.
> > >> > 2. Started in MM node doesn't join the cluster but finishes startup
> > >> routine
> > >> > so it is able to receive commands and provide metrics to the user.
> > >> > 3. When all necessary maintenance operations are finished,
> Maintenance
> > >> > Records for these operations are deleted from disk and node
> restarted
> > >> again
> > >> > to enter normal service.
> > >> >
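A rough sketch of the startup check described in step 1 above; the marker file location and names are illustrative only, not the implementation from the PR:

{code:java}
import java.nio.file.Files;
import java.nio.file.Path;

// Rough sketch: enter Maintenance Mode on startup when a maintenance record
// marker is found on disk, instead of joining the cluster.
public class MaintenanceCheckSketch {
    static boolean maintenanceNeeded(Path workDir) {
        // Assumed marker location, for illustration only.
        return Files.exists(workDir.resolve("maintenance").resolve("records.json"));
    }
}
{code}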
> > >> > *Example*
> > >> > To put it into a context let's consider an example of how I see the
> MM
> > >> > workflow in case of PDS corruption.
> > >> >
> > >> >   1. Node has failed in the middle of checkpoint when WAL is
> disabled
> > >> > for
> > >> >   a particular cache -> data files of the cache are potentially
> > >> corrupted.
> > >> >   2. On next startup node detects this situation, creates
> Maintenance
> > >> >   Record on disk and shuts down.
> > >> >   3. On next startup node sees Maintenance Record, enters
> Maintenance
> > >> Mode
> > >> >   and waits for user to do specific actions: clean potentially
> > >> > corrupted
> > >> PDS.
> > >> >   4. When user has done necessary actions he/she removes Maintenance
> > >> >   Record using Maintenance Mode API exposed via control.{sh|bat}
> > script
> > >> or
> > >> >   JMX.
> > >> >   5. On next startup node goes to normal operations as maintenance
> > >> > reason
> > >> >   is fixed.
> > >> >
> > >> >
> > >> > I prepared a PR [3] for ticket [1] with draft implementation. It is
> > not
> > >> > ready to be merged to master branch but is already fully functional
> > and
> > >> can
> > >> > be reviewed.
> > >> >
> > >> > Hope you'll share your feedback on the feature and/or any thoughts
> on
> > >> > implementation.
> > >> >
> > >> > Thank you!
> > >> >
> > >> > [1] https://issues.apache.org/jira/browse/IGNITE-13366
> > >> > [2]
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-47:+Native+persistence+defragmentation
> > >> > [3] https://github.com/apache/ignite/pull/8189
> > >>
> > >>
> > >
> >
> >
> > --
> >
> > Best regards,
> > Ivan Pavlukhin
> >
>


-- 
Vladislav Pyatkov


[jira] [Created] (IGNITE-13379) Exception occur on SQL caches when client reconnect

2020-08-21 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13379:
--

 Summary: Exception occur on SQL caches when client reconnect
 Key: IGNITE-13379
 URL: https://issues.apache.org/jira/browse/IGNITE-13379
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov
Assignee: Vladislav Pyatkov


When a client has started only a subset of all cluster caches, it can have 
issues on reconnect.

If a cache isn't started on the client, it still registers some SQL structures:
{{GridQueryProcessor#initQueryStructuresForNotStartedCache}}

but these structures are not cleared on disconnect:
{{GridCacheProcessor#onReconnected}}

This leads to an exception on reconnect:
{noformat}
class org.apache.ignite.IgniteCheckedException: Type with name 'Timestamp' 
already indexed in cache 'TEST_CACHE2'.
 at 
org.apache.ignite.internal.processors.query.GridQueryProcessor.registerCache0(GridQueryProcessor.java:1712)
 at 
org.apache.ignite.internal.processors.query.GridQueryProcessor.onCacheStart0(GridQueryProcessor.java:834)
 at 
org.apache.ignite.internal.processors.query.GridQueryProcessor.onCacheStart(GridQueryProcessor.java:911)
 at 
org.apache.ignite.internal.processors.query.GridQueryProcessor.initQueryStructuresForNotStartedCache(GridQueryProcessor.java:889)
 at 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processCacheStartRequests(CacheAffinitySharedManager.java:968)
 at 
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedManager.java:857)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:1205)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:850)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3258)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3104)
 at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
 at java.lang.Thread.run(Thread.java:748)
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Exception handling in thin client: should we pass stack traces to the client?

2020-08-20 Thread Vladislav Pyatkov
Hi,

I agree with Zhenya that a stack trace from the server side can help in
investigating issues, but it is really confusing in a production environment.
I see all participants saying the same.

Pavel, do you mean this behavior should be switchable by configuration?
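If it becomes configurable, a server-side switch could look something like this; the flag and its name are hypothetical here, not an existing option at the time of writing:

{code:java}
// Hypothetical configuration sketch: a server-side switch deciding whether
// thin clients receive full stack traces or only a code + message.
IgniteConfiguration cfg = new IgniteConfiguration()
    .setClientConnectorConfiguration(new ClientConnectorConfiguration()
        .setThinClientConfiguration(new ThinClientConfiguration()
            // Assumed flag name; off by default for production safety.
            .setSendServerExceptionStackTraceToClient(false)));
{code}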

On Thu, Aug 20, 2020 at 5:00 PM Pavel Tupitsyn  wrote:

> Link to the original discussion:
>
>
> http://apache-ignite-developers.2346864.n4.nabble.com/Exception-handling-in-thin-client-should-we-pass-stack-traces-to-the-client-td22392.html
>
> On Thu, Aug 20, 2020 at 4:46 PM Zhenya Stanilovsky
>  wrote:
>
> >
> > I want to resurrect this discussion: I don't understand what sensitive
> > information you are talking about.
> > Can you show some examples or something else? I have never heard that
> > thread dumps belong to sensitive info.
> > I believe that a single one-line error can't help the user recognize the
> > problem, and logs from the server side can simply be unreachable, or
> > logging may be disabled entirely.
> > So I suggest requesting the full thread dump in case a server-side error
> > occurs.
> >
> > What do you think?
> >
> >
> > >Igniters,
> > >
> > >We had a discussion about how to propagate error information from
> cluster
> > >nodes to the client. My opinion is that we should pass a kind of vendor
> > >code plus optional error message, if vendor code is not very specific.
> > >
> > >Alternative idea is to pass the whole stack trace as well. I agree that
> > >this is very useful for debugging purposes, but on the other hand IMO it
> > >imposes security risk. By sending invalid requests to the server user
> > might
> > >get sensitive information about server configuration, such as it's
> > version,
> > >version of the underlying database, frameworks etc.. This information
> may
> > >help an attacker to apply some version-specific attacks. This is
> > >precisely the reason why default error pages of web servers with stack
> > >traces are always replaced with stubs.
> > >
> > >This is why I think we should not include stack traces.
> > >
> > >What do you think?
> > >
> > >Vladimir.
> >
> >
> >
> >
>


-- 
Vladislav Pyatkov


[jira] [Created] (IGNITE-13377) WalModeChangeAdvancedSelfTest.testServerRestartNonCoordinator

2020-08-20 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13377:
--

 Summary: 
WalModeChangeAdvancedSelfTest.testServerRestartNonCoordinator
 Key: IGNITE-13377
 URL: https://issues.apache.org/jira/browse/IGNITE-13377
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Update of the default inline size for variable types

2020-08-19 Thread Vladislav Pyatkov
Hi,

To my mind, an inline size of 64 can lead to significant growth of the storage
size.
This can be difficult for users to understand.

I remember we earlier planned to replace the inlined value with its hash code
in the case where the value size is more than the inline size.
It would help with "==" and "!=" comparisons, but would not grow the storage
size.

I think the optimization with the hash code looks preferable, and in any case
anyone can grow the inline size through the API.


On Wed, Aug 19, 2020 at 9:22 AM Zhenya Stanilovsky
 wrote:

>
>
> >Hi guys,
>
> Evgeniy, hola!
> >
> >Currently if a varlength type (such as String or byte[]) is encountered in
> >the composite index inline size just defaults to 10, which is almost
> always
> >not enough. I am going to change this and implement following changes:
> >
> >1) For a column of the variable length keep using 10 as the default size
> in
> >case of the one-column index. But if the index is composite the default
> >index size will be calculated as the sum of sizes of all indexed columns.
> >For example, for the index like (INT, VARCHAR, VARCHAR, INT) default
> inline
> >size will be 5 + 10 + 10 + 5 = 30 (5 for each int, 10 for each string).
>
> Why exactly this approach? Why not just 5 + 10 and that's all? Do you have
> some logical basis, a statistical distribution or something similar? For now
> this looks like your own decision and nothing more; am I wrong?
> >
> >2) For sql varchar and binary columns with defined length (for example
> >VARCHAR(XX)) use XX + 3 as default inline size for the column (need 3
> extra
> >bytes for the inner representation of the type).
>
> The same question here: why do you want to cover the whole varchar length?
> Did you compare with other vendors' approaches?
> >
> >3) Maximum default index size still will be limited by
> >IGNITE_MAX_INDEX_PAYLOAD_SIZE, but its default value will be increased to
> >64. For example for the index (VARCHAR, VARCHAR, VARCHAR, VARCHAR,
> VARCHAR,
> >VARCHAR, VARCHAR) default index size will be only 64. Same for the columns
> >with defined length: by default VARCHAR(100) column will create index only
> >with size equal to 64.
> >
> >Please tell if you have any concerns. Update can be found at
> >https://github.com/apache/ignite/pull/8161
> >
> >Best regards,
> >Evgeniy
> >
>
>
>
>
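For reference, a sketch of the defaulting rule proposed above; the constants and the helper method are illustrative only:

{code:java}
// Illustrative sketch of the proposed default inline size calculation.
public class InlineSizeSketch {
    static final int INT_SIZE = 5;        // fixed-length int column
    static final int VARLEN_DEFAULT = 10; // variable-length column default
    static final int MAX_DEFAULT = 64;    // assumed new IGNITE_MAX_INDEX_PAYLOAD_SIZE default

    /** Sums per-column defaults; a one-column varlen index stays at 10. */
    static int defaultInlineSize(int... columnSizes) {
        int sum = 0;
        for (int s : columnSizes)
            sum += s;
        return Math.min(sum, MAX_DEFAULT);
    }

    public static void main(String[] args) {
        // (INT, VARCHAR, VARCHAR, INT) -> 5 + 10 + 10 + 5 = 30
        System.out.println(defaultInlineSize(INT_SIZE, VARLEN_DEFAULT, VARLEN_DEFAULT, INT_SIZE));
    }
}
{code}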



-- 
Vladislav Pyatkov


[jira] [Created] (IGNITE-13265) Historical iterator for atomic group should transfer a few more rows than required

2020-07-17 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13265:
--

 Summary: Historical iterator for atomic group should transfer a few 
more rows than required
 Key: IGNITE-13265
 URL: https://issues.apache.org/jira/browse/IGNITE-13265
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


During a historical rebalance some updates move from one node to another, and these 
updates may arrive in a different order on different nodes. Reordering can happen 
within a small interval, but it cannot be avoided entirely in the current 
implementation of the atomic protocol.

This means we can reduce the probability of losing an update if we make a margin from 
the initial counter for the historical iterator on atomic caches.
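
A sketch of the proposed margin (the names are hypothetical, not the final 
implementation):

{code:java}
// Start the historical iterator a bit earlier than strictly required, so
// updates reordered around the initial counter are not lost on atomic caches.
long margin = ATOMIC_CACHE_REBALANCE_MARGIN; // hypothetical constant
long from = Math.max(0, initialCntr - margin);
{code}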



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Choosing historical rebalance heuristics

2020-07-16 Thread Vladislav Pyatkov
I completely forgot about another argument in favor of using historical
rebalance where possible. When the cluster decides to use full rebalance,
demander nodes have to clear non-empty partitions first.
This can take a long time; in some cases it may be comparable to the
rebalance time itself.
This also counts in favor of the heuristics above.

On Thu, Jul 16, 2020 at 12:09 AM Vladislav Pyatkov 
wrote:

> Ivan,
>
> I agree with a combined approach: a threshold for small partitions and an
> update count for partitions that outgrow it.
> This helps to exclude partitions that are updated infrequently.
>
> Reading a big piece of WAL (more than 100 GB) can happen when a client
> configured it intentionally.
> There is no doubt we can read it; otherwise the WAL space would not have
> been configured that large.
>
> I don't see a connection between the iterator optimization and the issue
> in the atomic protocol.
> Reordering in the WAL that happens within a checkpoint where the counter
> was not changing is an extremely rare case, and the issue will not be
> solved for the generic case here; it should be fixed within the protocol
> itself.
>
> I think we can modify the heuristic as follows:
> 1) Exclude partitions by threshold (IGNITE_PDS_WAL_REBALANCE_THRESHOLD,
> reduced to 500)
> 2) Select for historical rebalance only those partitions where the
> difference between counters is less than the partition size.
>
> Also, implement the mentioned optimization for the historical iterator,
> which may reduce the time spent reading a large WAL interval.
>
> On Wed, Jul 15, 2020 at 3:15 PM Ivan Rakov  wrote:
>
>> Hi Vladislav,
>>
>> Thanks for raising this topic.
>> Currently present IGNITE_PDS_WAL_REBALANCE_THRESHOLD (default is 500_000)
>> is controversial. Assuming that the default number of partitions is 1024,
>> cache should contain a really huge amount of data in order to make WAL
>> delta rebalancing possible. In fact, it's currently disabled for most
>> production cases, which makes rebalancing of persistent caches
>> unreasonably
>> long.
>>
>> I think, your approach [1] makes much more sense than the current
>> heuristic, let's move forward with the proposed solution.
>>
>> Though, there are some other corner cases, e.g. this one:
>> - Configured size of WAL archive is big (>100 GB)
>> - Cache has small partitions (e.g. 1000 entries)
>> - Infrequent updates (e.g. ~100 in the whole WAL history of any node)
>> - There is another cache with very frequent updates which allocate >99% of
>> WAL
>> In such scenario we may need to iterate over >100 GB of WAL in order to
>> fetch <1% of needed updates. Even though the amount of network traffic is
>> still optimized, it would be more effective to transfer partitions with
>> ~1000 entries fully instead of reading >100 GB of WAL.
>>
>> I want to highlight that your heuristic definitely makes the situation
>> better, but due to possible corner cases we should keep the fallback lever
>> to restrict or limit historical rebalance as before. Probably, it would be
>> handy to keep IGNITE_PDS_WAL_REBALANCE_THRESHOLD property with a low
>> default value (1000, 500 or even 0) and apply your heuristic only for
>> partitions with bigger size.
>>
>> Regarding case [2]: it looks like an improvement that can mitigate some
>> corner cases (including the one that I have described). I'm ok with it as
>> long as it takes data updates reordering on backup nodes into account. We
>> don't track skipped updates for atomic caches. As a result, detection of
>> the absence of updates between two checkpoint markers with the same
>> partition counter can be false positive.
>>
>> --
>> Best Regards,
>> Ivan Rakov
>>
>> On Tue, Jul 14, 2020 at 3:03 PM Vladislav Pyatkov 
>> wrote:
>>
>> > Hi guys,
>> >
>> > I want to implement a more honest heuristic for historical rebalance.
>> > Currently, a cluster chooses between historical and full rebalance based
>> > only on the partition size. This threshold is better known by the name
>> > of the property IGNITE_PDS_WAL_REBALANCE_THRESHOLD.
>> > It prevents historical rebalance when a partition is too small, but if
>> > the WAL contains more updates than the size of the partition, historical
>> > rebalance can still be chosen.
>> > There is a ticket about implementing a fairer heuristic [1].
>> >
>> > My idea for the implementation is to estimate the size of the data which
>> > will be transferred over the network. In other words, if we need to
>> > rebalance a piece of WAL that contains N updates to recover a partition
>> > on another node,

Re: Choosing historical rebalance heuristics

2020-07-15 Thread Vladislav Pyatkov
Ivan,

I agree with a combined approach: a threshold for small partitions and an
update count for partitions that outgrow it.
This helps to exclude partitions that are updated infrequently.

Reading a big piece of WAL (more than 100 GB) can happen when a client
configured it intentionally.
There is no doubt we can read it; otherwise the WAL space would not have
been configured that large.

I don't see a connection between the iterator optimization and the issue in
the atomic protocol.
Reordering in the WAL that happens within a checkpoint where the counter was
not changing is an extremely rare case, and the issue will not be solved for
the generic case here; it should be fixed within the protocol itself.

I think we can modify the heuristic as follows:
1) Exclude partitions by threshold (IGNITE_PDS_WAL_REBALANCE_THRESHOLD,
reduced to 500)
2) Select for historical rebalance only those partitions where the
difference between counters is less than the partition size.

Also, implement the mentioned optimization for the historical iterator,
which may reduce the time spent reading a large WAL interval.
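
A minimal sketch of the resulting decision, with hypothetical names for the
inputs the coordinator already has (this is not the final implementation):

// 1) Small partitions are always rebalanced in full.
if (partSize < walRebalanceThreshold) // IGNITE_PDS_WAL_REBALANCE_THRESHOLD
    return FULL;

// 2) Prefer history only when it carries fewer updates than a full copy.
long updatesInWal = supplierCntr - demanderCntr;

return walHistoryReserved && updatesInWal < partSize ? HISTORICAL : FULL;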

On Wed, Jul 15, 2020 at 3:15 PM Ivan Rakov  wrote:

> Hi Vladislav,
>
> Thanks for raising this topic.
> Currently present IGNITE_PDS_WAL_REBALANCE_THRESHOLD (default is 500_000)
> is controversial. Assuming that the default number of partitions is 1024,
> cache should contain a really huge amount of data in order to make WAL
> delta rebalancing possible. In fact, it's currently disabled for most
> production cases, which makes rebalancing of persistent caches unreasonably
> long.
>
> I think, your approach [1] makes much more sense than the current
> heuristic, let's move forward with the proposed solution.
>
> Though, there are some other corner cases, e.g. this one:
> - Configured size of WAL archive is big (>100 GB)
> - Cache has small partitions (e.g. 1000 entries)
> - Infrequent updates (e.g. ~100 in the whole WAL history of any node)
> - There is another cache with very frequent updates which allocate >99% of
> WAL
> In such scenario we may need to iterate over >100 GB of WAL in order to
> fetch <1% of needed updates. Even though the amount of network traffic is
> still optimized, it would be more effective to transfer partitions with
> ~1000 entries fully instead of reading >100 GB of WAL.
>
> I want to highlight that your heuristic definitely makes the situation
> better, but due to possible corner cases we should keep the fallback lever
> to restrict or limit historical rebalance as before. Probably, it would be
> handy to keep IGNITE_PDS_WAL_REBALANCE_THRESHOLD property with a low
> default value (1000, 500 or even 0) and apply your heuristic only for
> partitions with bigger size.
>
> Regarding case [2]: it looks like an improvement that can mitigate some
> corner cases (including the one that I have described). I'm ok with it as
> long as it takes data updates reordering on backup nodes into account. We
> don't track skipped updates for atomic caches. As a result, detection of
> the absence of updates between two checkpoint markers with the same
> partition counter can be false positive.
>
> --
> Best Regards,
> Ivan Rakov
>
> On Tue, Jul 14, 2020 at 3:03 PM Vladislav Pyatkov 
> wrote:
>
> > Hi guys,
> >
> > I want to implement a more honest heuristic for historical rebalance.
> > Currently, a cluster chooses between historical and full rebalance based
> > only on the partition size. This threshold is better known by the name of
> > the property IGNITE_PDS_WAL_REBALANCE_THRESHOLD.
> > It prevents historical rebalance when a partition is too small, but if
> > the WAL contains more updates than the size of the partition, historical
> > rebalance can still be chosen.
> > There is a ticket about implementing a fairer heuristic [1].
> >
> > My idea for the implementation is to estimate the size of the data which
> > will be transferred over the network. In other words, if we need to
> > rebalance a piece of WAL that contains N updates to recover a partition
> > on another node which contains M rows in total, we should choose
> > historical rebalance in the case where N < M (provided the WAL history is
> > present as well).
> >
> > This approach is easy to implement, because the coordinator node has the
> > sizes of the partitions and the counter intervals. But in this case the
> > cluster can still find only a few updates in a very long WAL history. I
> > assume it is possible to work around this if the historical rebalance
> > iterator does not handle checkpoints that contain no updates of the
> > particular cache. A checkpoint can be skipped if the counters for the
> > cache (maybe even for specific partitions) were not changed between it
> > and the next one.
> >
> > Ticket for the improvement of the historical rebalance iterator: [2]
> >
> > I want to hear the community's view on the thoughts above.
> > Maybe anyone has another opinion?
> >
> > [1]: https://issues.apache.org/jira/browse/IGNITE-13253
> > [2]: https://issues.apache.org/jira/browse/IGNITE-13254
> >
> > --
> > Vladislav Pyatkov
> >
>


-- 
Vladislav Pyatkov


Choosing historical rebalance heuristics

2020-07-14 Thread Vladislav Pyatkov
Hi guys,

I want to implement a more honest heuristic for historical rebalance.
Currently, a cluster chooses between historical and full rebalance based
only on the partition size. This threshold is better known by the name of
the property IGNITE_PDS_WAL_REBALANCE_THRESHOLD.
It prevents historical rebalance when a partition is too small, but if the
WAL contains more updates than the size of the partition, historical
rebalance can still be chosen.
There is a ticket about implementing a fairer heuristic [1].

My idea for the implementation is to estimate the size of the data which
will be transferred over the network. In other words, if we need to
rebalance a piece of WAL that contains N updates to recover a partition on
another node which contains M rows in total, we should choose historical
rebalance in the case where N < M (provided the WAL history is present as
well).

This approach is easy to implement, because the coordinator node has the
sizes of the partitions and the counter intervals. But in this case the
cluster can still find only a few updates in a very long WAL history. I
assume it is possible to work around this if the historical rebalance
iterator does not handle checkpoints that contain no updates of the
particular cache. A checkpoint can be skipped if the counters for the cache
(maybe even for specific partitions) were not changed between it and the
next one.

Ticket for the improvement of the historical rebalance iterator: [2]

I want to hear the community's view on the thoughts above.
Maybe anyone has another opinion?

[1]: https://issues.apache.org/jira/browse/IGNITE-13253
[2]: https://issues.apache.org/jira/browse/IGNITE-13254

-- 
Vladislav Pyatkov


[jira] [Created] (IGNITE-13254) Historical rebalance iterator may skip a checkpoint if it does not contain updates

2020-07-14 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13254:
--

 Summary: Historical rebalance iterator may skip a checkpoint if it 
does not contain updates
 Key: IGNITE-13254
 URL: https://issues.apache.org/jira/browse/IGNITE-13254
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13253) Advanced heuristics for historical rebalance

2020-07-14 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13253:
--

 Summary: Advanced heuristics for historical rebalance
 Key: IGNITE-13253
 URL: https://issues.apache.org/jira/browse/IGNITE-13253
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


Before, cluster detects partitions that have not to rebalance by history, by 
them size. This threshold might be set through a system property 
IGNITE_PDS_WAL_REBALANCE_THRESHOLD. But it is not fair deciding which 
partitions will be rebalanced by WAL only by them size. WAL can have much more 
records than size of a partition (many update by one key) and that rebalance 
required more data than full transferring by network.
Need to implement a heuristic, that might to estimate data size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13245) Rebalance future might hang in a non-final state though all partitions are owned

2020-07-11 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13245:
--

 Summary: Rebalance future might hang in a non-final state though all 
partitions are owned
 Key: IGNITE-13245
 URL: https://issues.apache.org/jira/browse/IGNITE-13245
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


It is a very specific case: a supplier leaves the cluster and, at the same time, 
its partitions do not need rebalancing in the new topology.

Look at my PR to understand it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13191) Public-facing API for "waiting for backups on shutdown"

2020-06-27 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13191:
--

 Summary: Public-facing API for "waiting for backups on shutdown"
 Key: IGNITE-13191
 URL: https://issues.apache.org/jira/browse/IGNITE-13191
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


We should introduce a "should wait for backups on shutdown" flag in Ignition 
and/or IgniteConfiguration.

Maybe we should do the same for the "cancel compute tasks" flag.

Also make sure that we can shut down a node explicitly, overriding this flag but 
without JVM termination.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-13168) Retrigger historical rebalance if it was cancelled in case WAL history is still available

2020-06-19 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13168:
--

 Summary: Retrigger historical rebalance if it was cancelled in 
case WAL history is still available
 Key: IGNITE-13168
 URL: https://issues.apache.org/jira/browse/IGNITE-13168
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


If historical rebalance is cancelled, full rebalance will be unconditionally 
triggered on the PME that caused the cancellation (only outdated OWNING 
partitions can be rebalanced by history in the current implementation).
We have to allow MOVING partitions to be historically rebalanced as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Various shutdown guaranties

2020-06-08 Thread Vladislav Pyatkov
Hi

We need the ability to call shutdown with various guarantees.
For example:
We need to reboot a node, but after that the node should be available for
historical rebalance (all partitions in MOVING state should have gone to
OWNING).

We implement a rolling restart of the cluster, but all data should be
available during that time (at least one copy of each partition should be
available in the cluster).

We need to wait not only for data, but for all jobs as well (previously this
behavior was available through a stop(false) method invocation).

All these reasons require different behavior before shutting down a node.
I propose to slightly modify the public API and add a method which specifies
the shutdown behavior directly:
Ignite.close(Shutdown)

public enum Shutdown {
    /**
     * Stop immediately, as soon as components are ready.
     */
    IMMEDIATE,
    /**
     * Stop the node when all partitions have completed moving from/to this
     * node to another.
     */
    NORMAL,
    /**
     * The node will stop if and only if it does not store any unique
     * partitions that have no copies elsewhere in the cluster.
     */
    GRACEFUL,
    /**
     * The node stops gracefully and waits for all jobs before shutdown.
     */
    ALL
}

The close method without a parameter, Ignite.close(), will use the shutdown
behavior configured cluster-wide. It will be implemented through the
distributed metastorage plus additional configuration utilities.
Also, a method to configure shutdown on start will be added; it looks like
IgniteConfiguration.setShutdown(Shutdown).
If the shutdown behavior is not configured, everything works as before,
i.e. according to the IMMEDIATE behavior.
All other close methods will be marked as deprecated.
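
A minimal usage sketch of the proposed API (the names come from this proposal
and do not exist in a released version yet):

IgniteConfiguration cfg = new IgniteConfiguration();

// Configure the cluster-wide default policy on start.
cfg.setShutdown(Shutdown.GRACEFUL);

Ignite ignite = Ignition.start(cfg);

// Stop this node, overriding the configured policy for this call only.
ignite.close(Shutdown.IMMEDIATE);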

I will be waiting for your opinions.

-- 
Vladislav Pyatkov


[jira] [Created] (IGNITE-13072) Synchronization problems when different classloaders are used for deployment of same class

2020-05-26 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-13072:
--

 Summary: Synchronization problems when different classloaders are 
used for deployment of same class
 Key: IGNITE-13072
 URL: https://issues.apache.org/jira/browse/IGNITE-13072
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov
Assignee: Vladislav Pyatkov


If you concurrently deploy one class using different classloaders, you can get an 
error:

{noformat}

2020-04-28 
14:36:42.523[ERROR][sys-stripe-45-#46%GRID%GridNodeName%][o.a.i.i.m.d.GridDeploymentLocalStore]
 Found more than one active deployment for the same resource [cls=class 
org.some.class.old.InvokeIndexRemover, depMode=SHARED, dep=GridDeployment 
[ts=1588067100125, depMode=SHARED, 
clsLdr=org.some.class.factory.NodeClassLoader@14035d21, 
clsLdrId=85ab310c171-a9fad11c-9f8c-4d2a-8146-6c87254303e7, userVer=0, loc=true, 
sampleClsName=org.some.class.predicates.CompositePredicate, 
pendingUndeploy=false, undeployed=false, usage=0]]
 
2020-04-28 
14:36:42.544[ERROR][sys-stripe-45-#46%GRID%GridNodeName%][o.a.i.i.p.cache.GridCacheIoManager]
 Failed to process message [senderId=f104e069-9d80-4202-b50a-b3dc1804ac89, 
msg=GridNearAtomicSingleUpdateRequest [key=KeyCacheObject [hasValBytes=true], 
super=GridNearAtomicSingleUpdateRequest [key=KeyCacheObject [hasValBytes=true], 
parent=GridNearAtomicAbstractSingleUpdateRequest [nodeId=null, futId=1376257, 
topVer=AffinityTopologyVersion [topVer=35, minorTopVer=0], 
parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=]
java.lang.AssertionError: null
 at 
org.apache.ignite.internal.managers.deployment.GridDeploymentLocalStore.getDeployment(GridDeploymentLocalStore.java:203)
 at 
org.apache.ignite.internal.managers.deployment.GridDeploymentManager.getLocalDeployment(GridDeploymentManager.java:383)
 at 
org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager$CacheClassLoader.findClass(GridCacheDeploymentManager.java:802)
 at 
org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager$CacheClassLoader.loadClass(GridCacheDeploymentManager.java:794)
 at org.apache.ignite.internal.util.IgniteUtils.forName(IgniteUtils.java:8561)
 at 
org.apache.ignite.internal.MarshallerContextImpl.getClass(MarshallerContextImpl.java:374)
 at 
org.apache.ignite.internal.binary.BinaryContext.descriptorForTypeId(BinaryContext.java:700)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1757)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1716)
 at 
org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:313)
 at 
org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:99)
 at 
org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:82)
 at org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:9959)
 at 
org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10017)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateInvokeRequest.finishUnmarshal(GridNearAtomicSingleUpdateInvokeRequest.java:200)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.unmarshall(GridCacheIoManager.java:1560)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:582)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:386)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:312)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:102)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:301)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
 at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:546)
 at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
 at java.lang.Thread.run(Thread.java:748)

{noformat}

 

Looks like we lack synchronization for modifying 
{{LocalDeploymentSpi.ldrRsrcs}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12935) Disadvantages in log of historical rebalance

2020-04-24 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-12935:
--

 Summary: Disadvantages in log of historical rebalance
 Key: IGNITE-12935
 URL: https://issues.apache.org/jira/browse/IGNITE-12935
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


# Mention in the log only partitions for which there are no nodes that suit as 
historical supplier
 For these partitions, print minimal counter (since which we should perform 
historical rebalancing) with corresponding node and maximum reserved counter 
(since which cluster can perform historical rebalancing) with corresponding 
node.
This will let us know:
 ## Whether history was reserved at all
 ## How much reserved history we lack to perform a historical rebalancing
 ## I see resulting output like this:
Historical rebalancing wasn't scheduled for some partitions:
 History wasn't reserved for: [list of partitions and groups]
 History was reserved, but minimum present counter is less than maximum 
reserved: [[grp=GRP, part=ID, minCntr=cntr, minNodeId=ID, maxReserved=cntr, 
maxReservedNodeId=ID], ...]
 ## We can also aggregate the previous message by (minNodeId) to easily find the 
exact node (or nodes) which caused the full rebalance.
 # Log the results of reserveHistoryForExchange(). They can be compactly 
represented as mappings: (grpId -> checkpoint (id, timestamp)). For every 
group, also log a message about why the previous checkpoint wasn't successfully 
reserved.
There can be three reasons:
 ## Previous checkpoint simply isn't present in the history (the oldest is 
reserved)
 ## WAL reservation failure (call below returned false)

{code:java}
chpEntry = entry(cpTs);

boolean reserved = cctx.wal().reserve(chpEntry.checkpointMark());

// If checkpoint WAL history can't be reserved, stop searching.
if (!reserved)
    break;
{code}

 ## Checkpoint was marked as inapplicable for historical rebalancing

{code:java}
for (Integer grpId : new HashSet<>(groupsAndPartitions.keySet()))
  if (!isCheckpointApplicableForGroup(grpId, chpEntry))
groupsAndPartitions.remove(grpId);
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Discuss idle_verify with moving partitions changes.

2020-03-23 Thread Vladislav Pyatkov
Hi Zhenya,

I see your point. We need to show some message, because the cluster is not
idle (rebalance is in progress).
When the cluster is not idle we cannot validate partitions honestly. After
several minutes we can get a completely different result, without any
client operation on the cache having happened.

Maybe it is enough to show a message that is clearer for the end user. For
example: "The result is not valid, rebalance is in progress."

Another thing you mention is the issue with indexes while rebalance is
running. I think validate_indexes should fail in this case, because indexes
are always under load during rebalance.


On Mon, Mar 23, 2020 at 10:20 AM Zhenya Stanilovsky
 wrote:

>
> Igniters, i found that near idle check commands only shows partitions in
> MOVING states as info in log and not take into account this fact as
> erroneous idle cluster state.
> control.sh --cache idle_verify, control.sh --cache validate_indexes
> --check-crc
>
> for example command would show something like :
>
> Arguments: --cache idle_verify --yes
>
> 
> idle_verify task was executed with the following args: caches=[],
> excluded=[], cacheFilter=[DEFAULT]
> idle_verify check has finished, no conflicts have been found.
> Verification was skipped for 21 MOVING partitions:
> Skipped partition: PartitionKeyV2 [grpId=1544803905, grpName=default,
> partId=7]
> Partition instances: [PartitionHashRecordV2 [isPrimary=false,
> consistentId=gridCommandHandlerTest2, updateCntr=3, partitionState=MOVING,
> state=MOVING]] .. and so on
>
> I found this erroneous and can lead to further cluster index corruption,
> for example in case when only command OK result checked.
>
> If no objections would be here, i plan to inform about moving states as
> not OK exit code too.
>
>



-- 
Vladislav Pyatkov
Architect-Consultant "GridGain Rus" Llc.
+7-929-537-79-60


[jira] [Created] (IGNITE-12818) SoLinger is not set for reader-sockets in discovery

2020-03-20 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-12818:
--

 Summary: SoLinger is not set for reader-sockets in discovery
 Key: IGNITE-12818
 URL: https://issues.apache.org/jira/browse/IGNITE-12818
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


{noformat}
Thread [name="tcp-disco-client-message-worker-#29%DPL_GRID%DplGridNodeName%", 
id=543, state=RUNNABLE, blockCnt=0, waitCnt=109538]
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431)
at sun.security.ssl.OutputRecord.write(OutputRecord.java:417)
at 
sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:879)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:850)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
- locked sun.security.ssl.AppOutputStream@6e6441c6
at java.io.OutputStream.write(OutputStream.java:75)
at 
o.a.i.spi.discovery.tcp.TcpDiscoverySpi.writeToSocket(TcpDiscoverySpi.java:1613)
at 
o.a.i.spi.discovery.tcp.ServerImpl$ClientMessageWorker.processMessage(ServerImpl.java:7281)
at 
o.a.i.spi.discovery.tcp.ServerImpl$ClientMessageWorker.processMessage(ServerImpl.java:7156)
at 
o.a.i.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:7538)
at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:120)
at 
o.a.i.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:7469)
at o.a.i.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)

Thread [name="grid-timeout-worker-#39%DPL_GRID%DplGridNodeName%", id=230, 
state=WAITING, blockCnt=49, waitCnt=902487]
Lock [object=java.util.concurrent.locks.ReentrantLock$NonfairSync@7dcea545, 
ownerName=tcp-disco-client-message-worker-#29%DPL_GRID%DplGridNodeName%, 
ownerId=543]
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at 
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:848)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:720)
at sun.security.ssl.SSLSocketImpl.sendAlert(SSLSocketImpl.java:2066)
at sun.security.ssl.SSLSocketImpl.warning(SSLSocketImpl.java:1893)
at sun.security.ssl.SSLSocketImpl.closeInternal(SSLSocketImpl.java:1656)
- locked sun.security.ssl.SSLSocketImpl@5c2090f8
at sun.security.ssl.SSLSocketImpl.close(SSLSocketImpl.java:1594)
at o.a.i.i.util.IgniteUtils.closeQuiet(IgniteUtils.java:4089)
at 
o.a.i.spi.discovery.tcp.TcpDiscoverySpi$SocketTimeoutObject.onTimeout(TcpDiscoverySpi.java:2462)
at 
o.a.i.i.processors.timeout.GridSpiTimeoutObject.onTimeout(GridSpiTimeoutObject.java:42)
at 
o.a.i.i.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:279)
at o.a.i.i.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
{noformat}
We need to set SO_LINGER on the socket obtained through `sock = srvrSock.accept();`, 
like it is done in `TcpDiscoverySpi#createSocket`.
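
A minimal sketch of the fix (assuming the linger value comes from the same SPI 
configuration that `TcpDiscoverySpi#createSocket` uses):

{code:java}
Socket sock = srvrSock.accept();

// Bound the time close() may block flushing unsent data, so a dead peer
// cannot stall the worker thread, mirroring TcpDiscoverySpi#createSocket.
if (soLinger >= 0)
    sock.setSoLinger(true, soLinger);
{code}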



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12780) Deadlock between db-checkpoint-thread and checkpoint-runner

2020-03-12 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-12780:
--

 Summary: Deadlock between db-checkpoint-thread and 
checkpoint-runner
 Key: IGNITE-12780
 URL: https://issues.apache.org/jira/browse/IGNITE-12780
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


Look at this run:
https://ci.ignite.apache.org/buildConfiguration/IgniteTests24Java8_PdsIndexing/5121878?buildTab=log=3

{noformat}
"db-checkpoint-thread-#46926%db.IgniteSequentialNodeCrashRecoveryTest0%" #55580 
prio=5 os_prio=0 tid=0x7efb2000c800 nid=0x77e waiting on condition 
[0x7eff31add000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:178)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.fillCacheGroupState(GridCacheDatabaseSharedManager.java:4367)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:4147)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:3728)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:3617)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)



"checkpoint-runner-#46927%db.IgniteSequentialNodeCrashRecoveryTest0%" #55581 
prio=5 os_prio=0 tid=0x7efbd4009000 nid=0x77f waiting on condition 
[0x7eff317da000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0xe5c23ed8> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
at 
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.checkpointReadLock(GridCacheDatabaseSharedManager.java:1645)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.init0(GridCacheOffheapManager.java:1688)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.fullSize(GridCacheOffheapManager.java:2061)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.lambda$fillCacheGroupState$1(GridCacheDatabaseSharedManager.java:4336)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer$$Lambda$565/253081186.run(Unknown
 Source)
at 
org.apache.ignite.internal.util.IgniteUtils.lambda$wrapIgniteFuture$3(IgniteUtils.java:11392)
at 
org.apache.ignite.internal.util.IgniteUtils$$Lambda$561/471384364.run(Unknown 
Source)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12689) Partitions should become owned after a checkpoint, regardless of a topology change, if a rebalance is not required

2020-02-17 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-12689:
--

 Summary: Partitions should become owned after a checkpoint, 
regardless of a topology change, if a rebalance is not required.
 Key: IGNITE-12689
 URL: https://issues.apache.org/jira/browse/IGNITE-12689
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


After a checkpoint completes, we try to own all partitions of the rebalanced cache 
(see WalStateManager#onGroupRebalanceFinished):
{code}
cpFut.futureFor(FINISHED).listen(new IgniteInClosureX() {
@Override public void applyx(IgniteInternalFuture future) {
if (X.hasCause(future.error(), NodeStoppingException.class))
return;

for (Integer grpId0 : groupsToEnable) {
try {
cctx.database().walEnabled(grpId0, true, true);
}
catch (Exception e) {
if (!X.hasCause(e, NodeStoppingException.class))
throw e;
}

CacheGroupContext grp = cctx.cache().cacheGroup(grpId0);

if (grp != null)
grp.topology().ownMoving(lastGroupTop);
else if (log.isDebugEnabled())
log.debug("Cache group was destroyed before checkpoint 
finished, [grpId=" + grpId0 + ']');
}

if (log.isDebugEnabled())
log.debug("Refresh partitions due to rebalance finished");

// Trigger exchange for switching to ideal assignment when all nodes 
are ready.
cctx.exchange().refreshPartitions();
}
});
{code}
But in the case of a topology change while the checkpoint is in progress, we need 
to trigger rebalance manually (see GridDhtPartitionTopologyImpl#ownMoving):
{code}
if (lastAffChangeVer.compareTo(rebFinishedTopVer) > 0) {
if (log.isInfoEnabled()) {
log.info("Affinity topology changed, no MOVING partitions will be owned 
" +
"[rebFinishedTopVer=" + rebFinishedTopVer +
", lastAffChangeVer=" + lastAffChangeVer + "]");
}
{code}

That will hardly ever happen, but when it does, the whole rebalance is restarted 
(over all partitions). I advise starting rebalance only when it is needed, and 
marking partitions as owned when rebalance is definitely not needed (when the 
topology change does not affect the assignment).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12671) Update of partitions' states can get stuck when rebalance completes during exchange

2020-02-12 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-12671:
--

 Summary: Update of partitions' states can get stuck when rebalance 
completes during exchange
 Key: IGNITE-12671
 URL: https://issues.apache.org/jira/browse/IGNITE-12671
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


A single message is ignored during an exchange:

{code:java|GridCachePartitionExchangeManager.java}
if (exchangeInProgress()) {
  if (log.isInfoEnabled())
log.info("Ignore single message without exchange id (there is exchange in 
progress) [nodeId=" + node.id() + "]");

  return;
}
{code}

For this reason the message will not be processed after the exchange. As a result, 
waiting for the ideal assignment gets stuck until the next rebalance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12522) Extend test coverage [IGNITE-12104] Check deployment from cache before loading it from local or version storage
2020-01-09 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-12522:
--

 Summary: Extend test coverage [IGNITE-12104] Check deployment from 
cache before loading it from local or version storage
 Key: IGNITE-12522
 URL: https://issues.apache.org/jira/browse/IGNITE-12522
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12290) Re-balance fully restarts in the case when WAL is disabled

2019-10-14 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-12290:
--

 Summary: Re-balance fully restarts in the case when WAL is disabled
 Key: IGNITE-12290
 URL: https://issues.apache.org/jira/browse/IGNITE-12290
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


Re-balance restarts on any topology event. In the case when WAL was disabled, the 
new re-balance clears all already re-balanced partitions and starts over.
Data about re-balanced partitions should be stored and migrated when a re-balance is 
cancelled and started again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IGNITE-12104) Check deployment from cache before loading it from local or version storage

2019-08-26 Thread Vladislav Pyatkov (Jira)
Vladislav Pyatkov created IGNITE-12104:
--

 Summary: Check deployment from cache before loading it from local 
or version storage
 Key: IGNITE-12104
 URL: https://issues.apache.org/jira/browse/IGNITE-12104
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov
 Fix For: 2.8


{noformat}
"pub-#3217917%DPL_GRID%DplGridNodeName%" #3223897 prio=5 os_prio=0 
tid=0x7f47a414f800 nid=0x1dca46 runnable [0x7eaca31b]
java.lang.Thread.State: RUNNABLE
at java.lang.String.concat(String.java:2034)
at java.net.URLClassLoader$1.run(URLClassLoader.java:364)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
- locked <0x7f4c8dd6c888> (a java.lang.Object)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
- locked <0x7f4c8db4f530> (a java.lang.Object)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
- locked <0x7f4ba0138340> (a 
com.sbt.core.envelope.container.loader.NamedClassLoader)
at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
- locked <0x7f4ba012a800> (a 
com.sbt.core.envelope.container.loader.ImplClassLoader)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at 
org.apache.ignite.internal.managers.deployment.GridDeploymentLocalStore.getDeployment(GridDeploymentLocalStore.java:191)
at 
org.apache.ignite.internal.managers.deployment.GridDeploymentManager.getGlobalDeployment(GridDeploymentManager.java:462)
at 
org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:983)
at 
org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1921)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (IGNITE-11844) Should filter indexes by cache name instead of validating all caches in a group

2019-05-13 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-11844:
--

 Summary: Should filter indexes by cache name instead of validating 
all caches in a group
 Key: IGNITE-11844
 URL: https://issues.apache.org/jira/browse/IGNITE-11844
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


The control.sh utility method validate_indexes checks all indexes of all caches in 
a group. If you specify just one cache (from a shared group) in the caches list, 
all indexes of all caches in that group will be validated, and this can consume 
more time than checking the indexes of only the specified caches.
It would be correct to validate only the indexes of the specified caches; for that 
purpose we need to filter the caches in a shared group by the list from the 
parameters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11834) Confusing message on rebalance

2019-05-06 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-11834:
--

 Summary: Confusing message on rebalance
 Key: IGNITE-11834
 URL: https://issues.apache.org/jira/browse/IGNITE-11834
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


When a rebalance is scheduled for caches, a message like the following is printed:
{noformat}
Rebalancing scheduled [order=[c8], top=AffinityTopologyVersion [topVer=6, 
minorTopVer=0], force=true, evt=NODE_JOINED, 
node=9b5ff0c4-cfd7-489d-a02d-470342d5]
{noformat}
but the force flag ({{force=true}}) does not mean that this is a forced rebalance.
I suggest logging the force flag correctly, by the value of {{forcePreload}}, and 
renaming the existing flag, for example to {{exchangeRebalance}} or {{exchange}}, 
according to its meaning.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11763) GridP2PComputeWithNestedEntryProcessorTest failed on TC

2019-04-17 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-11763:
--

 Summary: GridP2PComputeWithNestedEntryProcessorTest failed on TC
 Key: IGNITE-11763
 URL: https://issues.apache.org/jira/browse/IGNITE-11763
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


Test failed with exception:

{noformat}
[2019-04-16 19:50:21,725][ERROR][main][root] Test failed.
javax.cache.CacheException: class org.apache.ignite.IgniteCheckedException: 
Failed to execute query on node [query=GridCacheQueryBean 
[qry=GridCacheQueryAdapter [type=SCAN, clsName=null, clause=null, 
filter=org.apache.ignite.tests.p2p.pedicates.CompositePredicate@2b939b78, 
transform=null, part=null, incMeta=false, metrics=GridCacheQueryMetricsAdapter 
[minTime=9223372036854775807, maxTime=0, sumTime=0, avgTime=0.0, execs=0, 
completed=0, fails=0], pageSize=1024, timeout=0, incBackups=false, 
forceLocal=false, dedup=false, prj=null, keepBinary=true, 
subjId=008694d2-98a2-4add-9ccc-b7674e6d717f, taskHash=0, mvccSnapshot=null, 
dataPageScanEnabled=null], rdc=null, trans=null], 
nodeId=8575809f-3373-4c47-8684-a318c221]
at 
org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1318)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryFutureAdapter.next(GridCacheQueryFutureAdapter.java:168)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager$5.onHasNext(GridCacheDistributedQueryManager.java:643)
at 
org.apache.ignite.internal.util.GridCloseableIteratorAdapter.hasNextX(GridCloseableIteratorAdapter.java:53)
at 
org.apache.ignite.internal.util.lang.GridIteratorAdapter.hasNext(GridIteratorAdapter.java:45)
at 
org.apache.ignite.internal.processors.cache.QueryCursorImpl.getAll(QueryCursorImpl.java:123)
at 
org.apache.ignite.p2p.GridP2PComputeWithNestedEntryProcessorTest.scanByCopositeFirstPredicate(GridP2PComputeWithNestedEntryProcessorTest.java:205)
at 
org.apache.ignite.p2p.GridP2PComputeWithNestedEntryProcessorTest.scnaCacheData(GridP2PComputeWithNestedEntryProcessorTest.java:188)
at 
org.apache.ignite.p2p.GridP2PComputeWithNestedEntryProcessorTest.processTest(GridP2PComputeWithNestedEntryProcessorTest.java:140)
at 
org.apache.ignite.p2p.GridP2PComputeWithNestedEntryProcessorTest.testContinuousMode(GridP2PComputeWithNestedEntryProcessorTest.java:105)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.apache.ignite.testframework.junits.GridAbstractTest$6.run(GridAbstractTest.java:2044)
at java.lang.Thread.run(Thread.java:748)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to execute 
query on node [query=GridCacheQueryBean [qry=GridCacheQueryAdapter [type=SCAN, 
clsName=null, clause=null, 
filter=org.apache.ignite.tests.p2p.pedicates.CompositePredicate@2b939b78, 
transform=null, part=null, incMeta=false, metrics=GridCacheQueryMetricsAdapter 
[minTime=9223372036854775807, maxTime=0, sumTime=0, avgTime=0.0, execs=0, 
completed=0, fails=0], pageSize=1024, timeout=0, incBackups=false, 
forceLocal=false, dedup=false, prj=null, keepBinary=true, 
subjId=008694d2-98a2-4add-9ccc-b7674e6d717f, taskHash=0, mvccSnapshot=null, 
dataPageScanEnabled=null], rdc=null, trans=null], 
nodeId=8575809f-3373-4c47-8684-a318c221]
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryFutureAdapter.onPage(GridCacheQueryFutureAdapter.java:384)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager.processQueryResponse(GridCacheDistributedQueryManager.java:402)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager.access$000(GridCacheDistributedQueryManager.java:64)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager$1.apply(GridCacheDistributedQueryManager.java:94)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager$1.apply

[jira] [Created] (IGNITE-11734) IgniteCache.replace(k, v, nv) requires classes when element is null or old value - null

2019-04-12 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-11734:
--

 Summary: IgniteCache.replace(k, v, nv) requires classes when 
element is null or old value - null
 Key: IGNITE-11734
 URL: https://issues.apache.org/jira/browse/IGNITE-11734
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11698) Issue with P2P class loader

2019-04-08 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-11698:
--

 Summary: Issue with P2P class loader
 Key: IGNITE-11698
 URL: https://issues.apache.org/jira/browse/IGNITE-11698
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


Sometimes the classes of a remote query filter are loaded incorrectly.
{noformat}
Exception in thread "main" javax.cache.CacheException: class 
org.apache.ignite.IgniteCheckedException: Failed to execute query on node 
[query=GridCacheQueryBean [qry=GridCacheQueryAdapter [type=SCAN, clsName=null, 
clause=null, filter=CompositePredicate@7ba93755, transform=null, part=null, 
incMeta=false, metrics=GridCacheQueryMetricsAdapter 
[minTime=9223372036854775807, maxTime=0, sumTime=0, avgTime=0.0, execs=0, 
completed=0, fails=0], pageSize=1024, timeout=0, incBackups=false, 
forceLocal=false, dedup=false, prj=null, keepBinary=false, 
subjId=f4870536-0f68-4e19-a87c-3862cbd30497, taskHash=0, mvccSnapshot=null, 
dataPageScanEnabled=null], rdc=null, trans=null], 
nodeId=40a03665-a203-4dc0-9a79-9aaede7a5dfa]
at 
org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1318)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryFutureAdapter.next(GridCacheQueryFutureAdapter.java:168)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager$5.onHasNext(GridCacheDistributedQueryManager.java:643)
at 
org.apache.ignite.internal.util.GridCloseableIteratorAdapter.hasNextX(GridCloseableIteratorAdapter.java:53)
at 
org.apache.ignite.internal.util.GridCloseableIteratorAdapter.nextX(GridCloseableIteratorAdapter.java:38)
at 
org.apache.ignite.internal.util.lang.GridIteratorAdapter.next(GridIteratorAdapter.java:35)
at 
org.apache.ignite.internal.processors.cache.AutoClosableCursorIterator.next(AutoClosableCursorIterator.java:59)
at ClientP2P.query(ClientP2P.java:61)
at ClientP2P.main(ClientP2P.java:45)
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to execute 
query on node [query=GridCacheQueryBean [qry=GridCacheQueryAdapter [type=SCAN, 
clsName=null, clause=null, filter=CompositePredicate@7ba93755, transform=null, 
part=null, incMeta=false, metrics=GridCacheQueryMetricsAdapter 
[minTime=9223372036854775807, maxTime=0, sumTime=0, avgTime=0.0, execs=0, 
completed=0, fails=0], pageSize=1024, timeout=0, incBackups=false, 
forceLocal=false, dedup=false, prj=null, keepBinary=false, 
subjId=f4870536-0f68-4e19-a87c-3862cbd30497, taskHash=0, mvccSnapshot=null, 
dataPageScanEnabled=null], rdc=null, trans=null], 
nodeId=40a03665-a203-4dc0-9a79-9aaede7a5dfa]
at 
org.apache.ignite.internal.processors.cache.query.GridCacheQueryFutureAdapter.onPage(GridCacheQueryFutureAdapter.java:384)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager.processQueryResponse(GridCacheDistributedQueryManager.java:402)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager.access$000(GridCacheDistributedQueryManager.java:64)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager$1.apply(GridCacheDistributedQueryManager.java:94)
at 
org.apache.ignite.internal.processors.cache.query.GridCacheDistributedQueryManager$1.apply(GridCacheDistributedQueryManager.java:92)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1126)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$800(GridCacheIoManager.java:109)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1691)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1561)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4100(GridIoManager.java:127)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:2753)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1521)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4400(GridIoManager.java:127)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1490)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: class org.apache.ignite.IgniteException: BinaryPred

Re: in-memory compression

2019-04-01 Thread Vladislav Pyatkov
Hi,

I looked at your report on the site and have some questions.

What does this thesis mean?
*Your dataset is fit for columnar compression, ea. repeating values and/or
a timeseries-like dataset.*
As I understand, you overwrite the file IO factory (TBLigniteFileIoFactory),
but at this level only bytes are available, not data entries.

What caused the performance drop to 31% in your test?

On Mon, Apr 1, 2019 at 8:54 AM <999.comput...@gmail.com> wrote:

> Hi developers,
>
> We have released TBLignite compression, an Ignite plugin that provides
> in-memory compression: http://tblcore.com/download/.
> Compression rates are similar to those of SQL Server 2016 columnar
> compression (10-20x) and our testing show significant performance
> improvements for datasets that are larger than the available amount of
> memory.
>
> Currently we are at version 0.1, but we are working on a thoroughly tested
> 1.0 release so everyone can scratch their compression itch. A couple of
> questions regarding this:
> 1) Are there any Ignite regression tests that you can recommend for 3rd
> party software? Basically we are looking for a way to test all the possible
> page formats that we need to support.
> 2) Once we release version 1.0, we would like TBLignite to be added to the
> Ignite "3rd party binary" page. Who decides what gets on this page?
> 3) And are there any acceptance criteria or tests that need to be passed?
>
> We would like to hear from you.
> Pascal Schuchhard
> TBLcore
>
>

-- 
Vladislav Pyatkov


[jira] [Created] (IGNITE-11643) Optimize GC pressure on GridDhtPartitionTopologyImpl#updateRebalanceVersion

2019-03-27 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-11643:
--

 Summary: Optimize GC pressure on 
GridDhtPartitionTopologyImpl#updateRebalanceVersion
 Key: IGNITE-11643
 URL: https://issues.apache.org/jira/browse/IGNITE-11643
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


There is a superfluous HashMap allocation in the method 
{{GridDhtPartitionTopologyImpl#updateRebalanceVersion}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11474) Add possibility to run idle_verify in not idle cluster

2019-03-05 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-11474:
--

 Summary: Add possibility to run idle_verify in not idle cluster
 Key: IGNITE-11474
 URL: https://issues.apache.org/jira/browse/IGNITE-11474
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


We are capable of making a sort of READ_ONLY mode that blocks all data load.
Using this mode, we should add a specific parameter for idle_verify which excludes 
data load, and continue the task after the cluster has switched to READ_ONLY.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11425) Log information about inaccessible nodes through Communication

2019-02-26 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-11425:
--

 Summary: Log information about inaccessible nodes through 
Communication
 Key: IGNITE-11425
 URL: https://issues.apache.org/jira/browse/IGNITE-11425
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


In the case when getting a communication TCP client takes long (longer than 
CONNECTION_ESTABLISH_THRESHOLD_MS = 100) a message like this is printed:

{noformat}
[sys-#20167%dht.CacheGetReadFromBackupFailoverTest0%][TcpCommunicationSpi] TCP 
client created [client=GridTcpNioCommunicationClient 
[ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker 
[super=AbstractNioClientWorker [idx=3, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, 
bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-3, 
igniteInstanceName=dht.CacheGetReadFromBackupFailoverTest0, finished=false, 
heartbeatTs=1550512236151, hashCode=140561231, interrupted=false, 
runner=grid-nio-worker-tcp-comm-3-#20147%dht.CacheGetReadFromBackupFailoverTest0%]]],
 writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], 
inRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=0, 
sentCnt=0, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode 
[id=8a660330-6ddb-4031-b955-4cb4f4b2, addrs=ArrayList [127.0.0.1], 
sockAddrs=HashSet [/127.0.0.1:47502], discPort=47502, order=5, intOrder=4, 
lastExchangeTime=1550512235890, loc=false, ver=2.8.0#20190218-sha1:29232e37, 
isClient=false], connected=false, connectCnt=2, queueLimit=4096, reserveCnt=2, 
pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=0, 
resendCnt=0, rcvCnt=0, sentCnt=0, reserved=true, lastAck=0, nodeLeft=false, 
node=TcpDiscoveryNode [id=8a660330-6ddb-4031-b955-4cb4f4b2, addrs=ArrayList 
[127.0.0.1], sockAddrs=HashSet [/127.0.0.1:47502], discPort=47502, order=5, 
intOrder=4, lastExchangeTime=1550512235890, loc=false, 
ver=2.8.0#20190218-sha1:29232e37, isClient=false], connected=false, 
connectCnt=2, queueLimit=4096, reserveCnt=2, pairedConnections=false], 
super=GridNioSessionImpl [locAddr=/127.0.0.1:38770, rmtAddr=/127.0.0.1:45212, 
createTime=1550512236151, closeTime=0, bytesSent=0, bytesRcvd=0, bytesSent0=0, 
bytesRcvd0=0, sndSchedTime=1550512236151, lastSndTime=1550512236151, 
lastRcvTime=1550512236151, readsPaused=false, 
filterChain=FilterChain[filters=[GridNioCodecFilter 
[parser=org.apache.ignite.internal.util.nio.GridDirectParser@d240a48, 
directMode=true], GridConnectionBytesVerifyFilter], accepted=false, 
markedForClose=false]], super=GridAbstractCommunicationClient 
[lastUsed=1550512236151, closed=false, connIdx=0]], duration=211ms]
{noformat}

but in some cases we cannot get a client before the timeout, and the message is 
reduced to
 
TCP client created [client=null, duration=60004 ms]

From this message you cannot tell which nodes were inaccessible.
Moreover, we want to see the connection trouble earlier than 10 minutes after the fact.

We should log the ip/host so it is clear which node was involved, and log a WARN 
message each time the timeout needs to be increased:
{code}
if (lastWaitingTimeout < 6)
  lastWaitingTimeout *= 2;
{code}
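
A minimal sketch of the proposed logging (variable names are hypothetical; this is not the actual patch):

{code}
// Double the back-off as above, but also log the remote node's address
// at WARN level every time the timeout grows.
if (lastWaitingTimeout < 6)
    lastWaitingTimeout *= 2;

U.warn(log, "Failed to establish connection, increasing timeout " +
    "[rmtAddrs=" + addrs + ", timeout=" + lastWaitingTimeout + ']');
{code}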



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11291) Assertion error at the time of rebalance completion leads to critical node failure

2019-02-11 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-11291:
--

 Summary: Assertion error at the time of rebalance completion leads to 
critical node failure
 Key: IGNITE-11291
 URL: https://issues.apache.org/jira/browse/IGNITE-11291
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


{noformat}
java.lang.AssertionError: Got removed exception on entry with dht local 
candidate: [IgniteTxEntry [key=KeyCacheObjectImpl [part=11859, 
val=3338011748811769508, hasValBytes=true], cacheId=-313938805, 
txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=11859, val=3338011748811769508, 
hasValBytes=true], cacheId=-313938805], val=[op=UPDATE, 
val=com.sbt.bm.ucp.common.dpl.model.party.hashcodes.DPartyHashCode_DPL_PROXY 
[idHash=1985166276, hash=1530579783, colocationKey=11859, sync Flag=null, 
hashUpdateTime=Sat Feb 09 20:16:55 MSK 2019, lastChangeDate=1549732615042, 
partition_DPL_id=4, ownerId=ucp, externalSystem=21, 
serializedHashCodesMap={"Address":["581a1367d4f50172c41168747c99105df1d3a4c6","24d3413bc5d3151dd784e088eaf4603e8e86c40b","fd7ce1449cd539a5976d358fb15060820c630f36"],"BirthDate":["a5cc7c1ac9821a6b19a455ae7eac55f4a4475bd7","415fe3810f8c4555a7188fe62275b34cdd1384cd","1311875998ef2aaccc5860ac6f991107ad4fa558"],"BirthPlace":["2fb5ea4cf7e9cfbdf77baae18f26804a10f51d57","76ddc5f2d959d27044c180957d9d0aed7369c0a2"],"Gender":["ad99678bb0e6de91d6a2ce620ba4fa98206aa9b9"],"IndividualIdentification":["c5b73f0087461fa4b980362344d38afddd1677b6","157952428fbe8c5b80da4db12c945cb4ad1f33f8","4d2257f62b1aa9cf290728d147776dd569b226a7"],"IndividualName":["256dba12a3b9d95dc1ada524d1a67cb590eb3ec2","2302eda39fb8f3751f3646542ad54ae717f5bc2c","9b8084de661530d5dc6ea140df793f4aa417a114"],"PartyToPartyGroup":["8d41984edf2e9ff916da0a74b884554cc2fceca9"],"PhoneNumber":["c89cbfb741ba5ea0fa13091a1cc7591c69374c0c","149081529b36fe29bc72bf88c66e51bdab3ae2d7","3c6bc9cc9e47ecd2f79b1506a22799ba69b95227","389830cd794fd6748f46ab7d8d878a2fec75cf63","9d5c3ecd5c951a16745cefd771b79c17bbc8c665","dfd8959391fd18e89505bfde31e0aa9a80513fec"],"Individual":["82dc325513d53990709bafbf1a334b602025ba17","544aab947d3a33787c7feffc93c4a4572e957dca","2728a40ce9759fa7110bc5d0d3c8b5c193210a2d"]},
 uid=null, isDeleted=false, isImmutable=false, checksum=null, 
id=3338011748811769508, externalClientId=75662144, 
colocationId=1216693183554769678]], prevVal=[op=NOOP, val=null], 
oldVal=[op=NOOP, val=null], entryProcessorsCol=null, ttl=-1, 
conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, 
filters=CacheEntryPredicate[] [], filtersPassed=false, filtersSet=false, 
entry=GridCacheMapEntry [key=KeyCacheObjectImpl [part=11859, 
val=3338011748811769508, hasValBytes=true], val=null, startVer=1551196730698, 
ver=GridCacheVersion [topVer=156972865, order=1546089814280, nodeOrder=65], 
hash=777160356, extras=GridCacheObsoleteEntryExtras 
[obsoleteVer=GridCacheVersion [topVer=2147483647, order=0, nodeOrder=0]], 
flags=2]GridDistributedCacheEntry [super=]GridDhtCacheEntry [rdrs=ReaderId[] 
[], part=11859, super=], prepared=1, locked=false, nodeId=null, 
locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=2, 
partUpdateCntr=0, serReadVer=GridCacheVersion[topVer=156972865, 
order=1546089814280, nodeOrder=65], xidVer=null]]
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.checkReadConflict(GridDhtTxPrepareFuture.java:1164)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.prepare0(GridDhtTxPrepareFuture.java:1223)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture.access$000(GridDhtTxPrepareFuture.java:109)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture$2.apply(GridDhtTxPrepareFuture.java:701)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxPrepareFuture$2.apply(GridDhtTxPrepareFuture.java:696)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblock(GridFutureAdapter.java:347)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.unblockAll(GridFutureAdapter.java:335)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:495)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:474)
at 
org.apache.ignite.internal.processors.c

Re: Running single test multiple times on TeamCity

2019-02-10 Thread Vladislav Pyatkov
Hi,

I think more tests fail on TC because of the context in which they run.
If you add a test to a debug suite, it can stop failing, just like in a local run.

On Mon, Feb 11, 2019 at 9:37 AM Павлухин Иван  wrote:

> Hi,
>
> During a couple of last weeks I was fixing several flaky tests.
> Sometimes it was quite hard to reproduce a test locally. So, one
> option was running a particular test on TC several times in a row. To
> setup such run I did code modifications in several places.
>
> I thought about how to simplify the thing. And I came up with some
> sort of solution which I would like to share. Basically it is custom
> junit runner DebugSuite and a configuration annotation
> DebugSuite.Config which allows to choose a method to run and number of
> executions. You can see a draft in PR [1].
>
> As always there are several options to solve a problem. One
> alternative way is creating something similar to parameterized build
> job Jenkins employs [2] (I have not checked for TC analog yet) and
> using maven features to run single test repeatedly (have not checked
> as well). But all in all we need to answer following questions:
> 1. Do we need such tool? (Or perhaps we already have something and
> there is no need to reinvent the wheel.)
> 2. What is the best way for us to implement the tool?
>
> [1] https://github.com/apache/ignite/pull/6076
> [2] https://wiki.jenkins.io/display/JENKINS/Parameterized+Build
>
> --
> Best regards,
> Ivan Pavlukhin
>
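
For illustration, a rough sketch of the repeated-run idea (JUnit 4; the class
name is hypothetical, the actual DebugSuite draft lives in the PR):

import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

/** Runs the wrapped test statement N times in a row to flush out flakiness. */
public class RepeatRule implements TestRule {
    private final int times;

    public RepeatRule(int times) { this.times = times; }

    @Override public Statement apply(Statement base, Description desc) {
        return new Statement() {
            @Override public void evaluate() throws Throwable {
                // Fail fast: the first flaky failure stops the loop.
                for (int i = 0; i < times; i++)
                    base.evaluate();
            }
        };
    }
}

Usage: @Rule public RepeatRule repeat = new RepeatRule(100);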


-- 
Vladislav Pyatkov


[jira] [Created] (IGNITE-11270) Batch join to topology

2019-02-08 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-11270:
--

 Summary: Batch join to topology
 Key: IGNITE-11270
 URL: https://issues.apache.org/jira/browse/IGNITE-11270
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


On the first cluster start many nodes try to join at once. This leads to many 
time-consuming join rounds (TcpDiscoveryJoinRequestMessage -> 
TcpDiscoveryNodeAddedMessage -> TcpDiscoveryNodeAddFinishedMessage).
As a result, assembling the topology takes too much time.
We could merge several TcpDiscoveryJoinRequestMessages and join the nodes to the 
topology as one batch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11269) Optimize node join to topology

2019-02-08 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-11269:
--

 Summary: Optimize node join to topology
 Key: IGNITE-11269
 URL: https://issues.apache.org/jira/browse/IGNITE-11269
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


When the coordinator has received a TcpDiscoveryJoinRequestMessage and the 
corresponding TcpDiscoveryNodeAddedMessage has been sent, it should not process 
newly received TcpDiscoveryJoinRequestMessages until the first joining node has 
completely joined (i.e. TcpDiscoveryNodeAddFinishedMessage has been sent).
This would let nodes join the topology faster without blocking the ring with 
huge TcpDiscoveryNodeAddedMessages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11262) Compression on Discovery data bag

2019-02-08 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-11262:
--

 Summary: Compression on Discovery data bag
 Key: IGNITE-11262
 URL: https://issues.apache.org/jira/browse/IGNITE-11262
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


The size of GridComponents data may increase significantly in a large deployment.

Examples:
1) With more than 3K caches that have QueryEntity configured, the 
{{GridCacheProcessor}} part of the {{DiscoveryDataBag}} consumes more than 20 MB.
2) If the cluster contains more than 13K objects, the 
{{GridMarshallerMappingProcessor}} part is more than 1 MB.
3) In a cluster with more than 3K types in binary format, the 
{{CacheObjectBinaryProcessorImpl}} part can grow to 10 MB.

In most cases the data contains duplicated structures, so simple zip compression 
can seriously reduce the size.
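
A minimal sketch of the compression step (assuming the data bag payload is 
already serialized to a byte[]):

{code}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DeflaterOutputStream;

/** Compresses serialized discovery data; duplicated structures zip well. */
static byte[] zip(byte[] raw) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream(raw.length / 4 + 1);

    try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
        out.write(raw);
    }

    return bos.toByteArray();
}
{code}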



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11120) Remove static fields from GridDhtLockFuture

2019-01-29 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-11120:
--

 Summary: Remove static fields from GridDhtLockFuture
 Key: IGNITE-11120
 URL: https://issues.apache.org/jira/browse/IGNITE-11120
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


{code}

/** Logger reference. */
private static final AtomicReference<IgniteLogger> logRef = new AtomicReference<>();

/** Logger. */
private static IgniteLogger log;

/** Logger. */
private static IgniteLogger msgLog;

{code}

 

Because of these static fields we can miss log messages when a node is restarted without restarting the JVM.
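
A possible direction, as a sketch only (assuming the future has access to the 
cache context):

{code}
// Hold the logger as an instance field taken from the cache context,
// so a node restarted in the same JVM gets a fresh logger.
private final IgniteLogger log;

GridDhtLockFuture(GridCacheContext<?, ?> cctx /* other args elided */) {
    log = cctx.logger(GridDhtLockFuture.class);
}
{code}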



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-11023) Processing data bag in GridMarshallerMappingProcessor consumes a lot of time

2019-01-22 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-11023:
--

 Summary: Processing data bag in GridMarshallerMappingProcessor 
consumes a lot of time
 Key: IGNITE-11023
 URL: https://issues.apache.org/jira/browse/IGNITE-11023
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


I measured the data bag processing time on each joining node and discovered that 
GridMarshallerMappingProcessor consumes more time than the others.

It slows down topology assembly, in particular when several nodes join 
simultaneously.

 

{noformat}

2019-01-11 20:35:01.207 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] Starting 
processing discovery data bag
2019-01-11 20:35:01.207 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component ClusterProcessor processed joining node data bag in 0ms
2019-01-11 20:35:01.207 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component IgnitePluginProcessor processed joining node data bag in 0ms
2019-01-11 20:35:01.208 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component CacheObjectBinaryProcessorImpl processed joining node data bag in 0ms
2019-01-11 20:35:01.208 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component IgniteAuthenticationProcessor processed joining node data bag in 0ms
2019-01-11 20:35:01.219 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component GridCacheProcessor processed joining node data bag in 10ms
2019-01-11 20:35:01.219 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component GridQueryProcessor processed joining node data bag in 0ms
2019-01-11 20:35:01.219 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component GridContinuousProcessor processed joining node data bag in 0ms
2019-01-11 20:35:01.463 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component GridMarshallerMappingProcessor processed joining node data bag in 
242ms
2019-01-11 20:35:01.463 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] Total 
time of processing discovery data bag: 252ms
2019-01-11 20:35:01.780 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] Starting 
processing discovery data bag
2019-01-11 20:35:01.781 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component ClusterProcessor processed joining node data bag in 0ms
2019-01-11 20:35:01.781 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component IgnitePluginProcessor processed joining node data bag in 0ms
2019-01-11 20:35:01.781 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component CacheObjectBinaryProcessorImpl processed joining node data bag in 0ms
2019-01-11 20:35:01.781 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component IgniteAuthenticationProcessor processed joining node data bag in 0ms
2019-01-11 20:35:01.791 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component GridCacheProcessor processed joining node data bag in 10ms
2019-01-11 20:35:01.792 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component GridQueryProcessor processed joining node data bag in 0ms
2019-01-11 20:35:01.792 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component GridContinuousProcessor processed joining node data bag in 0ms
2019-01-11 20:35:02.134 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component GridMarshallerMappingProcessor processed joining node data bag in 
338ms
2019-01-11 20:35:02.134 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] Total 
time of processing discovery data bag: 348ms
2019-01-11 20:35:02.326 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] Starting 
processing discovery data bag
2019-01-11 20:35:02.326 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component ClusterProcessor processed joining node data bag in 0ms
2019-01-11 20:35:02.326 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component IgnitePluginProcessor processed joining node data bag in 0ms
2019-01-11 20:35:02.326 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component CacheObjectBinaryProcessorImpl processed joining node data bag in 0ms
2019-01-11 20:35:02.326 [INFO 
][tcp-disco-msg-worker-#2%NodeName%][o.a.i.i.m.d.GridDiscoveryManager] 
Component IgniteAuthenticationProcessor processed joining node data bag in 0ms
2019-01-11 20:35:02.337 [INFO 
][tcp-disco-msg-worker-#2%NodeName

[jira] [Created] (IGNITE-10933) Node may hang on join to topology and not move forward

2019-01-14 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-10933:
--

 Summary: Node may hang on join to topology and not move forward
 Key: IGNITE-10933
 URL: https://issues.apache.org/jira/browse/IGNITE-10933
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


Several nodes join the topology simultaneously and hang for a long time.

This can happen on the first start of all cluster nodes or when nodes join an 
already assembled topology.

The logs of the problem nodes contain messages like:

{noformat}

2019-01-11 18:37:39.296 [WARN ][Thread-56][o.a.i.s.d.tcp.TcpDiscoverySpi] Node 
has not been connected to topology and will repeat join process. Check remote 
nodes logs for possible error messages. Note that large topology may require sig
nificant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration 
property if getting this message on the starting nodes [networkTimeout=5000]

 2019-01-11 18:43:09.374 [WARN ][Thread-56][o.a.i.s.d.tcp.TcpDiscoverySpi] Node 
has not been connected to topology and will repeat join process. Check remote 
nodes logs for possible error messages. Note that large topology may require sig
nificant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration 
property if getting this message on the starting nodes [networkTimeout=5000]

...

{noformat}

and these messages repeat for a long time without any others.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10108) Non-static class is passed between cluster nodes

2018-11-01 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-10108:
--

 Summary: Non-static class is passed between cluster nodes
 Key: IGNITE-10108
 URL: https://issues.apache.org/jira/browse/IGNITE-10108
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


We need to avoid passing anonymous classes to compute, because this leads to 
serializing the whole enclosing test-class context.
For that reason the following place needs refactoring

{code}

ignite.compute().withTimeout(5_000).broadcastAsync(new IgniteRunnable() {

...

});

{code}

in the method {{GridCommonAbstractTest#manualCacheRebalancing}} into a private 
static nested class, as sketched below.
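
A minimal sketch of the refactoring (class name hypothetical; the body stays the 
same as in the anonymous class):

{code}
/** Static nested class: it keeps no hidden reference to the enclosing test
 * instance, so only its own fields are serialized. */
private static class ManualRebalanceRunnable implements IgniteRunnable {
    @Override public void run() {
        // ... same body as the anonymous class ...
    }
}

// Call site:
ignite.compute().withTimeout(5_000).broadcastAsync(new ManualRebalanceRunnable());
{code}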



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10092) Race in partition state when a checkpoint starts in the middle of starting caches

2018-10-31 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-10092:
--

 Summary: Race in partition state when a checkpoint starts in the 
middle of starting caches
 Key: IGNITE-10092
 URL: https://issues.apache.org/jira/browse/IGNITE-10092
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-10028) Incorrect handling of page on replacement

2018-10-26 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-10028:
--

 Summary: Incorrect handling of page on replacement
 Key: IGNITE-10028
 URL: https://issues.apache.org/jira/browse/IGNITE-10028
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


We can pass an incorrect page version to IgniteCacheSnapshotManager.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-9934) Improve logging on partition map exchange

2018-10-18 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-9934:
-

 Summary: Improve logging on partition map exchange
 Key: IGNITE-9934
 URL: https://issues.apache.org/jira/browse/IGNITE-9934
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


Partition Map Exchange (PME) is a cluster-wide process; for that reason it does 
not complete until every node has done its part of the job.

The coordinator, as the node which manages the process, could print how many 
nodes have finished their stage of PME and which nodes have not yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Please review my PR 4968 (IGNITE-9738)

2018-10-18 Thread Vladislav Pyatkov
D.Govoruchin,

I have tried to address all your comments, and TC is in a correct state.
Please review my changes.

-- 
Vladislav Pyatkov


[jira] [Created] (IGNITE-9885) Issue in termination of GridWorkerFuture

2018-10-15 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-9885:
-

 Summary: Issue in termination of GridWorkerFuture
 Key: IGNITE-9885
 URL: https://issues.apache.org/jira/browse/IGNITE-9885
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


A closure can be started through a method like 
{{GridClosureProcessor#runLocalSafe(java.lang.Runnable)}},

but it is not possible to wait for the task's termination after cancellation.

To understand why, look at {{GridWorkerFuture.cancel}}. The method sets the 
{{interrupted}} flag on the executing thread but does not wait for the task to 
terminate. Having an instance of GridWorkerFuture, you cannot await the real 
termination of the task.
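
For illustration, the behavior one would expect (a sketch; today the future 
offers no way to block until the worker actually stops):

{code}
GridWorkerFuture<?> fut = /* future returned for the closure */;

fut.cancel(); // only sets the interrupted flag on the worker thread
fut.get();    // desired: block until the task has really terminated
{code}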



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-9738) Client node can suddenly fail on start

2018-09-28 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-9738:
-

 Summary: Client node can suddenly fail on start
 Key: IGNITE-9738
 URL: https://issues.apache.org/jira/browse/IGNITE-9738
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


If a client joins a large topology, it can spend some time waiting for 
{{TcpDiscoveryNodeAddFinishedMessage}}, and during that time it cannot send 
{{TcpDiscoveryClientMetricsUpdateMessage}}. For that reason the server can drop 
the client from the topology.

We should send {{TcpDiscoveryClientMetricsUpdateMessage}} as soon as possible, 
without waiting for the join procedure to finish.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-9707) TouchedExpiryPolicy with persistence on an atomic cache: TTL updated without a lock

2018-09-26 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-9707:
-

 Summary: TouchedExpiryPolicy with persistence on an atomic cache: 
TTL updated without a lock
 Key: IGNITE-9707
 URL: https://issues.apache.org/jira/browse/IGNITE-9707
 Project: Ignite
  Issue Type: Test
Reporter: Vladislav Pyatkov
 Attachments: AtomicCacheWithTtlTest.java

{noformat}

[2018-09-26 
18:06:29,882][ERROR][sys-stripe-0-#86%internal.AtomicCacheWithTtlTest2%][GridCacheIoManager]
 Failed to process message [senderId=949211e1-5e2a-41fe-b66b-195ae3300033, 
messageType=class 
o.a.i.i.processors.cache.distributed.near.GridNearSingleGetRequest]
java.lang.AssertionError
 at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1247)
 at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:1528)
 at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:352)
 at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:3605)
 at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.storeValue(GridCacheMapEntry.java:3581)
 at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.updateTtl(GridCacheMapEntry.java:2468)
 at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.updateTtl(GridCacheMapEntry.java:2444)
 at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerGet0(GridCacheMapEntry.java:680)
 at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerGetVersioned(GridCacheMapEntry.java:554)
 at 
org.apache.ignite.internal.processors.cache.GridCacheAdapter.getAllAsync0(GridCacheAdapter.java:1994)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.getDhtAllAsync(GridDhtCacheAdapter.java:781)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtGetSingleFuture.getAsync(GridDhtGetSingleFuture.java:360)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtGetSingleFuture.map0(GridDhtGetSingleFuture.java:254)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtGetSingleFuture.map(GridDhtGetSingleFuture.java:237)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtGetSingleFuture.init(GridDhtGetSingleFuture.java:161)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.getDhtSingleAsync(GridDhtCacheAdapter.java:878)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheAdapter.processNearSingleGetRequest(GridDhtCacheAdapter.java:893)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$300(GridDhtAtomicCache.java:130)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:252)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$4.apply(GridDhtAtomicCache.java:247)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1056)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:581)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:380)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:306)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:101)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:295)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
 at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:496)
 at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
 at java.lang.Thread.run(Thread.java:745)

{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-9448) Change ZooKeeper version to 3.4.13

2018-08-31 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-9448:
-

 Summary: Change ZooKeeper version to 3.4.13
 Key: IGNITE-9448
 URL: https://issues.apache.org/jira/browse/IGNITE-9448
 Project: Ignite
  Issue Type: Test
  Components: zookeeper
Reporter: Vladislav Pyatkov


We should change the ZooKeeper dependency to the latest release; at the moment that is 3.4.13.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8965) Add logs in SegmentReservationStorage on exchange process

2018-07-09 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-8965:
-

 Summary: Add logs in SegmentReservationStorage on exchange process
 Key: IGNITE-8965
 URL: https://issues.apache.org/jira/browse/IGNITE-8965
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8913) Uninformative SQL query cancellation message

2018-07-03 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-8913:
-

 Summary: Uninformative SQL query cancellation message
 Key: IGNITE-8913
 URL: https://issues.apache.org/jira/browse/IGNITE-8913
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov
 Fix For: 2.5


When a query times out, is cancelled, or fails with another exception, we get 
the message: "The query was cancelled while executing".
The message needs to be made clearer: it should include the text of the query, 
the node on which it was cancelled, the reason for the cancellation, etc.

{noformat}
2018-06-19 
00:00:10.653[ERROR][query-#93192%DPL_GRID%DplGridNodeName%][o.a.i.i.p.q.h.t.GridMapQueryExecutor]
 Failed to execute local query.
org.apache.ignite.cache.query.QueryCancelledException: The query was cancelled 
while executing.
 at 
org.apache.ignite.internal.processors.query.GridQueryCancel.set(GridQueryCancel.java:53)
 at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQuery(IgniteH2Indexing.java:1115)
 at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQueryWithTimer(IgniteH2Indexing.java:1207)
 at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQueryWithTimer(IgniteH2Indexing.java:1185)
 at 
org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest0(GridMapQueryExecutor.java:683)
 at 
org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest(GridMapQueryExecutor.java:527)
 at 
org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onMessage(GridMapQueryExecutor.java:218)
 at 
org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor$2.onMessage(GridMapQueryExecutor.java:178)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:2333)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
2018-06-19 
00:00:11.629[ERROR][query-#93187%DPL_GRID%DplGridNodeName%][o.a.i.i.p.q.h.t.GridMapQueryExecutor]
 Failed to execute local query.
org.apache.ignite.cache.query.QueryCancelledException: The query was cancelled 
while executing.
 at 
org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest0(GridMapQueryExecutor.java:670)
 at 
org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest(GridMapQueryExecutor.java:527)
 at 
org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onMessage(GridMapQueryExecutor.java:218)
 at 
org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor$2.onMessage(GridMapQueryExecutor.java:178)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:2333)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1556)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1184)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:125)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1091)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)
{noformat}
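
A sketch of the desired diagnostics (qry, node and reason are hypothetical 
variables standing in for the real context):

{code}
// Include the query text, node id and cancellation reason in the error
// instead of the bare "cancelled while executing" message.
U.error(log, "Failed to execute local query [qry=" + qry.query() +
    ", nodeId=" + node.id() + ", reason=" + reason + ']', e);
{code}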



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8866) Need to retry class upload until the node leaves or fails the topology according to discovery SPI

2018-06-25 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-8866:
-

 Summary: Need to retry class upload until the node leaves or fails 
the topology according to discovery SPI
 Key: IGNITE-8866
 URL: https://issues.apache.org/jira/browse/IGNITE-8866
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


After one failed attempt to upload a class, the client code gets an exception:

{noformat}
10:04:46,253 INFO  [stdout] (Thread-732) java.lang.NoClassDefFoundError: 
ru/sbt/deposit_pf_api/core/utils/DplUtils
10:04:46,253 INFO  [stdout] (Thread-732)   at 
ru.sbt.deposit_pf_api.comparators.CommonPredicate.nodeIdIgnite(CommonPredicate.java:225)
10:04:46,253 INFO  [stdout] (Thread-732)   at 
ru.sbt.deposit_pf_api.comparators.CommonPredicate.cacheEntities(CommonPredicate.java:191)
10:04:46,253 INFO  [stdout] (Thread-732)   at 
ru.sbt.deposit_pf_api.comparators.CommonPredicate.(CommonPredicate.java:116)
{noformat}

And log contains some related warnings:
{noformat}
018-06-19 10:04:18.459 [WARN 
][pub-#3308%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDeploymentCommunication] 
Failed to receive peer response from node within duration 
[node=5861d763-a552-463e-817a-0742f7aad114, duration=5008]
2018-06-19 10:04:18.459 [WARN 
][pub-#3308%DPL_GRID%DplGridNodeName%][o.a.i.i.m.d.GridDeploymentPerVersionStore]
 Failed to send class-loading request to node (is node alive?) 
[node=5861d763-a552-463e-817a-0742f7aad114, 
clsName=ru.sbt.deposit_pf_api.core.utils.DplUtils, 
clsPath=ru/sbt/deposit_pf_api/core/utils/DplUtils.class, 
clsLdrId=370f1361461-5861d763-a552-463e-817a-0742f7aad114, 
parentClsLdr=com.sbt.dpl.gridgain.ignite.NodeClassLoader@1ce4a752]
{noformat}

I think we should keep trying to upload the class through p2p for as long as the node is present in the topology.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8829) Some configuration properties of TcpCommunicationSpi are not annotated appropriately

2018-06-19 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-8829:
-

 Summary: Some configuration properties of TcpCommunicationSpi are 
not annotated appropriately
 Key: IGNITE-8829
 URL: https://issues.apache.org/jira/browse/IGNITE-8829
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


When I checked all the properties of TcpCommunicationSpi, I found an issue with 
getting all configuration properties from code: some of them are not 
configuration properties at all, but part of the live SPI state.

All configurable properties must be annotated with {{IgniteSpiConfiguration}}, 
but this is not done for each of them.

I have found at least two properties for which it is not done:
{{connectionsPerNode}}
{{usePairedConnections}}

and one property which does not follow the contract (it has only a setter, but 
no getter):
{{addressResolver}}

All CommunicationSpi properties need to be revised and corrected, for example as 
sketched below.
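
A sketch of the expected shape (setter signature as in TcpCommunicationSpi; the 
annotation and the getter are the additions):

{code}
/** Annotate the existing setter so the property is discoverable as configuration. */
@IgniteSpiConfiguration(optional = true)
public TcpCommunicationSpi setConnectionsPerNode(int maxConnectionsPerNode) {
    this.connectionsPerNode = maxConnectionsPerNode;

    return this;
}

/** Add the missing getter to complete the addressResolver contract. */
public AddressResolver getAddressResolver() {
    return addrRslvr;
}
{code}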



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8754) Node outside of baseline does not start when service configured

2018-06-08 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-8754:
-

 Summary: Node outside of baseline does not start when service 
configured
 Key: IGNITE-8754
 URL: https://issues.apache.org/jira/browse/IGNITE-8754
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov
 Attachments: ServiceOnNodeOutOfBaselineTest.java

It is enough to configure a service via {{ServiceConfiguration}}, and a node 
outside of the baseline does not start.
{noformat}
"async-runnable-runner-1" #287 prio=5 os_prio=0 tid=0x24e0c800 
nid=0x4e6c waiting on condition [0xe87fe000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at 
org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart0(GridServiceProcessor.java:287)
at 
org.apache.ignite.internal.processors.service.GridServiceProcessor.onKernalStart(GridServiceProcessor.java:228)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1105)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2014)
at 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1723)
- locked <0x00076c142400> (a 
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1151)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:649)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:882)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:845)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:833)
at 
org.apache.ignite.testframework.junits.GridAbstractTest.startGrid(GridAbstractTest.java:799)
at 
org.gridgain.internal.ServiceOnNodeOutOfBaselineTest.lambda$test$0(ServiceOnNodeOutOfBaselineTest.java:107)
at 
org.gridgain.internal.ServiceOnNodeOutOfBaselineTest$$Lambda$22/781127963.run(Unknown
 Source)
at 
org.apache.ignite.testframework.GridTestUtils.lambda$runAsync$1(GridTestUtils.java:898)
at 
org.apache.ignite.testframework.GridTestUtils$$Lambda$23/1655470614.call(Unknown
 Source)
at 
org.apache.ignite.testframework.GridTestUtils.lambda$runAsync$2(GridTestUtils.java:956)
at 
org.apache.ignite.testframework.GridTestUtils$$Lambda$24/1782331932.run(Unknown 
Source)
at 
org.apache.ignite.testframework.GridTestUtils$6.call(GridTestUtils.java:1254)
at 
org.apache.ignite.testframework.GridTestThread.run(GridTestThread.java:86)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8710) Applying WAL takes a long time or fails entirely when *.wal files have been removed

2018-06-05 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-8710:
-

 Summary: Applying WAL takes a long time or fails entirely when *.wal 
files have been removed
 Key: IGNITE-8710
 URL: https://issues.apache.org/jira/browse/IGNITE-8710
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


In specific cases, when *.wal files have been removed or WAL directories 
unmounted, we get warning messages on start:

{noformat}
2018-06-02 12:10:06.127[INFO 
][Thread-100][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager] Checking memory 
state [lastValidPos=FileWALPointer [idx=0, fileOff=0, len=0], 
lastMarked=FileWALPointer [idx=0, fileOff=0, len=0], 
lastCheckpointId=----]
2018-06-02 12:10:06.546[WARN 
][Thread-100][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager] Found unexpected 
checkpoint marker, skipping [cpId=94b5ce03-87b7-489e-b08b-b4c5dc522bd5, 
expCpId=----, pos=FileWALPointer [idx=0, 
fileOff=44266869, len=977]]
2018-06-02 12:10:57.860[WARN 
][Thread-100][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager] Found unexpected 
checkpoint marker, skipping [cpId=3f6ab238-23f7-4924-b4ef-0cb68d914a04, 
expCpId=----, pos=FileWALPointer [idx=7, 
fileOff=872888269, len=460112]]
2018-06-02 12:11:46.600[INFO 
][Thread-100][o.a.i.i.p.c.p.w.FileWriteAheadLogManager] Stopping WAL iteration 
due to an exception: EOF at position [1073741824] expected to read [1] bytes, 
ptr=FileWALPointer [idx=15, fileOff=1073741824, len=0]
2018-06-02 12:12:21.181[WARN 
][Thread-100][o.a.i.i.p.c.p.GridCacheDatabaseSharedManager] Found unexpected 
checkpoint marker, skipping [cpId=3fe33806-ee11-49b7-8c47-648cd1adacbc, 
expCpId=----, pos=FileWALPointer [idx=23, 
fileOff=693360866, len=460112]]
{noformat}

And the attempt to recover from the WAL hangs for a long time without success.

Instead, the node should be stopped with a message saying that the necessary WAL 
files were not found.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8606) Node hangs on the next exchange when there is no access to the marshaller's folder

2018-05-24 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-8606:
-

 Summary: Node hangs on the next exchange when there is no access to 
the marshaller's folder 
 Key: IGNITE-8606
 URL: https://issues.apache.org/jira/browse/IGNITE-8606
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


{noformat}
2018-05-18 11:12:57.572 
[ERROR][tcp-disco-msg-worker-#3%DPL_GRID%DplGridNodeName%][o.a.i.i.MarshallerMappingFileStore]
 Failed to write class name to file [platformId=0id=1713316383, 
clsName=com.sbt.dpl.gridgain.affinity.DPLIndexAffinityPrimaryFilter, 
file=/u01/pprb/work/marshaller/1713316383.classname0]
java.io.FileNotFoundException: /u01/pprb/work/marshaller/1713316383.classname0 
(No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.(FileOutputStream.java:213)
at java.io.FileOutputStream.(FileOutputStream.java:162)
at 
org.apache.ignite.internal.MarshallerMappingFileStore.writeMapping(MarshallerMappingFileStore.java:94)
at 
org.apache.ignite.internal.MarshallerMappingFileStore.mergeAndWriteMapping(MarshallerMappingFileStore.java:207)
at 
org.apache.ignite.internal.MarshallerContextImpl.onMappingDataReceived(MarshallerContextImpl.java:201)
at 
org.apache.ignite.internal.processors.marshaller.GridMarshallerMappingProcessor.processIncomingMappings(GridMarshallerMappingProcessor.java:356)
at 
org.apache.ignite.internal.processors.marshaller.GridMarshallerMappingProcessor.onJoiningNodeDataReceived(GridMarshallerMappingProcessor.java:336)
at 
org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$5.onExchange(GridDiscoveryManager.java:908)
at 
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.onExchange(TcpDiscoverySpi.java:1939)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processNodeAddedMessage(ServerImpl.java:4220)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2744)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2536)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6775)
at 
org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2621)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8458) AffinityAssignment absorbs a lot of java heap

2018-05-08 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-8458:
-

 Summary: AffinityAssignment absorbs a lot of java heap
 Key: IGNITE-8458
 URL: https://issues.apache.org/jira/browse/IGNITE-8458
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


With more than a hundred caches and several thousand partitions, the size can 
grow beyond 10 GB.
In my case the heap stored ~5K {{HistoryAffinityAssignment}} instances.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8440) Transaction may hang on a node in PREPARED state

2018-05-04 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-8440:
-

 Summary: Transaction may hang on a node in PREPARED state
 Key: IGNITE-8440
 URL: https://issues.apache.org/jira/browse/IGNITE-8440
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


In some specific cases we can see a transaction hang on one node in the 
{{PREPARED}} state while it does not hang on the others.
That unhappy node waits for a {{TxFinishRequest}} that never arrives and keeps 
printing the _long running transaction_ message.

We should check the other nodes when a transaction hangs in the PREPARED state 
without progress.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8348) Debug information about discovery messages in TcpDiscoverySpiMBean

2018-04-20 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-8348:
-

 Summary: Debug information about discovery messages in 
TcpDiscoverySpiMBean
 Key: IGNITE-8348
 URL: https://issues.apache.org/jira/browse/IGNITE-8348
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


There are discovery issues, such as:
1) behavior on an unstable network;
2) segmentation of several nodes;
or others, where the SPI does not work in an obvious way.
We want to know what kind of messages have been sent to (or received from) a 
particular node.

For that reason we want to add the following methods to TcpDiscoverySpiMBean:

{code}
/**
 * Print a list of discarded messages.
 */
@MXBeanDescription("Print a list of discarded messages to log.")
public void printListOfDiscardedMessages();

/**
 * Print a list of received messages.
 */
@MXBeanDescription("Print a list of received messages to log.")
public void printListOfReceivedMessages();

/**
 * Print a list of sent messages.
 */
@MXBeanDescription("Print a list of sent messages to log.")
public void printListOfSentMessages();
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8136) Discovery service works incorrectly if a node stopped due to segmentation hangs

2018-04-04 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-8136:
-

 Summary: Discovery service works incorrectly if a node stopped due 
to segmentation hangs
 Key: IGNITE-8136
 URL: https://issues.apache.org/jira/browse/IGNITE-8136
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8087) Assertion error at rebalancing time

2018-03-30 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-8087:
-

 Summary: Assertion error at rebalancing time
 Key: IGNITE-8087
 URL: https://issues.apache.org/jira/browse/IGNITE-8087
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


{noformat}

2018-03-30 
10:06:17.936[ERROR][sys-#308516%DPL_GRID%DplGridNodeName%][o.a.i.i.p.cache.GridCacheIoManager]
 Failed processing message [senderId=4754f275-a46b-4df5-b263-8369a9cb899b, 
msg=GridDhtPartitionSupplyMessage [updateSeq=151421, 
topVer=AffinityTopologyVersion [topVer=546, minorTopVer=8], missed=null, 
clean=null, msgSize=524554, estimatedKeysCnt=-1, size=1, parts=[45], 
super=GridCacheGroupIdMessage [grpId=218536256]]]
java.lang.AssertionError: GridDhtCacheEntry [rdrs=[], part=45, 
super=GridDistributedCacheEntry [super=GridCacheMapEntry 
[key=KeyCacheObjectImpl [part=45, val=1005, hasValBytes=true], val=null, 
startVer=1522624104073, ver=GridCacheVersion [topVer=133119151, 
order=1522038055581, nodeOrder=13], hash=1005, extras=null, flags=3]]]
 at 
org.apache.ignite.internal.processors.cache.GridCacheContext.onDeferredDelete(GridCacheContext.java:1644)
 at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:446)
 at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.unswap(GridCacheMapEntry.java:377)
 at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:2713)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.preloadEntry(GridDhtPartitionDemander.java:798)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionDemander.handleSupplyMessage(GridDhtPartitionDemander.java:678)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.handleSupplyMessage(GridDhtPreloader.java:375)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:364)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$5.apply(GridCachePartitionExchangeManager.java:354)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:579)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$700(GridCacheIoManager.java:99)
 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$OrderedMessageListener.onMessage(GridCacheIoManager.java:1609)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1555)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4100(GridIoManager.java:126)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager$GridCommunicationMessageSet.unwind(GridIoManager.java:2751)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.unwindMessageSet(GridIoManager.java:1515)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4400(GridIoManager.java:126)
 at 
org.apache.ignite.internal.managers.communication.GridIoManager$10.run(GridIoManager.java:1484)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)

{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8030) Cluster hangs in the deactivation process while stopping an indexed cache

2018-03-23 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-8030:
-

 Summary: Cluster hangs in the deactivation process while stopping an 
indexed cache
 Key: IGNITE-8030
 URL: https://issues.apache.org/jira/browse/IGNITE-8030
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov
 Attachments: thrdump-server.log

{noformat}
"sys-#10283%DPL_GRID%DplGridNodeName%" #13068 prio=5 os_prio=0 tid=0x7f07040eb000 nid=0x2e0f waiting on condition [0x7e6deb9b8000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x7f0bd2b0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireInterruptibly(AbstractQueuedSynchronizer.java:897)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1222)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lockInterruptibly(ReentrantReadWriteLock.java:998)
    at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.lock(GridH2Table.java:292)
    at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.lock(GridH2Table.java:253)
    at org.h2.command.ddl.DropTable.prepareDrop(DropTable.java:87)
    at org.h2.command.ddl.DropTable.update(DropTable.java:113)
    at org.h2.command.CommandContainer.update(CommandContainer.java:101)
    at org.h2.command.Command.executeUpdate(Command.java:260)
    - locked <0x7f0c276c85b8> (a org.h2.engine.Session)
    at org.h2.jdbc.JdbcStatement.executeUpdateInternal(JdbcStatement.java:137)
    - locked <0x7f0c276c85b8> (a org.h2.engine.Session)
    at org.h2.jdbc.JdbcStatement.executeUpdate(JdbcStatement.java:122)
    at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.dropTable(IgniteH2Indexing.java:654)
    at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.unregisterCache(IgniteH2Indexing.java:2482)
    at org.apache.ignite.internal.processors.query.GridQueryProcessor.onCacheStop0(GridQueryProcessor.java:1684)
    - locked <0x7f0b69f822d0> (a java.lang.Object)
    at org.apache.ignite.internal.processors.query.GridQueryProcessor.onCacheStop(GridQueryProcessor.java:879)
    at org.apache.ignite.internal.processors.cache.GridCacheProcessor.stopCache(GridCacheProcessor.java:1189)
    at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheStop(GridCacheProcessor.java:2063)
    at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onExchangeDone(GridCacheProcessor.java:2219)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:1518)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.finishExchangeOnCoordinator(GridDhtPartitionsExchangeFuture.java:2538)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onAllReceived(GridDhtPartitionsExchangeFuture.java:2297)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.processSingleMessage(GridDhtPartitionsExchangeFuture.java:2034)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.access$100(GridDhtPartitionsExchangeFuture.java:122)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1891)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture$2.apply(GridDhtPartitionsExchangeFuture.java:1879)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onReceiveSingleMessage(GridDhtPartitionsExchangeFuture.java:1879)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.processSinglePartitionUpdate(GridCachePartitionExchangeManager.java:1523)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.access$1000(GridCach

[jira] [Created] (IGNITE-8021) Destroyed caches can be brought back to life by a grid restart

2018-03-22 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-8021:
-

 Summary: Destroyed caches can be brought back to life by a grid restart
 Key: IGNITE-8021
 URL: https://issues.apache.org/jira/browse/IGNITE-8021
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


Cache configuration files remain on the file system after the {{destroy}} method 
is invoked.

For that reason, after a grid restart all the removed caches are started again.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-8006) Starting multiple caches inhibits exchange process on joining node

2018-03-21 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-8006:
-

 Summary: Starting multiple caches inhibits exchange process on 
joining node
 Key: IGNITE-8006
 URL: https://issues.apache.org/jira/browse/IGNITE-8006
 Project: Ignite
  Issue Type: Improvement
Reporter: Vladislav Pyatkov


In some cases, when we start multiple caches (over 2K), the exchange can stall 
when a new node joins the cluster.

The coordinator node waits to receive a single message from all other nodes, but 
the last node (the one joining the cluster) is stuck starting caches:

 

{noformat}

Stack trace
 at java.lang.Thread.dumpStack(Thread.java:1329)
 at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.startCache(GridCacheProcessor.java:1159)
 at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheStart(GridCacheProcessor.java:1900)
 at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor.startCachesOnLocalJoin(GridCacheProcessor.java:1764)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.initCachesOnLocalJoin(GridDhtPartitionsExchangeFuture.java:740)
 at 
org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:622)
 at 
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2329)
 at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
 at java.lang.Thread.run(Thread.java:745)

{noformat}

 

This inhibits the cluster exchange process until all the caches have started on 
the last node.

We should start the caches in parallel threads or exclude this action from the 
exchange init process, e.g. as sketched below.
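
A minimal sketch of the parallel-start idea (CacheDescriptor, startCache and the 
pool are hypothetical stand-ins for the real internals):

{code}
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Start caches concurrently instead of one by one on local join.
static void startCachesInParallel(List<CacheDescriptor> descs, ExecutorService pool) {
    CompletableFuture<?>[] futs = descs.stream()
        .map(d -> CompletableFuture.runAsync(() -> startCache(d), pool))
        .toArray(CompletableFuture[]::new);

    // Let exchange init proceed only after every cache has started.
    CompletableFuture.allOf(futs).join();
}
{code}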



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7930) Partition map hang in incorrect state when backup filter is assigned

2018-03-13 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-7930:
-

 Summary: Partition map hangs in incorrect state when backup filter 
is assigned
 Key: IGNITE-7930
 URL: https://issues.apache.org/jira/browse/IGNITE-7930
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov
 Attachments: IgnitePdsRebalanceCompletionTest.java

The attached test ([^IgnitePdsRebalanceCompletionTest.java]) shows that some 
partitions end up in the OWNING state (which should not happen) and the whole 
cluster hangs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7896) Files of evicted partitions are not removed from disk storage

2018-03-06 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-7896:
-

 Summary: Files of evicted partitions are not removed from disk 
storage
 Key: IGNITE-7896
 URL: https://issues.apache.org/jira/browse/IGNITE-7896
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov
 Attachments: IgnitePdsRebalanceCompletionAndPartitionFilesTest.java

See the attached reproducer: 
[^IgnitePdsRebalanceCompletionAndPartitionFilesTest.java]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7703) Add a method that gets caches in batch

2018-02-14 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-7703:
-

 Summary: Add a method that gets caches in batch
 Key: IGNITE-7703
 URL: https://issues.apache.org/jira/browse/IGNITE-7703
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


Ignite allows starting (and/or getting) caches in batch, but does not allow 
getting them without starting.

In some cases we need to get a particular subset of all cluster caches, but if 
we request them one by one:

_org.apache.ignite.Ignite#cache_

we risk overloading the discovery layer with _DynamicCacheChangeRequest_ 
messages.

 

It would be better to add a dedicated method for getting caches in batch:

_org.apache.ignite.Ignite#caches_
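
A minimal sketch of what the proposed signature might look like (hypothetical; 
this method does not exist in the current API):

{code}
import java.util.Collection;

import org.apache.ignite.IgniteCache;

public interface BatchCacheAccess {
    // Existing one-by-one accessor: each call may trigger a separate
    // DynamicCacheChangeRequest over discovery.
    <K, V> IgniteCache<K, V> cache(String name);

    // Proposed batch accessor: resolve all names with a single
    // discovery round-trip instead of one message per cache.
    Collection<IgniteCache<?, ?>> caches(Collection<String> names);
}
{code}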



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-6991) SharedDeploymentTest.testDeploymentFromSecondAndThird test fails in 100 percent of cases

2017-11-22 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-6991:
-

 Summary: SharedDeploymentTest.testDeploymentFromSecondAndThird 
test fails in 100 percent of cases
 Key: IGNITE-6991
 URL: https://issues.apache.org/jira/browse/IGNITE-6991
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


{noformat}
java.lang.ClassNotFoundException: 
org.apache.ignite.tests.p2p.compute.ExternalCallable2
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at 
org.apache.ignite.testframework.GridTestExternalClassLoader.findClass(GridTestExternalClassLoader.java:143)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at 
org.apache.ignite.testframework.GridTestExternalClassLoader.loadClass(GridTestExternalClassLoader.java:152)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at 
org.apache.ignite.p2p.SharedDeploymentTest.runJob2(SharedDeploymentTest.java:124)
at 
org.apache.ignite.p2p.SharedDeploymentTest.testDeploymentFromSecondAndThird(SharedDeploymentTest.java:82)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6922) Class cannot be undeployed from grid in some specific cases

2017-11-15 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-6922:
-

 Summary: Class cannot be undeployed from grid in some specific cases
 Key: IGNITE-6922
 URL: https://issues.apache.org/jira/browse/IGNITE-6922
 Project: Ignite
  Issue Type: Bug
  Security Level: Public (Viewable by anyone)
Reporter: Vladislav Pyatkov






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6799) Check of starvation in striped thread pool

2017-10-31 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-6799:
-

 Summary: Check of starvation in striped thread pool
 Key: IGNITE-6799
 URL: https://issues.apache.org/jira/browse/IGNITE-6799
 Project: Ignite
  Issue Type: Improvement
  Security Level: Public (Viewable by anyone)
Reporter: Vladislav Pyatkov


We have got false alarms like:

{noformat}
2017-10-30 14:01:40.308[WARN 
][grid-timeout-worker-#63%DPL_GRID%DplGridNodeName%][o.a.ignite.internal.util.typedef.G]
 >>> Possible starvation in striped pool. 
2017-10-30 13:56:41.538[WARN 
][grid-timeout-worker-#63%DPL_GRID%DplGridNodeName%][o.a.ignite.internal.util.typedef.G]
 >>> Possible starvation in striped pool. 
2017-10-30 13:46:40.488[WARN 
][grid-timeout-worker-#63%DPL_GRID%DplGridNodeName%][o.a.ignite.internal.util.typedef.G]
 >>> Possible starvation in striped pool. 
2017-10-30 13:37:45.481[WARN 
][grid-timeout-worker-#63%DPL_GRID%DplGridNodeName%][o.a.ignite.internal.util.typedef.G]
 >>> Possible starvation in striped pool. 
{noformat}

This usually happens during a checkpoint, but it is a false trigger: the thread 
had not been active for a long time and only became active recently.

We should save the last active state per stripe, as is done with completedCntrs, 
and rewrite the condition:

{code}
completedCntrs[i] != -1 &&           // stripe has a recorded previous count
completedCntrs[i] == completedCnt && // no task completed since the last check
actives[i] == active &&              // active flag unchanged since the last check
active                               // and the stripe is currently active
{code}
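
A self-contained sketch of how the rewritten check would behave (field names 
assumed, mirroring completedCntrs):

{code}
public class StarvationCheckSketch {
    // Assumed per-stripe state captured at the previous check.
    static long[] completedCntrs = { -1, 0, 5 };
    static boolean[] actives = { false, false, true };

    // True when stripe i made no progress since the last check
    // while having been active the whole time.
    static boolean starved(int i, long completedCnt, boolean active) {
        return completedCntrs[i] != -1 &&
            completedCntrs[i] == completedCnt &&
            actives[i] == active &&
            active;
    }

    public static void main(String[] args) {
        // Stripe 2: same completed count, active before and now -> starved.
        System.out.println(starved(2, 5, true));  // true

        // Stripe 1: only became active recently (previous flag was false),
        // so no false alarm even though its counter has not moved.
        System.out.println(starved(1, 0, true));  // false
    }
}
{code}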




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Please review IGNITE-6737

2017-10-27 Thread Vladislav Pyatkov
Hi,

I have added a PR for this issue:

https://issues.apache.org/jira/browse/IGNITE-6737

and the fix has passed on TC.

This is a serious issue which can cause nodes in the cluster to loop infinitely.

-- 
Vladislav Pyatkov


[jira] [Created] (IGNITE-6737) GridDeploymentPerVersionStore retries loading class infinitely

2017-10-24 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-6737:
-

 Summary: GridDeploymentPerVersionStore retries loading class 
infinitely
 Key: IGNITE-6737
 URL: https://issues.apache.org/jira/browse/IGNITE-6737
 Project: Ignite
  Issue Type: Bug
  Security Level: Public (Viewable by anyone)
Reporter: Vladislav Pyatkov


{noformat}
2017-10-24 14:34:06 [DEBUG] 
[org.apache.ignite.internal.managers.deployment.GridDeploymentLocalStore] 
[pub-#5258%DPL_GRID%DplGridNodeName%] - Deployment meta for local deployment: 
GridDeploymentMetadata [depMode=SHARED, 
alias=com.sbt.bgp.task.AffinityApplicationTaskCallable, 
clsName=com.sbt.bgp.task.AffinityApplicationTaskCallable, userVer=null, 
sndNodeId=1b852edd-1f41-4489-af78-dbe8226a9b16, clsLdrId=null, clsLdr=null, 
participants=null, parentLdr=null, record=true, nodeFilter=null, seqNum=n/a]

2017-10-24 14:34:06 [DEBUG] 
[org.apache.ignite.internal.managers.deployment.GridDeploymentLocalStore] 
[pub-#5258%DPL_GRID%DplGridNodeName%] - Failed to load class for local 
auto-deployment [ldr=grid:com.sbt.core.envelope.container.FileCl
assLoader@3e4327dc, meta=GridDeploymentMetadata [depMode=SHARED, 
alias=com.sbt.bgp.task.AffinityApplicationTaskCallable, 
clsName=com.sbt.bgp.task.AffinityApplicationTaskCallable, userVer=null, 
sndNodeId=1b852edd-1f41-4489-af78-dbe8226a9b
16, clsLdrId=null, clsLdr=null, participants=null, parentLdr=null, record=true, 
nodeFilter=null, seqNum=n/a]]

2017-10-24 14:34:06 [DEBUG] 
[org.apache.ignite.internal.managers.deployment.GridDeploymentPerVersionStore] 
[pub-#5258%DPL_GRID%DplGridNodeName%] - Deployment cannot be reused (class does 
not exist on participating nodes) [dep=SharedDeployment [rmv=false, 
super=GridDeployment [ts=1508810401226, depMode=SHARED, 
clsLdr=GridDeploymentClassLoader 
[id=7953e0c4f51-1b852edd-1f41-4489-af78-dbe8226a9b16, singleNode=false, 
nodeLdrMap={bc5a1eaa-e056-4bd8-b7d3-684e75522b81=373cd8c4f51-bc5a1eaa-e056-4bd8-b7d3-684e75522b81,
 
3018f0bb-7c94-410e-9a0f-028c3fbc8aab=a5b822c4f51-3018f0bb-7c94-410e-9a0f-028c3fbc8aab,
 
f1774f8d-84e9-43c3-86a3-d7a47c291f45=afd441c4f51-f1774f8d-84e9-43c3-86a3-d7a47c291f45,
 
5a0b56e8-a8ae-4742-834c-d688592866c4=a6e985c4f51-5a0b56e8-a8ae-4742-834c-d688592866c4,
 
65fdae9e-78c7-49a2-b9ee-a8e99dbb87ea=bcd257c4f51-65fdae9e-78c7-49a2-b9ee-a8e99dbb87ea,
 
045ddd4d-3e39-4b25-bf52-c264f59efbc6=e6ec81c4f51-045ddd4d-3e39-4b25-bf52-c264f59efbc6,
 
afadbbce-542d-435c-b85a-78d395b463a5=967664c4f51-afadbbce-542d-435c-b85a-78d395b463a5,
 
4b2662e9-d525-4d96-936c-8cc645464e65=591541c4f51-4b2662e9-d525-4d96-936c-8cc645464e65},
 p2pTimeout=5000, usrVer=0, depMode=SHARED, quiet=false], 
clsLdrId=7953e0c4f51-1b852edd-1f41-4489-af78-dbe8226a9b16, userVer=0, 
loc=false, 
sampleClsName=com.sbt.fea_cc.services.business.autoStopTurnkeySettings.AutoStopTurnkeySettingsService$FindOrderTurnkeyForSuspend,
 pendingUndeploy=false, undeployed=false, usage=0]], 
meta=GridDeploymentMetadata [depMode=SHARED, 
alias=com.sbt.bgp.task.AffinityApplicationTaskCallable, 
clsName=com.sbt.bgp.task.AffinityApplicationTaskCallable, userVer=0, 
sndNodeId=4457016c-5f93-450f-b2a7-86bd25f536cf, 
clsLdrId=898962c4f51-4457016c-5f93-450f-b2a7-86bd25f536cf, clsLdr=null, 
participants=null, parentLdr=null, record=true, nodeFilter=null, 
seqNum=150888744]]


2017-10-24 14:34:06 [DEBUG] 
[org.apache.ignite.internal.managers.deployment.GridDeploymentPerVersionStore] 
[pub-#5258%DPL_GRID%DplGridNodeName%] - Deployment cannot be reused (random 
class could not be loaded from sender node) [dep=SharedDeployment [rmv=false, 
super=GridDeployment [ts=1508810401226, depMode=SHARED, 
clsLdr=GridDeploymentClassLoader 
[id=7953e0c4f51-1b852edd-1f41-4489-af78-dbe8226a9b16, singleNode=false, 
nodeLdrMap={bc5a1eaa-e056-4bd8-b7d3-684e75522b81=373cd8c4f51-bc5a1eaa-e056-4bd8-b7d3-684e75522b81,
 
3018f0bb-7c94-410e-9a0f-028c3fbc8aab=a5b822c4f51-3018f0bb-7c94-410e-9a0f-028c3fbc8aab,
 
f1774f8d-84e9-43c3-86a3-d7a47c291f45=afd441c4f51-f1774f8d-84e9-43c3-86a3-d7a47c291f45,
 
5a0b56e8-a8ae-4742-834c-d688592866c4=a6e985c4f51-5a0b56e8-a8ae-4742-834c-d688592866c4,
 
65fdae9e-78c7-49a2-b9ee-a8e99dbb87ea=bcd257c4f51-65fdae9e-78c7-49a2-b9ee-a8e99dbb87ea,
 
045ddd4d-3e39-4b25-bf52-c264f59efbc6=e6ec81c4f51-045ddd4d-3e39-4b25-bf52-c264f59efbc6,
 
afadbbce-542d-435c-b85a-78d395b463a5=967664c4f51-afadbbce-542d-435c-b85a-78d395b463a5,
 
4b2662e9-d525-4d96-936c-8cc645464e65=591541c4f51-4b2662e9-d525-4d96-936c-8cc645464e65},
 p2pTimeout=5000, usrVer=0, depMode=SHARED, quiet=false], 
clsLdrId=7953e0c4f51-1b852edd-1f41-4489-af78-dbe8226a9b16, userVer=0, 
loc=false, 
sampleClsName=com.sbt.fea_cc.services.business.autoStopTurnkeySettings.AutoStopTurnkeySettingsService$FindOrderTurnkeyForSuspend,
 pendingUndeploy=false, undeployed=false, usage=0]], 
meta=GridDeploymentMetadata [depMode=SHARED, 
alias=com.sbt.bgp.task.AffinityApplicationTaskCallable, 
clsName

[jira] [Created] (IGNITE-6589) Encountered incompatible class loaders for cache

2017-10-10 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-6589:
-

 Summary: Encountered incompatible class loaders for cache
 Key: IGNITE-6589
 URL: https://issues.apache.org/jira/browse/IGNITE-6589
 Project: Ignite
  Issue Type: Bug
Reporter: Vladislav Pyatkov


For unknown reasons, DeploymentManager forces objects to use compatible 
classloaders.
{noformat}
class org.apache.ignite.IgniteCheckedException: Encountered incompatible class 
loaders for cache [class1=org.apache.ignite.tests.p2p.cache.Person, 
class2=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap]
at 
org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager.registerClass(GridCacheDeploymentManager.java:642)
at 
org.apache.ignite.internal.processors.cache.GridCacheDeploymentManager.registerClass(GridCacheDeploymentManager.java:586)
at 
org.apache.ignite.internal.processors.cache.GridCacheMessage.prepareObject(GridCacheMessage.java:223)
at 
org.apache.ignite.internal.processors.cache.GridCacheMessage.marshalInvokeArguments(GridCacheMessage.java:444)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateInvokeRequest.prepareMarshal(GridNearAtomicSingleUpdateInvokeRequest.java:192)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onSend(GridCacheIoManager.java:1120)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1154)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1205)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(GridNearAtomicAbstractUpdateFuture.java:311)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (IGNITE-6579) WAL history is not used when node returns to cluster again

2017-10-09 Thread Vladislav Pyatkov (JIRA)
Vladislav Pyatkov created IGNITE-6579:
-

 Summary: WAL history is not used when node returns to cluster 
again
 Key: IGNITE-6579
 URL: https://issues.apache.org/jira/browse/IGNITE-6579
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Reporter: Vladislav Pyatkov


When I set a big enough value for "WAL history size" and stopped a node for 20 
minutes, I got these messages from the coordinator (order=1):

{noformat}
2017-10-06 15:46:33.429 [WARN 
][sys-#10740%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.GridDhtPartitionTopologyImpl]
 Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=e51a1db2-f49b-44a9-b122-adde4016d9e7,
 cacheOrGroupName=CACHEGROUP_PARTICLE_DServiceZone, partId=2424, 
haveHistory=false]
2017-10-06 15:46:33.429 [WARN 
][sys-#10740%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.GridDhtPartitionTopologyImpl]
 Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=e51a1db2-f49b-44a9-b122-adde4016d9e7,
 cacheOrGroupName=CACHEGROUP_PARTICLE_DServiceZone, partId=2427, 
haveHistory=false]
2017-10-06 15:46:33.429 [WARN 
][sys-#10740%DPL_GRID%DplGridNodeName%][o.a.i.i.p.c.d.d.GridDhtPartitionTopologyImpl]
 Partition has been scheduled for rebalancing due to outdated update counter 
[nodeId=e51a1db2-f49b-44a9-b122-adde4016d9e7,
 cacheOrGroupName=CACHEGROUP_PARTICLE_DServiceZone, partId=2426, 
haveHistory=false]
{noformat}

after starting the node again.
I think the history size should be enough, but the logs show it is not 
(haveHistory=false).
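
For reference, a minimal sketch of how the WAL history size is typically 
configured (assuming the Ignite 2.x DataStorageConfiguration API; the value here 
is arbitrary):

{code}
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class WalHistoryConfig {
    public static void main(String[] args) {
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();

        // Persistence must be enabled for WAL-based (historical) rebalancing.
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        // Number of checkpoints to keep WAL history for. A large value should
        // let a returning node catch up from WAL (haveHistory=true) instead
        // of triggering a full rebalance.
        storageCfg.setWalHistorySize(100);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(storageCfg);

        Ignition.start(cfg);
    }
}
{code}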



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: PRIMARY_SYNC+readFromBackup semantics

2017-10-04 Thread Vladislav Pyatkov
Hi Val,

If we update the local backup synchronously when sending the commit to the
primary, this only partly removes the questions about a consistent view.
We can still get a different (unpredictable) value, because another
transaction may be executing simultaneously from other threads.

At the same time, this is a good place for optimization, probably reducing
network overhead.
I think we need to create a Jira ticket for the improvement.
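
For illustration, a minimal sketch of the scenario (hypothetical cache name; the 
surprising read happens when this code runs on a node that holds a backup of the 
key):

{code}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class PrimarySyncReadFromBackup {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        CacheConfiguration<Integer, String> ccfg =
            new CacheConfiguration<Integer, String>("test-cache")
                .setBackups(1)
                // Both settings below are the defaults, shown explicitly:
                // put() returns once the primary is updated; backups are
                // updated asynchronously and reads may be served from them.
                .setWriteSynchronizationMode(CacheWriteSynchronizationMode.PRIMARY_SYNC)
                .setReadFromBackup(true);

        IgniteCache<Integer, String> cache = ignite.getOrCreateCache(ccfg);

        cache.put(1, "new");

        // If this node is a backup for key 1, the read is served locally
        // and may still return the previous value.
        System.out.println("Read after write: " + cache.get(1));
    }
}
{code}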


On Tue, Oct 3, 2017 at 12:27 AM, Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Igniters,
>
> I noticed that combination of PRIMARY_SYNC mode and readFromBackup=true
> (both are default values BTW) introduces weird semantics when reading *on a
> backup node*. Basically, if I do put and then get for the same key in the
> same thread, I can get previous value. In my understanding, this happens
> because even local backup is updated asynchronously in this case.
>
> First of all, this is obviously confusing and would be considered as a bug
> by most of the users (I just updated the key with some value, why would I
> get another value when reading it?).
>
> Second of all, it seems that we send a network message from primary node to
> local backup, which doesn't make much sense to me and looks like
> unnecessary performance overhead.
>
> Is it possible to update local backup synchronously in this scenario?
>
> -Val
>



-- 
Vladislav Pyatkov

