[jira] [Created] (IGNITE-7698) Page read during replacement should be outside of segment write lock

2018-02-13 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-7698:


 Summary: Page read during replacement should be outside of segment 
write lock
 Key: IGNITE-7698
 URL: https://issues.apache.org/jira/browse/IGNITE-7698
 Project: Ignite
  Issue Type: Improvement
  Components: persistence
Affects Versions: 2.1
Reporter: Alexey Goncharuk
 Fix For: 2.5


When a page is acquired and needs to be read from disk, we currently read it 
while holding the segment write lock, which blocks other threads from acquiring 
pages that are already in memory.

This can easily be avoided: once the RW lock of the page being loaded is 
initialized, we can immediately acquire its write lock - no deadlock can happen 
here. Afterwards, we can release the segment write lock and read the page from 
disk.

The change seems to be very local.
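The proposed handoff can be sketched with plain JDK locks. This is an illustrative sketch with hypothetical names, not the actual Ignite implementation; the key point is that the page write lock is taken while the segment lock is still held, so no other thread can observe the half-loaded page, and the slow disk read then happens outside the segment lock:

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class PageLoadSketch {
    private final ReentrantLock segmentLock = new ReentrantLock();

    public byte[] acquirePage(long pageId) {
        ReadWriteLock pageLock = new ReentrantReadWriteLock();

        segmentLock.lock();
        try {
            // Register the page entry in the segment's table here (omitted),
            // then take its write lock before any other thread can see it.
            // No deadlock: the page lock is brand new and uncontended.
            pageLock.writeLock().lock();
        }
        finally {
            segmentLock.unlock(); // other threads may now acquire resident pages
        }

        try {
            return readFromDisk(pageId); // slow I/O outside the segment lock
        }
        finally {
            pageLock.writeLock().unlock();
        }
    }

    private byte[] readFromDisk(long pageId) {
        return new byte[] {(byte)pageId}; // stand-in for a real disk read
    }
}
```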



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7697) Update maven-javadoc-plugin version

2018-02-13 Thread Peter Ivanov (JIRA)
Peter Ivanov created IGNITE-7697:


 Summary: Update maven-javadoc-plugin version
 Key: IGNITE-7697
 URL: https://issues.apache.org/jira/browse/IGNITE-7697
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.4
Reporter: Peter Ivanov
Assignee: Peter Ivanov
 Fix For: 2.5


Update the version of {{maven-javadoc-plugin}} in order to overcome the 
following error:
{code}
javadoc: warning - Error fetching URL: 
http://hadoop.apache.org/docs/current/api/
{code}
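The fix would presumably be a version bump in the parent POM; a hypothetical snippet (the exact target version is not stated in the ticket - 3.0.0 was current at the time - and the {{detectOfflineLinks}} knob is shown only as one possible way to tolerate unreachable external javadoc URLs):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-javadoc-plugin</artifactId>
  <version>3.0.0</version>
  <configuration>
    <!-- Do not probe external link targets; avoids failures when
         http://hadoop.apache.org/docs/current/api/ is unreachable. -->
    <detectOfflineLinks>false</detectOfflineLinks>
  </configuration>
</plugin>
```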



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7696) Deadlock at GridDhtAtomicCache.lockEntries called through GridDhtAtomicCache.updateAllAsyncInternal

2018-02-13 Thread Sadayuki Furuhashi (JIRA)
Sadayuki Furuhashi created IGNITE-7696:
--

 Summary: Deadlock at GridDhtAtomicCache.lockEntries called through 
GridDhtAtomicCache.updateAllAsyncInternal
 Key: IGNITE-7696
 URL: https://issues.apache.org/jira/browse/IGNITE-7696
 Project: Ignite
  Issue Type: Bug
  Components: cache
Affects Versions: 2.3
 Environment: * Ignite 2.3
 * OpenJDK version "1.8.0_151"
 * Linux 4.4.0
Reporter: Sadayuki Furuhashi


We observed that all nodes in a cluster completely stall and put/get/remove 
operations on a cache block forever. When this happens, we can see the 
following deadlock report in the thread dump:
{code:java}
2018-02-14_04:21:33.84410 Found one Java-level deadlock:
2018-02-14_04:21:33.84410 =
2018-02-14_04:21:33.84411 "sys-#41%IgniteManager%":
2018-02-14_04:21:33.84411 waiting to lock monitor 0x7f6d5e41a558 (object 
0x000781083ef0, a 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry),
2018-02-14_04:21:33.84411 which is held by "sys-stripe-5-#6%IgniteManager%"
2018-02-14_04:21:33.84412 "sys-stripe-5-#6%IgniteManager%":
2018-02-14_04:21:33.84412 waiting to lock monitor 0x7f6d5e41de68 (object 
0x000781083e70, a 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry)
2018-02-14_04:21:33.84412 in JNI, which is held by 
"sys-stripe-2-#3%IgniteManager%"
2018-02-14_04:21:33.84412 "sys-stripe-2-#3%IgniteManager%":
2018-02-14_04:21:33.84413 waiting to lock monitor 0x7f6d5e41a558 (object 
0x000781083ef0, a 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry)
2018-02-14_04:21:33.84413 in JNI, which is held by 
"sys-stripe-5-#6%IgniteManager%"
2018-02-14_04:21:33.84413
2018-02-14_04:21:33.84414 Java stack information for the threads listed above:
2018-02-14_04:21:33.84414 ===
2018-02-14_04:21:33.84416 "sys-#41%IgniteManager%":
2018-02-14_04:21:33.84416 at 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry.markObsoleteVersion(GridCacheMapEntry.java:2153)
2018-02-14_04:21:33.84417 - waiting to lock <0x000781083ef0> (a 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry)
2018-02-14_04:21:33.84417 at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.removeVersionedEntry(GridDhtLocalPartition.java:368)
2018-02-14_04:21:33.84418 at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.cleanupRemoveQueue(GridDhtLocalPartition.java:392)
2018-02-14_04:21:33.84418 at 
org.apache.ignite.internal.processors.cache.GridCacheProcessor$RemovedItemsCleanupTask$1.run(GridCacheProcessor.java:4051)
2018-02-14_04:21:33.84418 at 
org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6687)
2018-02-14_04:21:33.84419 at 
org.apache.ignite.internal.processors.closure.GridClosureProcessor$1.body(GridClosureProcessor.java:827)
2018-02-14_04:21:33.84419 at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
2018-02-14_04:21:33.84419 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2018-02-14_04:21:33.84420 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2018-02-14_04:21:33.84421 at java.lang.Thread.run(Thread.java:748)
2018-02-14_04:21:33.84421 "sys-stripe-5-#6%IgniteManager%":
2018-02-14_04:21:33.84421 at sun.misc.Unsafe.monitorEnter(Native Method)
2018-02-14_04:21:33.84421 at 
org.apache.ignite.internal.util.GridUnsafe.monitorEnter(GridUnsafe.java:1207)
2018-02-14_04:21:33.84422 at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.lockEntries(GridDhtAtomicCache.java:2848)
2018-02-14_04:21:33.84422 at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(GridDhtAtomicCache.java:1707)
2018-02-14_04:21:33.84423 at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(GridDhtAtomicCache.java:1629)
2018-02-14_04:21:33.84423 at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.processNearAtomicUpdateRequest(GridDhtAtomicCache.java:3056)
2018-02-14_04:21:33.84424 at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$400(GridDhtAtomicCache.java:131)
2018-02-14_04:21:33.84424 at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:267)
2018-02-14_04:21:33.84425 at 
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$5.apply(GridDhtAtomicCache.java:262)
2018-02-14_04:21:33.84425 at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1060)
2018-02-14_04:21:33.84425 at 
{code}
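The cycle above is the textbook pattern of two threads taking the same pair of entry monitors in opposite order. A generic remedy - shown here as a plain-JDK sketch of the technique, not a claim about how Ignite should fix this particular bug - is to have every code path acquire entry monitors in one canonical order:

```java
import java.util.Arrays;
import java.util.Comparator;

public class OrderedLocking {
    /**
     * Runs the action while holding the monitors of all entries, acquired in
     * a canonical order so that two threads locking overlapping entry sets
     * cannot form a circular wait. Identity hash is used as a stable (if
     * imperfect - collisions are possible) global ordering key.
     */
    public static void withLocksSorted(Object[] entries, Runnable action) {
        Object[] sorted = entries.clone();
        Arrays.sort(sorted, Comparator.comparingInt(System::identityHashCode));
        lockRecursively(sorted, 0, action);
    }

    private static void lockRecursively(Object[] sorted, int i, Runnable action) {
        if (i == sorted.length) {
            action.run();
            return;
        }
        synchronized (sorted[i]) {
            lockRecursively(sorted, i + 1, action);
        }
    }
}
```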

Re: API to enlist running user tasks

2018-02-13 Thread Nikolay Izhikov
Dmitriy,

I was thinking of the kinds of tasks that require custom user code to be 
executed inside Ignite.

Your suggestion makes sense to me as well. Let's include SQL queries.
Should we also include ScanQuery and TextQuery in this API?
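To make the proposal concrete, the JMX additions could take roughly the following shape. This is a hypothetical bean sketch using only the JDK's JMX machinery (all names and the stub data are invented for illustration; the real Ignite beans would expose live registry data):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class UserTaskMonitoring {
    /** Hypothetical management interface for the "list continuous queries" part. */
    public interface ContinuousQueriesMXBean {
        String[] getListenerClassNames();
    }

    static class ContinuousQueries implements ContinuousQueriesMXBean {
        @Override public String[] getListenerClassNames() {
            // Stub data; a real bean would read the CQ registry of the cache.
            return new String[] {"com.example.MyCacheListener"};
        }
    }

    /** Registers the bean and reads the attribute back through the MBean server. */
    public static int registeredListenerCount() {
        try {
            MBeanServer srv = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("org.example:type=ContinuousQueries");
            srv.registerMBean(new ContinuousQueries(), name);
            String[] names = (String[])srv.getAttribute(name, "ListenerClassNames");
            return names.length;
        }
        catch (Exception e) {
            return -1;
        }
    }
}
```

Analogous beans for compute tasks and jobs would expose the originating node ID, task class name, and the creation/start timestamps from the lists above.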

On Tue, 13/02/2018 at 13:27 -0800, Dmitriy Setrakyan wrote:
> Nikolay, how about regular SQL queries?
> 
> On Tue, Feb 13, 2018 at 6:31 AM, Nikolay Izhikov 
> wrote:
> 
> > Hello, Igniters.
> > 
> > We have some requests from users [1] to have ability to get list of all
> > running continuous queries.
> > I propose to implement such ability.
> > 
> > To implement it we have to extend our JMX beans to provide the following
> > information:
> > 
> > * List of continuous queries for cache:
> > * local listener node.
> > * listener class name.
> > * routine ID.
> > * other CQ parameters
> > * creation timestamp (?)
> > 
> > * List of running compute tasks for node:
> > * node ID task started from.
> > * task class name.
> > * other task parameters.
> > * creation timestamp (?)
> > * start timestamp (?)
> > 
> > * List of running  jobs for node:
> > * node ID task started from.
> > * task class name.
> > * other job parameters.
> > * creation timestamp (?)
> > * start timestamp (?)
> > 
> > I'm planning to file tickets to implement these features.
> > So, please, write if you have any objections.
> > 
> > [1] http://apache-ignite-developers.2346864.n4.nabble.
> > com/Re-List-of-running-Continuous-queries-or-CacheEntryListener-per-cache-
> > or-node-tp25526.html

signature.asc
Description: This is a digitally signed message part


Re: API to enlist running user tasks

2018-02-13 Thread Nikolay Izhikov
Hello, Denis.

> Or is it just a separate task?

Yes, it is a separate task.

I propose to have a different method for each type of long running user task:

* continuous queries
* compute tasks
* compute jobs

On Tue, 13/02/2018 at 17:23 -0800, Denis Magda wrote:
> Sounds like a right addition to our metrics APIs. 
> 
> However, I’m not sure how this part is related to continuous queries:
> 
> > * List of running compute tasks for node: 
> >* node ID task started from. 
> >* task class name. 
> >* other task parameters. 
> >* creation timestamp (?) 
> >* start timestamp (?) 
> > 
> > * List of running  jobs for node: 
> >* node ID task started from. 
> >* task class name. 
> >* other job parameters. 
> >* creation timestamp (?)
> >* start timestamp (?) 
> 
> Or is it just a separate task?
> 
> —
> Denis
> 
> > On Feb 13, 2018, at 6:31 AM, Nikolay Izhikov  wrote:
> > 
> > Hello, Igniters. 
> > 
> > We have some requests from users [1] to have ability to get list of all 
> > running continuous queries.
> > I propose to implement such ability. 
> > 
> > To implement it we have to extend our JMX beans to provide the following 
> > information: 
> > 
> > * List of continuous queries for cache:
> >* local listener node. 
> >* listener class name. 
> >* routine ID. 
> >* other CQ parameters 
> >* creation timestamp (?) 
> > 
> > * List of running compute tasks for node: 
> >* node ID task started from. 
> >* task class name. 
> >* other task parameters. 
> >* creation timestamp (?) 
> >* start timestamp (?) 
> > 
> > * List of running  jobs for node: 
> >* node ID task started from. 
> >* task class name. 
> >* other job parameters. 
> >* creation timestamp (?)
> >* start timestamp (?) 
> > 
> > I'm planning to file tickets to implement these features. 
> > So, please, write if you have any objections. 
> > 
> > [1] 
> > http://apache-ignite-developers.2346864.n4.nabble.com/Re-List-of-running-Continuous-queries-or-CacheEntryListener-per-cache-or-node-tp25526.html
> >  
> > 

signature.asc
Description: This is a digitally signed message part


[GitHub] ignite pull request #3519: IGNITE-7693: Printing out session ids on joining ...

2018-02-13 Thread shroman
GitHub user shroman opened a pull request:

https://github.com/apache/ignite/pull/3519

IGNITE-7693: Printing out session ids on joining via ZookeeperDiscoverySpi.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shroman/ignite IGNITE-7693

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3519.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3519


commit fce93839db7ddc303ca5ced3cb71e6603d4b1adc
Author: shroman 
Date:   2018-02-14T02:50:29Z

IGNITE-7693: Printing out session ids on joining via ZookeeperDiscoverySpi.




---


Re: Write ahead log and early eviction of new elements

2018-02-13 Thread Denis Magda
My guess is that before the entry (element) gets evicted, it will be forcibly 
synced to the respective partition file on disk, so that it can be read from 
there later.

Ignite persistence experts please confirm my understanding.

—
Denis

> On Feb 13, 2018, at 1:28 PM, Raymond Wilson  
> wrote:
> 
> I have a scenario I would like to validate when using Ignite Persistence.
>  
> I understand when I add an element to a cache that element is serialized, 
> placed into the local memory for the cache on that server and then placed 
> into the WAL pending checkpointing (merging into the persistence store).
>  
> What happens if the newly added element is evicted and then re-read from the 
> cache by the client before the next checkpoint occurs?
>  
> Thanks,
> Raymond.
>  



Re: API to enlist running user tasks

2018-02-13 Thread Denis Magda
Sounds like a right addition to our metrics APIs. 

However, I’m not sure how this part is related to continuous queries:

> * List of running compute tasks for node: 
>* node ID task started from. 
>* task class name. 
>* other task parameters. 
>* creation timestamp (?) 
>* start timestamp (?) 
> 
> * List of running  jobs for node: 
>* node ID task started from. 
>* task class name. 
>* other job parameters. 
>* creation timestamp (?)
>* start timestamp (?) 

Or is it just a separate task?

—
Denis

> On Feb 13, 2018, at 6:31 AM, Nikolay Izhikov  wrote:
> 
> Hello, Igniters. 
> 
> We have some requests from users [1] to have ability to get list of all 
> running continuous queries.
> I propose to implement such ability. 
> 
> To implement it we have to extend our JMX beans to provide the following 
> information: 
> 
> * List of continuous queries for cache:
>* local listener node. 
>* listener class name. 
>* routine ID. 
>* other CQ parameters 
>* creation timestamp (?) 
> 
> * List of running compute tasks for node: 
>* node ID task started from. 
>* task class name. 
>* other task parameters. 
>* creation timestamp (?) 
>* start timestamp (?) 
> 
> * List of running  jobs for node: 
>* node ID task started from. 
>* task class name. 
>* other job parameters. 
>* creation timestamp (?)
>* start timestamp (?) 
> 
> I'm planning to file tickets to implement these features. 
> So, please, write if you have any objections. 
> 
> [1] 
> http://apache-ignite-developers.2346864.n4.nabble.com/Re-List-of-running-Continuous-queries-or-CacheEntryListener-per-cache-or-node-tp25526.html
>  
> 


[GitHub] ignite pull request #3517: IGNITE-7588 Deprecate CacheLocalStore annotation

2018-02-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/ignite/pull/3517


---


Write ahead log and early eviction of new elements

2018-02-13 Thread Raymond Wilson
I have a scenario I would like to validate when using Ignite Persistence.

I understand that when I add an element to a cache, that element is serialized,
placed into the local memory for the cache on that server, and then placed
into the WAL pending checkpointing (merging into the persistence store).

What happens if the newly added element is evicted and then re-read from
the cache by the client before the next checkpoint occurs?



Thanks,

Raymond.


Re: WAL Archive Issue

2018-02-13 Thread Dmitry Pavlov
I see, it seems the subgoal of 'gaining a predictable size' can be achieved
with the following options:
 - the https://issues.apache.org/jira/browse/IGNITE-6552 implementation (in
the variant of '...WAL history size in time units and maximum size in GBytes'
- here we should probably change the description or create a 2nd issue),
 - a no-archiver mode (segments can still be deleted, but in the same
directory they were written to) - maximum performance on ext* filesystems,
 - applying the compressor to segments older than 1 completed checkpoint -
saves space.

Is it necessary to store data we can safely remove?

Or maybe Ignite should handle this by itself and delete unnecessary
segments when little space is left on the device, just as Linux shrinks the
page cache in memory when there is no free RAM left.
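For reference, the size knobs under discussion might be wired up roughly like this. This is a hedged configuration sketch: setWalHistorySize is the existing checkpoint-based limit, while the byte-based cap assumes IGNITE-6552 lands as something like setMaxWalArchiveSize, and WAL compaction is assumed to stay configurable:

```java
IgniteConfiguration cfg = new IgniteConfiguration();

DataStorageConfiguration storage = new DataStorageConfiguration();
storage.setWalHistorySize(20);                         // checkpoint-count limit (today)
storage.setMaxWalArchiveSize(4L * 1024 * 1024 * 1024); // assumed byte-based cap: 4 GB
storage.setWalCompactionEnabled(true);                 // compress checkpointed segments

cfg.setDataStorageConfiguration(storage);
```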

On Tue, 13 Feb 2018 at 23:32, Ivan Rakov wrote:

> As far as I understand, the idea is WAL archive with predictable size
> ("N checkpoints" is not predictable size), which can be safely removed
> (e.g. if free disk space is urgently needed) without losing crash recovery.
>
> No-archiver mode makes sense as well - it should be faster than current
> mode (at least, on filesystems different from XFS). It will be useful
> for users who has lots of disk space and want to gain maximum throughput.
>
> Best Regards,
> Ivan Rakov
>
> On 13.02.2018 23:14, Dmitry Pavlov wrote:
> > Hi, I didn't get the point why it may be required to separate WAL work,
> WAL
> > uncheckpointed archive (some work outside segment rotation) and
> > checkpointed archive (which is better to be compressed using Ignite new
> > feature - WAL compressor).
> >
> > Please consider new no-archiver mode implemented recently.
> >
> > If archive folder confuses end user, grid admin may set up this mode (all
> > segments is placed in 1 directory) instead of introducing folders.
> >
> >
> On Tue, 13 Feb 2018 at 22:11, Ivan Rakov wrote:
> >
> >> I think, I got the point now.
> >> There's no need to copy files from "temp" to "archive" dir - we can just
> >> move them, which is a constant-time operation.
> >> Makes sense.
> >>
> >> Change is quite complex (we need to synchronize all movings thoroughly
> >> to avoid ruining existing WAL read iterators), but feasible.
> >>
> >> Best Regards,
> >> Ivan Rakov
> >>
> >>
> >> On 13.02.2018 22:06, Ivan Rakov wrote:
> >>> Yakov,
> >>>
> >>> This will work. However, I expect performance degradation with this
> >>> change. Disk storage has a limited number of I/O operations per second
> >>> on hardware level. List of already existing disk I/O activities
> >>> (writing to WAL work dir, copying from WAL work dir to WAL archive
> >>> dir, writing partition files during checkpoint) will be updated with a
> >>> new one - copying from WAL work dir to temp dir.
> >>>
> >>> Best Regards,
> >>> Ivan Rakov
> >>>
> >>> On 13.02.2018 21:35, Yakov Zhdanov wrote:
>  Ivan,
> 
>  I do not want to create new files. As far as I know, now we copy
>  segments
>  to archive dir before they get checkpointed. What I suggest is to
>  copy them
>  to a temp dir under wal directory and then move to archive. In my
>  understanding at the time we copy the files to a temp folder all
>  changes to
>  them are already fsynced.
> 
>  Correct?
> 
>  Yakov Zhdanov,
>  www.gridgain.com
> 
>  2018-02-13 21:29 GMT+03:00 Ivan Rakov :
> 
> > Yakov,
> >
> > I see the only one problem with your suggestion - number of
> > "uncheckpointed" segments is potentially unlimited.
> > Right now we have limited number (10) of file segments with immutable
> > names in WAL "work" directory. We have to keep this approach due to
> > known
> > bug in XFS - fsync time is nearly twice bigger for recently created
> > files.
> >
> > Best Regards,
> > Ivan Rakov
> >
> >
> > On 13.02.2018 21:22, Yakov Zhdanov wrote:
> >
> >> I meant we still will be copying segment once and then will be
> >> moving it
> >> to
> >> archive which should not affect file system much.
> >>
> >> Thoughts?
> >>
> >> --Yakov
> >>
> >> 2018-02-13 21:19 GMT+03:00 Yakov Zhdanov :
> >>
> >> Alex,
> >>> I remember we had some confusing behavior for WAL archive when
> >>> archived
> >>> segments were required for successful recovery.
> >>>
> >>> Is issue still present?
> >>>
> >>> If yes, what if we copy "uncheckpointed" segments to a directory
> >>> under
> >>> wal
> >>> directory and then move the segments to archive after checkpoint?
> >>> Will
> >>> this
> >>> work?
> >>>
> >>> Thanks!
> >>>
> >>> --Yakov
> >>>
> >>>
> >>
>
>


Re: WAL Archive Issue

2018-02-13 Thread Ivan Rakov
As far as I understand, the idea is a WAL archive with a predictable size 
("N checkpoints" is not a predictable size), which can be safely removed 
(e.g. if free disk space is urgently needed) without losing crash recovery.


No-archiver mode makes sense as well - it should be faster than the current 
mode (at least, on filesystems other than XFS). It will be useful 
for users who have lots of disk space and want maximum throughput.


Best Regards,
Ivan Rakov

On 13.02.2018 23:14, Dmitry Pavlov wrote:

Hi, I didn't get the point why it may be required to separate WAL work, WAL
uncheckpointed archive (some work outside segment rotation) and
checkpointed archive (which is better to be compressed using Ignite new
feature - WAL compressor).

Please consider new no-archiver mode implemented recently.

If archive folder confuses end user, grid admin may set up this mode (all
segments is placed in 1 directory) instead of introducing folders.


On Tue, 13 Feb 2018 at 22:11, Ivan Rakov wrote:


I think, I got the point now.
There's no need to copy files from "temp" to "archive" dir - we can just
move them, which is a constant-time operation.
Makes sense.

Change is quite complex (we need to synchronize all movings thoroughly
to avoid ruining existing WAL read iterators), but feasible.

Best Regards,
Ivan Rakov


On 13.02.2018 22:06, Ivan Rakov wrote:

Yakov,

This will work. However, I expect performance degradation with this
change. Disk storage has a limited number of I/O operations per second
on hardware level. List of already existing disk I/O activities
(writing to WAL work dir, copying from WAL work dir to WAL archive
dir, writing partition files during checkpoint) will be updated with a
new one - copying from WAL work dir to temp dir.

Best Regards,
Ivan Rakov

On 13.02.2018 21:35, Yakov Zhdanov wrote:

Ivan,

I do not want to create new files. As far as I know, now we copy
segments
to archive dir before they get checkpointed. What I suggest is to
copy them
to a temp dir under wal directory and then move to archive. In my
understanding at the time we copy the files to a temp folder all
changes to
them are already fsynced.

Correct?

Yakov Zhdanov,
www.gridgain.com

2018-02-13 21:29 GMT+03:00 Ivan Rakov :


Yakov,

I see the only one problem with your suggestion - number of
"uncheckpointed" segments is potentially unlimited.
Right now we have limited number (10) of file segments with immutable
names in WAL "work" directory. We have to keep this approach due to
known
bug in XFS - fsync time is nearly twice bigger for recently created
files.

Best Regards,
Ivan Rakov


On 13.02.2018 21:22, Yakov Zhdanov wrote:


I meant we still will be copying segment once and then will be
moving it
to
archive which should not affect file system much.

Thoughts?

--Yakov

2018-02-13 21:19 GMT+03:00 Yakov Zhdanov :

Alex,

I remember we had some confusing behavior for WAL archive when
archived
segments were required for successful recovery.

Is issue still present?

If yes, what if we copy "uncheckpointed" segments to a directory
under
wal
directory and then move the segments to archive after checkpoint?
Will
this
work?

Thanks!

--Yakov








Re: WAL Archive Issue

2018-02-13 Thread Dmitry Pavlov
Hi, I didn't get the point of why it may be required to separate the WAL work
directory, the uncheckpointed WAL archive (some work outside segment rotation),
and the checkpointed archive (which is better compressed using the new Ignite
feature - the WAL compressor).

Please consider the new no-archiver mode implemented recently.

If the archive folder confuses end users, the grid admin may set up this mode
(all segments are placed in one directory) instead of introducing folders.


On Tue, 13 Feb 2018 at 22:11, Ivan Rakov wrote:

> I think, I got the point now.
> There's no need to copy files from "temp" to "archive" dir - we can just
> move them, which is a constant-time operation.
> Makes sense.
>
> Change is quite complex (we need to synchronize all movings thoroughly
> to avoid ruining existing WAL read iterators), but feasible.
>
> Best Regards,
> Ivan Rakov
>
>
> On 13.02.2018 22:06, Ivan Rakov wrote:
> > Yakov,
> >
> > This will work. However, I expect performance degradation with this
> > change. Disk storage has a limited number of I/O operations per second
> > on hardware level. List of already existing disk I/O activities
> > (writing to WAL work dir, copying from WAL work dir to WAL archive
> > dir, writing partition files during checkpoint) will be updated with a
> > new one - copying from WAL work dir to temp dir.
> >
> > Best Regards,
> > Ivan Rakov
> >
> > On 13.02.2018 21:35, Yakov Zhdanov wrote:
> >> Ivan,
> >>
> >> I do not want to create new files. As far as I know, now we copy
> >> segments
> >> to archive dir before they get checkpointed. What I suggest is to
> >> copy them
> >> to a temp dir under wal directory and then move to archive. In my
> >> understanding at the time we copy the files to a temp folder all
> >> changes to
> >> them are already fsynced.
> >>
> >> Correct?
> >>
> >> Yakov Zhdanov,
> >> www.gridgain.com
> >>
> >> 2018-02-13 21:29 GMT+03:00 Ivan Rakov :
> >>
> >>> Yakov,
> >>>
> >>> I see the only one problem with your suggestion - number of
> >>> "uncheckpointed" segments is potentially unlimited.
> >>> Right now we have limited number (10) of file segments with immutable
> >>> names in WAL "work" directory. We have to keep this approach due to
> >>> known
> >>> bug in XFS - fsync time is nearly twice bigger for recently created
> >>> files.
> >>>
> >>> Best Regards,
> >>> Ivan Rakov
> >>>
> >>>
> >>> On 13.02.2018 21:22, Yakov Zhdanov wrote:
> >>>
>  I meant we still will be copying segment once and then will be
>  moving it
>  to
>  archive which should not affect file system much.
> 
>  Thoughts?
> 
>  --Yakov
> 
>  2018-02-13 21:19 GMT+03:00 Yakov Zhdanov :
> 
>  Alex,
> > I remember we had some confusing behavior for WAL archive when
> > archived
> > segments were required for successful recovery.
> >
> > Is issue still present?
> >
> > If yes, what if we copy "uncheckpointed" segments to a directory
> > under
> > wal
> > directory and then move the segments to archive after checkpoint?
> > Will
> > this
> > work?
> >
> > Thanks!
> >
> > --Yakov
> >
> >
> >
>
>


[GitHub] ignite pull request #3518: IGNITE-7690 Move shared memory suite to Ignite Ba...

2018-02-13 Thread glukos
GitHub user glukos opened a pull request:

https://github.com/apache/ignite/pull/3518

IGNITE-7690 Move shared memory suite to Ignite Basic 2



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-7690

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3518.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3518


commit b60356a440e929800da6fd654a179a1dfb69
Author: Ivan Rakov 
Date:   2018-02-13T20:02:04Z

IGNITE-7690 Move shared memory suite 
(IpcSharedMemoryCrashDetectionSelfTest) to Ignite Basic 2




---


Re: [SparkDataFrame] Query Optimization. Prototype

2018-02-13 Thread Valentin Kulichenko
Nikolay,

Non-collocated joins should be used only if there is no way to collocate.
Please read here for more info: https://apacheignite-sql.readme.io/docs/
distributed-joins

As for limitations, I think Vladimir is talking more about syntax related
stuff, i.e. what we do or don't support from SQL compliance perspective. We
depend on H2 here and therefore don't have full knowledge, so I understand
that it takes time to test and document everything. But requirement to have
an index for a non-collocated join is introduced by Ignite and, if it's an
expected one, should be documented. *Vladimir*, can you please comment on
this?

In general, I don't see a reason to exclude anything (especially joins)
from the PR. Please finalize the change and pass it to me for review.

-Val
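The flag in question can be set on the query object; a minimal sketch (assumes ignite-core on the classpath, and table/column names are taken from the stack trace quoted below):

```java
// Non-collocated joins must be requested explicitly; without this flag a
// join over non-collocated data can silently return incomplete results.
SqlFieldsQuery qry = new SqlFieldsQuery(
    "SELECT t1.id, t2.val2 FROM JT1 t1 JOIN JT2 t2 ON t1.val1 = t2.val2");

qry.setDistributedJoins(true);
```

Note that, per the exception below, the distributed-join engine additionally requires an index on the join condition columns.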

On Tue, Feb 13, 2018 at 11:10 AM, Nikolay Izhikov 
wrote:

> Valentin,
>
> > Looks like this is because you enabled non-collocated joins
>
> But non-collocated joins is only way to be sure that join returns correct
> results.
> So in my case it's OK to enable them.
> Am I right?
>
> > do we have this documented somewhere?
>
> I asked about that in a previous mail.
> Vladimir Ozerov gave me an answer [1], which I quote for you:
>
> > Unfortunately, at this moment we do not have complete list of all
> restrictions on our joins, because a lot of work is delegated to H2.
> > In some unsupported scenarios we throw an exception.
> > In other cases we return incorrect results silently (e.g. if you do not
> co-locate data and forgot to set "distributed joins" flag).
> > We have a plan to perform excessive testing of joins (both co-located
> and distributed) and list all known limitations.
> > This would require writing a lot of unit tests to cover various
> scenarios.
> > I think we will have this information in a matter of 1-2 months.
>
> So the answer is no, we have no documentation for join limitations.
>
> That's why I propose to exclude join optimization from my PR until:
>
> 1. We create documentation for all join limitations.
> 2. We create a way to check whether a certain join satisfies the current
> limitations.
>
> [1] http://apache-ignite-developers.2346864.n4.nabble.com/
> SparkDataFrame-Query-Optimization-Prototype-tp26249p26361.html
>
On Tue, 13/02/2018 at 09:55 -0800, Valentin Kulichenko wrote:
> > Nikolay,
> >
> > Looks like this is because you enabled non-collocated joins. I was not
> > aware of this limitation though, do we have this documented somewhere?
> >
> > -Val
> >
> > On Tue, Feb 13, 2018 at 8:21 AM, Nikolay Izhikov 
> > wrote:
> >
> > > Val,
> > >
> > > Source code check: https://github.com/apache/igni
> te/blob/master/modules/
> > > indexing/src/main/java/org/apache/ignite/internal/processors
> /query/h2/opt/
> > > GridH2CollocationModel.java#L382
> > >
> > > Stack trace:
> > >
> > > javax.cache.CacheException: Failed to prepare distributed join query:
> join
> > > condition does not use index [joinedCache=SQL_PUBLIC_JT2, plan=SELECT
> > > __Z0.ID AS __C0_0,
> > > __Z0.VAL1 AS __C0_1,
> > > __Z1.ID AS __C0_2,
> > > __Z1.VAL2 AS __C0_3
> > > FROM PUBLIC.JT1 __Z0
> > > /* PUBLIC.JT1.__SCAN_ */
> > > INNER JOIN PUBLIC.JT2 __Z1
> > > /* batched:broadcast PUBLIC.JT2.__SCAN_ */
> > > ON 1=1
> > > WHERE __Z0.VAL1 = __Z1.VAL2]
> > > at org.apache.ignite.internal.processors.query.h2.opt.
> > > GridH2CollocationModel.joinedWithCollocated(GridH2Collocatio
> nModel.java:
> > > 384)
> > > at org.apache.ignite.internal.processors.query.h2.opt.
> > > GridH2CollocationModel.calculate(GridH2CollocationModel.java:308)
> > > at org.apache.ignite.internal.processors.query.h2.opt.
> > > GridH2CollocationModel.type(GridH2CollocationModel.java:549)
> > > at org.apache.ignite.internal.processors.query.h2.opt.
> > > GridH2CollocationModel.calculate(GridH2CollocationModel.java:257)
> > > at org.apache.ignite.internal.processors.query.h2.opt.
> > > GridH2CollocationModel.type(GridH2CollocationModel.java:549)
> > > at org.apache.ignite.internal.processors.query.h2.opt.
> > > GridH2CollocationModel.isCollocated(GridH2CollocationModel.java:691)
> > > at org.apache.ignite.internal.processors.query.h2.sql.
> > > GridSqlQuerySplitter.split(GridSqlQuerySplitter.java:239)
> > > at org.apache.ignite.internal.processors.query.h2.
> > > IgniteH2Indexing.split(IgniteH2Indexing.java:1856)
> > > at org.apache.ignite.internal.processors.query.h2.
> > > IgniteH2Indexing.parseAndSplit(IgniteH2Indexing.java:1818)
> > > at org.apache.ignite.internal.processors.query.h2.
> > > IgniteH2Indexing.querySqlFields(IgniteH2Indexing.java:1569)
> > > at org.apache.ignite.internal.processors.query.
> > > GridQueryProcessor$4.applyx(GridQueryProcessor.java:2037)
> > > at org.apache.ignite.internal.processors.query.
> > > GridQueryProcessor$4.applyx(GridQueryProcessor.java:2032)
> > > at 

Re: WAL Archive Issue

2018-02-13 Thread Ivan Rakov

I think, I got the point now.
There's no need to copy files from "temp" to "archive" dir - we can just 
move them, which is a constant-time operation.

Makes sense.

Change is quite complex (we need to synchronize all movings thoroughly 
to avoid ruining existing WAL read iterators), but feasible.


Best Regards,
Ivan Rakov
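The copy-then-move idea relies on rename being a metadata-only operation within one filesystem; a plain-JDK sketch of the archive step (hypothetical helper, not Ignite code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SegmentMover {
    /**
     * Moves a fully-synced WAL segment into the archive directory.
     * ATOMIC_MOVE makes the rename constant-time and ensures readers never
     * observe a half-moved segment; it only works within one filesystem,
     * so the temp dir must live under the same mount as the archive.
     */
    public static Path moveToArchive(Path segment, Path archiveDir) throws IOException {
        Files.createDirectories(archiveDir);
        Path target = archiveDir.resolve(segment.getFileName());
        return Files.move(segment, target, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Self-contained demo: create a segment in a temp dir and archive it. */
    public static boolean demo() {
        try {
            Path tmp = Files.createTempDirectory("walsketch");
            Path seg = Files.write(tmp.resolve("0001.wal"), new byte[] {1});
            Path moved = moveToArchive(seg, tmp.resolve("archive"));
            return Files.exists(moved) && Files.notExists(seg);
        }
        catch (IOException e) {
            return false;
        }
    }
}
```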


On 13.02.2018 22:06, Ivan Rakov wrote:

Yakov,

This will work. However, I expect performance degradation with this 
change. Disk storage has a limited number of I/O operations per second 
on hardware level. List of already existing disk I/O activities 
(writing to WAL work dir, copying from WAL work dir to WAL archive 
dir, writing partition files during checkpoint) will be updated with a 
new one - copying from WAL work dir to temp dir.


Best Regards,
Ivan Rakov

On 13.02.2018 21:35, Yakov Zhdanov wrote:

Ivan,

I do not want to create new files. As far as I know, now we copy segments
to archive dir before they get checkpointed. What I suggest is to copy them
to a temp dir under wal directory and then move to archive. In my
understanding, at the time we copy the files to a temp folder all changes to
them are already fsynced.

Correct?

Yakov Zhdanov,
www.gridgain.com

2018-02-13 21:29 GMT+03:00 Ivan Rakov :


Yakov,

I see only one problem with your suggestion - the number of
"uncheckpointed" segments is potentially unlimited.
Right now we have a limited number (10) of file segments with immutable
names in the WAL "work" directory. We have to keep this approach due to a
known bug in XFS - fsync time is nearly twice as long for recently created
files.


Best Regards,
Ivan Rakov


On 13.02.2018 21:22, Yakov Zhdanov wrote:

I meant we will still be copying each segment once and then moving it to
archive, which should not affect the file system much.

Thoughts?

--Yakov

2018-02-13 21:19 GMT+03:00 Yakov Zhdanov :

Alex,

I remember we had some confusing behavior for WAL archive when archived
segments were required for successful recovery.

Is issue still present?

If yes, what if we copy "uncheckpointed" segments to a directory under wal
directory and then move the segments to archive after checkpoint? Will this
work?

Thanks!

--Yakov








Re: [SparkDataFrame] Query Optimization. Prototype

2018-02-13 Thread Nikolay Izhikov
Valentin,

> Looks like this is because you enabled non-collocated joins

But non-collocated joins are the only way to be sure that a join returns
correct results.
So in my case it's OK to enable them.
Am I right?

> do we have this documented somewhere?

I asked that in a previous mail.
Vladimir Ozerov gave me an answer [1], which I quote for you:

> Unfortunately, at this moment we do not have complete list of all 
> restrictions on our joins, because a lot of work is delegated to H2.
> In some unsupported scenarios we throw an exception.
> In other cases we return incorrect results silently (e.g. if you do not 
> co-locate data and forgot to set "distributed joins" flag).
> We have a plan to perform excessive testing of joins (both co-located and 
> distributed) and list all known limitations.
> This would require writing a lot of unit tests to cover various scenarios.
> I think we will have this information in a matter of 1-2 months.

So the answer is no, we have no documentation for join limitations.

That's why I propose to exclude join optimization from my PR until:

1. We create documentation for all join limitations.
2. We create a way to check whether a certain join satisfies the current limitations.

[1] 
http://apache-ignite-developers.2346864.n4.nabble.com/SparkDataFrame-Query-Optimization-Prototype-tp26249p26361.html
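For context, the "distributed joins" flag discussed in this thread is set per query through the public API. A minimal sketch, assuming a running Ignite node and a hypothetical cache name and schema (the SQL and names are illustrative, not taken from the PR):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class DistributedJoinExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Without co-located data this flag must be set explicitly;
            // otherwise the join may silently return incorrect results.
            SqlFieldsQuery qry = new SqlFieldsQuery(
                "SELECT t1.id, t2.val2 FROM JT1 t1 JOIN JT2 t2 ON t1.val1 = t2.val2")
                .setDistributedJoins(true);

            ignite.cache("jt1").query(qry).getAll();
        }
    }
}
```

Even with the flag set, the query can still fail with "join condition does not use index" (as in the stack trace above) unless an index exists on the joined column.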

On Tue, 13/02/2018 at 09:55 -0800, Valentin Kulichenko wrote:
> Nikolay,
> 
> Looks like this is because you enabled non-collocated joins. I was not
> aware of this limitation though, do we have this documented somewhere?
> 
> -Val
> 
> On Tue, Feb 13, 2018 at 8:21 AM, Nikolay Izhikov 
> wrote:
> 
> > Val,
> > 
> > Source code check: https://github.com/apache/ignite/blob/master/modules/
> > indexing/src/main/java/org/apache/ignite/internal/processors/query/h2/opt/
> > GridH2CollocationModel.java#L382
> > 
> > Stack trace:
> > 
> > javax.cache.CacheException: Failed to prepare distributed join query: join
> > condition does not use index [joinedCache=SQL_PUBLIC_JT2, plan=SELECT
> > __Z0.ID AS __C0_0,
> > __Z0.VAL1 AS __C0_1,
> > __Z1.ID AS __C0_2,
> > __Z1.VAL2 AS __C0_3
> > FROM PUBLIC.JT1 __Z0
> > /* PUBLIC.JT1.__SCAN_ */
> > INNER JOIN PUBLIC.JT2 __Z1
> > /* batched:broadcast PUBLIC.JT2.__SCAN_ */
> > ON 1=1
> > WHERE __Z0.VAL1 = __Z1.VAL2]
> > at org.apache.ignite.internal.processors.query.h2.opt.
> > GridH2CollocationModel.joinedWithCollocated(GridH2CollocationModel.java:
> > 384)
> > at org.apache.ignite.internal.processors.query.h2.opt.
> > GridH2CollocationModel.calculate(GridH2CollocationModel.java:308)
> > at org.apache.ignite.internal.processors.query.h2.opt.
> > GridH2CollocationModel.type(GridH2CollocationModel.java:549)
> > at org.apache.ignite.internal.processors.query.h2.opt.
> > GridH2CollocationModel.calculate(GridH2CollocationModel.java:257)
> > at org.apache.ignite.internal.processors.query.h2.opt.
> > GridH2CollocationModel.type(GridH2CollocationModel.java:549)
> > at org.apache.ignite.internal.processors.query.h2.opt.
> > GridH2CollocationModel.isCollocated(GridH2CollocationModel.java:691)
> > at org.apache.ignite.internal.processors.query.h2.sql.
> > GridSqlQuerySplitter.split(GridSqlQuerySplitter.java:239)
> > at org.apache.ignite.internal.processors.query.h2.
> > IgniteH2Indexing.split(IgniteH2Indexing.java:1856)
> > at org.apache.ignite.internal.processors.query.h2.
> > IgniteH2Indexing.parseAndSplit(IgniteH2Indexing.java:1818)
> > at org.apache.ignite.internal.processors.query.h2.
> > IgniteH2Indexing.querySqlFields(IgniteH2Indexing.java:1569)
> > at org.apache.ignite.internal.processors.query.
> > GridQueryProcessor$4.applyx(GridQueryProcessor.java:2037)
> > at org.apache.ignite.internal.processors.query.
> > GridQueryProcessor$4.applyx(GridQueryProcessor.java:2032)
> > at org.apache.ignite.internal.util.lang.IgniteOutClosureX.
> > apply(IgniteOutClosureX.java:36)
> > at org.apache.ignite.internal.processors.query.GridQueryProcessor.
> > executeQuery(GridQueryProcessor.java:2553)
> > at org.apache.ignite.internal.processors.query.GridQueryProcessor.
> > querySqlFields(GridQueryProcessor.java:2046)
> > at org.apache.ignite.internal.processors.cache.
> > IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:664)
> > at org.apache.ignite.internal.processors.cache.
> > IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:615)
> > at org.apache.ignite.internal.processors.cache.
> > GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:382)
> > at org.apache.ignite.spark.JoinTestSpec.execSQL(
> > JoinTestSpec.scala:63)
> > 
> > 
> > On Tue, 13/02/2018 at 08:12 -0800, Valentin Kulichenko wrote:
> > > Nikolay,
> > > 
> > > This doesn't make sense to me. Not having an index should not cause
> > 
> > query to fail. What is the exception?
> > > 
> > > -Val
> > > 
> > > On Tue, Feb 

Re: WAL Archive Issue

2018-02-13 Thread Ivan Rakov

Yakov,

This will work. However, I expect performance degradation with this 
change. Disk storage has a limited number of I/O operations per second 
on hardware level. List of already existing disk I/O activities (writing 
to WAL work dir, copying from WAL work dir to WAL archive dir, writing 
partition files during checkpoint) will be updated with a new one - 
copying from WAL work dir to temp dir.


Best Regards,
Ivan Rakov

On 13.02.2018 21:35, Yakov Zhdanov wrote:

Ivan,

I do not want to create new files. As far as I know, now we copy segments
to archive dir before they get checkpointed. What I suggest is to copy them
to a temp dir under wal directory and then move to archive. In my
understanding at the time we copy the files to a temp folder all changes to
them are already fsynced.

Correct?

Yakov Zhdanov,
www.gridgain.com

2018-02-13 21:29 GMT+03:00 Ivan Rakov :


Yakov,

I see only one problem with your suggestion - the number of
"uncheckpointed" segments is potentially unlimited.
Right now we have a limited number (10) of file segments with immutable
names in the WAL "work" directory. We have to keep this approach due to a known
bug in XFS - fsync time is nearly twice as long for recently created files.

Best Regards,
Ivan Rakov


On 13.02.2018 21:22, Yakov Zhdanov wrote:


I meant we will still be copying each segment once and then moving it to
archive, which should not affect the file system much.

Thoughts?

--Yakov

2018-02-13 21:19 GMT+03:00 Yakov Zhdanov :

Alex,

I remember we had some confusing behavior for WAL archive when archived
segments were required for successful recovery.

Is issue still present?

If yes, what if we copy "uncheckpointed" segments to a directory under
wal
directory and then move the segments to archive after checkpoint? Will
this
work?

Thanks!

--Yakov






Re: Should we annotate @CacheLocalStore as @Depricated?

2018-02-13 Thread Vyacheslav Daradur
Valentin, thank you for replying.

The task [1][2] is ready for review. Please have a look.


[1] https://issues.apache.org/jira/browse/IGNITE-5097
[2] https://github.com/apache/ignite/pull/3517/files

On Tue, Feb 13, 2018 at 9:23 PM, Valentin Kulichenko
 wrote:
> Vyacheslav,
>
> These are test classes, there is no reason to put deprecation on them. We
> need to deprecate anything that is part of public API (in this case I
> believe it's only this annotation, nothing else).
>
> -Val
>
> On Tue, Feb 13, 2018 at 7:09 AM, Vyacheslav Daradur 
> wrote:
>
>> Guys, I need your advice about deprecation rules.
>>
>> Usually, deprecating an annotation doesn't affect the classes which are
>> marked by it.
>>
>> But @CacheLocalStore affects how Ignite interprets the annotated classes very much.
>> For example:
>> GridCacheStoreManagerDeserializationTest
>> CacheDeploymentTestStoreFactory
>>
>> Should we annotate such classes as deprecated too?
>>
>> On Wed, Jan 31, 2018 at 4:07 PM, Vyacheslav Daradur 
>> wrote:
>> > I filed the ticket [1] and will do it soon.
>> >
>> > [1] https://issues.apache.org/jira/browse/IGNITE-7588
>> >
>> > On Tue, Jan 30, 2018 at 2:27 PM, Anton Vinogradov
>> >  wrote:
>> >> +1
>> >>
>> >> On Tue, Jan 30, 2018 at 9:02 AM, Yakov Zhdanov 
>> wrote:
>> >>
>> >>> +1 for deprecation
>> >>>
>> >>>
>> >>>
>> >>> --Yakov
>> >>>
>> >>> 2018-01-30 1:06 GMT+03:00 Valentin Kulichenko <
>> >>> valentin.kuliche...@gmail.com
>> >>> >:
>> >>>
>> >>> > +1
>> >>> >
>> >>> > On Mon, Jan 29, 2018 at 8:31 AM, Andrey Mashenkov <
>> >>> > andrey.mashen...@gmail.com> wrote:
>> >>> >
>> >>> > > Vyacheslav,
>> >>> > >
>> >>> > > +1 for dropping @CacheLocalStore.
>> >>> > > Ignite has no support for 2-phase commit for the store, and the public
>> >>> > > API provides no methods so users could easily implement it themselves.
>> >>> > >
>> >>> > >
>> >>> > >
>> >>> > >
>> >>> > > On Mon, Jan 29, 2018 at 7:11 PM, Vyacheslav Daradur <
>> >>> daradu...@gmail.com
>> >>> > >
>> >>> > > wrote:
>> >>> > >
>> >>> > > > Hi Igniters,
>> >>> > > >
>> >>> > > > I've worked with Apache Ignite 3rd Party Persistent Storage tools
>> >>> > > recently.
>> >>> > > >
>> >>> > > > I found that use of CacheLocalStore annotation has hidden
>> issues, for
>> >>> > > > example:
>> >>> > > > * rebalancing issues [1]
>> >>> > > > * possible data consistency issues [1]
>> >>> > > > * handling of CacheLocalStore on clients nodes [2]
>> >>> > > >
>> >>> > > > Valentin K. considers it necessary to make @CacheLocalStore
>> >>> deprecated
>> >>> > > > and remove. If we want to have a decentralized persistent
>> storage we
>> >>> > > > should use Apache Ignite Native Persistence.
>> >>> > > >
>> >>> > > > If the community supports this decision I will create a new Jira
>> >>> issue.
>> >>> > > >
>> >>> > > > Any thoughts?
>> >>> > > >
>> >>> > > > [1] http://apache-ignite-developers.2346864.n4.nabble.
>> >>> > > > com/Losing-data-during-restarting-cluster-with-
>> >>> > > > persistence-enabled-tt24267.html
>> >>> > > > [2] http://apache-ignite-developers.2346864.n4.nabble.
>> >>> > com/How-to-handle-
>> >>> > > > CacheLocalStore-on-clients-node-tt25703.html
>> >>> > > >
>> >>> > > >
>> >>> > > >
>> >>> > > > --
>> >>> > > > Best Regards, Vyacheslav D.
>> >>> > > >
>> >>> > >
>> >>> > >
>> >>> > >
>> >>> > > --
>> >>> > > Best regards,
>> >>> > > Andrey V. Mashenkov
>> >>> > >
>> >>> >
>> >>>
>> >
>> >
>> >
>> > --
>> > Best Regards, Vyacheslav D.
>>
>>
>>
>> --
>> Best Regards, Vyacheslav D.
>>



-- 
Best Regards, Vyacheslav D.


[GitHub] ignite pull request #3517: IGNITE-7588 Deprecate CacheLocalStore annotation

2018-02-13 Thread daradurvs
GitHub user daradurvs opened a pull request:

https://github.com/apache/ignite/pull/3517

IGNITE-7588 Deprecate CacheLocalStore annotation



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/daradurvs/ignite ignite-7588

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3517.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3517


commit ff3caa20e6ed7106002e59573067b18965bab3cf
Author: Vyacheslav Daradur 
Date:   2018-02-13T18:44:32Z

ignite-7588: @CacheLocaleStore was annotated as deprecated




---


Re: WAL Archive Issue

2018-02-13 Thread Yakov Zhdanov
Ivan,

I do not want to create new files. As far as I know, now we copy segments
to archive dir before they get checkpointed. What I suggest is to copy them
to a temp dir under wal directory and then move to archive. In my
understanding at the time we copy the files to a temp folder all changes to
them are already fsynced.

Correct?

Yakov Zhdanov,
www.gridgain.com

2018-02-13 21:29 GMT+03:00 Ivan Rakov :

> Yakov,
>
> I see only one problem with your suggestion - the number of
> "uncheckpointed" segments is potentially unlimited.
> Right now we have a limited number (10) of file segments with immutable
> names in the WAL "work" directory. We have to keep this approach due to a known
> bug in XFS - fsync time is nearly twice as long for recently created files.
>
> Best Regards,
> Ivan Rakov
>
>
> On 13.02.2018 21:22, Yakov Zhdanov wrote:
>
>> I meant we will still be copying each segment once and then moving it to
>> archive, which should not affect the file system much.
>>
>> Thoughts?
>>
>> --Yakov
>>
>> 2018-02-13 21:19 GMT+03:00 Yakov Zhdanov :
>>
>> Alex,
>>>
>>> I remember we had some confusing behavior for WAL archive when archived
>>> segments were required for successful recovery.
>>>
>>> Is issue still present?
>>>
>>> If yes, what if we copy "uncheckpointed" segments to a directory under
>>> wal
>>> directory and then move the segments to archive after checkpoint? Will
>>> this
>>> work?
>>>
>>> Thanks!
>>>
>>> --Yakov
>>>
>>>
>


Re: WAL Archive Issue

2018-02-13 Thread Ivan Rakov

Yakov,

I see only one problem with your suggestion - the number of 
"uncheckpointed" segments is potentially unlimited.
Right now we have a limited number (10) of file segments with immutable 
names in the WAL "work" directory. We have to keep this approach due to a 
known bug in XFS - fsync time is nearly twice as long for recently 
created files.


Best Regards,
Ivan Rakov
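The fixed set of immutably named work segments described above can be sketched with only the JDK: preallocate a ring of segment files once and reuse them, so fsync never has to hit a freshly created file (the XFS penalty mentioned). The class name, segment count constant, and size below are illustrative assumptions, not Ignite's actual code:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative sketch: a fixed ring of preallocated WAL work segments. */
public class WalWorkRing {
    static final int SEGMENTS = 10;              // fixed segment count, as in the work dir
    static final long SEGMENT_SIZE = 64 * 1024;  // illustrative size only

    /** Creates the immutably named segment files once; later writes reuse them. */
    public static Path[] init(Path workDir) throws IOException {
        Path[] ring = new Path[SEGMENTS];
        for (int i = 0; i < SEGMENTS; i++) {
            Path seg = workDir.resolve(String.format("%010d.wal", i));
            if (!Files.exists(seg)) {
                try (RandomAccessFile raf = new RandomAccessFile(seg.toFile(), "rw")) {
                    // Preallocate up front so fsync never touches a "new" file.
                    raf.setLength(SEGMENT_SIZE);
                }
            }
            ring[i] = seg;
        }
        return ring;
    }
}
```

A second call to init() on the same directory finds the files already present and reuses them, which is the property that makes an unbounded number of "uncheckpointed" segments problematic for this scheme.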

On 13.02.2018 21:22, Yakov Zhdanov wrote:

I meant we will still be copying each segment once and then moving it to
archive, which should not affect the file system much.

Thoughts?

--Yakov

2018-02-13 21:19 GMT+03:00 Yakov Zhdanov :


Alex,

I remember we had some confusing behavior for WAL archive when archived
segments were required for successful recovery.

Is issue still present?

If yes, what if we copy "uncheckpointed" segments to a directory under wal
directory and then move the segments to archive after checkpoint? Will this
work?

Thanks!

--Yakov





Re: Should we annotate @CacheLocalStore as @Depricated?

2018-02-13 Thread Valentin Kulichenko
Vyacheslav,

These are test classes, there is no reason to put deprecation on them. We
need to deprecate anything that is part of public API (in this case I
believe it's only this annotation, nothing else).

-Val
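A minimal sketch of the approach Valentin describes, with hypothetical stand-in types (not the real Ignite classes): only the public annotation type carries @Deprecated, while classes annotated with it stay untouched.

```java
/**
 * Illustrative stand-in for the public annotation being deprecated.
 *
 * @deprecated Use Ignite Native Persistence instead of a cache-local store.
 */
@Deprecated
@interface CacheLocalStore {
}

/** A hypothetical test class using the annotation is itself left undeprecated. */
@CacheLocalStore
class SampleLocalStore {
}
```

Using the deprecated annotation produces a compiler warning at each usage site, which is exactly the signal wanted for test classes without marking them deprecated themselves.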

On Tue, Feb 13, 2018 at 7:09 AM, Vyacheslav Daradur 
wrote:

> Guys, I need your advice about deprecation rules.
>
> Usually, deprecating an annotation doesn't affect the classes which are
> marked by it.
>
> But @CacheLocalStore affects how Ignite interprets the annotated classes very much.
> For example:
> GridCacheStoreManagerDeserializationTest
> CacheDeploymentTestStoreFactory
>
> Should we annotate such classes as deprecated too?
>
> On Wed, Jan 31, 2018 at 4:07 PM, Vyacheslav Daradur 
> wrote:
> > I filed the ticket [1] and will do it soon.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-7588
> >
> > On Tue, Jan 30, 2018 at 2:27 PM, Anton Vinogradov
> >  wrote:
> >> +1
> >>
> >> On Tue, Jan 30, 2018 at 9:02 AM, Yakov Zhdanov 
> wrote:
> >>
> >>> +1 for deprecation
> >>>
> >>>
> >>>
> >>> --Yakov
> >>>
> >>> 2018-01-30 1:06 GMT+03:00 Valentin Kulichenko <
> >>> valentin.kuliche...@gmail.com
> >>> >:
> >>>
> >>> > +1
> >>> >
> >>> > On Mon, Jan 29, 2018 at 8:31 AM, Andrey Mashenkov <
> >>> > andrey.mashen...@gmail.com> wrote:
> >>> >
> >>> > > Vyacheslav,
> >>> > >
> >>> > > +1 for dropping @CacheLocalStore.
> >>> > > Ignite has no support for 2-phase commit for the store, and the public
> >>> > > API provides no methods so users could easily implement it themselves.
> >>> > >
> >>> > >
> >>> > >
> >>> > >
> >>> > > On Mon, Jan 29, 2018 at 7:11 PM, Vyacheslav Daradur <
> >>> daradu...@gmail.com
> >>> > >
> >>> > > wrote:
> >>> > >
> >>> > > > Hi Igniters,
> >>> > > >
> >>> > > > I've worked with Apache Ignite 3rd Party Persistent Storage tools
> >>> > > recently.
> >>> > > >
> >>> > > > I found that use of CacheLocalStore annotation has hidden
> issues, for
> >>> > > > example:
> >>> > > > * rebalancing issues [1]
> >>> > > > * possible data consistency issues [1]
> >>> > > > * handling of CacheLocalStore on clients nodes [2]
> >>> > > >
> >>> > > > Valentin K. considers it necessary to make @CacheLocalStore
> >>> deprecated
> >>> > > > and remove. If we want to have a decentralized persistent
> storage we
> >>> > > > should use Apache Ignite Native Persistence.
> >>> > > >
> >>> > > > If the community supports this decision I will create a new Jira
> >>> issue.
> >>> > > >
> >>> > > > Any thoughts?
> >>> > > >
> >>> > > > [1] http://apache-ignite-developers.2346864.n4.nabble.
> >>> > > > com/Losing-data-during-restarting-cluster-with-
> >>> > > > persistence-enabled-tt24267.html
> >>> > > > [2] http://apache-ignite-developers.2346864.n4.nabble.
> >>> > com/How-to-handle-
> >>> > > > CacheLocalStore-on-clients-node-tt25703.html
> >>> > > >
> >>> > > >
> >>> > > >
> >>> > > > --
> >>> > > > Best Regards, Vyacheslav D.
> >>> > > >
> >>> > >
> >>> > >
> >>> > >
> >>> > > --
> >>> > > Best regards,
> >>> > > Andrey V. Mashenkov
> >>> > >
> >>> >
> >>>
> >
> >
> >
> > --
> > Best Regards, Vyacheslav D.
>
>
>
> --
> Best Regards, Vyacheslav D.
>


Re: WAL Archive Issue

2018-02-13 Thread Yakov Zhdanov
I meant we will still be copying each segment once and then moving it to
archive, which should not affect the file system much.

Thoughts?

--Yakov

2018-02-13 21:19 GMT+03:00 Yakov Zhdanov :

> Alex,
>
> I remember we had some confusing behavior for WAL archive when archived
> segments were required for successful recovery.
>
> Is issue still present?
>
> If yes, what if we copy "uncheckpointed" segments to a directory under wal
> directory and then move the segments to archive after checkpoint? Will this
> work?
>
> Thanks!
>
> --Yakov
>


WAL Archive Issue

2018-02-13 Thread Yakov Zhdanov
Alex,

I remember we had some confusing behavior for WAL archive when archived
segments were required for successful recovery.

Is issue still present?

If yes, what if we copy "uncheckpointed" segments to a directory under wal
directory and then move the segments to archive after checkpoint? Will this
work?

Thanks!

--Yakov


Re: [SparkDataFrame] Query Optimization. Prototype

2018-02-13 Thread Valentin Kulichenko
Nikolay,

Looks like this is because you enabled non-collocated joins. I was not
aware of this limitation though, do we have this documented somewhere?

-Val

On Tue, Feb 13, 2018 at 8:21 AM, Nikolay Izhikov 
wrote:

> Val,
>
> Source code check: https://github.com/apache/ignite/blob/master/modules/
> indexing/src/main/java/org/apache/ignite/internal/processors/query/h2/opt/
> GridH2CollocationModel.java#L382
>
> Stack trace:
>
> javax.cache.CacheException: Failed to prepare distributed join query: join
> condition does not use index [joinedCache=SQL_PUBLIC_JT2, plan=SELECT
> __Z0.ID AS __C0_0,
> __Z0.VAL1 AS __C0_1,
> __Z1.ID AS __C0_2,
> __Z1.VAL2 AS __C0_3
> FROM PUBLIC.JT1 __Z0
> /* PUBLIC.JT1.__SCAN_ */
> INNER JOIN PUBLIC.JT2 __Z1
> /* batched:broadcast PUBLIC.JT2.__SCAN_ */
> ON 1=1
> WHERE __Z0.VAL1 = __Z1.VAL2]
> at org.apache.ignite.internal.processors.query.h2.opt.
> GridH2CollocationModel.joinedWithCollocated(GridH2CollocationModel.java:
> 384)
> at org.apache.ignite.internal.processors.query.h2.opt.
> GridH2CollocationModel.calculate(GridH2CollocationModel.java:308)
> at org.apache.ignite.internal.processors.query.h2.opt.
> GridH2CollocationModel.type(GridH2CollocationModel.java:549)
> at org.apache.ignite.internal.processors.query.h2.opt.
> GridH2CollocationModel.calculate(GridH2CollocationModel.java:257)
> at org.apache.ignite.internal.processors.query.h2.opt.
> GridH2CollocationModel.type(GridH2CollocationModel.java:549)
> at org.apache.ignite.internal.processors.query.h2.opt.
> GridH2CollocationModel.isCollocated(GridH2CollocationModel.java:691)
> at org.apache.ignite.internal.processors.query.h2.sql.
> GridSqlQuerySplitter.split(GridSqlQuerySplitter.java:239)
> at org.apache.ignite.internal.processors.query.h2.
> IgniteH2Indexing.split(IgniteH2Indexing.java:1856)
> at org.apache.ignite.internal.processors.query.h2.
> IgniteH2Indexing.parseAndSplit(IgniteH2Indexing.java:1818)
> at org.apache.ignite.internal.processors.query.h2.
> IgniteH2Indexing.querySqlFields(IgniteH2Indexing.java:1569)
> at org.apache.ignite.internal.processors.query.
> GridQueryProcessor$4.applyx(GridQueryProcessor.java:2037)
> at org.apache.ignite.internal.processors.query.
> GridQueryProcessor$4.applyx(GridQueryProcessor.java:2032)
> at org.apache.ignite.internal.util.lang.IgniteOutClosureX.
> apply(IgniteOutClosureX.java:36)
> at org.apache.ignite.internal.processors.query.GridQueryProcessor.
> executeQuery(GridQueryProcessor.java:2553)
> at org.apache.ignite.internal.processors.query.GridQueryProcessor.
> querySqlFields(GridQueryProcessor.java:2046)
> at org.apache.ignite.internal.processors.cache.
> IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:664)
> at org.apache.ignite.internal.processors.cache.
> IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:615)
> at org.apache.ignite.internal.processors.cache.
> GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:382)
> at org.apache.ignite.spark.JoinTestSpec.execSQL(
> JoinTestSpec.scala:63)
>
>
> On Tue, 13/02/2018 at 08:12 -0800, Valentin Kulichenko wrote:
> > Nikolay,
> >
> > This doesn't make sense to me. Not having an index should not cause
> query to fail. What is the exception?
> >
> > -Val
> >
> > On Tue, Feb 13, 2018 at 8:07 AM, Nikolay Izhikov 
> wrote:
> > > Hello, Valentin.
> > >
> > > > When you're talking about join optimization, what exactly are you
> referring to?
> > >
> > > I'm referring to my PR [1]
> > > Currently, it contains transformation from Spark joins to Ignite joins
> [2]
> > >
> > > But, if I understand Vladimir's answer right, for now, we don't *fully*
> > > support SQL join queries.
> > >
> > > Sometimes it will work just right, in other cases, it will throw an
> > > exception due to Ignite's internal implementation.
> > >
> > > Please, see my example [3].
> > > Query from line 4 will throw an exception.
> > > The same query from line 10 will succeed, because of index creation.
> > >
> > > Both of them syntactically correct.
> > >
> > > > Unfortunately, at this moment we do not have complete list of all
> restrictions on our joins, because a lot of work is delegated to H2.
> > > > In some unsupported scenarios we throw an exception.
> > > > In other cases we return incorrect results silently (e.g. if you do
> not co-locate data and forgot to set "distributed joins" flag).
> > > > We have a plan to perform excessive testing of joins (both
> co-located and distributed) and list all known limitations.
> > > > This would require writing a lot of unit tests to cover various
> scenarios.
> > > > I think we will have this information in a matter of 1-2 months.
> > >
> > > [1] https://github.com/apache/ignite/pull/3397
> > > [2] https://github.com/apache/ignite/pull/3397/files#diff-
> 5a861613530bbce650efa50d553a0e92R227
> 

Re: Ignite Teamcity email notifications

2018-02-13 Thread Dmitry Pavlov
Hi Igniters,

Thanks to Peter Ivanov and all involved into this activity. We've enabled
first email notification rule for a subset of the per-commit tests, Ignite
Run All Basic (suites: license & basic & compute and PDS unit test; branch
= master).

Email is sent only if the failed build contains your changes. To receive all
emails it is still required to set up your own notification rule in the TC
interface. Please do not ignore failures; more failure details will be
available via the link in the email.

Please contact me if you see a failure in the received email that seems
unrelated to your changes. All flaky failures should be removed from this
subset or fixed. Also feel free to create investigations.

I hope we will continue to include more test sets in this rule (for this we
need fast and stable suites).

Sincerely,
Dmitriy Pavlov

Sat, Jul 15, 2017, 16:50, Denis Magda :

> Excellent improvement for our testing and continuous intergration
> processes, thanks folks!
>
> On Saturday, July 15, 2017, Dmitry Pavlov  wrote:
>
> > Hi Igniters,
> >
> > Thanks to Alexey Chetaev we can now setup personal email notifications
> from
> > the Teamcity about broken builds.
> >
> > To set up notifications you can go to your profile and enable email
> > notification for your own PR and/or master (see the page
> > http://ci.ignite.apache.org/profile.html?item=userNotifications)
> >
> > You can set branch filter for example as follows:
> > +:ignite-2.*
> > +:pull/2296/head
> > And enable notifications on
> > - build fails (only first failing and/or including your changes),
> > - and successful build (optionally only first build after failed).
> >
> > Best Regards,
> > Dmitriy Pavlov
> >
> > Mon, Jul 3, 2017, 17:43, Denis Magda  > >:
> > >:
> >
> > > Dmitriy P.,
> > >
> > > The list has been created. Here is a response from ASF:
> > >
> > > As requested by you, the following mailing list has been created:
> > > c...@ignite.apache.org   c...@ignite.apache.org
> > >
> > > Moderators: dma...@apache.org   > dma...@apache.org >,
> > > dsetrak...@apache.org   > >
> > > This list is private.
> > >
> > >
> > >
> > > ---
> > >
> > > The list will start accepting mail in 60 minutes from now.  If it's a
> > > public
> > > list, it will appear on https://lists.apache.org/ <
> > > https://lists.apache.org/> within a few minutes of
> > > the first post to it.
> > >
> > > —
> > > Denis
> > >
> > > > On Jun 29, 2017, at 4:10 PM, Denis Magda  > > wrote:
> > > >
> > > > Trigged the alias creation. Will let you know once it’s ready for
> > usage.
> > > >
> > > > —
> > > > Denis
> > > >
> > > >> On Jun 29, 2017, at 3:51 PM, Dmitriy Setrakyan <
> dsetrak...@apache.org
> > >
> > > wrote:
> > > >>
> > > >> I like this alias.
> > > >>
> > > >> On Thu, Jun 29, 2017 at 3:43 PM, Denis Magda  > > wrote:
> > > >>
> > > >>> What’s about c...@ignite.apache.org   > c...@ignite.apache.org >? If
> > > there
> > > >>> are no objections I’ll create the alias.
> > > >>>
> > > >>> —
> > > >>> Denis
> > > >>>
> > >  On Jun 29, 2017, at 11:53 AM, Dmitry Pavlov <
> dpavlov@gmail.com
> > >
> > > >>> wrote:
> > > 
> > >  Hi Dmitriy,
> > > 
> > >  At the first step we need some email address to send notifications
> > > from
> > >  behalf of TeamCity. We need to set up SMTP server, username and
> > > password.
> > >  Later users may set up personal notification rules ( see the link
> > >  http://ci.ignite.apache.org/profile.html?item=userNotifications).
> > > 
> > >  Teamcity takes into account last commits in branch and sends
> > > >>> notifications
> > >  separately and only to users which may break the build (option
> name
> > > >>> 'builds
> > >  containing my changes').
> > > 
> > >  Best Regards,
> > >  Dmitriy Pavlov
> > > 
> > > 
> > >  Thu, Jun 29, 2017, 21:21, Dmitriy Setrakyan <
> > dsetrak...@apache.org 
> > > >:
> > > 
> > > > Dmitry, I don't think ign...@apache.org  is a
> valid
> > email address.
> > > Why
> > > > creating a mailing list for TC is not a good option?
> > > >
> > > > On Thu, Jun 29, 2017 at 8:17 AM, Dmitry Pavlov <
> > > dpavlov@gmail.com >
> > > > wrote:
> > > >
> > > >> Hi Igniters,
> > > >>
> > > >> I want to set up a email notifications from the public Teamcity
> > > about
> > > >> broken builds. But there is no configured address and email
> > > account. To
> > > > set
> > > >> up notifications we need some mail box (account) to send
> messages
> > > from.
> > > >>
> > > >> I asked the apache 

[GitHub] ignite pull request #3516: IGNITE-7695: Enable Ignite Update Notifier tests

2018-02-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/ignite/pull/3516


---


Re: Looks like a bug in ServerImpl.joinTopology()

2018-02-13 Thread Yakov Zhdanov
Alex, you can alter ServerImpl and insert a latch or Thread.sleep(xxx)
anywhere you like to show the incorrect behavior you describe.

--Yakov


[GitHub] ignite pull request #3516: IGNITE-7695: Enable Ignite Update Notifier tests

2018-02-13 Thread dspavlov
GitHub user dspavlov opened a pull request:

https://github.com/apache/ignite/pull/3516

IGNITE-7695: Enable Ignite Update Notifier tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-7695

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3516.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3516


commit e2850e2d6116ccdea809bb925aa90cf530e817ee
Author: dpavlov 
Date:   2018-02-13T17:16:51Z

IGNITE-7695: Enable Ignite Update Notifier tests




---


[jira] [Created] (IGNITE-7695) Enable Ignite Update Notifier tests

2018-02-13 Thread Dmitriy Pavlov (JIRA)
Dmitriy Pavlov created IGNITE-7695:
--

 Summary: Enable Ignite Update Notifier tests
 Key: IGNITE-7695
 URL: https://issues.apache.org/jira/browse/IGNITE-7695
 Project: Ignite
  Issue Type: Task
Reporter: Dmitriy Pavlov
Assignee: Dmitriy Pavlov


org.apache.ignite.internal.GridVersionSelfTest#testVersions
org.apache.ignite.internal.IgniteUpdateNotifierPerClusterSettingSelfTest#testNotifierEnabledForCluster

and unmute on TC



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] ignite pull request #3515: IGNITE-7640: make DiscoveryDataClusterState immut...

2018-02-13 Thread SharplEr
GitHub user SharplEr opened a pull request:

https://github.com/apache/ignite/pull/3515

IGNITE-7640: make DiscoveryDataClusterState immutable

for CI testing

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SharplEr/ignite ignite-7640

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3515.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3515


commit 9a52c8a851006bea1c647f694df8ea398da50595
Author: Alexander Menshikov 
Date:   2018-02-13T16:15:36Z

make DiscoveryDataClusterState immutable




---


[jira] [Created] (IGNITE-7694) testActiveClientReconnectToInactiveCluster hangs because of an assertion

2018-02-13 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-7694:


 Summary: testActiveClientReconnectToInactiveCluster hangs because 
of an assertion
 Key: IGNITE-7694
 URL: https://issues.apache.org/jira/browse/IGNITE-7694
 Project: Ignite
  Issue Type: Bug
Reporter: Alexey Goncharuk






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [SparkDataFrame] Query Optimization. Prototype

2018-02-13 Thread Nikolay Izhikov
Val, 

Source code check: 
https://github.com/apache/ignite/blob/master/modules/indexing/src/main/java/org/apache/ignite/internal/processors/query/h2/opt/GridH2CollocationModel.java#L382

Stack trace:

javax.cache.CacheException: Failed to prepare distributed join query: join 
condition does not use index [joinedCache=SQL_PUBLIC_JT2, plan=SELECT
__Z0.ID AS __C0_0,
__Z0.VAL1 AS __C0_1,
__Z1.ID AS __C0_2,
__Z1.VAL2 AS __C0_3
FROM PUBLIC.JT1 __Z0
/* PUBLIC.JT1.__SCAN_ */
INNER JOIN PUBLIC.JT2 __Z1
/* batched:broadcast PUBLIC.JT2.__SCAN_ */
ON 1=1
WHERE __Z0.VAL1 = __Z1.VAL2]
at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2CollocationModel.joinedWithCollocated(GridH2CollocationModel.java:384)
at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2CollocationModel.calculate(GridH2CollocationModel.java:308)
at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2CollocationModel.type(GridH2CollocationModel.java:549)
at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2CollocationModel.calculate(GridH2CollocationModel.java:257)
at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2CollocationModel.type(GridH2CollocationModel.java:549)
at 
org.apache.ignite.internal.processors.query.h2.opt.GridH2CollocationModel.isCollocated(GridH2CollocationModel.java:691)
at 
org.apache.ignite.internal.processors.query.h2.sql.GridSqlQuerySplitter.split(GridSqlQuerySplitter.java:239)
at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.split(IgniteH2Indexing.java:1856)
at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.parseAndSplit(IgniteH2Indexing.java:1818)
at 
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.querySqlFields(IgniteH2Indexing.java:1569)
at 
org.apache.ignite.internal.processors.query.GridQueryProcessor$4.applyx(GridQueryProcessor.java:2037)
at 
org.apache.ignite.internal.processors.query.GridQueryProcessor$4.applyx(GridQueryProcessor.java:2032)
at 
org.apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX.java:36)
at 
org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:2553)
at 
org.apache.ignite.internal.processors.query.GridQueryProcessor.querySqlFields(GridQueryProcessor.java:2046)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:664)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:615)
at 
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:382)
at org.apache.ignite.spark.JoinTestSpec.execSQL(JoinTestSpec.scala:63)
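The rule the stack trace enforces can be modeled in a few lines: a batched/broadcast distributed join is rejected when the join condition is not backed by any index on the joined table, and accepted once such an index exists (as in the gist, where `CREATE INDEX` on the join column makes the same query succeed). The sketch below is a toy model of that rule with illustrative names (`JoinIndexRule`, `checkJoin` are not Ignite classes), not the actual GridH2CollocationModel code.

```java
import java.util.Set;

// Toy model of the check performed in GridH2CollocationModel.joinedWithCollocated:
// a distributed join is rejected when its join condition uses no index on the
// joined table. Names here are illustrative, not Ignite internals.
public class JoinIndexRule {
    /** Returns null if the join is accepted, or an error message otherwise. */
    static String checkJoin(String joinedTable, String joinColumn, Set<String> indexedColumns) {
        if (!indexedColumns.contains(joinColumn))
            return "Failed to prepare distributed join query: join condition does not use index [joinedCache=" + joinedTable + "]";

        return null; // Join condition is backed by an index - accepted.
    }

    public static void main(String[] args) {
        // JT1 JOIN JT2 ON jt1.val1 = jt2.val2 fails while VAL2 has no index...
        System.out.println(checkJoin("SQL_PUBLIC_JT2", "VAL2", Set.of("ID")));

        // ...and is accepted after an index on the join column is created,
        // e.g. CREATE INDEX jt2_val2_idx ON jt2(val2).
        System.out.println(checkJoin("SQL_PUBLIC_JT2", "VAL2", Set.of("ID", "VAL2")));
    }
}
```

In other words, the query itself is syntactically fine; whether it runs depends on index availability at planning time, which is why the same join succeeds on line 10 of the gist after the index is created.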


On Tue, 13/02/2018 at 08:12 -0800, Valentin Kulichenko wrote:
> Nikolay,
> 
> This doesn't make sense to me. Not having an index should not cause query to 
> fail. What is the exception?
> 
> -Val
> 
> On Tue, Feb 13, 2018 at 8:07 AM, Nikolay Izhikov  wrote:
> > Hello, Valentin.
> > 
> > > When you're talking about join optimization, what exactly are you 
> > > referring to?
> > 
> > I'm referring to my PR [1]
> > Currently, it contains transformation from Spark joins to Ignite joins [2]
> > 
> > But, if I understand Vladimir's answer right, for now, we don't *fully* 
> > support SQL join queries.
> > 
> > Sometimes it will work just right, in other cases, it will throw an 
> > exception due to Ignite's internal implementation.
> > 
> > Please, see my example [3].
> > Query from line 4 will throw an exception.
> > The same query from line 10 will succeed, because of index creation.
> > 
> > Both of them are syntactically correct.
> > 
> > > Unfortunately, at this moment we do not have complete list of all 
> > > restrictions on our joins, because a lot of work is delegated to H2.
> > > In some unsupported scenarios we throw an exception.
> > > In other cases we return incorrect results silently (e.g. if you do not 
> > > co-locate data and forgot to set "distributed joins" flag).
> > > We have a plan to perform excessive testing of joins (both co-located and 
> > > distributed) and list all known limitations.
> > > This would require writing a lot of unit tests to cover various scenarios.
> > > I think we will have this information in a matter of 1-2 months.
> > 
> > [1] https://github.com/apache/ignite/pull/3397
> > [2] 
> > https://github.com/apache/ignite/pull/3397/files#diff-5a861613530bbce650efa50d553a0e92R227
> > [3] https://gist.github.com/nizhikov/a4389fd78636869dd38c13920b5baf2b
> > 
> > On Mon, 12/02/2018 at 13:45 -0800, Valentin Kulichenko wrote:
> > > Nikolay,
> > >
> > > When you're talking about join optimization, what exactly are you 
> > > referring to?
> > >
> > > Since other parts of data frames integration are already merged, I think 
> > > it's a good time to resurrect this thread? Does it make sense to review it right now? Or you want to make some more changes?

[jira] [Created] (IGNITE-7693) New node joining via ZookeeperDiscoverySpi should print out its ZooKeeper sessionId

2018-02-13 Thread Sergey Chugunov (JIRA)
Sergey Chugunov created IGNITE-7693:
---

 Summary: New node joining via ZookeeperDiscoverySpi should print 
out its ZooKeeper sessionId
 Key: IGNITE-7693
 URL: https://issues.apache.org/jira/browse/IGNITE-7693
 Project: Ignite
  Issue Type: Improvement
Reporter: Sergey Chugunov


For now there is no way to match Ignite nodes joining the Ignite cluster with 
log entries in ZooKeeper nodes' logs.

In ZooKeeper logs there are entries like this:
{noformat}
myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@687] - Established session 
0x161575d88530007 with negotiated timeout 1 for client 
/:{noformat}
but it is hard to match them with Ignite nodes when there are several started 
on the same host.

If an Ignite node prints out its session id on join, correlating it with a 
particular ZooKeeper instance becomes much easier.
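A minimal sketch of the proposed logging: on join the node would take the session id from its ZooKeeper client (the real call is `org.apache.zookeeper.ZooKeeper#getSessionId()`, referenced only in a comment here) and print it in the same `0x`-prefixed hex form the ZooKeeper server logs use, so the two can be grepped together. The class and method names below are illustrative.

```java
// Hypothetical sketch: format an Ignite node's ZooKeeper session id the way
// ZooKeeper server logs print it ("Established session 0x161575d88530007 ...").
// The session id itself would come from org.apache.zookeeper.ZooKeeper#getSessionId().
public class ZkSessionIdLog {
    static String formatSessionId(long sessionId) {
        return "0x" + Long.toHexString(sessionId); // Matches ZooKeeper's log format.
    }

    public static void main(String[] args) {
        long sessionId = 0x161575d88530007L; // e.g. zkClient.getSessionId()
        System.out.println("Joined cluster with ZooKeeper session id " + formatSessionId(sessionId));
    }
}
```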





Re: [SparkDataFrame] Query Optimization. Prototype

2018-02-13 Thread Valentin Kulichenko
Nikolay,

This doesn't make sense to me. Not having an index should not cause query
to fail. What is the exception?

-Val

On Tue, Feb 13, 2018 at 8:07 AM, Nikolay Izhikov 
wrote:

> Hello, Valentin.
>
> > When you're talking about join optimization, what exactly are you
> referring to?
>
> I'm referring to my PR [1]
> Currently, it contains transformation from Spark joins to Ignite joins [2]
>
> But, if I understand Vladimir's answer right, for now, we don't *fully*
> support SQL join queries.
>
> Sometimes it will work just right, in other cases, it will throw an
> exception due to Ignite's internal implementation.
>
> Please, see my example [3].
> Query from line 4 will throw an exception.
> The same query from line 10 will succeed, because of index creation.
>
> Both of them are syntactically correct.
>
> > Unfortunately, at this moment we do not have complete list of all
> restrictions on our joins, because a lot of work is delegated to H2.
> > In some unsupported scenarios we throw an exception.
> > In other cases we return incorrect results silently (e.g. if you do not
> co-locate data and forgot to set "distributed joins" flag).
> > We have a plan to perform excessive testing of joins (both co-located
> and distributed) and list all known limitations.
> > This would require writing a lot of unit tests to cover various
> scenarios.
> > I think we will have this information in a matter of 1-2 months.
>
> [1] https://github.com/apache/ignite/pull/3397
> [2] https://github.com/apache/ignite/pull/3397/files#diff-
> 5a861613530bbce650efa50d553a0e92R227
> [3] https://gist.github.com/nizhikov/a4389fd78636869dd38c13920b5baf2b
>
> On Mon, 12/02/2018 at 13:45 -0800, Valentin Kulichenko wrote:
> > Nikolay,
> >
> > When you're talking about join optimization, what exactly are you
> referring to?
> >
> > Since other parts of data frames integration are already merged, I think
> it's a good time to resurrect this thread? Does it make sense to review it
> right now? Or you want to make some more changes?
> >
> > -Val
> >
> > On Mon, Feb 12, 2018 at 12:20 AM, Vladimir Ozerov 
> wrote:
> > > Hi Nikolay,
> > >
> > > I am not sure if ticket for DECIMAL column metadata exists. If you
> haven't found one under "sql" component, please feel free to create it on
> your own. As far as testing of joins, I think it makes sense to start
> working on it when we finish ANSI compliance testing which is already in
> progress.
> > >
> > > On Wed, Jan 24, 2018 at 12:27 PM, Nikolay Izhikov <
> nizhikov@gmail.com> wrote:
> > > > Hello, Vladimir.
> > > >
> > > > Thank you for an answer.
> > > >
> > > > > Do you mean whether it is possible to read it from table metadata?
> > > >
> > > > Yes, you are right.
> > > > I want to read scale and precision of DECIMAL column from table
> metadata.
> > > >
> > > > > This will be fixed at some point in future, but I do not have any
> dates at the moment.
> > > >
> > > > Is there ticket for it? I can't find it via jira search
> > > >
> > > > > at this moment we do not have complete list of all restrictions on
> our joins, because a lot of work is delegated to H2.
> > > > > In some unsupported scenarios we throw an exception.
> > > > > In other cases we return incorrect results silently (e.g. if you
> do not co-locate data and forgot to set "distributed joins" flag).
> > > >
> > > > Guys, Val, maybe we should exclude join optimization from
> IGNITE-7077 while we don't have all the limitations at hand?
> > > >
> > > > > We have a plan to perform excessive testing of joins (both
> co-located and distributed) and list all known limitations.
> > > >
> > > > Can I help somehow with this activity?
> > > >
> > > >
> > > > On Wed, 24/01/2018 at 12:08 +0300, Vladimir Ozerov wrote:
> > > > > Hi Nikolay,
> > > > >
> > > > > Could you please clarify your question about scale and precision?
> Do you mean whether it is possible to read it from table metadata? If yes,
> it is not possible at the moment unfortunately - we do not store
> information about lengths, scales and precision, only actual data types are
> passed to H2 (e.g. String, BigDecimal, etc.). This will be fixed at some
> point in future, but I do not have any dates at the moment.
> > > > >
> > > > > Now about joins - Denis, I think you provided wrong link to our
> internal GridGain docs where we accumulate information about ANSI
> compatibility and which we are going to publish on Ignite WIKI when it is
> ready. In any case, this is not what Nikolay asked about. The question was
> about limitation of our joins which has nothing to do with ANSI standard.
> Unfortunately, at this moment we do not have complete list of all
> restrictions on our joins, because a lot of work is delegated to H2. In
> some unsupported scenarios we throw an exception. In other cases we return
> incorrect results silently (e.g. if you do not co-locate data and forgot to
> set "distributed joins" flag). We have a plan to perform excessive testing of joins (both co-located and distributed) and list all known limitations.

Re: [SparkDataFrame] Query Optimization. Prototype

2018-02-13 Thread Valentin Kulichenko
Sounds good. Let me know when you feel it's ready and I'll take a look.

-Val

On Tue, Feb 13, 2018 at 7:56 AM, Nikolay Izhikov 
wrote:

> Hello, Valentin.
>
> > Since other parts of data frames integration are already merged,
> > I think it's a good time to resurrect this thread?
> > Does it make sense to review it right now?
> > Or you want to make some more changes?
>
> I've already merged PR [1] with current master.
> So you can review it, if you wish.
>
> But I need a couple of days to double-check all changes.
> Extend scaladoc, etc.
>
> I will write you when PR is fully ready.
>
> [1] https://github.com/apache/ignite/pull/3397
>
>
>
> On Mon, 12/02/2018 at 13:45 -0800, Valentin Kulichenko wrote:
> > Nikolay,
> >
> > When you're talking about join optimization, what exactly are you
> referring to?
> >
> > Since other parts of data frames integration are already merged, I think
> it's a good time to resurrect this thread? Does it make sense to review it
> right now? Or you want to make some more changes?
> >
> > -Val
> >
> > On Mon, Feb 12, 2018 at 12:20 AM, Vladimir Ozerov 
> wrote:
> > > Hi Nikolay,
> > >
> > > I am not sure if ticket for DECIMAL column metadata exists. If you
> haven't found one under "sql" component, please feel free to create it on
> your own. As far as testing of joins, I think it makes sense to start
> working on it when we finish ANSI compliance testing which is already in
> progress.
> > >
> > > On Wed, Jan 24, 2018 at 12:27 PM, Nikolay Izhikov <
> nizhikov@gmail.com> wrote:
> > > > Hello, Vladimir.
> > > >
> > > > Thank you for an answer.
> > > >
> > > > > Do you mean whether it is possible to read it from table metadata?
> > > >
> > > > Yes, you are right.
> > > > I want to read scale and precision of DECIMAL column from table
> metadata.
> > > >
> > > > > This will be fixed at some point in future, but I do not have any
> dates at the moment.
> > > >
> > > > Is there ticket for it? I can't find it via jira search
> > > >
> > > > > at this moment we do not have complete list of all restrictions on
> our joins, because a lot of work is delegated to H2.
> > > > > In some unsupported scenarios we throw an exception.
> > > > > In other cases we return incorrect results silently (e.g. if you
> do not co-locate data and forgot to set "distributed joins" flag).
> > > >
> > > > Guys, Val, maybe we should exclude join optimization from
> IGNITE-7077 while we don't have all the limitations at hand?
> > > >
> > > > > We have a plan to perform excessive testing of joins (both
> co-located and distributed) and list all known limitations.
> > > >
> > > > Can I help somehow with this activity?
> > > >
> > > >
> > > > On Wed, 24/01/2018 at 12:08 +0300, Vladimir Ozerov wrote:
> > > > > Hi Nikolay,
> > > > >
> > > > > Could you please clarify your question about scale and precision?
> Do you mean whether it is possible to read it from table metadata? If yes,
> it is not possible at the moment unfortunately - we do not store
> information about lengths, scales and precision, only actual data types are
> passed to H2 (e.g. String, BigDecimal, etc.). This will be fixed at some
> point in future, but I do not have any dates at the moment.
> > > > >
> > > > > Now about joins - Denis, I think you provided wrong link to our
> internal GridGain docs where we accumulate information about ANSI
> compatibility and which we are going to publish on Ignite WIKI when it is
> ready. In any case, this is not what Nikolay asked about. The question was
> about limitation of our joins which has nothing to do with ANSI standard.
> Unfortunately, at this moment we do not have complete list of all
> restrictions on our joins, because a lot of work is delegated to H2. In
> some unsupported scenarios we throw an exception. In other cases we return
> incorrect results silently (e.g. if you do not co-locate data and forgot to
> set "distributed joins" flag). We have a plan to perform excessive testing
> of joins (both co-located and distributed) and list all known limitations.
> This would require writing a lot of unit tests to cover various scenarios.
> I think we will have this information in a matter of 1-2 months.
> > > > >
> > > > > Vladimir.
> > > > >
> > > > > On Tue, Jan 23, 2018 at 11:45 PM, Denis Magda 
> wrote:
> > > > > > Agree. The unsupported functions should be mentioned on the page
> that will cover Ignite ANSI-99 compliance. We have first results available
> for CORE features of the specification:
> > > > > > https://ggsystems.atlassian.net/wiki/spaces/GG/pages/
> 45093646/ANSI+SQL+99  net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99>
> > > > > >
> > > > > > That’s on my radar. I’ll take care of this.
> > > > > >
> > > > > > —
> > > > > > Denis
> > > > > >
> > > > > > > On Jan 23, 2018, at 10:31 AM, Dmitriy Setrakyan <
> dsetrak...@apache.org> wrote:
> > > > > > >
> > > > > > > I think we need a page listing the unsupported functions with explanation why, which is either it does not make sense in Ignite or is planned in future release.

Re: [SparkDataFrame] Query Optimization. Prototype

2018-02-13 Thread Nikolay Izhikov
Hello, Valentin.

> Since other parts of data frames integration are already merged, 
> I think it's a good time to resurrect this thread? 
> Does it make sense to review it right now? 
> Or you want to make some more changes?

I've already merged PR [1] with current master.
So you can review it, if you wish.

But I need a couple of days to double-check all changes.
Extend scaladoc, etc.

I will write you when PR is fully ready.

[1] https://github.com/apache/ignite/pull/3397



On Mon, 12/02/2018 at 13:45 -0800, Valentin Kulichenko wrote:
> Nikolay,
> 
> When you're talking about join optimization, what exactly are you referring 
> to?
> 
> Since other parts of data frames integration are already merged, I think it's 
> a good time to resurrect this thread? Does it make sense to review it right 
> now? Or you want to make some more changes?
> 
> -Val
> 
> On Mon, Feb 12, 2018 at 12:20 AM, Vladimir Ozerov  
> wrote:
> > Hi Nikolay,
> > 
> > I am not sure if ticket for DECIMAL column metadata exists. If you haven't 
> > found one under "sql" component, please feel free to create it on your own. 
> > As far as testing of joins, I think it makes sense to start working on it 
> > when we finish ANSI compliance testing which is already in progress.
> > 
> > On Wed, Jan 24, 2018 at 12:27 PM, Nikolay Izhikov  
> > wrote:
> > > Hello, Vladimir.
> > > 
> > > Thank you for an answer.
> > > 
> > > > Do you mean whether it is possible to read it from table metadata?
> > > 
> > > Yes, you are right.
> > > I want to read scale and precision of DECIMAL column from table metadata.
> > > 
> > > > This will be fixed at some point in future, but I do not have any dates 
> > > > at the moment.
> > > 
> > > Is there ticket for it? I can't find it via jira search
> > > 
> > > > at this moment we do not have complete list of all restrictions on our 
> > > > joins, because a lot of work is delegated to H2.
> > > > In some unsupported scenarios we throw an exception.
> > > > In other cases we return incorrect results silently (e.g. if you do not 
> > > > co-locate data and forgot to set "distributed joins" flag).
> > > 
> > > Guys, Val, maybe we should exclude join optimization from IGNITE-7077 
> > > while we don't have all the limitations at hand?
> > > 
> > > > We have a plan to perform excessive testing of joins (both co-located 
> > > > and distributed) and list all known limitations.
> > > 
> > > Can I help somehow with this activity?
> > > 
> > > 
> > > On Wed, 24/01/2018 at 12:08 +0300, Vladimir Ozerov wrote:
> > > > Hi Nikolay,
> > > >
> > > > Could you please clarify your question about scale and precision? Do 
> > > > you mean whether it is possible to read it from table metadata? If yes, 
> > > > it is not possible at the moment unfortunately - we do not store 
> > > > information about lengths, scales and precision, only actual data types 
> > > > are passed to H2 (e.g. String, BigDecimal, etc.). This will be fixed at 
> > > > some point in future, but I do not have any dates at the moment.
> > > >
> > > > Now about joins - Denis, I think you provided wrong link to our 
> > > > internal GridGain docs where we accumulate information about ANSI 
> compatibility and which we are going to publish on Ignite WIKI when it is
> ready. In any case, this is not what Nikolay asked about. The question was
> > > > question was about limitation of our joins which has nothing to do with 
> > > > ANSI standard. Unfortunately, at this moment we do not have complete 
> > > > list of all restrictions on our joins, because a lot of work is 
> > > > delegated to H2. In some unsupported scenarios we throw an exception. 
> > > > In other cases we return incorrect results silently (e.g. if you do not 
> > > > co-locate data and forgot to set "distributed joins" flag). We have a 
> > > > plan to perform excessive testing of joins (both co-located and 
> > > > distributed) and list all known limitations. This would require writing 
> > > > a lot of unit tests to cover various scenarios. I think we will have 
> > > > this information in a matter of 1-2 months.
> > > >
> > > > Vladimir.
> > > >
> > > > On Tue, Jan 23, 2018 at 11:45 PM, Denis Magda  wrote:
> > > > > Agree. The unsupported functions should be mentioned on the page that 
> > > > > will cover Ignite ANSI-99 compliance. We have first results available 
> > > > > for CORE features of the specification:
> > > > > https://ggsystems.atlassian.net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99
> > > > >  
> > > > > 
> > > > >
> > > > > That’s on my radar. I’ll take care of this.
> > > > >
> > > > > —
> > > > > Denis
> > > > >
> > > > > > On Jan 23, 2018, at 10:31 AM, Dmitriy Setrakyan 
> > > > > >  wrote:
> > > > > >
> > > > > > I think we need a page listing the unsupported functions with 
> > > > > > explanation
> > > > > > why, which is either it does not make sense in Ignite or is planned in future release.

[jira] [Created] (IGNITE-7692) affinityCall and affinityRun may execute code on backup partitions

2018-02-13 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-7692:


 Summary: affinityCall and affinityRun may execute code on backup 
partitions
 Key: IGNITE-7692
 URL: https://issues.apache.org/jira/browse/IGNITE-7692
 Project: Ignite
  Issue Type: Bug
Affects Versions: 2.1
Reporter: Alexey Goncharuk
 Fix For: 2.5


Apparently, the affinityCall and affinityRun methods reserve partitions and 
check that their state is OWNING; however, if the topology changes and the partition 
role changes from primary to backup, the code is still executed.

This can be an issue if a user executes a local SQL query inside the 
affinityCall runnable. In this case, the query result may return null.

This can be observed in the 
IgniteCacheLockPartitionOnAffinityRunTest#getPersonsCountSingleCache - note an 
additional check I've added to make the test pass.

I think it is ok to keep the old semantics for the API, because in some cases 
(scan query, local gets) a backup OWNER is enough. However, it looks like we 
need to add another API method to enforce that affinity run be executed on 
primary nodes and forbid primary role change.
Another option is to detect a topology version of the affinity run and use that 
version for local SQL queries.
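The gap described above can be sketched as a toy model (not Ignite's actual code; all names are illustrative): the current reservation logic only checks that the partition state is OWNING, which a backup partition also satisfies after the role change, while the proposed stricter check would additionally require the local node to still be the partition's primary.

```java
// Toy model of the affinity-run partition check discussed above.
// Illustrative names only - this is not GridDhtPartitionTopology code.
public class AffinityRunCheck {
    enum State { OWNING, MOVING, RENTING }

    /** Current behavior: only the partition state is verified. */
    static boolean currentCheck(State s) {
        return s == State.OWNING; // A backup OWNER passes too.
    }

    /** Proposed stricter behavior: the local node must also be primary. */
    static boolean proposedCheck(State s, boolean localNodeIsPrimary) {
        return s == State.OWNING && localNodeIsPrimary; // Rejects backup owners.
    }

    public static void main(String[] args) {
        // Partition became a backup after a topology change:
        System.out.println(currentCheck(State.OWNING));         // code still executes
        System.out.println(proposedCheck(State.OWNING, false)); // would be rejected
    }
}
```

The alternative mentioned above (pinning the topology version observed at affinity-run start and reusing it for local SQL queries) would instead keep the current check but make the query see a consistent partition assignment.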





Re: [SparkDataFrame] Query Optimization. Prototype

2018-02-13 Thread Nikolay Izhikov
Hello, Vladimir.

I've created ticket

https://issues.apache.org/jira/browse/IGNITE-7691

On Mon, 12/02/2018 at 11:20 +0300, Vladimir Ozerov wrote:
> Hi Nikolay,
> 
> I am not sure if ticket for DECIMAL column metadata exists. If you haven't 
> found one under "sql" component, please feel free to create it on your own. As 
> far as testing of joins, I think it makes sense to start working on it when 
> we finish ANSI compliance testing which is already in progress.
> 
> On Wed, Jan 24, 2018 at 12:27 PM, Nikolay Izhikov  
> wrote:
> > Hello, Vladimir.
> > 
> > Thank you for an answer.
> > 
> > > Do you mean whether it is possible to read it from table metadata?
> > 
> > Yes, you are right.
> > I want to read scale and precision of DECIMAL column from table metadata.
> > 
> > > This will be fixed at some point in future, but I do not have any dates 
> > > at the moment.
> > 
> > Is there ticket for it? I can't find it via jira search
> > 
> > > at this moment we do not have complete list of all restrictions on our 
> > > joins, because a lot of work is delegated to H2.
> > > In some unsupported scenarios we throw an exception.
> > > In other cases we return incorrect results silently (e.g. if you do not 
> > > co-locate data and forgot to set "distributed joins" flag).
> > 
> > Guys, Val, maybe we should exclude join optimization from IGNITE-7077 
> > while we don't have all the limitations at hand?
> > 
> > > We have a plan to perform excessive testing of joins (both co-located and 
> > > distributed) and list all known limitations.
> > 
> > Can I help somehow with this activity?
> > 
> > 
> > On Wed, 24/01/2018 at 12:08 +0300, Vladimir Ozerov wrote:
> > > Hi Nikolay,
> > >
> > > Could you please clarify your question about scale and precision? Do you 
> > > mean whether it is possible to read it from table metadata? If yes, it is 
> > > not possible at the moment unfortunately - we do not store information 
> > > about lengths, scales and precision, only actual data types are passed to 
> > > H2 (e.g. String, BigDecimal, etc.). This will be fixed at some point in 
> > > future, but I do not have any dates at the moment.
> > >
> > > Now about joins - Denis, I think you provided wrong link to our internal 
> > > GridGain docs where we accumulate information about ANSI compatibility 
> > > and which we are going to publish on Ignite WIKI when it is ready. In 
> > > any case, this is not what Nikolay asked about. The question was about 
> > > limitation of our joins which has nothing to do with ANSI standard. 
> > > Unfortunately, at this moment we do not have complete list of all 
> > > restrictions on our joins, because a lot of work is delegated to H2. In 
> > > some unsupported scenarios we throw an exception. In other cases we 
> > > return incorrect results silently (e.g. if you do not co-locate data and 
> > > forgot to set "distributed joins" flag). We have a plan to perform 
> > > excessive testing of joins (both co-located and distributed) and list all 
> > > known limitations. This would require writing a lot of unit tests to 
> > > cover various scenarios. I think we will have this information in a 
> > > matter of 1-2 months.
> > >
> > > Vladimir.
> > >
> > > On Tue, Jan 23, 2018 at 11:45 PM, Denis Magda  wrote:
> > > > Agree. The unsupported functions should be mentioned on the page that 
> > > > will cover Ignite ANSI-99 compliance. We have first results available 
> > > > for CORE features of the specification:
> > > > https://ggsystems.atlassian.net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99
> > > >  
> > > > 
> > > >
> > > > That’s on my radar. I’ll take care of this.
> > > >
> > > > —
> > > > Denis
> > > >
> > > > > On Jan 23, 2018, at 10:31 AM, Dmitriy Setrakyan 
> > > > >  wrote:
> > > > >
> > > > > I think we need a page listing the unsupported functions with 
> > > > > explanation
> > > > > why, which is either it does not make sense in Ignite or is planned in
> > > > > future release.
> > > > >
> > > > > Sergey, do you think you will be able to do it?
> > > > >
> > > > > D.
> > > > >
> > > > > On Tue, Jan 23, 2018 at 12:05 AM, Serge Puchnin 
> > > > > 
> > > > > wrote:
> > > > >
> > > > >> yes, the Cust function is supporting both Ignite and H2.
> > > > >>
> > > > >> I've updated the documentation for next system functions:
> > > > >> CASEWHEN Function, CAST, CONVERT, TABLE
> > > > >>
> > > > >> https://apacheignite-sql.readme.io/docs/system-functions
> > > > >>
> > > > >> And for my mind, next functions aren't applicable for Ignite:
> > > > >> ARRAY_GET, ARRAY_LENGTH, ARRAY_CONTAINS, CSVREAD, CSVWRITE, DATABASE,
> > > > >> DATABASE_PATH, DISK_SPACE_USED, FILE_READ, FILE_WRITE, LINK_SCHEMA,
> > > > >> MEMORY_FREE, MEMORY_USED, LOCK_MODE, LOCK_TIMEOUT, READONLY, CURRVAL,
> > > > >> AUTOCOMMIT, CANCEL_SESSION, IDENTITY, NEXTVAL, 

[jira] [Created] (IGNITE-7691) Provide info about DECIMAL column scale and precision

2018-02-13 Thread Nikolay Izhikov (JIRA)
Nikolay Izhikov created IGNITE-7691:
---

 Summary: Provide info about DECIMAL column scale and precision
 Key: IGNITE-7691
 URL: https://issues.apache.org/jira/browse/IGNITE-7691
 Project: Ignite
  Issue Type: Improvement
  Components: sql
Affects Versions: 2.4
Reporter: Nikolay Izhikov
Assignee: Nikolay Izhikov
 Fix For: 2.5


Currently, it is impossible to obtain the scale and precision of a DECIMAL column from 
SQL table metadata.
Ignite should provide this type of meta information.
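For context, the scale/precision the ticket asks for is the value-level information that java.math.BigDecimal already carries; what is missing is exposing it through column metadata (for the JDBC driver, that would be the standard `java.sql.ResultSetMetaData#getPrecision`/`#getScale` calls, referenced only in comments here since they need a live connection). The runnable part below just demonstrates the value-level notions.

```java
import java.math.BigDecimal;

// Value-level precision/scale of a DECIMAL, as carried by BigDecimal.
// Column-level metadata would be read via the standard JDBC API, e.g.:
//   ResultSetMetaData meta = rs.getMetaData();
//   meta.getPrecision(col); meta.getScale(col);
public class DecimalMeta {
    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("12345.67");
        System.out.println("precision=" + price.precision()); // total significant digits: 7
        System.out.println("scale=" + price.scale());         // digits after the point: 2
    }
}
```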





Re: IGNITE-5714 Context switching for pessimistic transactions

2018-02-13 Thread Nikolay Izhikov
Hello, Alexey.

Could you please write a little more about your implementation:

1. Design of the implementation.

2. Advantages of your implementation compared with other approaches?

3. Transactions performance penalties/improvements?

On Tue, 13/02/2018 at 14:17 +, ALEKSEY KUZNETSOV wrote:
>  Hi, Igniters!
> 
>  Currently we have context switching implemented for optimistic
> transactions [1].
> 
>  Goal of the current ticket is to support transaction suspend()\resume()
> operations for pessimistic transactions.
> 
>  The essential problem with them lies in *IgniteTxAdapter#threadId*.
>  Thread id is set when transaction is created and afterwards is transferred
> between nodes by GridDistributedTx requests\responses when we perform
> put\get operations.
>  When we suspend and resume a transaction, thread id gets changed locally,
> but not on remote nodes.
> 
>  In the ticket I decided to partly remove thread id from the source, and introduced
> an *undefined* value for it, where its value must be ignored.
>  Another solution can be to replace thread id usage with some new global
> transaction id counter.
> 
>  The former solution has advantages :
>  compatibility is preserved, a clear step-by-step implementation, and minimal
> changes to explicit cache lock work (it still creates candidates with a
> non-null thread id), as opposed to the last solution.
> 
>  There are 3 possible solutions to "thread id on remote nodes" issue :
>  1) Change thread id on remote nodes once suspend()\resume() is called.
>  2) Get rid of sending thread id to remote nodes.
>  3) Don’t remove the field, just put -1 (undefined) in it.
> 
>  The last option was chosen because it preserves compatibility in a cluster
> with nodes of older versions.
>  Note that outside the transaction, when an explicit cache lock is
> created, thread id is still set to a non-null value in the lock request (i.e.
> GridNearLockRequest).
> 
>  Thread id is moved from the global IgniteTxAdapter to GridNearTxLocal, since
> only the *near local* transaction needs it, for instance, when a local
> candidate (either near local or dht local) is created for GridNearTxLocal.
> Note that remote candidates are created with thread id undefined, because it
> is useless for non-local candidates.
>  In IgniteTxAdapter#ownsLock the thread id check is replaced with a tx version check.
>  We can do this because near transactions have unique versions to check
> against.
> 
>  In the tx synchronizer GridCacheTxFinishSync thread id is replaced with the tx
> version, so we don't need to store it and send it in GridFinishResponse
> messages.
>  As a consequence, thread id is also removed from grid near finish\prepare
> request\response.
> 
>  Also, thread id information is removed from deadlock messages (in
> TxDeadlock, TxDeadlockDetection).
> 
> Please, review it:
> 
> ticket *https://issues.apache.org/jira/browse/IGNITE-5714
> *
> pull request *https://github.com/apache/ignite/pull/2789
> *
> review https://reviews.ignite.apache.org/ignite/review/IGNT-CR-364
> 
>  [1] : https://issues.apache.org/jira/browse/IGNITE-5712.
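The "option 3" chosen above (put -1, i.e. undefined, into the thread id field) can be sketched as a toy model; this is not Ignite's actual transaction class, and all names below are illustrative. On suspend() the owning thread id becomes the undefined marker, so remote nodes can keep receiving the field for compatibility while ignoring its value; resume() rebinds the transaction to the calling thread.

```java
// Toy sketch of the "undefined thread id" approach, not Ignite's GridNearTxLocal.
public class SuspendableTx {
    static final long UNDEFINED_THREAD_ID = -1L;

    private long threadId = Thread.currentThread().getId();

    void suspend() {
        threadId = UNDEFINED_THREAD_ID; // From now on the value must be ignored.
    }

    void resume() {
        threadId = Thread.currentThread().getId(); // Rebind to the resuming thread.
    }

    long threadId() {
        return threadId;
    }
}
```

Because the field keeps flowing in messages (just with the -1 marker), nodes of older versions can still deserialize them, which is the compatibility argument made above.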

signature.asc
Description: This is a digitally signed message part


[jira] [Created] (IGNITE-7690) Move shared memory suite (IpcSharedMemoryCrashDetectionSelfTest) to Ignite Basic 2

2018-02-13 Thread Dmitriy Pavlov (JIRA)
Dmitriy Pavlov created IGNITE-7690:
--

 Summary: Move shared memory suite 
(IpcSharedMemoryCrashDetectionSelfTest) to Ignite Basic 2
 Key: IGNITE-7690
 URL: https://issues.apache.org/jira/browse/IGNITE-7690
 Project: Ignite
  Issue Type: Task
Reporter: Dmitriy Pavlov
Assignee: Ivan Rakov


Test is flaky but included into the basic (stable) suite:
Ignite Basic [ tests 1 ]
org.apache.ignite.testsuites.IgniteBasicTestSuite: org.apache.ignite.internal.util.ipc.shmem.IpcSharedMemoryCrashDetectionSelfTest.testIgfsServerClientInteractionsUponClientKilling (fail rate 2%)

https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-2464527718752484555=%3Cdefault%3E=testDetails


It is better to move this test to the Basic 2 suite - the place for flaky and 
long-running tests.

It is also desirable to introduce an IgniteBasic2 suite in code with proper 
comments on its purpose.





[GitHub] ignite pull request #3514: IGNITE-7481 suspended tx timeout rollback fix

2018-02-13 Thread voipp
GitHub user voipp opened a pull request:

https://github.com/apache/ignite/pull/3514

IGNITE-7481 suspended tx timeout rollback fix



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/voipp/ignite IGNITE-7481

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3514.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3514


commit ef84639e627b3a7060da960dd61c21a6ffebdbf1
Author: voipp 
Date:   2018-02-13T15:15:00Z

IGNITE-7481 suspended tx timeout rollback fix




---


Re: Page Locking vs Entry-level Locking

2018-02-13 Thread Dmitry Pavlov
Hi John,

Entry-level locks still exist. I described the locks coming from page
memory; page memory and its locks are a lower-level abstraction than the
entry level.

A segment is a part of a region consisting of a number of pages.

1 region - * segments, 1 segment - * pages.

A segment lock is taken when the set of pages in this segment is changed
(a page is rotated with disk - this process is named page replacement - or a
new page is created via allocatePage).

Segments are intended to decrease contention on page sets; by default the
number of segments is derived from the CPU count.
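The striping idea can be sketched in plain Java. This is a toy model for illustration only (the class and method names are invented, and Ignite's real page memory differs in many details): each segment guards its own page set with a read-write lock, so threads working in different segments do not contend.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model of segment striping: page-set mutations lock only the
// segment that owns the page, so threads touching different segments
// do not block each other. Not Ignite's actual implementation.
public class StripedSegments {
    private final ReentrantReadWriteLock[] segments;

    public StripedSegments(int concurrencyLevel) {
        segments = new ReentrantReadWriteLock[concurrencyLevel];
        for (int i = 0; i < concurrencyLevel; i++)
            segments[i] = new ReentrantReadWriteLock();
    }

    // Map a page id to the index of the segment that owns it.
    public int segmentIndex(long pageId) {
        return (int)Math.floorMod(pageId, (long)segments.length);
    }

    // Page replacement / allocation changes the page set of one segment
    // only, so it takes that segment's write lock, not a global one.
    public void replacePage(long pageId, Runnable replacement) {
        ReentrantReadWriteLock seg = segments[segmentIndex(pageId)];
        seg.writeLock().lock();
        try {
            replacement.run();
        } finally {
            seg.writeLock().unlock();
        }
    }
}
```

With the default segment count derived from the CPU count, contention only appears when many threads replace or allocate pages that map to the same segment; raising the segment count spreads them further.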

Some info can be found here
https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Regionandsegmentstructure

Unfortunately there is no picture of the segments there, but I hope one will
be added some day.

Sincerely,
Dmitriy Pavlov

Tue, Feb 13, 2018, 16:51, John Wilson :

> Hi Pavlov,
>
> Thanks for your explanation. However, I still have these questions:
>
>
>1. If locking is per page-level and there are no entry-level locks,
>then why does the documentation here talk about having entry-level
>transaction locks? https://apacheignite.readme.io/docs/transactions
>2. I'm not clear on what "locking a durable memory region segment"
>means. What is a segment? Does a page contain multiple segments, or does a
>segment contain multiple pages? The mental picture I have is: a memory
>region is divided into pages (which can be metadata pages, index, or data
>pages). Index pages hold links to data page ids and offsets for key-value
>pairs, while data pages contain the actual key-value pairs. So, what really
>is a segment and what does locking a memory segment mean? I understand
>page locks; what is a segment lock?
>
> Appreciate your response!!!
>
> On Tue, Feb 13, 2018 at 2:32 AM, Dmitry Pavlov 
> wrote:
>
>> Hi John,
>>
>> 1. No, content modification requires holding a lock on the page to provide
>> consistency in a multithreaded environment.
>> 2. A page is locked for read before its content is read, and unlocked after.
>> The same applies to write locks for writing. 1 writer or N readers are
>> allowed per page. On write-lock release, the page's dirty flag may be set if
>> the data was actually modified.
>> 3. Locking is on a per-page basis; additional striping by offset within a
>> page is not required according to tests.
>>
>> The only contention observed sometimes in high-load tests is the contention
>> of threads locking a durable memory region segment. But this situation can
>> be handled by setting concurrencyLevel in
>> DataStorageConfiguration.
>>
>> Sincerely,
>> Dmitriy Pavlov
>>
>>
>> Tue, Feb 13, 2018, 9:56, John Wilson :
>>
>> > Hi,
>> >
>> > Ignite documentation talks about entry-level locks and the page
>> structure
>> > has a LOCK_OFFSET that I assume is used to store tag info. I have these
>> > questions.
>> >
>> >1. Does Ignite use a lock-free implementation to lock pages and/or
>> >entries?
>> >2. When is a page locked and when is it released?
>> >3. When an entry is inserted/modified in a page, is the page locked
>> >(forbidding other threads from inserting entries in the page)? or
>> only
>> > the
>> >entry's offset is locked? (allowing other threads to insert/modify
>> other
>> >items)
>> >
>> > Thanks!
>> >
>>
>
>


API to enlist running user tasks

2018-02-13 Thread Nikolay Izhikov
Hello, Igniters. 

We have some requests from users [1] for the ability to get a list of all
running continuous queries.
I propose to implement this.

To implement it we have to extend our JMX beans to provide the following 
information: 

* List of continuous queries for cache:
* local listener node. 
* listener class name. 
* routine ID. 
* other CQ parameters 
* creation timestamp (?) 

* List of running compute tasks for node: 
* node ID task started from. 
* task class name. 
* other task parameters. 
* creation timestamp (?) 
* start timestamp (?) 

* List of running  jobs for node: 
* node ID task started from. 
* task class name. 
* other job parameters. 
* creation timestamp (?)
* start timestamp (?) 

I'm planning to file tickets to implement these features. 
So, please, write if you have any objections. 

[1] 
http://apache-ignite-developers.2346864.n4.nabble.com/Re-List-of-running-Continuous-queries-or-CacheEntryListener-per-cache-or-node-tp25526.html
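As a sketch of what such a monitoring bean could look like, here is a minimal, self-contained example using only the JDK's javax.management API. All bean, class, and attribute names are invented for illustration; the actual tickets may define different interfaces:

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Toy sketch of the proposed monitoring bean: an MXBean exposing the
// list of running continuous queries. Names are hypothetical; the real
// Ignite beans may differ.
public class ContinuousQueryMonitoring {
    public interface ContinuousQueryMXBean {
        List<String> getRoutineDescriptions();
    }

    public static class ContinuousQueryMonitor implements ContinuousQueryMXBean {
        @Override public List<String> getRoutineDescriptions() {
            // In a real implementation this would be collected from the
            // continuous query processor: routine id, listener class, node.
            return Arrays.asList("routineId=42, listener=MyListener, node=A");
        }
    }

    // Register the bean on the platform MBean server so that JConsole
    // or any JMX client can list the running routines.
    public static void register() throws Exception {
        MBeanServer srv = ManagementFactory.getPlatformMBeanServer();
        srv.registerMBean(new ContinuousQueryMonitor(),
            new ObjectName("org.example:type=ContinuousQueries"));
    }
}
```

The same shape (a read-only attribute returning one description per running routine) would apply to the compute task and job lists proposed above.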



[jira] [Created] (IGNITE-7689) IgnitePdsBinaryMetadataOnClusterRestartTest flaky fails on TC

2018-02-13 Thread Alexey Goncharuk (JIRA)
Alexey Goncharuk created IGNITE-7689:


 Summary: IgnitePdsBinaryMetadataOnClusterRestartTest flaky fails 
on TC
 Key: IGNITE-7689
 URL: https://issues.apache.org/jira/browse/IGNITE-7689
 Project: Ignite
  Issue Type: Test
Reporter: Alexey Goncharuk






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7688) DDL does not work properly with SQL queries.

2018-02-13 Thread Muratcan TUKSAL (JIRA)
Muratcan TUKSAL created IGNITE-7688:
---

 Summary: DDL does not work properly with SQL queries.
 Key: IGNITE-7688
 URL: https://issues.apache.org/jira/browse/IGNITE-7688
 Project: Ignite
  Issue Type: Bug
  Components: 2.3
Affects Versions: 2.3
 Environment: we have 5 nodes running on Ubuntu 16.04 (VM). We downloaded the 
binary dist. from the download page. 
Reporter: Muratcan TUKSAL
 Attachments: buggy-config.xml

* start the Ignite cluster in persistence-enabled mode (tried on 5 nodes)
 * activate the cluster via ignitevisor
 * create a table through JDBC
 * kill all nodes
 * start all nodes again
 * activate the cluster via ignitevisor
 * drop that specific table
 * deactivate the cluster (it doesn't matter whether via top -deactivate or by killing all nodes)
 * activate the cluster
 * the dropped table is still there with no data



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


IGNITE-5714 Context switching for pessimistic transactions

2018-02-13 Thread ALEKSEY KUZNETSOV
 Hi, Igniters!

 Currently we have context switching implemented for optimistic
transactions [1].

 The goal of the current ticket is to support transaction suspend()\resume()
operations for pessimistic transactions.

 The essential problem with them lies in *IgniteTxAdapter#threadId*.
 The thread id is set when a transaction is created and afterwards is
transferred between nodes by GridDistributedTx requests\responses when we
perform put\get operations.
 When we suspend and resume a transaction, the thread id gets changed locally,
but not on remote nodes.

 In the ticket I decided to partly remove the thread id from the source and
introduced an *undefined* value for it, where its value must be ignored.
 Another solution could be to replace thread id usage with some new global
transaction id counter.

 The former solution has advantages:
 compatibility is preserved, a step-by-step clear implementation, and minimal
changes to explicit cache lock handling (it still creates candidates with a
non-null thread id), as opposed to the latter solution.

 There are 3 possible solutions to the "thread id on remote nodes" issue:
 1) Change the thread id on remote nodes once suspend()\resume() is called.
 2) Get rid of sending the thread id to remote nodes.
 3) Don't remove the field, just put -1 (undefined) in it.

 The last option was chosen, because it preserves compatibility in a cluster
with nodes of older versions.
 Note that outside a transaction, when an explicit cache lock is created, the
thread id is still set to a non-null value in the lock request (i.e.
GridNearLockRequest).

 The thread id is moved from the global IgniteTxAdapter to GridNearTxLocal, as
only the *near local* transaction needs it.
 For instance, when a local candidate (either near local or dht local) is
created for GridNearTxLocal. Note that remote candidates are created with an
undefined thread id, because it is useless for non-local candidates.
 In IgniteTxAdapter#ownsLock the thread id check is replaced with a tx version
check.
 We could do this because near transactions have unique versions to check
against.
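The idea of keying lock ownership by tx version instead of thread id can be illustrated with a small self-contained toy (not Ignite's code; all names here are invented): a transaction resumed on a different thread still owns its locks, because the version travels with the transaction object rather than with the thread.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Toy model: lock ownership keyed by a transaction version instead of a
// thread id. The ownership check compares versions, so it holds on any
// thread the transaction is resumed on. Illustration only.
public class VersionOwnedLocks {
    private final Map<String, UUID> ownerByKey = new ConcurrentHashMap<>();

    // Acquire the lock, or confirm re-entry by the same transaction.
    public boolean tryLock(String key, UUID txVersion) {
        return ownerByKey.putIfAbsent(key, txVersion) == null
            || ownerByKey.get(key).equals(txVersion);
    }

    // No Thread.currentThread() involved: ownership is a version match.
    public boolean ownsLock(String key, UUID txVersion) {
        return txVersion.equals(ownerByKey.get(key));
    }

    public void unlock(String key, UUID txVersion) {
        ownerByKey.remove(key, txVersion);
    }
}
```

This is why the unique versions of near transactions are a workable replacement for the thread id in ownership checks.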

 In the tx synchronizer GridCacheTxFinishSync the thread id is replaced with
the tx version, so we don't need to store it and send it in GridFinishResponse
messages.
 As a consequence, the thread id is also removed from the grid near
finish\prepare request\response.

 Also, thread id information is removed from the deadlock messages (in
TxDeadlock, TxDeadlockDetection).

Please, review it:

ticket: https://issues.apache.org/jira/browse/IGNITE-5714
pull request: https://github.com/apache/ignite/pull/2789
review https://reviews.ignite.apache.org/ignite/review/IGNT-CR-364

 [1] : https://issues.apache.org/jira/browse/IGNITE-5712.
-- 

*Best Regards,*

*Kuznetsov Aleksey*


Re: Optimistic Locking and the Prepare Phase

2018-02-13 Thread John Wilson
Hi  Vladimir,

Your answer is what is depicted in the graphics and makes perfect sense to
me. I guess what I'm confused about is what a "prepare" phase means and
what "*In optimistic transactions, locks are acquired on primary nodes
during the "prepare" phase* " means.

My understanding of a "prepare" phase based on the blog here (
https://www.gridgain.com/resources/blog/apache-ignite-transactions-architecture-2-phase-commit-protocol)
is that it is the phase where we acquire all the necessary locks (in
pessimistic locking) before we start the commit phase.


   1. In the context of *pessimistic* locking, at the end of the prepare
   phase but before we start commit, we would have acquired all locks. True?
   2. In the context of *optimistic* locking, a prepare phase would not
   request or acquire locks. True?
   3. In the context of *optimistic* locking, at the end of the prepare
   phase but before we start commit, we have stored the current version of the
   keys in the transaction coordinator but we have not yet requested or
   acquired any locks. Locks will be acquired during the commit phase. True?
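For intuition, the version check that makes an optimistic prepare fail can be modeled in a few lines of plain Java. This is a toy sketch, not Ignite's actual two-phase commit, and the names are invented: reads record entry versions, and "prepare" re-validates them before anything is committed.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of optimistic validation: a read records the entry's
// version; prepare() re-checks recorded versions and fails if any
// entry changed in between. Illustration only.
public class OptimisticStore {
    static class Entry {
        Object val;
        long ver;
        Entry(Object val, long ver) { this.val = val; this.ver = ver; }
    }

    private final Map<String, Entry> data = new HashMap<>();

    public synchronized void put(String key, Object val) {
        Entry e = data.get(key);
        if (e == null)
            data.put(key, new Entry(val, 1));
        else {
            e.val = val;
            e.ver++; // every update bumps the entry version
        }
    }

    public synchronized long readVersion(String key) {
        Entry e = data.get(key);
        return e == null ? 0 : e.ver;
    }

    // Prepare: validate that the versions observed by the transaction
    // are still current; a mismatch fails the transaction here.
    public synchronized boolean prepare(Map<String, Long> observed) {
        for (Map.Entry<String, Long> o : observed.entrySet())
            if (readVersion(o.getKey()) != o.getValue())
                return false;
        return true;
    }
}
```

This matches the quoted doc: the failure happens at "prepare", because that is where the recorded versions are compared against the current ones.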

Thanks!

On Tue, Feb 13, 2018 at 12:54 AM, Vladimir Ozerov 
wrote:

> Hi John,
>
> 1) In PESSIMISTIC mode locks are obtained either on first update
> (READ_COMMITTED) or even read (REPEATABLE_READ). I.e. they obtained before
> prepare phase and are held for the duration of transaction. In OPTIMISTIC
> mode locks are obtained only after you call IgniteTransaction.commit().
> 2) It means that transaction will fail if enlisted entries have been
> changed after they were accessed by current transaction, but before this
> transaction is committed.
>
> On Tue, Feb 13, 2018 at 9:49 AM, John Wilson 
> wrote:
>
> > Hi,
> >
> > The design doc below states:
> >
> > *" In optimistic transactions, locks are acquired on primary nodes during
> > the "prepare" phase, then promoted to backup nodes and released once the
> > transaction is committed. Depending on an isolation level, if Ignite
> > detects that a version of an entry has been changed since the time it was
> > requested by a transaction, then the transaction will fail at the
> "prepare"
> > phase and it will be up to an application to decide whether to restart
> the
> > transaction or not."*
> >
> > Two questions:
> >
> >
> >1. If locks are acquired during the prepare phase, why do we state
> that
> >lock acquisition for optimistic locking is delayed (as compared
> against
> >pessimistic locking)?
> >2. If "*ignite detects the version has changed since last request by
> >transaction, it will fail at prepare phase*". Very confusing. What is
> >the last request? I thought the "last request" means the "prepare"
> phase
> >and if so why we say it may fail during prepare phase?
> >
> > The graphic make sense to me - i.e. locks for optimistic locking are
> > acquired on the commit phase and not on the prepare phase.
> >
> > https://cwiki.apache.org/confluence/display/IGNITE/
> > Ignite+Key-Value+Transactions+Architecture
> >
> > Please help clarify.
> >
> > Thanks.
> >
>


Re: Orphaned, duplicate, and main-class tests!

2018-02-13 Thread Ilya Kasnacheev
Anton,

>Tests should be attached to appropriate suites

This I can do

> and muted if necessary, Issues should be created on each mute.

This is roughly a week of work. I can't spare that right now. I doubt
anyone can.

Can we approach this by smaller steps?

-- 
Ilya Kasnacheev

2018-02-06 19:55 GMT+03:00 Anton Vinogradov :

> Val,
>
> Tests should be attached to appropriate suites and muted if necessary,
> Issues should be created on each mute.
>
> On Tue, Feb 6, 2018 at 7:23 PM, Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
>
> > Anton,
> >
> > I tend to agree with Ilya that identifying and fixing all the possible
> > broken tests in one go is not feasible. What is the proper way in your
> > view? What are you suggesting?
> >
> > -Val
> >
> > On Mon, Feb 5, 2018 at 2:18 AM, Anton Vinogradov <
> avinogra...@gridgain.com
> > >
> > wrote:
> >
> > > Ilya,
> > >
> > > 1) Still see no reason for such changes. Does this break something?
> > >
> > > 2) Looks like you're trying to add Trash*TestSuite.java which will
> never
> > be
> > > refactored.
> > > We should do everything in proper way now, not sometime.
> > >
> > > 3) Your comments look odd to me.
> > > The issue should be resolved in a proper way.
> > >
> > > On Mon, Feb 5, 2018 at 1:07 PM, Ilya Kasnacheev <
> > ilya.kasnach...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Anton,
> > > >
> > > > 1) We already have ~100 files named "*AbstractTest.java". Renaming
> > these
> > > > several files will help checking for orphaned tests in the future, as
> > > well
> > > > as increasing code base consistency.
> > > >
> > > > 2) This is huge work that is not doable by any single developer.
> While
> > > > IgniteLostAndFoundTestSuite can be slowly refactored away
> > > > This is unless you are OK with putting all these tests, most of which
> > are
> > > > red and some are hanging, in production test suites and therefore
> > > breaking
> > > > productivity for a couple months while this gets sorted.
> > > > Are you OK with that? Anybody else?
> > > >
> > > > 3) I think I *could* put them in some test suite or another, but I'm
> > > pretty
> > > > sure I can't fix them all, not in one commit, not ever. Nobody can do
> > > that
> > > > single-handedly. We need a plan here.
> > > >
> > > > Ilya.
> > > >
> > > >
> > > > --
> > > > Ilya Kasnacheev
> > > >
> > > > 2018-02-05 13:00 GMT+03:00 Anton Vinogradov <
> avinogra...@gridgain.com
> > >:
> > > >
> > > > > Ilya,
> > > > >
> > > > > 1) I don't think it's a good idea to rename classes to
> > > *AbstractTest.java
> > > > > since they already have the abstract keyword in their definition.
> > > > > We can perform such renaming only in case whole project will be
> > > > refactored,
> > > > > but I see no reason to do this.
> > > > >
> > > > > 2) All non-included tests should be included in appropriate suites.
> > > > > Creating IgniteLostAndFoundTestSuite.java is not acceptable.
> > > > >
> > > > > 3) In case you're not sure what to do with particular tests, please
> > > > provide
> > > > > lists of such tests. Please group tests by "problem".
> > > > >
> > > > >
> > > > > On Fri, Feb 2, 2018 at 12:28 AM, Dmitry Pavlov <
> > dpavlov@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Ilya,
> > > > > >
> > > > > > Thank you for this research. I think it is useful for community
> to
> > > > > identify
> > > > > > and remove obsolete tests (if any), and include lost test into CI
> > run
> > > > > chain
> > > > > > (if applicable).
> > > > > >
> > > > > > For tests with main() methods I suggest asking the authors (git
> > > > > > annotate), and if there is no response we should probably remove
> > > > > > such code.
> > > > > >
> > > > > > Since I am not sure all tests in this lost suite are quite stable,
> > > > > > I suggest creating a standalone TC run configuration for such tests.
> > > > > >
> > > > > > Earlier I've removed most of tests causing timeouts from basic
> > suite.
> > > > > > Ideally Basic suite should contain fast run quite stable tests (
> > and
> > > 0
> > > > > > flaky ) because it is included into RunAllBasic sub set to brief
> > > commit
> > > > > > check  (
> > > > > > https://ci.ignite.apache.org/viewType.html?buildTypeId=
> > > > > IgniteTests24Java8_
> > > > > > RunBasicTests
> > > > > >  ).
> > > > > >
> > > > > > Sincerely,
> > > > > > Dmitriy Pavlov
> > > > > >
> > > > > > Thu, Feb 1, 2018, 20:22, Ilya Kasnacheev <
> > > > ilya.kasnach...@gmail.com
> > > > > >:
> > > > > >
> > > > > > > Hello!
> > > > > > >
> > > > > > > While working on Ignite, I have noticed that not all tests are
> in
> > > any
> > > > > > test
> > > > > > > suite, hence I expect they are ignored. I have also noticed
> some
> > > > files
> > > > > in
> > > > > > > src/test and named *Test.java are actually runnable
> main-classes
> > > and
> > > > > not
> > > > > > > tests. I think they're ignored too. Also I've noticed that 6
> tests
> > > > > repeat
> > > > > > > twice.
> > > > > > 

[jira] [Created] (IGNITE-7687) SQL SELECT doesn't update TTL for Touched/AccessedExpiryPolicy

2018-02-13 Thread Stanislav Lukyanov (JIRA)
Stanislav Lukyanov created IGNITE-7687:
--

 Summary: SQL SELECT doesn't update TTL for 
Touched/AccessedExpiryPolicy
 Key: IGNITE-7687
 URL: https://issues.apache.org/jira/browse/IGNITE-7687
 Project: Ignite
  Issue Type: Bug
  Components: sql
Affects Versions: 2.5
Reporter: Stanislav Lukyanov


SQL SELECT queries don't update TTLs when TouchedExpiryPolicy or 
AccessedExpiryPolicy is used (unlike IgniteCache::get, which does update the 
TTLs).

Example (modified SqlDmlExample):

CacheConfiguration<Long, Organization> orgCacheCfg = new CacheConfiguration<Long, Organization>(ORG_CACHE)
    .setIndexedTypes(Long.class, Organization.class)
    .setExpiryPolicyFactory(TouchedExpiryPolicy.factoryOf(new Duration(TimeUnit.SECONDS, 10)));

IgniteCache<Long, Organization> orgCache = ignite.getOrCreateCache(orgCacheCfg);

SqlFieldsQuery qry = new SqlFieldsQuery("insert into Organization (_key, id, name) values (?, ?, ?)");
orgCache.query(qry.setArgs(1L, 1L, "ASF"));
orgCache.query(qry.setArgs(2L, 2L, "Eclipse"));

SqlFieldsQuery qry1 = new SqlFieldsQuery("select id, name from Organization as o");
for (int i = 0; ; i++) {
    List<List<?>> res = orgCache.query(qry1).getAll();
    print("i = " + i);
    for (Object next : res)
        System.out.println(">>> " + next);
    U.sleep(5000);
}

Output:
>>> i = 0
>>> [1, ASF]
>>> [2, Eclipse]

>>> i = 1
>>> [1, ASF]
>>> [2, Eclipse]

>>> i = 2

>>> i = 3
...
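The expected touched-expiry semantics, and how a read path that skips TTL updates breaks them, can be modeled with a small self-contained sketch. This is a toy with a test-controlled clock, not Ignite code; all names are invented:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of touched-expiry semantics: get() refreshes the entry's
// deadline, while selectWithoutTouch() models a read path (like the
// reported SQL SELECT) that does not refresh TTLs. Illustration only.
public class TouchedExpiryMap {
    private final long ttlMillis;
    private final Map<String, Long> deadline = new HashMap<>();
    private final Map<String, Object> values = new HashMap<>();
    public long now; // test-controlled clock instead of wall time

    public TouchedExpiryMap(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public void put(String k, Object v) {
        values.put(k, v);
        deadline.put(k, now + ttlMillis);
    }

    private boolean expired(String k) {
        return !deadline.containsKey(k) || now > deadline.get(k);
    }

    // Touched read: refreshes the TTL, as IgniteCache.get is expected to.
    public Object get(String k) {
        if (expired(k))
            return null;
        deadline.put(k, now + ttlMillis);
        return values.get(k);
    }

    // Non-touching read: models the reported SQL SELECT behavior.
    public Object selectWithoutTouch(String k) {
        return expired(k) ? null : values.get(k);
    }
}
```

Under the touched path an entry read every 5 seconds with a 10-second TTL never expires; under the non-touching path it expires after 10 seconds regardless of reads, matching the output above.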



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Page Locking vs Entry-level Locking

2018-02-13 Thread John Wilson
Hi Pavlov,

Thanks for your explanation. However, I still have these questions:


   1. If locking is per page-level and there are no entry-level locks, then
   why does the documentation here talk about having entry-level transaction
   locks? https://apacheignite.readme.io/docs/transactions
   2. I'm not clear on what "locking a durable memory region segment"
   means. What is a segment? Does a page contain multiple segments, or does a
   segment contain multiple pages? The mental picture I have is: a memory
   region is divided into pages (which can be metadata pages, index, or data
   pages). Index pages hold links to data page ids and offsets for key-value
   pairs, while data pages contain the actual key-value pairs. So, what really
   is a segment and what does locking a memory segment mean? I understand
   page locks; what is a segment lock?

Appreciate your response!!!

On Tue, Feb 13, 2018 at 2:32 AM, Dmitry Pavlov 
wrote:

> Hi John,
>
> 1. No, content modification requires holding a lock on the page to provide
> consistency in a multithreaded environment.
> 2. A page is locked for read before its content is read, and unlocked after.
> The same applies to write locks for writing. 1 writer or N readers are
> allowed per page. On write-lock release, the page's dirty flag may be set if
> the data was actually modified.
> 3. Locking is on a per-page basis; additional striping by offset within a
> page is not required according to tests.
>
> The only contention observed sometimes in high-load tests is the contention
> of threads locking a durable memory region segment. But this situation can
> be handled by setting concurrencyLevel in
> DataStorageConfiguration.
>
> Sincerely,
> Dmitriy Pavlov
>
>
> Tue, Feb 13, 2018, 9:56, John Wilson :
>
> > Hi,
> >
> > Ignite documentation talks about entry-level locks and the page structure
> > has a LOCK_OFFSET that I assume is used to store tag info. I have these
> > questions.
> >
> >1. Does Ignite use a lock-free implementation to lock pages and/or
> >entries?
> >2. When is a page locked and when is it released?
> >3. When an entry is inserted/modified in a page, is the page locked
> >(forbidding other threads from inserting entries in the page)? or only
> > the
> >entry's offset is locked? (allowing other threads to insert/modify
> other
> >items)
> >
> > Thanks!
> >
>


Looks like a bug in ServerImpl.joinTopology()

2018-02-13 Thread Александр Меньшиков
Hello.

I saw such code in `ServerImpl.joinTopology()`


locNode.order(1);
locNode.internalOrder(1);

spi.gridStartTime = U.currentTimeMillis();

locNode.visible(true);

ring.clear();
ring.topologyVersion(1);



And it looks like a bug, because the `locNode` is contained inside the
`ring` (`TcpDiscoveryNodesRing.locNode`, which is also inside the
`TcpDiscoveryNodesRing.nodes` collection), and every operation on
`TcpDiscoveryNodesRing.nodes` is executed under a read-write lock - not
without a reason. `locNode.order` is used inside `TcpDiscoveryNodesRing.nodes`
for sorting (it's a TreeSet), and such a violation of thread safety can
destroy collection navigation.

`TcpDiscoveryNode.internalOrder` is volatile and the `ring.clear()` line
resets the `TcpDiscoveryNodesRing.nodes` collection, so the issue is hidden.
But if another thread executed a find operation on the collection after
`locNode.internalOrder(1)` but before `ring.clear()`, the issue would appear.

But it's hard to create a fair reproducer for this situation.

Am I right about that and should I create an issue in Jira, or am I just
missing something?
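The hazard can be demonstrated even single-threaded with a self-contained snippet: mutating the field a TreeSet orders by, while the element sits in the set, breaks navigation. That is exactly why such updates are dangerous outside the ring's write lock. (`Node` below is a simplified stand-in for TcpDiscoveryNode, not the real class.)

```java
import java.util.Comparator;
import java.util.TreeSet;

// Demonstrates the hazard: mutating the ordering field of an element
// while it is inside a TreeSet breaks the set's navigation, because
// lookups follow the comparator down the tree.
public class TreeSetOrderBug {
    static class Node {
        long order;
        Node(long order) { this.order = order; }
    }

    public static boolean reproduce() {
        TreeSet<Node> ring = new TreeSet<>(Comparator.comparingLong(n -> n.order));
        Node a = new Node(5);
        Node b = new Node(10);
        ring.add(a);
        ring.add(b);

        b.order = 1; // mutate the ordering key in place, like locNode.internalOrder(1)

        // The set still physically holds b, but the lookup compares 1 < 5,
        // descends the wrong branch, and reports b as absent.
        return ring.contains(b);
    }
}
```

reproduce() returns false even though b was never removed, which is the "destroyed navigation" described above.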


[jira] [Created] (IGNITE-7686) PDS Direct IO failure: IgnitePdsEvictionTest.testPageEviction

2018-02-13 Thread Dmitriy Pavlov (JIRA)
Dmitriy Pavlov created IGNITE-7686:
--

 Summary: PDS Direct IO failure: 
IgnitePdsEvictionTest.testPageEviction 
 Key: IGNITE-7686
 URL: https://issues.apache.org/jira/browse/IGNITE-7686
 Project: Ignite
  Issue Type: Bug
  Components: persistence
Reporter: Dmitriy Pavlov
Assignee: Dmitriy Pavlov
 Fix For: 2.5


java.util.concurrent.TimeoutException: Test has been timed out 
[test=testPageEviction, timeout=30] 

Reproduced only on TC agent



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7685) Incorrect AllocationRate counting

2018-02-13 Thread Anton Vinogradov (JIRA)
Anton Vinogradov created IGNITE-7685:


 Summary: Incorrect AllocationRate counting
 Key: IGNITE-7685
 URL: https://issues.apache.org/jira/browse/IGNITE-7685
 Project: Ignite
  Issue Type: Task
Reporter: Anton Vinogradov
Assignee: Andrey Kuznetsov


Each call of 
{{org.apache.ignite.internal.processors.cache.persistence.DataRegionMetricsImpl#updateTotalAllocatedPages}}
 performs an {{allocRate.onHit()}} call, which is not correct since the delta
can be negative or bigger than 1.

Need to fix the allocationRate counting.
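The miscount can be shown with a tiny self-contained model (the names here are illustrative, not the real DataRegionMetricsImpl): counting one "hit" per update call loses information whenever the delta is negative or greater than 1.

```java
// Toy illustration of the metric bug: one increment per update call
// miscounts when the actual delta is negative or greater than 1; a
// delta-aware accumulator tracks the real allocation volume.
public class AllocationRateBug {
    public long hits;        // buggy: allocRate.onHit() analogue, always +1
    public long accumulated; // fixed: accumulate the actual positive delta

    public void updateTotalAllocatedPages(long delta) {
        hits++;                   // counts "1" even for delta = 5 or delta = -2
        if (delta > 0)
            accumulated += delta; // only real allocations feed the rate
    }
}
```

After allocating 5 pages in one call and freeing 2 in another, the per-call counter reports 2 "hits" while the real allocation volume is 5, which is the distortion the ticket describes.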



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IGNITE-7684) Ignore IGNITE_USE_ASYNC_FILE_IO_FACTORY in FileWriteAheadLogManager

2018-02-13 Thread Alexander Belyak (JIRA)
Alexander Belyak created IGNITE-7684:


 Summary: Ignore IGNITE_USE_ASYNC_FILE_IO_FACTORY in 
FileWriteAheadLogManager
 Key: IGNITE-7684
 URL: https://issues.apache.org/jira/browse/IGNITE-7684
 Project: Ignite
  Issue Type: Improvement
  Components: general
Affects Versions: 2.4
Reporter: Alexander Belyak


If IGNITE_USE_ASYNC_FILE_IO_FACTORY is specified and IGNITE_WAL_MMAP is not, we get:

{noformat}

java.lang.UnsupportedOperationException: AsynchronousFileChannel doesn't support mmap.
at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.map(AsyncFileIO.java:173)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.restoreWriteHandle(FileWriteAheadLogManager.java:1068)
at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager.resumeLogging(FileWriteAheadLogManager.java:552)
at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readCheckpointAndRestoreMemory(GridCacheDatabaseSharedManager.java:714)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onClusterStateChangeRequest(GridDhtPartitionsExchangeFuture.java:841)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:595)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2329)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:748)

{noformat}
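A possible shape of the fix can be sketched with simplified stand-in interfaces (these are not Ignite's real FileIO classes, and the decision logic is an assumption about what the ticket proposes): when the WAL needs mmap but the configured IO cannot map, fall back to a map-capable IO instead of throwing.

```java
// Toy sketch of the proposed behavior: the WAL honors the async IO
// factory only when mmap is off; otherwise it falls back to an IO that
// can map, avoiding the UnsupportedOperationException above.
// Simplified stand-ins for Ignite's FileIO hierarchy.
public class WalIoSelection {
    public interface FileIo {
        boolean supportsMap();
    }

    public static class AsyncFileIo implements FileIo {
        public boolean supportsMap() { return false; } // AsynchronousFileChannel cannot mmap
    }

    public static class RandomAccessFileIo implements FileIo {
        public boolean supportsMap() { return true; }
    }

    // Pick the IO implementation for the WAL.
    public static FileIo chooseWalIo(boolean mmapEnabled, FileIo configured) {
        if (mmapEnabled && !configured.supportsMap())
            return new RandomAccessFileIo(); // ignore the async factory for the WAL
        return configured;
    }
}
```

With mmap enabled the async IO is silently ignored for the WAL; with mmap disabled the configured factory is used as-is.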



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Make Teamcity Green Again

2018-02-13 Thread Dmitry Pavlov
I forgot to share the link to the actual investigations list:
https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=investigations

Please feel free to create and assign investigations if you have an assumption
about the causes of a failure.

Tue, Feb 13, 2018, 1:34, Dmitriy Setrakyan :

> Dmitriy, thanks for pushing this! We have to bring the number of the
> failing tests to zero. I hope that the community gets behind this effort.
>
> D.
>
> On Mon, Feb 12, 2018 at 8:26 AM, Dmitry Pavlov 
> wrote:
>
> > Hi Folks,
> >
> > I want to resurrect this process and start fixing tests. Currently
> approx.
> > 60-80 tests fail on each run on master.
> >
> > As a first step, I suggest using TC investigations for brief
> > research of test failures.
> >
> > If you can help with research please create and assign investigation on
> TC
> > (on test fail select dropdown menu, then 'Investigate / Mute'
> > then 'Investigated by: me' & 'Resolve: auto/manually'; Runs
> > https://ci.ignite.apache.org/viewType.html?buildTypeId=
> > IgniteTests24Java8_RunAll_IgniteTests24Java8=%3Cdefault%3E
> > ). Also help with existing fixes review/merge would be very appreciated.
> >
> > Issue creation is still desired, but investigation will help us to
> identify
> > tests failures which are under research and failures not researched for
> > now.
> >
> > Sincerely,
> > Dmitriy Pavlov
> >
> > Tue, Nov 7, 2017, 0:06, Dmitry Pavlov :
> >
> > > "Ignored" means that the test is disabled from the code, using
> annotation
> > > @Ignore. Not sure it works for our JUnit3 style Java tests. It is used
> > for
> > > .NET.
> > > "Muted" test is disabled in the interface TeamCity, but test code is
> > > actually executed (which is why hanging tests needs to be failed in
> > > addition to mute).
> > >
> > > The fact that the RunAll now became green means all the tests that
> fail
> > > at this point are flaky. If there are about 200 non-muted flaky tests
> having
> > > a probability of passing of 98%, then the whole chain will pass only
> with
> > > probability 0.98 ^ 200 = 1.7%
> > >
> > > I suggest continue to identify, fix, or mute such flaky tests to get
> > > stable green RunAll for correct code.
> > >
> > > Mon, Nov 6, 2017, 22:57, Denis Magda :
> > >
> > >> Dmitriy,
> > >>
> > >> Now it’s time to fix a pretty decent number of muted and ignored
> tests.
> > >> BTW, what’s the difference between “ignored” and “muted”? I’ve seen
> the
> > >> former before.
> > >>
> > >> —
> > >> Denis
> > >>
> > >> > On Nov 5, 2017, at 9:58 AM, Dmitry Pavlov 
> > >> wrote:
> > >> >
> > >> > Hi Igniters,
> > >> >
> > >> > I am happy to report that the activity of monitoring and fixing the
> > >> tests
> > >> > brought the first significant victory!
> > >> >
> > >> > Today tests launch running all Ignite tests (Ignite 2.0
> Tests->RunAll)
> > >> is
> > >> > completely green:
> > >> https://ci.ignite.apache.org/viewLog.html?buildId=930953
> > >> >
> > >> > This is a reason for pride, but not a reason to stop this
> > >> activity.
> > >> > Unfortunately, there are still a number of tests with unpredictable
> > >> > failures (flaky, 234 tests), and a number of tests have been muted.
> > Full
> > >> > project scope is available in the article
> > >> >
> > >> https://cwiki.apache.org/confluence/display/IGNITE/
> > Make+Teamcity+Green+Again
> > >> >
> > >> > In any case, thank to all of you who helped and continues to help
> the
> > >> > project to correct the tests and bugs.
> > >> >
> > >> > Sincerely,
> > >> > Pavlov Dmitry
> > >> >
> > >> >
> > >> >> Fri, Jul 21, 2017, 14:29, Pavel Tupitsyn :
> > >> >
> > >> >>> Green Again
> > >> >>> Again
> > >> >> As if it ever was green :)
> > >> >>
> > >> >> Of course +1 on this, let me know if you see any .NET-related
> > failures.
> > >> >>
> > >> >> On Fri, Jul 21, 2017 at 1:06 PM, Николай Ижиков <
> > >> nizhikov@gmail.com>
> > >> >> wrote:
> > >> >>
> > >> >>> Hello, Igniters.
> > >> >>>
> > >> >>> Also ready to help to #MakeTeamcityGreenAgain !
> > >> >>>
> > >> >>> On Jul 21, 2017, 12:56 PM, "Vyacheslav Daradur" <
> > >> >>> daradu...@gmail.com> wrote:
> > >> >>>
> > >>  Hi guys.
> > >> 
> > >>  I vote for #MakeTeamcityGreenAgain. :-)
> > >> 
> > >>  FYI: it had been described and supported previously[1]
> > >> 
> > >>  After the completion of my current task I will try to help with
> > this
> > >>  activity.
> > >> 
> > >>  [1]
> > >>  http://apache-ignite-developers.2346864.n4.nabble.
> > >>  com/Test-failures-td14353.html
> > >> 
> > >>  2017-07-21 12:39 GMT+03:00 Anton Vinogradov :
> > >> 
> > >> > Nikolay,
> > >> >
> > >> > That's also a big problem for me, as reviewer, to accept changes
> > and
> > >>  merge
> > >> > them to master.
> > >> 

IGNITE-7409 Rework exception handling in suspend()\resume() methods

2018-02-13 Thread ALEKSEY KUZNETSOV
Hi, Igniters!

When a user misuses transactions, one can get an assertion error.
Currently, when we start an optimistic transaction, do some work within it and
then try to resume it (which is incorrect usage), an assertion error is thrown.
That looks weird.
Instead, a more descriptive exception should be thrown.

In my fix the exception handling is reworked; please review it:
ticket [1],
review in Upsource [2].

[1]: https://issues.apache.org/jira/browse/IGNITE-7409
[2]: https://reviews.ignite.apache.org/ignite/review/IGNT-CR-463
-- 

*Best Regards,*

*Kuznetsov Aleksey*


Re: Page Locking vs Entry-level Locking

2018-02-13 Thread Dmitry Pavlov
Hi John,

1. No, content modification requires holding a lock on the page to provide
consistency in a multithreaded environment.
2. A page is locked for read before its content is read, and unlocked after.
The same applies to write locks for writing. 1 writer or N readers are
allowed per page. On write-lock release, the page's dirty flag may be set if
the data was actually modified.
3. Locking is on a per-page basis; additional striping by offset within a
page is not required according to tests.

The only contention observed sometimes in high-load tests is the contention
of threads locking a durable memory region segment. But this situation can
be handled by setting concurrencyLevel in
DataStorageConfiguration.

Sincerely,
Dmitriy Pavlov


Tue, Feb 13, 2018, 9:56, John Wilson :

> Hi,
>
> Ignite documentation talks about entry-level locks and the page structure
> has a LOCK_OFFSET that I assume is used to store tag info. I have these
> questions.
>
>1. Does Ignite use a lock-free implementation to lock pages and/or
>entries?
>2. When is a page locked and when is it released?
>3. When an entry is inserted/modified in a page, is the page locked
>(forbidding other threads from inserting entries in the page)? or only
> the
>entry's offset is locked? (allowing other threads to insert/modify other
>items)
>
> Thanks!
>
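
The one-writer-or-N-readers behavior with a dirty flag set on write-unlock can
be sketched as follows. This is an illustrative model only, not Ignite's
actual page-lock implementation (which works on page memory addresses); the
`Page` class and its methods are hypothetical.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: a per-page read/write lock allowing one writer or N readers,
// where the dirty flag is set on write-unlock if the content was modified.
public class PageLockDemo {
    static class Page {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private volatile boolean dirty;
        private long content;

        long read() {
            lock.readLock().lock();           // N concurrent readers allowed
            try {
                return content;
            } finally {
                lock.readLock().unlock();
            }
        }

        void write(long newContent) {
            lock.writeLock().lock();          // single writer, excludes readers
            boolean modified = false;
            try {
                if (content != newContent) {
                    content = newContent;
                    modified = true;
                }
            } finally {
                if (modified)
                    dirty = true;             // mark page for checkpointing
                lock.writeLock().unlock();
            }
        }

        boolean isDirty() { return dirty; }
    }

    public static void main(String[] args) {
        Page p = new Page();
        p.write(42L);
        System.out.println("value=" + p.read() + ", dirty=" + p.isDirty());
    }
}
```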


[GitHub] ignite pull request #3498: IGNITE-3111 .NET: Configure SSL without Spring

2018-02-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/ignite/pull/3498


---


[jira] [Created] (IGNITE-7683) ContinuousQueryWithTransformer needs to be documented

2018-02-13 Thread Nikolay Izhikov (JIRA)
Nikolay Izhikov created IGNITE-7683:
---

 Summary: ContinuousQueryWithTransformer needs to be documented
 Key: IGNITE-7683
 URL: https://issues.apache.org/jira/browse/IGNITE-7683
 Project: Ignite
  Issue Type: Improvement
  Components: documentation
Reporter: Nikolay Izhikov
Assignee: Nikolay Izhikov
 Fix For: 2.5


New API - ContinuousQueryWithTransformer should be documented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] ignite pull request #3513: Ignite-2.4.2-p4

2018-02-13 Thread DmitriyGovorukhin
GitHub user DmitriyGovorukhin opened a pull request:

https://github.com/apache/ignite/pull/3513

Ignite-2.4.2-p4



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-2.4.2-p4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/3513.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3513


commit e7ca9b65a68de7752195c8f4d2b5180f3c77d19f
Author: Dmitriy Govorukhin 
Date:   2017-11-13T18:52:47Z

ignite-blt-merge -> ignite-2.4.1

commit cc8168fc184bb7f5e3cc3bbb0743397097f78bfb
Author: Dmitriy Govorukhin 
Date:   2017-11-13T19:13:01Z

merge ignite-pitr-rc1 -> ignite-2.4.1

commit 87e6d74cf6a251c7984f9e68c391f790feccc281
Author: Dmitriy Govorukhin 
Date:   2017-11-14T12:49:33Z

ignite-gg-12877 Compact consistent ID in WAL

commit 9f5a22711baea05bd37ab07c8f928a4837dd83a4
Author: Ilya Lantukh 
Date:   2017-11-14T14:12:28Z

Fixed javadoc.

commit d5af2d78dd8eef8eca8ac5391d31d8c779649bb0
Author: Alexey Kuznetsov 
Date:   2017-11-15T08:09:00Z

IGNITE-6913 Baseline: Added new options to controls.sh for baseline 
manipulations.

commit 713924ce865752b6e99b03bd624136541cea5f9f
Author: Sergey Chugunov 
Date:   2017-11-15T09:03:12Z

IGNITE-5850 failover tests for cache operations during BaselineTopology 
changes

commit b65fd134e748d496f732ec2aa0953a0531f544b8
Author: Ilya Lantukh 
Date:   2017-11-15T12:54:35Z

TX read logging if PITR is enabled.

commit 9b2a567c0e04dc33116b51f88bee75f76e9107d1
Author: Ilya Lantukh 
Date:   2017-11-15T13:45:16Z

TX read logging if PITR is enabled.

commit 993058ccf0b2b8d9e80750c3e45a9ffa31d85dfa
Author: Dmitriy Govorukhin 
Date:   2017-11-15T13:51:54Z

ignite-2.4.1 optimization for store full set node more compacted

commit 1eba521f608d39967aec376b397b7fc800234e54
Author: Dmitriy Govorukhin 
Date:   2017-11-15T13:52:22Z

Merge remote-tracking branch 'professional/ignite-2.4.1' into ignite-2.4.1

commit 564b3fd51f8a7d1d81cb6874df66d0270623049c
Author: Sergey Chugunov 
Date:   2017-11-15T14:00:51Z

IGNITE-5850 fixed issue with initialization of data regions on node 
activation, fixed issue with auto-activation when random node joins inactive 
cluster with existing BLT

commit c6d1fa4da7adfadc80abdc7eaf6452b86a4f6aa4
Author: Sergey Chugunov 
Date:   2017-11-15T16:23:08Z

IGNITE-5850 transitionResult is set earlier when request for changing 
BaselineTopology is sent

commit d65674363163e38a4c5fdd73d1c8d8e1c7610797
Author: Sergey Chugunov 
Date:   2017-11-16T11:59:07Z

IGNITE-5850 new failover tests for changing BaselineTopology up (new node 
added to topology)

commit 20552f3851fe8825191b144179be032965e0b5c6
Author: Sergey Chugunov 
Date:   2017-11-16T12:53:43Z

IGNITE-5850 improved error message when online node is removed from baseline

commit 108bbcae4505ac904a6db774643ad600bfb42c21
Author: Sergey Chugunov 
Date:   2017-11-16T13:45:52Z

IGNITE-5850 BaselineTopology should not change on cluster deactivation

commit deb641ad3bdbf260fa60ad6bf607629652e324bd
Author: Dmitriy Govorukhin 
Date:   2017-11-17T09:45:44Z

ignite-2.4.1 truncate wal and checkpoint history on move/delete snapshot

commit 3c8b06f3659af30d1fd148ccc0f40e216a56c998
Author: Alexey Goncharuk 
Date:   2017-11-17T12:48:12Z

IGNITE-6947 Abandon remap after single map if future is done (fixes NPE)

commit ba2047e5ae7d271a677e0c418375d82d78c4023e
Author: devozerov 
Date:   2017-11-14T12:26:31Z

IGNITE-6901: Fixed assertion during 
IgniteH2Indexing.rebuildIndexesFromHash. This closes #3027.

commit abfc0466d6d61d87255d0fe38cbdf11ad46d4f89
Author: Sergey Chugunov 
Date:   2017-11-17T13:40:57Z

IGNITE-5850 tests for queries in presence of BaselineTopology

commit f4eabaf2a905abacc4c60c01d3ca04f6ca9ec188
Author: Sergey Chugunov 
Date:   2017-11-17T17:23:02Z

IGNITE-5850 implementation for setBaselineTopology(long topVer) migrated 
from wc-251

commit 4edeccd3e0b671aa277f58995df9ff9935baa95a
Author: EdShangGG 
Date:   2017-11-17T18:21:17Z

GG-13074 Multiple snapshot test failures after baseline topology is 
introduced
-adding baseline test to suite
-fixing issues with baseline

commit edae228c8f55990c15ef3044be987dcb00d6c81a
Author: EdShangGG 
Date:   2017-11-18T10:36:41Z

hack with sleep

commit b5bffc7580a4a8ffbcc06f60c282e73979179578
Author: Ilya 

Re: Optimistic Locking and the Prepare Phase

2018-02-13 Thread Vladimir Ozerov
Hi John,

1) In PESSIMISTIC mode, locks are obtained either on first update
(READ_COMMITTED) or even on first read (REPEATABLE_READ). That is, they are
obtained before the prepare phase and are held for the duration of the
transaction. In OPTIMISTIC mode, locks are obtained only after you call
IgniteTransaction.commit().
2) It means that the transaction will fail if enlisted entries have been
changed after they were accessed by the current transaction, but before the
transaction is committed.

On Tue, Feb 13, 2018 at 9:49 AM, John Wilson 
wrote:

> Hi,
>
> The design doc below states:
>
> *" In optimistic transactions, locks are acquired on primary nodes during
> the "prepare" phase, then promoted to backup nodes and released once the
> transaction is committed. Depending on an isolation level, if Ignite
> detects that a version of an entry has been changed since the time it was
> requested by a transaction, then the transaction will fail at the "prepare"
> phase and it will be up to an application to decide whether to restart the
> transaction or not."*
>
> Two questions:
>
>
>1. If locks are acquired during the prepare phase, why do we state that
>lock acquisition for optimistic locking is delayed (as compared against
>pessimistic locking)?
>2. If "*ignite detects the version has changed since last request by
>transaction, it will fail at prepare phase*". Very confusing. What is
>the last request? I thought the "last request" means the "prepare" phase
>and if so why we say it may fail during prepare phase?
>
> The graphic make sense to me - i.e. locks for optimistic locking are
> acquired on the commit phase and not on the prepare phase.
>
> https://cwiki.apache.org/confluence/display/IGNITE/
> Ignite+Key-Value+Transactions+Architecture
>
> Please help clarify.
>
> Thanks.
>
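
The version-validation idea behind optimistic transactions can be sketched as
follows. This is a simplified single-entry model, not Ignite's actual two-phase
commit protocol; the `Entry`/`Tx` classes and the use of `synchronized` as the
"prepare"-time lock are illustrative assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: an optimistic transaction records the entry's version when it
// reads, takes no lock until commit, and at "prepare" time acquires the
// lock and validates the version is unchanged; otherwise it fails and the
// caller may retry.
public class OptimisticDemo {
    static class Entry {
        final AtomicLong version = new AtomicLong();
        volatile long value;
    }

    static class Tx {
        private final Entry entry;
        private long readVersion = -1;
        private long pendingValue;

        Tx(Entry entry) { this.entry = entry; }

        long read() {
            readVersion = entry.version.get(); // remember version; no lock taken
            return entry.value;
        }

        void write(long v) { pendingValue = v; } // buffered until commit

        // Returns false when the entry changed since it was read.
        boolean commit() {
            synchronized (entry) {             // "prepare": lock taken only now
                if (entry.version.get() != readVersion)
                    return false;              // version conflict -> fail
                entry.value = pendingValue;
                entry.version.incrementAndGet();
                return true;
            }
        }
    }

    public static void main(String[] args) {
        Entry e = new Entry();
        Tx tx = new Tx(e);
        tx.read();
        tx.write(7L);
        System.out.println("first commit: " + tx.commit());
    }
}
```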


[jira] [Created] (IGNITE-7682) LocalSize cache functions on C++

2018-02-13 Thread Roman Bastanov (JIRA)
Roman Bastanov created IGNITE-7682:
--

 Summary: LocalSize cache functions on C++
 Key: IGNITE-7682
 URL: https://issues.apache.org/jira/browse/IGNITE-7682
 Project: Ignite
  Issue Type: Bug
  Components: platforms
 Environment: Ignite builded by jdk1.8.0_152 with sources tag:ignite-2.3
cpp libs builded by Microsoft Visual Studio Enterprise 2015 Version 
14.0.25431.01 Update 3
all x64
Reporter: Roman Bastanov


The LocalSize function returns the same result for every variation of
CachePeekMode: it always returns the total cache size, i.e. the sum across all
nodes' caches.
{code}
auto cache = IgniteNode.GetCache<...>(cache_name);
cache.LocalSize(ignite::cache::CachePeekMode::BACKUP)
cache.LocalSize(ignite::cache::CachePeekMode::NEAR_CACHE)
cache.LocalSize(ignite::cache::CachePeekMode::OFFHEAP)
cache.LocalSize(ignite::cache::CachePeekMode::ONHEAP)
cache.LocalSize(ignite::cache::CachePeekMode::PRIMARY)
cache.LocalSize(ignite::cache::CachePeekMode::SWAP)
{code}
In contrast, counting the entries manually gives the correct result, i.e. the
local size (the cache on this node):
{code}
auto query = cache::query::ScanQuery();
query.SetLocal(true);
auto cursor = cache.Query(query);

while (cursor.HasNext()) {
cursor.GetNext(); // advance the cursor
cache_size++;
}
{code}


