Re: [ANNOUNCE] Apache Ignite 2.16.0 Released

2024-01-05 Thread John Smith
Can we upgrade from 2.13.0 to 2.16.0 open source, or do we need to kill the
whole cluster and all clients?

On Tue, Dec 26, 2023 at 2:33 PM Nikita Amelchev 
wrote:

> The Apache Ignite Community is pleased to announce the release of
> Apache Ignite 2.16.0.
>
> Apache Ignite® is an in-memory computing platform for transactional,
> analytical, and streaming workloads delivering in-memory speeds at a
> petabyte scale.
> https://ignite.apache.org
>
> The Apache Ignite community has made a lot of changes in the 2.16.0
> release. This blog post highlights some of the most valuable
> improvements:
> https://ignite.apache.org/blog/apache-ignite-2-16-0.html
>
> For the full list of changes, refer to the RELEASE_NOTES, which
> catalogue the most significant improvements in this version of the
> platform.
> https://ignite.apache.org/releases/2.16.0/release_notes.html
>
> Download the latest Ignite version from here:
> https://ignite.apache.org/download.cgi
>
> Please let us know if you encounter any problems:
> https://ignite.apache.org/our-community.html#faq
>
> Regards,
> Nikita Amelchev on behalf of the Apache Ignite community.
>


Re: Why would a client node error cause a server node to shut off?

2023-11-02 Thread John Smith
You mean in the XML config? OK, I'll check it.

Thanks

On Wed, Nov 1, 2023 at 5:14 AM Stephen Darlington 
wrote:

> There are lots of "throttling" warnings. It could be as simple as your
> cluster being at its limit. Faster or more disks might help, as might
> scaling out. The other possibility is that you've enabled write
> throttling. Counter-intuitively, you might want to *dis*able that. It'll
> still do write throttling, just using a different algorithm.
>
> On Tue, 31 Oct 2023 at 15:35, John Smith  wrote:
>
>> I understand you have no time, and I have also followed that link. My
>> nodes are 32GB and I have allocated 8GB for heap and some for off-heap, so
>> I'm definitely not hitting some ceiling where it needs to force some huge
>> garbage collection.
>>
>> What I'm asking is: based on the config and stats I gave, do you see
>> anything that sticks out in those configs, not the logs?
>>
>> On Tue, Oct 31, 2023 at 10:42 AM Stephen Darlington <
>> sdarling...@apache.org> wrote:
>>
>>> No, sorry, the issue is that I don't have the time to go through 25,000
>>> lines of log file. As I said, your cluster had network or long JVM pause
>>> issues, probably the latter:
>>>
>>> [21:37:12,517][WARNING][jvm-pause-detector-worker][IgniteKernal%xx]
>>> Possible too long JVM pause: 63356 milliseconds.
>>>
>>> When nodes are continually talking to one another, no Ignite code being
>>> executed for over a minute is going to be a *big* problem. You need to
>>> tune your JVM. There are some hints in the documentation:
>>> https://ignite.apache.org/docs/latest/perf-and-troubleshooting/memory-tuning
>>>
>>>
>>> On Tue, 31 Oct 2023 at 13:16, John Smith  wrote:
>>>
>>>> Does any of this info help? I included more or less what we do, plus
>>>> stats and configs.
>>>>
>>>> There are 9 caches, of which the biggest is 5 million records
>>>> (partitioned with 1 backup); the key is a String (11 chars) and the
>>>> value an integer.
>>>>
>>>> The rest are replicated, and some partitioned, but hold a few thousand
>>>> records at most.
>>>>
>>>> The nodes are 32GB; here is the output of free -m:
>>>>
>>>>               total        used        free      shared  buff/cache   available
>>>> Mem:          32167        2521       26760           0        2885       29222
>>>> Swap:          2047           0        2047
>>>>
>>>> And here is node stats:
>>>>
>>>> Time of the snapshot: 2023-10-31 13:08:56
>>>>
>>>> | ID                          | e8044c1a-6e0d-4f94-9a04-0711a3d7fc6e
>>>> | ID8                         | E8044C1A
>>>> | Consistent ID               | b14350a9-6963-442c-9529-14f70f95a6d9
>>>> | Node Type                   | Server
>>>> | Order                       | 2660
>>>> | Address (0)                 | xx
>>>> | Address (1)                 | 127.0.0.1
>>>> | Address (2)                 | 0:0:0:0:0:0:0:1%lo
>>>> | OS info                     | Linux amd64 4.15.0-197-generic
>>>> | OS user                     | ignite
>>>> | Deployment mode             | SHARED
>>>> | Language runtime            | Java Platform API Specification ver. 1.8
>>>> | Ignite version              | 2.12.0
>>>> | Ignite instance name        | xx
>>>> | JRE information             | HotSpot 64-Bit Tiered Compilers
>>>> | JVM start time              | 2023-09-29 14:50:39
>>>> | Node start time             | 2023-09-29 14:54:34
>>>> | Up time                     | 09:28:57.946
>>>> | CPUs                        | 4
>>>> | Last metric update          | 2023-10-31 13:07:49
>>>> | Non-loopback IPs            | xx, xx

Re: Why would a client node error cause a server node to shut off?

2023-10-31 Thread John Smith
I understand you have no time, and I have also followed that link. My nodes
are 32GB and I have allocated 8GB for heap and some for off-heap, so I'm
definitely not hitting some ceiling where it needs to force some huge
garbage collection.

What I'm asking is: based on the config and stats I gave, do you see
anything that sticks out in those configs, not the logs?

On Tue, Oct 31, 2023 at 10:42 AM Stephen Darlington 
wrote:

> No, sorry, the issue is that I don't have the time to go through 25,000
> lines of log file. As I said, your cluster had network or long JVM pause
> issues, probably the latter:
>
> [21:37:12,517][WARNING][jvm-pause-detector-worker][IgniteKernal%xx]
> Possible too long JVM pause: 63356 milliseconds.
>
> When nodes are continually talking to one another, no Ignite code being
> executed for over a minute is going to be a *big* problem. You need to
> tune your JVM. There are some hints in the documentation:
> https://ignite.apache.org/docs/latest/perf-and-troubleshooting/memory-tuning
>
>
> On Tue, 31 Oct 2023 at 13:16, John Smith  wrote:
>
>> Does any of this info help? I included more or less what we do, plus
>> stats and configs.
>>
>> There are 9 caches, of which the biggest is 5 million records
>> (partitioned with 1 backup); the key is a String (11 chars) and the
>> value an integer.
>>
>> The rest are replicated, and some partitioned, but hold a few thousand
>> records at most.
>>
>> The nodes are 32GB; here is the output of free -m:
>>
>>               total        used        free      shared  buff/cache   available
>> Mem:          32167        2521       26760           0        2885       29222
>> Swap:          2047           0        2047
>>
>> And here is node stats:
>>
>> Time of the snapshot: 2023-10-31 13:08:56
>>
>> | ID                          | e8044c1a-6e0d-4f94-9a04-0711a3d7fc6e
>> | ID8                         | E8044C1A
>> | Consistent ID               | b14350a9-6963-442c-9529-14f70f95a6d9
>> | Node Type                   | Server
>> | Order                       | 2660
>> | Address (0)                 | xx
>> | Address (1)                 | 127.0.0.1
>> | Address (2)                 | 0:0:0:0:0:0:0:1%lo
>> | OS info                     | Linux amd64 4.15.0-197-generic
>> | OS user                     | ignite
>> | Deployment mode             | SHARED
>> | Language runtime            | Java Platform API Specification ver. 1.8
>> | Ignite version              | 2.12.0
>> | Ignite instance name        | xx
>> | JRE information             | HotSpot 64-Bit Tiered Compilers
>> | JVM start time              | 2023-09-29 14:50:39
>> | Node start time             | 2023-09-29 14:54:34
>> | Up time                     | 09:28:57.946
>> | CPUs                        | 4
>> | Last metric update          | 2023-10-31 13:07:49
>> | Non-loopback IPs            | xx, xx
>> | Enabled MACs                | xx
>> | Maximum active jobs         | 1
>> | Current active jobs         | 0
>> | Average active jobs         | 0.01
>> | Maximum waiting jobs        | 0
>> | Current waiting jobs        | 0
>> | Average waiting jobs        | 0.00
>> | Maximum rejected jobs       | 0
>> | Current rejected jobs       | 0
>> | Average rejected jobs       | 0.00
>> | Maximum cancelled jobs      | 0
>> | Current cancelled jobs      | 0
>> | Average cancelled jobs      | 0.00
>> | Total rejected jobs         | 0
>> | Total executed jobs         | 2
>> | Total cancelled jobs        | 0
>> | Maximum job wait time       | 0ms
>> | Current job wait time       | 0ms
>> | Average job wait time       | 0.00ms
>> | Maximum job execute time    | 11ms
>> | Current job execute time    | 0ms

Re: Why would a client node error cause a server node to shut off?

2023-10-31 Thread John Smith
Does any of this info help? I included more or less what we do, plus stats
and configs.

There are 9 caches, of which the biggest is 5 million records (partitioned
with 1 backup); the key is a String (11 chars) and the value an integer.

The rest are replicated, and some partitioned, but hold a few thousand
records at most.

The nodes are 32GB; here is the output of free -m:

              total        used        free      shared  buff/cache   available
Mem:          32167        2521       26760           0        2885       29222
Swap:          2047           0        2047

And here is node stats:

Time of the snapshot: 2023-10-31 13:08:56
| ID                          | e8044c1a-6e0d-4f94-9a04-0711a3d7fc6e
| ID8                         | E8044C1A
| Consistent ID               | b14350a9-6963-442c-9529-14f70f95a6d9
| Node Type                   | Server
| Order                       | 2660
| Address (0)                 | xx
| Address (1)                 | 127.0.0.1
| Address (2)                 | 0:0:0:0:0:0:0:1%lo
| OS info                     | Linux amd64 4.15.0-197-generic
| OS user                     | ignite
| Deployment mode             | SHARED
| Language runtime            | Java Platform API Specification ver. 1.8
| Ignite version              | 2.12.0
| Ignite instance name        | xx
| JRE information             | HotSpot 64-Bit Tiered Compilers
| JVM start time              | 2023-09-29 14:50:39
| Node start time             | 2023-09-29 14:54:34
| Up time                     | 09:28:57.946
| CPUs                        | 4
| Last metric update          | 2023-10-31 13:07:49
| Non-loopback IPs            | xx, xx
| Enabled MACs                | xx
| Maximum active jobs         | 1
| Current active jobs         | 0
| Average active jobs         | 0.01
| Maximum waiting jobs        | 0
| Current waiting jobs        | 0
| Average waiting jobs        | 0.00
| Maximum rejected jobs       | 0
| Current rejected jobs       | 0
| Average rejected jobs       | 0.00
| Maximum cancelled jobs      | 0
| Current cancelled jobs      | 0
| Average cancelled jobs      | 0.00
| Total rejected jobs         | 0
| Total executed jobs         | 2
| Total cancelled jobs        | 0
| Maximum job wait time       | 0ms
| Current job wait time       | 0ms
| Average job wait time       | 0.00ms
| Maximum job execute time    | 11ms
| Current job execute time    | 0ms
| Average job execute time    | 5.50ms
| Total busy time             | 5733919ms
| Busy time %                 | 0.21%
| Current CPU load %          | 1.93%
| Average CPU load %          | 4.35%
| Heap memory initialized     | 504mb
| Heap memory used            | 310mb
| Heap memory committed       | 556mb
| Heap memory maximum         | 8gb
| Non-heap memory initialized | 2mb
| Non-heap memory used        | 114mb
| Non-heap memory committed   | 119mb
| Non-heap memory maximum     | 0
| Current thread count        | 125
| Maximum thread count        | 140
| Total started thread count  | 409025
| Current daemon thread count | 15

Data region metrics:
+-----------------+-----------+--------------------+--------------+------------------+-------------------+---------------+
|      Name       | Page size |       Pages        |    Memory    |      Rates       | Checkpoint buffer | Large entries |
+-----------------+-----------+--------------------+--------------+------------------+-------------------+---------------+
| Default_Region  | 0         | Total: 307665      | Total: 1gb   | Allocation: 0.00 | Pages: 0          | 0.00%         |
|                 |           | Dirty: 0           | In RAM: 0    | Eviction: 0.00   | Size: 0           |               |
|                 |           | Memory: 0          |              | Replace: 0.00    |                  |               |
|                 |           | Fill factor: 0.00% |              |                  |                  |               |
+-----------------+-----------+--------------------+--------------+------------------+-------------------+---------------+
| metastoreMemPlc | 0         | Total: 57          | Total: 228kb | Allocation: 0.00 | Pages: 0          | 0.00%         |
|                 |           | Dirty: 0           | In RAM: 0    | Eviction: 0.00   | Size: 0           |               |
|                 |           | Memory: 0

Re: Why would a client node error cause a server node to shut off?

2023-10-30 Thread John Smith
Here you go:
https://www.dropbox.com/scl/fi/wbst5ybec7pah9xec4pct/xx.0.log?rlkey=pfn3v0u9dup0zq7zu1v46kktc=0

On Mon, Oct 30, 2023 at 11:55 AM Stephen Darlington 
wrote:

> It wouldn't. We'd need to see more of the logs to determine what the
> problem was.
>
> On Mon, 30 Oct 2023 at 15:12, John Smith  wrote:
>
>> Hi I see this error message on the server node...
>>
>> [21:37:20,310][SEVERE][query-#2884155%raange%][GridMapQueryExecutor]
>> Failed to send message.
>> class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
>> Failed to send message (node left topology): TcpDiscoveryNode
>> [id=d6a33cc0-59e7-452d-a516-730e4c89c29e,
>> consistentId=d6a33cc0-59e7-452d-a516-730e4c89c29e, addrs=ArrayList
>> [127.0.0.1, xx], sockAddrs=HashSet [/127.0.0.1:0, /xx:0],
>> discPort=0, order=1573, intOrder=803, lastExchangeTime=1677567347200,
>> loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=true]
>>
>> Why would that cause the server to shut off?
>>
>


Why would a client node error cause a server node to shut off?

2023-10-30 Thread John Smith
Hi I see this error message on the server node...

[21:37:20,310][SEVERE][query-#2884155%raange%][GridMapQueryExecutor] Failed
to send message.
class org.apache.ignite.internal.cluster.ClusterTopologyCheckedException:
Failed to send message (node left topology): TcpDiscoveryNode
[id=d6a33cc0-59e7-452d-a516-730e4c89c29e,
consistentId=d6a33cc0-59e7-452d-a516-730e4c89c29e, addrs=ArrayList
[127.0.0.1, xx], sockAddrs=HashSet [/127.0.0.1:0, /xx:0],
discPort=0, order=1573, intOrder=803, lastExchangeTime=1677567347200,
loc=false, ver=2.12.0#20220108-sha1:b1289f75, isClient=true]

Why would that cause the server to shut off?
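For background, the replies above trace this to very long JVM pauses: a node
that is unresponsive for longer than the failure detection timeout gets
dropped from the topology, and the "node left topology" error is the symptom,
not the cause. Below is a minimal, hypothetical sketch of loosening those
timeouts while the JVM is being tuned; the values are illustrative
assumptions, not recommendations:

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class StartServerNode {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();

            // Give nodes more slack before peers consider them failed,
            // e.g. to ride out long GC pauses (defaults: 10s and 30s).
            cfg.setFailureDetectionTimeout(30_000);        // server nodes, ms
            cfg.setClientFailureDetectionTimeout(60_000);  // client nodes, ms

            Ignition.start(cfg);
        }
    }

This only masks the symptom; a 63-second pause like the one in the logs above
still needs JVM tuning.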


Re: Using REST API seems to "lock" a record?

2023-09-29 Thread John Smith
OK, so I figured out the issue: I'm getting a ClassCastException.

The cache in the application is declared with typed generics; when I use the
REST API to put a value, I guess it stores the value as a String, and my
application catches the error. Well, it's actually swallowing the error, but
I can fix that.

My question now would be: when using the HTTP REST API, can we specify the
type of the value when doing a put?
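For reference, the Ignite REST API documents optional keyType/valueType
parameters on commands like put/get for non-String types; check the REST API
page for your Ignite version. As a rough, hypothetical sketch of the mismatch
itself (the cache name and the <String, Integer> typing are assumptions from
this thread), reading through the binary view shows what the REST put
actually stored without triggering the cast:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;

    public class RestPutTypeCheck {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start();

            // The application side: a typed cache (assumed <String, Integer>).
            // After a REST put with no type hints, the stored value is a String,
            // so unwrapping it here would throw ClassCastException:
            // Integer id = typed.get("15149838779");
            IgniteCache<String, Integer> typed =
                ignite.getOrCreateCache("carrier-ids-for-phones");

            // The binary view skips deserialization to the declared type and
            // reveals the actual stored class.
            IgniteCache<String, Object> raw =
                ignite.<String, Object>getOrCreateCache("carrier-ids-for-phones").withKeepBinary();
            Object stored = raw.get("15149838779");
            System.out.println(stored == null ? "absent" : stored.getClass().getName());
        }
    }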


On Fri, Sep 29, 2023 at 1:29 PM John Smith  wrote:

> Hi, running 2.12
>
> When I use the put command like so:
> http://xx/ignite?cmd=put=carrier-ids-for-phones=15149838779=10009=60001
>
> Then I call the Java async get function; it seems to block and doesn't
> return.
>
> If I use this command:
> http://xx/ignite?cmd=put=carrier-ids-for-phones=15149838779=10009=60001
>
> The Java async API blocks, but once the record expires, the Java async
> API returns.
>


Using REST API seems to "lock" a record?

2023-09-29 Thread John Smith
Hi, running 2.12

When I use the put command like so:
http://xx/ignite?cmd=put=carrier-ids-for-phones=15149838779=10009=60001

Then I call the Java async get function; it seems to block and doesn't
return.

If I use this command:
http://xx/ignite?cmd=put=carrier-ids-for-phones=15149838779=10009=60001

The Java async API blocks, but once the record expires, the Java async
API returns.
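Independent of the REST side, one way to keep a caller from blocking
indefinitely on the async get is to bound the wait; a minimal sketch (cache
name and key taken from the URLs above; the 5-second timeout is an arbitrary
assumption):

    import java.util.concurrent.TimeUnit;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.lang.IgniteFuture;

    public class BoundedAsyncGet {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start();
            IgniteCache<String, Integer> cache =
                ignite.getOrCreateCache("carrier-ids-for-phones");

            IgniteFuture<Integer> fut = cache.getAsync("15149838779");

            // Bound the wait instead of blocking forever; this throws
            // IgniteFutureTimeoutException if no result arrives in time.
            Integer value = fut.get(5, TimeUnit.SECONDS);
            System.out.println("value = " + value);
        }
    }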


Re: Ignite Visor timeout when calling the node command on a thick client in a Kubernetes cluster.

2023-07-21 Thread John Smith
Never mind: my Kubernetes Service wasn't getting endpoints. But weirdly
enough, there was still some sort of connection going on.

On Thu, Jul 20, 2023 at 9:16 PM John Smith  wrote:

> So the client is exposed via NodePorts, and I have been able to provide
> the proper ports back to the client and cluster...
>
> When I look at the node details I see...
>
> | Address (0) | 10.xxx.xxx.xxx|
> < Kubernetes internal I.P
> | Address (1) | 127.0.0.1|
>
> So it only knows the 2 addresses, but somehow the timeout is aware of the
> 3rd address; see below:
>
> addrs=[/10.xxx.xxx.xxx:47100, /127.0.0.1:47100, /172.xxx.xxx.xxx:30524]]
> <- 172 is where the thick client is exposed as a NodePort. So it somehow
> knows it?
>
> Even on the client I can see Ignite Visor connected:
>
> Completed partition exchange
> [localNode=c9b86d24-0f0d-4198-98c5-59ce677669f8,
> exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
> [topVer=434, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
> [id=c16d4ff0-e37e-4a18-a2ae-ec770ad25c39,
> consistentId=0:0:0:0:0:0:0:1%lo,127.0.0.1,172.xxx.xxx.xxx:47500,
> addrs=ArrayList [0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.xxx.xxx.xxx],
> sockAddrs=HashSet [0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500,
> xx-visor-0001/172.xxx.xxx.xxx:47500], discPort=47500, order=434,
> intOrder=227, lastExchangeTime=1689901276893, loc=false,
> ver=2.12.0#20220108-sha1:b1289f75, isClient=false], rebalanced=true,
> done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=434,
> minorTopVer=0]]
> AffinityTopologyVersion [topVer=434, minorTopVer=0], evt=NODE_JOINED,
> evtNode=c16d4ff0-e37e-4a18-a2ae-ec770ad25c39, client=true]
>
> On Ignite Visor we see the error below:
>
> [00:29:07,053][SEVERE][main][TcpCommunicationSpi] Failed to send message
> to remote node [node=TcpDiscoveryNode
> [id=c9b86d24-0f0d-4198-98c5-59ce677669f8,
> consistentId=c9b86d24-0f0d-4198-98c5-59ce677669f8, addrs=ArrayList
> [10.xxx.xxx.xxx, 127.0.0.1], sockAddrs=HashSet [/10.xxx.xxx.xxx:0, /
> 127.0.0.1:0], discPort=0, order=429, intOrder=224,
> lastExchangeTime=1689899260716, loc=false,
> ver=2.12.0#20220108-sha1:b1289f75, isClient=true], msg=GridIoMessage
> [plc=3, topic=TOPIC_JOB, topicOrd=0, ordered=false, timeout=0,
> skipOnTimeout=false, msg=GridJobExecuteRequest
> [sesId=fd758d57981-a43b0db8-3b02-4506-ac69-412e46736682,
> jobId=0e758d57981-a43b0db8-3b02-4506-ac69-412e46736682,
> startTaskTime=1689899286905, timeout=9223372036854775807,
> taskName=org.apache.ignite.internal.visor.node.VisorNodeDataCollectorTask,
> userVer=0,
> taskClsName=org.apache.ignite.internal.visor.node.VisorNodeDataCollectorTask,
> ldrParticipants=null, cpSpi=null, createTime=1689899286989,
> clsLdrId=90558d57981-a43b0db8-3b02-4506-ac69-412e46736682,
> depMode=ISOLATED, dynamicSiblings=false, forceLocDep=true,
> sesFullSup=false, internal=true, topPred=null, part=-1, topVer=null,
> execName=null]]]
> class org.apache.ignite.IgniteCheckedException: Failed to connect to node
> (is node still alive?). Make sure that each ComputeTask and cache
> Transaction has a timeout set in order to prevent parties from waiting
> forever in case of network issues
> [nodeId=c9b86d24-0f0d-4198-98c5-59ce677669f8, addrs=[/10.xxx.xxx.xxx:47100,
> /127.0.0.1:47100, /172.xxx.xxx.xxx:30524]]
>


Ignite Visor timeout when calling the node command on a thick client in a Kubernetes cluster.

2023-07-20 Thread John Smith
So the client is exposed via NodePorts, and I have been able to provide the
proper ports back to the client and cluster...

When I look at the node details I see...

| Address (0) | 10.xxx.xxx.xxx|
< Kubernetes internal I.P
| Address (1) | 127.0.0.1|

So it only knows the 2 addresses, but somehow the timeout is aware of the
3rd address; see below:

addrs=[/10.xxx.xxx.xxx:47100, /127.0.0.1:47100, /172.xxx.xxx.xxx:30524]]
<- 172 is where the thick client is exposed as a NodePort. So it somehow
knows it?

Even on the client I can see Ignite Visor connected:

Completed partition exchange
[localNode=c9b86d24-0f0d-4198-98c5-59ce677669f8,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=434, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode
[id=c16d4ff0-e37e-4a18-a2ae-ec770ad25c39,
consistentId=0:0:0:0:0:0:0:1%lo,127.0.0.1,172.xxx.xxx.xxx:47500,
addrs=ArrayList [0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.xxx.xxx.xxx],
sockAddrs=HashSet [0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500,
xx-visor-0001/172.xxx.xxx.xxx:47500], discPort=47500, order=434,
intOrder=227, lastExchangeTime=1689901276893, loc=false,
ver=2.12.0#20220108-sha1:b1289f75, isClient=false], rebalanced=true,
done=true, newCrdFut=null], topVer=AffinityTopologyVersion [topVer=434,
minorTopVer=0]]
AffinityTopologyVersion [topVer=434, minorTopVer=0], evt=NODE_JOINED,
evtNode=c16d4ff0-e37e-4a18-a2ae-ec770ad25c39, client=true]

On Ignite Visor we see the error below:

[00:29:07,053][SEVERE][main][TcpCommunicationSpi] Failed to send message to
remote node [node=TcpDiscoveryNode
[id=c9b86d24-0f0d-4198-98c5-59ce677669f8,
consistentId=c9b86d24-0f0d-4198-98c5-59ce677669f8, addrs=ArrayList
[10.xxx.xxx.xxx, 127.0.0.1], sockAddrs=HashSet [/10.xxx.xxx.xxx:0, /
127.0.0.1:0], discPort=0, order=429, intOrder=224,
lastExchangeTime=1689899260716, loc=false,
ver=2.12.0#20220108-sha1:b1289f75, isClient=true], msg=GridIoMessage
[plc=3, topic=TOPIC_JOB, topicOrd=0, ordered=false, timeout=0,
skipOnTimeout=false, msg=GridJobExecuteRequest
[sesId=fd758d57981-a43b0db8-3b02-4506-ac69-412e46736682,
jobId=0e758d57981-a43b0db8-3b02-4506-ac69-412e46736682,
startTaskTime=1689899286905, timeout=9223372036854775807,
taskName=org.apache.ignite.internal.visor.node.VisorNodeDataCollectorTask,
userVer=0,
taskClsName=org.apache.ignite.internal.visor.node.VisorNodeDataCollectorTask,
ldrParticipants=null, cpSpi=null, createTime=1689899286989,
clsLdrId=90558d57981-a43b0db8-3b02-4506-ac69-412e46736682,
depMode=ISOLATED, dynamicSiblings=false, forceLocDep=true,
sesFullSup=false, internal=true, topPred=null, part=-1, topVer=null,
execName=null]]]
class org.apache.ignite.IgniteCheckedException: Failed to connect to node
(is node still alive?). Make sure that each ComputeTask and cache
Transaction has a timeout set in order to prevent parties from waiting
forever in case of network issues
[nodeId=c9b86d24-0f0d-4198-98c5-59ce677669f8, addrs=[/10.xxx.xxx.xxx:47100,
/127.0.0.1:47100, /172.xxx.xxx.xxx:30524]]
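One pattern that can help in this situation: a node behind a Kubernetes
NodePort can advertise the externally reachable address instead of only its
pod-internal one via an AddressResolver, so peers don't dial unreachable
addresses. A minimal sketch with placeholder addresses and ports (adapt to
your Service):

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.BasicAddressResolver;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class NodePortThickClient {
        public static void main(String[] args) throws Exception {
            // Map the pod-internal communication address to the NodePort
            // address other nodes should actually dial. Placeholders only.
            Map<String, String> addrMap = new HashMap<>();
            addrMap.put("10.0.0.5:47100", "172.16.0.10:30524");

            IgniteConfiguration cfg = new IgniteConfiguration()
                .setClientMode(true)
                .setAddressResolver(new BasicAddressResolver(addrMap));

            Ignition.start(cfg);
        }
    }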


Re: Does peer class loading have to be enabled on both server and client node?

2023-03-02 Thread John Smith
Yeah, that's what I mean. So URI deployment doesn't have the same
restrictions; does it affect thick clients in any way?

On Thu, Mar 2, 2023 at 4:00 AM Stephen Darlington <
stephen.darling...@gridgain.com> wrote:

> You could consider using URI deployment?
> https://ignite.apache.org/docs/latest/code-deployment/deploying-user-code
>
> On 1 Mar 2023, at 18:59, John Smith  wrote:
>
> So I'm stuck with a catch-22 here... I can't enable the flag on the server
> nodes without shutting down all the client nodes, and vice versa.
>
> Unless I did something wrong, the server shuts off when I set the value to
> true and it doesn't match another node... So I would have to shut down my
> entire infrastructure to enable it?
>
> Even if I shut down all the connected thick clients, I would still have to
> turn off all the server nodes, make sure everything is off, and then
> restart them one by one.
>
> Does code deployment present the same problem, where I enable a shared
> folder to periodically load classes?
>
> On Mon., Feb. 27, 2023, 11:11 a.m. John Smith, 
> wrote:
>
>> Sorry, to be clear: the applications are thick clients, but the
>> client = true flag is enabled.
>>
>> On Mon, Feb 27, 2023 at 11:09 AM John Smith 
>> wrote:
>>
>>> Oh god! Forget that then! lol. Really? So if I have 10 applications, they
>>> all need to be recompiled/reconfigured and redeployed?
>>>
>>> On Mon, Feb 27, 2023 at 11:07 AM Stephen Darlington <
>>> stephen.darling...@gridgain.com> wrote:
>>>
>>>> It’s a cluster wide setting and needs to be set to the same value on
>>>> all nodes, both server and thick-client.
>>>>
>>>> > On 27 Feb 2023, at 15:58, John Smith  wrote:
>>>> >
>>>> > I have 3 node clusters and I'm trying to enable peer class loading on
>>>> the cluster, but it keeps shutting off after restart because it says the
>>>> remote node doesn't have it enabled.
>>>> >
>>>> > So is peer class loading required to be enabled on the server nodes
>>>> or can it just be enabled per client that needs it?
>>>>
>>>>
>


Re: Does peer class loading have to be enabled on both server and client node?

2023-03-01 Thread John Smith
So I'm stuck with a catch-22 here... I can't enable the flag on the server
nodes without shutting down all the client nodes, and vice versa.

Unless I did something wrong, the server shuts off when I set the value to
true and it doesn't match another node... So I would have to shut down my
entire infrastructure to enable it?

Even if I shut down all the connected thick clients, I would still have to
turn off all the server nodes, make sure everything is off, and then
restart them one by one.

Does code deployment present the same problem, where I enable a shared
folder to periodically load classes?

On Mon., Feb. 27, 2023, 11:11 a.m. John Smith, 
wrote:

> Sorry, to be clear: the applications are thick clients, but the
> client = true flag is enabled.
>
> On Mon, Feb 27, 2023 at 11:09 AM John Smith 
> wrote:
>
>> Oh god! Forget that then! lol. Really? So if I have 10 applications, they
>> all need to be recompiled/reconfigured and redeployed?
>>
>> On Mon, Feb 27, 2023 at 11:07 AM Stephen Darlington <
>> stephen.darling...@gridgain.com> wrote:
>>
>>> It’s a cluster wide setting and needs to be set to the same value on all
>>> nodes, both server and thick-client.
>>>
>>> > On 27 Feb 2023, at 15:58, John Smith  wrote:
>>> >
>>> > I have 3 node clusters and I'm trying to enable peer class loading on
>>> the cluster, but it keeps shutting off after restart because it says the
>>> remote node doesn't have it enabled.
>>> >
>>> > So is peer class loading required to be enabled on the server nodes or
>>> can it just be enabled per client that needs it?
>>>
>>>


Re: Performance of data stream on 3 cluster node.

2023-03-01 Thread John Smith
My key is phone_number and the keys are all unique... I'll check with the
command...

On Wed., Mar. 1, 2023, 11:20 a.m. Stephen Darlington, <
stephen.darling...@gridgain.com> wrote:

> The streamer doesn’t determine where the data goes. It just efficiently
> sends it to the correct place.
>
> If your data is skewed in some way so that there is more data in some
> partitions than others, then you could find one machine with more work to
> do than others. All else being equal, you’ll also get better distribution
> with more than three nodes.
>
> On 1 Mar 2023, at 15:45, John Smith  wrote:
>
> Ok thanks. I just thought the streamer would be more uniform.
>
> On Wed, Mar 1, 2023 at 4:41 AM Stephen Darlington <
> stephen.darling...@gridgain.com> wrote:
>
>> You might want to check the data distribution. You can use control.sh
>> --cache distribution to do that.
>>
>> On 28 Feb 2023, at 20:32, John Smith  wrote:
>>
>> The last thing I can add to clarify is, the 3 node cluster is a
>> centralized cluster and the CSV loader is a thick client running on its own
>> machine.
>>
>> On Tue, Feb 28, 2023 at 2:52 PM John Smith 
>> wrote:
>>
>>> Btw, when I run a query like SELECT COLUMN_2, COUNT(COLUMN_1) FROM
>>> MY_TABLE GROUP BY COLUMN_2; the query runs full tilt at 100% on all 3
>>> nodes and returns in a respectable manner.
>>>
>>> So I'm not sure what's going on, but with the data streamer I guess most
>>> of the writes are pushed to the one node, and the others are busy making
>>> the backups, or the network push/backup can't keep up?
>>> The same behaviour happens with a replicated table when streaming the
>>> data: one node seems to run at almost 100% while the others hover at
>>> 40-50%.
>>> The fastest I could get the streamer to work was to turn off backups, but
>>> same thing: one node runs full tilt while the others are "slowish".
>>>
>>> Queries are OK; all nodes are fully utilized.
>>>
>>> On Tue, Feb 28, 2023 at 12:54 PM John Smith 
>>> wrote:
>>>
>>>> Hi, so I'm using it in a pretty straightforward kind of way, at least I
>>>> think...
>>>>
>>>> I'm loading 35 million lines from CSV into an SQL table. I decided to
>>>> use the streamer as I figured it would still be a lot faster than
>>>> batching SQL INSERTs.
>>>> I tried with backups=0 and backups=1 (I'd prefer to have backups on):
>>>> 1- With 0 backups: 6 minutes to load.
>>>> 2- With 1 backup: 15 minutes to load.
>>>>
>>>> In both cases I still see the same behaviour: one machine seems to be
>>>> taking the brunt of the work...
>>>>
>>>> I'm reading a CSV file line by line and calling streamer.addData().
>>>>
>>>> The table definition is as follows...
>>>> CREATE TABLE PUBLIC.MY_TABLE (
>>>> COLUMN_1 VARCHAR(32) NOT NULL,
>>>> COLUMN_2 VARCHAR(64) NOT NULL,
>>>> CONSTRAINT PHONE_CARRIER_IDS_PK PRIMARY KEY (COLUMN_1)
>>>> ) with "template=parallelTpl, backups=0, key_type=String,
>>>> value_type=MyObject";
>>>> CREATE INDEX MY_TABLE_COLUMN_2_IDX ON PUBLIC.MY_TABLE (COLUMN_2);
>>>>
>>>> String fileName = "my_file";
>>>>
>>>> final String cacheNameDest = "MY_TABLE";
>>>>
>>>> try(
>>>> Ignite igniteDest =
>>>> configIgnite(Arrays.asList("...:47500..47509", "...:47500..47509",
>>>> "...:47500..47509"), "ignite-dest");
>>>> IgniteCache<BinaryObject, BinaryObject> cacheDest =
>>>> igniteDest.getOrCreateCache(cacheNameDest).withKeepBinary();
>>>> IgniteDataStreamer<BinaryObject, BinaryObject> streamer =
>>>> igniteDest.dataStreamer(cacheNameDest);
>>>> ) {
>>>> System.out.println("Ignite started.");
>>>> long start = System.currentTimeMillis();
>>>>
>>>> System.out.println("Cache size: " +
>>>> cacheDest.size(CachePeekMode.PRIMARY));
>>>> System.out.println("Default");
>>>> System.out.println("1d");
>>>>
>>>> IgniteBinary binaryDest = igniteDest.binary();
>>>>
>>>> try (BufferedReader br = new BufferedReader(new
>>>> FileReader(fileName))) {
>>>> int count = 0

Re: Performance of data stream on 3 cluster node.

2023-03-01 Thread John Smith
Ok thanks. I just thought the streamer would be more uniform.

On Wed, Mar 1, 2023 at 4:41 AM Stephen Darlington <
stephen.darling...@gridgain.com> wrote:

> You might want to check the data distribution. You can use control.sh
> --cache distribution to do that.
>
> On 28 Feb 2023, at 20:32, John Smith  wrote:
>
> The last thing I can add to clarify is, the 3 node cluster is a
> centralized cluster and the CSV loader is a thick client running on its own
> machine.
>
> On Tue, Feb 28, 2023 at 2:52 PM John Smith  wrote:
>
>> Btw, when I run a query like SELECT COLUMN_2, COUNT(COLUMN_1) FROM
>> MY_TABLE GROUP BY COLUMN_2; the query runs full tilt at 100% on all 3
>> nodes and returns in a respectable manner.
>>
>> So I'm not sure what's going on, but with the data streamer I guess most
>> of the writes are pushed to the one node, and the others are busy making
>> the backups, or the network push/backup can't keep up?
>> The same behaviour happens with a replicated table when streaming the
>> data: one node seems to run at almost 100% while the others hover at
>> 40-50%.
>> The fastest I could get the streamer to work was to turn off backups, but
>> same thing: one node runs full tilt while the others are "slowish".
>>
>> Queries are OK; all nodes are fully utilized.
>>
>> On Tue, Feb 28, 2023 at 12:54 PM John Smith 
>> wrote:
>>
>>> Hi, so I'm using it in a pretty straightforward kind of way, at least I
>>> think...
>>>
>>> I'm loading 35 million lines from CSV into an SQL table. I decided to
>>> use the streamer as I figured it would still be a lot faster than
>>> batching SQL INSERTs.
>>> I tried with backups=0 and backups=1 (I'd prefer to have backups on):
>>> 1- With 0 backups: 6 minutes to load.
>>> 2- With 1 backup: 15 minutes to load.
>>>
>>> In both cases I still see the same behaviour: one machine seems to be
>>> taking the brunt of the work...
>>>
>>> I'm reading a CSV file line by line and calling streamer.addData().
>>>
>>> The table definition is as follows...
>>> CREATE TABLE PUBLIC.MY_TABLE (
>>> COLUMN_1 VARCHAR(32) NOT NULL,
>>> COLUMN_2 VARCHAR(64) NOT NULL,
>>> CONSTRAINT PHONE_CARRIER_IDS_PK PRIMARY KEY (COLUMN_1)
>>> ) with "template=parallelTpl, backups=0, key_type=String,
>>> value_type=MyObject";
>>> CREATE INDEX MY_TABLE_COLUMN_2_IDX ON PUBLIC.MY_TABLE (COLUMN_2);
>>>
>>> String fileName = "my_file";
>>>
>>> final String cacheNameDest = "MY_TABLE";
>>>
>>> try(
>>> Ignite igniteDest =
>>> configIgnite(Arrays.asList("...:47500..47509", "...:47500..47509",
>>> "...:47500..47509"), "ignite-dest");
>>> IgniteCache<BinaryObject, BinaryObject> cacheDest =
>>> igniteDest.getOrCreateCache(cacheNameDest).withKeepBinary();
>>> IgniteDataStreamer<BinaryObject, BinaryObject> streamer =
>>> igniteDest.dataStreamer(cacheNameDest);
>>> ) {
>>> System.out.println("Ignite started.");
>>> long start = System.currentTimeMillis();
>>>
>>> System.out.println("Cache size: " +
>>> cacheDest.size(CachePeekMode.PRIMARY));
>>> System.out.println("Default");
>>> System.out.println("1d");
>>>
>>> IgniteBinary binaryDest = igniteDest.binary();
>>>
>>> try (BufferedReader br = new BufferedReader(new
>>> FileReader(fileName))) {
>>> int count = 0;
>>>
>>> String line;
>>> while ((line = br.readLine()) != null) {
>>>
>>> String[] parts = line.split("\\|");
>>>
>>> BinaryObjectBuilder keyBuilder =
>>> binaryDest.builder("String");
>>> keyBuilder.setField("COLUMN_1", parts[1],
>>> String.class);
>>> BinaryObjectBuilder valueBuilder =
>>> binaryDest.builder("PhoneCarrier");
>>> valueBuilder.setField("COLUMN_2", parts[3],
>>> String.class);
>>>
>>> streamer.addData(keyBuilder.build(),
>>> valueBuilder.build());
>>>
>>> count++;
>>>
>>> if ((count % 1) == 0) {
>>>  

Re: Performance of data stream on 3 cluster node.

2023-02-28 Thread John Smith
The last thing I can add to clarify is, the 3 node cluster is a centralized
cluster and the CSV loader is a thick client running on its own machine.

On Tue, Feb 28, 2023 at 2:52 PM John Smith  wrote:

> Btw, when I run a query like SELECT COLUMN_2, COUNT(COLUMN_1) FROM
> MY_TABLE GROUP BY COLUMN_2; the query runs full tilt at 100% on all 3
> nodes and returns in a respectable manner.
>
> So I'm not sure what's going on, but with the data streamer I guess most
> of the writes are pushed to the one node, and the others are busy making
> the backups, or the network push/backup can't keep up?
> The same behaviour happens with a replicated table when streaming the
> data: one node seems to run at almost 100% while the others hover at
> 40-50%.
> The fastest I could get the streamer to work was to turn off backups, but
> same thing: one node runs full tilt while the others are "slowish".
>
> Queries are OK; all nodes are fully utilized.
>
> On Tue, Feb 28, 2023 at 12:54 PM John Smith 
> wrote:
>
>> Hi, so I'm using it in a pretty straightforward kind of way, at least I
>> think...
>>
>> I'm loading 35 million lines from CSV into an SQL table. I decided to
>> use the streamer as I figured it would still be a lot faster than
>> batching SQL INSERTs.
>> I tried with backups=0 and backups=1 (I'd prefer to have backups on):
>> 1- With 0 backups: 6 minutes to load.
>> 2- With 1 backup: 15 minutes to load.
>>
>> In both cases I still see the same behaviour: one machine seems to be
>> taking the brunt of the work...
>>
>> I'm reading a CSV file line by line and calling streamer.addData().
>>
>> The table definition is as follows...
>> CREATE TABLE PUBLIC.MY_TABLE (
>> COLUMN_1 VARCHAR(32) NOT NULL,
>> COLUMN_2 VARCHAR(64) NOT NULL,
>> CONSTRAINT PHONE_CARRIER_IDS_PK PRIMARY KEY (COLUMN_1)
>> ) with "template=parallelTpl, backups=0, key_type=String,
>> value_type=MyObject";
>> CREATE INDEX MY_TABLE_COLUMN_2_IDX ON PUBLIC.MY_TABLE (COLUMN_2);
>>
>> String fileName = "my_file";
>>
>> final String cacheNameDest = "MY_TABLE";
>>
>> try(
>> Ignite igniteDest =
>> configIgnite(Arrays.asList("...:47500..47509", "...:47500..47509",
>> "...:47500..47509"), "ignite-dest");
>> IgniteCache<BinaryObject, BinaryObject> cacheDest =
>> igniteDest.getOrCreateCache(cacheNameDest).withKeepBinary();
>> IgniteDataStreamer<BinaryObject, BinaryObject> streamer =
>> igniteDest.dataStreamer(cacheNameDest);
>> ) {
>> System.out.println("Ignite started.");
>> long start = System.currentTimeMillis();
>>
>> System.out.println("Cache size: " +
>> cacheDest.size(CachePeekMode.PRIMARY));
>> System.out.println("Default");
>> System.out.println("1d");
>>
>> IgniteBinary binaryDest = igniteDest.binary();
>>
>> try (BufferedReader br = new BufferedReader(new
>> FileReader(fileName))) {
>> int count = 0;
>>
>> String line;
>> while ((line = br.readLine()) != null) {
>>
>> String[] parts = line.split("\\|");
>>
>> BinaryObjectBuilder keyBuilder =
>> binaryDest.builder("String");
>> keyBuilder.setField("COLUMN_1", parts[1],
>> String.class);
>> BinaryObjectBuilder valueBuilder =
>> binaryDest.builder("PhoneCarrier");
>> valueBuilder.setField("COLUMN_2", parts[3],
>> String.class);
>>
>> streamer.addData(keyBuilder.build(),
>> valueBuilder.build());
>>
>> count++;
>>
>> if ((count % 1) == 0) {
>> System.out.println(count);
>> }
>> }
>> streamer.flush();
>> long end = System.currentTimeMillis();
>> System.out.println("Ms: " + (end - start));
>> } catch (IOException e) {
>> e.printStackTrace();
>> }
>> }
>>
>> On Tue, Feb 28, 2023 at 11:00 AM Jeremy McMillan <
>> jeremy.mcmil...@gridgain.com> wrote:
>>
>>> Have you tried tracing the workload on the 100% and 40% nodes for
>>> comparison? There just isn't enough detail in your question to help predict
>

Re: Performance of data stream on 3 cluster node.

2023-02-28 Thread John Smith
Btw, when I run a query like SELECT COLUMN_2, COUNT(COLUMN_1) FROM MY_TABLE
GROUP BY COLUMN_2; the query runs full tilt at 100% on all 3 nodes and
returns in a respectable manner.

So I'm not sure what's going on, but with the data streamer I guess most of
the writes are pushed to the one node, and the others are busy making the
backups, or the network push/backup can't keep up?
The same behaviour happens with a replicated table when streaming the data:
one node seems to run at almost 100% while the others hover at 40-50%.
The fastest I could get the streamer to work was to turn off backups, but
same thing: one node runs full tilt while the others are "slowish".

Queries are OK; all nodes are fully utilized.

On Tue, Feb 28, 2023 at 12:54 PM John Smith  wrote:

> Hi, so I'm using it in a pretty straightforward kind of way, at least I
> think...
>
> I'm loading 35 million lines from CSV into an SQL table. I decided to
> use the streamer as I figured it would still be a lot faster than
> batching SQL INSERTs.
> I tried with backups=0 and backups=1 (I'd prefer to have backups on):
> 1- With 0 backups: 6 minutes to load.
> 2- With 1 backup: 15 minutes to load.
>
> In both cases I still see the same behaviour: one machine seems to be
> taking the brunt of the work...
>
> I'm reading a CSV file line by line and calling streamer.addData().
>
> The table definition is as follows...
> CREATE TABLE PUBLIC.MY_TABLE (
> COLUMN_1 VARCHAR(32) NOT NULL,
> COLUMN_2 VARCHAR(64) NOT NULL,
> CONSTRAINT PHONE_CARRIER_IDS_PK PRIMARY KEY (COLUMN_1)
> ) with "template=parallelTpl, backups=0, key_type=String,
> value_type=MyObject";
> CREATE INDEX MY_TABLE_COLUMN_2_IDX ON PUBLIC.MY_TABLE (COLUMN_2);
>
> String fileName = "my_file";
>
> final String cacheNameDest = "MY_TABLE";
>
> try(
> Ignite igniteDest =
> configIgnite(Arrays.asList("...:47500..47509", "...:47500..47509",
> "...:47500..47509"), "ignite-dest");
> IgniteCache<BinaryObject, BinaryObject> cacheDest =
> igniteDest.getOrCreateCache(cacheNameDest).withKeepBinary();
> IgniteDataStreamer<BinaryObject, BinaryObject> streamer =
> igniteDest.dataStreamer(cacheNameDest);
> ) {
> System.out.println("Ignite started.");
> long start = System.currentTimeMillis();
>
> System.out.println("Cache size: " +
> cacheDest.size(CachePeekMode.PRIMARY));
> System.out.println("Default");
> System.out.println("1d");
>
> IgniteBinary binaryDest = igniteDest.binary();
>
> try (BufferedReader br = new BufferedReader(new
> FileReader(fileName))) {
> int count = 0;
>
> String line;
> while ((line = br.readLine()) != null) {
>
> String[] parts = line.split("\\|");
>
> BinaryObjectBuilder keyBuilder =
> binaryDest.builder("String");
> keyBuilder.setField("COLUMN_1", parts[1],
> String.class);
> BinaryObjectBuilder valueBuilder =
> binaryDest.builder("PhoneCarrier");
> valueBuilder.setField("COLUMN_2", parts[3],
> String.class);
>
> streamer.addData(keyBuilder.build(),
> valueBuilder.build());
>
> count++;
>
> if ((count % 1) == 0) {
> System.out.println(count);
> }
> }
> streamer.flush();
> long end = System.currentTimeMillis();
> System.out.println("Ms: " + (end - start));
> } catch (IOException e) {
> e.printStackTrace();
> }
> }
>
> On Tue, Feb 28, 2023 at 11:00 AM Jeremy McMillan <
> jeremy.mcmil...@gridgain.com> wrote:
>
>> Have you tried tracing the workload on the 100% and 40% nodes for
>> comparison? There just isn't enough detail in your question to help predict
>> what should be happening with the cluster workload. For a starting point,
>> please identify your design goals. It's easy to get confused by advice that
>> seeks to help you do something you don't want to do.
>>
>> Some things to think about include how the stream workload is composed.
>> How should/would this work if there were only one node? How should behavior
>> change as nodes are added to the topology and the test is repeated?
>>
>> Gedanken: what if the data streamer is doing some really expensive
>> operations as it feeds the data into the stream, but the 

Re: Performance of data stream on 3 cluster node.

2023-02-28 Thread John Smith
Hi, so I'm using it in a pretty straightforward kind of way, at least I
think...

I'm loading 35 million lines from CSV into an SQL table. I decided to use
the streamer as I figured it would still be a lot faster than batching SQL
INSERTs.
I tried with backups=0 and backups=1 (I'd prefer to have backups on):
1- With 0 backups: 6 minutes to load.
2- With 1 backup: 15 minutes to load.

In both cases I still see the same behaviour: one machine seems to be
taking the brunt of the work...

I'm reading a CSV file line by line and calling streamer.addData().

The table definition is as follows...
CREATE TABLE PUBLIC.MY_TABLE (
COLUMN_1 VARCHAR(32) NOT NULL,
COLUMN_2 VARCHAR(64) NOT NULL,
CONSTRAINT PHONE_CARRIER_IDS_PK PRIMARY KEY (COLUMN_1)
) with "template=parallelTpl, backups=0, key_type=String,
value_type=MyObject";
CREATE INDEX MY_TABLE_COLUMN_2_IDX ON PUBLIC.MY_TABLE (COLUMN_2);

String fileName = "my_file";

final String cacheNameDest = "MY_TABLE";

try (
    Ignite igniteDest = configIgnite(
        Arrays.asList("...:47500..47509", "...:47500..47509", "...:47500..47509"),
        "ignite-dest");
    IgniteCache<BinaryObject, BinaryObject> cacheDest =
        igniteDest.<BinaryObject, BinaryObject>getOrCreateCache(cacheNameDest).withKeepBinary();
    IgniteDataStreamer<BinaryObject, BinaryObject> streamer =
        igniteDest.dataStreamer(cacheNameDest);
) {
    System.out.println("Ignite started.");
    long start = System.currentTimeMillis();

    System.out.println("Cache size: " + cacheDest.size(CachePeekMode.PRIMARY));
    System.out.println("Default");
    System.out.println("1d");

    IgniteBinary binaryDest = igniteDest.binary();

    try (BufferedReader br = new BufferedReader(new FileReader(fileName))) {
        int count = 0;

        String line;
        while ((line = br.readLine()) != null) {

            String[] parts = line.split("\\|");

            BinaryObjectBuilder keyBuilder = binaryDest.builder("String");
            keyBuilder.setField("COLUMN_1", parts[1], String.class);
            BinaryObjectBuilder valueBuilder = binaryDest.builder("PhoneCarrier");
            valueBuilder.setField("COLUMN_2", parts[3], String.class);

            streamer.addData(keyBuilder.build(), valueBuilder.build());

            count++;

            // Progress marker; the interval was garbled to "% 1" in the archive.
            if ((count % 100000) == 0) {
                System.out.println(count);
            }
        }
        streamer.flush();
        long end = System.currentTimeMillis();
        System.out.println("Ms: " + (end - start));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
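As an aside on the stream-transformer idea in Jeremy's reply quoted below: a
rough, hypothetical sketch of what that refactoring could look like, reusing
the igniteDest instance from the code above. Types are simplified to String,
and the trim/uppercase step is a stand-in for whatever per-entry work is
expensive:

    // import org.apache.ignite.stream.StreamTransformer;
    try (IgniteDataStreamer<String, String> st = igniteDest.dataStreamer("MY_TABLE")) {
        st.allowOverwrite(true); // a custom receiver requires allowOverwrite

        // Runs on the data node that owns each key, not on the loader.
        st.receiver(StreamTransformer.from((entry, args) -> {
            String streamed = (String) args[0];            // the value passed to addData()
            entry.setValue(streamed.trim().toUpperCase()); // placeholder transform
            return null;
        }));

        st.addData("15149838779", " some raw value ");
    }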

On Tue, Feb 28, 2023 at 11:00 AM Jeremy McMillan <
jeremy.mcmil...@gridgain.com> wrote:

> Have you tried tracing the workload on the 100% and 40% nodes for
> comparison? There just isn't enough detail in your question to help predict
> what should be happening with the cluster workload. For a starting point,
> please identify your design goals. It's easy to get confused by advice that
> seeks to help you do something you don't want to do.
>
> Some things to think about include how the stream workload is composed.
> How should/would this work if there were only one node? How should behavior
> change as nodes are added to the topology and the test is repeated?
>
> Gedanken: what if the data streamer is doing some really expensive
> operations as it feeds the data into the stream, but the nodes can very
> cheaply put the processed data into their cache partitions? In this case,
> for example, the expensive operations should be refactored into a stream
> transformer that will move the workload from the stream sender to the
> stream receivers.
> https://ignite.apache.org/docs/latest/data-streaming#stream-transformer
>
> Also gedanken: what if the data distribution is skewed such that one node
> gets more data than 2x the data sent to other partitions because of
> affinity? In this case, for example, changes to affinity/colocation design
> or changes to cluster topology (more nodes with greater CPU to RAM ratio?)
> can help distribute the load so that no single node becomes a bottleneck.
>
> On Tue, Feb 28, 2023 at 9:27 AM John Smith  wrote:
>
>> Hi, I'm using the data streamer to insert into a 3-node cluster. I have
>> noticed that 1 node is pegged at 100% CPU while the others are at 40ish %.
>>
>> Is that normal?
>>
>>
>>


Performance of data stream on 3 cluster node.

2023-02-28 Thread John Smith
Hi, I'm using the data streamer to insert into a 3-node cluster. I have
noticed that 1 node is pegged at 100% CPU while the others are at 40ish %.

Is that normal?
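It can be, if the key distribution is skewed: the streamer routes each key to
its primary node, so work per node follows affinity. One way to check from
the loader side is to map a sample of real keys to their primary nodes; a
minimal sketch (the cache name and key format are assumptions):

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.affinity.Affinity;
    import org.apache.ignite.cluster.ClusterNode;

    public class KeyDistributionCheck {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start();
            Affinity<String> aff = ignite.affinity("MY_TABLE");

            // Count how many sample keys each node is primary for; a heavy
            // skew here would explain one node doing most of the work.
            Map<Object, Integer> perNode = new HashMap<>();
            for (int i = 0; i < 1_000_000; i++) {
                String key = "key-" + i; // substitute real keys, e.g. phone numbers
                ClusterNode primary = aff.mapKeyToNode(key);
                perNode.merge(primary.consistentId(), 1, Integer::sum);
            }
            perNode.forEach((node, n) -> System.out.println(node + " -> " + n));
        }
    }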


Re: Does peer class loading have to be enabled on both server and client node?

2023-02-27 Thread John Smith
Sorry, to be clear: the applications are thick clients, but the
client = true flag is enabled.

On Mon, Feb 27, 2023 at 11:09 AM John Smith  wrote:

> Oh god! Forget that then! lol. Really? So if I have 10 applications, they
> all need to be recompiled/reconfigured and redeployed?
>
> On Mon, Feb 27, 2023 at 11:07 AM Stephen Darlington <
> stephen.darling...@gridgain.com> wrote:
>
>> It’s a cluster wide setting and needs to be set to the same value on all
>> nodes, both server and thick-client.
>>
>> > On 27 Feb 2023, at 15:58, John Smith  wrote:
>> >
>> > I have 3 node clusters and I'm trying to enable peer class loading on
>> the cluster, but it keeps shutting off after restart because it says the
>> remote node doesn't have it enabled.
>> >
>> > So is peer class loading required to be enabled on the server nodes or
>> can it just be enabled per client that needs it?
>>
>>


Re: Does peer class loading have to be enabled on both server and client node?

2023-02-27 Thread John Smith
Oh god! Forget that then! lol. Really? So if I have 10 applications, they
all need to be recompiled/reconfigured and redeployed?

On Mon, Feb 27, 2023 at 11:07 AM Stephen Darlington <
stephen.darling...@gridgain.com> wrote:

> It’s a cluster wide setting and needs to be set to the same value on all
> nodes, both server and thick-client.
>
> > On 27 Feb 2023, at 15:58, John Smith  wrote:
> >
> > I have 3 node clusters and I'm trying to enable peer class loading on
> the cluster, but it keeps shutting off after restart because it says the
> remote node doesn't have it enabled.
> >
> > So is peer class loading required to be enabled on the server nodes or
> can it just be enabled per client that needs it?
>
>


Does peer class loading have to be enabled on both server and client node?

2023-02-27 Thread John Smith
I have a 3-node cluster and I'm trying to enable peer class loading on the
cluster, but the restarted node keeps shutting off because it says the
remote node doesn't have it enabled.

So is peer class loading required to be enabled on the server nodes, or can
it just be enabled per client that needs it?
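For reference, the flag lives in IgniteConfiguration and, as the replies
above spell out, must be identical on every server and thick client; a
minimal sketch:

    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class NodeStartup {
        public static void main(String[] args) {
            // Cluster-wide setting: a joining node is rejected if its value
            // differs from the rest of the topology.
            IgniteConfiguration cfg = new IgniteConfiguration()
                .setPeerClassLoadingEnabled(true);

            Ignition.start(cfg);
        }
    }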


Re: Apache ignite.sh script can't detect CPUs or network etc...

2023-02-24 Thread John Smith
OK, figured it out. I needed to delete the cache files for the cache in
maintenance mode.

On Thu., Feb. 23, 2023, 6:03 p.m. John Smith, 
wrote:

> OK, I noticed this when I passed -v:
>
> Node is being started in maintenance mode. Starting IsolatedDiscoverySpi
> instead of configured discovery SPI.
>
> How do I fix this? I guess resizing the VM fucked up something.
>
> On Thu, Feb 23, 2023 at 5:36 PM John Smith  wrote:
>
>> Hi, I'm running Ignite in OpenStack and I attempted to resize the virtual
>> machine. That part seems to have worked, but now when I restart Ignite it
>> can't seem to join the cluster.
>>
>> It seems maybe like a permission issue but not sure... Any ideas?
>>
>> 1- Ignite was installed using deb package.
>> 2- When I run "sudo /usr/share/apache-ignite/bin/ignite.sh" the node
>> detects the OS info and displays: Topology snapshot [ver=1,
>> locNode=532afd5c, servers=1, clients=0, state=ACTIVE, CPUs=4,
>> offheap=6.3GB, heap=7.9GB]
>> 3- When I run sudo /usr/share/apache-ignite/bin/ignite.sh
>> /etc/apache-ignite/basic-config.xml" the node can't detect the info and
>> displays: Topology snapshot [ver=1, locNode=1ea0058d, servers=1, clients=0,
>> state=INACTIVE, CPUs=-1, offheap=3.0GB, heap=0.1GB]
>> 4- Same with Systemd service as #3 above. This used to work before.
>>
>


Re: Apache ignite.sh script can't detect CPUs or network etc...

2023-02-23 Thread John Smith
OK, I noticed this when I passed -v:

Node is being started in maintenance mode. Starting IsolatedDiscoverySpi
instead of configured discovery SPI.

How do I fix this? I guess resizing the VM fucked up something.

On Thu, Feb 23, 2023 at 5:36 PM John Smith  wrote:

> Hi, I'm running Ignite in OpenStack and I attempted to resize the virtual
> machine. That part seems to have worked, but now when I restart Ignite it
> can't seem to join the cluster.
>
> It seems maybe like a permission issue but not sure... Any ideas?
>
> 1- Ignite was installed using deb package.
> 2- When I run "sudo /usr/share/apache-ignite/bin/ignite.sh" the node
> detects the OS info and displays: Topology snapshot [ver=1,
> locNode=532afd5c, servers=1, clients=0, state=ACTIVE, CPUs=4,
> offheap=6.3GB, heap=7.9GB]
> 3- When I run sudo /usr/share/apache-ignite/bin/ignite.sh
> /etc/apache-ignite/basic-config.xml" the node can't detect the info and
> displays: Topology snapshot [ver=1, locNode=1ea0058d, servers=1, clients=0,
> state=INACTIVE, CPUs=-1, offheap=3.0GB, heap=0.1GB]
> 4- Same with Systemd service as #3 above. This used to work before.
>


Apache ignite.sh script can't detect CPUs or network etc...

2023-02-23 Thread John Smith
Hi, I'm running Ignite in OpenStack and I attempted to resize the virtual
machine. That part seems to have worked, but now when I restart Ignite it
can't seem to join the cluster.

It seems maybe like a permission issue but not sure... Any ideas?

1- Ignite was installed using deb package.
2- When I run "sudo /usr/share/apache-ignite/bin/ignite.sh" the node
detects the OS info and displays: Topology snapshot [ver=1,
locNode=532afd5c, servers=1, clients=0, state=ACTIVE, CPUs=4,
offheap=6.3GB, heap=7.9GB]
3- When I run "sudo /usr/share/apache-ignite/bin/ignite.sh
/etc/apache-ignite/basic-config.xml" the node can't detect the info and
displays: Topology snapshot [ver=1, locNode=1ea0058d, servers=1, clients=0,
state=INACTIVE, CPUs=-1, offheap=3.0GB, heap=0.1GB]
4- Same with Systemd service as #3 above. This used to work before.


Re: How to avoid "all partition owners have left the grid" or handle automatically.

2023-02-21 Thread John Smith
OK, not sure what happened, but I'm pretty sure it was one machine at a
time. But OK.

So just to be clear: with backups = 1, we can lose 1 machine for any amount
of time, as long as it comes back fully online and rebalanced before we
move on to the next machine?

On Tue, Feb 21, 2023 at 10:01 AM Stephen Darlington <
stephen.darling...@gridgain.com> wrote:

> I think there is an argument that when you have persistence enabled and a
> sensible partition loss policy, then you shouldn’t have to reset lost
> partitions. As you note, the data is still consistent. You’ve just
> temporarily lost some availability.
>
> However, that’s not how it currently works. If you shut down more nodes
> than you have backups, then you have to reset lost partitions.
>
> On 20 Feb 2023, at 18:14, John Smith  wrote:
>
> My cache config for the distributed cache is as follows... The maintenance
> of a machine can take about 10-20 mins depending on what the maintenance
> is. I don't lose data. I just get the "all partition owners have left"
> message and then use the control script to reset the flag for that
> specific cache.
>
> <bean class="org.apache.ignite.configuration.CacheConfiguration">
>     <property name="backups" value="1"/>
> </bean>
>
> On Mon., Feb. 20, 2023, 7:03 a.m. Stephen Darlington, <
> stephen.darling...@gridgain.com> wrote:
>
>> How are your caches configured? If they have at least one backup, you
>> should be able to restart one node at a time without data loss.
>>
>> There is no automated way to reset lost partitions. Nor should there be
>> (IMHO). If you have lost partitions, you have probably lost data. That
>> should require manual intervention.
>>
>> On 14 Feb 2023, at 17:58, John Smith  wrote:
>>
>> Hello, does anyone have insights on this?
>>
>> On Thu., Feb. 9, 2023, 4:28 p.m. John Smith, 
>> wrote:
>>
>>> Any thoughts on this?
>>>
>>> On Mon., Feb. 6, 2023, 8:38 p.m. John Smith, 
>>> wrote:
>>>
>>>> That Jira doesn't look like the issue at all. That issue seems to
>>>> suggest that there is a "data loss" exception. In our case the grid sets
>>>> the cache in a "safe" mode... "all partition owners have left the grid"
>>>> which requires us to then manually reset the flag.
>>>>
>>>> On Mon, Feb 6, 2023 at 7:46 PM 18624049226 <18624049...@163.com> wrote:
>>>>
>>>>> https://issues.apache.org/jira/browse/IGNITE-17657
>>>>> 在 2023/2/7 05:41, John Smith 写道:
>>>>>
>>>>> Hi, sometimes when we perform maintenance and reboot nodes we get "All
>>>>> partition owners have left the grid" and then we go and run ./control.sh
>>>>> --host ignite-xx --cache reset_lost_partitions some-cache and
>>>>> everything is fine again...
>>>>>
>>>>> This seems to happen with partitioned caches and we are running as
>>>>> READ_WRITE_SAFE.
>>>>>
>>>>> We have a few caches and instead of relying on a human to manually go
>>>>> run the command is there a way for this to happen automatically?
>>>>>
>>>>> And if there is an automatic way how do we enable it and what are the
>>>>> consequences?
>>>>>
>>>>>
>>
>


Re: How to avoid "all partition owners have left the grid" or handle automatically.

2023-02-20 Thread John Smith
My cache config for the distributed cache is as follows... The maintenance
of a machine can take about 10-20 mins depending on what the maintenance is.
I don't lose data. I just get the "all partition owners have left" message
and then use the control script to reset the flag for that specific cache.

<bean class="org.apache.ignite.configuration.CacheConfiguration">
    <property name="backups" value="1"/>
</bean>
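A hedged sketch of the equivalent programmatic configuration; the cache name
and key/value types are placeholders, and READ_WRITE_SAFE comes from the
original post in this thread:

    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.CacheMode;
    import org.apache.ignite.cache.PartitionLossPolicy;
    import org.apache.ignite.configuration.CacheConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class CacheSetup {
        public static void main(String[] args) {
            CacheConfiguration<String, Integer> ccfg =
                new CacheConfiguration<String, Integer>("my-cache")
                    .setCacheMode(CacheMode.PARTITIONED)
                    .setBackups(1) // tolerates losing any one node at a time
                    .setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);

            Ignition.start(new IgniteConfiguration().setCacheConfiguration(ccfg));
        }
    }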

On Mon., Feb. 20, 2023, 7:03 a.m. Stephen Darlington, <
stephen.darling...@gridgain.com> wrote:

> How are your caches configured? If they have at least one backup, you
> should be able to restart one node at a time without data loss.
>
> There is no automated way to reset lost partitions. Nor should there be
> (IMHO). If you have lost partitions, you have probably lost data. That
> should require manual intervention.
>
> On 14 Feb 2023, at 17:58, John Smith  wrote:
>
> Hello, does anyone have insights on this?
>
> On Thu., Feb. 9, 2023, 4:28 p.m. John Smith, 
> wrote:
>
>> Any thoughts on this?
>>
>> On Mon., Feb. 6, 2023, 8:38 p.m. John Smith, 
>> wrote:
>>
>>> That Jira doesn't look like the issue at all. That issue seems to
>>> suggest that there is a "data loss" exception. In our case the grid sets
>>> the cache in a "safe" mode... "all partition owners have left the grid"
>>> which requires us to then manually reset the flag.
>>>
>>> On Mon, Feb 6, 2023 at 7:46 PM 18624049226 <18624049...@163.com> wrote:
>>>
>>>> https://issues.apache.org/jira/browse/IGNITE-17657
>>>> 在 2023/2/7 05:41, John Smith 写道:
>>>>
>>>> Hi, sometimes when we perform maintenance and reboot nodes we get "All
>>>> partition owners have left the grid" and then we go and run ./control.sh
>>>> --host ignite-xx --cache reset_lost_partitions some-cache and
>>>> everything is fine again...
>>>>
>>>> This seems to happen with partitioned caches and we are running as
>>>> READ_WRITE_SAFE.
>>>>
>>>> We have a few caches and instead of relying on a human to manually go
>>>> run the command is there a way for this to happen automatically?
>>>>
>>>> And if there is an automatic way how do we enable it and what are the
>>>> consequences?
>>>>
>>>>
>


Re: How to avoid "all partition owners have left the grid" or handle automatically.

2023-02-14 Thread John Smith
Hello, does anyone have insights on this?

On Thu., Feb. 9, 2023, 4:28 p.m. John Smith,  wrote:

> Any thoughts on this?
>
> On Mon., Feb. 6, 2023, 8:38 p.m. John Smith, 
> wrote:
>
>> That Jira doesn't look like the issue at all. That issue seems to suggest
>> that there is a "data loss" exception. In our case the grid sets the cache
>> in a "safe" mode... "all partition owners have left the grid" which
>> requires us to then manually reset the flag.
>>
>> On Mon, Feb 6, 2023 at 7:46 PM 18624049226 <18624049...@163.com> wrote:
>>
>>> https://issues.apache.org/jira/browse/IGNITE-17657
>>> 在 2023/2/7 05:41, John Smith 写道:
>>>
>>> Hi, sometimes when we perform maintenance and reboot nodes we get "All
>>> partition owners have left the grid" and then we go and run ./control.sh
>>> --host ignite-xx --cache reset_lost_partitions some-cache and
>>> everything is fine again...
>>>
>>> This seems to happen with partitioned caches and we are running as
>>> READ_WRITE_SAFE.
>>>
>>> We have a few caches and instead of relying on a human to manually go
>>> run the command is there a way for this to happen automatically?
>>>
>>> And if there is an automatic way how do we enable it and what are the
>>> consequences?
>>>
>>>


Re: How to avoid "all partition owners have left the grid" or handle automatically.

2023-02-09 Thread John Smith
Any thoughts on this?

On Mon., Feb. 6, 2023, 8:38 p.m. John Smith,  wrote:

> That Jira doesn't look like the issue at all. That issue seems to suggest
> that there is a "data loss" exception. In our case the grid sets the cache
> in a "safe" mode... "all partition owners have left the grid" which
> requires us to then manually reset the flag.
>
> On Mon, Feb 6, 2023 at 7:46 PM 18624049226 <18624049...@163.com> wrote:
>
>> https://issues.apache.org/jira/browse/IGNITE-17657
>> 在 2023/2/7 05:41, John Smith 写道:
>>
>> Hi, sometimes when we perform maintenance and reboot nodes we get "All
>> partition owners have left the grid" and then we go and run ./control.sh
>> --host ignite-xx --cache reset_lost_partitions some-cache and
>> everything is fine again...
>>
>> This seems to happen with partitioned caches and we are running as
>> READ_WRITE_SAFE.
>>
>> We have a few caches and instead of relying on a human to manually go run
>> the command is there a way for this to happen automatically?
>>
>> And if there is an automatic way how do we enable it and what are the
>> consequences?
>>
>>


Re: How to avoid "all partition owners have left the grid" or handle automatically.

2023-02-06 Thread John Smith
That Jira doesn't look like the issue at all. That issue seems to suggest
that there is a "data loss" exception. In our case the grid sets the cache
in a "safe" mode... "all partition owners have left the grid" which
requires us to then manually reset the flag.

On Mon, Feb 6, 2023 at 7:46 PM 18624049226 <18624049...@163.com> wrote:

> https://issues.apache.org/jira/browse/IGNITE-17657
> 在 2023/2/7 05:41, John Smith 写道:
>
> Hi, sometimes when we perform maintenance and reboot nodes we get "All
> partition owners have left the grid" and then we go and run ./control.sh
> --host ignite-xx --cache reset_lost_partitions some-cache and
> everything is fine again...
>
> This seems to happen with partitioned caches and we are running as
> READ_WRITE_SAFE.
>
> We have a few caches and instead of relying on a human to manually go run
> the command is there a way for this to happen automatically?
>
> And if there is an automatic way how do we enable it and what are the
> consequences?
>
>


How to avoid "all partition owners have left the grid" or handle automatically.

2023-02-06 Thread John Smith
Hi, sometimes when we perform maintenance and reboot nodes we get "All
partition owners have left the grid" and then we go and run ./control.sh
--host ignite-xx --cache reset_lost_partitions some-cache and
everything is fine again...

This seems to happen with partitioned caches and we are running as
READ_WRITE_SAFE.

We have a few caches, and instead of relying on a human to manually go run
the command, is there a way for this to happen automatically?

And if there is an automatic way, how do we enable it, and what are the
consequences?
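
For what it's worth, the control script's reset has a Java API equivalent, so
a watcher can be scripted. A minimal sketch, not a drop-in solution: the cache
name is a placeholder, EVT_CACHE_REBALANCE_PART_DATA_LOST has to be listed in
IgniteConfiguration.setIncludeEventTypes, and in practice you would reset only
after all former owners have rejoined, since resetting tells the cluster to
accept whatever copies remain:

import java.util.Collections;

import org.apache.ignite.Ignite;
import org.apache.ignite.events.Event;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class LostPartitionWatcher {
    public static void watch(Ignite ignite) {
        // Fires on this node when a partition is declared lost.
        IgnitePredicate<Event> lsnr = evt -> {
            if (!ignite.cache("some-cache").lostPartitions().isEmpty()) {
                // Java equivalent of:
                //   control.sh --cache reset_lost_partitions some-cache
                ignite.resetLostPartitions(Collections.singleton("some-cache"));
            }
            return true; // keep listening
        };

        ignite.events().localListen(lsnr, EventType.EVT_CACHE_REBALANCE_PART_DATA_LOST);
    }
}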


Re: Cluster shutdown by "too many files open"

2022-12-16 Thread John Smith
It started fine and it also recreated the partitions for that cache.

On Fri, Dec 16, 2022 at 10:48 AM John Smith  wrote:

> Weird, because after restarting and just deleting the
> work/db/node-xx/cache-my-cache folder on the node that shut down, it
> started up fine and I have the same number of caches...
> And sudo lsof -a -p PID returns only 3600 files
>
> On Fri, Dec 16, 2022 at 12:10 AM Gianluca Bonetti <
> gianluca.bone...@gmail.com> wrote:
>
>> Hello
>>
>> I had the same problem, with far more caches in use, as total number (but
>> each cache was very small in size).
>>
>> 32768 files is definitely too low.
>> In my case, I had to raise it to a 262144 hard limit and a 131072 soft limit.
>> Please update your /etc/security/limits.conf records for the user you run
>> your app with.
>>
>> I also raised fs.file-max to 2097152 which may be excessive, but I don't
>> see a problem with setting it that high.
>>
>> Cheers
>> Gianluca
>>
>> On Fri, 16 Dec 2022 at 01:39, John Smith  wrote:
>>
>>> Hi, it seems the JVM was forcefully shut down when I tried to create a new
>>> partitioned cache.
>>>
>>> The error seems to indicate that it was "too many files". Can someone
>>> from Ignite confirm this?
>>>
>>> I have checked with lsof and Ignite only has about 3600 files open. It's
>>> the only service running on that server. So I don't see how this could
>>> happen? I have a total of 10 caches mixed between replicated and
>>> partitioned (1 backup) over 3 nodes.
>>>
>>> I have
>>>
>>> fs.file-max:30
>>> and
>>> - soft nofile 32768
>>> - hard nofile 32768
>>> respectively on each node.
>>>
>>> What I did was delete the db/ folder for that specific cache on that node,
>>> and when I restarted it, it worked and recreated the folder for that cache.
>>>
>>> https://www.dropbox.com/s/zwf28akser9p4dt/ignite-XX.0.log?dl=0
>>>
>>


Re: Cluster shutdown by "too many files open"

2022-12-16 Thread John Smith
Weird, because after restarting and just deleting the
work/db/node-xx/cache-my-cache folder on the node that shut down, it
started up fine and I have the same number of caches...
And sudo lsof -a -p PID returns only 3600 files

On Fri, Dec 16, 2022 at 12:10 AM Gianluca Bonetti <
gianluca.bone...@gmail.com> wrote:

> Hello
>
> I had the same problem, with far more caches in use, as total number (but
> each cache was very small in size).
>
> 32768 files is definitely too low.
> In my case, I had to raise it to a 262144 hard limit and a 131072 soft limit.
> Please update your /etc/security/limits.conf records for the user you run
> your app with.
>
> I also raised fs.file-max to 2097152 which may be excessive, but I don't
> see a problem with setting it that high.
>
> Cheers
> Gianluca
>
> On Fri, 16 Dec 2022 at 01:39, John Smith  wrote:
>
>> Hi, it seems the JVM was forcefully shut down when I tried to create a new
>> partitioned cache.
>>
>> The error seems to indicate that it was "too many files". Can someone from
>> Ignite confirm this?
>>
>> I have checked with lsof and Ignite only has about 3600 files open. It's
>> the only service running on that server. So I don't see how this could
>> happen? I have a total of 10 caches mixed between replicated and
>> partitioned (1 backup) over 3 nodes.
>>
>> I have
>>
>> fs.file-max:30
>> and
>> - soft nofile 32768
>> - hard nofile 32768
>> respectively on each node.
>>
>> What I did was delete the db/ folder for that specific cache on that node,
>> and when I restarted it, it worked and recreated the folder for that cache.
>>
>> https://www.dropbox.com/s/zwf28akser9p4dt/ignite-XX.0.log?dl=0
>>
>


Cluster shutdown by "too many files open"

2022-12-15 Thread John Smith
Hi, it seems the JVM was forcefully shut down when I tried to create a new
partitioned cache.

The error seems to indicate that it was "too many files". Can someone from
Ignite confirm this?

I have checked with lsof and Ignite only has about 3600 files open. It's
the only service running on that server. So I don't see how this could
happen? I have a total of 10 caches mixed between replicated and
partitioned (1 backup) over 3 nodes.

I have

fs.file-max:30
and
- soft nofile 32768
- hard nofile 32768
respectively on each node.
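
For reference, the shape of those settings (the "ignite" user name is a
placeholder for whatever user runs the node; the raised values are the ones
Gianluca suggests in the replies):

# /etc/security/limits.conf  --  <domain> <type> <item> <value>
ignite soft nofile 131072
ignite hard nofile 262144

# kernel-wide cap, e.g. in /etc/sysctl.conf, applied with `sysctl -p`
fs.file-max = 2097152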

What I did was delete the db/ folder for that specific cache on that node,
and when I restarted it, it worked and recreated the folder for that cache.

https://www.dropbox.com/s/zwf28akser9p4dt/ignite-XX.0.log?dl=0


Re: Apache Hudi + Apache Ignite

2022-09-25 Thread John Smith
Something like this?

https://ignite.apache.org/use-cases/hadoop-acceleration.html

On Thu., Sep. 22, 2022, 3:44 a.m. Stephen Darlington, <
stephen.darling...@gridgain.com> wrote:

> I don’t know of anyone doing this, however it looks like it should be
> possible.
>
> According to a quick skim of the docs, to read/write to Hudi you need
> Flink or Spark. To use the Cache Store (read/write-through) you’d need to
> embed one of those inside Ignite, so plenty of opportunity for “dependency
> hell.” I do know of one project where they embedded Spark.
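>
> The Ignite side of that would be a CacheStore. A rough write-through
> skeleton (the hudi* calls and value type are placeholders for whatever
> embedded Spark/Flink client you'd use, not a real Hudi API):
>
> import javax.cache.Cache;
> import org.apache.ignite.cache.store.CacheStoreAdapter;
>
> public class HudiCacheStore extends CacheStoreAdapter<String, Object> {
>     @Override public Object load(String key) {
>         return hudiRead(key); // placeholder: point read from the Hudi table
>     }
>
>     @Override public void write(Cache.Entry<? extends String, ? extends Object> e) {
>         hudiUpsert(e.getKey(), e.getValue()); // placeholder: upsert
>     }
>
>     @Override public void delete(Object key) {
>         hudiDelete((String) key); // placeholder: delete
>     }
>
>     // Stubs standing in for an embedded Spark/Flink Hudi client.
>     private Object hudiRead(String key) { return null; }
>     private void hudiUpsert(String key, Object val) { }
>     private void hudiDelete(String key) { }
> }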
>
> On 22 Sep 2022, at 03:58, Tecno Brain 
> wrote:
>
> I have heard of a tool called Alluxio used between Hudi and Spark/Presto. (
> https://www.alluxio.io/blog/building-high-performance-data-lake-using-apache-hudi-and-alluxio-at-t3go/
> )
> I was wondering if Apache Ignite could serve the same purpose, allowing
> queries to be processed faster.
>
> On Thu, Sep 15, 2022 at 10:29 AM Jeremy McMillan <
> jeremy.mcmil...@gridgain.com> wrote:
>
>> I just read this, about Hudi, and I can't see a use case for putting Hudi
>> behind an Ignite write-through cache.
>>
>> https://www.xenonstack.com/insights/what-is-hudi
>>
>> Hudi seems to be a write accelerator for Spark on HDFS, primarily.
>>
>> What would the expected outcome be if we assume the magic integration was
>> present and working as you intend? What's the difference between that and
>> not using Ignite with Hudi?
>>
>> On Wed, Sep 14, 2022, 22:50 Tecno Brain 
>> wrote:
>>
>>> In particular I am looking if anyone has used Apache Ignite as a
>>> write-through cache to Hudi.
>>> Does that make sense?
>>>
>>> On Wed, Sep 14, 2022 at 10:50 PM Tecno Brain <
>>> cerebrotecnolog...@gmail.com> wrote:
>>>
 I was wondering if anybody has used Hudi + Ignite?
 Any references to articles, conferences are greatly appreciated.

 Thanks




>


Re: Re[2]: What is the data-streamer-stripe thread?

2022-09-19 Thread John Smith
Nah, it's fine, I just wanted to make sure what it was. Unless you think I
should at least log an issue?


On Wed, Sep 14, 2022 at 3:13 AM Zhenya Stanilovsky via user <
user@ignite.apache.org> wrote:

> Yep, as I already mentioned, you can't disable this pool at all, and 1
> worker thread will still be visible.
> You can file the issue, but I can't guarantee that it would be completed
> soon, or you can do it yourself and present a pull request.
>
> best.
>
>
> OK, so just to understand: on the client side, set the pool size for the
> data streamer to 1.
>
> But it will still look blocked?
>
> On Mon., Sep. 12, 2022, 8:59 a.m. Zhenya Stanilovsky via user, <
> user@ignite.apache.org
> > wrote:
>
> John, it seems all you can do here is set this pool size to «1»; «0» tends
> to error.
>
>
> https://ignite.apache.org/docs/latest/data-streaming#configuring-data-streamer-thread-pool-size
>
> 1 thread will still be frozen in such a case.
>
>
>
>
>
>
> Hi, I'm profiling my application through YourKit and it indicates that a
> bunch of these threads (data-streamer-stripe) are "frozen" for 21 days.
>
> I'm not using data streaming; is there a way to disable it or just ignore
> the messages? The application is configured as a thick client (client = true).
>
>
>
>
>
>
>
>
>
>
>


Re: What is the data-streamer-stripe thread?

2022-09-13 Thread John Smith
OK, so just to understand: on the client side, set the pool size for the data
streamer to 1.

But it will still look blocked?
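
i.e. something like this on the thick client (a sketch;
setDataStreamerThreadPoolSize is the relevant knob, the rest is boilerplate):

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

IgniteConfiguration cfg = new IgniteConfiguration();
cfg.setClientMode(true); // thick client
cfg.setDataStreamerThreadPoolSize(1); // the minimum; 0 is rejected
Ignite ignite = Ignition.start(cfg);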

On Mon., Sep. 12, 2022, 8:59 a.m. Zhenya Stanilovsky via user, <
user@ignite.apache.org> wrote:

> John, it seems all you can do here is set this pool size to «1»; «0» tends
> to error.
>
>
> https://ignite.apache.org/docs/latest/data-streaming#configuring-data-streamer-thread-pool-size
>
> 1 thread will still be frozen in such a case.
>
>
>
>
>
>
> Hi, I'm profiling my application through YourKit and it indicates that a
> bunch of these threads (data-streamer-stripe) are "frozen" for 21 days.
>
> I'm not using data streaming; is there a way to disable it or just ignore
> the messages? The application is configured as a thick client (client = true).
>
>
>
>
>
>
>


What is the data-streamer-stripe thread?

2022-09-09 Thread John Smith
Hi, I'm profiling my application through YourKit and it indicates that a
bunch of these threads (data-streamer-stripe) are "frozen" for 21 days.

I'm not using data streaming; is there a way to disable it or just ignore
the messages? The application is configured as a thick client (client = true).


Re: What does javax.cache.CacheException: Failed to execute map query on remote node mean?

2022-08-31 Thread John Smith
OK, but since I dropped and recreated the table, I'm fine? It won't somehow
throw that error again? And if I upgrade from 2.12 to 2.13, will I have the
same issue?

On Wed, Aug 31, 2022 at 3:31 PM Alex Plehanov 
wrote:

> John Smith,
>
> Thank you. This issue will be fixed in upcoming 2.14.
>
> ср, 31 авг. 2022 г. в 21:50, John Smith :
>
>> Here it is... And yes I recently upgraded to 2.12 from 2.8.1
>>
>> create table if not exists car_code (
>> provider_id int,
>> car_id int,
>> car_code varchar(16),
>> primary key (provider_id, car_id)
>> ) with "template=replicatedTpl, key_type=CarCodeKey, value_type=CarCode";
>>
>> On Wed, Aug 31, 2022 at 7:25 AM Alex Plehanov 
>> wrote:
>>
>>> John Smith,
>>>
>>> Can you please show DDL for the car_code table? Does PK of this table
>>> include provider_id or car_code columns?
>>> I found a compatibility issue with the same behaviour; it happens when
>>> storage created with an Ignite version before 2.11 is used with a newer
>>> Ignite version. Have you upgraded the dev environment with existing storage
>>> recently (before starting to get this error)?
>>>
>>>
>>> чт, 4 авг. 2022 г. в 17:06, John Smith :
>>>
>>>> Let me know if that makes any sense, because the test data is the same
>>>> and the application code is the same. I only dropped and created the table
>>>> again using DBeaver.
>>>>
>>>> On Wed, Aug 3, 2022 at 11:39 AM John Smith 
>>>> wrote:
>>>>
>>>>> Hi, so I dropped the table and simply recreated it. Did NOT restart
>>>>> the application.
>>>>>
>>>>> Now it works fine.
>>>>>
>>>>> On Wed, Aug 3, 2022 at 9:58 AM John Smith 
>>>>> wrote:
>>>>>
>>>>>> How? The code is 100% the same between production and dev. And it's
>>>>>> part of a bigger application.
>>>>>>
>>>>>> Only dev has the issue. I will drop and recreate the table if that
>>>>>> fixes the issue then what?
>>>>>>
>>>>>> You are saying mismatch, it's a string period.
>>>>>>
>>>>>> "select car_id from car_code where provider_id = ? and car_code = ? 
>>>>>> order by car_id asc limit 1;"
>>>>>>
>>>>>>
>>>>>> The first parameter is Integer and the second one is String. there's
>>>>>> no way this can mismatch... And even if the String was a UUID it's still 
>>>>>> a
>>>>>> string.
>>>>>>
>>>>>> public void query(final String sql, final long timeoutMs,
>>>>>> final Object... args) {
>>>>>>     SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
>>>>>>     query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);
>>>>>>
>>>>>>     try (QueryCursor<List<?>> cursor = cache.query(query)) {
>>>>>>         List<JsonArray> rows = new ArrayList<>();
>>>>>>         Iterator<List<?>> iterator = cursor.iterator();
>>>>>>
>>>>>>         while (iterator.hasNext()) {
>>>>>>             List<?> currentRow = iterator.next();
>>>>>>             JsonArray row = new JsonArray();
>>>>>>
>>>>>>             currentRow.forEach(o -> row.add(o));
>>>>>>
>>>>>>             rows.add(row);
>>>>>>         }
>>>>>>
>>>>>>         // promise is a field on the enclosing class, completed with the result.
>>>>>>         promise.tryComplete(rows);
>>>>>>     } catch (Exception ex) {
>>>>>>         ex.printStackTrace();
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> Integer providerId = 1;
>>>>>> String carCode = "FOO";
>>>>>>
>>>>>> query("select car_id from car_code where provider_id = ? and
>>>>>> car_code = ? order by car_id asc limit 1;", 3000, providerId, carCode);
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 3, 2022 at 6:50 AM Taras Ledkov 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi John and Don,
>>>>>>>
>>>>>>> I guess the root cause is a data type mismatch between the table
>>>>>>> schema and the actual data at the store, or the type of the query parameter.
>>>>>>> To explore the gap, it would be very handy if you could provide a
>>>>>>> small reproducer (standalone project or PR somewhere).
>>>>>>>
>>>>>>> > In my case I'm not even using UUID fields. Also the same code 2
>>>>>>> diff environment dev vs prod doesn't cause the issue. I'm lucky enough 
>>>>>>> that
>>>>>>> it's on dev and prod is ok.
>>>>>>> >
>>>>>>> > But that last part might be misleading because in prod I think it
>>>>>>> happened early on during upgrade and all I did was recreate the sql 
>>>>>>> table.
>>>>>>> >
>>>>>>> > So before I do the same on dev... I want to see what the issue is.
>>>>>>> >
>>>>>>> > On Tue., Aug. 2, 2022, 6:06 p.m. ,  wrote:
>>>>>>> >
>>>>>>> >> I‘m only speculating but this looks very similar to the issue I
>>>>>>> had last week and reported to the group here.
>>>>>>> >>
>>>>>>> >> Caused by: org.h2.message.DbException: Hexadecimal string with
>>>>>>> odd number of characters: "5" [90003-197]
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> With best regards,
>>>>>>> Taras Ledkov
>>>>>>>
>>>>>>


Re: What does javax.cache.CacheException: Failed to execute map query on remote node mean?

2022-08-31 Thread John Smith
Here it is... And yes I recently upgraded to 2.12 from 2.8.1

create table if not exists car_code (
provider_id int,
car_id int,
car_code varchar(16),
primary key (provider_id, car_id)
) with "template=replicatedTpl, key_type=CarCodeKey, value_type=CarCode";

On Wed, Aug 31, 2022 at 7:25 AM Alex Plehanov 
wrote:

> John Smith,
>
> Can you please show DDL for the car_code table? Does PK of this table
> include provider_id or car_code columns?
> I found a compatibility issue with the same behaviour; it happens when
> storage created with an Ignite version before 2.11 is used with a newer
> Ignite version. Have you upgraded the dev environment with existing storage
> recently (before starting to get this error)?
>
>
> чт, 4 авг. 2022 г. в 17:06, John Smith :
>
>> Let me know if that makes any sense, because the test data is the same
>> and the application code is the same. I only dropped and created the table
>> again using DBeaver.
>>
>> On Wed, Aug 3, 2022 at 11:39 AM John Smith 
>> wrote:
>>
>>> Hi, so I dropped the table and simply recreated it. Did NOT restart the
>>> application.
>>>
>>> Now it works fine.
>>>
>>> On Wed, Aug 3, 2022 at 9:58 AM John Smith 
>>> wrote:
>>>
>>>> How? The code is 100% the same between production and dev. And it's
>>>> part of a bigger application.
>>>>
>>>> Only dev has the issue. I will drop and recreate the table if that
>>>> fixes the issue then what?
>>>>
>>>> You are saying mismatch, it's a string period.
>>>>
>>>> "select car_id from car_code where provider_id = ? and car_code = ? order 
>>>> by car_id asc limit 1;"
>>>>
>>>>
>>>> The first parameter is Integer and the second one is String. there's no
>>>> way this can mismatch... And even if the String was a UUID it's still a
>>>> string.
>>>>
>>>> public void query(final String sql, final long timeoutMs,
>>>> final Object... args) {
>>>>     SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
>>>>     query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);
>>>>
>>>>     try (QueryCursor<List<?>> cursor = cache.query(query)) {
>>>>         List<JsonArray> rows = new ArrayList<>();
>>>>         Iterator<List<?>> iterator = cursor.iterator();
>>>>
>>>>         while (iterator.hasNext()) {
>>>>             List<?> currentRow = iterator.next();
>>>>             JsonArray row = new JsonArray();
>>>>
>>>>             currentRow.forEach(o -> row.add(o));
>>>>
>>>>             rows.add(row);
>>>>         }
>>>>
>>>>         // promise is a field on the enclosing class, completed with the result.
>>>>         promise.tryComplete(rows);
>>>>     } catch (Exception ex) {
>>>>         ex.printStackTrace();
>>>>     }
>>>> }
>>>>
>>>> Integer providerId = 1;
>>>> String carCode = "FOO";
>>>>
>>>> query("select car_id from car_code where provider_id = ? and
>>>> car_code = ? order by car_id asc limit 1;", 3000, providerId, carCode);
>>>>
>>>>
>>>>
>>>> On Wed, Aug 3, 2022 at 6:50 AM Taras Ledkov  wrote:
>>>>
>>>>> Hi John and Don,
>>>>>
>>>>> I guess the root cause is a data type mismatch between the table schema
>>>>> and the actual data at the store, or the type of the query parameter.
>>>>> To explore the gap, it would be very handy if you could provide a
>>>>> small reproducer (standalone project or PR somewhere).
>>>>>
>>>>> > In my case I'm not even using UUID fields. Also the same code 2 diff
>>>>> environment dev vs prod doesn't cause the issue. I'm lucky enough that 
>>>>> it's
>>>>> on dev and prod is ok.
>>>>> >
>>>>> > But that last part might be misleading because in prod I think it
>>>>> happened early on during upgrade and all I did was recreate the sql table.
>>>>> >
>>>>> > So before I do the same on dev... I want to see what the issue is.
>>>>> >
>>>>> > On Tue., Aug. 2, 2022, 6:06 p.m. ,  wrote:
>>>>> >
>>>>> >> I‘m only speculating but this looks very similar to the issue I had
>>>>> last week and reported to the group here.
>>>>> >>
>>>>> >> Caused by: org.h2.message.DbException: Hexadecimal string with odd
>>>>> number of characters: "5" [90003-197]
>>>>>
>>>>>
>>>>> --
>>>>> With best regards,
>>>>> Taras Ledkov
>>>>>
>>>>


Re: Cache Exception for specific parameter values

2022-08-04 Thread John Smith
The only other thing I can think of is that I did an upgrade to 2.12.0

On Thu, Aug 4, 2022 at 12:52 PM John Smith  wrote:

> Personally I think there is some sort of index corruption. I dropped the
> table and recreated it and my problem went away.
> I never restarted the application and it started to work again.
>
> Maybe you can make a copy of your table, insert the data fresh and point
> your app to the copied table. And see if that works. Before doing something
> as drastic as I did.
>
> On Fri, Jul 29, 2022 at 9:14 AM  wrote:
>
>> Hi Taras,
>>
>> Attached is the extract from the local node log (which is an Ignite client)
>> and the remote node log (which is an Ignite server with persistence
>> enabled).
>>
>> What's really strange is that the previous value "IDX_STAGE_308" seems to be
>> the root cause? This is a value in column UCID, but it's not related to
>> the current search, except that of course the H2 DB must search through
>> this column. Maybe there are invalid characters in the index from data
>> rows from a few days ago? Maybe the index is broken? Why would it only
>> show when this specific query is executed?
>>
>> Please let me know your thoughts.
>>
>> Thanks,
>> Thomas.
>>
>>
>> Am 29.07.2022 um 12:48 schrieb Taras Ledkov:
>> > Hi,
>> >
>> > Could you provide the original exception from the map node?
>> > It must be available at the log files of the map node.
>>
>


Re: Cache Exception for specific parameter values

2022-08-04 Thread John Smith
Personally, I think there is some sort of index corruption. I dropped the
table and recreated it, and my problem went away.
I never restarted the application, and it started to work again.

Maybe you can make a copy of your table, insert the data fresh, and point
your app to the copied table, and see if that works, before doing something
as drastic as what I did.

On Fri, Jul 29, 2022 at 9:14 AM  wrote:

> Hi Taras,
>
> Attached is the extract from the local node log (which is an Ignite client)
> and the remote node log (which is an Ignite server with persistence
> enabled).
>
> What's really strange is that the previous value "IDX_STAGE_308" seems to be
> the root cause? This is a value in column UCID, but it's not related to
> the current search, except that of course the H2 DB must search through
> this column. Maybe there are invalid characters in the index from data
> rows from a few days ago? Maybe the index is broken? Why would it only
> show when this specific query is executed?
>
> Please let me know your thoughts.
>
> Thanks,
> Thomas.
>
>
> Am 29.07.2022 um 12:48 schrieb Taras Ledkov:
> > Hi,
> >
> > Could you provide the original exception from the map node?
> > It must be available at the log files of the map node.
>


Re: What does javax.cache.CacheException: Failed to execute map query on remote node mean?

2022-08-04 Thread John Smith
Let me know if that makes any sense, because the test data is the same and
the application code is the same. I only dropped and created the table again
using DBeaver.

On Wed, Aug 3, 2022 at 11:39 AM John Smith  wrote:

> Hi, so I dropped the table and simply recreated it. Did NOT restart the
> application.
>
> Now it works fine.
>
> On Wed, Aug 3, 2022 at 9:58 AM John Smith  wrote:
>
>> How? The code is 100% the same between production and dev. And it's part
>> of a bigger application.
>>
>> Only dev has the issue. I will drop and recreate the table if that fixes
>> the issue then what?
>>
>> You are saying mismatch, it's a string period.
>>
>> "select car_id from car_code where provider_id = ? and car_code = ? order by 
>> car_id asc limit 1;"
>>
>>
>> The first parameter is Integer and the second one is String. there's no
>> way this can mismatch... And even if the String was a UUID it's still a
>> string.
>>
>> public void query(final String sql, final long timeoutMs, final
>> Object... args) {
>>     SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
>>     query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);
>>
>>     try (QueryCursor<List<?>> cursor = cache.query(query)) {
>>         List<JsonArray> rows = new ArrayList<>();
>>         Iterator<List<?>> iterator = cursor.iterator();
>>
>>         while (iterator.hasNext()) {
>>             List<?> currentRow = iterator.next();
>>             JsonArray row = new JsonArray();
>>
>>             currentRow.forEach(o -> row.add(o));
>>
>>             rows.add(row);
>>         }
>>
>>         // promise is a field on the enclosing class, completed with the result.
>>         promise.tryComplete(rows);
>>     } catch (Exception ex) {
>>         ex.printStackTrace();
>>     }
>> }
>>
>> Integer providerId = 1;
>> String carCode = "FOO";
>>
>> query("select car_id from car_code where provider_id = ? and car_code
>> = ? order by car_id asc limit 1;", 3000, providerId, carCode);
>>
>>
>>
>> On Wed, Aug 3, 2022 at 6:50 AM Taras Ledkov  wrote:
>>
>>> Hi John and Don,
>>>
>>> I guess the root cause is a data type mismatch between the table schema
>>> and the actual data at the store, or the type of the query parameter.
>>> To explore the gap, it would be very handy if you could provide a small
>>> reproducer (standalone project or PR somewhere).
>>>
>>> > In my case I'm not even using UUID fields. Also the same code 2 diff
>>> environment dev vs prod doesn't cause the issue. I'm lucky enough that it's
>>> on dev and prod is ok.
>>> >
>>> > But that last part might be misleading because in prod I think it
>>> happened early on during upgrade and all I did was recreate the sql table.
>>> >
>>> > So before I do the same on dev... I want to see what the issue is.
>>> >
>>> > On Tue., Aug. 2, 2022, 6:06 p.m. ,  wrote:
>>> >
>>> >> I‘m only speculating but this looks very similar to the issue I had
>>> last week and reported to the group here.
>>> >>
>>> >> Caused by: org.h2.message.DbException: Hexadecimal string with odd
>>> number of characters: "5" [90003-197]
>>>
>>>
>>> --
>>> With best regards,
>>> Taras Ledkov
>>>
>>


Re: What does javax.cache.CacheException: Failed to execute map query on remote node mean?

2022-08-03 Thread John Smith
Hi, so I dropped the table and simply recreated it. Did NOT restart the
application.

Now it works fine.

On Wed, Aug 3, 2022 at 9:58 AM John Smith  wrote:

> How? The code is 100% the same between production and dev. And it's part
> of a bigger application.
>
> Only dev has the issue. I will drop and recreate the table if that fixes
> the issue then what?
>
> You are saying mismatch, it's a string period.
>
> "select car_id from car_code where provider_id = ? and car_code = ? order by 
> car_id asc limit 1;"
>
>
> The first parameter is Integer and the second one is String. there's no
> way this can mismatch... And even if the String was a UUID it's still a
> string.
>
> public void query(final String sql, final long timeoutMs, final
> Object... args) {
>     SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
>     query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);
>
>     try (QueryCursor<List<?>> cursor = cache.query(query)) {
>         List<JsonArray> rows = new ArrayList<>();
>         Iterator<List<?>> iterator = cursor.iterator();
>
>         while (iterator.hasNext()) {
>             List<?> currentRow = iterator.next();
>             JsonArray row = new JsonArray();
>
>             currentRow.forEach(o -> row.add(o));
>
>             rows.add(row);
>         }
>
>         // promise is a field on the enclosing class, completed with the result.
>         promise.tryComplete(rows);
>     } catch (Exception ex) {
>         ex.printStackTrace();
>     }
> }
>
> Integer providerId = 1;
> String carCode = "FOO";
>
> query("select car_id from car_code where provider_id = ? and car_code
> = ? order by car_id asc limit 1;", 3000, providerId, carCode);
>
>
>
> On Wed, Aug 3, 2022 at 6:50 AM Taras Ledkov  wrote:
>
>> Hi John and Don,
>>
>> I guess the root cause is a data type mismatch between the table schema
>> and the actual data at the store, or the type of the query parameter.
>> To explore the gap, it would be very handy if you could provide a small
>> reproducer (standalone project or PR somewhere).
>>
>> > In my case I'm not even using UUID fields. Also the same code 2 diff
>> environment dev vs prod doesn't cause the issue. I'm lucky enough that it's
>> on dev and prod is ok.
>> >
>> > But that last part might be misleading because in prod I think it
>> happened early on during upgrade and all I did was recreate the sql table.
>> >
>> > So before I do the same on dev... I want to see what the issue is.
>> >
>> > On Tue., Aug. 2, 2022, 6:06 p.m. ,  wrote:
>> >
>> >> I‘m only speculating but this looks very similar to the issue I had
>> last week and reported to the group here.
>> >>
>> >> Caused by: org.h2.message.DbException: Hexadecimal string with odd
>> number of characters: "5" [90003-197]
>>
>>
>> --
>> With best regards,
>> Taras Ledkov
>>
>


Re: What does javax.cache.CacheException: Failed to execute map query on remote node mean?

2022-08-03 Thread John Smith
How? The code is 100% the same between production and dev. And it's part of
a bigger application.

Only dev has the issue. I will drop and recreate the table if that fixes
the issue then what?

You are saying mismatch; it's a string, period.

"select car_id from car_code where provider_id = ? and car_code = ?
order by car_id asc limit 1;"


The first parameter is an Integer and the second one is a String. There's no
way this can mismatch... And even if the String was a UUID, it's still a string.

public void query(final String sql, final long timeoutMs, final
Object... args) {
    SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
    query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);

    try (QueryCursor<List<?>> cursor = cache.query(query)) {
        List<JsonArray> rows = new ArrayList<>();
        Iterator<List<?>> iterator = cursor.iterator();

        while (iterator.hasNext()) {
            List<?> currentRow = iterator.next();
            JsonArray row = new JsonArray();

            currentRow.forEach(o -> row.add(o));

            rows.add(row);
        }

        // promise is a field on the enclosing class, completed with the result.
        promise.tryComplete(rows);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}

Integer providerId = 1;
String carCode = "FOO";

query("select car_id from car_code where provider_id = ? and car_code =
? order by car_id asc limit 1;", 3000, providerId, carCode);



On Wed, Aug 3, 2022 at 6:50 AM Taras Ledkov  wrote:

> Hi John and Don,
>
> I guess the root cause is a data type mismatch between the table schema and
> the actual data at the store, or the type of the query parameter.
> To explore the gap, it would be very handy if you could provide a small
> reproducer (standalone project or PR somewhere).
>
> > In my case I'm not even using UUID fields. Also the same code 2 diff
> environment dev vs prod doesn't cause the issue. I'm lucky enough that it's
> on dev and prod is ok.
> >
> > But that last part might be misleading because in prod I think it
> happened early on during upgrade and all I did was recreate the sql table.
> >
> > So before I do the same on dev... I want to see what the issue is.
> >
> > On Tue., Aug. 2, 2022, 6:06 p.m. ,  wrote:
> >
> >> I‘m only speculating but this looks very similar to the issue I had
> last week and reported to the group here.
> >>
> >> Caused by: org.h2.message.DbException: Hexadecimal string with odd
> number of characters: "5" [90003-197]
>
>
> --
> With best regards,
> Taras Ledkov
>


Re: Re: What does javax.cache.CacheException: Failed to execute map query on remote node mean?

2022-08-03 Thread John Smith
In my case I'm not even using UUID fields. Also, the same code in two
different environments (dev vs prod) doesn't cause the issue. I'm lucky
enough that it's on dev and prod is OK.

But that last part might be misleading, because in prod I think it happened
early on during the upgrade, and all I did was recreate the SQL table.

So before I do the same on dev... I want to see what the issue is.

On Tue., Aug. 2, 2022, 6:06 p.m. ,  wrote:

> I'm only speculating, but this looks very similar to the issue I had last
> week and reported to the group here.
>
> Caused by: org.h2.message.DbException: Hexadecimal string with odd number
> of characters: "5" [90003-197]
>
> Why does H2 think it's hex String format? For me it turned out H2 was
> wrongly thinking my column data is UUID format, even though it was
> configured as a regular String class. Most data in the column was text like
> 'abc' or similar. But there were also some values that were actually
> UUIDs, still converted to String for this column.
>
> So when I then searched, using SqlFieldsQuery with arguments, for a
> value that was a UUID as a String, I got a similar exception.
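>
> For illustration, the data shape was roughly this (values invented, table
> and column simplified):
>
> create table t (id int primary key, ucid varchar);
> insert into t (id, ucid) values
>     (1, 'abc'),
>     (2, 'IDX_STAGE_308'),
>     (3, 'f47ac10b-58cc-4372-a567-0e02b2c3d479');
>
> and the failure showed up when the query argument for ucid was one of the
> UUID-shaped strings.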
>
> I am still trying to create a smaller repro case, that’s why I haven’t
> described my solution in more detail in my other thread yet.
>
>
>
> On 02.08.22 at 23:04, John Smith wrote:
>
> From: "John Smith" 
> Date: 2. August 2022
> To: user@ignite.apache.org
> Cc:
> Subject: Re: What does javax.cache.CacheException: Failed to execute map
> query on remote node mean?
> Here it is...
>
> [20:58:03,050][SEVERE][query-#395344%xx%][GridMapQueryExecutor] Failed
> to execute local query.
> class org.apache.ignite.internal.processors.query.IgniteSQLException:
> General error: "class org.apache.ignite.IgniteCheckedException: Runtime
> failure on lookup row: IndexSearchRowImpl
> [rowHnd=org.apache.ignite.internal.processors.query.h2.index.QueryIndexRowHandler@16bc23dd]";
> SQL statement:
> SELECT
> __Z0.CAR_ID __C0_0
> FROM PUBLIC.CAR_CODE __Z0
> WHERE (__Z0.PROVIDER_ID = ?1) AND (__Z0.CAR_CODE = ?2)
> ORDER BY 1 LIMIT 1 [5-197]
> at
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQuery(IgniteH2Indexing.java:875)
> at
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQueryWithTimer(IgniteH2Indexing.java:962)
> at
> org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest0(GridMapQueryExecutor.java:454)
> at
> org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest(GridMapQueryExecutor.java:274)
> at
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.onMessage(IgniteH2Indexing.java:2187)
> at
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.lambda$start$22(IgniteH2Indexing.java:2132)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:3480)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)
> at
> org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)
> at
> org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.h2.jdbc.JdbcSQLException: General error: "class
> org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row:
> IndexSearchRowImpl
> [rowHnd=org.apache.ignite.internal.processors.query.h2.index.QueryIndexRowHandler@16bc23dd]";
> SQL statement:
> SELECT
> __Z0.CAR_ID __C0_0
> FROM PUBLIC.CAR_CODE __Z0
> WHERE (__Z0.PROVIDER_ID = ?1) AND (__Z0.CAR_CODE = ?2)
> ORDER BY 1 LIMIT 1 [5-197]
> at org.h2.message.DbException.getJdbcSQLException(DbException.java:357)
> at org.h2.message.DbException.get(DbException.java:168)
> at org.h2.message.DbException.convert(DbException.java:307)
> at
> org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.find(H2TreeIndex.java:214)
> at org.h2.index.BaseIndex.find(BaseIndex.java:130)
> at org.h2.index.IndexCursor.find(IndexCursor.java:176)
> at org.h2.table.TableFilter.next(TableFilter.java:471)
> at
> org.h2.command.dml.Select$LazyResultQueryFlat.fetchNextRow(Select.java:1452)
> at org.h2.result.LazyResult.hasNext(La

Re: What does javax.cache.CacheException: Failed to execute map query on remote node mean?

2022-08-02 Thread John Smith
ee.BPlusTree.compare(BPlusTree.java:5430)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findInsertionPoint(BPlusTree.java:5350)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$1100(BPlusTree.java:100)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run0(BPlusTree.java:307)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:5944)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run(BPlusTree.java:287)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:5930)
at
org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.readPage(PageHandler.java:174)
at
org.apache.ignite.internal.processors.cache.persistence.DataStructure.read(DataStructure.java:415)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.read(BPlusTree.java:6131)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findDown(BPlusTree.java:1449)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doFind(BPlusTree.java:1416)
at
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findOne(BPlusTree.java:1379)
... 31 more
Caused by: org.h2.message.DbException: Hexadecimal string with odd number
of characters: "5" [90003-197]
at org.h2.message.DbException.get(DbException.java:179)
at org.h2.message.DbException.get(DbException.java:155)
at org.h2.util.StringUtils.convertHexToBytes(StringUtils.java:913)
at org.h2.value.Value.convertTo(Value.java:1078)
at org.h2.value.Value.convertTo(Value.java:617)
at org.h2.value.Value.convertTo(Value.java:592)
at org.h2.table.Table.compareTypeSafe(Table.java:1187)
at
org.apache.ignite.internal.processors.query.h2.index.H2RowComparator.compareValues(H2RowComparator.java:149)
... 48 more
Caused by: org.h2.jdbc.JdbcSQLException: Hexadecimal string with odd number
of characters: "5" [90003-197]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:357)
... 56 more

On Tue, Aug 2, 2022 at 10:33 AM Николай Ижиков  wrote:

> Hello, John.
>
> The provided stack trace is not enough to answer your question.
> Can you please provide the log from the remote node?
>
> 2 авг. 2022 г., в 17:14, John Smith  написал(а):
>
> Anyone?
>
> On Fri, Jul 29, 2022 at 8:44 AM John Smith  wrote:
>
>> Any thoughts on this?
>>
>> On Mon., Jul. 25, 2022, 11:29 a.m. John Smith, 
>> wrote:
>>
>>> Hi I have the following code and I get the below exception. The cache
>>> runs on 3 remote nodes and it is accessed by thick client (client = true)
>>>
>>> String sql = "select car_id from car_code where provider_id = ? and
>>> car_code = ? order by car_id asc limit 1;"
>>> Integer providerId = 1;
>>> String cardCode = "HONDA";
>>>
>>> JssonArray array = query(sql, 3000, providerId, carCode);
>>>
>>> JsonArray query(final String sql, final long timeoutMs, final Object...
>>> args) {
>>> SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
>>> query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);
>>>
>>> try (QueryCursor> cursor = cache.query(query)) {
>>> List rows = new ArrayList<>();
>>> Iterator> iterator = cursor.iterator();
>>>
>>> while(iterator.hasNext()) {
>>> List currentRow = iterator.next();
>>> JsonArray row = new JsonArray();
>>>
>>> currentRow.forEach(o -> row.add(o));
>>>
>>> rows.add(row);
>>> }
>>>
>>> return rows;
>>> } catch(Exception ex) {
>>> ex.printStackTrace();
>>> }
>>> }
>>>
>>> Running this in Datagrip with JDBC client works fine;
>>>
>>> select
>>> car_id
>>> from car_code
>>> where provider_id = 5 and car_code = 'HONDA'
>>> order by car_id asc limit 1;
>>>
>>> Works
>>>
>>>
>>> javax.cache.CacheException: Failed to execute map query on remote node
>>> [nodeId=xx, errMsg=General error: \"class
>>> org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row:
>>> IndexSearchRowImpl
>>> [rowHnd=org.apache.ignite.internal.processors.query.h2.index.QueryIndexRowHandler@16bc23dd]\";
>>> SQL statement:
>>> \nSELECT
>>> \n__Z0.CAR_ID __C0_0
>>> \nFROM PUBL

Re: What does javax.cache.CacheException: Failed to execute map query on remote node mean?

2022-08-02 Thread John Smith
Anyone?

On Fri, Jul 29, 2022 at 8:44 AM John Smith  wrote:

> Any thoughts on this?
>
> On Mon., Jul. 25, 2022, 11:29 a.m. John Smith, 
> wrote:
>
>> Hi I have the following code and I get the below exception. The cache
>> runs on 3 remote nodes and it is accessed by thick client (client = true)
>>
>> String sql = "select car_id from car_code where provider_id = ? and
>> car_code = ? order by car_id asc limit 1;"
>> Integer providerId = 1;
>> String cardCode = "HONDA";
>>
>> JssonArray array = query(sql, 3000, providerId, carCode);
>>
>> JsonArray query(final String sql, final long timeoutMs, final Object...
>> args) {
>> SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
>> query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);
>>
>> try (QueryCursor> cursor = cache.query(query)) {
>> List rows = new ArrayList<>();
>> Iterator> iterator = cursor.iterator();
>>
>> while(iterator.hasNext()) {
>> List currentRow = iterator.next();
>> JsonArray row = new JsonArray();
>>
>> currentRow.forEach(o -> row.add(o));
>>
>> rows.add(row);
>> }
>>
>> return rows;
>> } catch(Exception ex) {
>> ex.printStackTrace();
>> }
>> }
>>
>> Running this in Datagrip with JDBC client works fine;
>>
>> select
>> car_id
>> from car_code
>> where provider_id = 5 and car_code = 'HONDA'
>> order by car_id asc limit 1;
>>
>> Works
>>
>>
>> javax.cache.CacheException: Failed to execute map query on remote node
>> [nodeId=xx, errMsg=General error: \"class
>> org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row:
>> IndexSearchRowImpl
>> [rowHnd=org.apache.ignite.internal.processors.query.h2.index.QueryIndexRowHandler@16bc23dd]\";
>> SQL statement:
>> \nSELECT
>> \n__Z0.CAR_ID __C0_0
>> \nFROM PUBLIC.CAR_CODE __Z0
>> \nWHERE (__Z0.PROVIDER_ID = ?1) AND (__Z0.CAR_CODE = ?2)
>> \nORDER BY 1 LIMIT 1 [5-197]]
>> \n\tat
>> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.fail(GridReduceQueryExecutor.java:235)
>> \n\tat
>> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.onFail(GridReduceQueryExecutor.java:214)
>> \n\tat
>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.onMessage(IgniteH2Indexing.java:2193)
>> \n\tat
>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.lambda$start$22(IgniteH2Indexing.java:2132)
>> \n\tat
>> org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:3480)
>> \n\tat
>> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)
>> \n\tat
>> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)
>> \n\tat
>> org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)
>> \n\tat
>> org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)
>> \n\tat
>> org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
>> \n\tat
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> \n\tat
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> \n\tat java.lang.Thread.run(Thread.java:748)
>> \n
>>
>


Re: What does javax.cache.CacheException: Failed to execute map query on remote node mean?

2022-07-29 Thread John Smith
Any thoughts on this?

On Mon., Jul. 25, 2022, 11:29 a.m. John Smith, 
wrote:

> Hi I have the following code and I get the below exception. The cache runs
> on 3 remote nodes and it is accessed by thick client (client = true)
>
> String sql = "select car_id from car_code where provider_id = ? and
> car_code = ? order by car_id asc limit 1;"
> Integer providerId = 1;
> String cardCode = "HONDA";
>
> JssonArray array = query(sql, 3000, providerId, carCode);
>
> JsonArray query(final String sql, final long timeoutMs, final Object...
> args) {
> SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
> query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);
>
> try (QueryCursor> cursor = cache.query(query)) {
> List rows = new ArrayList<>();
> Iterator> iterator = cursor.iterator();
>
> while(iterator.hasNext()) {
> List currentRow = iterator.next();
> JsonArray row = new JsonArray();
>
> currentRow.forEach(o -> row.add(o));
>
> rows.add(row);
> }
>
> return rows;
> } catch(Exception ex) {
> ex.printStackTrace();
> }
> }
>
> Running this in Datagrip with JDBC client works fine;
>
> select
> car_id
> from car_code
> where provider_id = 5 and car_code = 'HONDA'
> order by car_id asc limit 1;
>
> Works
>
>
> javax.cache.CacheException: Failed to execute map query on remote node
> [nodeId=xx, errMsg=General error: \"class
> org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row:
> IndexSearchRowImpl
> [rowHnd=org.apache.ignite.internal.processors.query.h2.index.QueryIndexRowHandler@16bc23dd]\";
> SQL statement:
> \nSELECT
> \n__Z0.CAR_ID __C0_0
> \nFROM PUBLIC.CAR_CODE __Z0
> \nWHERE (__Z0.PROVIDER_ID = ?1) AND (__Z0.CAR_CODE = ?2)
> \nORDER BY 1 LIMIT 1 [5-197]]
> \n\tat
> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.fail(GridReduceQueryExecutor.java:235)
> \n\tat
> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.onFail(GridReduceQueryExecutor.java:214)
> \n\tat
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.onMessage(IgniteH2Indexing.java:2193)
> \n\tat
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.lambda$start$22(IgniteH2Indexing.java:2132)
> \n\tat
> org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:3480)
> \n\tat
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)
> \n\tat
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)
> \n\tat
> org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)
> \n\tat
> org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)
> \n\tat
> org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
> \n\tat
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> \n\tat
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> \n\tat java.lang.Thread.run(Thread.java:748)
> \n
>


What does javax.cache.CacheException: Failed to execute map query on remote node mean?

2022-07-25 Thread John Smith
Hi, I have the following code and I get the exception below. The cache runs
on 3 remote nodes and is accessed by a thick client (client = true).

String sql = "select car_id from car_code where provider_id = ? and
car_code = ? order by car_id asc limit 1;"
Integer providerId = 1;
String cardCode = "HONDA";

JssonArray array = query(sql, 3000, providerId, carCode);

JsonArray query(final String sql, final long timeoutMs, final Object...
args) {
SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);

try (QueryCursor> cursor = cache.query(query)) {
List rows = new ArrayList<>();
Iterator> iterator = cursor.iterator();

while(iterator.hasNext()) {
List currentRow = iterator.next();
JsonArray row = new JsonArray();

currentRow.forEach(o -> row.add(o));

rows.add(row);
}

return rows;
} catch(Exception ex) {
ex.printStackTrace();
}
}

Running this in Datagrip with JDBC client works fine;

select
car_id
from car_code
where provider_id = 5 and car_code = 'HONDA'
order by car_id asc limit 1;

Works
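
(For comparison, Datagrip talks to the cluster through the JDBC thin driver;
a minimal standalone sketch of the same query, with the host a placeholder:)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CarCodeLookup {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1");
             PreparedStatement st = conn.prepareStatement(
                 "select car_id from car_code where provider_id = ? and car_code = ? " +
                 "order by car_id asc limit 1")) {
            st.setInt(1, 5);
            st.setString(2, "HONDA");

            try (ResultSet rs = st.executeQuery()) {
                while (rs.next())
                    System.out.println(rs.getInt(1)); // car_id
            }
        }
    }
}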


javax.cache.CacheException: Failed to execute map query on remote node
[nodeId=xx, errMsg=General error: \"class
org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row:
IndexSearchRowImpl
[rowHnd=org.apache.ignite.internal.processors.query.h2.index.QueryIndexRowHandler@16bc23dd]\";
SQL statement:
\nSELECT
\n__Z0.CAR_ID __C0_0
\nFROM PUBLIC.CAR_CODE __Z0
\nWHERE (__Z0.PROVIDER_ID = ?1) AND (__Z0.CAR_CODE = ?2)
\nORDER BY 1 LIMIT 1 [5-197]]
\n\tat
org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.fail(GridReduceQueryExecutor.java:235)
\n\tat
org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.onFail(GridReduceQueryExecutor.java:214)
\n\tat
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.onMessage(IgniteH2Indexing.java:2193)
\n\tat
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.lambda$start$22(IgniteH2Indexing.java:2132)
\n\tat
org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:3480)
\n\tat
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)
\n\tat
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)
\n\tat
org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)
\n\tat
org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)
\n\tat
org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
\n\tat
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
\n\tat
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
\n\tat java.lang.Thread.run(Thread.java:748)
\n


Re: What is org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row:

2022-02-28 Thread John Smith
Hi, it's a lookup table. There are roughly 600 entries; they were manually
cut and pasted into the SQL command-line tool as one big insert.

insert into car_code
(
provider_id
,car_id
,car_code
)
values
(5,1,''),
(5,10001,'HONDA'),
(5,10002,'TOYOTA'),
(5,10003,'LEXUS'),
(5,10004,'FORD')
;

On Fri, Feb 25, 2022 at 5:45 PM Maksim Timonin 
wrote:

> Hi!
>
> I wrote a test with your DDL and query, and it works for me. I need some
> more time to dig into it...
>
> Could you also, please, provide a way (with an example) how you insert
> data to the table?
>
> Thanks,
> Maksim
>
> On Fri, Feb 25, 2022 at 9:32 PM John Smith  wrote:
>
>> Hi Maksim did you look into this?
>>
>> On Tue., Feb. 22, 2022, 9:51 a.m. John Smith, 
>> wrote:
>>
>>> Hi. This is it.
>>>
>>> create table if not exists car_code (
>>> provider_id int,
>>> car_id int,
>>> car_code varchar(16),
>>> primary key (provider_id, car_id)
>>> ) with "template=replicatedTpl, key_type=CarCodeKey, value_type=CarCode";
>>>
>>> select
>>> car_id
>>> from car_code
>>> where
>>> provider_id = ? and
>>> car_code = ?
>>> order by car_id asc limit 1;
>>>
>>> On Mon, Feb 21, 2022 at 3:24 AM Maksim Timonin 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Yes, it looks strange. Could you please share your DDLs and queries
>>>> that led to the exception? We can add a compatibility test and it will help
>>>> us to investigate the issue.
>>>>
>>>> Maksim
>>>>
>>>> On Tue, Feb 15, 2022 at 3:28 PM John Smith 
>>>> wrote:
>>>>
>>>>> It's weird. I dropped the table and recreated it without restarting
>>>>> the client applications, and it started working.
>>>>>
>>>>> This happened after upgrading from 2.8.1 to 2.12.0.
>>>>>
>>>>> What's even funnier: I did the upgrade on my dev cluster first and let
>>>>> everything run for a couple of weeks just to be sure.
>>>>>
>>>>> On Tue., Feb. 15, 2022, 3:13 a.m. Maksim Timonin, <
>>>>> timoninma...@apache.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It looks like you have a column with non-String type, but try to
>>>>>> query it with a String argument.
>>>>>>
>>>>>> Could you please share your DDL for table and query parameters for
>>>>>> this query?
>>>>>>
>>>>>> Thanks,
>>>>>> Maksim
>>>>>>
>>>>>> On Tue, Feb 15, 2022 at 8:54 AM John Smith 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi, on the client side I'm getting the below Exception and on the
>>>>>>> server side it is pasted below.
>>>>>>>
>>>>>>>
>>>>>>> javax.cache.CacheException: Failed to execute map query on remote
>>>>>>> node [nodeId=6e350b53-7224-4b11-b81b-00f44c699b87, errMsg=General error:
>>>>>>> \"class org.apache.ignite.IgniteCheckedException: Runtime failure on 
>>>>>>> lookup
>>>>>>> row: IndexSearchRowImpl
>>>>>>> [rowHnd=org.apache.ignite.internal.processors.query.h2.index.QueryIndexRowHandler@d3e431c]\";
>>>>>>> SQL statement:\nSELECT\n__Z0.XX_ID __C0_0\nFROM PUBLIC.XX_CODE
>>>>>>> __Z0\nWHERE (__Z0.PROVIDER_ID = ?1) AND (__Z0.XX_CODE = ?2)\nORDER 
>>>>>>> BY 1
>>>>>>> LIMIT 1 [5-197]]\n\tat
>>>>>>> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.fail(GridReduceQueryExecutor.java:235)\n\tat
>>>>>>> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.onFail(GridReduceQueryExecutor.java:214)\n\tat
>>>>>>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.onMessage(IgniteH2Indexing.java:2193)\n\tat
>>>>>>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.lambda$start$22(IgniteH2Indexing.java:2132)\n\tat
>>>>>>> org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:3480)\n\tat
>>>>>>> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)\n\tat

Re: What is org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row:

2022-02-25 Thread John Smith
Hi Maksim did you look into this?

On Tue., Feb. 22, 2022, 9:51 a.m. John Smith, 
wrote:

> Hi. This is it.
>
> create table if not exists car_code (
> provider_id int,
> car_id int,
> car_code varchar(16),
> primary key (provider_id, car_id)
> ) with "template=replicatedTpl, key_type=CarCodeKey, value_type=CarCode";
>
> select
> car_id
> from car_code
> where
> provider_id = ? and
> car_code = ?
> order by car_id asc limit 1;
>
> On Mon, Feb 21, 2022 at 3:24 AM Maksim Timonin 
> wrote:
>
>> Hi,
>>
>> Yes, it looks strange. Could you please share your DDLs and queries that
>> led to the exception? We can add a compatibility test and it will help us
>> to investigate the issue.
>>
>> Maksim
>>
>> On Tue, Feb 15, 2022 at 3:28 PM John Smith 
>> wrote:
>>
>>> It's weird. I dropped the table and recreated it without restarting the
>>> client applications, and it started working.
>>>
>>> This happened after upgrading from 2.8.1 to 2.12.0.
>>>
>>> What's even funnier: I did the upgrade on my dev cluster first and let
>>> everything run for a couple of weeks just to be sure.
>>>
>>> On Tue., Feb. 15, 2022, 3:13 a.m. Maksim Timonin, <
>>> timoninma...@apache.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> It looks like you have a column with non-String type, but try to query
>>>> it with a String argument.
>>>>
>>>> Could you please share your DDL for table and query parameters for this
>>>> query?
>>>>
>>>> Thanks,
>>>> Maksim
>>>>
>>>> On Tue, Feb 15, 2022 at 8:54 AM John Smith 
>>>> wrote:
>>>>
>>>>> Hi, on the client side I'm getting the below Exception and on the
>>>>> server side it is pasted below.
>>>>>
>>>>>
>>>>> javax.cache.CacheException: Failed to execute map query on remote node
>>>>> [nodeId=6e350b53-7224-4b11-b81b-00f44c699b87, errMsg=General error: 
>>>>> \"class
>>>>> org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row:
>>>>> IndexSearchRowImpl
>>>>> [rowHnd=org.apache.ignite.internal.processors.query.h2.index.QueryIndexRowHandler@d3e431c]\";
>>>>> SQL statement:\nSELECT\n__Z0.XX_ID __C0_0\nFROM PUBLIC.XX_CODE
>>>>> __Z0\nWHERE (__Z0.PROVIDER_ID = ?1) AND (__Z0.XX_CODE = ?2)\nORDER BY 
>>>>> 1
>>>>> LIMIT 1 [5-197]]\n\tat
>>>>> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.fail(GridReduceQueryExecutor.java:235)\n\tat
>>>>> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.onFail(GridReduceQueryExecutor.java:214)\n\tat
>>>>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.onMessage(IgniteH2Indexing.java:2193)\n\tat
>>>>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.lambda$start$22(IgniteH2Indexing.java:2132)\n\tat
>>>>> org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:3480)\n\tat
>>>>> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)\n\tat
>>>>> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)\n\tat
>>>>> org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)\n\tat
>>>>> org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)\n\tat
>>>>> org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)\n\tat
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat
>>>>> java.lang.Thread.run(Thread.java:748)\n
>>>>>
>>>>> [05:42:24,158][SEVERE][query-#1422%xx%][GridMapQueryExecutor]
>>>>> Failed to execute local query.
>>>>> class org.apache.ignite.internal.processors.query.IgniteSQLException:
>>>>> General error: "class org.apache.ignite.IgniteCheckedException: Runtime
>>>>> failure on lookup row: IndexSearchRowImpl
>>>>> [rowHnd=org.apache.ignite.internal.processors.query.h2.index.QueryIndexRowHandler@75eb2111]"

Re: What is org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row:

2022-02-22 Thread John Smith
Hi. This is it.

create table if not exists car_code (
provider_id int,
car_id int,
car_code varchar(16),
primary key (provider_id, car_id)
) with "template=replicatedTpl, key_type=CarCodeKey, value_type=CarCode";

select
car_id
from car_code
where
provider_id = ? and
car_code = ?
order by car_id asc limit 1;
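
For reference, a minimal client-side sketch (argument values and the cache
handle are made up) that runs this query with arguments matching the declared
column types, an int for provider_id and a String for car_code, which is what
Maksim's suggestion below boils down to:

import java.util.List;

import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.FieldsQueryCursor;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class CarCodeLookup {
    static Integer findCarId(IgniteCache<?, ?> cache, int providerId, String carCode) {
        SqlFieldsQuery qry = new SqlFieldsQuery(
            "select car_id from car_code " +
            "where provider_id = ? and car_code = ? " +
            "order by car_id asc limit 1")
            .setArgs(providerId, carCode); // int and String, matching the DDL

        try (FieldsQueryCursor<List<?>> cur = cache.query(qry)) {
            for (List<?> row : cur)
                return (Integer) row.get(0); // car_id is declared as int
        }

        return null; // no match found
    }
}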

On Mon, Feb 21, 2022 at 3:24 AM Maksim Timonin 
wrote:

> Hi,
>
> Yes, it looks strange. Could you please share your DDLs and queries that
> led to the exception? We can add a compatibility test and it will help us
> to investigate the issue.
>
> Maksim
>
> On Tue, Feb 15, 2022 at 3:28 PM John Smith  wrote:
>
>> It's weird. I dropped the table and recreated it without restarting the
>> client applications and it started working.
>>
>> This happened after upgrading from 2.8.1 to 2.12.0
>>
>> What's even funnier. I did the upgrade on my dev cluster first and let
>> everything run for a couple weeks just to be sure.
>>
>> On Tue., Feb. 15, 2022, 3:13 a.m. Maksim Timonin, <
>> timoninma...@apache.org> wrote:
>>
>>> Hi,
>>>
>>> It looks like you have a column with a non-String type, but are trying to
>>> query it with a String argument.
>>>
>>> Could you please share your DDL for table and query parameters for this
>>> query?
>>>
>>> Thanks,
>>> Maksim
>>>
>>> On Tue, Feb 15, 2022 at 8:54 AM John Smith 
>>> wrote:
>>>
>>>> Hi, on the client side I'm getting the exception below, and the
>>>> server-side one is pasted below it.
>>>>
>>>>
>>>> javax.cache.CacheException: Failed to execute map query on remote node
>>>> [nodeId=6e350b53-7224-4b11-b81b-00f44c699b87, errMsg=General error: \"class
>>>> org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row:
>>>> IndexSearchRowImpl
>>>> [rowHnd=org.apache.ignite.internal.processors.query.h2.index.QueryIndexRowHandler@d3e431c]\";
>>>> SQL statement:\nSELECT\n__Z0.XX_ID __C0_0\nFROM PUBLIC.XX_CODE
>>>> __Z0\nWHERE (__Z0.PROVIDER_ID = ?1) AND (__Z0.XX_CODE = ?2)\nORDER BY 1
>>>> LIMIT 1 [5-197]]\n\tat
>>>> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.fail(GridReduceQueryExecutor.java:235)\n\tat
>>>> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.onFail(GridReduceQueryExecutor.java:214)\n\tat
>>>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.onMessage(IgniteH2Indexing.java:2193)\n\tat
>>>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.lambda$start$22(IgniteH2Indexing.java:2132)\n\tat
>>>> org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:3480)\n\tat
>>>> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)\n\tat
>>>> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)\n\tat
>>>> org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)\n\tat
>>>> org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)\n\tat
>>>> org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)\n\tat
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat
>>>> java.lang.Thread.run(Thread.java:748)\n
>>>>
>>>> [05:42:24,158][SEVERE][query-#1422%xx%][GridMapQueryExecutor]
>>>> Failed to execute local query.
>>>> class org.apache.ignite.internal.processors.query.IgniteSQLException:
>>>> General error: "class org.apache.ignite.IgniteCheckedException: Runtime
>>>> failure on lookup row: IndexSearchRowImpl
>>>> [rowHnd=org.apache.ignite.internal.processors.query.h2.index.QueryIndexRowHandler@75eb2111]";
>>>> SQL statement:
>>>> SELECT
>>>> __Z0.XX_ID __C0_0
>>>> FROM PUBLIC.XX_CODE __Z0
>>>> WHERE (__Z0.PROVIDER_ID = ?1) AND (__Z0.XX_CODE = ?2)
>>>> ORDER BY 1 LIMIT 1 [5-197]
>>>> at
>>>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQuery(IgniteH2Indexing.java:875)
>>>> at

Re: What is org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row:

2022-02-15 Thread John Smith
It's weird. I dropped the table and recreated it without restarting the
client applications and it started working.

This happened after upgrading from 2.8.1 to 2.12.0

What's even funnier. I did the upgrade on my dev cluster first and let
everything run for a couple weeks just to be sure.

On Tue., Feb. 15, 2022, 3:13 a.m. Maksim Timonin, 
wrote:

> Hi,
>
> It looks like you have a column with a non-String type, but are trying to
> query it with a String argument.
>
> Could you please share your DDL for table and query parameters for this
> query?
>
> Thanks,
> Maksim
>
> On Tue, Feb 15, 2022 at 8:54 AM John Smith  wrote:
>
>> Hi, on the client side I'm getting the exception below, and the server-side
>> one is pasted below it.
>>
>>
>> javax.cache.CacheException: Failed to execute map query on remote node
>> [nodeId=6e350b53-7224-4b11-b81b-00f44c699b87, errMsg=General error: \"class
>> org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row:
>> IndexSearchRowImpl
>> [rowHnd=org.apache.ignite.internal.processors.query.h2.index.QueryIndexRowHandler@d3e431c]\";
>> SQL statement:\nSELECT\n__Z0.XX_ID __C0_0\nFROM PUBLIC.XX_CODE
>> __Z0\nWHERE (__Z0.PROVIDER_ID = ?1) AND (__Z0.XX_CODE = ?2)\nORDER BY 1
>> LIMIT 1 [5-197]]\n\tat
>> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.fail(GridReduceQueryExecutor.java:235)\n\tat
>> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.onFail(GridReduceQueryExecutor.java:214)\n\tat
>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.onMessage(IgniteH2Indexing.java:2193)\n\tat
>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.lambda$start$22(IgniteH2Indexing.java:2132)\n\tat
>> org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:3480)\n\tat
>> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)\n\tat
>> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)\n\tat
>> org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)\n\tat
>> org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)\n\tat
>> org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)\n\tat
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat
>> java.lang.Thread.run(Thread.java:748)\n
>>
>> [05:42:24,158][SEVERE][query-#1422%xx%][GridMapQueryExecutor] Failed
>> to execute local query.
>> class org.apache.ignite.internal.processors.query.IgniteSQLException:
>> General error: "class org.apache.ignite.IgniteCheckedException: Runtime
>> failure on lookup row: IndexSearchRowImpl
>> [rowHnd=org.apache.ignite.internal.processors.query.h2.index.QueryIndexRowHandler@75eb2111]";
>> SQL statement:
>> SELECT
>> __Z0.XX_ID __C0_0
>> FROM PUBLIC.XX_CODE __Z0
>> WHERE (__Z0.PROVIDER_ID = ?1) AND (__Z0.XX_CODE = ?2)
>> ORDER BY 1 LIMIT 1 [5-197]
>> at
>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQuery(IgniteH2Indexing.java:875)
>> at
>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQueryWithTimer(IgniteH2Indexing.java:962)
>> at
>> org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest0(GridMapQueryExecutor.java:454)
>> at
>> org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest(GridMapQueryExecutor.java:274)
>> at
>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.onMessage(IgniteH2Indexing.java:2187)
>> at
>> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.lambda$start$22(IgniteH2Indexing.java:2132)
>> at
>> org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:3480)
>> at
>> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)
>> at
>> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)
>> at
>> org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)
>> at
>> org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)
>> at
>> org.apache.ignite.inte

What is org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row:

2022-02-14 Thread John Smith
Hi, on the client side I'm getting the exception below, and the server-side
one is pasted below it.


javax.cache.CacheException: Failed to execute map query on remote node
[nodeId=6e350b53-7224-4b11-b81b-00f44c699b87, errMsg=General error: \"class
org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row:
IndexSearchRowImpl
[rowHnd=org.apache.ignite.internal.processors.query.h2.index.QueryIndexRowHandler@d3e431c]\";
SQL statement:\nSELECT\n__Z0.XX_ID __C0_0\nFROM PUBLIC.XX_CODE
__Z0\nWHERE (__Z0.PROVIDER_ID = ?1) AND (__Z0.XX_CODE = ?2)\nORDER BY 1
LIMIT 1 [5-197]]\n\tat
org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.fail(GridReduceQueryExecutor.java:235)\n\tat
org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.onFail(GridReduceQueryExecutor.java:214)\n\tat
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.onMessage(IgniteH2Indexing.java:2193)\n\tat
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.lambda$start$22(IgniteH2Indexing.java:2132)\n\tat
org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:3480)\n\tat
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)\n\tat
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)\n\tat
org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)\n\tat
org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)\n\tat
org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)\n\tat
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat
java.lang.Thread.run(Thread.java:748)\n

[05:42:24,158][SEVERE][query-#1422%xx%][GridMapQueryExecutor] Failed to
execute local query.
class org.apache.ignite.internal.processors.query.IgniteSQLException:
General error: "class org.apache.ignite.IgniteCheckedException: Runtime
failure on lookup row: IndexSearchRowImpl
[rowHnd=org.apache.ignite.internal.processors.query.h2.index.QueryIndexRowHandler@75eb2111]";
SQL statement:
SELECT
__Z0.XX_ID __C0_0
FROM PUBLIC.XX_CODE __Z0
WHERE (__Z0.PROVIDER_ID = ?1) AND (__Z0.XX_CODE = ?2)
ORDER BY 1 LIMIT 1 [5-197]
at
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQuery(IgniteH2Indexing.java:875)
at
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.executeSqlQueryWithTimer(IgniteH2Indexing.java:962)
at
org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest0(GridMapQueryExecutor.java:454)
at
org.apache.ignite.internal.processors.query.h2.twostep.GridMapQueryExecutor.onQueryRequest(GridMapQueryExecutor.java:274)
at
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.onMessage(IgniteH2Indexing.java:2187)
at
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.lambda$start$22(IgniteH2Indexing.java:2132)
at
org.apache.ignite.internal.managers.communication.GridIoManager$ArrayListener.onMessage(GridIoManager.java:3480)
at
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907)
at
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528)
at
org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:242)
at
org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421)
at
org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.h2.jdbc.JdbcSQLException: General error: "class
org.apache.ignite.IgniteCheckedException: Runtime failure on lookup row:
IndexSearchRowImpl
[rowHnd=org.apache.ignite.internal.processors.query.h2.index.QueryIndexRowHandler@75eb2111]";
SQL statement:
SELECT
__Z0.XX_ID __C0_0
FROM PUBLIC.XX_CODE __Z0
WHERE (__Z0.PROVIDER_ID = ?1) AND (__Z0.XX_CODE = ?2)
ORDER BY 1 LIMIT 1 [5-197]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:357)
at org.h2.message.DbException.get(DbException.java:168)
at org.h2.message.DbException.convert(DbException.java:307)
at
org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.find(H2TreeIndex.java:214)
at org.h2.index.BaseIndex.find(BaseIndex.java:130)
at org.h2.index.IndexCursor.find(IndexCursor.java:176)
at org.h2.table.TableFilter.next(TableFilter.java:471)
at
org.h2.command.dml.Select$LazyResultQueryFlat.fetchNextRow(Select.java:1452)
at 

Re: Update compatibility guide.

2022-01-23 Thread John Smith
Apart from the fact that we can't do rolling upgrades, the update worked.

On Fri., Jan. 21, 2022, 10:51 p.m. John Smith, 
wrote:

> Is there an update compatibility guide somewhere? Running 2.8 and would
> like to update to 2.12?
>


Update compatibility guide.

2022-01-21 Thread John Smith
Is there an update compatibility guide somewhere? Running 2.8 and would
like to update to 2.12?


Apache Ignite and the log4j Vulnerability.

2021-12-15 Thread John Smith
So far I haven't seen anyone ask about the issue here in the lists. So I'll
give it a go.

I'm personally using 2.8.1

1- If we are running as a service using .DEB or .RPM or other Linux
packages: The default logging is JUL, so nothing to worry about.
2- If we aren't specifically enabling the ignite-log4j2 module by copying
it to the libs folder: Also nothing to worry about.
3- If we are not specifically enabling log4j2 in XML config or through Java
code: Also nothing to worry about.
4- If we are not pulling the ignite-log4j2 dependency with maven/gradle:
Also nothing to worry about.
5- On the client side (client = true). We pull ignite-slf4j + use
logback-classic + logback-core: Also nothing to worry about.

Strictly speaking, from Ignite's side, even if external dependencies pull in
the log4j2 dependency, as long as we don't explicitly enable any Ignite log4j2
config we are OK as well.
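
A quick way to verify points 2 and 3 on a running setup (a minimal sketch;
the config path is an assumption) is to print which logger implementation
the node actually picked up:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class LoggerCheck {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start("config/ignite.xml")) {
            // Prints e.g. org.apache.ignite.logger.java.JavaLogger (JUL) if
            // log4j2 was never enabled, or ...log4j2.Log4J2Logger if it was.
            System.out.println(ignite.log().getClass().getName());
        }
    }
}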


Re: Re[2]: What does "First 10 long running cache futures" ?

2021-11-02 Thread John Smith
That feels like what happened. But I just rebooted the servers. But I also
force all my clients = true.

On Mon, 1 Nov 2021 at 16:40, Mike Wiesenberg 
wrote:

> In my experience, seeing the ' First 10 long running cache futures'
> warning (which should probably be logged as Fatal) means the ignite node
> your client is connected to is busted and needs to be rebooted. In
> addition, since ignite clients aren't smart enough to disconnect from a
> node in a busted state and try another one, you need to reboot all clients
> which may be connected to those nodes. All in all, a very problematic
> situation, and the lack of automatic failover in particular has made me
> seriously question if Ignite is production-ready software.
>
> On Wed, Oct 6, 2021 at 8:52 AM John Smith  wrote:
>
>> Ok. For now I rebooted all nodes... But it's fairly easy to reproduce.
>>
>> On Wed., Oct. 6, 2021, 2:17 a.m. Zhenya Stanilovsky, 
>> wrote:
>>
>>>
>>> Ok, it seems something went wrong on the node with
>>> id=36edbfd5-4feb-417e-b965-bdc34a0a6f4f. If you still have a problem, can
>>> you send these logs here or directly to me?
>>>
>>>
>>>
>>>
>>> And finally this on the coordinator node
>>>
>>> [14:07:41,282][WARNING][exchange-worker-#42%xx%][GridDhtPartitionsExchangeFuture]
>>> Unable to await partitions release latch within timeout. Some nodes have
>>> not sent acknowledgement for latch completion. It's possible due to
>>> unfinishined atomic updates, transactions or not released explicit locks on
>>> that nodes. Please check logs for errors on nodes with ids reported in
>>> latch `pendingAcks` collection [latch=ServerLatch [permits=1,
>>> pendingAcks=HashSet [36edbfd5-4feb-417e-b965-bdc34a0a6f4f],
>>> super=CompletableLatch [id=CompletableLatchUid [id=exchange,
>>> topVer=AffinityTopologyVersion [topVer=103, minorTopVer=0]
>>>
>>> On Tue, 5 Oct 2021 at 10:07, John Smith >> > wrote:
>>>
>>> And I see this...
>>>
>>> [14:04:15,150][WARNING][exchange-worker-#43%raange%][GridDhtPartitionsExchangeFuture]
>>> Unable to await partitions release latch within timeout. For more details
>>> please check coordinator node logs [crdNode=TcpDiscoveryNode
>>> [id=36ad785d-e344-43bb-b685-e79557572b54,
>>> consistentId=8172e45d-3ff8-4fe4-aeda-e7d30c1e11e2, addrs=ArrayList
>>> [127.0.0.1, xx.65], sockAddrs=HashSet [xx-0002/xx.65:47500, /
>>> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
>>> lastExchangeTime=1633370987399, loc=false,
>>> ver=2.8.1#20200521-sha1:86422096, isClient=false]] [latch=ClientLatch
>>> [coordinator=TcpDiscoveryNode [id=36ad785d-e344-43bb-b685-e79557572b54,
>>> consistentId=8172e45d-3ff8-4fe4-aeda-e7d30c1e11e2, addrs=ArrayList
>>> [127.0.0.1, xx.65], sockAddrs=HashSet [xx-0002/xx.65:47500, /
>>> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
>>> lastExchangeTime=1633370987399, loc=false,
>>> ver=2.8.1#20200521-sha1:86422096, isClient=false], ackSent=true,
>>> super=CompletableLatch [id=CompletableLatchUid [id=exchange,
>>> topVer=AffinityTopologyVersion [topVer=103, minorTopVer=0]
>>>
>>> On Tue, 5 Oct 2021 at 10:02, John Smith >> > wrote:
>>>
>>> Actually to be more clear...
>>>
>>> http://xx-0001:8080/ignite?cmd=version responds immediately.
>>>
>>> http://xx-0001:8080/ignite?cmd=size&cacheName=my-cache doesn't
>>> respond at all.
>>>
>>> On Tue, 5 Oct 2021 at 09:59, John Smith >> > wrote:
>>>
>>> Yeah, ever since I got this error, for example, the REST API won't return
>>> and the requests are slower. But when I connect with visor I can get stats,
>>> I can scan the cache, etc...
>>>
>>> Is it possible that these async futures/threads are not released?
>>>
>>> On Tue, 5 Oct 2021 at 04:11, Zhenya Stanilovsky >> > wrote:
>>>
>>> Hi, this is just a warning showing that something suspicious was observed.
>>> There is no simple answer to your question; in the common case all these
>>> messages are due to cluster (resource or settings) limitations.
>>> Check the documentation for tuning performance [1]
>>>
>>> [1]
>>> https://ignite.apache.org/docs/latest/perf-and-troubleshooting/general-perf-tips
>>>
>>>
>>>
>>> Hi, using 2.8.1. I understand the message as meaning my async TRX is
>>> taking longer, but is there a way to prevent it?
>>>
>>> When this happened I was pushing about 50,000 gets/puts per second from
>>> my API.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>


Re: Re[2]: What does "First 10 long running cache futures" ?

2021-10-06 Thread John Smith
Ok. For now I rebooted all nodes... But it's fairly easy to reproduce.

On Wed., Oct. 6, 2021, 2:17 a.m. Zhenya Stanilovsky, 
wrote:

>
> Ok, it seems something went wrong on the node with
> id=36edbfd5-4feb-417e-b965-bdc34a0a6f4f. If you still have a problem, can
> you send these logs here or directly to me?
>
>
>
>
> And finally this on the coordinator node
>
> [14:07:41,282][WARNING][exchange-worker-#42%xx%][GridDhtPartitionsExchangeFuture]
> Unable to await partitions release latch within timeout. Some nodes have
> not sent acknowledgement for latch completion. It's possible due to
> unfinishined atomic updates, transactions or not released explicit locks on
> that nodes. Please check logs for errors on nodes with ids reported in
> latch `pendingAcks` collection [latch=ServerLatch [permits=1,
> pendingAcks=HashSet [36edbfd5-4feb-417e-b965-bdc34a0a6f4f],
> super=CompletableLatch [id=CompletableLatchUid [id=exchange,
> topVer=AffinityTopologyVersion [topVer=103, minorTopVer=0]
>
> On Tue, 5 Oct 2021 at 10:07, John Smith  > wrote:
>
> And I see this...
>
> [14:04:15,150][WARNING][exchange-worker-#43%raange%][GridDhtPartitionsExchangeFuture]
> Unable to await partitions release latch within timeout. For more details
> please check coordinator node logs [crdNode=TcpDiscoveryNode
> [id=36ad785d-e344-43bb-b685-e79557572b54,
> consistentId=8172e45d-3ff8-4fe4-aeda-e7d30c1e11e2, addrs=ArrayList
> [127.0.0.1, xx.65], sockAddrs=HashSet [xx-0002/xx.65:47500, /
> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1633370987399, loc=false,
> ver=2.8.1#20200521-sha1:86422096, isClient=false]] [latch=ClientLatch
> [coordinator=TcpDiscoveryNode [id=36ad785d-e344-43bb-b685-e79557572b54,
> consistentId=8172e45d-3ff8-4fe4-aeda-e7d30c1e11e2, addrs=ArrayList
> [127.0.0.1, xx.65], sockAddrs=HashSet [xx-0002/xx.65:47500, /
> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1633370987399, loc=false,
> ver=2.8.1#20200521-sha1:86422096, isClient=false], ackSent=true,
> super=CompletableLatch [id=CompletableLatchUid [id=exchange,
> topVer=AffinityTopologyVersion [topVer=103, minorTopVer=0]
>
> On Tue, 5 Oct 2021 at 10:02, John Smith  > wrote:
>
> Actually to be more clear...
>
> http://xx-0001:8080/ignite?cmd=version responds immediately.
>
> http://xx-0001:8080/ignite?cmd=size&cacheName=my-cache doesn't
> respond at all.
>
> On Tue, 5 Oct 2021 at 09:59, John Smith  > wrote:
>
> Yeah, ever since I got this error, for example, the REST API won't return
> and the requests are slower. But when I connect with visor I can get stats,
> I can scan the cache, etc...
>
> Is it possible that these async futures/threads are not released?
>
> On Tue, 5 Oct 2021 at 04:11, Zhenya Stanilovsky  > wrote:
>
> Hi, this is just a warning showing that something suspicious was observed.
> There is no simple answer to your question; in the common case all these
> messages are due to cluster (resource or settings) limitations.
> Check the documentation for tuning performance [1]
>
> [1]
> https://ignite.apache.org/docs/latest/perf-and-troubleshooting/general-perf-tips
>
>
>
> Hi, using 2.8.1. I understand the message as meaning my async TRX is taking
> longer, but is there a way to prevent it?
>
> When this happened I was pushing about 50,000 gets/puts per second from my
> API.
>
>
>
>
>
>
>
>
>
>
>


Re: What does "First 10 long running cache futures" ?

2021-10-05 Thread John Smith
And finally this on the coordinator node

[14:07:41,282][WARNING][exchange-worker-#42%xx%][GridDhtPartitionsExchangeFuture]
Unable to await partitions release latch within timeout. Some nodes have
not sent acknowledgement for latch completion. It's possible due to
unfinishined atomic updates, transactions or not released explicit locks on
that nodes. Please check logs for errors on nodes with ids reported in
latch `pendingAcks` collection [latch=ServerLatch [permits=1,
pendingAcks=HashSet [36edbfd5-4feb-417e-b965-bdc34a0a6f4f],
super=CompletableLatch [id=CompletableLatchUid [id=exchange,
topVer=AffinityTopologyVersion [topVer=103, minorTopVer=0]

On Tue, 5 Oct 2021 at 10:07, John Smith  wrote:

> And I see this...
>
> [14:04:15,150][WARNING][exchange-worker-#43%raange%][GridDhtPartitionsExchangeFuture]
> Unable to await partitions release latch within timeout. For more details
> please check coordinator node logs [crdNode=TcpDiscoveryNode
> [id=36ad785d-e344-43bb-b685-e79557572b54,
> consistentId=8172e45d-3ff8-4fe4-aeda-e7d30c1e11e2, addrs=ArrayList
> [127.0.0.1, xx.65], sockAddrs=HashSet [xx-0002/xx.65:47500, /
> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1633370987399, loc=false,
> ver=2.8.1#20200521-sha1:86422096, isClient=false]] [latch=ClientLatch
> [coordinator=TcpDiscoveryNode [id=36ad785d-e344-43bb-b685-e79557572b54,
> consistentId=8172e45d-3ff8-4fe4-aeda-e7d30c1e11e2, addrs=ArrayList
> [127.0.0.1, xx.65], sockAddrs=HashSet [xx-0002/xx.65:47500, /
> 127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
> lastExchangeTime=1633370987399, loc=false,
> ver=2.8.1#20200521-sha1:86422096, isClient=false], ackSent=true,
> super=CompletableLatch [id=CompletableLatchUid [id=exchange,
> topVer=AffinityTopologyVersion [topVer=103, minorTopVer=0]
>
> On Tue, 5 Oct 2021 at 10:02, John Smith  wrote:
>
>> Actually to be more clear...
>>
>> http://xx-0001:8080/ignite?cmd=version responds immediately.
>>
>> http://xx-0001:8080/ignite?cmd=size&cacheName=my-cache doesn't
>> respond at all.
>>
>> On Tue, 5 Oct 2021 at 09:59, John Smith  wrote:
>>
>>> Yeah, ever since I got this error, for example, the REST API won't return
>>> and the requests are slower. But when I connect with visor I can get stats,
>>> I can scan the cache, etc...
>>>
>>> Is it possible that these async futures/threads are not released?
>>>
>>> On Tue, 5 Oct 2021 at 04:11, Zhenya Stanilovsky 
>>> wrote:
>>>
>>>> Hi, this is just a warning showing that something suspicious was observed.
>>>> There is no simple answer to your question; in the common case all these
>>>> messages are due to cluster (resource or settings) limitations.
>>>> Check the documentation for tuning performance [1]
>>>>
>>>> [1]
>>>> https://ignite.apache.org/docs/latest/perf-and-troubleshooting/general-perf-tips
>>>>
>>>>
>>>> Hi, using 2.8.1. I understand the message as meaning my async TRX is
>>>> taking longer, but is there a way to prevent it?
>>>>
>>>> When this happened I was pushing about 50,000 gets/puts per second from
>>>> my API.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>


Re: What does "First 10 long running cache futures" ?

2021-10-05 Thread John Smith
And I see this...

[14:04:15,150][WARNING][exchange-worker-#43%raange%][GridDhtPartitionsExchangeFuture]
Unable to await partitions release latch within timeout. For more details
please check coordinator node logs [crdNode=TcpDiscoveryNode
[id=36ad785d-e344-43bb-b685-e79557572b54,
consistentId=8172e45d-3ff8-4fe4-aeda-e7d30c1e11e2, addrs=ArrayList
[127.0.0.1, xx.65], sockAddrs=HashSet [xx-0002/xx.65:47500, /
127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
lastExchangeTime=1633370987399, loc=false,
ver=2.8.1#20200521-sha1:86422096, isClient=false]] [latch=ClientLatch
[coordinator=TcpDiscoveryNode [id=36ad785d-e344-43bb-b685-e79557572b54,
consistentId=8172e45d-3ff8-4fe4-aeda-e7d30c1e11e2, addrs=ArrayList
[127.0.0.1, xx.65], sockAddrs=HashSet [xx-0002/xx.65:47500, /
127.0.0.1:47500], discPort=47500, order=1, intOrder=1,
lastExchangeTime=1633370987399, loc=false,
ver=2.8.1#20200521-sha1:86422096, isClient=false], ackSent=true,
super=CompletableLatch [id=CompletableLatchUid [id=exchange,
topVer=AffinityTopologyVersion [topVer=103, minorTopVer=0]

On Tue, 5 Oct 2021 at 10:02, John Smith  wrote:

> Actually to be more clear...
>
> http://xx-0001:8080/ignite?cmd=version responds immediately.
>
> http://xx-0001:8080/ignite?cmd=size&cacheName=my-cache doesn't
> respond at all.
>
> On Tue, 5 Oct 2021 at 09:59, John Smith  wrote:
>
>> Yeah, ever since I got this error, for example, the REST API won't return
>> and the requests are slower. But when I connect with visor I can get stats,
>> I can scan the cache, etc...
>>
>> Is it possible that these async futures/threads are not released?
>>
>> On Tue, 5 Oct 2021 at 04:11, Zhenya Stanilovsky 
>> wrote:
>>
>>> Hi, this is just a warning showing that something suspicious was observed.
>>> There is no simple answer to your question; in the common case all these
>>> messages are due to cluster (resource or settings) limitations.
>>> Check the documentation for tuning performance [1]
>>>
>>> [1]
>>> https://ignite.apache.org/docs/latest/perf-and-troubleshooting/general-perf-tips
>>>
>>>
>>> Hi, using 2.8.1. I understand the message as meaning my async TRX is
>>> taking longer, but is there a way to prevent it?
>>>
>>> When this happened I was pushing about 50,000 gets/puts per second from
>>> my API.
>>>
>>>
>>>
>>>
>>>
>>>
>>


Re: What does "First 10 long running cache futures" ?

2021-10-05 Thread John Smith
Actually to be more clear...

http://xx-0001:8080/ignite?cmd=version responds immediately.

http://xx-0001:8080/ignite?cmd=size&cacheName=my-cache doesn't respond
at all.
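
For anyone reproducing this, a small probe (a sketch assuming JDK 11+; the
host and cache name are taken from the URLs above) that puts an explicit
timeout on the call, so a hung size command fails fast instead of blocking
forever:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class CacheSizeProbe {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // Same REST command as above, but bounded: a hung node produces an
        // HttpTimeoutException after 5 seconds instead of hanging the caller.
        HttpRequest req = HttpRequest.newBuilder(
                URI.create("http://xx-0001:8080/ignite?cmd=size&cacheName=my-cache"))
            .timeout(Duration.ofSeconds(5))
            .build();

        HttpResponse<String> resp = http.send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.body());
    }
}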

On Tue, 5 Oct 2021 at 09:59, John Smith  wrote:

> Yeah, ever since I got this error, for example, the REST API won't return
> and the requests are slower. But when I connect with visor I can get stats,
> I can scan the cache, etc...
>
> Is it possible that these async futures/threads are not released?
>
> On Tue, 5 Oct 2021 at 04:11, Zhenya Stanilovsky 
> wrote:
>
>> Hi, this is just a warning showing that something suspicious was observed.
>> There is no simple answer to your question; in the common case all these
>> messages are due to cluster (resource or settings) limitations.
>> Check the documentation for tuning performance [1]
>>
>> [1]
>> https://ignite.apache.org/docs/latest/perf-and-troubleshooting/general-perf-tips
>>
>>
>> Hi, using 2.8.1. I understand the message as meaning my async TRX is
>> taking longer, but is there a way to prevent it?
>>
>> When this happened I was pushing about 50,000 gets/puts per second from
>> my API.
>>
>>
>>
>>
>>
>>
>


Re: What does "First 10 long running cache futures" ?

2021-10-05 Thread John Smith
Yeah, ever since I got this error, for example, the REST API won't return
and the requests are slower. But when I connect with visor I can get stats,
I can scan the cache, etc...

Is it possible that these async futures/threads are not released?

On Tue, 5 Oct 2021 at 04:11, Zhenya Stanilovsky  wrote:

> Hi, this is just a warning showing that something suspicious was observed.
> There is no simple answer to your question; in the common case all these
> messages are due to cluster (resource or settings) limitations.
> Check the documentation for tuning performance [1]
>
> [1]
> https://ignite.apache.org/docs/latest/perf-and-troubleshooting/general-perf-tips
>
>
> Hi, using 2.8.1. I understand the message as meaning my async TRX is taking
> longer, but is there a way to prevent it?
>
> When this happened I was pushing about 50,000 gets/puts per second from my
> API.
>
>
>
>
>
>


What does "First 10 long running cache futures" ?

2021-10-04 Thread John Smith
Hi, using 2.8.1. I understand the message as meaning my async TRX is taking
longer, but is there a way to prevent it?

When this happened I was pushing about 50,000 gets/puts per second from my
API.
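
One way to keep such operations from piling up as long-running futures (a
rough sketch using the async cache API; the key/value types and the 2-second
bound are illustrative) is to put an explicit bound on each future:

import java.util.concurrent.TimeUnit;

import org.apache.ignite.IgniteCache;
import org.apache.ignite.lang.IgniteFuture;

public class BoundedPut {
    static void putBounded(IgniteCache<String, Integer> cache, String key, int val) {
        IgniteFuture<Void> fut = cache.putAsync(key, val);

        // Fails fast with IgniteFutureTimeoutException instead of lingering
        // until it shows up in the "long running cache futures" warning.
        fut.get(2, TimeUnit.SECONDS);
    }
}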


What is considered high IO wait and partition exchange failure?

2020-11-17 Thread John Smith
So if I understand correctly the logs below...

The node that shut off was timing out trying to get partition exchange from
the indicated nodes and it shut itself off, correct? Does this mean this
node was also the master?

1- The time indicated in the log, is that UTC?
2- I'm trying to see if it was high IO, but the node that stopped had no
high IO.
3- The other nodes had an average peak of about 0.3%-0.5%, and that was
around 16:30 EST, so not sure if those times match with the log.
4- On super high load days we get up to 7%-9% IO wait, but we never lost a
node on those occasions.
5- The nodes are persistent nodes, so I'm guessing writes cause high IO.
Can we reduce the commit time? But even then I'm not sure it's related, as
even on higher load we don't lose the node.
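
On point 5, one way to check write IO directly (a sketch; it assumes metrics
are enabled on the data storage configuration) is to read the checkpoint
metrics from the node:

import org.apache.ignite.DataStorageMetrics;
import org.apache.ignite.Ignite;

public class CheckpointStats {
    static void print(Ignite ignite) {
        DataStorageMetrics m = ignite.dataStorageMetrics();

        // How long the last checkpoint took and how many pages it wrote;
        // a long duration here points at write IO pressure.
        System.out.println("lastCheckpointDuration=" + m.getLastCheckpointDuration() + " ms");
        System.out.println("lastCheckpointPages=" + m.getLastCheckpointTotalPagesNumber());
    }
}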

[04:36:07,771][INFO][node-stopper][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=132, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
[id=e5645874-77e9-4455-8db5-07fa63984276, addrs=[127.0.0.1, xxx.xxx.xxx.5],
sockAddrs=[/xxx.xxx.xxx.5:0, /127.0.0.1:0], discPort=0, order=108,
intOrder=60, lastExchangeTime=1600798045116, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=true], done=true], topVer=null,
durationFromInit=1605328567765]
[04:36:07,771][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Finish
exchange future [startVer=AffinityTopologyVersion [topVer=133,
minorTopVer=0], resVer=null, err=class
org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is
stopping: xx]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=133, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
[id=13f473cb-5441-460b-9b9f-a23c1bcf3b0b, addrs=[127.0.0.1, xxx.xxx.xxx.3],
sockAddrs=[/127.0.0.1:0, /xxx.xxx.xxx.3:0], discPort=0, order=110,
intOrder=61, lastExchangeTime=1600798045127, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=true], done=true], topVer=null,
durationFromInit=1605328567765]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Finish
exchange future [startVer=AffinityTopologyVersion [topVer=134,
minorTopVer=0], resVer=null, err=class
org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is
stopping: xx]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=134, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
[id=ef9f3750-edfa-474e-8e68-6f0abee5095a, addrs=[127.0.0.1,
xxx.xxx.xxx.10], sockAddrs=[/127.0.0.1:0, /xxx.xxx.xxx.10:0], discPort=0,
order=112, intOrder=62, lastExchangeTime=1600798045137, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=true], done=true], topVer=null,
durationFromInit=1605328567765]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Finish
exchange future [startVer=AffinityTopologyVersion [topVer=135,
minorTopVer=0], resVer=null, err=class
org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is
stopping: xx]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=135, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
[id=e14e5bea-2784-4a6f-af2f-e98a4bf0ab52, addrs=[127.0.0.1,
xxx.xxx.xxx.63], sockAddrs=[xx-0001/xxx.xxx.xxx.63:47500, /
127.0.0.1:47500], discPort=47500, order=118, intOrder=65,
lastExchangeTime=1600798763474, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=true], topVer=null,
durationFromInit=1605328567765]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Finish
exchange future [startVer=AffinityTopologyVersion [topVer=136,
minorTopVer=0], resVer=null, err=class
org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is
stopping: xx]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=136, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
[id=8facbe9c-6d9f-4c2d-8d9c-5c7f38dbb7da, addrs=[127.0.0.1,
xxx.xxx.xxx.69], sockAddrs=[/127.0.0.1:47500,
xx-0003/xxx.xxx.xxx.69:47500], discPort=47500, order=120, intOrder=66,
lastExchangeTime=1600799256509, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=true], topVer=null,
durationFromInit=1605328567765]
[04:36:07,796][INFO][db-checkpoint-thread-#56%xx%][GridCacheDatabaseSharedManager]
Skipping checkpoint (no pages were 

What is considered high IO wait and partition exchange failure?

2020-11-17 Thread John Smith
So if I understand correctly the logs below...

The node that shut off was timing out trying to get partition exchange from
the indicated nodes and it shut itself off, correct? Does this mean this
node was also the master?

1- The time indicated in the log, is that UTC?
2- I'm trying to see if it was high IO, but the node that stopped had no
high IO.
3- The other nodes had an average peak of about 0.3%-0.5%, and that was
around 16:30 EST, so not sure if those times match with the log.
4- On super high load days we get up to 7%-9% IO wait, but we never lost a
node on those occasions.
5- The nodes are persistent nodes, so I'm guessing writes cause high IO.
Can we reduce the IO? But even then I'm not sure it's related.

[04:36:07,771][INFO][node-stopper][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=132, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
[id=e5645874-77e9-4455-8db5-07fa63984276, addrs=[127.0.0.1, 172.17.0.5],
sockAddrs=[/172.17.0.5:0, /127.0.0.1:0], discPort=0, order=108,
intOrder=60, lastExchangeTime=1600798045116, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=true], done=true], topVer=null,
durationFromInit=1605328567765]
[04:36:07,771][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Finish
exchange future [startVer=AffinityTopologyVersion [topVer=133,
minorTopVer=0], resVer=null, err=class
org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is
stopping: xx]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=133, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
[id=13f473cb-5441-460b-9b9f-a23c1bcf3b0b, addrs=[127.0.0.1, 172.17.0.3],
sockAddrs=[/127.0.0.1:0, /172.17.0.3:0], discPort=0, order=110,
intOrder=61, lastExchangeTime=1600798045127, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=true], done=true], topVer=null,
durationFromInit=1605328567765]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Finish
exchange future [startVer=AffinityTopologyVersion [topVer=134,
minorTopVer=0], resVer=null, err=class
org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is
stopping: xx]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=134, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
[id=ef9f3750-edfa-474e-8e68-6f0abee5095a, addrs=[127.0.0.1, 172.17.0.10],
sockAddrs=[/127.0.0.1:0, /172.17.0.10:0], discPort=0, order=112,
intOrder=62, lastExchangeTime=1600798045137, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=true], done=true], topVer=null,
durationFromInit=1605328567765]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Finish
exchange future [startVer=AffinityTopologyVersion [topVer=135,
minorTopVer=0], resVer=null, err=class
org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is
stopping: xx]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=135, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
[id=e14e5bea-2784-4a6f-af2f-e98a4bf0ab52, addrs=[127.0.0.1,
xxx.xxx.xxx.63], sockAddrs=[xx-0001/xxx.xxx.xxx.63:47500, /
127.0.0.1:47500], discPort=47500, order=118, intOrder=65,
lastExchangeTime=1600798763474, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=true], topVer=null,
durationFromInit=1605328567765]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture] Finish
exchange future [startVer=AffinityTopologyVersion [topVer=136,
minorTopVer=0], resVer=null, err=class
org.apache.ignite.internal.IgniteInterruptedCheckedException: Node is
stopping: xx]
[04:36:07,772][INFO][node-stopper][GridDhtPartitionsExchangeFuture]
Completed partition exchange
[localNode=8d84b4e9-8c11-4166-a75a-9b3d959ff1fe,
exchange=GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion
[topVer=136, minorTopVer=0], evt=NODE_FAILED, evtNode=TcpDiscoveryNode
[id=8facbe9c-6d9f-4c2d-8d9c-5c7f38dbb7da, addrs=[127.0.0.1,
xxx.xxx.xxx.69], sockAddrs=[/127.0.0.1:47500,
xx-0003/xxx.xxx.xxx.69:47500], discPort=47500, order=120, intOrder=66,
lastExchangeTime=1600799256509, loc=false,
ver=2.7.0#20181130-sha1:256ae401, isClient=false], done=true], topVer=null,
durationFromInit=1605328567765]
[04:36:07,796][INFO][db-checkpoint-thread-#56%xx%][GridCacheDatabaseSharedManager]
Skipping checkpoint (no pages were modified) [checkpointLockWait=0ms,
checkpointLockHoldTime=14ms, 

Re: Lost node again.

2020-08-20 Thread John Smith
It's the default. And as per Ilya I had a suspected GC pause of 45000 ms, so
I figured 60 seconds would be OK. As for the GC pauses, we (as in I and the
Ignite team) have already looked at GC logs previously and it wasn't the issue.

For the monitoring we are using Elasticsearch, with Metricbeat and Kibana as
the dashboard. Not the latest, because then I would be able to use JMX as
well :p
I will try to look into a JMX Kafka log exporter or something and see if
I can get them into Elastic when and if I have time lol



On Thu, 20 Aug 2020 at 12:28, Denis Magda  wrote:

> Dennis, wouldn't a 15 second failureDetectionTimeout cause even more
>> shutdowns?
>
>
> What's your current value? For sure, it doesn't make sense to decrease the
> value until all mysterious pauses are figured out. The downside of a high
> failureDetectionTimeout is that the cluster won't remove a node that failed
> for a reason until the timeout expires. So, if there is a failed node that
> has to process some operations then the rest of the cluster will be trying
> to reach it out until the failureDetectionTimeout is reached. That affects
> performance of some operations where the failed node has to be involved.
>
> Btw, what's the tool you are using for the monitoring? Looks nice.
>
> -
> Denis
>
>
> On Thu, Aug 20, 2020 at 6:44 AM John Smith  wrote:
>
>> Hi, here is an example of our cluster during our normal "high" usage. The
>> node shutting down seems to happen on "off" hours.
>>
>> Dennis, wouldn't a 15 second failureDetectionTimeout cause even more
>> shutdowns?
>> We also considered more tuning stuff in the docs, we'll see I guess...
>> As for now we don't have separate disks for now.
>>
>>
>>
>> On Wed, 19 Aug 2020 at 23:35, Denis Magda  wrote:
>>
>>> John,
>>>
>>> I would try to get to the bottom of the issue, especially, if the case
>>> is reproducible.
>>>
>>> If that's not GC then check if that's the I/O (your logs show that the
>>> checkpointing rate is high):
>>>
>>>- You can monitor checkpointing duration with a JMX tool
>>>
>>> <https://www.gridgain.com/docs/latest/administrators-guide/monitoring-metrics/metrics#monitoring-checkpointing-operations>
>>>  or
>>>Control Center
>>>
>>> <https://www.gridgain.com/docs/control-center/latest/monitoring/metrics#checkpoint-duration>
>>>.
>>>- Configure write-throttling
>>>
>>> <https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning#pages-writes-throttling>
>>>if the checkpointing buffer fills in quickly.
>>>- Ideally, storage files and WALs should be stored on different SSD
>>>media
>>>
>>> <https://www.gridgain.com/docs/latest/perf-troubleshooting-guide/persistence-tuning#keep-wals-separately>.
>>>SSDs also do garbage collection and you might hit it frequently.
>>>
>>> As for the failureDetectionTimeout, I would set it to 15 secs until your
>>> cluster is battle-tested and well-tuned for your use case.
>>>
>>> -
>>> Denis
>>>
>>>
>>> On Tue, Aug 18, 2020 at 10:37 AM John Smith 
>>> wrote:
>>>
>>>> I don't see why we would get such a huge pause, in fact I have provided
>>>> GC logs before and we found nothing...
>>>>
>>>> All operations on the "big" partitioned 3 million cache are put or
>>>> get, plus a query on another cache which has 450 entries. There are no
>>>> other caches.
>>>>
>>>> The nodes all have 6G on heap and 26G off heap.
>>>>
>>>> I think it can be IO related but I can't seem to be able to correlate
>>>> it to IO. I saw some heavy IO usage but the node failed way after.
>>>>
>>>> Now my question is: should I put the failure detection to 60s just for
>>>> the sake of trying it? Isn't that too high? If I put the servers to 60s,
>>>> how high should I put the clients?
>>>>
>>>> On Tue., Aug. 18, 2020, 7:32 a.m. Ilya Kasnacheev, <
>>>> ilya.kasnach...@gmail.com> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> [13:39:53,242][WARNING][jvm-pause-detector-worker][IgniteKernal%company]
>>>>> Possible too long JVM pause: 41779 milliseconds.
>>>>>
>>>>> It seems that you have too-long full GC. Either make sure it does not
>>>>> happen, or increase failureDe

Re: Lost node again.

2020-08-18 Thread John Smith
I don't see why we would get such a huge pause, in fact I have provided GC
logs before and we found nothing...

All operations on the "big" partitioned 3 million cache are put or get,
plus a query on another cache which has 450 entries. There are no other caches.

The nodes all have 6G on heap and 26G off heap.

I think it can be IO related, but I can't seem to correlate it to
IO. I saw some heavy IO usage, but the node failed way after.

Now my question is: should I put the failure detection to 60s just for the
sake of trying it? Isn't that too high? If I put the servers to 60s, how
high should I put the clients?
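
For reference, a minimal sketch of what bumping both timeouts looks like
programmatically (the 60s values are purely for the experiment described
above):

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class StartNode {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Server-to-server failure detection (default is 10_000 ms).
        cfg.setFailureDetectionTimeout(60_000);

        // Server-to-client failure detection (default is 30_000 ms).
        cfg.setClientFailureDetectionTimeout(60_000);

        Ignition.start(cfg);
    }
}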

On Tue., Aug. 18, 2020, 7:32 a.m. Ilya Kasnacheev, <
ilya.kasnach...@gmail.com> wrote:

> Hello!
>
> [13:39:53,242][WARNING][jvm-pause-detector-worker][IgniteKernal%company]
> Possible too long JVM pause: 41779 milliseconds.
>
> It seems that you have too-long full GC. Either make sure it does not
> happen, or increase failureDetectionTimeout to be longer than any expected
> GC.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пн, 17 авг. 2020 г. в 17:51, John Smith :
>
>> Hi guys it seems every couple of weeks we lose a node... Here are the
>> logs:
>> https://www.dropbox.com/sh/8cv2v8q5lcsju53/AAAU6ZSFkfiZPaMwHgIh5GAfa?dl=0
>>
>> And some extra details. Maybe I need to do more tuning than what is
>> already mentioned below, maybe set a higher timeout?
>>
>> 3 server nodes and 9 clients (client = true)
>>
>> Performance-wise the cluster is not doing any kind of high volume; on
>> average it does about 15-20 puts/gets/queries (any combination of) per
>> 30-60 seconds.
>>
>> The biggest cache we have is: 3 million records distributed with 1 backup
>> using the following template.
>>
>>   <bean class="org.apache.ignite.configuration.CacheConfiguration">
>>     <!-- ... -->
>>   </bean>
>>
>> Persistence is configured:
>>
>>   <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>>     <!-- ... -->
>>     <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>>       <!-- ... -->
>>     </bean>
>>     <!-- ... -->
>>   </bean>
>>
>> We also followed the tuning instructions for GC and I/O
>> if [ -z "$JVM_OPTS" ] ; then
>> JVM_OPTS="-Xms6g -Xmx6g -server -XX:MaxMetaspaceSize=256m"
>> fi
>>
>> #
>> # Uncomment the following GC settings if you see spikes in your
>> # throughput due to Garbage Collection.
>> #
>> JVM_OPTS="$JVM_OPTS -XX:+UseG1GC -XX:+AlwaysPreTouch
>> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC"
>> sysctl -w vm.dirty_writeback_centisecs=500
>> sysctl -w vm.dirty_expire_centisecs=500
>>
>>


Re: Operation block on Cluster recovery/rebalance.

2020-08-18 Thread John Smith
Hi Denis, for everyone's reference:
https://issues.apache.org/jira/browse/IGNITE-13372

On Mon, 17 Aug 2020 at 14:28, Denis Magda  wrote:

> But on client reconnect, doesn't it mean it will still block until the
>> cluster is active even if I get a new IgniteCache instance?
>
>
> No, the client will be getting an exception on an attempt to get an
> IgniteCache instance.
>
> -
> Denis
>
>
> On Fri, Aug 14, 2020 at 4:14 PM John Smith  wrote:
>
>> Yeah, I can maybe use the vertx event bus or something to do this... But
>> now I have to tie the ignite instance to the IgniteCache repository I wrote.
>>
>> But on client reconnect, doesn't it mean it will still block until the
>> cluster is active even if I get a new IgniteCache instance?
>>
>> On Fri, 14 Aug 2020 at 18:22, Denis Magda  wrote:
>>
>>> @Evgenii Zhuravlev , @Ilya Kasnacheev
>>> , any thoughts on this?
>>>
>>> As a dirty workaround, you can update your cache references on client
>>> reconnect events. You will be getting an exception by calling
>>> ignite.cache(cacheName) in the time when the cluster is not activated yet.
>>> Does this work for you?
>>>
>>> -
>>> Denis
>>>
>>>
>>> On Fri, Aug 14, 2020 at 3:12 PM John Smith 
>>> wrote:
>>>
>>>> Is there any workaround? I can't have an HTTP server block on all
>>>> requests.
>>>>
>>>> 1- I need to figure out why I lose a server node every few weeks,
>>>> which, when rebooting the nodes, causes the inactive state until they are
>>>> back.
>>>>
>>>> 2- Implement some kind of logic on the client side not to block the
>>>> HTTP part...
>>>>
>>>> Can the IgniteCache instance be notified of disconnect events so I can
>>>> maybe tell the repository class to set a flag to skip the operation?
>>>>
>>>>
>>>> On Fri., Aug. 14, 2020, 5:17 p.m. Denis Magda, 
>>>> wrote:
>>>>
>>>>> My guess that it's standard behavior for all operations (SQL,
>>>>> key-value, compute, etc.). But I'll let the maintainers of those modules
>>>>> clarify.
>>>>>
>>>>> -
>>>>> Denis
>>>>>
>>>>>
>>>>> On Fri, Aug 14, 2020 at 1:44 PM John Smith 
>>>>> wrote:
>>>>>
>>>>>> Hi Denis, so to understand it's all operations or just the query?
>>>>>>
>>>>>> On Fri., Aug. 14, 2020, 12:53 p.m. Denis Magda, 
>>>>>> wrote:
>>>>>>
>>>>>>> John,
>>>>>>>
>>>>>>> Ok, we nailed it. That's the current expected behavior. Generally, I
>>>>>>> agree with you that the platform should support an option when 
>>>>>>> operations
>>>>>>> fail if the cluster is deactivated. Could you propose the change by
>>>>>>> starting a discussion on the dev list? You can refer to this user list
>>>>>>> discussion for reference. Let me know if you need help with this.
>>>>>>>
>>>>>>> -
>>>>>>> Denis
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 13, 2020 at 5:55 PM John Smith 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> No, I reuse the instance. The cache instance is created once at
>>>>>>>> startup of the application and I pass it to my "repository" class
>>>>>>>>
>>>>>>>> public abstract class AbstractIgniteRepository<K, V> implements CacheRepository<K, V> {
>>>>>>>> public final long DEFAULT_OPERATION_TIMEOUT = 2000;
>>>>>>>>
>>>>>>>> private Vertx vertx;
>>>>>>>> private IgniteCache<K, V> cache;
>>>>>>>>
>>>>>>>> AbstractIgniteRepository(Vertx vertx, IgniteCache<K, V> cache) {
>>>>>>>> this.vertx = vertx;
>>>>>>>> this.cache = cache;
>>>>>>>> }
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>> Future<List<List<?>>> query(final String sql, final long
>>>>>>>> timeoutMs, final Object... args) {
>>>>>>>>
>>>>>>>> 

Lost node again.

2020-08-17 Thread John Smith
Hi guys it seems every couple of weeks we lose a node... Here are the logs:
https://www.dropbox.com/sh/8cv2v8q5lcsju53/AAAU6ZSFkfiZPaMwHgIh5GAfa?dl=0

And some extra details. Maybe I need to do more tuning than what is already
mentioned below, maybe set a higher timeout?

3 server nodes and 9 clients (client = true)

Performance-wise the cluster is not doing any kind of high volume; on
average it does about 15-20 puts/gets/queries (any combination of) per
30-60 seconds.

The biggest cache we have is: 3 million records distributed with 1 backup
using the following template.

  <bean class="org.apache.ignite.configuration.CacheConfiguration">
    <!-- ... -->
  </bean>

Persistence is configured:

  <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
    <!-- ... -->
    <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
      <!-- ... -->
    </bean>
    <!-- ... -->
  </bean>
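
For readers of the archive, a rough Java equivalent of the configuration
above (the XML bodies were stripped by the mail archive; a single persistent
default data region is assumed from the description):

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class PersistentNode {
    public static void main(String[] args) {
        DataRegionConfiguration region = new DataRegionConfiguration();
        region.setPersistenceEnabled(true); // native persistence, as described

        DataStorageConfiguration storage = new DataStorageConfiguration();
        storage.setDefaultDataRegionConfiguration(region);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storage);

        Ignition.start(cfg);
    }
}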

We also followed the tuning instructions for GC and I/O
if [ -z "$JVM_OPTS" ] ; then
JVM_OPTS="-Xms6g -Xmx6g -server -XX:MaxMetaspaceSize=256m"
fi

#
# Uncomment the following GC settings if you see spikes in your throughput
# due to Garbage Collection.
#
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC -XX:+AlwaysPreTouch
-XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC"
sysctl -w vm.dirty_writeback_centisecs=500
sysctl -w vm.dirty_expire_centisecs=500


Re: Cache configuration

2020-08-15 Thread John Smith
You can create templates in the XML, and programmatically, when you call
getOrCreate(), you can specify the template to use and pass in a random name
for the cache name...

https://apacheignite.readme.io/docs/cache-template
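
A minimal Java sketch of the same idea (template and cache names are made
up): register a wildcard template once, and any cache created later with a
matching name inherits its configuration:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.configuration.CacheConfiguration;

public class DynamicCaches {
    static IgniteCache<Object, Object> create(Ignite ignite, String cacheId) {
        CacheConfiguration<Object, Object> tpl =
            new CacheConfiguration<>("dynamicTpl*"); // trailing * marks a template
        tpl.setBackups(1);

        ignite.addCacheConfiguration(tpl); // register the template once

        // Any cache whose name matches the prefix picks up the template settings.
        return ignite.getOrCreateCache("dynamicTpl-" + cacheId);
    }
}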

On Sat., Aug. 15, 2020, 8:53 a.m. itsmeravikiran.c, <
itsmeravikira...@gmail.com> wrote:

> My cache ids are dynamic.
> Is it possible to add cache configuration in xml.
> I have checked, name property is mandatory. But i cannot add the name as
> it's dynamic name.
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: Operation block on Cluster recovery/rebalance.

2020-08-14 Thread John Smith
Yeah, I can maybe use the vertx event bus or something to do this... But now I
have to tie the ignite instance to the IgniteCache repository I wrote.

But on client reconnect, doesn't it mean it will still block until the
cluster is active even if I get a new IgniteCache instance?

On Fri, 14 Aug 2020 at 18:22, Denis Magda  wrote:

> @Evgenii Zhuravlev , @Ilya Kasnacheev
> , any thoughts on this?
>
> As a dirty workaround, you can update your cache references on client
> reconnect events. You will be getting an exception by calling
> ignite.cache(cacheName) in the time when the cluster is not activated yet.
> Does this work for you?
>
> -
> Denis
>
>
> On Fri, Aug 14, 2020 at 3:12 PM John Smith  wrote:
>
>> Is there any workaround? I can't have an HTTP server block on all
>> requests.
>>
>> 1- I need to figure out why I lose a server node every few weeks, which,
>> when rebooting the nodes, causes the inactive state until they are back.
>>
>> 2- Implement some kind of logic on the client side not to block the HTTP
>> part...
>>
>> Can the IgniteCache instance be notified of disconnect events so I can
>> maybe tell the repository class to set a flag to skip the operation?
>>
>>
>> On Fri., Aug. 14, 2020, 5:17 p.m. Denis Magda,  wrote:
>>
>>> My guess that it's standard behavior for all operations (SQL, key-value,
>>> compute, etc.). But I'll let the maintainers of those modules clarify.
>>>
>>> -
>>> Denis
>>>
>>>
>>> On Fri, Aug 14, 2020 at 1:44 PM John Smith 
>>> wrote:
>>>
>>>> Hi Denis, so to understand it's all operations or just the query?
>>>>
>>>> On Fri., Aug. 14, 2020, 12:53 p.m. Denis Magda, 
>>>> wrote:
>>>>
>>>>> John,
>>>>>
>>>>> Ok, we nailed it. That's the current expected behavior. Generally, I
>>>>> agree with you that the platform should support an option when operations
>>>>> fail if the cluster is deactivated. Could you propose the change by
>>>>> starting a discussion on the dev list? You can refer to this user list
>>>>> discussion for reference. Let me know if you need help with this.
>>>>>
>>>>> -
>>>>> Denis
>>>>>
>>>>>
>>>>> On Thu, Aug 13, 2020 at 5:55 PM John Smith 
>>>>> wrote:
>>>>>
>>>>>> No, I reuse the instance. The cache instance is created once at
>>>>>> startup of the application and I pass it to my "repository" class
>>>>>>
>>>>>> public abstract class AbstractIgniteRepository<K, V> implements
>>>>>> CacheRepository<K, V> {
>>>>>> public final long DEFAULT_OPERATION_TIMEOUT = 2000;
>>>>>>
>>>>>> private Vertx vertx;
>>>>>> private IgniteCache<K, V> cache;
>>>>>>
>>>>>> AbstractIgniteRepository(Vertx vertx, IgniteCache<K, V> cache) {
>>>>>> this.vertx = vertx;
>>>>>> this.cache = cache;
>>>>>> }
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> Future<List<List<?>>> query(final String sql, final long
>>>>>> timeoutMs, final Object... args) {
>>>>>> final Promise<List<List<?>>> promise = Promise.promise();
>>>>>>
>>>>>> vertx.setTimer(timeoutMs, l -> {
>>>>>> promise.tryFail(new TimeoutException("Cache operation did
>>>>>> not complete within: " + timeoutMs + " Ms.")); // THIS FIRES IF THE BELOW
>>>>>> DOESN'T COMPLETE IN TIME.
>>>>>> });
>>>>>>
>>>>>> vertx.<List<List<?>>>executeBlocking(code -> {
>>>>>> SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
>>>>>> query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);
>>>>>>
>>>>>>
>>>>>> try (QueryCursor<List<?>> cursor = cache.query(query)) { //
>>>>>> <--- BLOCKS HERE.
>>>>>> List<List<?>> rows = new ArrayList<>();
>>>>>> Iterator<List<?>> iterator = cursor.iterator();
>>>>>>
>>>>>> while (iterator.hasNext()) {
>>>>>> List<?> currentRow = iterator.next();
>>>>>>

Re: Operation block on Cluster recovery/rebalance.

2020-08-14 Thread John Smith
Is there any workaround? I can't have an HTTP server block on all requests.

1- I need to figure out why I lose server nodes every few weeks; rebooting
the nodes then causes the inactive state until they are all back.

2- Implement some kind of logic on the client side not to block the HTTP
part...

Can the IgniteCache instance be notified of disconnect events so I can maybe
tell the repository class to set a flag to skip the operation?


On Fri., Aug. 14, 2020, 5:17 p.m. Denis Magda,  wrote:

> My guess is that it's standard behavior for all operations (SQL, key-value,
> compute, etc.). But I'll let the maintainers of those modules clarify.
>
> -
> Denis
>
>
> On Fri, Aug 14, 2020 at 1:44 PM John Smith  wrote:
>
>> Hi Denis, just so I understand: is it all operations or just the query?
>>
>> On Fri., Aug. 14, 2020, 12:53 p.m. Denis Magda, 
>> wrote:
>>
>>> John,
>>>
>>> Ok, we nailed it. That's the current expected behavior. Generally, I
>>> agree with you that the platform should support an option when operations
>>> fail if the cluster is deactivated. Could you propose the change by
>>> starting a discussion on the dev list? You can refer to this user list
>>> discussion for reference. Let me know if you need help with this.
>>>
>>> -
>>> Denis
>>>
>>>
>>> On Thu, Aug 13, 2020 at 5:55 PM John Smith 
>>> wrote:
>>>
>>>> No I, reuse the instance. The cache instance is created once at startup
>>>> of the application and I pass it to my "repository" class
>>>>
>>>> public abstract class AbstractIgniteRepository<K, V> implements
>>>> CacheRepository<K, V> {
>>>>     public final long DEFAULT_OPERATION_TIMEOUT = 2000;
>>>>
>>>>     private Vertx vertx;
>>>>     private IgniteCache<K, V> cache;
>>>>
>>>>     AbstractIgniteRepository(Vertx vertx, IgniteCache<K, V> cache) {
>>>>         this.vertx = vertx;
>>>>         this.cache = cache;
>>>>     }
>>>>
>>>>     ...
>>>>
>>>>     Future<List<JsonArray>> query(final String sql, final long timeoutMs, final Object... args) {
>>>>         final Promise<List<JsonArray>> promise = Promise.promise();
>>>>
>>>>         vertx.setTimer(timeoutMs, l -> {
>>>>             promise.tryFail(new TimeoutException("Cache operation did not complete within: " + timeoutMs + " Ms.")); // THIS FIRES IF THE BLOCK BELOW DOESN'T COMPLETE IN TIME.
>>>>         });
>>>>
>>>>         vertx.<List<JsonArray>>executeBlocking(code -> {
>>>>             SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
>>>>             query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);
>>>>
>>>>             try (QueryCursor<List<?>> cursor = cache.query(query)) { // <--- BLOCKS HERE.
>>>>                 List<JsonArray> rows = new ArrayList<>();
>>>>                 Iterator<List<?>> iterator = cursor.iterator();
>>>>
>>>>                 while (iterator.hasNext()) {
>>>>                     List<?> currentRow = iterator.next();
>>>>                     JsonArray row = new JsonArray();
>>>>
>>>>                     currentRow.forEach(o -> row.add(o));
>>>>
>>>>                     rows.add(row);
>>>>                 }
>>>>
>>>>                 code.complete(rows);
>>>>             } catch (Exception ex) {
>>>>                 code.fail(ex);
>>>>             }
>>>>         }, result -> {
>>>>             if (result.succeeded()) {
>>>>                 promise.tryComplete(result.result());
>>>>             } else {
>>>>                 promise.tryFail(result.cause());
>>>>             }
>>>>         });
>>>>
>>>>         return promise.future();
>>>>     }
>>>>
>>>>     public <T> T cache() {
>>>>         return (T) cache;
>>>>     }
>>>> }
>>>>
>>>>
>>>>
>>>> On Thu, 13 Aug 2020 at 16:29, Denis Magda  wrote:
>>>>
>>>>> I've created a simple test and am always getting the exception below on
>>>>> an attempt to get a reference to an IgniteCache instance in cases when the
>>>>> cluster is not activated:
>>>>>
>>>>> *Excep

Re: Operation block on Cluster recovery/rebalance.

2020-08-14 Thread John Smith
Hi Denis, just so I understand: is it all operations or just the query?

On Fri., Aug. 14, 2020, 12:53 p.m. Denis Magda,  wrote:

> John,
>
> Ok, we nailed it. That's the current expected behavior. Generally, I agree
> with you that the platform should support an option when operations fail if
> the cluster is deactivated. Could you propose the change by starting a
> discussion on the dev list? You can refer to this user list discussion for
> reference. Let me know if you need help with this.
>
> -
> Denis
>
>
> On Thu, Aug 13, 2020 at 5:55 PM John Smith  wrote:
>
>> No I, reuse the instance. The cache instance is created once at startup
>> of the application and I pass it to my "repository" class
>>
>> public abstract class AbstractIgniteRepository<K, V> implements
>> CacheRepository<K, V> {
>>     public final long DEFAULT_OPERATION_TIMEOUT = 2000;
>>
>>     private Vertx vertx;
>>     private IgniteCache<K, V> cache;
>>
>>     AbstractIgniteRepository(Vertx vertx, IgniteCache<K, V> cache) {
>>         this.vertx = vertx;
>>         this.cache = cache;
>>     }
>>
>>     ...
>>
>>     Future<List<JsonArray>> query(final String sql, final long timeoutMs, final Object... args) {
>>         final Promise<List<JsonArray>> promise = Promise.promise();
>>
>>         vertx.setTimer(timeoutMs, l -> {
>>             promise.tryFail(new TimeoutException("Cache operation did not complete within: " + timeoutMs + " Ms.")); // THIS FIRES IF THE BLOCK BELOW DOESN'T COMPLETE IN TIME.
>>         });
>>
>>         vertx.<List<JsonArray>>executeBlocking(code -> {
>>             SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
>>             query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);
>>
>>             try (QueryCursor<List<?>> cursor = cache.query(query)) { // <--- BLOCKS HERE.
>>                 List<JsonArray> rows = new ArrayList<>();
>>                 Iterator<List<?>> iterator = cursor.iterator();
>>
>>                 while (iterator.hasNext()) {
>>                     List<?> currentRow = iterator.next();
>>                     JsonArray row = new JsonArray();
>>
>>                     currentRow.forEach(o -> row.add(o));
>>
>>                     rows.add(row);
>>                 }
>>
>>                 code.complete(rows);
>>             } catch (Exception ex) {
>>                 code.fail(ex);
>>             }
>>         }, result -> {
>>             if (result.succeeded()) {
>>                 promise.tryComplete(result.result());
>>             } else {
>>                 promise.tryFail(result.cause());
>>             }
>>         });
>>
>>         return promise.future();
>>     }
>>
>>     public <T> T cache() {
>>         return (T) cache;
>>     }
>> }
>>
>>
>>
>> On Thu, 13 Aug 2020 at 16:29, Denis Magda  wrote:
>>
>>> I've created a simple test and am always getting the exception below on an
>>> attempt to get a reference to an IgniteCache instance in cases when the
>>> cluster is not activated:
>>>
>>> *Exception in thread "main" class org.apache.ignite.IgniteException: Can
>>> not perform the operation because the cluster is inactive. Note, that the
>>> cluster is considered inactive by default if Ignite Persistent Store is
>>> used to let all the nodes join the cluster. To activate the cluster call
>>> Ignite.active(true)*
>>>
>>> Are you trying to get a new IgniteCache reference whenever the client
>>> reconnects successfully to the cluster? My gut feeling is that currently Ignite
>>> verifies the activation status and generates the exception above whenever
>>> you're getting a reference to an IgniteCache or IgniteCompute. But once you
>>> got those references and try to run some operations then those get stuck if
>>> the cluster is not activated.
>>> -
>>> Denis
>>>
>>>
>>> On Thu, Aug 13, 2020 at 6:37 AM John Smith 
>>> wrote:
>>>
>>>> The cache.query() starts to block when ignite server nodes are being
>>>> restarted and there's no baseline topology yet. The server nodes do not
>>>> block. It's the client that blocks.
>>>>
>>>> The dump files are of the server nodes. The screenshot is from the
>>>> client app using the YourKit profiler; on the client side the threads
>>>> are marked as red in YourKit.
>>>>
>>>> The app is simple, make h

Re: Operation block on Cluster recovery/rebalance.

2020-08-13 Thread John Smith
No, I reuse the instance. The cache instance is created once at startup of
the application and I pass it to my "repository" class

public abstract class AbstractIgniteRepository<K, V> implements
CacheRepository<K, V> {
    public final long DEFAULT_OPERATION_TIMEOUT = 2000;

    private Vertx vertx;
    private IgniteCache<K, V> cache;

    AbstractIgniteRepository(Vertx vertx, IgniteCache<K, V> cache) {
        this.vertx = vertx;
        this.cache = cache;
    }

    ...

    Future<List<JsonArray>> query(final String sql, final long timeoutMs, final Object... args) {
        final Promise<List<JsonArray>> promise = Promise.promise();

        vertx.setTimer(timeoutMs, l -> {
            promise.tryFail(new TimeoutException("Cache operation did not complete within: " + timeoutMs + " Ms.")); // THIS FIRES IF THE BLOCK BELOW DOESN'T COMPLETE IN TIME.
        });

        vertx.<List<JsonArray>>executeBlocking(code -> {
            SqlFieldsQuery query = new SqlFieldsQuery(sql).setArgs(args);
            query.setTimeout((int) timeoutMs, TimeUnit.MILLISECONDS);

            try (QueryCursor<List<?>> cursor = cache.query(query)) { // <--- BLOCKS HERE.
                List<JsonArray> rows = new ArrayList<>();
                Iterator<List<?>> iterator = cursor.iterator();

                while (iterator.hasNext()) {
                    List<?> currentRow = iterator.next();
                    JsonArray row = new JsonArray();

                    currentRow.forEach(o -> row.add(o));

                    rows.add(row);
                }

                code.complete(rows);
            } catch (Exception ex) {
                code.fail(ex);
            }
        }, result -> {
            if (result.succeeded()) {
                promise.tryComplete(result.result());
            } else {
                promise.tryFail(result.cause());
            }
        });

        return promise.future();
    }

    public <T> T cache() {
        return (T) cache;
    }
}
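
A guard that could be bolted onto such a repository, sketched under the
assumption that failing fast is acceptable: check cluster activation before
issuing the query, instead of letting cache.query() block until the Vert.x
timer fires.

import org.apache.ignite.Ignite;

public final class ClusterGuard {
    private ClusterGuard() {
    }

    // Returns true if the cluster is activated; callers can fail fast instead
    // of issuing an operation that would block during recovery/activation.
    public static boolean ready(Ignite ignite) {
        return ignite.cluster().active();
    }
}

The query() method above could call ClusterGuard.ready(ignite) first and
tryFail the promise immediately when the cluster is not active.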



On Thu, 13 Aug 2020 at 16:29, Denis Magda  wrote:

> I've created a simple test and am always getting the exception below on an
> attempt to get a reference to an IgniteCache instance in cases when the
> cluster is not activated:
>
> *Exception in thread "main" class org.apache.ignite.IgniteException: Can
> not perform the operation because the cluster is inactive. Note, that the
> cluster is considered inactive by default if Ignite Persistent Store is
> used to let all the nodes join the cluster. To activate the cluster call
> Ignite.active(true)*
>
> Are you trying to get a new IgniteCache reference whenever the client
> reconnects successfully to the cluster? My gut feeling is that currently Ignite
> verifies the activation status and generates the exception above whenever
> you're getting a reference to an IgniteCache or IgniteCompute. But once you
> got those references and try to run some operations then those get stuck if
> the cluster is not activated.
> -
> Denis
>
>
> On Thu, Aug 13, 2020 at 6:37 AM John Smith  wrote:
>
>> The cache.query() starts to block when ignite server nodes are being
>> restarted and there's no baseline topology yet. The server nodes do not
>> block. It's the client that blocks.
>>
>> The dump files are of the server nodes. The screenshot is from the client
>> app using the YourKit profiler; on the client side the threads are marked
>> as red in YourKit.
>>
>> The app is simple, make http request, it runs cache Sql query on ignite
>> and if it succeeds does a put back to ignite.
>>
>> The Client disconnected exception only happens when all server nodes in
>> the cluster are down. The blockage only happens when the cluster is trying
>> to establish baseline topology.
>>
>> On Wed., Aug. 12, 2020, 6:28 p.m. Denis Magda,  wrote:
>>
>>> John,
>>>
>>> I don't see any traits of an application-caused deadlock in the thread
>>> dumps. Please elaborate on the following:
>>>
>>> 7- Restart 1st node, run operation, operation fails with
>>>> ClientDisconectedException but application still able to complete it's
>>>> request.
>>>
>>>
>>> What's the IP address of the server node the client app uses to join the
>>> cluster? If that's not the address of the 1st node, that is already
>>> restarted, then the client couldn't join the cluster and it's expected that
>>> it fails with the ClientDisconnectedException.
>>>
>>> 8- Start 2nd node, run operation, from here on all operations just block.
>>>
>>>
>>> Are the operations unblocked and completed successfully when the third
>>> node joins the cluster and the cluster gets activated automatically?
>>>
>>> -
>>> Denis
>>>
>>>
>>> O

Re: Operation block on Cluster recovery/rebalance.

2020-08-13 Thread John Smith
The cache.query() starts to block when ignite server nodes are being
restarted and there's no baseline topology yet. The server nodes do not
block. It's the client that blocks.

The dump files are of the server nodes. The screenshot is from the client
app using the YourKit profiler; on the client side the threads are marked as
red in YourKit.

The app is simple: it makes an HTTP request, runs a cache SQL query on Ignite
and, if it succeeds, does a put back to Ignite.

The Client disconnected exception only happens when all server nodes in the
cluster are down. The blockage only happens when the cluster is trying to
establish baseline topology.

On Wed., Aug. 12, 2020, 6:28 p.m. Denis Magda,  wrote:

> John,
>
> I don't see any traits of an application-caused deadlock in the thread
> dumps. Please elaborate on the following:
>
> 7- Restart 1st node, run operation, operation fails with
>> ClientDisconectedException but application still able to complete it's
>> request.
>
>
> What's the IP address of the server node the client app uses to join the
> cluster? If that's not the address of the 1st node, that is already
> restarted, then the client couldn't join the cluster and it's expected that
> it fails with the ClientDisconnectedException.
>
> 8- Start 2nd node, run operation, from here on all operations just block.
>
>
> Are the operations unblocked and completed successfully when the third
> node joins the cluster and the cluster gets activated automatically?
>
> -
> Denis
>
>
> On Wed, Aug 12, 2020 at 11:08 AM John Smith 
> wrote:
>
>> Ok Denis here they are...
>>
>> 3 nodes, and I captured a YourKit screenshot of what it thinks are
>> deadlocks in the client app.
>>
>> https://www.dropbox.com/sh/2cxjkngvx0ubw3b/AADa--HQg-rRsY3RBo2vQeJ9a?dl=0
>>
>> On Wed, 12 Aug 2020 at 11:07, John Smith  wrote:
>>
>>> Hi Denis. I will ASAP, but I think you were right: it is the query
>>> that blocks.
>>>
>>> My application first runs a select on the cache and then does a
>>> put to the cache.
>>>
>>> On Tue, 11 Aug 2020 at 19:22, Denis Magda  wrote:
>>>
>>>> John,
>>>>
>>>> It sounds like a deadlock caused by the application logic. Is there any
>>>> chance that the operation you run on step 8 accesses several keys in one
>>>> order while the other operations work with the same keys but in a different
>>>> order? The deadlocks are possible when you use the Ignite Transaction API or
>>>> simply execute bulk operations such as cache.readAll() or
>>>> cache.writeAll(..).
>>>>
>>>> Please take and attach thread dumps from all the cluster nodes for
>>>> analysis if we need to dig deeper.
>>>>
>>>> -
>>>> Denis
>>>>
>>>>
>>>> On Mon, Aug 10, 2020 at 6:23 PM John Smith 
>>>> wrote:
>>>>
>>>>> Hi Denis, I think you are right. It's the query that blocks; the other
>>>>> k/v operations are OK.
>>>>>
>>>>> Any thoughts on this?
>>>>>
>>>>> On Mon, 10 Aug 2020 at 15:28, John Smith 
>>>>> wrote:
>>>>>
>>>>>> I tried with 2.8.1, same issue. Operations block indefinitely...
>>>>>>
>>>>>> 1- Start 3 node cluster
>>>>>> 2- Start client application client = true with Ignition.start()
>>>>>> 3- Run some cache operations, everything ok...
>>>>>> 4- Shut down one node, run operation, still ok
>>>>>> 5- Shut down 2nd node, run operation, still ok
>>>>>> 6- Shut down 3rd node, run operation, still ok... Operations start
>>>>>> failing with ClientDisconnectedException...
>>>>>> 7- Restart 1st node, run operation, operation fails
>>>>>> with ClientDisconnectedException but the application is still able to
>>>>>> complete its request.
>>>>>> 8- Start 2nd node, run operation, from here on all operations just
>>>>>> block.
>>>>>>
>>>>>> Basically the client application is an HTTP server; on each HTTP
>>>>>> request it does cache operations.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, 7 Aug 2020 at 19:46, John Smith 
>>>>>> wrote:
>>>>>>
>>>>>>> No, everything blocks... Also using 2.7.0 just in case.

Re: Operation block on Cluster recovery/rebalance.

2020-08-12 Thread John Smith
Ok Denis here they are...

3 nodes, and I captured a YourKit screenshot of what it thinks are deadlocks
in the client app.

https://www.dropbox.com/sh/2cxjkngvx0ubw3b/AADa--HQg-rRsY3RBo2vQeJ9a?dl=0

On Wed, 12 Aug 2020 at 11:07, John Smith  wrote:

> Hi Denis. I will ASAP, but I think you were right: it is the query that
> blocks.
>
> My application first runs a select on the cache and then does a put
> to the cache.
>
> On Tue, 11 Aug 2020 at 19:22, Denis Magda  wrote:
>
>> John,
>>
>> It sounds like a deadlock caused by the application logic. Is there any
>> chance that the operation you run on step 8 accesses several keys in one
>> order while the other operations work with the same keys but in a different
>> order? The deadlocks are possible when you use the Ignite Transaction API or
>> simply execute bulk operations such as cache.readAll() or
>> cache.writeAll(..).
>>
>> Please take and attach thread dumps from all the cluster nodes for
>> analysis if we need to dig deeper.
>>
>> -
>> Denis
>>
>>
>> On Mon, Aug 10, 2020 at 6:23 PM John Smith 
>> wrote:
>>
>>> Hi Denis, I think you are right. It's the query that blocks; the other
>>> k/v operations are OK.
>>>
>>> Any thoughts on this?
>>>
>>> On Mon, 10 Aug 2020 at 15:28, John Smith  wrote:
>>>
>>>> I tried with 2.8.1, same issue. Operations block indefinitely...
>>>>
>>>> 1- Start 3 node cluster
>>>> 2- Start client application client = true with Ignition.start()
>>>> 3- Run some cache operations, everything ok...
>>>> 4- Shut down one node, run operation, still ok
>>>> 5- Shut down 2nd node, run operation, still ok
>>>> 6- Shut down 3rd node, run operation, still ok... Operations start
>>>> failing with ClientDisconnectedException...
>>>> 7- Restart 1st node, run operation, operation fails
>>>> with ClientDisconnectedException but the application is still able to
>>>> complete its request.
>>>> 8- Start 2nd node, run operation, from here on all operations just
>>>> block.
>>>>
>>>> Basically the client application is an HTTP server; on each HTTP request
>>>> it does cache operations.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, 7 Aug 2020 at 19:46, John Smith  wrote:
>>>>
>>>>> No, everything blocks... Also using 2.7.0 just in case.
>>>>>
>>>>> The only time I get an exception is if the cluster is completely off; then I
>>>>> get ClientDisconnectedException...
>>>>>
>>>>> On Fri, 7 Aug 2020 at 18:52, Denis Magda  wrote:
>>>>>
>>>>>> If I'm not mistaken, key-value operations (cache.get/put) and compute
>>>>>> calls fail with an exception if the cluster is deactivated. Do those fail
>>>>>> on your end?
>>>>>>
>>>>>> As for the async and SQL operations, let's see what other community
>>>>>> members say.
>>>>>>
>>>>>> -
>>>>>> Denis
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 7, 2020 at 1:06 PM John Smith 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi any thoughts on this?
>>>>>>>
>>>>>>> On Thu, 6 Aug 2020 at 23:33, John Smith 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Here is another example where it blocks.
>>>>>>>>
>>>>>>>> SqlFieldsQuery query = new SqlFieldsQuery(
>>>>>>>> "select * from my_table")
>>>>>>>> .setArgs(providerId, carrierCode);
>>>>>>>> query.setTimeout(1000, TimeUnit.MILLISECONDS);
>>>>>>>>
>>>>>>>> try (QueryCursor<List<?>> cursor = cache.query(query))
>>>>>>>>
>>>>>>>> cache.query just blocks even with the timeout set.
>>>>>>>>
>>>>>>>> Is there a way to timeout and at least have the application
>>>>>>>> continue and respond with an appropriate message?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, 6 Aug 2020 at 23:06, John Smith 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi running 2.7.0
>>>>>>>>>
>>>>>>>>> When I reboot a node and it begins to rejoin the cluster or the
>>>>>>>>> cluster is not yet activated with baseline topology operations seem to
>>>>>>>>> block forever, operations that are supposed to return IgniteFuture. 
>>>>>>>>> I.e:
>>>>>>>>> putAsync, getAsync etc... They just block until the cluster resolves
>>>>>>>>> its state.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>


Re: Operation block on Cluster recovery/rebalance.

2020-08-12 Thread John Smith
Hi Denis. I will ASAP, but I think you were right: it is the query that
blocks.

My application first runs a select on the cache and then does a put
to the cache.

On Tue, 11 Aug 2020 at 19:22, Denis Magda  wrote:

> John,
>
> It sounds like a deadlock caused by the application logic. Is there any
> chance that the operation you run on step 8 accesses several keys in one
> order while the other operations work with the same keys but in a different
> order? The deadlocks are possible when you use the Ignite Transaction API or
> simply execute bulk operations such as cache.readAll() or
> cache.writeAll(..).
>
> Please take and attach thread dumps from all the cluster nodes for
> analysis if we need to dig deeper.
>
> -
> Denis
>
>
> On Mon, Aug 10, 2020 at 6:23 PM John Smith  wrote:
>
>> Hi Denis, I think you are right. It's the query that blocks; the other k/v
>> operations are OK.
>>
>> Any thoughts on this?
>>
>> On Mon, 10 Aug 2020 at 15:28, John Smith  wrote:
>>
>>> I tried with 2.8.1, same issue. Operations block indefinitely...
>>>
>>> 1- Start 3 node cluster
>>> 2- Start client application client = true with Ignition.start()
>>> 3- Run some cache operations, everything ok...
>>> 4- Shut down one node, run operation, still ok
>>> 5- Shut down 2nd node, run operation, still ok
>>> 6- Shut down 3rd node, run operation, still ok... Operations start
>>> failing with ClientDisconnectedException...
>>> 7- Restart 1st node, run operation, operation fails
>>> with ClientDisconnectedException but the application is still able to
>>> complete its request.
>>> 8- Start 2nd node, run operation, from here on all operations just block.
>>>
>>> Basically the client application is an HTTP server; on each HTTP request
>>> it does cache operations.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, 7 Aug 2020 at 19:46, John Smith  wrote:
>>>
>>>> No, everything blocks... Also using 2.7.0 just in case.
>>>>
>>>> The only time I get an exception is if the cluster is completely off; then I
>>>> get ClientDisconnectedException...
>>>>
>>>> On Fri, 7 Aug 2020 at 18:52, Denis Magda  wrote:
>>>>
>>>>> If I'm not mistaken, key-value operations (cache.get/put) and compute
>>>>> calls fail with an exception if the cluster is deactivated. Do those fail
>>>>> on your end?
>>>>>
>>>>> As for the async and SQL operations, let's see what other community
>>>>> members say.
>>>>>
>>>>> -
>>>>> Denis
>>>>>
>>>>>
>>>>> On Fri, Aug 7, 2020 at 1:06 PM John Smith 
>>>>> wrote:
>>>>>
>>>>>> Hi any thoughts on this?
>>>>>>
>>>>>> On Thu, 6 Aug 2020 at 23:33, John Smith 
>>>>>> wrote:
>>>>>>
>>>>>>> Here is another example where it blocks.
>>>>>>>
>>>>>>> SqlFieldsQuery query = new SqlFieldsQuery(
>>>>>>> "select * from my_table")
>>>>>>> .setArgs(providerId, carrierCode);
>>>>>>> query.setTimeout(1000, TimeUnit.MILLISECONDS);
>>>>>>>
>>>>>>> try (QueryCursor<List<?>> cursor = cache.query(query))
>>>>>>>
>>>>>>> cache.query just blocks even with the timeout set.
>>>>>>>
>>>>>>> Is there a way to timeout and at least have the application continue
>>>>>>> and respond with an appropriate message?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, 6 Aug 2020 at 23:06, John Smith 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi running 2.7.0
>>>>>>>>
>>>>>>>> When I reboot a node and it begins to rejoin the cluster or the
>>>>>>>> cluster is not yet activated with baseline topology operations seem to
>>>>>>>> block forever, operations that are supposed to return IgniteFuture. 
>>>>>>>> I.e:
>>>>>>>> putAsync, getAsync etc... They just block until the cluster resolves
>>>>>>>> its state.
>>>>>>>>
>>>>>>>>
>>>>>>>>


Re: Operation block on Cluster recovery/rebalance.

2020-08-10 Thread John Smith
Hi Denis, I think you are right. It's the query that blocks; the other k/v
operations are OK.

Any thoughts on this?

On Mon, 10 Aug 2020 at 15:28, John Smith  wrote:

> I tried with 2.8.1, same issue. Operations block indefinitely...
>
> 1- Start 3 node cluster
> 2- Start client application client = true with Ignition.start()
> 3- Run some cache operations, everything ok...
> 4- Shut down one node, run operation, still ok
> 5- Shut down 2nd node, run operation, still ok
> 6- Shut down 3rd node, run operation, still ok... Operations start failing
> with ClientDisconnectedException...
> 7- Restart 1st node, run operation, operation fails
> with ClientDisconnectedException but the application is still able to
> complete its request.
> 8- Start 2nd node, run operation, from here on all operations just block.
>
> Basically the client application is an HTTP server; on each HTTP request
> it does cache operations.
>
>
>
>
>
>
> On Fri, 7 Aug 2020 at 19:46, John Smith  wrote:
>
>> No, everything blocks... Also using 2.7.0 just in case.
>>
>> The only time I get an exception is if the cluster is completely off; then I
>> get ClientDisconnectedException...
>>
>> On Fri, 7 Aug 2020 at 18:52, Denis Magda  wrote:
>>
>>> If I'm not mistaken, key-value operations (cache.get/put) and compute
>>> calls fail with an exception if the cluster is deactivated. Do those fail
>>> on your end?
>>>
>>> As for the async and SQL operations, let's see what other community
>>> members say.
>>>
>>> -
>>> Denis
>>>
>>>
>>> On Fri, Aug 7, 2020 at 1:06 PM John Smith 
>>> wrote:
>>>
>>>> Hi any thoughts on this?
>>>>
>>>> On Thu, 6 Aug 2020 at 23:33, John Smith  wrote:
>>>>
>>>>> Here is another example where it blocks.
>>>>>
>>>>> SqlFieldsQuery query = new SqlFieldsQuery(
>>>>> "select * from my_table")
>>>>> .setArgs(providerId, carrierCode);
>>>>> query.setTimeout(1000, TimeUnit.MILLISECONDS);
>>>>>
>>>>> try (QueryCursor<List<?>> cursor = cache.query(query))
>>>>>
>>>>> cache.query just blocks even with the timeout set.
>>>>>
>>>>> Is there a way to timeout and at least have the application continue
>>>>> and respond with an appropriate message?
>>>>>
>>>>>
>>>>>
>>>>> On Thu, 6 Aug 2020 at 23:06, John Smith 
>>>>> wrote:
>>>>>
>>>>>> Hi running 2.7.0
>>>>>>
>>>>>> When I reboot a node and it begins to rejoin the cluster or the
>>>>>> cluster is not yet activated with baseline topology operations seem to
>>>>>> block forever, operations that are supposed to return IgniteFuture. I.e:
>>>>>> putAsync, getAsync etc... They just block until the cluster resolves
>>>>>> its state.
>>>>>>
>>>>>>
>>>>>>


Re: Operation block on Cluster recovery/rebalance.

2020-08-10 Thread John Smith
I tried with 2.8.1, same issue. Operations block indefinitely...

1- Start 3 node cluster
2- Start client application client = true with Ignition.start()
3- Run some cache operations, everything ok...
4- Shut down one node, run operation, still ok
5- Shut down 2nd node, run operation, still ok
6- Shut down 3rd node, run operation, still ok... Operations start failing
with ClientDisconnectedException...
7- Restart 1st node, run operation, operation fails
with ClientDisconnectedException but the application is still able to
complete its request.
8- Start 2nd node, run operation, from here on all operations just block.

Basically the client application is an HTTP server; on each HTTP request
it does cache operations.






On Fri, 7 Aug 2020 at 19:46, John Smith  wrote:

> No, everything blocks... Also using 2.7.0 just in case.
>
> The only time I get an exception is if the cluster is completely off; then I
> get ClientDisconnectedException...
>
> On Fri, 7 Aug 2020 at 18:52, Denis Magda  wrote:
>
>> If I'm not mistaken, key-value operations (cache.get/put) and compute
>> calls fail with an exception if the cluster is deactivated. Do those fail
>> on your end?
>>
>> As for the async and SQL operations, let's see what other community
>> members say.
>>
>> -
>> Denis
>>
>>
>> On Fri, Aug 7, 2020 at 1:06 PM John Smith  wrote:
>>
>>> Hi any thoughts on this?
>>>
>>> On Thu, 6 Aug 2020 at 23:33, John Smith  wrote:
>>>
>>>> Here is another example where it blocks.
>>>>
>>>> SqlFieldsQuery query = new SqlFieldsQuery(
>>>> "select * from my_table")
>>>> .setArgs(providerId, carrierCode);
>>>> query.setTimeout(1000, TimeUnit.MILLISECONDS);
>>>>
>>>> try (QueryCursor<List<?>> cursor = cache.query(query))
>>>>
>>>> cache.query just blocks even with the timeout set.
>>>>
>>>> Is there a way to timeout and at least have the application continue
>>>> and respond with an appropriate message?
>>>>
>>>>
>>>>
>>>> On Thu, 6 Aug 2020 at 23:06, John Smith  wrote:
>>>>
>>>>> Hi running 2.7.0
>>>>>
>>>>> When I reboot a node and it begins to rejoin the cluster or the
>>>>> cluster is not yet activated with baseline topology operations seem to
>>>>> block forever, operations that are supposed to return IgniteFuture. I.e:
>>>>> putAsync, getAsync etc... They just block, until the cluster resolves it's
>>>>> state.
>>>>>
>>>>>
>>>>>


Re: Operation block on Cluster recovery/rebalance.

2020-08-07 Thread John Smith
No, everything blocks... Also using 2.7.0 just in case.

The only time I get an exception is if the cluster is completely off; then I
get ClientDisconnectedException...

On Fri, 7 Aug 2020 at 18:52, Denis Magda  wrote:

> If I'm not mistaken, key-value operations (cache.get/put) and compute
> calls fail with an exception if the cluster is deactivated. Do those fail
> on your end?
>
> As for the async and SQL operations, let's see what other community
> members say.
>
> -
> Denis
>
>
> On Fri, Aug 7, 2020 at 1:06 PM John Smith  wrote:
>
>> Hi any thoughts on this?
>>
>> On Thu, 6 Aug 2020 at 23:33, John Smith  wrote:
>>
>>> Here is another example where it blocks.
>>>
>>> SqlFieldsQuery query = new SqlFieldsQuery(
>>> "select * from my_table")
>>> .setArgs(providerId, carrierCode);
>>> query.setTimeout(1000, TimeUnit.MILLISECONDS);
>>>
>>> try (QueryCursor<List<?>> cursor = cache.query(query))
>>>
>>> cache.query just blocks even with the timeout set.
>>>
>>> Is there a way to timeout and at least have the application continue and
>>> respond with an appropriate message?
>>>
>>>
>>>
>>> On Thu, 6 Aug 2020 at 23:06, John Smith  wrote:
>>>
>>>> Hi running 2.7.0
>>>>
>>>> When I reboot a node and it begins to rejoin the cluster or the cluster
>>>> is not yet activated with baseline topology operations seem to block
>>>> forever, operations that are supposed to return IgniteFuture. I.e:
>>>> putAsync, getAsync etc... They just block until the cluster resolves
>>>> its state.
>>>>
>>>>
>>>>


Re: Operation block on Cluster recovery/rebalance.

2020-08-07 Thread John Smith
Hi any thoughts on this?

On Thu, 6 Aug 2020 at 23:33, John Smith  wrote:

> Here is another example where it blocks.
>
> SqlFieldsQuery query = new SqlFieldsQuery(
> "select * from my_table")
> .setArgs(providerId, carrierCode);
> query.setTimeout(1000, TimeUnit.MILLISECONDS);
>
> try (QueryCursor<List<?>> cursor = cache.query(query))
>
> cache.query just blocks even with the timeout set.
>
> Is there a way to timeout and at least have the application continue and
> respond with an appropriate message?
>
>
>
> On Thu, 6 Aug 2020 at 23:06, John Smith  wrote:
>
>> Hi running 2.7.0
>>
>> When I reboot a node and it begins to rejoin the cluster or the cluster
>> is not yet activated with baseline topology operations seem to block
>> forever, operations that are supposed to return IgniteFuture. I.e:
>> putAsync, getAsync etc... They just block until the cluster resolves
>> its state.
>>
>>
>>


Re: Operation block on Cluster recovery/rebalance.

2020-08-06 Thread John Smith
Here is another example where it blocks.

SqlFieldsQuery query = new SqlFieldsQuery(
"select * from my_table")
.setArgs(providerId, carrierCode);
query.setTimeout(1000, TimeUnit.MILLISECONDS);

try (QueryCursor<List<?>> cursor = cache.query(query))

cache.query just blocks even with the timeout set.

Is there a way to timeout and at least have the application continue and
respond with an appropriate message?
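
One workaround sketch, assuming it is acceptable to abandon the caller while
the call keeps blocking in the background: run the cache call on a separate
executor and time out on the Future instead of inside Ignite.

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class CallerTimeout {
    private static final ExecutorService POOL = Executors.newCachedThreadPool();

    // Runs a potentially blocking cache call, but frees the caller after timeoutMs.
    public static <T> T callWithTimeout(Callable<T> cacheCall, long timeoutMs) throws Exception {
        Future<T> f = POOL.submit(cacheCall);
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // NOTE: the underlying Ignite call may keep blocking on the pool
            // thread; only the caller is unblocked.
            f.cancel(true);
            throw e;
        }
    }
}

Usage would be something like
callWithTimeout(() -> cache.query(query).getAll(), 1000).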



On Thu, 6 Aug 2020 at 23:06, John Smith  wrote:

> Hi running 2.7.0
>
> When I reboot a node and it begins to rejoin the cluster or the cluster is
> not yet activated with baseline topology operations seem to block forever,
> operations that are supposed to return IgniteFuture. I.e: putAsync,
> getAsync etc... They just block until the cluster resolves its state.
>
>
>


Operation block on Cluster recovery/rebalance.

2020-08-06 Thread John Smith
Hi running 2.7.0

When I reboot a node and it begins to rejoin the cluster, or the cluster is
not yet activated with baseline topology, operations seem to block forever,
even operations that are supposed to return IgniteFuture, i.e. putAsync,
getAsync etc... They just block until the cluster resolves its state.


Re: Is there a way for client to lazy join the cluster?

2020-08-06 Thread John Smith
Ok I see...

On Thu, 6 Aug 2020 at 17:13, Evgenii Zhuravlev 
wrote:

> It should be handled on your application side. For example, you can
> perform initialization of the Ignite instance in a separate thread and add a
> check to other API invocations that the instance was initialized.
>
> Evgenii
>
> чт, 6 авг. 2020 г. в 09:03, John Smith :
>
>> I'm testing failover scenarios and currently I have the full cluster shut
>> off. I would still like my application to continue working even if the
>> cache is not there...
>>
>> When my application starts...
>>
>> It calls Ignition.start(config)
>>
>> The application will not start until Ignition.start(config) finishes, i.e.
>> until I start the cluster back up.
>>
>
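
Something like the following sketch of Evgenii's suggestion, I suppose
(names are made up and error handling is omitted):

import java.util.concurrent.atomic.AtomicReference;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class LazyIgnite {
    private final AtomicReference<Ignite> ref = new AtomicReference<>();

    // Kick off Ignition.start() without blocking application startup.
    public void startAsync(IgniteConfiguration cfg) {
        new Thread(() -> ref.set(Ignition.start(cfg)), "ignite-starter").start();
    }

    // Other API invocations check availability instead of blocking.
    public Ignite igniteOrNull() {
        return ref.get(); // null until the cluster is reachable and start() returns
    }
}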


Is there a way for client to lazy join the cluster?

2020-08-06 Thread John Smith
I'm testing failover scenarios and currently I have the full cluster shut
off. I would still like my application to continue working even if the
cache is not there...

When my application starts...

It calls Ignition.start(config)

The application will not start until Ignition.start(config) finishes, i.e.
until I start the cluster back up.


Re: What does all partition owners have left the grid, partition data has been lost mean?

2020-08-06 Thread John Smith
Ok, if I have 5 nodes with persistence, do all nodes need to be in the baseline?

Also, what are the docs for backups, so I can make sure I have it right?
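
For my own notes, the backups side seems to boil down to this sketch (cache
name made up):

import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class BackupConfigExample {
    public static CacheConfiguration<String, Integer> cacheWithOneBackup() {
        CacheConfiguration<String, Integer> ccfg = new CacheConfiguration<>("myCache");
        ccfg.setCacheMode(CacheMode.PARTITIONED);
        // One backup copy of every partition: the grid can then lose one
        // data node without losing partition data.
        ccfg.setBackups(1);
        return ccfg;
    }
}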


On Thu, 6 Aug 2020 at 10:08, Ilya Kasnacheev 
wrote:

> Hello!
>
> You are confusing baseline with backups here.
>
> You should have 1 backup to afford losing a node.
>
> You should have all data nodes in the baseline.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> ср, 5 авг. 2020 г. в 17:56, John Smith :
>
>> I mean I have 3 nodes and the baseline is set to 3. Does that mean if I put
>> 2 in the baseline then I can lose at least 1? If I remove one node from the
>> baseline, does that mean it will not store data?
>>
>> Or is it better to have 3 baseline nodes and add a 4th node? In that case,
>> if I still lose a baseline node, will I still be able to do operations on
>> the cache?
>>
>> On Wed, 5 Aug 2020 at 08:21, John Smith  wrote:
>>
>>> I have 3 nodes and baseline topology is 3 so if I lose 1 I guess it's
>>> enough... Should it be 2?
>>>
>>> On Tue., Aug. 4, 2020, 10:57 a.m. Ilya Kasnacheev, <
>>> ilya.kasnach...@gmail.com> wrote:
>>>
>>>> Hello!
>>>>
>>>> What is your baseline topology at this moment? It means just that: you
>>>> have lost enough nodes of your distributed grid that data is nowhere to be
>>>> found now.
>>>>
>>>> Regards,
>>>> --
>>>> Ilya Kasnacheev
>>>>
>>>>
>>>> пн, 3 авг. 2020 г. в 19:12, John Smith :
>>>>
>>>>> I get the below exception on my client...
>>>>>
>>>>> #1 I rebooted the cache nodes error still continued.
>>>>> #2 restarted the client node error went away.
>>>>> #3 this seems to happen every few weeks.
>>>>> #4 is there some sort of timeout values and retries I can put?
>>>>> #5 cache operations seem to block when rebooting the nodes (I have 3
>>>>> nodes). Is there a way not to block?
>>>>>
>>>>> javax.cache.CacheException: class
>>>>> org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
>>>>> Failed to execute cache operation (all partition owners have left the 
>>>>> grid,
>>>>> partition data has been lost) [cacheName=xx, part=273, 
>>>>> key=16479796986]
>>>>> at
>>>>> org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
>>>>> at
>>>>> org.apache.ignite.internal.processors.cache.IgniteCacheFutureImpl.convertException(IgniteCacheFutureImpl.java:62)
>>>>> at
>>>>> org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:157)
>>>>> at
>>>>> com.xx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$executeAsync$394d953f$1(IgniteCacheRepository.java:59)
>>>>> at
>>>>> org.apache.ignite.internal.util.future.AsyncFutureListener$1.run(AsyncFutureListener.java:53)
>>>>> at
>>>>> com.xx.common.vertx.ext.data.impl.VertxIgniteExecutorAdapter.lambda$execute$0(VertxIgniteExecutorAdapter.java:18)
>>>>> at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
>>>>> at
>>>>> io.vertx.core.impl.EventLoopContext.lambda$executeAsync$0(EventLoopContext.java:38)
>>>>> at
>>>>> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>>>>> at
>>>>> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>>>>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
>>>>> at
>>>>> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>>>>> at
>>>>> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>>>>> at
>>>>> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>>>> at java.lang.Thread.run(Thread.java:748)
>>>>> Caused by:
>>>>> org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
>>>>> Failed to execute cache operation (all partition owners have left the 
>>>>> grid,
>>>>> partition data has been lost) [cacheName=xx, part=273, 
>>>>> key=16479796986]
>>>>> at
>>>>

Re: What does all partition owners have left the grid, partition data has been lost mean?

2020-08-05 Thread John Smith
I mean I have 3 nodes and the baseline is set to 3. Does that mean if I put 2
in the baseline then I can lose at least 1? If I remove one node from the
baseline, does that mean it will not store data?

Or is it better to have 3 baseline nodes and add a 4th node? In that case,
if I still lose a baseline node, will I still be able to do operations on
the cache?

On Wed, 5 Aug 2020 at 08:21, John Smith  wrote:

> I have 3 nodes and baseline topology is 3 so if I lose 1 I guess it's
> enough... Should it be 2?
>
> On Tue., Aug. 4, 2020, 10:57 a.m. Ilya Kasnacheev, <
> ilya.kasnach...@gmail.com> wrote:
>
>> Hello!
>>
>> What is your baseline topology at this moment? It means just that: you
>> have lost enough nodes of your distributed grid that data is nowhere to be
>> found now.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> пн, 3 авг. 2020 г. в 19:12, John Smith :
>>
>>> I get the below exception on my client...
>>>
>>> #1 I rebooted the cache nodes error still continued.
>>> #2 restarted the client node error went away.
>>> #3 this seems to happen every few weeks.
>>> #4 is there some sort of timeout values and retries I can put?
>>> #5 cache operations seem to block when rebooting the nodes (I have 3
>>> nodes). Is there a way not to block?
>>>
>>> javax.cache.CacheException: class
>>> org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
>>> Failed to execute cache operation (all partition owners have left the grid,
>>> partition data has been lost) [cacheName=xx, part=273, key=16479796986]
>>> at
>>> org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
>>> at
>>> org.apache.ignite.internal.processors.cache.IgniteCacheFutureImpl.convertException(IgniteCacheFutureImpl.java:62)
>>> at
>>> org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:157)
>>> at
>>> com.xx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$executeAsync$394d953f$1(IgniteCacheRepository.java:59)
>>> at
>>> org.apache.ignite.internal.util.future.AsyncFutureListener$1.run(AsyncFutureListener.java:53)
>>> at
>>> com.xx.common.vertx.ext.data.impl.VertxIgniteExecutorAdapter.lambda$execute$0(VertxIgniteExecutorAdapter.java:18)
>>> at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
>>> at
>>> io.vertx.core.impl.EventLoopContext.lambda$executeAsync$0(EventLoopContext.java:38)
>>> at
>>> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>>> at
>>> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
>>> at
>>> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>>> at
>>> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>>> at
>>> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>>> at java.lang.Thread.run(Thread.java:748)
>>> Caused by:
>>> org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
>>> Failed to execute cache operation (all partition owners have left the grid,
>>> partition data has been lost) [cacheName=xx, part=273, key=16479796986]
>>> at
>>> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validatePartitionOperation(GridDhtTopologyFutureAdapter.java:161)
>>> at
>>> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:116)
>>> at
>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:417)
>>> at
>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:248)
>>> at
>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$26.apply(GridDhtAtomicCache.java:1146)
>>> at
>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$26.apply(GridDhtAtomicCache.java:1144)
>>> at
>>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.asyncOp(GridDhtAto

Re: What does all partition owners have left the grid, partition data has been lost mean?

2020-08-05 Thread John Smith
I have 3 nodes and baseline topology is 3 so if I lose 1 I guess it's
enough... Should it be 2?

On Tue., Aug. 4, 2020, 10:57 a.m. Ilya Kasnacheev, <
ilya.kasnach...@gmail.com> wrote:

> Hello!
>
> What is your baseline topology at this moment? It means just that: you
> have lost enough nodes of your distributed grid that data is nowhere to be
> found now.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> пн, 3 авг. 2020 г. в 19:12, John Smith :
>
>> I get the below exception on my client...
>>
>> #1 I rebooted the cache nodes error still continued.
>> #2 restarted the client node error went away.
>> #3 this seems to happen every few weeks.
>> #4 is there some sort of timeout values and retries I can put?
>> #5 cache operations seem to block when rebooting the nodes (I have 3
>> nodes). Is there a way not to block?
>>
>> javax.cache.CacheException: class
>> org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
>> Failed to execute cache operation (all partition owners have left the grid,
>> partition data has been lost) [cacheName=xx, part=273, key=16479796986]
>> at
>> org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
>> at
>> org.apache.ignite.internal.processors.cache.IgniteCacheFutureImpl.convertException(IgniteCacheFutureImpl.java:62)
>> at
>> org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:157)
>> at
>> com.xx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$executeAsync$394d953f$1(IgniteCacheRepository.java:59)
>> at
>> org.apache.ignite.internal.util.future.AsyncFutureListener$1.run(AsyncFutureListener.java:53)
>> at
>> com.xx.common.vertx.ext.data.impl.VertxIgniteExecutorAdapter.lambda$execute$0(VertxIgniteExecutorAdapter.java:18)
>> at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
>> at
>> io.vertx.core.impl.EventLoopContext.lambda$executeAsync$0(EventLoopContext.java:38)
>> at
>> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
>> at
>> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
>> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
>> at
>> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>> at
>> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>> at
>> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by:
>> org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
>> Failed to execute cache operation (all partition owners have left the grid,
>> partition data has been lost) [cacheName=xx, part=273, key=16479796986]
>> at
>> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validatePartitionOperation(GridDhtTopologyFutureAdapter.java:161)
>> at
>> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:116)
>> at
>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:417)
>> at
>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:248)
>> at
>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$26.apply(GridDhtAtomicCache.java:1146)
>> at
>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$26.apply(GridDhtAtomicCache.java:1144)
>> at
>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.asyncOp(GridDhtAtomicCache.java:761)
>> at
>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1144)
>> at
>> org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.putAsync0(GridDhtAtomicCache.java:641)
>> at
>> org.apache.ignite.internal.processors.cache.GridCacheAdapter.putAsync(GridCacheAdapter.java:2828)
>> at
>> org.apache.ignite.internal.processors.cache.GridCacheAdapter.putAsync(GridCacheAdapter.java:2809)
>> at
>> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.putAsync0(IgniteCacheProxyImpl.java:1125)
>> at
>> org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.

What does all partition owners have left the grid, partition data has been lost mean?

2020-08-03 Thread John Smith
I get the below exception on my client...

#1 I rebooted the cache nodes error still continued.
#2 restarted the client node error went away.
#3 this seems to happen every few weeks.
#4 is there some sort of timeout values and retries I can put?
#5 cache operations seem to block when rebooting the nodes (I have 3
nodes). Is there a way not to block?

javax.cache.CacheException: class
org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
Failed to execute cache operation (all partition owners have left the grid,
partition data has been lost) [cacheName=xx, part=273, key=16479796986]
at
org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
at
org.apache.ignite.internal.processors.cache.IgniteCacheFutureImpl.convertException(IgniteCacheFutureImpl.java:62)
at
org.apache.ignite.internal.util.future.IgniteFutureImpl.get(IgniteFutureImpl.java:157)
at
com.xx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$executeAsync$394d953f$1(IgniteCacheRepository.java:59)
at
org.apache.ignite.internal.util.future.AsyncFutureListener$1.run(AsyncFutureListener.java:53)
at
com.xx.common.vertx.ext.data.impl.VertxIgniteExecutorAdapter.lambda$execute$0(VertxIgniteExecutorAdapter.java:18)
at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:369)
at
io.vertx.core.impl.EventLoopContext.lambda$executeAsync$0(EventLoopContext.java:38)
at
io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
at
io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497)
at
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by:
org.apache.ignite.internal.processors.cache.CacheInvalidStateException:
Failed to execute cache operation (all partition owners have left the grid,
partition data has been lost) [cacheName=xx, part=273, key=16479796986]
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validatePartitionOperation(GridDhtTopologyFutureAdapter.java:161)
at
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:116)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(GridNearAtomicSingleUpdateFuture.java:417)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.map(GridNearAtomicAbstractUpdateFuture.java:248)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$26.apply(GridDhtAtomicCache.java:1146)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$26.apply(GridDhtAtomicCache.java:1144)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.asyncOp(GridDhtAtomicCache.java:761)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(GridDhtAtomicCache.java:1144)
at
org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.putAsync0(GridDhtAtomicCache.java:641)
at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.putAsync(GridCacheAdapter.java:2828)
at
org.apache.ignite.internal.processors.cache.GridCacheAdapter.putAsync(GridCacheAdapter.java:2809)
at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.putAsync0(IgniteCacheProxyImpl.java:1125)
at
org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.putAsync(IgniteCacheProxyImpl.java:1114)
at
org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.putAsync(GatewayProtectedCacheProxy.java:832)
at
com.xx.common.vertx.ext.data.impl.IgniteCacheRepository.lambda$put$0(IgniteCacheRepository.java:27)
at
com.xx.common.vertx.ext.data.impl.IgniteCacheRepository.executeAsync(IgniteCacheRepository.java:55)
at
com.xx.common.vertx.ext.data.impl.IgniteCacheRepository.put(IgniteCacheRepository.java:27)
at com.xx.service.impl.xx.put(xx.java:52)
at com.xx.impl.xx.moV1(xx.java:134)
at com.xx.api.xx.moV1(xx.java:100)
at io.vertx.ext.web.impl.RouteState.handleContext(RouteState.java:1034)
at
io.vertx.ext.web.impl.RoutingContextImplBase.iterateNext(RoutingContextImplBase.java:131)
at
io.vertx.ext.web.impl.RoutingContextImpl.next(RoutingContextImpl.java:128)
at
com.xx.common.vertx.ext.web.handler.impl.JsonLoggerHandlerImpl.handle(JsonLoggerHandlerImpl.java:168)
at
com.xx.common.vertx.ext.web.handler.impl.JsonLoggerHandlerImpl.handle(JsonLoggerHandlerImpl.java:24)
at 
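
A sketch of one recovery path for this particular error, under the
assumption that the cache should fail operations on lost partitions rather
than serve possibly incomplete data (cache name made up): declare a
partition loss policy up front, and reset the lost state once all owners are
back in the grid.

import java.util.Collections;

import org.apache.ignite.Ignite;
import org.apache.ignite.cache.PartitionLossPolicy;
import org.apache.ignite.configuration.CacheConfiguration;

public class PartitionLossHandling {
    public static CacheConfiguration<Long, String> safeCache() {
        CacheConfiguration<Long, String> ccfg = new CacheConfiguration<>("myCache");
        // Fail reads and writes against lost partitions instead of silently
        // serving incomplete data.
        ccfg.setPartitionLossPolicy(PartitionLossPolicy.READ_WRITE_SAFE);
        return ccfg;
    }

    public static void recover(Ignite ignite) {
        // Once all partition owners are back in the baseline, clear the
        // lost-partition state so operations stop failing.
        ignite.resetLostPartitions(Collections.singleton("myCache"));
    }
}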

Re: How to do cache.get() on SQL table by primary key with multiple columns?

2020-07-15 Thread John Smith
Well, it's more because I would have to inject the Ignite instance into my
cache "repository", which abstracts away only the IgniteCache.

And if getting the builder is not an expensive operation, I guess I can find
a way to pass the Ignite instance down into the cache repo implementation.
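
For the record, the lookup itself would look roughly like this sketch (it
assumes the table was created with KEY_TYPE=MyKey and that the SQL columns
are stored upper-case, which is the default for unquoted identifiers):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;

public class CompositeKeyGet {
    public static BinaryObject getByCompositeKey(Ignite ignite, int col1, int col2) {
        // Work with binary objects directly so no key/value classes are needed.
        IgniteCache<BinaryObject, BinaryObject> cache =
            ignite.cache("SQL_PUBLIC_MY_TABLE").withKeepBinary();

        BinaryObject key = ignite.binary().builder("MyKey") // matches KEY_TYPE
            .setField("COLUMN1", col1)
            .setField("COLUMN2", col2)
            .build();

        return cache.get(key);
    }
}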

On Wed, 15 Jul 2020 at 17:45, Evgenii Zhuravlev 
wrote:

> John,
>
> Then you should just get a new builder every time when you need it:
> myIgniteInstance.binary().builder("MyKey"). I don't see why you need to
> reuse builder from multiple threads here.
>
> Evgenii
>
> ср, 15 июл. 2020 г. в 14:34, John Smith :
>
>> I'm using it in Vert.x, if you understand the concept a bit. I have 2
>> verticles.
>>
>> I create 2 instances of BinaryObjectBuilder
>>
>> Each builder creates a new object (binary key) per "event" that comes in.
>>
>> So if I get 2 events then each builder will build one...
>>
>> If I get 3 events, the 3rd event will wait until one of the event loops
>> can process the next event...
>>
>>
>>
>> On Wed., Jul. 15, 2020, 3:43 p.m. Evgenii Zhuravlev, <
>> e.zhuravlev...@gmail.com> wrote:
>>
>>> 1. This builder can be used for making one object, do you want to
>>> construct one object from multiple threads?
>>> 2. No, you still can work with BinaryObjects instead of actual classes.
>>>
>>> Evgenii
>>>
>>> ср, 15 июл. 2020 г. в 08:50, John Smith :
>>>
>>>> Hi Evgenii, it works good. I have two questions...
>>>>
>>>> 1- Is the BinaryObjectBuilder obtained from
>>>> myIgniteInstance.binary().builder("MyKey"); thread safe? Can I pass the
>>>> same builder to multiple instances of my cache "repository" wrapper I 
>>>> wrote?
>>>> 2- If we want to use the actual MyKey class then I suppose that needs
>>>> to be in the classpath on all nodes?
>>>>
>>>> On Wed, 15 Jul 2020 at 10:43, John Smith 
>>>> wrote:
>>>>
>>>>> Ok I will try it...
>>>>>
>>>>> On Tue, 14 Jul 2020 at 22:34, Evgenii Zhuravlev <
>>>>> e.zhuravlev...@gmail.com> wrote:
>>>>>
>>>>>> John,
>>>>>>
>>>>>> It's not necessary to have the class at all; you can specify any type,
>>>>>> you just need to use this type when creating the binary object for this key.
>>>>>>
>>>>>> вт, 14 июл. 2020 г. в 17:50, John Smith :
>>>>>>
>>>>>>> I just used two columns as primary key...
>>>>>>>
>>>>>>> If I use key_type and specify a type, does that class need to exist
>>>>>>> in the classpath of the server nodes?
>>>>>>>
>>>>>>> Like if I have
>>>>>>>
>>>>>>> class MyKeyClass {
>>>>>>>Integer col1;
>>>>>>>Integer col2;
>>>>>>> }
>>>>>>>
>>>>>>> Does this class need to be loaded on all nodes, or can Ignite figure
>>>>>>> it out and marshal it?
>>>>>>>
>>>>>>> On Tue., Jul. 14, 2020, 6:50 p.m. Evgenii Zhuravlev, <
>>>>>>> e.zhuravlev...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi John,
>>>>>>>>
>>>>>>>> To do this, you need to create a key object with the same type as
>>>>>>>> you have for the table. If you don't specify KEY_TYPE in the create 
>>>>>>>> table
>>>>>>>> script, it will be generated automatically. I would recommend to 
>>>>>>>> specify it
>>>>>>>> for the command(just type name, if you don't have a class) and, when 
>>>>>>>> you
>>>>>>>> need to get data using key-value API, just create a binary object of 
>>>>>>>> this
>>>>>>>> type with these fields:
>>>>>>>> https://www.gridgain.com/docs/latest/developers-guide/key-value-api/binary-objects#creating-and-modifying-binary-objects
>>>>>>>>
>>>>>>>> Evgenii
>>>>>>>>
>>>>>>>> вт, 14 июл. 2020 г. в 07:18, John Smith :
>>>>>>>>
>>>>>>>>> Hi, I have an SQL table
>>>>>>>>>
>>>>>>>>> create table if not exists my_table (
>>>>>>>>> column1 int,
>>>>>>>>> column2 int,
>>>>>>>>> column3 varchar(16),
>>>>>>>>> PRIMARY KEY (column1, column2)
>>>>>>>>> ) with "template=replicatedTpl";
>>>>>>>>>
>>>>>>>>> and I'm creating my near cache as follows...
>>>>>>>>>
>>>>>>>>> IgniteCache<BinaryObject, BinaryObject> myCache;
>>>>>>>>>
>>>>>>>>> NearCacheConfiguration<BinaryObject, BinaryObject> nearConfig = new
>>>>>>>>> NearCacheConfiguration<>();
>>>>>>>>> nearConfig.setNearEvictionPolicyFactory(new
>>>>>>>>> LruEvictionPolicyFactory<>(1024));
>>>>>>>>>
>>>>>>>>> myCache =
>>>>>>>>> this.ignite.getOrCreateNearCache(SQL_PUBLIC_MY_TABLE, nearConfig)
>>>>>>>>> .withExpiryPolicy(new AccessedExpiryPolicy(new
>>>>>>>>> Duration(TimeUnit.HOURS, 1)));
>>>>>>>>>
>>>>>>>>> So if I use myCache.get()...
>>>>>>>>>
>>>>>>>>> 1- How do I specify the primary key if it's 2 columns?
>>>>>>>>> 2- I assume the data will be put in near cache?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>


Re: How to do cache.get() on SQL table by primary key with multiple columns?

2020-07-15 Thread John Smith
I'm using it in Vert.x, if you understand the concept a bit. I have 2
verticles.

I create 2 instances of BinaryObjectBuilder

Each builder creates a new object (binary key) per "event" that comes in.

So if I get 2 events then each builder will build one...

If I get 3 events, the 3rd event will wait until one of the event loops can
process the next event...
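
For later readers, a minimal sketch of that per-event pattern (the event
address, payload shape and the "MyKey" type are illustrative assumptions; as
noted elsewhere in this thread, one builder instance produces one object, so
each event gets its own builder):

// Inside a verticle: obtain a fresh builder for every incoming event.
// For SQL-created tables the binary field names usually match the
// upper-cased column names of the unquoted DDL.
vertx.eventBus().<JsonObject>consumer("events", msg -> {
    BinaryObject key = ignite.binary().builder("MyKey") // one builder, one object
        .setField("COLUMN1", msg.body().getInteger("col1"))
        .setField("COLUMN2", msg.body().getInteger("col2"))
        .build();

    BinaryObject row = cache.withKeepBinary().get(key);
});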





cache.getAsync() blocks if cluster is not activated.

2020-07-15 Thread John Smith
Hi, testing some failover scenarios etc...

When we call cache.getAsync() while the cluster is not active, it seems to
block.

I implemented a cache repository as follows, using Vert.x. It seems to
block at cacheOperation.apply(cache).

So when I call myRepo.get(myKey), which underneath applies
cache.getAsync(), it blocks.

import java.util.concurrent.TimeUnit;
import java.util.function.Function;

import javax.cache.processor.EntryProcessor;

import org.apache.ignite.IgniteCache;
import org.apache.ignite.lang.IgniteFuture;

import io.vertx.core.Future;
import io.vertx.core.Vertx;

// CacheRepository and VertxIgniteExecutorAdapter are our own classes.
public class IgniteCacheRepository<K, V> implements CacheRepository<K, V> {
    public final long DEFAULT_OPERATION_TIMEOUT = 1000;
    private final TimeUnit DEFAULT_TIMEOUT_UNIT = TimeUnit.MILLISECONDS;

    private final Vertx vertx;
    private final IgniteCache<K, V> cache;

    public IgniteCacheRepository(Vertx vertx, IgniteCache<K, V> cache) {
        this.vertx = vertx;
        this.cache = cache;
    }

    @Override
    public Future<Void> put(K key, V value) {
        return executeAsync(cache -> cache.putAsync(key, value),
            DEFAULT_OPERATION_TIMEOUT, DEFAULT_TIMEOUT_UNIT);
    }

    @Override
    public Future<V> get(K key) {
        return executeAsync(cache -> cache.getAsync(key),
            DEFAULT_OPERATION_TIMEOUT, DEFAULT_TIMEOUT_UNIT);
    }

    @Override
    public <T> Future<T> invoke(K key, EntryProcessor<K, V, T> processor, Object... arguments) {
        return executeAsync(cache -> cache.invokeAsync(key, processor, arguments),
            DEFAULT_OPERATION_TIMEOUT, DEFAULT_TIMEOUT_UNIT);
    }

    @Override
    @SuppressWarnings("unchecked")
    public <T> T cache() {
        return (T) cache;
    }

    /**
     * Adapt an Ignite async operation to a Vert.x future.
     *
     * @param cacheOperation The Ignite operation to execute asynchronously.
     * @return The value from the cache operation.
     */
    private <T> Future<T> executeAsync(Function<IgniteCache<K, V>, IgniteFuture<T>> cacheOperation,
            long timeout, TimeUnit unit) {
        Future<T> future = Future.future();

        try {
            IgniteFuture<T> value = cacheOperation.apply(cache);

            // Complete the Vert.x future on the verticle's context thread.
            value.listenAsync(result -> {
                try {
                    future.complete(result.get(timeout, unit));
                } catch (Exception ex) {
                    future.fail(ex);
                }
            }, VertxIgniteExecutorAdapter.getOrCreate(vertx.getOrCreateContext()));
        } catch (Exception ex) {
            // Catch runtime exceptions that the Ignite cache can throw.
            future.fail(ex);
        }

        return future;
    }
}
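
One way to keep this from stalling the event loop, sketched under the
assumption that the repository also holds a reference "ignite" to the Ignite
instance that produced the cache (it does not in the code above): fail fast
while the cluster is inactive instead of letting getAsync() block.

// Hypothetical guard around executeAsync(): reject operations up front
// while the cluster is not active.
private <T> Future<T> executeAsyncGuarded(Function<IgniteCache<K, V>, IgniteFuture<T>> op,
        long timeout, TimeUnit unit) {
    if (!ignite.cluster().active())
        return Future.failedFuture(new IllegalStateException("Cluster is not active"));

    return executeAsync(op, timeout, unit);
}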


Re: How to do cache.get() on SQL table by primary key with multiple columns?

2020-07-15 Thread John Smith
Hi Evgenii, it works well. I have two questions...

1- Is the BinaryObjectBuilder obtained from
myIgniteInstance.binary().builder("MyKey") thread-safe? Can I pass the
same builder to multiple instances of the cache "repository" wrapper I wrote?
2- If we want to use the actual MyKey class, then I suppose it needs to be
in the classpath on all nodes?



Re: How to do cache.get() on SQL table by primary key with multiple columns?

2020-07-15 Thread John Smith
Ok I will try it...



Re: How to do cache.get() on SQL table by primary key with multiple columns?

2020-07-14 Thread John Smith
I just used two columns as the primary key...

If I use KEY_TYPE and specify a type, does that class need to exist in the
classpath of the server nodes?

Like if I have

class MyKeyClass {
   Integer col1;
   Integer col2;
}

Does this class need to be loaded on all nodes, or can Ignite figure it out
and marshal it?
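
As the replies in this thread confirm, the class does not have to exist on
any node if you stay in binary form. A sketch (the cache name is
hypothetical; "MyKeyClass" is only used as a binary type name, no class
required anywhere):

// No MyKeyClass on any node's classpath: build the key and read the value
// purely as BinaryObjects.
BinaryObject key = ignite.binary().builder("MyKeyClass")
    .setField("col1", 1)
    .setField("col2", 2)
    .build();

IgniteCache<BinaryObject, BinaryObject> binCache =
    ignite.cache("myCache").withKeepBinary();
BinaryObject val = binCache.get(key);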



How to do cache.get() on SQL table by primary key with multiple columns?

2020-07-14 Thread John Smith
Hi, I have an SQL table

create table if not exists my_table (
column1 int,
column2 int,
column3 varchar(16),
PRIMARY KEY (column1, column2)
) with "template=replicatedTpl";

and I'm creating my near cache as follows...

IgniteCache myCache;

NearCacheConfiguration nearConfig = new
NearCacheConfiguration<>();
nearConfig.setNearEvictionPolicyFactory(new
LruEvictionPolicyFactory<>(1024));

myCache =
this.ignite.getOrCreateNearCache(SQL_PUBLIC_MY_TABLE, nearConfig)
.withExpiryPolicy(new AccessedExpiryPolicy(new Duration(TimeUnit.HOURS,
1)));

So if I use myCache.get()...

1- How do I specify the primary key if it's 2 columns?
2- I assume the data will be put in near cache?
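
For the record, the approach the thread converges on: declare an explicit
key type in the DDL (e.g. "template=replicatedTpl,key_type=MyKey" in the
WITH clause) and look rows up with a binary key of that type. A sketch with
illustrative names; for unquoted DDL the binary field names typically match
the upper-cased column names:

BinaryObject key = ignite.binary().builder("MyKey")
    .setField("COLUMN1", 10)
    .setField("COLUMN2", 20)
    .build();

IgniteCache<BinaryObject, BinaryObject> binCache =
    ignite.cache(SQL_PUBLIC_MY_TABLE).withKeepBinary();
BinaryObject row = binCache.get(key);

As for question 2: values read through a near-cache-enabled client cache are
expected to land in the near cache, subject to the LRU policy configured
above.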


How to get local node near cache metrics?

2020-07-13 Thread John Smith
Hi, I want to find out how many entries are in the thick client's near
cache. Is there a way?
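
A sketch of one way to read this locally on the thick client, assuming
"myCache" is the near-cache-enabled instance: localSize() with the NEAR peek
mode counts only near-cache entries.

import org.apache.ignite.cache.CachePeekMode;

// Entries currently held in this client's near cache:
int nearEntries = myCache.localSize(CachePeekMode.NEAR);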


Re: How to do address resolution?

2020-07-09 Thread John Smith
You mean the connection config for Visor to connect to the cluster?

On Wed., Jul. 8, 2020, 5:53 p.m. Humphrey,  wrote:

> Not sure if this will help; I've also had issues with Visor hanging the
> cluster.
>
> When I changed the configuration to ClientMode (default is ServerMode) it
> solved my problem. Might be good to give it a try.
>
> Humphrey
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
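
For reference, the programmatic flavor of that suggestion, a sketch (the XML
equivalent is the "clientMode" property on IgniteConfiguration):

// Start the node that Visor attaches from as a thick client, not a server:
IgniteConfiguration cfg = new IgniteConfiguration().setClientMode(true);
Ignite ignite = Ignition.start(cfg);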


Re: What does all partition owners have left the grid on the client side mean?

2020-07-08 Thread John Smith
Yeah, I restarted the server nodes. But I guess the client didn't
reconnect... Hmm.

On Tue., Jul. 7, 2020, 5:52 p.m. Evgenii Zhuravlev, <
e.zhuravlev...@gmail.com> wrote:

> John,
>
> Unfortunately, I didn't find messages about lost partitions for this
> cache; there is a chance that it happened before. What partition loss
> policy do you have?
>
> The logs say that there is a problem with partition distribution:
>  "Local node affinity assignment distribution is not ideal [cache=cache1,
> expectedPrimary=512.00, actualPrimary=493, expectedBackups=512.00,
> actualBackups=171, warningThreshold=50.00%]"
> How do you restart nodes? Do you wait until rebalancing has completed?
>
> Evgenii
>
>
>
> пт, 3 июл. 2020 г. в 09:03, John Smith :
>
>> Hi Evgenii, did you have a chance to look at the latest logs?
>>
>> On Thu, 25 Jun 2020 at 11:32, John Smith  wrote:
>>
>>> Ok
>>>
>>> stdout.copy.zip
>>>
>>> https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0
>>>
>>> On Thu, 25 Jun 2020 at 11:01, John Smith  wrote:
>>>
>>>> Because in between it's all the business logs. Let me make sure I
>>>> didn't filter anything relevant. So maybe in those 13 hours nothing
>>>> happened?
>>>>
>>>>
>>>> On Thu, 25 Jun 2020 at 10:53, Evgenii Zhuravlev <
>>>> e.zhuravlev...@gmail.com> wrote:
>>>>
>>>>> This doesn't seem to be a full log. There is a gap for more than 13
>>>>> hours in the log :
>>>>> {"appTimestamp":"2020-06-23T23:06:41.658+00:00","threadName":"ignite-update-notifier-timer","level":"WARN","loggerName":"org.apache.ignite.internal.processors.cluster.GridUpdateNotifier","message":"New
>>>>> version is available at ignite.apache.org: 2.8.1"}
>>>>> {"appTimestamp":"2020-06-24T12:58:42.294+00:00","threadName":"disco-event-worker-#35%xx%","level":"INFO","loggerName":"org.apache.ignite.internal.managers.discovery.GridDiscoveryManager","message":"Node
>>>>> left topology: TcpDiscoveryNode [id=02949ae0-4eea-4dc9-8aed-b3f50e8d7238,
>>>>> addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, xxx.xxx.xxx.73],
>>>>> sockAddrs=[0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0,
>>>>> xx-task-0003/xxx.xxx.xxx.73:0], discPort=0, order=1258, intOrder=632,
>>>>> lastExchangeTime=1592890182021, loc=false,
>>>>> ver=2.7.0#20181130-sha1:256ae401, isClient=true]"}
>>>>>
>>>>> I don't see any exceptions in the log. When did the issue happen? Can
>>>>> you share the full log?
>>>>>
>>>>> Evgenii
>>>>>
>>>>> чт, 25 июн. 2020 г. в 07:36, John Smith :
>>>>>
>>>>>> Hi Evgenii, same folder shared stdout.copy
>>>>>>
>>>>>> Just in case:
>>>>>> https://www.dropbox.com/sh/ejcddp2gcml8qz2/AAD_VfUecE0hSNZX7wGbfDh3a?dl=0
>>>>>>
>>>>>> On Wed, 24 Jun 2020 at 21:23, Evgenii Zhuravlev <
>>>>>> e.zhuravlev...@gmail.com> wrote:
>>>>>>
>>>>>>> No, it's not. It's not clear when it happened and what was with the
>>>>>>> cluster and the client node itself at this moment.
>>>>>>>
>>>>>>> Evgenii
>>>>>>>
>>>>>>> ср, 24 июн. 2020 г. в 18:16, John Smith :
>>>>>>>
>>>>>>>> Ok I'll try... The stack trace isn't enough?
>>>>>>>>
>>>>>>>> On Wed., Jun. 24, 2020, 4:30 p.m. Evgenii Zhuravlev, <
>>>>>>>> e.zhuravlev...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> John, right, didn't notice them before. Can you share the full log
>>>>>>>>> for the client node with an issue?
>>>>>>>>>
>>>>>>>>> Evgenii
>>>>>>>>>
>>>>>>>>> ср, 24 июн. 2020 г. в 12:29, John Smith :
>>>>>>>>>
>>>>>>>>>> I thought I did! The link doesn't have them?
>>>>>>>>>>
>>>>>>>>>> On Wed., Jun. 24, 2020, 2:43 p.m. Evgenii Zhuravlev, <
>>>>>&
