Re: Issue with establishing the baseline after nodes restart

2022-06-21 Thread Ray Zhang
Thanks for the information, Veena.

This is what I figured out too. Here is what I did to find the offending
client: I enabled REST API access on the Ignite servers and then queried
the node information using the node ID mentioned in the error message. The
response prints the IP of the node that is having the issue, and restarting
that node resolves that particular error. As you said, another similar
error may then pop up, and the process needs to be repeated until all of
those clients have been restarted.

curl 'localhost:8081/ignite?cmd=node&id=PUT_NODE_ID_HERE' 2>/dev/null | jq
'.response|.consistentId+.tcpHostNames[0]'
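
Once every offending client has been restarted, the baseline itself can
also be re-set programmatically instead of via control.sh. A minimal,
untested sketch (it assumes a server node handle started with your own XML
config; the config path is a placeholder):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCluster;
import org.apache.ignite.Ignition;

public class ResetBaseline {
    public static void main(String[] args) {
        // Start (or join as) a server node using your existing XML config.
        try (Ignite ignite = Ignition.start("config/persistent-config.xml")) {
            IgniteCluster cluster = ignite.cluster();
            cluster.active(true); // make sure the cluster is active first
            // Set the baseline to the current topology version.
            cluster.setBaselineTopology(cluster.topologyVersion());
        }
    }
}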

Ray



On Tue, Jun 21, 2022 at 6:08 AM Veena Mithare wrote:

> Hello,
>
> Not sure if you are still facing this issue.
>
> When we faced this issue, we would find which client node the baseline
> command said was not found, and restart that client node.
>
> Sometimes after restarting that client node, the baseline command would
> say the same thing about another client node, so we would need to restart
> that one too, and so on.
>
> Once all the nodes that the baseline command mentions have been restarted,
> the baseline command works correctly.
>
> regards,
> Veena.
>
> On Thu, May 26, 2022 at 2:04 AM Ray Zhang  wrote:
>
>> Hi all,
>>
>> I have been having a hard time finding a proper procedure to re-establish
>> the baseline of a 3-node Ignite cluster after all of the nodes restart.
>> I am running Ignite 2.8.1.
>>
>> The baseline command is basically not functional at all after all the
>> nodes in the Ignite cluster were rolling-restarted.
>>
>> If I check the "top" view in the visor, the output of the command shows
>> the 3 expected nodes that form the cluster, and all caches are in good
>> status.
>>
>> Here is the output of the command. It seems to suggest one of the nodes
>> is no longer found, and the same output is returned when the command is
>> run on any of the nodes in the cluster.
>>
>> My question is: how can I reset the baseline state? In the current
>> state, no matter what command I provide, whether to add, remove, or set a
>> new baseline with the control.sh --baseline command, it always returns
>> the same error. I have also tried deactivating and then reactivating the
>> cluster, and restarting the nodes a number of times, but none of these
>> steps work.
>>
>> Is this a known bug? Any suggestions to get out of this state?
>>
>> /usr/share/apache-ignite/bin/control.sh --baseline
>> Control utility [ver. 2.8.1#20200521-sha1:86422096]
>> 2020 Copyright(C) Apache Software Foundation
>> User: root
>> Time: 2022-05-26T00:51:26.874
>> Command [BASELINE] started
>> Arguments: --baseline
>>
>> 
>> Failed to execute baseline command='collect'
>> Node with id=b23787ee-dbc2-4407-93a5-d5f92ff450ad not found
>> Check arguments. Node with id=b23787ee-dbc2-4407-93a5-d5f92ff450ad not
>> found
>> Command [BASELINE] finished with code: 1
>> Control utility has completed execution at: 2022-05-26T00:51:27.161
>> Execution time: 287 ms
>>
>> Thanks.
>>
>> Ray
>>
>>
>>
>
>

-- 
Ray


Issue with establishing the baseline after nodes restart

2022-05-25 Thread Ray Zhang
Hi all,

I have been having a hard time finding a proper procedure to re-establish
the baseline of a 3-node Ignite cluster after all of the nodes restart.
I am running Ignite 2.8.1.

The baseline command is basically not functional at all after all the nodes
in the Ignite cluster were rolling-restarted.

If I check the "top" view in the visor, the output of the command shows the
3 expected nodes that form the cluster, and all caches are in good status.

Here is the output of the command. It seems to suggest one of the nodes is
no longer found, and the same output is returned when the command is run on
any of the nodes in the cluster.

My question is: how can I reset the baseline state? In the current state,
no matter what command I provide, whether to add, remove, or set a new
baseline with the control.sh --baseline command, it always returns the
same error. I have also tried deactivating and then reactivating the
cluster, and restarting the nodes a number of times, but none of these
steps work.

Is this a known bug? Any suggestions to get out of this state?

/usr/share/apache-ignite/bin/control.sh --baseline
Control utility [ver. 2.8.1#20200521-sha1:86422096]
2020 Copyright(C) Apache Software Foundation
User: root
Time: 2022-05-26T00:51:26.874
Command [BASELINE] started
Arguments: --baseline

Failed to execute baseline command='collect'
Node with id=b23787ee-dbc2-4407-93a5-d5f92ff450ad not found
Check arguments. Node with id=b23787ee-dbc2-4407-93a5-d5f92ff450ad not
found
Command [BASELINE] finished with code: 1
Control utility has completed execution at: 2022-05-26T00:51:27.161
Execution time: 287 ms

Thanks.

Ray






Re: SQL support for (insert if not exists, update if exists) operation.

2018-12-05 Thread Ray
Hello Ilya,

Thanks for the reply.
I've tried MERGE INTO; it does not satisfy my needs.

For example,
I create a table with
create table a(a varchar, b varchar,c varchar, primary key(a));

And I inserted one record into table a with 
merge into a(a,b) values('1','1');

Then I want to update this record by adding a value to column c.
But when I try this SQL

merge into a(a,c) values('1','1');

The value of column b is deleted.
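
For what it's worth, since MERGE rewrites the whole row, a partial-column
change has to go through UPDATE instead. A minimal, untested JDBC sketch
(the connection host is a placeholder):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PartialUpdate {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/");
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("merge into a(a, b) values('1', '1')");
            // UPDATE only touches the listed columns, so b keeps its value.
            stmt.executeUpdate("update a set c = '1' where a = '1'");
        }
    }
}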






SQL support for (insert if not exists, update if exists) operation.

2018-12-05 Thread Ray
I want to know if it's possible to do a "replace into" like MySQL's in
Ignite, because currently I use the following code to implement this
(insert if not exists, update if exists) logic.
  if not exists (select 1 from t where id = 1) 
  insert into t(id, update_time) values(1, getdate())
  else
  update t set update_time = getdate() where id = 1

In MySQL, I can use this one-liner SQL to do the job.

replace into t(id, update_time) values(1, now());
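
For reference, Ignite's closest equivalent is MERGE INTO, with the caveat
that it replaces the whole row on conflict. A minimal, untested sketch
using the Java cache API; the cache name SQL_PUBLIC_T (PUBLIC schema) and
an already-running node that sees table t are assumptions:

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class UpsertDemo {
    public static void main(String[] args) {
        Ignite ignite = Ignition.ignite(); // handle to an already-started node

        // Runs as an upsert: inserts the row if absent, replaces it if present.
        List<List<?>> res = ignite.cache("SQL_PUBLIC_T")
            .query(new SqlFieldsQuery(
                "merge into t(id, update_time) values(1, now())"))
            .getAll();

        System.out.println(res);
    }
}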






Re: ODBC driver build error

2018-12-02 Thread Ray
Hello Ilya,

The thing is, I just want to apply a few bug fixes to Ignite 2.6.
The nightly release will include all the new tickets merged to master since
2.6, if my understanding is correct.







Re: ODBC driver build error

2018-11-29 Thread Ray
Thank you for the reply Igor,

After adding legacy_stdio_definitions.lib to the linker's input, I built the
ODBC driver successfully.

I want to build the ODBC driver myself because I want to apply this ticket
to Ignite 2.6, since 2.7 is not yet released.

https://issues.apache.org/jira/browse/IGNITE-8930

By the way, I followed the instructions in
modules/platforms/cpp/DEVNOTES.txt; I think this document should be updated.





Re: Failed to get page IO instance (page content is corrupted) after onenode failed when trying to reboot.

2018-11-28 Thread Ray Liu (rayliu)
Here's my analysis

It looks like I encountered this bug:
https://issues.apache.org/jira/browse/IGNITE-8659
because in the log file ignite-9d66b750-first-restart-node1.log, I see
[2018-11-29T03:01:39,135][INFO 
][exchange-worker-#162][GridCachePartitionExchangeManager] Rebalancing started 
[top=AffinityTopologyVersion [topVer=11834, minorTopVer=0], evt=NODE_JOINED, 
node=6018393e-a88c-40f5-8d77-d136d5226741]
[2018-11-29T03:01:39,136][INFO 
][exchange-worker-#162][GridDhtPartitionDemander] Starting rebalancing 
[grp=SQL_PUBLIC_WBXSITEACCOUNT, mode=ASYNC, 
fromNode=6018393e-a88c-40f5-8d77-d136d5226741, partitionsCount=345, 
topology=AffinityTopologyVersion [topVer=11834, minorTopVer=0], rebalanceId=47]

But why did rebalancing start two hours after the node started?
Is it because PME was stuck for two hours?

Also, it looks like PME got stuck again when rebalancing started (this is
when I restarted node2 and node3).
Because in the same log file, I see
[2018-11-29T03:01:59,443][WARN 
][exchange-worker-#162][GridDhtPartitionsExchangeFuture] Unable to await 
partitions release latch within timeout: ServerLatch [permits=2, 
pendingAcks=[6018393e-a88c-40f5-8d77-d136d5226741, 
75a180ea-78de-4d63-8bd5-291557bd58f4], super=CompletableLatch [id=exchange, 
topVer=AffinityTopologyVersion [topVer=11835, minorTopVer=0]]]

Based on this document
https://apacheignite.readme.io/docs/rebalancing#section-rebalance-modes,
rebalancing is async by default.
So what is blocking PME this time?

So basically I have three questions.
1. Why can't node1 join the cluster ("Still waiting for initial partition
map exchange" for two hours) when restarted?
Is it because node2 and node3 received some newly ingested data while node1
was down?

2. Why is node3 blocked by "Unable to await partitions release latch within
timeout" when restarted?

3. Is https://issues.apache.org/jira/browse/IGNITE-8659 the solution?

Andrew, can you take a look please?
I think it's a critical problem, because the only way to get node1 working
is to delete the data and wal folders.
Needless to say, this causes data loss.

Thanks

Ray wrote:

This issue happened again.

Here's the summary.
I'm running a three-node Ignite 2.6 cluster with this config


<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd">

  [... bean definitions stripped by the mailing-list archive; only the
  discovery addresses survived ...]

                node1:49500
                node2:49500
                node3:49500

  [... rest of the XML configuration stripped by the mailing-list archive ...]
I have a few caches set up with TTL and with persistence enabled.
I'm mentioning this because I checked this thread

http://apache-ignite-users.70518.x6.nabble.com/And-again-Failed-to-get-page-IO-instance-page-content-is-corrupted-td20095.html#a22037
and a few tickets mentioned in it:
https://issues.apache.org/jira/browse/IGNITE-8659
https://issues.apache.org/jira/browse/IGNITE-5874
Other issues are ignored because they're already fixed in 2.6.


Node1 went down because of a long GC pause.
When I tried to restart the Ignite service on node1, the "Still waiting for
initial partition map exchange" warning kept being logged for more than 2 hours.
[WARN ][main][GridCachePartitionExchangeManager] Still waiting for initial
partition map exchange [fut=GridDhtPartitionsExchangeFuture
[firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode
[id=9d66b750-09a3-4f0e-afa9-7cf24847ee6a, addrs=[10.252.4.60, 127.0.0.1],
sockAddrs=[rpsj1ign001.webex.com/10.252.4.60:49500, /127.0.0.1:49500],
discPort=49500, order=11813, intOrder=5909, lastExchangeTime=1543451981558,
loc=true, ver=2.6.0#20180709-sha1:5faffcee, isClient=false], topVer=11813,
nodeId8=9d66b750, msg=null, type=NODE_JOINED, tstamp=1543451943071],
crd=TcpDiscoveryNode [id=f14c8e36-9a20-4668-b52e-0de64c743700,
addrs=[10.252.10.20, 127.0.0.1],
sockAddrs=[rpsj1ign003.web

Re: ODBC driver build error

2018-11-28 Thread Ray
Hi Igor,

Thanks for the reply.
I installed OpenSSL 1.0.2 instead, but there's another error.

Severity | Code | Description | Project | File | Line | Suppression State
Error   LNK1120 1 unresolved externals  odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\project\vs\Win32\Debug\ignite.odbc.dll
1   
Error   LNK2019 unresolved external symbol __vsnwprintf_s referenced in
function _StringVPrintfWorkerW@20   odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\project\vs\odbccp32.lib(dllload.obj)
1   






RE: Failed to get page IO instance (page content is corrupted) after onenode failed when trying to reboot.

2018-11-28 Thread Ray
This issue happened again.

Here's the summary.
I'm running a three-node Ignite 2.6 cluster with this config


<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd">

  [... bean definitions stripped by the mailing-list archive; only the
  discovery addresses survived ...]

                node1:49500
                node2:49500
                node3:49500

  [... rest of the XML configuration stripped by the mailing-list archive ...]
I have a few caches set up with TTL and with persistence enabled.
I'm mentioning this because I checked this thread
http://apache-ignite-users.70518.x6.nabble.com/And-again-Failed-to-get-page-IO-instance-page-content-is-corrupted-td20095.html#a22037
and a few tickets mentioned in it:
https://issues.apache.org/jira/browse/IGNITE-8659
https://issues.apache.org/jira/browse/IGNITE-5874
Other issues are ignored because they're already fixed in 2.6.


Node1 went down because of a long GC pause.
When I tried to restart the Ignite service on node1, the "Still waiting for
initial partition map exchange" warning kept being logged for more than 2 hours.
[WARN ][main][GridCachePartitionExchangeManager] Still waiting for initial
partition map exchange [fut=GridDhtPartitionsExchangeFuture
[firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode
[id=9d66b750-09a3-4f0e-afa9-7cf24847ee6a, addrs=[10.252.4.60, 127.0.0.1],
sockAddrs=[rpsj1ign001.webex.com/10.252.4.60:49500, /127.0.0.1:49500],
discPort=49500, order=11813, intOrder=5909, lastExchangeTime=1543451981558,
loc=true, ver=2.6.0#20180709-sha1:5faffcee, isClient=false], topVer=11813,
nodeId8=9d66b750, msg=null, type=NODE_JOINED, tstamp=1543451943071],
crd=TcpDiscoveryNode [id=f14c8e36-9a20-4668-b52e-0de64c743700,
addrs=[10.252.10.20, 127.0.0.1],
sockAddrs=[rpsj1ign003.webex.com/10.252.10.20:49500, /127.0.0.1:49500],
discPort=49500, order=2310, intOrder=1158, lastExchangeTime=1543451942304,
loc=false, ver=2.6.0#20180709-sha1:5faffcee, isClient=false],
exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion
[topVer=11813, minorTopVer=0], discoEvt=DiscoveryEvent
[evtNode=TcpDiscoveryNode [id=9d66b750-09a3-4f0e-afa9-7cf24847ee6a,
addrs=[10.252.4.60, 127.0.0.1],
sockAddrs=[rpsj1ign001.webex.com/10.252.4.60:49500, /127.0.0.1:49500],
discPort=49500, order=11813, intOrder=5909, lastExchangeTime=1543451981558,
loc=true, ver=2.6.0#20180709-sha1:5faffcee, isClient=false], topVer=11813,
nodeId8=9d66b750, msg=null, type=NODE_JOINED, tstamp=1543451943071],
nodeId=9d66b750, evt=NODE_JOINED], added=true, initFut=GridFutureAdapter
[ignoreInterrupts=false, state=INIT, res=null, hash=830022440], init=false,
lastVer=null, partReleaseFut=PartitionReleaseFuture
[topVer=AffinityTopologyVersion [topVer=11813, minorTopVer=0],
futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion
[topVer=11813, minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture
[topVer=AffinityTopologyVersion [topVer=11813, minorTopVer=0], futures=[]],
DataStreamerReleaseFuture [topVer=AffinityTopologyVersion [topVer=11813,
minorTopVer=0], futures=[]], LocalTxReleaseFuture
[topVer=AffinityTopologyVersion [topVer=11813, minorTopVer=0], futures=[]],
AllTxReleaseFuture [topVer=AffinityTopologyVersion [topVer=11813,
minorTopVer=0], futures=[RemoteTxReleaseFuture
[topVer=AffinityTopologyVersion [topVer=11813, minorTopVer=0],
futures=[]], exchActions=ExchangeActions [startCaches=null,
stopCaches=null, startGrps=[], stopGrps=[], resetParts=null,
stateChangeRequest=null], affChangeMsg=null, initTs=1543451943112,
centralizedAff=false, forceAffReassignment=false, changeGlobalStateE=null,
done=false, state=SRV, evtLatch=0,
remaining=[0126e998-0c18-452f-8f3b-b6dd4b2ae84c,
f14c8e36-9a20-4668-b52e-0de64c743700], super=GridFutureAdapter
[ignoreInterrupts=false, state=INIT, res=null, hash=773110813]]]

So I tried to reboot the Ignite service on node2 and node3.
But only node2 managed to join the cluster; node3 printed "Still waiting for
initial partition map exchange" for more than 30 minutes.

So I stopped all three nodes and restarted the Ignite service on them.
Then I got "Failed to get page IO instance (page content is corrupted)" on
node1.

[ERROR][exchange-worker-#162][] Critical system error detected. Will be
handled accordingly to configured handler 

ODBC driver build error

2018-11-28 Thread Ray
I'm trying to build the ODBC driver with Visual Studio 2017 on Windows.
I installed these dependencies:
Windows SDK 7.1
JDK 8
Win64 OpenSSL v1.1.1a from https://slproweb.com/products/Win32OpenSSL.html

I set OPENSSL_HOME=C:\Program Files\OpenSSL-Win64

When I try to build, I get these errors.

Severity | Code | Description | Project | File | Line | Suppression State
Error   C7525   inline variables require at least '/std:c++17'  odbc
d:\ignite-ignite-2.6\modules\platforms\cpp\odbc\include\ignite\odbc\ssl\ssl_bindings.h
133 
Error (active)  E0325   inline specifier allowed on function declarations only
odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\include\ignite\odbc\ssl\ssl_bindings.h
133 
Error (active)  E0018   expected a ')'  odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\include\ignite\odbc\ssl\ssl_bindings.h
133 
Error (active)  E0065   expected a ';'  odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\include\ignite\odbc\ssl\ssl_bindings.h
134 
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member
"SSL_set_tlsext_host_name_" odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
80  
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member "SSL_free_"
odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
86  
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member
"SSL_set_connect_state_"odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
91  
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member "SSL_free_"
odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
97  
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member
"SSL_get_peer_certificate"  odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
103 
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member "X509_free"
odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
105 
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member "SSL_free_"
odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
111 
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member "SSL_free_"
odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
124 
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member
"SSL_write_"odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
152 
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member
"SSL_pending_"  odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
172 
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member "SSL_read_"
odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
180 
Error (active)  E0109   expression preceding parentheses of apparent call must
have (pointer-to-) function typeodbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
206 
Error (active)  E0109   expression preceding parentheses of apparent call must
have (pointer-to-) function typeodbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
208 
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member
"SSLv23_client_method_" odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
216 
Error (active)  E0020   identifier "SSL_CTRL_OPTIONS" is undefined  odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
237 
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member
"BIO_new_ssl_connect"   odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
292 
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member
"BIO_set_nbio_" odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
301 
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member
"BIO_set_conn_hostname_"odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
315 
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member
"BIO_free_all"  odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
320 
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member
"BIO_get_ssl_"  odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
326 
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member
"BIO_free_all"  odbc
D:\ignite-ignite-2.6\modules\platforms\cpp\odbc\src\ssl\secure_socket_client.cpp
331 
Error (active)  E0135   namespace "ignite::odbc::ssl" has no member
"SSL_connect_"  odbc

Re: New added property with @QuerySqlField does not persist

2018-11-21 Thread Ray
Actually you can't, because the schema stored in the cache configuration is
the old one from before the column was added.

I've asked a similar question on dev list.

http://apache-ignite-developers.2346864.n4.nabble.com/Schema-in-CacheConfig-is-not-updated-after-DDL-commands-Add-drop-column-Create-drop-index-td38002.html





Re: SQL Support for IN predicate

2018-11-14 Thread Ray
I'm using Ignite 2.6.

Maybe the document is outdated?






Re: SQL Support for IN predicate

2018-11-14 Thread Ray
Hello Evgenii,

I have a question about the IN clause not using indexes.

From my local test results, it seems the IN clause does use indexes.

I have a table named aaa and this table has an index on field d.

The query plan indicates that the IN clause does use the index.

0: jdbc:ignite:thin://127.0.0.1/> explain select * from aaa where d in
('1','2','3');
PLAN  SELECT
__Z0.A AS __C0_0,
__Z0.B AS __C0_1,
__Z0.D AS __C0_2,
__Z0.E AS __C0_3
FROM PUBLIC.AAA __Z0
/* PUBLIC."aaa_d_asc_idx": D IN('1', '2', '3') */
WHERE __Z0.D IN('1', '2', '3')

PLAN  SELECT
__C0_0 AS A,
__C0_1 AS B,
__C0_2 AS D,
__C0_3 AS E
FROM PUBLIC.__T0
/* PUBLIC."merge_scan" */

2 rows selected (0.022 seconds)





Local node SEGMENTED error causing node goes down for no obvious reason

2018-11-07 Thread Ray
I'm running a six nodes Ignite 2.6 cluster.
The config for each server is as follows
































10.29.42.231:49500
10.29.42.233:49500
10.29.42.234:49500
10.29.42.235:49500
10.29.42.236:49500
10.29.42.232:49500














I also enabled the direct I/O plugin.

When I try to ingest data into Ignite using the Spark dataframe API, the
cluster becomes very slow after the Spark driver connects, and some of the
server nodes eventually go down with this error:

Local node SEGMENTED: TcpDiscoveryNode
[id=8ce23742-702e-4309-934a-affd80bf3653, addrs=[10.29.42.232, 127.0.0.1],
sockAddrs=[/10.29.42.232:49500, /127.0.0.1:49500], discPort=49500, order=2,
intOrder=2, lastExchangeTime=1541571124026, loc=true,
ver=2.6.0#20180709-sha1:5faffcee, isClient=false]
2018-11-07T06:12:04,032][INFO ][disco-pool-#457][TcpDiscoverySpi] Finished
node ping [nodeId=844fab1e-4189-4f10-bc84-b069bc18a267, res=true, time=6ms]
[2018-11-07T06:12:04,033][ERROR][tcp-disco-srvr-#2][] Critical system error
detected. Will be handled accordingly to configured handler [hnd=class
o.a.i.failure.StopNodeOrHaltFailureHandler, failureCtx=FailureContext
[type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread
tcp-disco-srvr-#2 is terminated unexpectedly.]]
java.lang.IllegalStateException: Thread tcp-disco-srvr-#2 is terminated
unexpectedly.
at
org.apache.ignite.spi.discovery.tcp.ServerImpl$TcpServer.body(ServerImpl.java:5687)
[ignite-core-2.6.0.jar:2.6.0]
at
org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
[ignite-core-2.6.0.jar:2.6.0]
[2018-11-07T06:12:04,036][ERROR][tcp-disco-srvr-#2][] JVM will be halted
immediately due to the failure: [failureCtx=FailureContext
[type=SYSTEM_WORKER_TERMINATION, err=java.lang.IllegalStateException: Thread
tcp-disco-srvr-#2 is terminated unexpectedly.]]

I examined the GC logs, and none of the nodes had long GC pauses.
The network interconnectivity between all these nodes is fine.

The complete logs for all six servers and client are in the attachment.


From my observation, the PME process that runs when a new thick client (the
Spark dataframe API) joins the topology is very slow and can lead to many
problems.
I think the proposal suggested by Nikolay to change thick clients to Java
thin clients is a good way to improve this.
http://apache-ignite-developers.2346864.n4.nabble.com/DISCUSSION-Spark-Data-Frame-through-Thin-Client-td36814.html

iglog.zip
  





Create index got stuck and freeze whole cluster.

2018-10-30 Thread Ray
I'm using a five-node Ignite 2.6 cluster.
When I try to create an index on a table with 10 million records using the
SQL "create index on table(a,b,c,d)", the whole cluster freezes and prints
the following log for 40 minutes.

2018-10-30T02:48:44,086][WARN
][exchange-worker-#162][GridDhtPartitionsExchangeFuture] Unable to await
partitions release latch within timeout: ServerLatch [permits=4,
pendingAcks=[20aa5929-3f26-4923-87a3-27b4f6d4f744,
ec5be25e-6601-468c-9f0e-7ab7c8caa9e9, 45819b05-a338-4bc4-b104-f0c7567fd49d,
cbb80db7-b342-4b97-ba61-97d57c194a1a], super=CompletableLatch [id=exchange,
topVer=AffinityTopologyVersion [topVer=202, minorTopVer=1]]]

I noticed one of the servers (log in server3.zip) is stuck in the
checkpoint process, and this server acts as the coordinator in PME.
In the log I see only 856610 pages need to be flushed to disk, but the
checkpoint takes 32 minutes to finish,
while another node takes 7 minutes to finish writing 919060 pages to disk.
Also, the disk usage on the slow checkpoint server is not at 100%.

Here's the whole log file for 5 servers.
server1.zip
  
server2.zip
  
server3.zip
  
server4.zip
  
server5.zip
  






How to use BinaryObject Cache API to query a table created from JDBC?

2018-10-12 Thread Ray
Let's say I create a table from a JDBC client using this command.

create table test(a varchar, b varchar, c varchar,primary key(a,b));

I inserted one record in this table.

select * from test;
+----+----+----+
| A  | B  | C  |
+----+----+----+
| 8  | 8  | 9  |
+----+----+----+

How can I query this table using BinaryObject Cache API?

I tried the code below, but it returns null for object o.

public class TestKey {

private String a;

private String b;

public TestKey() {
}

public String getA() {
return a;
}

public void setA(String a) {
this.a = a;
}

public String getB() {
return b;
}

public void setB(String b) {
this.b = b;
}
}



TestKey tk = new TestKey();
tk.setA("8");
tk.setB("8");
Object o = ignite.cache("SQL_PUBLIC_TEST").get(tk);
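
(A note for context: a table created with plain CREATE TABLE gets an
auto-generated binary key type name, so a POJO key like TestKey will not
match unless the table was created with KEY_TYPE. Below is a minimal,
untested sketch of the BinaryObject route; it assumes the table was created
with "... WITH \"key_type=TestKey\"" so the type name is known:)

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.binary.BinaryObject;

public class BinaryGet {
    public static void main(String[] args) {
        Ignite ignite = Ignition.ignite(); // handle to an already-started node

        // Build a binary key with the same fields as the SQL primary key.
        BinaryObject key = ignite.binary().builder("TestKey")
            .setField("a", "8")
            .setField("b", "8")
            .build();

        // withKeepBinary() returns the row as a BinaryObject instead of a POJO.
        Object row = ignite.cache("SQL_PUBLIC_TEST").withKeepBinary().get(key);
        System.out.println(row);
    }
}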





Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-09-28 Thread Ray
Actually there's only one row in b.

SELECT COUNT(*) FROM b where x = '1';
COUNT(*)  1

1 row selected (0.003 seconds)

Maybe it's because join performance drops dramatically when the data size
is more than 10 million rows, or when the cluster has a lot of clients
connected?
My 6-node cluster has 10 clients connected to it, and some of them have
slow network connectivity.







Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-09-28 Thread Ray
Here's the detailed information for my join test.

0: jdbc:ignite:thin://sap-datanode6/> select * from a;
x  1
y  1
A   bearbrick

1 row selected (0.002 seconds)
0: jdbc:ignite:thin://sap-datanode6/> select count(*) from b;
COUNT(*)  14337959

1 row selected (0.299 seconds)
0: jdbc:ignite:thin://sap-datanode6/> select x,y from b where _key = '1';
x  1
y  1

1 row selected (0.002 seconds)


select a.x,a.y from a join b where a.x = b.x and a.y = b.y;
x  1
y  1

1 row selected (6.036 seconds)  -- Takes 6 seconds to join a table with one
row to a 14 million row table using affinity key x

explain select a.x,a.y from a join b where a.x = b.x and a.y = b.y;

PLAN  SELECT
A__Z0.x AS __C0_0,
A__Z0.y AS __C0_1
FROM PUBLIC.B__Z1
/* PUBLIC.B.__SCAN_ */
INNER JOIN PUBLIC.T A__Z0
/* PUBLIC.AFFINITY_KEY: x = B__Z1.x */
ON 1=1
WHERE (A__Z0.y = B__Z1.y)
AND (A__Z0.x = B__Z1.x)

PLAN  SELECT
__C0_0 AS x,
__C0_1 AS y
FROM PUBLIC.__T0
/* PUBLIC."merge_scan" */

If I create an index on table b on fields x and y, it takes 6.8 seconds to
finish the join.

create index on b(x,y);
No rows affected (31.316 seconds)

0: jdbc:ignite:thin://sap-datanode6/> select a.x,a.y from a join b where a.y
= b.y and a.x = b.x;
x  1
y  1

1 row selected (6.865 seconds)

0: jdbc:ignite:thin://sap-datanode6/> explain select a.x,a.y from a join b
where a.y = b.y and a.x = b.x;
PLAN  SELECT
A__Z0.x AS __C0_0,
A__Z0.y AS __C0_1
FROM PUBLIC.T A__Z0
/* PUBLIC.T.__SCAN_ */
INNER JOIN PUBLIC.B__Z1
/* PUBLIC."b_x_asc_y_asc_idx": y = A__Z0.y
AND x = A__Z0.x
 */
ON 1=1
WHERE (A__Z0.y = B__Z1.y)
AND (A__Z0.x = B__Z1.x)

PLAN  SELECT
__C0_0 AS x,
__C0_1 AS y
FROM PUBLIC.__T0
/* PUBLIC."merge_scan" */

2 rows selected (0.003 seconds)

Here's my configuration


<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd">

  [... bean definitions stripped by the mailing-list archive; only the
  discovery addresses survived ...]

                node1:49500
                node2:49500
                node3:49500
                node4:49500
                node5:49500
                node6:49500

  [... rest of the XML configuration stripped by the mailing-list archive ...]



Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-09-24 Thread Ray
Let's say I have two tables I want to join together.
Table a has around 10 million rows, and its primary key is (x, y).
I have created an index on fields x and y for table a.

Table b has one row, and its primary key is also (x, y).
The row in table b has a corresponding row in table a with the same
primary key.

When I try to execute this query to join, "select a.*,b.* from a inner join
b where (a.x=b.x) and (a.y = b.y);", it takes more than 4 seconds to show
only one record.
I also examined the plan for that SQL and confirmed the index I created is
used.

Ideally, if we use hash join it should take less than half a second.






Re: Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-09-20 Thread Ray
Hi Val, thanks for the reply.

I'll try again and let you know if I missed something.

By "Ignite is not optimized for join", I mean currently Ignite only supports
nest loop join which is very inefficient when joining two large table.
Please refer to these two tickets for details.
https://issues.apache.org/jira/browse/IGNITE-6201
https://issues.apache.org/jira/browse/IGNITE-6202





Re: How does lazy load work internally in Ignite?

2018-09-19 Thread Ray
Hi Ilya, thanks for the reply.

So is it like a cursor on the server side?

Let's say a user runs the query "select * from tableA", where tableA has a
million records.
When the lazy-loading flag is on, the Ignite server sends the first batch
of records to the user.
When the user's client asks for the second batch, Ignite sends the second
batch of records.

Is my understanding correct?

What's the default batch size sent to the user, if my understanding is
correct?
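
(For reference, a minimal, untested sketch of how the lazy flag and page
size are set through the Java API; the 1024 below is my understanding of
the default page size (Query.DFLT_PAGE_SIZE), so treat the exact value as
an assumption to verify:)

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.FieldsQueryCursor;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class LazyQuery {
    public static void main(String[] args) {
        Ignite ignite = Ignition.ignite(); // handle to an already-started node

        SqlFieldsQuery qry = new SqlFieldsQuery("select * from tableA");
        qry.setLazy(true);      // stream pages instead of materializing the result
        qry.setPageSize(1024);  // rows per page sent to the client

        try (FieldsQueryCursor<List<?>> cur =
                 ignite.cache("SQL_PUBLIC_TABLEA").query(qry)) {
            for (List<?> row : cur) {
                // Each iteration reads from the current page; the next page is
                // fetched from the server only when this one is exhausted.
            }
        }
    }
}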





Re: How much heap to allocate

2018-09-18 Thread Ray
Hi Mikhail,

Can you explain how lazy loading works when I use the SQL "select * from
table" to query a big table which doesn't fit in on-heap memory?
Does Ignite send part of the result set to the client?
If Ignite still sends the whole result set back to the client, how can lazy
loading avoid loading the full result set on heap?





How does lazy load work internally in Ignite?

2018-09-18 Thread Ray
From this document,
https://apacheignite-sql.readme.io/docs/performance-and-debugging#section-result-set-lazy-load,
it's mentioned that setting the lazy-load flag on a query can avoid
prolonged GC pauses and even OutOfMemoryError.

But it confuses me how lazy load works internally in Ignite.
Does Ignite send part of the result set to the client?
If Ignite still sends the whole result set back to the client, how can lazy
load avoid loading the full result set on heap?





Is there a way to use Ignite optimization and Spark optimization together when using Spark Dataframe API?

2018-09-18 Thread Ray
Currently, the OPTION_DISABLE_SPARK_SQL_OPTIMIZATION option can only be set
at the Spark session level.
It means I can only have Ignite optimization or Spark optimization for one
Spark job.

Let's say I want to load data into Spark memory with pushed-down filters
using Ignite optimization.
For example, I want to load one day's data using the SQL "select * from
tableA where date = '2018-09-01'".
With Ignite optimization, this SQL is executed on Ignite and the where
clause filter is applied on Ignite.
But with Spark optimization, all the data in this table will be loaded into
Spark memory and filtered later.

Then I want to join filtered tableA with filtered tableB, which is also
loaded from Ignite.
But I want to use Spark's join feature to do the join, because both
filtered tableA and filtered tableB contain millions of rows and Ignite is
not optimized for joins.
How can I do that?





Re: Performance of SQL query by partial primary key

2018-09-17 Thread Ray
To answer my own question here, basically the index created on PK now is
useless according to this ticket.
https://issues.apache.org/jira/browse/IGNITE-8386

Ignite will perform a whole table scan when trying to execute the SQL query
I posted above unless an index identical to PK is created manually.

Please correct me if understand it wrong.
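
(A minimal, untested sketch of the manual workaround over the thin JDBC
driver; table and column names follow the query above, and with an index on
(a, b), a filter on a alone can use the index prefix:)

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ManualPkIndex {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/");
             Statement stmt = conn.createStatement()) {
            // Duplicate the PK columns in an explicit secondary index.
            stmt.executeUpdate("create index t_a_b_idx on t(a, b)");
        }
    }
}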





RE: Failed to get page IO instance (page content is corrupted) after onenode failed when trying to reboot.

2018-09-17 Thread Ray
Hi Stan,

Thanks for the reply.

10.29.42.49 is a client node trying to connect to the cluster to write
data.
I don't know why the log shows it's not a client node.

Anyway, can you help take a look at why restarts 2-4 didn't work?
In restarts 2-4 there were no servers trying to join the cluster as server
nodes.
Thanks





How to start an Ignite cluster with fixed number of servers?

2018-09-17 Thread Ray
Let's say I want to start an Ignite cluster of three server nodes with
fixed IP addresses and prevent other servers from joining the cluster as
Ignite server nodes. How can I do that?
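
(A minimal, untested sketch of pinning discovery to a static address list
via TcpDiscoveryVmIpFinder. Note this only controls where discovery looks;
by itself it does not forbid an arbitrary server configured with the same
list from joining, so a real whitelist usually needs network-level
restrictions as well. Hostnames are placeholders:)

import java.util.Arrays;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class FixedServers {
    public static void main(String[] args) {
        // Static list of the only discovery endpoints the node will contact.
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        ipFinder.setAddresses(
            Arrays.asList("node1:49500", "node2:49500", "node3:49500"));

        TcpDiscoverySpi disco = new TcpDiscoverySpi();
        disco.setIpFinder(ipFinder);
        disco.setLocalPort(49500);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(disco);

        Ignition.start(cfg);
    }
}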








Failed to get page IO instance (page content is corrupted) after one node failed when trying to reboot.

2018-09-17 Thread Ray
I have a three-node Ignite 2.6.0 cluster with native persistence enabled.
Here's the config

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd">

  [... bean definitions stripped by the mailing-list archive; only the
  discovery addresses survived ...]

                node1:49500
                node2:49500
                node3:49500

  [... rest of the XML configuration stripped by the mailing-list archive ...]
One node failed when trying to process a long-running SQL query; the
detailed log can be found in the attachment with filename nodefail.log.

The other two nodes may have had some new data coming in while the node was
in the failed state.
When I tried to reboot this server after several hours, it first got stuck
for one hour with a "Failed to wait for partition map exchange" exception.
The detailed log can be found in the attachment with filename restart1.log.

So I tried to reboot this server again, but got an
"org.apache.ignite.spi.IgniteSpiException: Node with set up BaselineTopology
is not allowed to join cluster without one:" exception.
The detailed log can be found in the attachment with filename restart2.log.

So I tried to reboot the whole cluster by starting the failed node first,
but I got an "Unable to await partitions release latch within timeout:"
exception when the two other servers were started.
The detailed log can be found in the attachment with filename restart3.log.

So I tried to reboot this server again, but I got a "Failed to get page IO
instance (page content is corrupted)" exception.
The detailed log can be found in the attachment with filename restart4.log.

From this point on, the cluster is in a non-recoverable state.
Please advise me how to avoid this situation and how to recover the data.
The log of the failed node is in log.zip.
The other two files are logs for the two good nodes.

log.zip   
log-goodNode.zip
  
log-GoodNode2.zip
  








Performance of SQL query by partial primary key

2018-09-14 Thread Ray
I have some trouble understanding how Ignite maintains primary keys.
This document
https://apacheignite.readme.io/docs/memory-architecture#section-b-trees-and-index-pages
says "The cache keys are also stored in B+ trees and are ordered by their
hash code values."

So if I create a table using this command: create table t(a varchar, b
varchar, c varchar, primary key(a,b));
The primary key for this table is a custom object consisting of a and b.

So my question is: when I try to query this table using the SQL "select c
from t where a = 'some value'", will there be a performance issue when
table t contains billions of records?


Quote from
"https://apacheignite.readme.io/docs/memory-architecture#section-b-trees-and-index-pages;

"For instance, when myCache.get(keyA) operation is executed, it will trigger
the following execution flow:

1. Ignite will look for a memory region to which myCache belongs to.
2. The meta page pointing to the hash index B+ tree of myCache will be
located.
3. Based on the keyA hash code, the index page the key belongs to will be
located in the B+ tree.
4. If the corresponding index page is not found in the memory or on disk,
then Ignite concludes that the key does not exist and will return null.
5. If the index page exists, then it will contain all the information needed
to find the data page of the cache entry keyA refers to.
6. Ignite will locate the data page for keyA and will return the value to
the user."


When I use the above query to get records, the where clause only contains
part of the primary key.
So there's no way to calculate the key hash code in step 3, right?
I wonder how Ignite handles this case; does Ignite perform a whole-table
scan?






How to set query timeout or cancel query when submit SQL query from SQL tool like DBweaver and sqlline?

2018-09-13 Thread Ray
Going through this document,
https://apacheignite-sql.readme.io/docs/query-cancellation, it seems a query
timeout can be set if the query is submitted through the Java API (SqlQuery,
SqlFieldsQuery).
So my question is: how do I set a query timeout or cancel a query when the
SQL is submitted from a SQL tool like DBeaver?

One more question: if a query submitted through a SQL tool can't be
cancelled, how do I set a global query timeout parameter?
https://issues.apache.org/jira/browse/IGNITE-2680
In the comments of this ticket, Alexie mentioned the global timeout
parameter is jdbc:h2:~/db/test;query_timeout=1.
But I don't see a way of setting it.
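
(For the Java API case the document covers, a minimal, untested sketch of
the per-query timeout; the table name is a placeholder:)

import java.util.concurrent.TimeUnit;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class QueryTimeout {
    public static void main(String[] args) {
        Ignite ignite = Ignition.ignite(); // handle to an already-started node

        SqlFieldsQuery qry = new SqlFieldsQuery("select * from tableA");
        qry.setTimeout(5, TimeUnit.SECONDS); // cancel the query after 5 seconds

        ignite.cache("SQL_PUBLIC_TABLEA").query(qry).getAll();
    }
}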





Re: Getting an exception when listing partitions of IgniteDataFrame

2018-08-08 Thread Ray
Hi Rama,

Did you solve this problem?
Please let me know your solution if you have solved this problem.

Thanks





Re: "Unable to await partitions release latch within timeout: ServerLatch" exception causing cluster freeze

2018-08-02 Thread Ray
The root cause of this issue is network throttling between the clients and
the servers.

When I moved the clients to run in the same cluster as the servers, there
was no such problem any more.





Re: "Unable to await partitions release latch within timeout: ServerLatch" exception causing cluster freeze

2018-07-30 Thread Ray
Hello Pavel,

I was able to reproduce this issue and I've attached the DEBUG log and
thread dump for three nodes as you suggested.
Archive.zip
  

This time, there's no "no route to host" exception between server and client
nodes.

Node2 and node3 log "Unable to await partitions release latch within
timeout: ClientLatch" shortly after the cluster starts; node1 has no
explicit errors.

And the cluster begins to freeze about 20 minutes after the data ingestion
starts.

The attached pictures show the data-streamer threads' running/parked time
slices on each of the three nodes.
You can see that node3 freezes first, then node2 freezes.
So the client can only write to node1, which triggered a lot of
rebalancing.

node1.png

node2.png
  
node3.png
  

By the time I wrote this post, the data ingestion that usually takes 5
minutes was still not finished after 1.1 hours.






Re: "Unable to await partitions release latch within timeout: ServerLatch" exception causing cluster freeze

2018-07-26 Thread Ray
Hello Pavel, 

Thanks for the explanation, it's been great help.

Can you take a guess why PME took such a long time due to communication
issues between server nodes?
From the logs, the "no route to host" exceptions happened because the
servers couldn't connect to the clients' ports.
But I didn't see any logs indicating network issues between the server
nodes.
I tested connectivity of the communication SPI ports (47100 in this case)
and the discovery SPI ports (49500 in this case) between server nodes; it's
all good.

And on the client (Spark executor) side, there's no exception log when PME
takes a long time to finish.
It just hangs forever.
Spark.log
  





Re: "Unable to await partitions release latch within timeout: ServerLatch" exception causing cluster freeze

2018-07-25 Thread Ray
Hello Pavel,

Here's the log for node ids = [429edc2b-eb14-414f-a978-9bfe35443c8c,
6783732c-9a13-466f-800a-ad4c8d9be3bf].
6783732c-9a13-466f-800a-ad4c8d9be3bf.zip

  
429edc2b-eb14-414f-a978-9bfe35443c8c.zip

  
I examined the logs, and it looks like there's a network issue here,
because there are a lot of "java.net.NoRouteToHostException: No route to
host" exceptions.

I did a little research and found this ticket may be the cause.
https://issues.apache.org/jira/browse/IGNITE-8739

Will the client (the Spark executor in this case) retry the data insert
once the network glitch is resolved, if I apply this patch?






"Unable to await partitions release latch within timeout: ServerLatch" exception causing cluster freeze

2018-07-25 Thread Ray
I have a three-node Ignite 2.6 cluster set up with the following config.

  [... XML configuration stripped by the mailing-list archive; only the
  discovery addresses survived ...]

                node1:49500
                node2:49500
                node3:49500

  [... rest of the XML configuration stripped by the mailing-list archive ...]
And I used this command to start the Ignite service on the three nodes.

./ignite.sh -J-Xmx32000m -J-Xms32000m -J-XX:+UseG1GC 
-J-XX:+ScavengeBeforeFullGC -J-XX:+DisableExplicitGC -J-XX:+AlwaysPreTouch 
-J-XX:+PrintGCDetails -J-XX:+PrintGCTimeStamps -J-XX:+PrintGCDateStamps 
-J-XX:+PrintAdaptiveSizePolicy -XX:+PrintGCApplicationStoppedTime 
-XX:+PrintGCApplicationConcurrentTime 
-J-Xloggc:/spare/ignite/log/ignitegc-$(date +%Y_%m_%d-%H_%M).log 
config/persistent-config.xml 

When I'm using the Spark dataframe API to ingest data into this cluster,
the cluster freezes after some time and no new data can be ingested into
Ignite.
Both the client (Spark executor) and the servers show the "Unable to await
partitions release latch within timeout: ServerLatch" exception, starting
from line 51834 in the full log, like this:

[2018-07-25T09:45:42,177][WARN
][exchange-worker-#162][GridDhtPartitionsExchangeFuture] Unable to await
partitions release latch within timeout: ServerLatch [permits=2,
pendingAcks=[429edc2b-eb14-414f-a978-9bfe35443c8c,
6783732c-9a13-466f-800a-ad4c8d9be3bf], super=CompletableLatch
[id=exchange, topVer=AffinityTopologyVersion [topVer=239, minorTopVer=0]]]

Here's the full log on server node having the exception.
07-25.zip
  







Re: Can't write to Ignite cluster when one node in baseline is down

2018-07-25 Thread Ray
Hi,

I tried two cases for backups: backups=1 and no backups.
Both cases failed with the exception I attached.

I'm sure all 3 nodes are in the baseline.
I checked using the ./control.sh --baseline command.






Can't write to Ignite cluster when one node in baseline is down

2018-07-25 Thread Ray
I'm running a 3-node Ignite 2.6 cluster with the persistence store enabled.

I tested this following use case.

1. Start all three nodes.
2. Activate the cluster
3. Stop one of the nodes
4. Start ingesting data

But I got the following exception in step 4.
class
org.apache.ignite.internal.cluster.ClusterTopologyServerNotFoundException:
Failed to find server node for cache (all affinity nodes have left the grid
or cache was stopped):

Is this expected behavior?
Does it mean that if one node in the baseline is down, no data can be
ingested into this cluster?
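
(For what it's worth, one way out of this state is to shrink the baseline
to the server nodes that are actually alive, so partitions get reassigned;
a minimal, untested sketch, assuming backups are configured so that no data
is lost when the dead node is dropped:)

import java.util.ArrayList;
import java.util.Collection;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterNode;

public class ShrinkBaseline {
    public static void main(String[] args) {
        Ignite ignite = Ignition.ignite(); // handle to an already-started node

        // Rebuild the baseline from the server nodes currently in topology.
        Collection<ClusterNode> alive = ignite.cluster().forServers().nodes();
        ignite.cluster().setBaselineTopology(new ArrayList<>(alive));
    }
}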





Re: Ignite node failed for no obvious reason

2018-07-24 Thread Ray
Hi,

All nodes are inter-connectable.
I think it was a short network outage, because data ingestion was
successful up until the network glitch.

I'll try to build with IGNITE-8739 and try again.





Ignite data can't be recovered after node fail

2018-07-24 Thread Ray
Following the node failure described in this thread,
http://apache-ignite-users.70518.x6.nabble.com/Ignite-node-failed-for-no-obvious-reason-td22866.html,
I tried to reboot the node and recover the data to make the Ignite cluster
available again.

First, I tried to reboot node2 directly, but it failed.
The node log is as follows.

[2018-07-24T02:57:38,956][INFO ][main][IgniteKernal] 

>>>__    
>>>   /  _/ ___/ |/ /  _/_  __/ __/  
>>>  _/ // (7 7// /  / / / _/
>>> /___/\___/_/|_/___/ /_/ /___/   
>>> 
>>> ver. 2.6.0#20180710-sha1:669feacc
>>> 2018 Copyright(C) Apache Software Foundation
>>> 
>>> Ignite documentation: http://ignite.apache.org

[2018-07-24T02:57:38,976][INFO ][main][IgniteKernal] Config URL:
file:/opt/apache-ignite-fabric-2.6.0-bin/config/persistent-config.xml
[2018-07-24T02:57:38,984][INFO ][main][IgniteKernal] IgniteConfiguration
[igniteInstanceName=null, pubPoolSize=56, svcPoolSize=56,
callbackPoolSize=56, stripedPoolSize=56, sysPoolSize=56, mgmtPoolSize=4,
igfsPoolSize=56, dataStreamerPoolSize=56, utilityCachePoolSize=56,
utilityCacheKeepAliveTime=6, p2pPoolSize=2, qryPoolSize=56,
igniteHome=/opt/apache-ignite-fabric-2.6.0-bin,
igniteWorkDir=/opt/apache-ignite-fabric-2.6.0-bin/work,
mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@6f94fa3e,
nodeId=7e3c0623-a6a5-4a7b-966e-6882b86ff922,
marsh=org.apache.ignite.internal.binary.BinaryMarshaller@1890516e,
marshLocJobs=false, daemon=false, p2pEnabled=true, netTimeout=5000,
sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=1,
metricsUpdateFreq=2000, metricsExpTime=9223372036854775807,
discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=0, ackTimeout=0,
marsh=null, reconCnt=10, reconDelay=2000, maxAckTimeout=60,
forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null],
segPlc=RESTART_JVM, segResolveAttempts=2, waitForSegOnStart=true,
allResolversPassReq=true, segChkFreq=1, commSpi=TcpCommunicationSpi
[connectGate=null, connPlc=null, enableForcibleNodeKill=false,
enableTroubleshootingLog=false,
srvLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2@42e25b0b,
locAddr=null, locHost=null, locPort=47100, locPortRange=100, shmemPort=-1,
directBuf=true, directSndBuf=false, idleConnTimeout=60,
connTimeout=5000, maxConnTimeout=60, reconCnt=10, sockSndBuf=32768,
sockRcvBuf=32768, msgQueueLimit=0, slowClientQueueLimit=0, nioSrvr=null,
shmemSrv=null, usePairedConnections=false, connectionsPerNode=1,
tcpNoDelay=true, filterReachableAddresses=false, ackSndThreshold=32,
unackedMsgsBufSize=0, sockWriteTimeout=2000, lsnr=null, boundTcpPort=-1,
boundTcpShmemPort=-1, selectorsCnt=28, selectorSpins=0, addrRslvr=null,
ctxInitLatch=java.util.concurrent.CountDownLatch@39b43d60[Count = 1],
stopping=false,
metricsLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationMetricsListener@44be0077],
evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi@2205a05d,
colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [lsnr=null],
indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi@5f20155b,
addrRslvr=null, clientMode=false, rebalanceThreadPoolSize=1,
txCfg=org.apache.ignite.configuration.TransactionConfiguration@72ade7e3,
cacheSanityCheckEnabled=true, discoStartupDelay=6, deployMode=SHARED,
p2pMissedCacheSize=100, locHost=null, timeSrvPortBase=31100,
timeSrvPortRange=100, failureDetectionTimeout=6,
clientFailureDetectionTimeout=3, metricsLogFreq=6, hadoopCfg=null,
connectorCfg=org.apache.ignite.configuration.ConnectorConfiguration@239105a8,
odbcCfg=null, warmupClos=null, atomicCfg=AtomicConfiguration
[seqReserveSize=1000, cacheMode=PARTITIONED, backups=1, aff=null,
grpName=null], classLdr=null, sslCtxFactory=null, platformCfg=null,
binaryCfg=null, memCfg=null, pstCfg=null, dsCfg=DataStorageConfiguration
[sysRegionInitSize=41943040, sysCacheMaxSize=104857600, pageSize=0,
concLvl=0, dfltDataRegConf=DataRegionConfiguration [name=default_Region,
maxSize=493921239040, initSize=107374182400, swapPath=null,
pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100,
metricsEnabled=false, metricsSubIntervalCount=5,
metricsRateTimeInterval=6, persistenceEnabled=true,
checkpointPageBufSize=8589934592], storagePath=/data/ignite/persistence,
checkpointFreq=60, lockWaitTime=1, checkpointThreads=4,
checkpointWriteOrder=SEQUENTIAL, walHistSize=20, walSegments=10,
walSegmentSize=67108864, walPath=/wal, walArchivePath=/wal/archive,
metricsEnabled=false, walMode=BACKGROUND, walTlbSize=131072, walBuffSize=0,
walFlushFreq=5000, walFsyncDelay=1000, walRecordIterBuffSize=67108864,
alwaysWriteFullPages=false,
fileIOFactory=org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory@609bcfb6,
metricsSubIntervalCnt=5, metricsRateTimeInterval=6,
walAutoArchiveAfterInactivity=-1, writeThrottlingEnabled=false,
walCompactionEnabled=false], activeOnStart=true, autoActivation=true,
longQryWarnTimeout=3000, sqlConnCfg=null,

Ignite node failed for no obvious reason

2018-07-24 Thread Ray
I have a three-node Ignite 2.6 cluster set up with the following config.

  [... XML configuration stripped by the mailing-list archive; only the
  discovery addresses survived ...]

                node1:49500
                node2:49500
                node3:49500

  [... rest of the XML configuration stripped by the mailing-list archive ...]
And I used this command to start the Ignite service on the three nodes.

./ignite.sh -J-Xmx32000m -J-Xms32000m -J-XX:+UseG1GC
-J-XX:+ScavengeBeforeFullGC -J-XX:+DisableExplicitGC -J-XX:+AlwaysPreTouch
-J-XX:+PrintGCDetails -J-XX:+PrintGCTimeStamps -J-XX:+PrintGCDateStamps
-J-XX:+PrintAdaptiveSizePolicy -XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime
-J-Xloggc:/spare/ignite/log/ignitegc-$(date +%Y_%m_%d-%H_%M).log 
config/persistent-config.xml

When I was using the Spark dataframe API to ingest data into this cluster,
node2 failed.
The logic of this Spark job is rather simple: it just extracts data from
HDFS, then ingests it into Ignite day by day.
Every Spark job represents one day's data; it usually takes less than 10
minutes to complete.
From the following Spark job web management page, we can see that Job Id
282 took more than 8 hours.
spark.png
  

So I suspect the Ignite cluster started malfunctioning around 2018/07/23 16:25:52.

 
From line 8619 in ignite-f7266ac7-1-2018-07-28.log at 2018-07-23T16:26:04,
it looks like there's some kind of network issue between node2 and the
Spark client, because there are lots of logs saying "No route to host" etc.
node2-log.zip
  

Node2 at that time was still connected to the cluster, but after some time,
at 2018-07-24T00:50:14,941, node2 printed its last metrics log and failed.
I can't see any obvious reason why node2 failed, because node2 only prints
metrics logs in ignite-f7266ac7.log and the GC time is normal.

So I also checked the logs on node1 and node3 around the time node2 failed;
the log snippet for node3 is
  14213 [2018-07-24T00:51:16,622][INFO][tcp-disco-sock-reader-#4][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.252.10.4:59144, rmtPort=59144]
  14214 [2018-07-24T00:51:16,623][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/10.252.4.60, rmtPort=23056]
  14215 [2018-07-24T00:51:16,623][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/10.252.4.60, rmtPort=23056]
  14216 [2018-07-24T00:51:16,624][INFO][tcp-disco-sock-reader-#44][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/10.252.4.60:23056, rmtPort=23056]
  14217 [2018-07-24T00:51:16,626][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/10.29.42.45, rmtPort=45605]
  14218 [2018-07-24T00:51:16,626][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/10.29.42.45, rmtPort=45605]
  14219 [2018-07-24T00:51:16,626][INFO][tcp-disco-sock-reader-#45][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/10.29.42.45:45605, rmtPort=45605]
  14220 [2018-07-24T00:51:16,630][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/10.29.42.48, rmtPort=57089]
  14221 [2018-07-24T00:51:16,630][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/10.29.42.48, rmtPort=57089]
  14222 [2018-07-24T00:51:16,630][INFO][tcp-disco-sock-reader-#47][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/10.29.42.48:57089, rmtPort=57089]
  14223 [2018-07-24T00:51:16,666][WARN][disco-event-worker-#161][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=f7266ac7-409a-454f-b601-b077b15594b3, addrs=[10.252.10.4, 127.0.0.1], sockAddrs=[rpsj1ign002.webex.com/10.252.10.4:49500, /127.0.0.1:49500], discPort=49500, order=2, intOrder=2, lastExchangeTime=1532323119618, loc=false, ver=2.6.0#20180710-sha1:669feacc, isClient=false]
  14224 [2018-07-24T00:51:16,667][INFO][disco-event-worker-#161][GridDiscoveryManager] Topology snapshot [ver=90, servers=2, clients=8, CPUs=392, offheap=920.0GB, heap=170.0GB]
  14225 [2018-07-24T00:51:16,667][INFO

Can wal archive be deleted?

2018-07-23 Thread Ray
By default, Ignite keeps the WAL archive for the last 20 checkpoints.
So, given the growing size of the WAL archive folder (336GB already, while the
data folder is only 73GB on one node), can these archive files be deleted
without impacting data integrity?
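
(For reference, the retention is governed by the walHistorySize setting; a
minimal sketch of shrinking it, assuming the Ignite 2.x
DataStorageConfiguration API, not a substitute for deleting files by hand:)

import org.apache.ignite.configuration.{DataStorageConfiguration, IgniteConfiguration}

// A sketch: keep WAL segments for only 2 checkpoints instead of the default
// 20, so the archive is truncated much sooner.
val storageCfg = new DataStorageConfiguration().setWalHistorySize(2)
val cfg = new IgniteConfiguration().setDataStorageConfiguration(storageCfg)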




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Apache Flink Sink + Ignite: Ouch! Argument is invalid

2018-07-23 Thread Ray
Hello Saikat,

Thanks for the fix.
I validated this fix by running my WordCount application in both standalone
mode and cluster mode.
The data can be inserted.

But I found another problem here.
The data written into Ignite is not correct.
My application counts the word occurrences in the following lines:
 "To be, or not to be,--that is the question:--",
 "Whether 'tis nobler in the mind to suffer",
 "The slings and arrows of outrageous fortune",
 "Or to take arms against a sea of troubles,

The count of the word "to" should be 9.
But when I check the result in Ignite, the value for every word is 1.
Clearly that's wrong.
The reproducer program is the same one I attached in the JIRA ticket.

Please let me know if you can reproduce this issue.
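
(A guess at the cause, not verified in this thread: if the sink writes
through an IgniteDataStreamer with the default allowOverwrite(false), a
second write of an already-present key is silently skipped, which would
leave every count at 1. A sketch of the flag, with a hypothetical cache
name and an assumed Ignite handle 'ignite':)

import org.apache.ignite.IgniteDataStreamer

// Assumption: the sink streams through IgniteDataStreamer. With the default
// allowOverwrite(false), an update of an existing key is dropped, so a
// word's count would never move past its first value.
val streamer: IgniteDataStreamer[String, java.lang.Long] =
  ignite.dataStreamer("wordCounts") // hypothetical cache name
streamer.allowOverwrite(true) // let newer counts replace older ones
streamer.addData("to", java.lang.Long.valueOf(9))
streamer.close(false) // flush remaining buffered entries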



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Can't write to Ignite cluster

2018-07-16 Thread Ray
Hello Pavel,

I have found out why the topology version keeps increasing.
It's because my colleague created a customized Ignite monitoring system which
fetches metrics from Ignite Visor.
This monitoring system launches a Visor client that connects to the cluster
every minute; after fetching all the metrics from Visor, it shuts down.
This behavior causes the topology version to keep increasing. (This is a bad
way to do monitoring.)
This is also why you're seeing the NODE_JOINED log.

Now I've stopped this monitoring system and restarted the cluster.
The issue seems to be fixed.

But we need to ask: is it expected behavior that launching a Visor client
against the cluster every minute can freeze the cluster?
From what I observed, connecting a Visor client to the cluster is not a
simple operation; it usually takes more than 5 seconds to finish partition
exchange and other steps before Visor is connected.
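
(For what it's worth, a less disruptive pattern is one long-lived client
that polls metrics, instead of a new Visor session per minute; a minimal
sketch, assuming the standard Ignition/ClusterMetrics API and a pre-built
client configuration 'clientCfg':)

import org.apache.ignite.Ignition

// A sketch, not production monitoring: join the topology once as a client
// and poll cluster-wide metrics every minute, so the topology version stays
// stable. 'clientCfg' is an assumed IgniteConfiguration with clientMode=true.
val client = Ignition.start(clientCfg)
while (true) {
  val m = client.cluster().metrics()
  println(s"cpuLoad=${m.getCurrentCpuLoad}, heapUsed=${m.getHeapMemoryUsed}")
  Thread.sleep(60 * 1000)
}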

Thanks for your help, Pavel.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: spark ignite - Failed to instantiate Spring XML application context

2018-07-13 Thread Ray
Please refer to this ticket for more information.

https://issues.apache.org/jira/browse/IGNITE-8534



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: spark ignite - Failed to instantiate Spring XML application context

2018-07-13 Thread Ray
Currently, Ignite only works with Spark 2.2.
When Ignite 2.6 is released, it will be able to work with Spark 2.3.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Can't write to Ignite cluster

2018-07-13 Thread Ray
Here's the full log and thread dump for the three nodes and the client used
to ingest data.

node1.zip
  
node2.zip
  
node3.zip
  
client_log.client_log

  

And I can only query the Ignite cluster using sqlline.
When I try to launch a Java client, it fails.
The log is similar to the client_log I attached.
These two log messages are printed again and again:
18/07/13 01:33:18 WARN cache.GridCachePartitionExchangeManager: Failed to
wait for initial partition map exchange. Possible reasons are: 
  ^-- Transactions in deadlock.
  ^-- Long running transactions (ignore if this is the case).
  ^-- Unreleased explicit locks.
18/07/13 01:33:18 WARN internal.diagnostic: Failed to wait for partition map
exchange [topVer=AffinityTopologyVersion [topVer=25308, minorTopVer=0],
node=3c164ab8-0cf1-4451-8bfe-0c415ac932cd]. Dumping pending objects that
might be the cause: 

Can anybody advise me on why the topology version keeps increasing, so I can
do some preliminary research?
From my prior experience with Ignite, the topology version shouldn't increase
when no data is being ingested into the cluster.
 




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Can't write to Ignite cluster

2018-07-12 Thread Ray
I'm running a 3-node Ignite 2.4 cluster.
After some time, I can't write to this cluster using the data streamer.
Here's the log snippet.

[2018-07-12T09:31:05,839][ERROR][srvc-deploy-#179][GridServiceProcessor]
Error when executing service: null
 org.apache.ignite.IgniteException: Failed to resolve nodes topology
[cacheGrp=ignite-sys-cache, topVer=AffinityTopologyVersion [topVer=17315,
minorTopVer=0], history=[AffinityTopologyVersion [topVer=19039,
minorTopVer=0], AffinityTopologyVersion [topVer=19040, minorTopVer=0],
AffinityTopologyVersion [topVer=19041, minorTopVer=0],
AffinityTopologyVersion [topVer=19042, minorTopVer=0],
AffinityTopologyVersion [topVer=19043, minorTopVer=0],
AffinityTopologyVersion [topVer=19044, minorTopVer=0],
AffinityTopologyVersion [topVer=19045, minorTopVer=0],
AffinityTopologyVersion [topVer=19046, minorTopVer=0],
AffinityTopologyVersion [topVer=19047, minorTopVer=0],
AffinityTopologyVersion [topVer=19048, minorTopVer=0],
AffinityTopologyVersion [topVer=19049, minorTopVer=0],
AffinityTopologyVersion [topVer=19050, minorTopVer=0],
AffinityTopologyVersion [topVer=19051, minorTopVer=0],
AffinityTopologyVersion [topVer=19052, minorTopVer=0],
AffinityTopologyVersion [topVer=19053, minorTopVer=0],
AffinityTopologyVersion [topVer=19054, minorTopVer=0],
AffinityTopologyVersion [topVer=19055, minorTopVer=0],
AffinityTopologyVersion [topVer=19056, minorTopVer=0],
AffinityTopologyVersion [topVer=19057, minorTopVer=0],
AffinityTopologyVersion [topVer=19058, minorTopVer=0],
AffinityTopologyVersion [topVer=19059, minorTopVer=0],
AffinityTopologyVersion [topVer=19060, minorTopVer=0],
AffinityTopologyVersion [topVer=19061, minorTopVer=0],
AffinityTopologyVersion [topVer=19062, minorTopVer=0],
AffinityTopologyVersion [topVer=19063, minorTopVer=0],
AffinityTopologyVersion [topVer=19064, minorTopVer=0],
AffinityTopologyVersion [topVer=19065, minorTopVer=0],
AffinityTopologyVersion [topVer=19066, minorTopVer=0],
AffinityTopologyVersion [topVer=19067, minorTopVer=0],
AffinityTopologyVersion [topVer=19068, minorTopVer=0],
AffinityTopologyVersion [topVer=19069, minorTopVer=0],
AffinityTopologyVersion [topVer=19070, minorTopVer=0],
AffinityTopologyVersion [topVer=19071, minorTopVer=0],
AffinityTopologyVersion [topVer=19072, minorTopVer=0],
AffinityTopologyVersion [topVer=19073, minorTopVer=0],
AffinityTopologyVersion [topVer=19074, minorTopVer=0],
AffinityTopologyVersion [topVer=19075, minorTopVer=0],
AffinityTopologyVersion [topVer=19076, minorTopVer=0],
AffinityTopologyVersion [topVer=19077, minorTopVer=0],
AffinityTopologyVersion [topVer=19078, minorTopVer=0],
AffinityTopologyVersion [topVer=19079, minorTopVer=0],
AffinityTopologyVersion [topVer=19080, minorTopVer=0],
AffinityTopologyVersion [topVer=19081, minorTopVer=0],
AffinityTopologyVersion [topVer=19082, minorTopVer=0],
AffinityTopologyVersion [topVer=19083, minorTopVer=0],
AffinityTopologyVersion [topVer=19084, minorTopVer=0],
AffinityTopologyVersion [topVer=19085, minorTopVer=0],
AffinityTopologyVersion [topVer=19086, minorTopVer=0],
AffinityTopologyVersion [topVer=19087, minorTopVer=0],
AffinityTopologyVersion [topVer=19088, minorTopVer=0],
AffinityTopologyVersion [topVer=19089, minorTopVer=0],
AffinityTopologyVersion [topVer=19090, minorTopVer=0],
AffinityTopologyVersion [topVer=19091, minorTopVer=0],
AffinityTopologyVersion [topVer=19092, minorTopVer=0],
AffinityTopologyVersion [topVer=19093, minorTopVer=0],
AffinityTopologyVersion [topVer=19094, minorTopVer=0],
AffinityTopologyVersion [topVer=19095, minorTopVer=0],
AffinityTopologyVersion [topVer=19096, minorTopVer=0],
AffinityTopologyVersion [topVer=19097, minorTopVer=0],
AffinityTopologyVersion [topVer=19098, minorTopVer=0],
AffinityTopologyVersion [topVer=19099, minorTopVer=0],
AffinityTopologyVersion [topVer=19100, minorTopVer=0],
AffinityTopologyVersion [topVer=19101, minorTopVer=0],
AffinityTopologyVersion [topVer=19102, minorTopVer=0],
AffinityTopologyVersion [topVer=19103, minorTopVer=0],
AffinityTopologyVersion [topVer=19104, minorTopVer=0],
AffinityTopologyVersion [topVer=19105, minorTopVer=0],
AffinityTopologyVersion [topVer=19106, minorTopVer=0],
AffinityTopologyVersion [topVer=19107, minorTopVer=0],
AffinityTopologyVersion [topVer=19108, minorTopVer=0],
AffinityTopologyVersion [topVer=19109, minorTopVer=0],
AffinityTopologyVersion [topVer=19110, minorTopVer=0],
AffinityTopologyVersion [topVer=19111, minorTopVer=0],
AffinityTopologyVersion [topVer=19112, minorTopVer=0],
AffinityTopologyVersion [topVer=19113, minorTopVer=0],
AffinityTopologyVersion [topVer=19114, minorTopVer=0],
AffinityTopologyVersion [topVer=19115, minorTopVer=0],
AffinityTopologyVersion [topVer=19116, minorTopVer=0],
AffinityTopologyVersion [topVer=19117, minorTopVer=0],
AffinityTopologyVersion [topVer=19118, minorTopVer=0],
AffinityTopologyVersion [topVer=19119, minorTopVer=0],
AffinityTopologyVersion [topVer=19120, minorTopVer=0],
AffinityTopologyVersion [topVer=19121, minorTopVer=0],
AffinityTopologyVersion 

Re: Node pause for no obvious reason

2018-06-08 Thread Ray
I didn't have the dstat logs.
But I think these charts show the same information as the dstat logs.
CPU usage
CPU.png   
Memory usage
Memory.png
  
Swap
swap.png
  
Server load
load.png
  
Disk
disk.png
  

We're using bare-metal servers for Ignite; each server has 768GB of memory and
a 56-core CPU.
One node per server.
I don't think hardware is the bottleneck here.

Will enabling DEBUG logs help in this case?
In my understanding, if the JVM is in a paused state Ignite can't produce any
logs, even with DEBUG level enabled, right?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Node pause for no obvious reason

2018-06-08 Thread Ray
Hi, 

Please see the GC log and the picture I attached.
Looks like the GC is not taking a very long time.

Yes, the checkpoint is taking a long time to finish.
Could it be that the checkpoint thread has something to do with the node crash?
In my understanding, a checkpoint will not block other threads, right?

Yes, I'm using HDDs and there's plenty of free space on the disk.
I disabled swapping using sysctl -w vm.swappiness=0, as this document suggests:
https://apacheignite.readme.io/docs/durable-memory-tuning#section-adjust-swappiness-settings/
The CPU usage is normal when the node goes down.

What's weird is that the other 5 nodes have the same configuration and are
under the same heavy write load, yet they didn't go down.




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Node pause for no obvious reason

2018-06-08 Thread Ray
I set up a six-node Ignite cluster to test performance and stability.
Here's my setup.

[The XML configuration was stripped by the mailing-list archive.]

And I used this command to start the Ignite node.
./ignite.sh -J-Xmx32000m -J-Xms32000m -J-XX:+UseG1GC
-J-XX:+ScavengeBeforeFullGC -J-XX:+DisableExplicitGC -J-XX:+AlwaysPreTouch
-J-XX:+PrintGCDetails -J-XX:+PrintGCTimeStamps -J-XX:+PrintGCDateStamps
-J-XX:+PrintAdaptiveSizePolicy -J-Xloggc:/ignitegc-$(date
+%Y_%m_%d-%H_%M).log  config/persistent-config.xml

One of the node just dropped from the topology. Here's the log for last
three minutes before this node going down.
[08:39:58,982][INFO][grid-timeout-worker-#119][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=8333aa56, uptime=02:34:01.948]
^-- H/N/C [hosts=9, nodes=16, CPUs=552]
^-- CPU [cur=41%, avg=33.18%, GC=0%]
^-- PageMemory [pages=8912687]
^-- Heap [used=8942MB, free=72.05%, comm=32000MB]
^-- Non heap [used=70MB, free=95.35%, comm=73MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=6, qSize=0]
[08:40:51,945][INFO][db-checkpoint-thread-#178][GridCacheDatabaseSharedManager]
Checkpoint finished [cpId=77cf2fa2-2a9f-48ea-bdeb-dda81b15dac1,
pages=2050858, markPos=FileWALPointer [idx=2051, fileOff=38583904,
len=15981], walSegmentsCleared=0, markDuration=920ms, pagesWrite=12002ms,
fsync=965250ms, total=978172ms]
[08:40:53,086][INFO][db-checkpoint-thread-#178][GridCacheDatabaseSharedManager]
Checkpoint started [checkpointId=14d929ac-1b5c-4df2-a71f-002d5eb41f14,
startPtr=FileWALPointer [idx=2242, fileOff=65211837, len=15981],
checkpointLockWait=0ms, checkpointLockHoldTime=39ms,
walCpRecordFsyncDuration=720ms, pages=2110545, reason='timeout']
[08:40:57,793][INFO][data-streamer-stripe-1-#58][PageMemoryImpl] Throttling
is applied to page modifications [percentOfPartTime=0.22, markDirty=7192
pages/sec, checkpointWrite=2450 pages/sec, estIdealMarkDirty=139543
pages/sec, curDirty=0.00, maxDirty=0.17, avgParkTime=1732784 ns, pages:
(total=2110545, evicted=0, written=875069, synced=0, cpBufUsed=92,
cpBufTotal=518215)]
[08:40:58,991][INFO][grid-timeout-worker-#119][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=8333aa56, uptime=02:35:01.957]
^-- H/N/C [hosts=9, nodes=16, CPUs=552]
^-- CPU [cur=9.3%, avg=33%, GC=0%]
^-- PageMemory [pages=8920631]
^-- Heap [used=13262MB, free=58.55%, comm=32000MB]
^-- Non heap [used=70MB, free=95.34%, comm=73MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=6, qSize=0]
[08:41:29,050][WARNING][jvm-pause-detector-worker][] Possible too long JVM
pause: 22667 milliseconds.
[08:41:29,050][INFO][tcp-disco-sock-reader-#11][TcpDiscoverySpi] Finished
serving remote node connection [rmtAddr=/10.29.41.23:32815, rmtPort=32815
[08:41:29,052][INFO][tcp-disco-sock-reader-#13][TcpDiscoverySpi] Finished
serving remote node connection [rmtAddr=/10.29.41.25:46515, rmtPort=46515
[08:41:30,063][INFO][data-streamer-stripe-3-#60][PageMemoryImpl] Throttling
is applied to page modifications [percentOfPartTime=0.49, markDirty=26945
pages/sec, checkpointWrite=2612 pages/sec, estIdealMarkDirty=210815
pages/sec, curDirty=0.00, maxDirty=0.34, avgParkTime=1024456 ns, pages:
(total=2110545, evicted=0, written=1861330, synced=0, cpBufUsed=8657,
cpBufTotal=518215)]
[08:42:42,276][WARNING][jvm-pause-detector-worker][] Possible too long JVM
pause: 67967 milliseconds.
[08:42:42,277][INFO][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Local node
seems to be disconnected from topology (failure detection timeout is
reached) [failureDetectionTimeout=6, connCheckFreq=2]
[08:42:42,280][INFO][tcp-disco-sock-reader-#10][TcpDiscoverySpi] Finished
serving remote node connection [rmtAddr=/10.29.42.49:36509, rmtPort=36509
[08:42:42,286][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery
accepted incoming connection [rmtAddr=/10.29.42.45, rmtPort=38712]
[08:42:42,286][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery
spawning a new thread for connection [rmtAddr=/10.29.42.45, rmtPort=38712]
[08:42:42,287][INFO][tcp-disco-sock-reader-#15][TcpDiscoverySpi] Started
serving remote node connection [rmtAddr=/10.29.42.45:38712, rmtPort=38712]
[08:42:42,289][WARNING][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Node is
out of topology (probably, due to short-time network problems).
[08:42:42,290][INFO][tcp-disco-sock-reader-#15][TcpDiscoverySpi] Finished
serving remote node connection [rmtAddr=/10.29.42.45:38712, rmtPort=38712
[08:42:42,290][WARNING][disco-event-worker-#161][GridDiscoveryManager] Local
node 

Re: Apache Flink Sink + Ignite: Ouch! Argument is invalid

2018-06-05 Thread Ray
Yes, the cache is already created before running my Flink application.

The issue can be reproduced when you submit your Flink application to your
Flink cluster.




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Apache Flink Sink + Ignite: Ouch! Argument is invalid

2018-06-04 Thread Ray
I think it's a code bug in the Flink sink.
I had this same problem some time ago.
I think it's caused by reordering of variable initialization in a
multi-threaded environment (Flink cluster mode).
In this case, the variable "cacheName" is not yet initialized when it is
used, because variable initialization can be reordered across threads.

I have created a ticket in JIRA and assigned it to the author of the Flink sink.

https://issues.apache.org/jira/browse/IGNITE-8697
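
(To illustrate the kind of race described above, a hypothetical sketch; the
class and field names are made up, this is not the actual sink code:)

// Hypothetical sketch of an unsafe-publication bug: a field assigned on one
// thread may be observed as null by the task thread in cluster mode unless
// there is a proper happens-before edge (e.g. @volatile).
class WordSink[T](name: String) {
  @volatile private var cacheName: String = _ // without @volatile the race appears

  def open(): Unit = { cacheName = name }

  def invoke(value: T): Unit = {
    require(cacheName != null, "Ouch! Argument is invalid") // fails without the barrier
    // ... write 'value' to the Ignite cache called cacheName ...
  }
}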



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Spark + Ignite standalone mode on Kubernetes cluster.

2018-06-01 Thread Ray
Currently, Ignite supports Spark up to version 2.2.0.
Please try with Spark 2.2.0 or wait until this ticket is resolved.
https://issues.apache.org/jira/browse/IGNITE-8534





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Large durable caches

2018-05-17 Thread Ray
I ran into this issue as well.
I'm running tests on a six-node Ignite cluster; the data load gets stuck
after 1 billion entries are ingested.
Can someone please take a look at this issue?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: How to cancel long running sql query in sqlline?

2018-05-10 Thread Ray
I tried killing the sqlline process, but that does not work.
The SQL is still running in the Ignite cluster, because Ignite's response
time is still affected and the disk I/O is high (the long-running SQL is
intended to load data from the persistent store into memory).

I also tried ODBC; it looks like aborting the query halfway on the ODBC
client side can't cancel the SQL query either.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


How to cancel long running sql query in sqlline?

2018-05-10 Thread Ray
After I restarted an Ignite cluster using the persistent store, I ran a query
in sqlline which takes a long time to finish.
Is there any way to cancel that query?

I googled and found that a query run from Java code can be canceled, so I
wonder: is there any way to cancel a SQL query run from sqlline?
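
(From Java code, a timeout can at least be attached to the query up front; a
minimal sketch, assuming the SqlFieldsQuery.setTimeout API and an existing
IgniteCache handle 'cache':)

import java.util.concurrent.TimeUnit
import org.apache.ignite.cache.query.SqlFieldsQuery

// A sketch: abort the query server-side if it runs longer than 60 seconds.
// This must be set before the query starts; it does not cancel a query
// already launched from sqlline.
val qry = new SqlFieldsQuery("select * from table_name")
  .setTimeout(60, TimeUnit.SECONDS)
val rows = cache.query(qry).getAll()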





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


How to set Expiry Policies when using Dataframe API to save data to Ignite?

2018-04-11 Thread Ray
According to the document
https://apacheignite.readme.io/docs/expiry-policies, expiry policies have
to be set up in the CacheConfiguration.
In the old RDD API, I can pass the CacheConfiguration to get an Ignite RDD
with expiry policies.
But in the Dataframe API, I don't see any expiry settings in the
IgniteDataFrameSettings.

So my question is: how do I set expiry policies when using the Dataframe API
to save data to Ignite?
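
(For reference, the RDD-era approach mentioned above; a minimal sketch,
assuming the standard JCache expiry API and a hypothetical cache name:)

import javax.cache.expiry.{CreatedExpiryPolicy, Duration}
import org.apache.ignite.configuration.CacheConfiguration

// A sketch: attach the expiry policy to the CacheConfiguration that the
// cache (or IgniteRDD) is created from; entries expire one day after
// creation.
val ccfg = new CacheConfiguration[String, String]("myCache")
ccfg.setExpiryPolicyFactory(CreatedExpiryPolicy.factoryOf(Duration.ONE_DAY))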



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Strange node fail

2018-04-09 Thread Ray
I'm running two clusters on the same 8 nodes: one cluster with native
persistence enabled, and the other memory-only.
The only other difference between these two clusters is that the cluster with
native persistence enabled has a larger JVM heap.
Two nodes in the cluster with native persistence enabled failed today; I
looked at the log and GC log and found nothing strange.
I suspected that because of the larger JVM heap, GC takes a lot of time and
then the node is kicked out.
But I found nothing special in the GC log.
Please see the attachment for detailed log information.
Please see the attachment for detailed log information.

node2gc.log
  
node2.log
  
node1gc.log
  
node1.log
  



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Different behavior when saving date from Dataframe API and RDD API

2018-03-23 Thread Ray
I was trying out one of Ignite 2.4's new features - saving data from
dataframe.
But I found some inconsistency between the Dataframe API and RDD API.

This is the code for saving a dataframe to Ignite.
DF.write
.format(FORMAT_IGNITE)
.mode(SaveMode.Append)
.option(OPTION_CONFIG_FILE, CONFIG)
.option(OPTION_TABLE, "table_name")
.option(OPTION_CREATE_TABLE_PRIMARY_KEY_FIELDS, "a,b,c,d")
.option(OPTION_CREATE_TABLE_PARAMETERS,
"template=partitioned,affinitykey=a")
.option(OPTION_STREAMER_ALLOW_OVERWRITE, "true")
.save()
After data finished saving, I ran this command to create an index on field
a.
CREATE INDEX IF NOT EXISTS idx ON table_name (a);
Then I run this query to see if the index is working.

explain select a from table_name where a = '303';
PLAN  SELECT
__Z0.a AS __C0_0
FROM PUBLIC.table_name __Z0
/* PUBLIC.AFFINITY_KEY: a = '303' */
WHERE __Z0.a = '303'

But when I query the data I inserted the old RDD way, the result is
explain select a from table_name where a = '303';
PLAN  SELECT
__Z0.a AS __C0_0
FROM PUBLIC.table_name __Z0
/* PUBLIC.table_name_IDX: a = '303' */WHERE __Z0.a = '303'

The result shows that with an affinity key, the index created is not effective.
I tried creating an index on another non-affinity-key field, and that index
works.
Please advise: is this behavior expected?

Thanks



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


DataFrame support for Apache Spark 1.6

2018-03-19 Thread Ray
I'm trying to save spark dataframe to Ignite 2.4 using Apache Spark 1.6.
But it failed, with the following error
Exception in thread "main" java.util.ServiceConfigurationError:
org.apache.spark.sql.sources.DataSourceRegister: Provider
org.apache.ignite.spark.impl.IgniteRelationProvider could not be
instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at 
java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at
scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at 
scala.collection.TraversableLike$class.filter(TraversableLike.scala:263)
at scala.collection.AbstractTraversable.filter(Traversable.scala:105)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:59)
at
org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:244)
at 
IgniteDataFrameWriteExample$.main(IgniteDataFrameWriteExample.scala:40)
at IgniteDataFrameWriteExample.main(IgniteDataFrameWriteExample.scala)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/internal/Logging
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
at java.lang.Class.getConstructor0(Class.java:3075)
at java.lang.Class.newInstance(Class.java:412)
at 
java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
... 16 more
Caused by: java.lang.ClassNotFoundException:
org.apache.spark.internal.Logging
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 33 more

But it works fine under Spark 2.2.
So I'm wondering: will the Spark dataframe feature support Spark 1.6 in the
future?
I can't upgrade to Spark 2.2 because Cloudera won't upgrade.




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Two persistent data stores for a single Ignite cluster - RDBMS and Ignite native

2017-12-01 Thread Ray
Hi Slava,

Thanks for the reply, I have the same doubt.

One more question: how can an update or new insert back-propagate to Ignite
when another application (not Ignite) writes to the 3rd party persistence?

For example, Ignite and the 3rd party persistence both have one entry for now.
When another application adds an entry to the 3rd party persistence, it then
has two entries.
Can Ignite be notified and load the newly added entry automatically?

From the document, it looks like data can only be propagated from Ignite
to the persistence, not the other way around.




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Can Ignite native persistence used with 3rd party persistence?

2017-12-01 Thread Ray
http://apache-ignite-users.70518.x6.nabble.com/Two-persistent-data-stores-for-a-single-Ignite-cluster-RDBMS-and-Ignite-native-td18463.html

Found a similar case here, I think I'll try Slava's suggestions first.

One more question: how can an update or new insert back-propagate to Ignite
when another application (not Ignite) writes to the persistence (HBase)?

For example, Ignite and HBase both have one entry for now.
When another application adds an entry to HBase, HBase then has two entries.
Can Ignite be notified and load the newly added entry automatically?

From the document, it looks like data can only be propagated from Ignite
to the persistence, not the other way around.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Can Ignite native persistence used with 3rd party persistence?

2017-11-30 Thread Ray
Ignite native persistence provides on-disk SQL queries and quick cluster
startup without data loading, so we definitely want to use it.
But we have a legacy HBase serving as the persistence layer, and some
business processes rely on it.
So, can Ignite native persistence be used together with 3rd party persistence?

Basically, we want data to be persisted both in native persistence and in
HBase when new entries go into Ignite.
And when a user queries data that is not in memory, Ignite should query its
native persistence.

Is this design supported by Ignite?



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite node crashes after one query fetches many entries from cache

2017-11-30 Thread Ray
Anton, thanks for the heads up.
Any idea on how to set the timeout if I'm using ODBC to do the query?




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Ignite node crashes after one query fetches many entries from cache

2017-11-28 Thread Ray
I tried to fetch all the results of a table with billions of entries using
SQL like this: "select * from table_name".
As far as I understand, Ignite will prepare all the data on the node running
this query and then return the results to the client.
The problem is that after a while, the node crashes (probably because of a
long GC pause or running out of memory).
Is the node crashing the expected behavior?
I mean, it's unreasonable for an Ignite node to crash after this kind of query.

From my experience with other databases, running this kind of full table
scan will not crash the node.

The optimal way of handling this kind of situation is for the Ignite node to
stay alive and stop the query itself when it finds out it will soon run out
of memory.
Then an error response should be returned to the client.

Please advise me if this mechanism already exists and there is a hidden
switch to turn it on.
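
(One existing knob worth mentioning, a sketch assuming the lazy result-set
flag that appeared around Ignite 2.1 and an existing IgniteCache handle
'cache'; not a guaranteed fix for the crash:)

import org.apache.ignite.cache.query.SqlFieldsQuery

// A sketch: with setLazy(true) the node streams result pages to the client
// instead of materializing the entire result set on its own heap.
val qry = new SqlFieldsQuery("select * from table_name").setLazy(true)
val cursor = cache.query(qry)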
Thanks



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Benchmark results questions

2017-11-02 Thread Ray
Hi Dmitry,

Thanks for your results.
I'll try increasing the number of clients as well, to see if the throughput
increases.

Can you also test the SQL query throughput?
In my results, the SQL query throughput seems to go down dramatically with
more servers.
Is there a valid explanation for this phenomenon?

Thanks



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Benchmark results questions

2017-11-01 Thread Ray
Hi Dmitry,

It's been a while now; did you find out what happened?




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Inserting data into Ignite got stuck when memory is full with persistent store enabled.

2017-10-19 Thread Ray
Hi Dmitriy,

Thanks for the reply.

I know the eviction is automatic, but does eviction happen only when the
memory is full?
From the log, I didn't see any "Page evictions started, this will affect
storage performance" messages.

So my guess is that the memory is not fully used up and no eviction happened.






--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Inserting data into Ignite got stuck when memory is full with persistent store enabled.

2017-10-18 Thread Ray
Hi Dmitriy,

Thanks for the answers.

The cluster was stable during the data ingestion; no nodes joined or left.
I've been monitoring the cluster's topology and cache entry counts from
Visor the whole time.
I'm also confused about why rebalancing is triggered; from Visor I can see
that every node has nearly the same number of entries.

Did you find anything in the thread dump and log files?

Since eviction is not triggered, the memory in the default region is not
used up, right?
Why would increasing the available memory help with upload speed?





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Benchmark results questions

2017-10-18 Thread Ray
Hi Dmitry,

The dstat and gc log for 4 and 12 are in the attachment.

dstat.zip
  



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Inserting data into Ignite got stuck when memory is full with persistent store enabled.

2017-10-17 Thread Ray
Hi Dmitriy,

I'll try setting those two parameters you mentioned, but I doubt it will
make a difference.
As I mentioned in my reply to Alexey, I found that when the ingestion speed
slows down, Ignite spends a lot of time on rebalancing.


Here's the thread dump for client and server.

The server has 757Gb of memory, but I limit the default memory region to
consume 32Gb of memory.
You can find the ignite server setup detail xml in my earlier reply.

My question is: does the index also allocate from the default memory region?
And when the memory is full with the persistent store enabled, which
eviction policy takes effect?


client.txt
  
server.txt
  



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Inserting data into Ignite got stuck when memory is full with persistent store enabled.

2017-10-17 Thread Ray
After reviewing the log, I don't see any "Page evictions started, this will
affect storage performance" messages.

But I found that when the ingestion speed slows down, Ignite spends a lot of
time on rebalancing.
I posted the log file earlier; you can check the archive file for the logs
and GC logs.





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Inserting data into Ignite got stuck when memory is full with persistent store enabled.

2017-10-17 Thread Ray
I'm using Ignite 2.1.

The phenomenon I observed is that for the first 130M entries the speed is
OK, but after about 130M entries it slows down tremendously and finally gets
stuck.
When I ingest a small amount of data, like 20M entries, it works OK and the
performance is acceptable.
But when the number of entries grows, the issue happens every time.
Please try ingesting more data and see if you can reproduce this issue.

My servers have HDDs, and I tested the random write speed using this command:
" fio -filename=/dev/sda -direct=1 -iodepth 1 -thread -rw=randwrite
-ioengine=psync -bs=16k -size=200G -numjobs=30 -runtime=1000
-group_reporting -name=mytest
"
The result is
mytest: (g=0): rw=randwrite, bs=16K-16K/16K-16K/16K-16K, ioengine=psync,
iodepth=1
...
mytest: (g=0): rw=randwrite, bs=16K-16K/16K-16K/16K-16K, ioengine=psync,
iodepth=1
fio-2.0.13
Starting 30 threads
Jobs: 11 (f=11): [___ww__w_w___ww__w] [5.0% done] [0K/3072K/0K /s] [0/192/0 iops] [eta 05h:16m:40s]
mytest: (groupid=0, jobs=30): err= 0: pid=17321: Tue Oct 17 13:54:12 2017
  write: io=5835.2MB, bw=5969.5KB/s, iops=373 , runt=1000960msec
clat (usec): min=149 , max=2013.4K, avg=80349.31, stdev=53474.06
 lat (usec): min=150 , max=2013.4K, avg=80351.00, stdev=53474.13
clat percentiles (msec):
 |  1.00th=[4],  5.00th=[   10], 10.00th=[   14], 20.00th=[   48],
 | 30.00th=[   67], 40.00th=[   78], 50.00th=[   86], 60.00th=[   92],
 | 70.00th=[   99], 80.00th=[  109], 90.00th=[  123], 95.00th=[  139],
 | 99.00th=[  180], 99.50th=[  215], 99.90th=[  424], 99.95th=[  611],
 | 99.99th=[ 2008]
bw (KB/s)  : min=7, max= 8440, per=3.36%, avg=200.43, stdev=72.14
lat (usec) : 250=0.01%, 500=0.04%, 750=0.05%, 1000=0.01%
lat (msec) : 2=0.09%, 4=1.71%, 10=3.23%, 20=9.55%, 50=5.60%
lat (msec) : 100=50.94%, 250=28.38%, 500=0.33%, 750=0.02%, 1000=0.01%
lat (msec) : 2000=0.01%, >=2000=0.02%
  cpu  : usr=0.00%, sys=0.00%, ctx=331635, majf=0,
minf=18446744073700073665
  IO depths: 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
 issued: total=r=0/w=373449/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=5835.2MB, aggrb=5969KB/s, minb=5969KB/s, maxb=5969KB/s,
mint=1000960msec, maxt=1000960msec

Disk stats (read/write):
  sda: ios=166/373605, merge=0/108, ticks=203/30925452, in_queue=30935375,
util=100.00%



The estimated length of key and value is 300 bytes each per entry.
So the total amount written to disk will be
(300+300) * 1.3 * 550,000,000 bytes ≈ 426GB.
Because I have four Ignite servers, the total time should be
426 / 4 / (5969 / 1000 / 1000) / 60 / 60 ≈ 5 hours.

So apparently the disk write speed is not the issue; it took Ignite more
than 13.5 hours to ingest 250M entries.

By the way, I launched six Ignite clients on the same nodes the Ignite
servers run on.








--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Inserting data into Ignite got stuck when memory is full with persistent store enabled.

2017-10-16 Thread Ray
The above log was captured when the data ingestion slowed down, not when it
was stuck completely.
The job has been running for two and a half hours now, and the total number
of records to be ingested is 550 million.
During the last ten minutes, less than one million records were ingested
into Ignite.

Write performance with IgniteDataStreamer is really slow with the persistent
store enabled.
I also ran this test on another cluster without the persistent store
enabled; it took about forty minutes to save all 550 million records using
the same code.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Benchmark results questions

2017-10-16 Thread Ray
I've already tried tuning a few parameters; here's a quick update.

Increasing 128 threads to 256 threads in the yardstick configuration for the
8-node setup: the result is still the same.

Increasing the warm-up from 60s to 120s in the yardstick configuration for
the 8-node setup: the result is still the same.

Increasing 1 driver to 2 drivers in the yardstick configuration for the
8-node setup: the result is still the same.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Benchmark results questions

2017-10-15 Thread Ray
Please see the attachment for the yardstick configuration/parameters and
cluster node configuration I used.

Yes, I ran the standard classes without modification; I ran benchmarks for
these three classes: IgnitePutBenchmark.java, IgnitePutGetBenchmark.java,
IgniteSqlQueryBenchmark.java.
 
config.zip
  



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Benchmark results questions

2017-10-13 Thread Ray
We used Yardstick to do some benchmarking of Ignite in our lab.
The Ignite cluster size scales from one node with 16GB of memory to 12 nodes
with 64GB.
We evaluated Atomic PUT, Atomic PUT/GET, and SQL Query.
Please see the results in the attachment.

My question: the performance for Atomic PUT and Atomic PUT/GET is almost the
same from 4 nodes with 16GB to 12 nodes with 64GB; is that normal?

Another question: the SQL Query performance dropped dramatically with more
nodes in the cluster; is that normal?

We're relying on the benchmark results to determine our cluster size.
Benchmark.pdf
  



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Inserting data into Ignite got stuck when memory is full with persistent store enabled.

2017-10-12 Thread Ray
My Ignite config is as follows.

[The XML configuration was stripped by the mailing-list archive.]

It's stuck forever; I waited 10 more hours and the ingestion still hadn't
finished.
On the other, memory-only Ignite cluster without the persistent store
enabled, the job took 30 minutes to ingest 550 million entries.

Thanks for the suggestion, I'll try adding the checkpoint config.




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Ignite SQL function questions

2017-10-11 Thread Ray
I've been reading the official Ignite documentation and going through the
Ignite source code for a while now.

From the source code I see that Ignite can be a key/value store, and the
data is stored in a ConcurrentHashMap for every partition.
So that is one copy of the data in memory.

For the Ignite SQL function, it seems like Ignite creates an H2 table in
memory based on the fields the user wants to query with SQL.
So does that mean that for every key with the @QuerySqlField annotation,
there will be two copies of the data in memory (one in the ConcurrentHashMap
and one in the in-memory H2 table)?
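
(For context, the annotation in question; a minimal sketch of how a field is
exposed to SQL, assuming the usual Scala meta-annotation for Java field
annotations; whether the value is duplicated into H2 is exactly the question
above:)

import org.apache.ignite.cache.query.annotations.QuerySqlField
import scala.annotation.meta.field

// A sketch: 'name' becomes visible (and indexed) in Ignite SQL.
class Person {
  @(QuerySqlField @field)(index = true)
  var name: String = _
}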








--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/



Inserting data into Ignite got stuck when memory is full with persistent store enabled.

2017-10-11 Thread Ray
I had the same problem in this thread:
http://apache-ignite-users.70518.x6.nabble.com/Performance-of-persistent-store-too-low-when-bulb-loading-td16247.html

Basically, I'm using ignite-spark's savePairs method (which is also an
IgniteDataStreamer) to ingest 550 million entries of data into Ignite; the
ingestion speed slowed down after the first few minutes and now seems stuck,
with the persistent store enabled.
My setup is 4 nodes with 16GB heap size and 32GB off-heap size.

After initial analysis, I found that when the 32GB off-heap space is used
up, writing almost stops.
From the document, my understanding is that if I enable the persistent
store, old data will be swapped to disk.
But the writing is stuck.
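
(For reference, the streamer-side batching knobs; a sketch with hypothetical
values and an assumed Ignite handle 'ignite' and cache name "myCache";
tuning these trades memory for throughput and does not remove the
checkpoint/fsync bottleneck discussed in this thread:)

import org.apache.ignite.IgniteDataStreamer

val streamer: IgniteDataStreamer[String, String] = ignite.dataStreamer("myCache")
streamer.perNodeBufferSize(1024)      // entries buffered per node before a flush
streamer.perNodeParallelOperations(8) // concurrent batches in flight per node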

Here's the dstat result for about a minute while the ingestion is stuck
(rows rejoined; the archive had wrapped each row across several lines):

----total-cpu-usage---- ------memory-usage----- -dsk/total- ---paging-- ---swap--- --filesystem-
usr sys idl wai hiq siq| used  buff  cach  free| read  writ|  in   out | used  free| files inodes
  1   0  99   0   0   0|89.9G 3970M  157G  507G|  30k  967k|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    12M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0  9548k|   0     0 |   0     0 | 8176453k
  0   0  96   3   0   0|89.9G 3970M  157G  507G|   0    18M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    17M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    17M|   0     0 |   0     0 | 8176453k
  0   0  96   3   0   0|89.9G 3970M  157G  507G|   0    16M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    15M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    17M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    16M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    18M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    17M|   0     0 |   0     0 | 8176453k
  2   1  94   3   0   0|89.9G 3970M  157G  507G|   0    15M|   0     0 |   0     0 | 8176453k
  1   0  96   3   0   0|89.9G 3970M  157G  507G|   0    24M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    25M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    20M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    26M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    16M|   0     0 |   0     0 | 8176453k
  0   0  96   3   0   0|89.9G 3970M  157G  507G|   0    10M|   0     0 |   0     0 | 8176453k
  1   0  96   3   0   0|89.9G 3970M  157G  507G|   0    24M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    20M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    25M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    31M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    23M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    21M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    21M|   0     0 |   0     0 | 8176453k
  0   0  96   3   0   0|89.9G 3970M  157G  507G|   0    27M|   0     0 |   0     0 | 8176453k
  0   0  96   3   0   0|89.9G 3970M  157G  507G|   0    24M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    27M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    16M|   0     0 |   0     0 | 8176453k
  0   0  96   3   0   0|89.9G 3970M  157G  507G|   0    20M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    28M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    22M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    24M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    27M|   0     0 |   0     0 | 8176453k
  1   0  96   3   0   0|89.9G 3970M  157G  507G|   0    27M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    24M|   0     0 |   0     0 | 8176453k
  1   0  96   3   0   0|89.9G 3970M  157G  507G|   0    29M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    27M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    23M|

Re: Performance of persistent store too low when bulb-loading

2017-10-09 Thread Ray
I also had the same problem here.
I'm using ignite-spark's savePairs method (which is also an
IgniteDataStreamer) to ingest 550 million entries of data into Ignite; the
ingestion speed slowed down after the first few minutes and now seems stuck,
with the persistent store enabled.
My setup is 4 nodes with 16GB heap size and 32GB off-heap size.

Here's the dstat result for about a minute while the ingestion is stuck
(rows rejoined; the archive had wrapped each row across several lines):

----total-cpu-usage---- ------memory-usage----- -dsk/total- ---paging-- ---swap--- --filesystem-
usr sys idl wai hiq siq| used  buff  cach  free| read  writ|  in   out | used  free| files inodes
  1   0  99   0   0   0|89.9G 3970M  157G  507G|  30k  967k|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    12M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0  9548k|   0     0 |   0     0 | 8176453k
  0   0  96   3   0   0|89.9G 3970M  157G  507G|   0    18M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    17M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    17M|   0     0 |   0     0 | 8176453k
  0   0  96   3   0   0|89.9G 3970M  157G  507G|   0    16M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    15M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    17M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    16M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    18M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    17M|   0     0 |   0     0 | 8176453k
  2   1  94   3   0   0|89.9G 3970M  157G  507G|   0    15M|   0     0 |   0     0 | 8176453k
  1   0  96   3   0   0|89.9G 3970M  157G  507G|   0    24M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    25M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    20M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    26M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    16M|   0     0 |   0     0 | 8176453k
  0   0  96   3   0   0|89.9G 3970M  157G  507G|   0    10M|   0     0 |   0     0 | 8176453k
  1   0  96   3   0   0|89.9G 3970M  157G  507G|   0    24M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    20M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    25M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    31M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    23M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    21M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    21M|   0     0 |   0     0 | 8176453k
  0   0  96   3   0   0|89.9G 3970M  157G  507G|   0    27M|   0     0 |   0     0 | 8176453k
  0   0  96   3   0   0|89.9G 3970M  157G  507G|   0    24M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    27M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    16M|   0     0 |   0     0 | 8176453k
  0   0  96   3   0   0|89.9G 3970M  157G  507G|   0    20M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    28M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    22M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    24M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    27M|   0     0 |   0     0 | 8176453k
  1   0  96   3   0   0|89.9G 3970M  157G  507G|   0    27M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    24M|   0     0 |   0     0 | 8176453k
  1   0  96   3   0   0|89.9G 3970M  157G  507G|   0    29M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    27M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    23M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    17M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    62M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    16M|   0     0 |   0     0 | 8176453k
  0   0  96   3   0   0|89.9G 3970M  157G  507G|   0    17M|   0     0 |   0     0 | 8176453k
  0   0  97   3   0   0|89.9G 3970M  157G  507G|   0    20M|   0

Re: Ignite started as a server node instead of client node in yarn-client mode

2017-10-09 Thread Ray
Solved my own problem by adding the client-mode property in the
configuration file. (The XML snippet was stripped by the mailing-list
archive; presumably it set clientMode to true.)



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Ignite started as a server node instead of client node in yarn-client mode

2017-10-08 Thread Ray
Found this code in IgniteContext.scala:
// Start ignite server node on each worker in server mode.
sparkContext.parallelize(1 to workers, workers).foreachPartition(it
⇒ ignite())

Looks like starting a server node on each executor is expected.
But I want to ingest data into an existing Ignite cluster using a Spark job;
when the Spark job finishes, the executors will be revoked.
If a server node is launched on each executor, the data stored on that
server node will be lost.
Please advise me on how I can accomplish my requirement.
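
(For illustration, one way to make the executors join as clients; a sketch
assuming the IgniteConfiguration/clientMode API and the IgniteContext
constructor that takes a configuration closure; not the confirmed fix from
this thread:)

import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext

// A sketch: build the configuration in code and force client mode so that
// executors join the existing cluster as clients rather than as embedded
// servers. The discovery SPI setup is elided.
val igniteContext = new IgniteContext(sparkContext, () => {
  val cfg = new IgniteConfiguration()
  // ... point TcpDiscoverySpi at the existing cluster here ...
  cfg.setClientMode(true)
  cfg
})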

Thanks



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Ignite started as a server node instead of client node in yarn-client mode

2017-10-08 Thread Ray
I'm trying to ingest data into Ignite using ignite-spark-2.10.
I followed the guide in ScalarSharedRDDExample.scala and started
IgniteContext like this
val igniteContext = new IgniteContext(sparkContext, "example-ignite.xml",
true)

I started six executors in yarn-client mode, and I expected Ignite to launch
six clients, one on each executor, to ingest data into Ignite.
But Ignite launches six server nodes instead of client nodes on the executors.
I see these two lines of code in IgniteContext.scala on line number 139.
// check if called from driver
if (sparkContext != null) igniteCfg.setClientMode(true)

This confuses me: I clearly passed the sparkContext into IgniteContext, but
Ignite still launches server nodes on the executors.
So I wonder: is there any configuration not mentioned in the docs and
ScalarSharedRDDExample.scala that I should change to get Ignite to launch in
client mode on the executors, or is this a bug?

Thanks



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Question about number of total onheap and offheap cache entries.

2017-10-03 Thread Ray
Looks like this ticket is the answer I'm looking for.
https://issues.apache.org/jira/browse/IGNITE-6131




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Question about number of total onheap and offheap cache entries.

2017-10-03 Thread Ray
I mean that in the sixth column, "Size", of the last chart of the "cache -a"
command, the "Heap" entry count is not 0, as the picture above shows.

But in the cache summary chart, the on-heap entry count is 0.

It's really confusing that these two statistics do not match.
Please let me know which of them is correct.

Thanks



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Question about number of total onheap and offheap cache entries.

2017-10-03 Thread Ray
But my on-heap entry count is still not 0 here.
Since I didn't set onHeapEnabled=true, the value should be false by default,
right?
So is it a Visor bug that the on-heap entry count is not 0?




--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Question about number of total onheap and offheap cache entries.

2017-10-02 Thread Ray
Hi Alexey

My cache configuration is as follows.
cacheConfig.setName("DailyAggData")
cacheConfig.setIndexedTypes(classOf[A], classOf[B])
cacheConfig.setSqlSchema("PUBLIC")
cacheConfig.setBackups(2)
cacheConfig.setQueryParallelism(8)

I didn't explicitly set "onHeapEnabled=true".
So what will happen if I perform get & sql operations with
onHeapEnabled=false?
Will off-heap entries be brought on-heap?

Thanks



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Re: Question about number of total onheap and offheap cache entries.

2017-10-02 Thread Ray
Thank you, Vasiliy, for your answer.

In the ticket, it looks like the problem is with the off-heap entry count.
But my question is: why is the on-heap entry count always the same as the
off-heap entry count?
Ignite stores caches off-heap by default, so by design the on-heap entry
count should be zero when the cache is not accessed, right?
Or is there some eviction configuration I need to set up for entries to be
evicted after they are accessed and brought on-heap?
I've been watching the Visor stats for a day now, and the cache was not
accessed during the day.
But the on-heap entry count is still not zero.

Is there a release date for Ignite 2.3 yet?

Thanks



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Question about number of total onheap and offheap cache entries.

2017-09-30 Thread Ray
I have a cache with 1 million entries.
This cache is set up with the partitioned cache mode and two backups.
So Ignite should store 3 million entries across all nodes, right?
But when I run the "cache" and "cache -a" commands in Visor, it says this
cache has 7.5 million rows in total.
So I'm confused: why does Ignite store 7.5 million rows?
I'm using Ignite 2.1, and by default the cache is stored off-heap.
But Visor says that 3.5 million entries are stored on-heap and the other 3.5
million are stored off-heap.
Here's the visor log

visor> cache
[2017-09-30T12:15:10,497][INFO][grid-nio-worker-tcp-comm-0-#65%null%][TcpCommunicationSpi] Established outgoing communication connection [locAddr=/10.29.42.49:34233, rmtAddr=sap-datanode3/10.29.42.46:49100]
[2017-09-30T12:15:10,509][INFO][grid-nio-worker-tcp-comm-1-#66%null%][TcpCommunicationSpi] Established outgoing communication connection [locAddr=/10.29.42.49:40210, rmtAddr=sap-datanode5/10.29.42.48:49100]
[2017-09-30T12:15:10,514][INFO][grid-nio-worker-tcp-comm-2-#67%null%][TcpCommunicationSpi] Established outgoing communication connection [locAddr=/127.0.0.1:47416, rmtAddr=/127.0.0.1:49100]
[2017-09-30T12:15:10,521][INFO][grid-nio-worker-tcp-comm-3-#68%null%][TcpCommunicationSpi] Established outgoing communication connection [locAddr=/10.29.42.49:17331, rmtAddr=sap-datanode4/10.29.42.47:49100]
Time of the snapshot: 09/30/17, 12:15:10
+-------------------+-------------+-------+-----------------------------------+-----------+-----------+-----------+-----------+
|      Name(@)      |    Mode     | Nodes | Entries (Heap / Off-heap)         |   Hits    |  Misses   |   Reads   |  Writes   |
+-------------------+-------------+-------+-----------------------------------+-----------+-----------+-----------+-----------+
| DailyAggData(@c0) | PARTITIONED | 4     | min: 744297 (0 / 744297)          | min: 0    | min: 0    | min: 0    | min: 0    |
|                   |             |       | avg: 752107.50 (0.00 / 752107.50) | avg: 0.00 | avg: 0.00 | avg: 0.00 | avg: 0.00 |
|                   |             |       | max: 758153 (0 / 758153)          | max: 0    | max: 0    | max: 0    | max: 0    |
+-------------------+-------------+-------+-----------------------------------+-----------+-----------+-----------+-----------+

Use "-a" flag to see detailed statistics.
visor> cache -a
Time of the snapshot: 09/30/17, 12:15:16
+-------------------+-------------+-------+-----------------------------------+-----------+-----------+-----------+-----------+
|      Name(@)      |    Mode     | Nodes | Entries (Heap / Off-heap)         |   Hits    |  Misses   |   Reads   |  Writes   |
+-------------------+-------------+-------+-----------------------------------+-----------+-----------+-----------+-----------+
| DailyAggData(@c0) | PARTITIONED | 4     | min: 744297 (0 / 744297)          | min: 0    | min: 0    | min: 0    | min: 0    |
|                   |             |       | avg: 752107.50 (0.00 / 752107.50) | avg: 0.00 | avg: 0.00 | avg: 0.00 | avg: 0.00 |
|                   |             |       | max: 758153 (0 / 758153)          | max: 0    | max: 0    | max: 0    | max: 0    |
+-------------------+-------------+-------+-----------------------------------+-----------+-----------+-----------+-----------+

Cache 'DailyAggData(@c0)':
+-----------------------------+-----------------------------+
| Name(@)                     | DailyAggData(@c0)           |
| Nodes                       | 4                           |
| Total size Min/Avg/Max      | 744297 / 752107.50 / 758153 |
|   Heap size Min/Avg/Max     | 0 / 0.00 / 0                |
|   Off-heap size Min/Avg/Max | 744297 / 752107.50 / 758153 |
+-----------------------------+-----------------------------+

Nodes for: DailyAggData(@c0)
+----------------+------+-----------+----------+--------------+--------------------+-------------+
| Node ID8(@), IP| CPUs | Heap Used | CPU Load |   Up Time    | Size               | Hi/Mi/Rd/Wr |
+----------------+------+-----------+----------+--------------+--------------------+-------------+
| 7A05C9B9(@n3), | 56   | 8.31 %    | 0.00 %   | 30:06:45:032 | Total: 1507662     | Hi: 0       |
|                |      |           |          |              | Heap: 753831       | Mi: 0       |
|                |      |           |          |              | Off-Heap: 753831   | Rd: 0       |
|                |      |           |          |              | Off-Heap Memory: 0 | Wr: 0       |
+----------------+------+-----------+----------+--------------+--------------------+-------------+
| B39533DA(@n1), | 56   | 17.61 %   | 0.00 %   | 30:08:13:327 | Total: 1488594     | Hi: 0       |
|                |      |           |          |              | Heap: 744297       | Mi: 0       |
|                |      |           |          |              | Off-Heap: 744297   | Rd: 0

Re: Full table scan query by ODBC causing node shutdown

2017-09-27 Thread Ray
Ok, I'll try increasing the heap size.

One more question: the log says the full table scan query takes 30s.
Is there any way to speed the query up?
I found this article:
https://apacheignite.readme.io/v2.2/docs/sql-performance-and-debugging#section-query-parallelism

My question is: if I set the CacheConfiguration.queryParallelism parameter
when I ingest the data, will it take effect when I query the data via ODBC?
Or is it only effective when querying the data via the Java API with
CacheConfiguration.queryParallelism specified?

Thanks



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/


Full table scan query by ODBC causing node shutdown

2017-09-27 Thread Ray
I have a cache with 6 million rows.
The cache is configured with PARTITIONED cache mode and 3 backups.
I'm running Ignite 2.1 on 4 nodes, each with an 8GB heap and 30GB of
non-heap memory configured.
When I try to fetch all the rows using "select * from mytable" via the ODBC
driver on Windows, the node I'm querying throws an exception and eventually
shuts down.

The debug log is attached; the SQL query begins at line 2579.

Please advise me on how to solve this problem.
Thanks
ignite-6593a74d.log
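[Editor's note] As a point of comparison while debugging, the same scan can be
run through the Java API with an explicit page size so that rows stream
through the cursor instead of arriving in one piece. A rough sketch, assuming
a running node and reusing the table name mytable from the query above:

import java.util.List;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.SqlFieldsQuery;

// Sketch only: assumes a cache named "mytable" already exists on the grid.
public class FullScan {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Object, Object> cache = ignite.cache("mytable");

            SqlFieldsQuery qry = new SqlFieldsQuery("select * from mytable");
            qry.setPageSize(1024); // fetch results page by page

            try (QueryCursor<List<?>> cur = cache.query(qry)) {
                long cnt = 0;

                for (List<?> row : cur)
                    cnt++;

                System.out.println("Rows fetched: " + cnt);
            }
        }
    }
}

If the paged scan completes comfortably, the heap pressure most likely comes
from buffering the whole result set for the ODBC client rather than from the
scan itself.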
 
 





Re: Ignite YARN deployment mode issues

2017-09-20 Thread Ray
Hi Ilya

Do you mean modifying the original config from
 
 
 
 
 

to 

 
 
 
 

Is my understanding correct?

Thanks





Re: Ignite YARN deployment mode issues

2017-09-20 Thread Ray
Thanks for the reply, Ilya.

The IGNITE_HOME for yarn is
/yarn/nm/usercache/username/appcache/application_appid/container_containerID/ignite/

The log4j2.xml is under the IGNITE_HOME/bin folder too.
If I launch Ignite using ignite.sh, it works fine and the log4j2.xml under
the IGNITE_HOME/bin folder is picked up by Ignite.
But for YARN deployment, the issue still exists.





WAL log folder issue in YARN mode

2017-09-20 Thread Ray
When I deploy Ignite as a YARN application with the persistent store enabled,
the WAL logs are written under
/yarn/nm/usercache/username/appcache/application_appid/container_containerID/ignite/apache-ignite-fabric-2.1.0-bin/work/db/wal/.
But when Ignite is restarted by YARN, a new appid is created, so the WAL
from the old app is not copied to the new app.
I tried setting IGNITE_WORK_DIR to a local directory and IGNITE_RELEASES_DIR
to an HDFS directory, hoping Ignite would save the WAL logs to those folders,
but it does not.
Please advise me on how to solve this issue.

Thanks
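[Editor's note] One workaround that may be worth sketching, under the
assumption that every host has the same stable local directory available
outside the YARN per-application work dir (/data/ignite below is purely
illustrative): point the 2.1-era PersistentStoreConfiguration paths at it
explicitly, so the page store and WAL no longer live under the container
directory that YARN discards.

import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.configuration.PersistentStoreConfiguration;

// Sketch only: /data/ignite is an assumed stable local path on every host.
public class StablePersistencePaths {
    public static void main(String[] args) {
        PersistentStoreConfiguration psCfg = new PersistentStoreConfiguration()
            .setPersistentStorePath("/data/ignite/db")         // page store
            .setWalStorePath("/data/ignite/db/wal")            // WAL segments
            .setWalArchivePath("/data/ignite/db/wal/archive"); // WAL archive

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setPersistentStoreConfiguration(psCfg);

        Ignition.start(cfg);
    }
}

The same three paths can equally be set in the Spring XML; the point is only
that absolute paths are used as given rather than resolved against the work
directory.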





Re: Ignite YARN deployment mode issues

2017-09-19 Thread Ray
Figured the second question out by myself.
The node I'm running ./control.sh does not seem to have ignite grid started.
Here's the yarn log.



[08:09:30,866][SEVERE][main][IgniteKernal] Exception during start
processors, node will be stopped and close connections
class org.apache.ignite.IgniteException:
/yarn/nm/usercache/root/appcache/application_1505700561210_0016/container_1505700561210_0016_01_76/ignite/apache-ignite-fabric-2.1.0-bin/work/db/10_29_42_49_127_0_0_1_47500/lock
(Permission denied)
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$FileLockHolder.<init>(GridCacheDatabaseSharedManager.java:2931)
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$FileLockHolder.<init>(GridCacheDatabaseSharedManager.java:2899)
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedManager.java:374)
at
org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61)
at
org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:696)
at
org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1788)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:929)
at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1896)
at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1648)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1076)
at
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:994)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:880)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:779)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:649)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:618)
at org.apache.ignite.Ignition.start(Ignition.java:347)
at
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:302)
Caused by: java.io.FileNotFoundException:
/yarn/nm/usercache/root/appcache/application_1505700561210_0016/container_1505700561210_0016_01_76/ignite/apache-ignite-fabric-2.1.0-bin/work/db/10_29_42_49_127_0_0_1_47500/lock
(Permission denied)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$FileLockHolder.<init>(GridCacheDatabaseSharedManager.java:2925)
... 16 more
[08:09:30,868][SEVERE][main][IgniteKernal] Got exception while starting
(will rollback startup routine).
class org.apache.ignite.IgniteException:
/yarn/nm/usercache/root/appcache/application_1505700561210_0016/container_1505700561210_0016_01_76/ignite/apache-ignite-fabric-2.1.0-bin/work/db/10_29_42_49_127_0_0_1_47500/lock
(Permission denied)
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$FileLockHolder.<init>(GridCacheDatabaseSharedManager.java:2931)
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$FileLockHolder.<init>(GridCacheDatabaseSharedManager.java:2899)
at
org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.start0(GridCacheDatabaseSharedManager.java:374)
at
org.apache.ignite.internal.processors.cache.GridCacheSharedManagerAdapter.start(GridCacheSharedManagerAdapter.java:61)
at
org.apache.ignite.internal.processors.cache.GridCacheProcessor.start(GridCacheProcessor.java:696)
at
org.apache.ignite.internal.IgniteKernal.startProcessor(IgniteKernal.java:1788)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:929)
at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1896)
at
org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1648)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1076)
at
org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:994)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:880)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:779)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:649)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:618)
at org.apache.ignite.Ignition.start(Ignition.java:347)
at
org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:302)
Caused by: java.io.FileNotFoundException:

Ignite YARN deployment mode issues

2017-09-19 Thread Ray
I'm using Ignite 2.1; my Ignite config XML is as follows.

   












In standalone mode, Ignite can read the log4j2.xml in the local directory
and works OK.
When deployed in YARN, Ignite says it can't find log4j2.xml, with a bunch of
Spring exceptions.
My cluster.properties is as follows 

IGNITE_NODE_COUNT=6
IGNITE_RUN_CPU_PER_NODE=6
IGNITE_MEMORY_PER_NODE=3
IGNITE_VERSION=2.1.0
IGNITE_PATH=/***/apache-ignite-fabric-2.1.0.zip
IGNITE_XML_CONFIG=/***/ignite-config/default-config.xml
IGNITE_USERS_LIBS=/***/ignite-libs/

I tried putting the log4j2.xml file under IGNITE_USERS_LIBS; it still doesn't
work.
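[Editor's note] To take the working-directory layout out of the equation, the
logger can also be wired programmatically instead of through the Spring XML.
A sketch assuming the ignite-log4j2 module is on the classpath; the absolute
path below is an assumption, and Log4J2Logger also accepts a path relative to
IGNITE_HOME:

import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.logger.log4j2.Log4J2Logger;

// Sketch only: /etc/ignite/log4j2.xml is an assumed location.
public class ExplicitLogger {
    public static void main(String[] args) throws IgniteCheckedException {
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setGridLogger(new Log4J2Logger("/etc/ignite/log4j2.xml"));

        Ignition.start(cfg);
    }
}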

Another question is how to activate Ignite with the persistent store enabled
in YARN mode.
I tried running ./control.sh --host ignitenode --port 11211 --activate
and the log says:
Sep 19, 2017 6:25:27 AM
org.apache.ignite.internal.client.impl.GridClientImpl <init>
WARNING: Failed to initialize topology on client start. Will retry in
background.
Sep 19, 2017 6:25:27 AM
org.apache.ignite.internal.client.impl.GridClientImpl <init>
INFO: Client started [id=5f00be2b-7679-46e7-9f8e-3435f7f1d759, protocol=TCP]
Something fail during activation, exception message: Latest topology update
failed.
Sep 19, 2017 6:25:27 AM
org.apache.ignite.internal.client.impl.GridClientImpl stop
INFO: Client stopped [id=5f00be2b-7679-46e7-9f8e-3435f7f1d759,
waitCompletion=true]
And there's no log on the server node side.
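[Editor's note] For reference, in Ignite 2.1 activation can also be triggered
through the Java API from a node that is already part of the topology, which
sidesteps control.sh's client connection entirely. A minimal sketch:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

// Sketch only: must run on, or join, the persistent cluster's topology.
public class Activate {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // Activates the whole cluster; in 2.1 this is Ignite.active(boolean).
        ignite.active(true);
    }
}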





Re: Ignite cluster with persistent store enabled did not load from wal after restarting.

2017-09-18 Thread Ray
This problem magically fixed itself after a few restarts.

Don't know what happened, but deleting all files under work/db might help.





