Re: FW: class loading, peer class loading, jars, fun times in ignite

2019-05-29 Thread Dave Harvey
With these symptoms, even when the programmer understands the rules, there is still a somewhat frequent bug where "cache.withKeepBinary()" is what is required rather than simply "cache". On Wed, May 29, 2019 at 12:29 PM Dmitriy Pavlov wrote: > Hi Scott, > actually, users are encouraged to suggest

Re: JMX port for Ignite in docker

2019-03-18 Thread Dave Harvey
We had found we needed to change this in ignite.sh along these lines, so that we only had to expose one port out of the container. Otherwise, you also need to expose the RMI port. # Newer Java versions (1.8.0_121+) allow the RMI port to be the same port. if [ -n "$JMX_PORT" ]; then

Re: Different paths for storagePath and WAL from docker

2019-02-15 Thread Dave Harvey
https://apacheignite.readme.io/docs/docker-deployment shows sudo docker run -it --net=host -e "CONFIG_URI=$CONFIG_URI" [-e "OPTION_LIBS=$OPTION_LIBS"] [-e "JVM_OPTS=$JVM_OPTS"] ... $CONFIG_URI can be an https: URL pointing to the XML configuration file. A configuration file can say to use ENVIRONMENT

Re: Different paths for storagePath and WAL from docker

2019-02-15 Thread Dave Harvey
We have a common Spring file accessible via HTTP. Inside and we vary the environment variables. NOTE: the work directory has state that also needs to be persistent. Persistent copies of the above are worthless without a persistent copy of

Re: Avoiding Docker Bridge network when using S3 discovery

2018-12-21 Thread Dave Harvey
Created https://jira.apache.org/jira/browse/IGNITE-10791 for this -- Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Ignite in docker (Native Persistence)

2018-12-18 Thread Dave Harvey
"protocol": "tcp", "containerPort": 9000, "hostPort": 9000 }, { "protocol": "tcp", &

Re: Ignite in docker (Native Persistence)

2018-12-18 Thread Dave Harvey
See attached, which we use in our AWS ECS containers. Note that besides the WAL and data, the work directory needs persistence, because it has all the typeID mappings. On Tue, Dec 18, 2018 at 7:32 AM Павлухин Иван wrote: > Hi Rahul, > Could you please share an ignite configuration and how do you

Re: Avoiding Docker Bridge network when using S3 discovery

2018-12-04 Thread Dave Harvey
Dec 3, 2018 at 3:31 PM Stanislav Lukyanov wrote: > Hi, > Have you been able to solve this? > I think specifying TcpDiscoverySpi.localAddress should work. > Stan > From: Dave Harvey > Sent: 17 October 2018 20:10
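
Stanislav's suggestion amounts to pinning the address that discovery advertises. A minimal sketch, assuming the node's interfaces have already been enumerated (for example via java.net.NetworkInterface.getNetworkInterfaces()): pick the first address that is neither loopback nor on Docker's default bridge, and pass it to TcpDiscoverySpi.setLocalAddress(...). The class name, the "lo"/"docker*" interface names and the 172.17. prefix (Docker's default bridge subnet) are assumptions, not details from the thread.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper: chooses the address to hand to TcpDiscoverySpi.setLocalAddress(...)
// so that Docker's bridge address (172.17.0.1 by default) is not the one advertised.
final class DiscoveryAddressPicker {

    /** One (interface name, IPv4 address) pair, e.g. as read from java.net.NetworkInterface. */
    static final class Candidate {
        final String iface;
        final String ipv4;

        Candidate(String iface, String ipv4) {
            this.iface = iface;
            this.ipv4 = ipv4;
        }
    }

    // Assumption: Docker uses its default bridge subnet; adjust if the daemon's bridge differs.
    private static final String DOCKER_BRIDGE_PREFIX = "172.17.";

    /** Returns the first candidate that is neither loopback nor a Docker bridge address. */
    static Optional<String> pick(List<Candidate> candidates) {
        for (Candidate c : candidates) {
            boolean virtualIface = c.iface.equals("lo") || c.iface.startsWith("docker");
            boolean excludedAddr = c.ipv4.startsWith("127.") || c.ipv4.startsWith(DOCKER_BRIDGE_PREFIX);
            if (!virtualIface && !excludedAddr) {
                return Optional.of(c.ipv4);
            }
        }
        return Optional.empty();
    }
}
```

On a real node the chosen address would be set on the TcpDiscoverySpi before Ignition.start(...); Stanislav only said he thought this should work, so it is worth verifying that the bridge address then stays out of the S3 bucket.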

Re: Snapshotting and Restore in Ignite

2018-11-14 Thread Dave Harvey
GridGain has some kind of snapshotting add-in. You can save and restore the work directory from each node when the cluster is in a stable state, provided that you use the same CONSISTENT_ID when restoring. We were able to convert the directory name back into a consistent ID on restore, but we had

Re: CPU Count Drop On ECS

2018-10-26 Thread Dave Harvey
Reverting the AWS AMI did not revert the ECS agent version. On Fri, Oct 26, 2018 at 6:11 PM Dave Harvey wrote: > I was running an 8x i3.8xlarge cluster on AWS ECS, and it would normally display > ^-- H/N/C [hosts=8, nodes=8, CPUs=256] > Then I recreated it and g

CPU Count Drop On ECS

2018-10-26 Thread Dave Harvey
I was running an 8x i3.8xlarge cluster on AWS ECS, and it would normally display ^-- H/N/C [hosts=8, nodes=8, CPUs=256] Then I recreated it and got this, along with problems such as no longer being able to start Visor: ^-- H/N/C [hosts=8, nodes=8, CPUs=8] Our ECS task specifies CPU=0 -> 1 share. I'm

Avoiding Docker Bridge network when using S3 discovery

2018-10-17 Thread Dave Harvey
When we use S3 discovery and Ignite containers running under ECS with host networking, the S3 bucket ends up with 172.17.0.1#47500 along with the other server addresses. Then, on cluster startup, we must wait for the network timeout. Is there a way to avoid having this address pushed to the S3

Re: Query 3x slower with index

2018-10-11 Thread Dave Harvey
"Ignite will only use one index per table" I assume you mean "Ignite will only use one index per table per query"? On Thu, Oct 11, 2018 at 1:55 PM Stanislav Lukyanov wrote: > Hi, > > > > It is a rather lengthy thread and I can’t dive into details right now, > > but AFAICS the issue now is

Re: Message grid failure due to userVersion setting

2018-09-18 Thread Dave Harvey
> On Mon, 17 Sep 2018 at 19:32, Dave Harvey wrote: >> I probably did not explain this clearly. When sending a message from server to client using the message grid, from a context unrelated to any client call, the server, as you would expect, uses its install

Re: Message grid failure due to userVersion setting

2018-09-17 Thread Dave Harvey
> Hello! > I think that Ignite cannot unload an old version of code unless it is loaded with something like the URI deployment module. Version checking is there, but the server can't get rid of old code if it's on the classpath. > Regards, > Ilya Kasnacheev

Message grid failure due to userVersion setting

2018-09-17 Thread Dave Harvey
We have a client that uses the compute grid and message grid, as well as the discovery API. It communicates with a server plugin. The cluster is configured for CONTINUOUS peer class loading. In order to force the proper code to be loaded for the compute tasks, we change the user version, e.g.,

Transition from FULL_ASYNC/PRIMARY_SYNC to FULL_SYNC

2018-09-06 Thread Dave Harvey
It is my understanding that for Ignite transactions to be ACID, we need to have the caches configured as FULL_SYNC. (Some of the code seems to imply that only at least one of the caches in the transaction needs to be FULL_SYNC, but that is outside the scope of my question.) The initial load of

ignite.compute(grp).affinityRun()...

2018-08-30 Thread Dave Harvey
It is unclear what the intended semantics of IgniteCompute.affinityRun() are when a subset of the grid was selected. From reading the code, my current guess is that IgniteCompute.affinityRun() will run on the primary, regardless of whether only a subset of the grid was specified. In my case I

Re: Cache Configuration Templates

2018-08-29 Thread Dave Harvey
> Regards, > Ilya Kasnacheev > On Tue, 28 Aug 2018 at 17:37, Dave Harvey wrote: >> I did a suggested edit adding the Spring configuration of templates. The rest of the current semantics seem a bit odd, so I was somewhat at a loss as to wha

Re: How to check if key exists in DataStreamer buffer so that it can be flushed?

2018-08-29 Thread Dave Harvey
The DataStreamer is unordered. If you have duplicate keys with different values, and you don't flush or take other action, then you will get an arbitrary result. AllowOverwrite is not a solution. Adding to the streamer returns a Future, and all of those futures are notified when the buffer
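
One way to get a deterministic result without relying on allowOverwrite is to coalesce duplicates before they reach the streamer. A minimal sketch, assuming a hypothetical LastWriteWinsBatch helper that keeps only the latest value per key; each drained batch would then be handed to IgniteDataStreamer.addData(Map). It only removes duplicates within one batch: if the same key can appear in two batches, a flush() (or waiting on the returned future) between them is still needed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects updates and keeps only the last value written for each key,
// so an unordered streamer never receives two different values for one key in the same batch.
final class LastWriteWinsBatch<K, V> {
    private final Map<K, V> pending = new LinkedHashMap<>();
    private final int maxSize;

    LastWriteWinsBatch(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Records an update; returns true when the batch is full and should be drained. */
    boolean put(K key, V value) {
        pending.put(key, value); // a later value for the same key replaces the earlier one
        return pending.size() >= maxSize;
    }

    /** Returns the coalesced batch (first-seen key order) and starts a new, empty one. */
    Map<K, V> drain() {
        Map<K, V> batch = new LinkedHashMap<>(pending);
        pending.clear();
        return batch;
    }
}
```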

Re: Cache Configuration Templates

2018-08-28 Thread Dave Harvey
and just always have this configuration around. > 6) See above about the '*'. > Regards, > Ilya Kasnacheev > On Sat, 25 Aug 2018 at 0:55, Dave Harvey wrote: >> I found what I've read in this area confusing, and here is my current understandin

Cache Configuration Templates

2018-08-24 Thread Dave Harvey
I found what I've read in this area confusing, and here is my current understanding. When creating an IgniteConfiguration in Java or XML, I can specify the property cacheConfiguration, which is an array of CacheConfigurations. This causes Ignite to preserve these configurations, but this will not

Transaction Throughput in Data Streamer

2018-08-09 Thread Dave Harvey
We are trying to load and transform a large amount of data using the IgniteDataStreamer with a custom StreamReceiver. We'd like this to run a lot faster, and we cannot find anything that is close to saturated, except the data-streamer threads and queues. This is 2.5, with Ignite persistence,

Statistics Monitoring Integrations

2018-08-09 Thread Dave Harvey
I've been able to look at cache and thread pool statistics using JVisualVM with MBeans support. Has anyone found a way to get these statistics out to a tool like New Relic or Datadog? Thanks, Dave Harvey Disclaimer The information contained in this communication from the sender
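
Since the statistics are already exposed as MBeans, a JMX poller is one possible route. A minimal sketch using only the JDK's platform MBean server: it reads every readable numeric attribute of the MBeans matching an ObjectName pattern into a map that some agent or exporter could then forward. The class name is hypothetical, the forwarding step is left out, and the assumption that Ignite's cache and thread-pool MBeans sit under the org.apache domain should be checked in JVisualVM first.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.JMException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical snapshot helper: collects numeric MBean attributes in-process so they can be
// pushed to an external metrics tool.
final class MBeanSnapshot {

    /** Returns "objectName/attribute" -> value for every readable numeric attribute matching pattern. */
    static Map<String, Number> numericAttributes(String pattern) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Map<String, Number> out = new TreeMap<>();
        try {
            for (ObjectName name : server.queryNames(new ObjectName(pattern), null)) {
                for (MBeanAttributeInfo attr : server.getMBeanInfo(name).getAttributes()) {
                    if (!attr.isReadable()) {
                        continue;
                    }
                    Object value;
                    try {
                        value = server.getAttribute(name, attr.getName());
                    } catch (Exception e) {
                        continue; // some attributes are unsupported on a given JVM and throw; skip them
                    }
                    if (value instanceof Number) {
                        out.put(name + "/" + attr.getName(), (Number) value);
                    }
                }
            }
        } catch (JMException e) {
            throw new IllegalArgumentException("Cannot query MBeans for pattern " + pattern, e);
        }
        return out;
    }
}
```

On a server node, numericAttributes("org.apache:*") would be the call to try; "java.lang:type=Threading" works on any JVM as a quick check.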

Re: S3 discovery and bridge networks

2018-08-07 Thread Dave Harvey
Can you please elaborate with ifconfig information and stuff? > Regards, > Ilya Kasnacheev > 2018-08-03 16:53 GMT+03:00 Dave Harvey: >> I've been successfully running 2.5 on AWS ECS with host or AWSVPC networking for the Ignite containers. Is t

S3 discovery and bridge networks

2018-08-03 Thread Dave Harvey
I've been successfully running 2.5 on AWS ECS with host or AWSVPC networking for the Ignite containers. Is there any way around the fact that, with bridge networking, the Ignite node registers its unmapped address on S3?

ALTER TABLE ... NOLOGGING

2018-08-02 Thread Dave Harvey
We did the following while loading a lot of data into 2.5:
1) Started data loading on an 8-node cluster
2) ALTER TABLE name NOLOGGING on tables A, B, C, D but not X
3) Continued loading
4) Deactivated the cluster
5) Changed the config XML to increase maxSize of the data region (from 2G to 160G) and

Changing existing persistent caches.

2018-07-30 Thread Dave Harvey
I know Ignite only allows very limited changes to caches at runtime, e.g., turning on statistics or adding/removing an index or field. I'm wondering if there is a way to change any of the cache configuration for persistent caches at cluster startup. I have the impression that at some point I saw some code

Re: Help needed with BinaryObjectException

2018-07-26 Thread Dave Harvey
The cluster needs to agree on how to decode the various versions of the BinaryObjectSchema. Changing the type of a named field, or changing an enum's values, is not an upward-compatible change, and Ignite cannot handle it. There is the question of the lifetime of the version of a type, and while you may know that
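
To make the rule concrete, here is a simplified model (not Ignite's implementation) of what the cluster enforces: new field names can be added over time, but a field name, once recorded with a type, may not later arrive with a different one. FieldTypeRegistry is a hypothetical name. For reading the recorded type from a real cluster, BinaryType.fieldTypeName(String), obtained via ignite.binary().type(typeName), is the API to look at.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Simplified model of the binary-schema compatibility rule:
// adding fields is fine, changing the type of a known field is rejected.
final class FieldTypeRegistry {
    private final Map<String, String> fieldTypes = new HashMap<>();

    /** Registers field -> type; throws if the field is already known with a different type. */
    void register(String field, String typeName) {
        String existing = fieldTypes.putIfAbsent(field, typeName);
        if (existing != null && !existing.equals(typeName)) {
            throw new IllegalStateException("Field '" + field + "' is already " + existing
                    + "; cannot change it to " + typeName);
        }
    }

    /** The recorded type of a field, if the field has been seen. */
    Optional<String> typeOf(String field) {
        return Optional.ofNullable(fieldTypes.get(field));
    }
}
```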

Re: Understanding the mechanics of peer class loading

2018-07-18 Thread Dave Harvey
I added this ticket because we hit a similar problem and was able to find some quite suspect code: https://issues.apache.org/jira/browse/IGNITE-9026

Re: Affinity calls in stream receiver

2018-07-17 Thread Dave Harvey
We switched to CONTINUOUS mode based on the assumption that SHARED mode had regressed in a way that allowed it to create many class loaders, and eventually run out of Metaspace. CONTINUOUS mode failed much sooner, and we were able to reproduce that failure and identify bugs in the code. The

Re: Affinity calls in stream receiver

2018-07-15 Thread Dave Harvey
We are running in SHARED_MODE on 2.5 and are currently quite suspicious of this change in 2.4. The essence of the change is, in SHARED_MODE, to simply skip the code that will "Find existing deployments that need to be checked whether they should be reused for this request"

Tracing all SQL Queries

2018-07-12 Thread Dave Harvey
Is there a simple way inside Ignite to get a log of all SQL queries against the cluster, either in the debug logs or elsewhere? This is not an easy question to phrase in a way that Google will find a useful answer.

Re: Affinity calls in stream receiver

2018-07-11 Thread Dave Harvey
The nested class hypothesis seems unlikely. We have 6000+ GridDeploymentClassLoaders on a node, because there are many instances of "GridDeploymentPerVersionStore.SharedDeployment". The userVersion is not changing, nor is the cluster topology. I have enough data to debug this; I just need some

RE: Deadlock during cache loading

2018-06-28 Thread Dave Harvey
Your original stack trace shows a call to your custom stream receiver, which appears to itself call invoke(). I can only guess that your code does so, but it appears to be making a call off-node to something that is not returning.

RE: Deadlock during cache loading

2018-06-28 Thread Dave Harvey
2.4 should be OK. What you showed is that the stream receiver called invoke() and did not get an answer, which is not a deadlock. Nothing looks particularly wrong there. When we created this bug, it was because our stream receiver called invoke(), and that in turn did another invoke(), which was the actual bug. It

RE: Deadlock during cache loading

2018-06-25 Thread Dave Harvey
"When receiver is invoked for key K, it’s holding the lock for K." is not correct, at least in the 2.4 code. When a custom stream receiver is called, the data streamer thread has a read-lock preventing termination, and there is a real-lock on the topology, but DataStreamerUpdateJob.call() does

Running Node removal from baseline

2018-06-22 Thread Dave Harvey
The documentation describes the use case where a node is stopped and removed from the baseline, which reduces the number of backups/replicas while the node is stopped. I assume that there is no current code to support removing the node from the baseline first, so that at least the desired number of

Re: Data Region Concurrency

2018-05-28 Thread Dave Harvey
It does appear that if the global concurrencyLevel is not set, it defaults to the number of CPUs, not 4 * the number of CPUs as documented here https://apacheignite.readme.io/docs/memory-configuration#section-global-configuration-parameters private long[] calculateFragmentSizes(int concLvl, long cacheSize,
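
The practical effect of the two defaults is easiest to see with numbers. A simplified sketch, assuming the region is split evenly across concLvl segments (the real calculateFragmentSizes may round or apply minimums differently), for a 160 GiB region on a 32-vCPU i3.8xlarge. The class and method names are hypothetical.

```java
// Simplified model (assumption: even split, no rounding) of how a data region is divided
// into segments for a given concurrency level.
final class SegmentSizing {
    static final long GIB = 1L << 30;

    static int observedDefault(int cpus) {
        return cpus;     // what the code appears to do
    }

    static int documentedDefault(int cpus) {
        return 4 * cpus; // what the documentation describes
    }

    static long segmentBytes(long regionBytes, int concLvl) {
        if (concLvl <= 0) {
            throw new IllegalArgumentException("concurrency level must be positive");
        }
        return regionBytes / concLvl;
    }

    public static void main(String[] args) {
        int cpus = 32;           // one i3.8xlarge
        long region = 160 * GIB; // example region size
        System.out.println(segmentBytes(region, observedDefault(cpus)) / (double) GIB);   // 5.0
        System.out.println(segmentBytes(region, documentedDefault(cpus)) / (double) GIB); // 1.25
    }
}
```

Under this model the observed default yields 4x fewer, 4x larger segments than the documented one.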

Re: Large durable caches

2018-05-18 Thread Dave Harvey
Early on, running on 2.3, we hit a clear deadlock that I never root-caused, where the cluster just stopped working. At the time I was using the same DataStreamer from multiple threads, and we had tuned up the buffer size because of that; we were also running against EBS, and perhaps with too short

Re: What is the most efficient way to scan all data partitions?

2018-05-09 Thread Dave Harvey
When running on AWS, I found that the "disk" you are writing to is the most critical factor for Ignite. EC2 instances with local SSDs have about 20x the write rate of multiple 3 TB GP2 volumes, and using actual disks (e.g., EBS) for Ignite Persistence storage is a non-starter.

Re: Effective Data through DataStream

2018-04-26 Thread Dave Harvey
When you set the stream receiver, an instance of its class is created and serialized, which will also include any class it is nested in. On each Data Streamer buffer, the serialized form of that class is sent. If the class containing the stream receiver has a pointer that is not
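
The nesting effect can be reproduced with plain Java serialization. A minimal sketch, using java.io rather than Ignite's own marshaller (which may differ in detail) and hypothetical class names: a non-static inner receiver carries a hidden reference to its enclosing instance and drags it along, while a static nested one does not. Declaring the receiver as a static nested or top-level class avoids the problem.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Stand-in for an application class that is NOT Serializable (e.g. it owns a connection).
class LoaderApp {
    private final Object connection = new Object();

    // Non-static inner class: holds a hidden reference to the enclosing LoaderApp,
    // so serializing it also tries to serialize LoaderApp.
    class InnerReceiver implements Serializable {
        private static final long serialVersionUID = 1L;

        Object connection() {
            return connection; // uses the outer instance, so the hidden reference is required
        }
    }

    // Static nested class: no hidden outer reference; only its own fields are serialized.
    static class StaticReceiver implements Serializable {
        private static final long serialVersionUID = 1L;
    }
}

final class SerializationCheck {
    /** Serialized size in bytes, or -1 if java.io serialization rejects the object graph. */
    static int serializedSize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (NotSerializableException e) {
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```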

Re: One time replicating of the cluster data for setting up a new cluster

2018-04-03 Thread Dave Harvey
We had done this to group all of the data that needs to be backed up onto the SSD. The work directory also contains the log directory, and I haven't seen how to put that elsewhere.

Baseline Topology and Node Failure

2018-03-28 Thread Dave Harvey
The introduction of baselines in 2.4 seems quite helpful: if a node restarts, excessive rebalancing is avoided. What is unclear from the documentation is what happens when a node fails and doesn't come back. I'm assuming that in fact nothing happens, except that the backups

Determining BinaryObject field type

2018-03-23 Thread Dave Harvey
Once a BinaryObjectSchema is created, it is not possible to change the type of a field for a known field name. My question is whether there is any way to determine the type of that field in the schema. We are hitting a case where the way we get the data out of a different database returns a

Re: SELECT Statement cancellation & memory sizing

2018-03-08 Thread Dave Harvey
Just saw the 2.4 release notes: Improved COUNT(*) performance

Re: Setting userVersion on client node causes ignite.active(true) to fail

2018-02-27 Thread Dave Harvey
The server node was already active, and when I commented out ignite.active(true) the client came up.

Setting userVersion on client node causes ignite.active(true) to fail

2018-02-27 Thread Dave Harvey
If I change userVersion in ignite.xml on the client to 5 while the Docker image is in SHARED mode, in order to ensure that our peer-class-loaded classes are reloaded, I cannot start the client: final Ignite ignite = Ignition.start(igniteConfig); ignite.active(true); <<<

Re: Large durable caches

2018-02-21 Thread Dave Harvey
I fought to get Ignite Persistence to work well on AWS GP2 volumes, finally gave up, and moved to i3 instances, where the cost per write IOP is much lower: an i3.8xlarge gets 720,000 4K write IOPS vs. on the order of 10,000 for about the same cost.

Re: 20 minute 12x throughput drop using data streamer and Ignite persistence

2018-02-20 Thread Dave Harvey
I've started reproducing this issue with more statistics. I have not reached the worst performance point yet, but some things are starting to become clearer: the DataStreamer hashes the affinity key to a partition, then maps the partition to a node, and fills a single buffer at a time for the
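
The routing described above can be modelled in a few lines. A simplified sketch, not Ignite's code: the partition is a plain floorMod of hashCode() (Ignite's affinity function does its own hashing), the partition-to-node mapping is a caller-supplied function, and each node has one buffer that is "sent" when it reaches a fixed size. StreamerRoutingModel and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// Simplified model of data-streamer routing: key -> partition -> node -> per-node buffer.
final class StreamerRoutingModel {
    private final int partitions;
    private final int perNodeBufferSize;
    private final IntFunction<String> partitionToNode;  // stand-in for the affinity mapping
    private final Map<String, List<Object>> buffers = new HashMap<>();
    final List<String> sentBatches = new ArrayList<>(); // node name for each batch "sent"

    StreamerRoutingModel(int partitions, int perNodeBufferSize, IntFunction<String> partitionToNode) {
        this.partitions = partitions;
        this.perNodeBufferSize = perNodeBufferSize;
        this.partitionToNode = partitionToNode;
    }

    int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    void add(Object key) {
        String node = partitionToNode.apply(partitionOf(key));
        List<Object> buffer = buffers.computeIfAbsent(node, n -> new ArrayList<>());
        buffer.add(key);
        if (buffer.size() >= perNodeBufferSize) {
            sentBatches.add(node); // a real streamer would ship the buffer to that node here
            buffer.clear();
        }
    }
}
```

For example, with 4 partitions, even partitions on node "A", odd ones on "B", and a buffer size of 2, adding the Integer keys 0, 2, 1, 4, 3 sends one batch to A and then one to B.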

Re: Issues trying to force redeployment in shared mode

2018-02-20 Thread Dave Harvey
I've done some additional testing. By shutting down another (the last) client node that was running independent code, I was able to purge the bad version of my code from the servers, while leaving the userVersion at "0". Apparently in this case, the client nodes are "master" nodes. (The

Issues trying to force redeployment in shared mode

2018-02-19 Thread Dave Harvey
I was trying to demonstrate changing classes on a client node so that classes on servers get replaced with new code, but the symptoms make me believe that I don't understand the rules at all. The deployment mode is SHARED. I read the instructions and created an ignite.xml with a different

Re: 20 minute 12x throughput drop using data streamer and Ignite persistence

2018-02-13 Thread Dave Harvey
I made improvements to the statistics collection in the stream receiver, and I'm finding an excessive number of retries of the optimistic transactions we are using. I will work to understand that and try again.
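
Counting retries explicitly is what makes this kind of problem visible in statistics. A minimal sketch of a retry loop that increments a counter on each conflict; ConflictException is a stand-in for Ignite's TransactionOptimisticException, and the class and method names are hypothetical. In real code each attempt would open and commit its own optimistic transaction.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Retry loop for optimistic transactions that records how often conflicts force a retry.
final class OptimisticRetry {

    /** Stand-in for Ignite's TransactionOptimisticException. */
    static final class ConflictException extends RuntimeException {
        ConflictException(String message) {
            super(message);
        }
    }

    final AtomicLong retries = new AtomicLong();

    /** Runs attempt until it succeeds, or rethrows once maxAttempts attempts have conflicted. */
    <T> T run(Supplier<T> attempt, int maxAttempts) {
        for (int i = 1; ; i++) {
            try {
                return attempt.get();
            } catch (ConflictException e) {
                if (i >= maxAttempts) {
                    throw e;
                }
                retries.incrementAndGet();
            }
        }
    }
}
```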

Re: The client does not receive response after closure completion...

2017-12-28 Thread Dave Harvey
I got similar symptoms, but for a different root cause. I was getting the original stack trace using the "cache" command in Visor, but only when a client was connected. Using this command caused the clients to disconnect. It turned out that I had enabled inbound TCP ports to the servers, but

Re: TcpDiscoveryS3IpFinder AmazonS3Exception: Slow Down

2017-10-24 Thread Dave Harvey
Opened support tickets with GridGain and AWS. The former suggested this, which helped. The throttling

Re: TcpDiscoveryS3IpFinder AmazonS3Exception: Slow Down

2017-09-21 Thread Dave Harvey
The only possibly different thing we are doing is using a VPC endpoint to allow the nodes to access S3 directly, without having to supply credentials.

TcpDiscoveryS3IpFinder AmazonS3Exception: Slow Down

2017-09-21 Thread Dave Harvey
Is TcpDiscoveryS3IpFinder expected to work? I randomly get exceptions which seem to be considered part of normal S3 operation, but are not handled/retried. com.amazonaws.services.s3.model.AmazonS3Exception: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down; Request
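
S3 returns 503 Slow Down when a client should reduce its request rate, and the usual client-side response is to retry with capped exponential backoff (production code often adds random jitter as well). A minimal sketch of that pattern, assuming a caller-supplied isThrottle predicate (with the AWS SDK for Java v1 it would presumably check for an AmazonS3Exception with status code 503); the class name and parameters are hypothetical, and this is not how TcpDiscoveryS3IpFinder is implemented.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Capped exponential backoff for throttled calls (e.g. S3 "503 Slow Down").
final class Backoff {

    /** baseMillis * 2^attempt, never more than capMillis. */
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < capMillis; i++) {
            delay *= 2; // stop doubling once the cap is reached, so large attempts cannot overflow
        }
        return Math.min(capMillis, delay);
    }

    /** Retries call while isThrottle accepts the failure, up to maxAttempts attempts in total. */
    static <T> T callWithBackoff(Callable<T> call, Predicate<Exception> isThrottle,
                                 int maxAttempts, long baseMillis, long capMillis) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (!isThrottle.test(e) || attempt + 1 >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delayMillis(attempt, baseMillis, capMillis));
            }
        }
    }
}
```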

Re: Ignite/yardstick benchmarking from within Docker Image

2017-09-20 Thread Dave Harvey
Yes. When I figured that out, my immediate reaction was to kill the running Ignite instance. But since that process was what was keeping the Docker container running, the container exited. So instructions on how to run the benchmarks from inside the Docker container they are delivered in

Re: Ignite/yardstick benchmarking from within Docker Image

2017-09-20 Thread Dave Harvey
As I understood more, I changed the subject. The Docker image for Ignite 2.1.0 contains the ignite-yardstick benchmarks, which include a ReadMe file that provides instructions that *do not work* if you simply try to run them from inside the Docker container where Ignite is already running. I

Re: AWS Apache Ignite AMI startup.sh reports spurrious errors if options have blanks

2017-09-20 Thread Dave Harvey
Sorry for the delay; spam filter. I haven't been able to find an explanation for startup.sh, but when I log in to an EC2 instance created using the AWS community AMI for Ignite (which only has Docker and pulls Ignite by the version you specify), I find startup.sh in the current directory, and it

AWS Apache Ignite AMI startup.sh does not allow documented JVM options

2017-09-07 Thread Dave Harvey
If I use this in Advanced Details: JVM_OPTS=-Xms1g -Xmx1g -server -XX:+AggressiveOpts -XX:MaxPermSize=256m I get: startup.sh: line 15: export: `-Xmx1g': not a valid identifier startup.sh: line 15: export: `-server': not a valid identifier startup.sh: line 15: export: `-XX:+AggressiveOpts': not a

Re: CONFIG_INI not copied to ./ignite-config.xml

2017-09-07 Thread Dave Harvey
Now I see the "Type 'help "command name"' to see how to use this command." below the list.

CONFIG_INI not copied to ./ignite-config.xml

2017-09-07 Thread Dave Harvey
The documentation around CONFIG_INI says "The downloaded config file will be saved to ./ignite-config.xml", except that did not happen. Therefore, when I started Visor, I had no way to discover the cluster, so Visor created a new one. After much newbie confusion, I copied the config file

Re: Issue with starting Ignite node on AWS

2017-09-05 Thread Dave Harvey
I'm also a newbie, but I'm running 2.1.0 and I seem to be hitting the same problem, which sounds like it was fixed a long time ago. Is there something else going on? I've uploaded the 2 lines I pass when creating the EC2 instance from the AMI as well as the config file I'm using, as well as the