Re: [julia-users] Re: High-performance metric collector in Julia
BTW, I've measured the throughput of Redis on my machine, and it came out at only 30K/sec, which is 10 times slower than ZMQ. Possibly some hybrid solution combining several alternative backends will work.

On Saturday, May 23, 2015 at 2:04:02 AM UTC+3, Andrei Zh wrote:

It still helps a lot, because you can have many reporting programs, each talking to different processes on the server, and those processes are able to get the transactions done very quickly, with multiple write daemons and journaling daemons doing the actual I/O from the shared buffer pool. (and yes, this has been used precisely for collecting metrics, on a very large scale, across all of Germany in one case, IIRC)

I've got curious about this use case; are you permitted to share more details about it? Were they technical or business metrics? What volume of data passed through the system, and what were the parameters of the server?

In fact, my initial question in this thread was purposely very broad, because I was looking for any related projects. Now I see two different kinds of metric collection systems. One is for reporting and monitoring purposes, e.g. measuring operation run time, memory usage, number of simultaneously connected users, etc. Normally, in such systems metrics are not stored on the collector side for long; instead they are aggregated and sent to something like Graphite almost immediately. It's also OK to lose some of this information or to delete old metrics. The other kind is for collecting and further analysing business metrics, and in this case we need reliable storage first of all.

My primary goal in this little project is to create a system for the first kind of metrics, but if there's interest in the second kind, I'll be glad to spend some time on making something for a broader range of users.
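To make the first kind of system concrete, here is a minimal Python sketch of a collector that keeps samples only for one reporting interval, reduces them to a few summary values, and formats the result in Graphite's plaintext format (`<path> <value> <timestamp>`). The class name `IntervalAggregator`, the metric name, and the choice of count/mean/max are assumptions for illustration; nothing here comes from the project discussed in the thread.

```python
from collections import defaultdict


class IntervalAggregator:
    """Hypothetical collector: holds samples for one interval only."""

    def __init__(self):
        # metric name -> list of samples seen since the last flush
        self._samples = defaultdict(list)

    def record(self, name, value):
        self._samples[name].append(float(value))

    def flush(self, timestamp):
        """Return Graphite plaintext lines and forget the raw samples."""
        lines = []
        for name in sorted(self._samples):
            values = self._samples[name]
            lines.append(f"{name}.count {len(values)} {timestamp}")
            lines.append(f"{name}.mean {sum(values) / len(values)} {timestamp}")
            lines.append(f"{name}.max {max(values)} {timestamp}")
        # Raw samples are dropped: long-term storage is not a goal here.
        self._samples.clear()
        return lines


agg = IntervalAggregator()
agg.record("app.login.ms", 10)
agg.record("app.login.ms", 30)
for line in agg.flush(1432339200):
    print(line)
```

In a real deployment each line would be newline-terminated and written to a socket connected to the Graphite listener; that part is left out so the sketch runs without a server.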
Re: [julia-users] Re: High-performance metric collector in Julia
It still helps a lot, because you can have many reporting programs, each talking to different processes on the server, and those processes are able to get the transactions done very quickly, with multiple write daemons and journaling daemons doing the actual I/O from the shared buffer pool. (and yes, this has been used precisely for collecting metrics, on a very large scale, across all of Germany in one case, IIRC)

I've got curious about this use case; are you permitted to share more details about it? Were they technical or business metrics? What volume of data passed through the system, and what were the parameters of the server?

In fact, my initial question in this thread was purposely very broad, because I was looking for any related projects. Now I see two different kinds of metric collection systems. One is for reporting and monitoring purposes, e.g. measuring operation run time, memory usage, number of simultaneously connected users, etc. Normally, in such systems metrics are not stored on the collector side for long; instead they are aggregated and sent to something like Graphite almost immediately. It's also OK to lose some of this information or to delete old metrics. The other kind is for collecting and further analysing business metrics, and in this case we need reliable storage first of all.

My primary goal in this little project is to create a system for the first kind of metrics, but if there's interest in the second kind, I'll be glad to spend some time on making something for a broader range of users.
Re: [julia-users] Re: High-performance metric collector in Julia
But it also means that this trick won't work with separate machines for the database (metric server) and the reporting program, which is one of the goals.

On Thu, May 21, 2015 at 12:19 AM, Scott Jones scott.paul.jo...@gmail.com wrote:

On Wednesday, May 20, 2015 at 4:26:40 PM UTC-4, Andrei Zh wrote:

Well, if they don't use any tricks like passing data through shared memory or heavy batching, then it's pretty impressive. But, as you mentioned, in this particular case Caché is not an option.

I would say that *any* decent database does tricks like using shared memory... Aerospike does, I don't know about Redis... Caché has a large shared buffer pool... all processes can read or write B+ tree blocks via that buffer pool, and there are daemons that take care of making sure the journal is sync'ed to disk, that the blocks get out to disk every so often, etc.
Re: [julia-users] Re: High-performance metric collector in Julia
It still helps a lot, because you can have many reporting programs, each talking to different processes on the server, and those processes are able to get the transactions done very quickly, with multiple write daemons and journaling daemons doing the actual I/O from the shared buffer pool. (and yes, this has been used precisely for collecting metrics, on a very large scale, across all of Germany in one case, IIRC)

On Thursday, May 21, 2015 at 2:28:20 AM UTC-4, Andrei Zh wrote:

But it also means that this trick won't work with separate machines for the database (metric server) and the reporting program, which is one of the goals.

On Thu, May 21, 2015 at 12:19 AM, Scott Jones scott.pa...@gmail.com wrote:

On Wednesday, May 20, 2015 at 4:26:40 PM UTC-4, Andrei Zh wrote:

Well, if they don't use any tricks like passing data through shared memory or heavy batching, then it's pretty impressive. But, as you mentioned, in this particular case Caché is not an option.

I would say that *any* decent database does tricks like using shared memory... Aerospike does, I don't know about Redis... Caché has a large shared buffer pool... all processes can read or write B+ tree blocks via that buffer pool, and there are daemons that take care of making sure the journal is sync'ed to disk, that the blocks get out to disk every so often, etc.
Re: [julia-users] Re: High-performance metric collector in Julia
I was able to get 2M/sec without transactions, and 909K/sec with transactions (so it's durable), on my laptop, using Caché... (from InterSystems... I used to consult for them) They do have a free single user database engine, Globals, that you might be able to use... I don't recall what's available with that version...

I suppose you got 2M/sec on some server with pretty high resources - I cannot imagine 2M network operations on my local machine. Anyway, I think I will start with Redis, which is more accessible from Julia both in terms of programming and openness.

No, that's on my 1-year-old MacBook Pro... with just the default database settings:

Well, if they don't use any tricks like passing data through shared memory or heavy batching, then it's pretty impressive. But, as you mentioned, in this particular case Caché is not an option.
Re: [julia-users] Re: High-performance metric collector in Julia
On Wednesday, May 20, 2015 at 4:26:40 PM UTC-4, Andrei Zh wrote:

Well, if they don't use any tricks like passing data through shared memory or heavy batching, then it's pretty impressive. But, as you mentioned, in this particular case Caché is not an option.

I would say that *any* decent database does tricks like using shared memory... Aerospike does, I don't know about Redis... Caché has a large shared buffer pool... all processes can read or write B+ tree blocks via that buffer pool, and there are daemons that take care of making sure the journal is sync'ed to disk, that the blocks get out to disk every so often, etc.
Re: [julia-users] Re: High-performance metric collector in Julia
On Tuesday, May 19, 2015 at 7:23:13 PM UTC-4, Andrei Zh wrote:

But, ZMQ isn't storing it anywhere... that's just a messaging protocol, isn't it?

Yes, and in fact, it may even be enough if you have a sufficiently smart consumer on the other end. For example, if the goal is to keep only statistics, you can use a running max, average, histogram, etc. In this case only very little memory is needed. I'm not saying that persistence is not needed in this case - it would be nice to have it too - but high throughput and low latency are much more desirable.

I was able to get 2M/sec without transactions, and 909K/sec with transactions (so it's durable), on my laptop, using Caché... (from InterSystems... I used to consult for them) They do have a free single user database engine, Globals, that you might be able to use... I don't recall what's available with that version...

I suppose you got 2M/sec on some server with pretty high resources - I cannot imagine 2M network operations on my local machine. Anyway, I think I will start with Redis, which is more accessible from Julia both in terms of programming and openness.

No, that's on my 1-year-old MacBook Pro... with just the default database settings:

Platform Info:
System: Darwin (x86_64-apple-darwin14.4.0)
CPU: Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
Starting Control Process
Automatically configuring buffers
Allocated 374MB shared memory: 256MB global buffers, 35MB routine buffers

My point was mainly that a good KVS should be able to achieve those kinds of speeds, even on a laptop... unfortunately, Caché is not open source software, so you'd probably need some other solution... Redis may be a good bet... I'd be interested in what performance numbers you get with Julia/Redis!
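The "smart consumer" idea quoted above (keeping only a running max, average and histogram) can be sketched in a few lines of Python. This is an illustration under assumptions, not code from the thread: the class name `RunningStats` and the fixed bucket edges are made up for the example. Memory use stays constant no matter how many samples arrive, because each sample updates a handful of counters and is then discarded.

```python
import bisect


class RunningStats:
    """Hypothetical constant-memory summary of a stream of numbers."""

    def __init__(self, bucket_edges):
        self.count = 0
        self.mean = 0.0
        self.max = float("-inf")
        # Sorted bucket boundaries; bucket i counts values with
        # edges[i-1] <= x < edges[i], with open-ended first and last buckets.
        self.edges = sorted(bucket_edges)
        self.buckets = [0] * (len(self.edges) + 1)

    def update(self, x):
        self.count += 1
        # Incremental mean: no need to keep the samples or their sum.
        self.mean += (x - self.mean) / self.count
        if x > self.max:
            self.max = x
        self.buckets[bisect.bisect_right(self.edges, x)] += 1


stats = RunningStats(bucket_edges=[10, 100])
for latency_ms in (5, 50, 500, 10):
    stats.update(latency_ms)
print(stats.count, stats.mean, stats.max, stats.buckets)
```

A consumer like this could sit directly behind a message queue such as ZMQ, which is why persistence becomes optional for this kind of metric.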
Re: [julia-users] Re: High-performance metric collector in Julia
But, ZMQ isn't storing it anywhere... that's just a messaging protocol, isn't it?

Yes, and in fact, it may even be enough if you have a sufficiently smart consumer on the other end. For example, if the goal is to keep only statistics, you can use a running max, average, histogram, etc. In this case only very little memory is needed. I'm not saying that persistence is not needed in this case - it would be nice to have it too - but high throughput and low latency are much more desirable.

I was able to get 2M/sec without transactions, and 909K/sec with transactions (so it's durable), on my laptop, using Caché... (from InterSystems... I used to consult for them) They do have a free single user database engine, Globals, that you might be able to use... I don't recall what's available with that version...

I suppose you got 2M/sec on some server with pretty high resources - I cannot imagine 2M network operations on my local machine. Anyway, I think I will start with Redis, which is more accessible from Julia both in terms of programming and openness.

Indices also have their cost. And, most important, they are not really needed here. In Kafka, for example, new records are simply added to the end of the queue, and the pointer to the end is moved further. In our tests it gives about 50K/sec, which is much higher than for the RDBMSs mentioned above, and unlike most key-value stores it provides an easy and fast way to read all messages down the stream.

That seems *really* slow... but, if you are just adding records to the end of a queue, and you have your key simply be the counter, why would that be slow?

50K/sec is just our result from a quick test; Kafka's own tests show results several times higher. Anyway, the point was that Kafka out of the box is *much* faster than things like MySQL, so there's really no point in messing with this kind of database.
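The append-to-the-end idea discussed above can be illustrated with a toy in-memory log in Python. This is only a sketch of the data structure, assuming a single writer; it is not Kafka's implementation or API, and the names `AppendOnlyLog`, `append` and `read_from` are hypothetical. The key is just the record's position, so writing needs no index maintenance, and a reader that remembers its last offset can stream everything that follows.

```python
class AppendOnlyLog:
    """Hypothetical in-memory log: records are only ever added at the end."""

    def __init__(self):
        self._records = []

    @property
    def end_offset(self):
        # The "pointer to the end": offset the next record will receive.
        return len(self._records)

    def append(self, record):
        offset = self.end_offset
        self._records.append(record)
        return offset

    def read_from(self, offset, max_records=None):
        """Return records starting at `offset`, in the order they were written."""
        stop = self.end_offset
        if max_records is not None:
            stop = min(stop, offset + max_records)
        return self._records[offset:stop]


log = AppendOnlyLog()
for event in ("a", "b", "c"):
    log.append(event)
print(log.read_from(1))
```

A durable version would append to a file and periodically sync it, which is where most of the real cost and the throughput differences between systems would come from.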