Re: [julia-users] Re: High-performance metric collector in Julia

2015-05-22 Thread Andrei Zh
BTW, I've measured the throughput of Redis on my machine, and it came out to
only 30K ops/sec, which is about 10 times slower than ZMQ. Possibly a hybrid
solution combining several alternative backends will work.

On Saturday, May 23, 2015 at 2:04:02 AM UTC+3, Andrei Zh wrote:


 It still helps a lot, because you can have many reporting programs, each 
 talking to different processes on the server, and those processes are able 
 to  get the transactions done very quickly, with multiple write daemons and 
 journaling daemons doing the actual I/O from the shared buffer pool.  (and 
 yes, this has been used precisely for collecting metrics, on a very large 
 scale, across all of Germany in one case, IIRC)


 I got curious about this use case. Are you permitted to share more
 details about it? Were they technical or business metrics? What volume of
 data passed through the system, and what were the parameters of the server?

 In fact, my initial question in this thread was purposely very broad,
 because I was looking for any related projects. Now I see two different kinds
 of metric collection systems. One is for reporting and monitoring purposes,
 e.g. measuring operation run time, memory usage, number of simultaneously
 connected users, etc. Normally, in such systems metrics are not stored on
 the collector side for long; instead they are aggregated and sent to
 something like Graphite almost immediately. It's also OK to lose some of
 this information or to delete old metrics. The other kind is for collecting
 business metrics for further analysis, and in that case we need reliable
 storage first of all.

 My primary goal in this little project is to create a system for the first
 kind of metrics, but if there's interest in the second kind, I'll be glad to
 spend some time making something for a broader range of users.



Re: [julia-users] Re: High-performance metric collector in Julia

2015-05-22 Thread Andrei Zh


 It still helps a lot, because you can have many reporting programs, each 
 talking to different processes on the server, and those processes are able 
 to  get the transactions done very quickly, with multiple write daemons and 
 journaling daemons doing the actual I/O from the shared buffer pool.  (and 
 yes, this has been used precisely for collecting metrics, on a very large 
 scale, across all of Germany in one case, IIRC)


I got curious about this use case. Are you permitted to share more
details about it? Were they technical or business metrics? What volume of
data passed through the system, and what were the parameters of the server?

In fact, my initial question in this thread was purposely very broad,
because I was looking for any related projects. Now I see two different kinds
of metric collection systems. One is for reporting and monitoring purposes,
e.g. measuring operation run time, memory usage, number of simultaneously
connected users, etc. Normally, in such systems metrics are not stored on
the collector side for long; instead they are aggregated and sent to
something like Graphite almost immediately. It's also OK to lose some of
this information or to delete old metrics. The other kind is for collecting
business metrics for further analysis, and in that case we need reliable
storage first of all.

My primary goal in this little project is to create a system for the first
kind of metrics, but if there's interest in the second kind, I'll be glad to
spend some time making something for a broader range of users.


Re: [julia-users] Re: High-performance metric collector in Julia

2015-05-21 Thread Andrei
But it also means that this trick won't work with separate machines for
database (metric server) and reporting program, which is one of the goals.

On Thu, May 21, 2015 at 12:19 AM, Scott Jones scott.paul.jo...@gmail.com
wrote:



 On Wednesday, May 20, 2015 at 4:26:40 PM UTC-4, Andrei Zh wrote:



 Well, if they don't use any tricks like passing data through shared
 memory or heavy batching, then it's pretty impressive. But, as you
 mentioned, in this particular case Caché is not an option.


 I would say that *any* decent database does tricks like using shared
 memory... Aerospike does, I don't know about Redis...  Caché has a large
 shared buffer pool... all processes can read or write
 B+ tree blocks via that buffer pool, and there are daemons that take care
 of making sure the journal is sync'ed to disk, that the blocks get out to
 disk every so often, etc.



Re: [julia-users] Re: High-performance metric collector in Julia

2015-05-21 Thread Scott Jones
It still helps a lot, because you can have many reporting programs, each 
talking to different processes on the server, and those processes are able 
to  get the transactions done very quickly, with multiple write daemons and 
journaling daemons doing the actual I/O from the shared buffer pool.  (and 
yes, this has been used precisely for collecting metrics, on a very large 
scale, across all of Germany in one case, IIRC)

On Thursday, May 21, 2015 at 2:28:20 AM UTC-4, Andrei Zh wrote:

 But it also means that this trick won't work with separate machines for 
 database (metric server) and reporting program, which is one of the goals. 

 On Thu, May 21, 2015 at 12:19 AM, Scott Jones scott.pa...@gmail.com wrote:



 On Wednesday, May 20, 2015 at 4:26:40 PM UTC-4, Andrei Zh wrote:



 Well, if they don't use any tricks like passing data through shared 
 memory or heavy batching, then it's pretty impressive. But, as you 
 mentioned, in this particular case Caché is not an option.


 I would say that *any* decent database does tricks like using shared
 memory... Aerospike does, I don't know about Redis...  Caché has a large
 shared buffer pool... all processes can read or write
 B+ tree blocks via that buffer pool, and there are daemons that take care
 of making sure the journal is sync'ed to disk, that the blocks get out to
 disk every so often, etc.




Re: [julia-users] Re: High-performance metric collector in Julia

2015-05-20 Thread Andrei Zh



 I was able to get 2M/sec without transactions, and 909K/sec with 
 transactions (so it's durable), on my laptop, using Caché... (from 
 InterSystems... I used to consult for them)
 They do have a free single user database engine, Globals, that you might 
 be able to use... I don't recall what's available with that version...


 I suppose you got 2M/sec on some server with pretty high resources - I 
 cannot imagine 2M network operations on my local machine. Anyway, I think I 
 will start with Redis, which is more accessible from Julia both in terms of 
 programming and openness.  


 No, that's on my 1 year old MacBook Pro... with just the default database 
 settings:


Well, if they don't use any tricks like passing data through shared memory 
or heavy batching, then it's pretty impressive. But, as you mentioned, in 
this particular case Caché is not an option.
  


Re: [julia-users] Re: High-performance metric collector in Julia

2015-05-20 Thread Scott Jones


On Wednesday, May 20, 2015 at 4:26:40 PM UTC-4, Andrei Zh wrote:



 Well, if they don't use any tricks like passing data through shared memory 
 or heavy batching, then it's pretty impressive. But, as you mentioned, in 
 this particular case Caché is not an option.


I would say that *any* decent database does tricks like using shared
memory... Aerospike does, I don't know about Redis...  Caché has a large
shared buffer pool... all processes can read or write
B+ tree blocks via that buffer pool, and there are daemons that take care
of making sure the journal is sync'ed to disk, that the blocks get out to
disk every so often, etc.
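The design described here, with many clients writing cheaply into a shared buffer pool while background daemons do the actual I/O, can be modeled in a few lines. A toy Python sketch, not Caché's actual implementation; `BufferedWriter` and `sink` are invented names, and the list `sink` stands in for the disk/journal:

```python
import queue
import threading

class BufferedWriter:
    """Toy model of a shared buffer pool plus a write daemon: many
    client threads enqueue records cheaply, while one background
    daemon drains the buffer and performs the (slow) writes."""

    def __init__(self, sink):
        self.buffer = queue.Queue()   # the "shared buffer pool"
        self.sink = sink              # stands in for disk/journal
        self._stop = object()         # sentinel to shut the daemon down
        self._daemon = threading.Thread(target=self._drain, daemon=True)
        self._daemon.start()

    def write(self, record):
        # Fast path used by clients: just enqueue, no I/O.
        self.buffer.put(record)

    def _drain(self):
        # The "write daemon": a real system would batch and fsync here.
        while True:
            item = self.buffer.get()
            if item is self._stop:
                break
            self.sink.append(item)

    def close(self):
        self.buffer.put(self._stop)
        self._daemon.join()

sink = []
w = BufferedWriter(sink)
for i in range(1000):
    w.write(i)
w.close()
assert len(sink) == 1000 and sink[0] == 0 and sink[-1] == 999
```

The point of the pattern is that client-visible latency is just a queue push; durability work (journaling, flushing blocks) happens asynchronously behind it.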


Re: [julia-users] Re: High-performance metric collector in Julia

2015-05-19 Thread Scott Jones


On Tuesday, May 19, 2015 at 7:23:13 PM UTC-4, Andrei Zh wrote:

  

 But, ZMQ isn't storing it anywhere... that's just a messaging protocol, 
 isn't it?


 Yes, and in fact, it may even be enough if you have a sufficiently smart
 consumer on the other end. For example, if the goal is to keep only
 statistics, you can use a running max, average, histogram, etc. In that case
 very little memory is needed.

 I'm not saying that persistence is not needed in this case - it would be 
 nice to have it too - but high throughput and low latency are much more 
 desirable. 


 I was able to get 2M/sec without transactions, and 909K/sec with 
 transactions (so it's durable), on my laptop, using Caché... (from 
 InterSystems... I used to consult for them)
 They do have a free single user database engine, Globals, that you might 
 be able to use... I don't recall what's available with that version...


 I suppose you got 2M/sec on some server with pretty high resources - I 
 cannot imagine 2M network operations on my local machine. Anyway, I think I 
 will start with Redis, which is more accessible from Julia both in terms of 
 programming and openness.  


No, that's on my 1-year-old MacBook Pro... with just the default database
settings:

Platform Info:
  System: Darwin (x86_64-apple-darwin14.4.0)
  CPU: Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz


Starting Control Process
Automatically configuring buffers
Allocated 374MB shared memory: 256MB global buffers, 35MB routine buffers

My point was mainly that a good KVS should be able to achieve those kinds 
of speeds, even on a laptop... unfortunately, Caché is not open source 
software, so you'd probably need some other solution...

Redis may be a good bet... I'd be interested in what performance numbers 
you get with Julia/Redis!



Re: [julia-users] Re: High-performance metric collector in Julia

2015-05-19 Thread Andrei
 But, ZMQ isn't storing it anywhere... that's just a messaging protocol,
 isn't it?


Yes, and in fact, it may even be enough if you have a sufficiently smart
consumer on the other end. For example, if the goal is to keep only
statistics, you can use a running max, average, histogram, etc. In that case
very little memory is needed.
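The "sufficiently smart consumer" idea — keeping only constant-memory summaries instead of raw values — can be sketched as follows. Python for illustration (the project itself is in Julia); `RunningStats` is a made-up name, and the mean uses Welford's online update:

```python
class RunningStats:
    """Constant-memory metrics consumer: tracks count, max, mean
    (via Welford's online update), and a fixed-bucket histogram,
    never storing the individual values."""

    def __init__(self, bucket_width=10.0, n_buckets=10):
        self.n = 0
        self.mean = 0.0
        self.max = float("-inf")
        self.bucket_width = bucket_width
        self.hist = [0] * n_buckets   # last bucket catches overflow

    def push(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n      # Welford's update
        self.max = max(self.max, x)
        b = int(x // self.bucket_width)
        self.hist[min(max(b, 0), len(self.hist) - 1)] += 1

s = RunningStats()
for x in [5, 15, 25, 95]:
    s.push(x)
assert (s.n, s.mean, s.max) == (4, 35.0, 95)
assert s.hist[0] == 1 and s.hist[9] == 1
```

Memory use is independent of how many metrics flow through, which is exactly what makes persistence optional for this kind of consumer.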

I'm not saying that persistence is not needed in this case - it would be
nice to have it too - but high throughput and low latency are much more
desirable.


I was able to get 2M/sec without transactions, and 909K/sec with
 transactions (so it's durable), on my laptop, using Caché... (from
 InterSystems... I used to consult for them)
 They do have a free single user database engine, Globals, that you might
 be able to use... I don't recall what's available with that version...


I suppose you got 2M/sec on a server with pretty high resources; I cannot
imagine 2M network operations per second on my local machine. Anyway, I think I
will start with Redis, which is more accessible from Julia both in terms of
programming and openness.



 Indices also have their cost. And, most importantly, they are not really
 needed here. In Kafka, for example, new records are simply appended to the end
 of the queue, and the end pointer is advanced. In our tests this gives about
 50K/sec, which is much higher than for the RDBMSs mentioned above, and unlike
 most key-value stores it provides an easy and fast way to read all messages
 down the stream.


 That seems *really* slow... but, if you are just adding records to the end
 of a queue, and you have your key simply be the counter, why would that be
 slow?


50K/sec is just our result from a quick test; Kafka's own tests show results
several times higher. Anyway, the point was that Kafka out of the box is
*much* faster than things like MySQL, so there's really no point in messing
with that kind of database.
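The append-only design discussed above — Kafka-style: append at the end, advance the end pointer, let consumers replay sequentially from any offset — reduces to something like this in-memory Python sketch. `AppendLog` is an invented name, and real Kafka of course persists to disk, batches, and partitions the log:

```python
class AppendLog:
    """Toy Kafka-style log: records are only ever appended at the end,
    so there is no index to maintain, and a consumer can replay
    sequentially from any offset it remembers."""

    def __init__(self):
        self.records = []

    def append(self, record):
        # O(1) write path: no B-tree, no index update.
        self.records.append(record)
        return len(self.records) - 1    # offset of the new record

    def read_from(self, offset):
        # Sequential scan from a consumer's saved offset to the end.
        return self.records[offset:]

log = AppendLog()
for i in range(5):
    log.append(f"metric-{i}")
assert log.read_from(3) == ["metric-3", "metric-4"]
```

This is why an append-only log can beat an indexed database for metric streams: writes are sequential and index-free, and "read everything since offset N" is a single linear scan.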