Hi all,

I hope I'm in the right forum. I have about a decade's worth of log management/event analytics experience; for the last two years, though, we've been building our own solutions based on a variety of open-source technologies. As hopefully some of you will appreciate, whenever you want to do something interesting, or at scale, with time-series/event data, a lot of the tools are lacking.
I started off working in Splunk, and it sort of spoiled me with end-user/administrator functionality from the get-go (even if it is prohibitively expensive and slow). In Splunk, the 'sandpit' that you play in has all the toys a non-developer could ask for: built-in map/reduce + streaming, and manipulation of results/streams through a simple DSL familiar to anyone with a bit of Unix CLI/Bash experience (i.e. search something | filter | map | eval | visualise; see http://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutsearchlanguagesyntax).

At the moment we spend our days in Logstash + Elasticsearch (and sundry). I looked into Beam and Flink a bit, and from a technical perspective that seems like the ideal direction to go: combining many sources of data (such as Elasticsearch, InfluxDB, RethinkDB, etc.) and many analytics use-cases. The only gotcha seems to be that, from what I can see, the target audience is almost always developers. This isn't a problem for me personally, but ideally I would want to bolt a simple DSL (submittable via simple interfaces, such as a CLI) on top of my datasets, while keeping all of the stream/batch processing capabilities that projects like Flink allow.

Is anyone aware of projects/efforts along these lines? Any ideas on how we could get there from a project such as Apache Beam? (Am I being naive?)

Your input/perspectives are most welcome!

Kind regards,
Stephan Buys
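To make the idea concrete, here is a rough, purely hypothetical sketch in Python of what such a DSL front-end could look like: a pipe-style query is split into stages and applied to a batch of events. The stage names, event shape, and semantics are all made up for illustration; in a real system each stage would compile to a Flink/Beam operator rather than a plain Python loop.

```python
# Hypothetical sketch (not a real project): parse a Splunk-style pipe
# DSL and evaluate it over a batch of events. In a real system each
# stage would map to a Flink/Beam operator instead of plain Python.

def run_pipeline(events, query):
    """Evaluate a query like 'search error | filter level=ERROR | count'."""
    for stage in query.split("|"):
        op, *args = stage.split()
        if op == "search":       # keep events whose message contains the term
            needle = " ".join(args)
            events = [e for e in events if needle in e["message"]]
        elif op == "filter":     # keep events where field=value matches
            key, value = args[0].split("=")
            events = [e for e in events if str(e.get(key)) == value]
        elif op == "count":      # reduce the stream to a single count
            events = [{"count": len(events)}]
        else:
            raise ValueError("unknown stage: %s" % op)
    return events

events = [
    {"message": "disk error on sda", "level": "ERROR"},
    {"message": "user login ok", "level": "INFO"},
    {"message": "error: timeout", "level": "ERROR"},
]
print(run_pipeline(events, "search error | filter level=ERROR | count"))
# -> [{'count': 2}]
```

The point being: the end user only ever sees the one-line pipe syntax, while the engine underneath is free to be as sophisticated as Flink or Beam.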
