Thanks so much for the feedback so far (Max and JB), as well as the references to the projects; my reading list keeps growing.
Continuing with my bad habit of asking before I'm really familiar with a subject... The more I look at the examples and read about the kinds of problems Dataflow/Beam attempt to solve, the more I run into a perceived chasm between stacks such as ELK (Elasticsearch/Logstash, etc.) or Splunk and projects such as Apache Beam. I guess that even though, strictly speaking, the problems solved are the same, Splunk/ELK/etc. are more suited to querying/searching/investigation, whereas projects such as Beam are well suited to being a pipeline that feeds those systems, integrates with them for real-time metrics/reporting, and drives alerting/training. In my mind a proper streaming system keeps looping back into, and originating from, a data store such as Elasticsearch/HDFS. Am I on the right track? Is there a 'grand unified' vision for these kinds of systems that I can delve into a bit?

Regards,
Stephan

> On 25 May 2016, at 4:14 PM, Jean-Baptiste Onofré <[email protected]> wrote:
>
> Hi Stephan,
>
> I created Karaf Decanter as an alternative to Logstash/Elasticsearch.
>
> What you describe looks like a DSL to me (as discussed a bit here):
>
> - Technical Vision
> - http://blog.nanthrax.net/2016/01/introducing-apache-dataflow/
>
> I'm working on a PoC mixing Decanter with Beam, which could result in a DSL ;)
>
> Regards
> JB
>
> On 05/25/2016 01:43 PM, Stephan Buys wrote:
>> Hi all,
>>
>> Hope I'm in the right forum. I'm someone with about a decade's worth of log
>> management/event analytics experience; for the last 2 years, though, we've
>> been building our own solutions based on a variety of open source
>> technologies. As hopefully some of you might appreciate, whenever you want
>> to do something interesting, or at scale, with timeseries/event data, a lot of
>> the tools are lacking.
>>
>> I started off working in Splunk and it sort of spoiled me with
>> end-user/administrator functionality from the get-go (even if it is
>> prohibitively expensive and slow). In Splunk the 'sandpit' that you play in
>> has all the toys a non-developer could ask for: built-in map/reduce +
>> streaming, and manipulation of results/streams through a simple DSL familiar
>> to anyone with a bit of Unix CLI/Bash experience (i.e. search something |
>> filter | map | eval | visualise;
>> http://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutsearchlanguagesyntax).
>>
>> At the moment we spend our days in Logstash + Elasticsearch (and sundry).
>>
>> I looked into Beam and Flink a bit, and from a technical perspective it seems
>> like the ideal direction to go, combining many sources of data (such as
>> Elasticsearch, InfluxDB, RethinkDB, etc.) and many analytics use cases. The
>> only gotcha seems to be that, from what I can see, the target audience is
>> almost always developers. This isn't a problem for me, but ideally I would
>> want to bolt a simple DSL (submittable via simple interfaces, such as a CLI)
>> on top of my datasets, while keeping all of the stream/batch processing
>> capabilities that projects like Flink allow.
>>
>> Is anyone aware of projects/efforts along these lines? Any ideas on how we
>> could get there from a project such as Apache Beam? (Am I being naive?)
>>
>> Your input/perspectives are most welcome!
>>
>> Kind regards,
>> Stephan Buys
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
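To make the "simple DSL on top of a pipeline" idea concrete, here is a minimal sketch of how a Splunk-style pipe query could be interpreted as a chain of stream transformations. Everything here is hypothetical for illustration: `run_query` and the verbs `search`/`where`/`fields` are made up, and are neither Splunk SPL nor Beam syntax. Each stage is a pure transformation over a list of event dicts, so the same parse could in principle be retargeted at Beam-style Filter/Map transforms instead of plain lists.

```python
def run_query(query, events):
    """Interpret a tiny 'search TERM | where FIELD=VALUE | fields A,B' pipeline."""
    stream = list(events)
    for stage in (s.strip() for s in query.split("|")):
        verb, _, arg = stage.partition(" ")
        if verb == "search":
            # keep events whose message contains the search term
            term = arg.strip()
            stream = [e for e in stream if term in e.get("message", "")]
        elif verb == "where":
            # keep events where the named field equals the given value
            key, _, val = arg.partition("=")
            stream = [e for e in stream if str(e.get(key.strip())) == val.strip()]
        elif verb == "fields":
            # project each event down to the named fields
            keep = [f.strip() for f in arg.split(",")]
            stream = [{k: e.get(k) for k in keep} for e in stream]
        else:
            raise ValueError("unknown verb: " + verb)
    return stream

# Example call (hypothetical event shape):
# run_query("search login | where host=web1 | fields message,status", events)
```

The point of the sketch is the separation it suggests: a thin text front end that non-developers can use, compiling down to whatever batch/streaming engine sits underneath.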
