Thanks so much for the feedback so far (Max and JB), as well as the references to the projects; my reading list keeps growing.
Continuing with my bad habit of asking before I'm really familiar with a subject... The more I look at the examples and read about the kinds of problems Dataflow/Beam attempt to solve, the more I run into a perceived chasm between stacks such as ELK (Elasticsearch/Logstash, etc.) or Splunk and projects such as Apache Beam. I guess that even though, strictly speaking, the problems solved are the same, Splunk/ELK/etc. are more suited to querying/searching/investigation, whereas projects such as Beam are well suited to being a pipeline that feeds those systems, integrates with them for real-time metrics/reporting, and drives alerting/training. In my mind a proper streaming system keeps looping back into, and originating from, a data store such as Elasticsearch/HDFS. Am I on the right track? Is there a 'grand unified' vision for these kinds of systems that I can delve into a bit?

Regards,
Stephan

> On 25 May 2016, at 4:14 PM, Jean-Baptiste Onofré <[email protected]> wrote:
>
> Hi Stephan,
>
> I created Karaf Decanter as an alternative to Logstash/Elasticsearch.
>
> What you describe looks like a DSL to me (as discussed a bit here):
>
> - Technical Vision
> - http://blog.nanthrax.net/2016/01/introducing-apache-dataflow/
>
> I'm working on a PoC mixing Decanter with Beam, which could result in a DSL ;)
>
> Regards
> JB
>
> On 05/25/2016 01:43 PM, Stephan Buys wrote:
>> Hi all,
>>
>> Hope I'm in the right forum. I'm someone with about a decade's worth of log
>> management/event analytics experience; for the last 2 years, though, we've
>> been building our own solutions based on a variety of open source
>> technologies. As hopefully some of you might appreciate, whenever you want
>> to do something interesting, or at scale, with timeseries/event data, a lot of
>> the tools are lacking.
>>
>> I started off working in Splunk and it sort of spoiled me with
>> end-user/administrator functionality from the get-go (even if it is
>> prohibitively expensive and slow). In Splunk the 'sandpit' that you play in
>> has all the toys a non-developer could ask for: built-in map/reduce +
>> streaming, and manipulation of results/streams through a simple DSL familiar
>> to anyone with a bit of Unix CLI/Bash experience (i.e. search something |
>> filter | map | eval | visualise;
>> http://docs.splunk.com/Documentation/Splunk/latest/Search/Aboutsearchlanguagesyntax).
>>
>> At the moment we spend our days in Logstash + Elasticsearch (and sundry).
>>
>> I looked into Beam and Flink a bit, and from a technical perspective it seems
>> like the ideal direction to go, combining many sources of data (such as
>> Elasticsearch, InfluxDB, RethinkDB, etc.) and many analytics use cases. The
>> only gotcha seems to be that, from what I can see, the target audience is
>> almost always developers. This isn't a problem for me, but ideally I would
>> want to bolt a simple DSL (submittable via simple interfaces, such as a CLI)
>> on top of my datasets, while keeping all of the stream/batch processing
>> capabilities that projects like Flink allow.
>>
>> Is anyone aware of projects/efforts along these lines? Any ideas on how we
>> could get there from a project such as Apache Beam? (Am I being naive?)
>>
>> Your input/perspectives are most welcome!
>>
>> Kind regards,
>> Stephan Buys
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
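To make the "simple DSL on top of a pipeline" idea concrete, here is a minimal sketch of how a Splunk-style pipe query could be interpreted as a chain of stream transformations. Everything here is hypothetical for illustration: `run_query` and the verbs `search`/`where`/`fields` are made up, and are neither Splunk SPL nor Beam syntax. Each stage is a pure transformation over a list of event dicts, so the same parse could in principle be retargeted at Beam-style Filter/Map transforms instead of plain lists.

```python
def run_query(query, events):
    """Interpret a tiny 'search TERM | where FIELD=VALUE | fields A,B' pipeline."""
    stream = list(events)
    for stage in (s.strip() for s in query.split("|")):
        verb, _, arg = stage.partition(" ")
        if verb == "search":
            # keep events whose message contains the search term
            term = arg.strip()
            stream = [e for e in stream if term in e.get("message", "")]
        elif verb == "where":
            # keep events where the named field equals the given value
            key, _, val = arg.partition("=")
            stream = [e for e in stream if str(e.get(key.strip())) == val.strip()]
        elif verb == "fields":
            # project each event down to the named fields
            keep = [f.strip() for f in arg.split(",")]
            stream = [{k: e.get(k) for k in keep} for e in stream]
        else:
            raise ValueError("unknown verb: " + verb)
    return stream

# Example call (hypothetical event shape):
# run_query("search login | where host=web1 | fields message,status", events)
```

The point of the sketch is the separation it suggests: a thin text front end that non-developers can use, compiling down to whatever batch/streaming engine sits underneath.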
