Hello World / CRUNCH Framework

2018-12-14 Thread Julian Feinauer
Hi all,

I just joined the incubator ML and wanted to introduce myself and possibly also
start a discussion about a software project we developed in the past.
But first things first. My name is Julian Feinauer and I come from Germany,
where I run two “start-up” companies that work a lot on “industrial IoT”
topics, data science and processing of “larger amounts of data”. We love
open source and so we love the ASF. Most notably, I closely follow the Apache
Calcite project and hope to find some time soon to contribute a bit more than
in the last months. Furthermore, I am engaged in the (incubating) PLC4X project
as a (P)PMC member and in the (incubating) Edgent project, where I try to
“revive” the community as a new (P)PMC member together with Christofer Dutz.

Now to the real topic. Over the last 3 years I have been developing a
“Framework/Library” (currently a set of jars) to facilitate processing of
time series data. The focus is mostly on processing data from test stands,
e.g., automotive tests, driving profiles and so on. Furthermore, in the last
year we added a lot of functionality for processing “industrial data”. This
means that we want to make it easy to analyze things like “how long did the
machine spend in this state”, “when is the following set of bits set” or
“notify when the following condition is true for the first time”.
It is a bit technical and I don’t want to go too deep into it, but generally
speaking we try to introduce the “right” semantics to answer the typical
questions when analyzing machine or test data. This project is called “CRUNCH”
and we are in the process of making it open source (it will be moved to a
public GitHub repo this year) under the Apache 2.0 License.
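
To make the three example questions above a bit more concrete, here is a
minimal sketch in plain Java. It is not CRUNCH's API (which is not shown in
this thread); the class MachineQueries, the Sample layout with a status word,
and the assumption that a state holds until the next sample arrives are all
made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;
import java.util.function.Predicate;

class MachineQueries {

    /** One timestamped sample: machine state plus a status word (hypothetical layout). */
    static final class Sample {
        final long timestampMs;
        final String state;
        final int statusBits;

        Sample(long timestampMs, String state, int statusBits) {
            this.timestampMs = timestampMs;
            this.state = state;
            this.statusBits = statusBits;
        }
    }

    /** Total time per state; a state is assumed to hold until the next sample arrives. */
    static Map<String, Long> timeInState(List<Sample> samples) {
        Map<String, Long> durations = new LinkedHashMap<>();
        for (int i = 0; i + 1 < samples.size(); i++) {
            Sample current = samples.get(i);
            long delta = samples.get(i + 1).timestampMs - current.timestampMs;
            durations.merge(current.state, delta, Long::sum);
        }
        return durations;
    }

    /** Timestamp of the first sample for which the condition holds, if any. */
    static OptionalLong firstTimeTrue(List<Sample> samples, Predicate<Sample> condition) {
        return samples.stream()
                .filter(condition)
                .mapToLong(s -> s.timestampMs)
                .findFirst();
    }

    /** True if every bit of the mask is set in the status word. */
    static boolean allBitsSet(int statusBits, int mask) {
        return (statusBits & mask) == mask;
    }

    public static void main(String[] args) {
        List<Sample> samples = Arrays.asList(
                new Sample(0, "IDLE", 0b0000),
                new Sample(1_000, "RUNNING", 0b0001),
                new Sample(4_000, "RUNNING", 0b0101),
                new Sample(6_000, "IDLE", 0b0100),
                new Sample(7_000, "IDLE", 0b0000));

        // "How long did the machine spend in this state?"
        System.out.println(timeInState(samples));
        // "Notify when bits 0 and 2 are both set for the first time."
        System.out.println(firstTimeTrue(samples, s -> allBitsSet(s.statusBits, 0b0101)));
    }
}
```

With the sample data in main, the machine spends 2000 ms in IDLE and 5000 ms
in RUNNING, and the bit condition first holds at timestamp 4000. A real
implementation would also have to deal with gaps, late data and unbounded
streams, which this batch-style sketch ignores.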

As there is a close relationship to other (incubating or TLP) projects, we are
wondering whether this project could fit into the incubator. Some examples of
Apache projects that we see as “related” are Apache Flink (which we can use as
the streaming engine to process the stream) and (incubating) Edgent, which we
can also support as a streaming engine and where we are currently trying to
find a suitable project goal and community, as some of the (P)PMC members
retired or went inactive. Finally, CRUNCH has a very natural fit with PLC4X
because it can directly process the data gathered from PLCs (and in fact we are
already using it in some of our projects that way). I had several discussions
with some of the PLC4X (P)PMC members, namely Sebastian Rühl and Christofer
Dutz, who encouraged me to introduce the project to the incubator because they
also see some potential for the project to enrich the OSS ecosystem with regard
to edge / stream processing of (I)IoT data.

So please feel free to ask questions or discuss your view on this topic as I 
would like to find out if this project could fit in the Apache Ecosystem and 
the Incubator or not.

Thank you in advance!
Julian


Re: Hello World / CRUNCH Framework

2018-12-14 Thread Christofer Dutz
Hi Julian,

For me it always felt like CRUNCH can't directly be compared to the other
"streaming engines" as I see it as a bundle of a streaming engine and a
higher-level framework for doing typical industry operations on top of that.

I think the higher-level library should actually be able to run on top of any
of the other streaming engines we have. Does such a split make sense, or did I
get something wrong? Perhaps it would make sense to evaluate Edgent's stream
processing and eventually merge in improvements. I don't see a need for
multiple edge stream frameworks, especially if we have to revive the existing
one.

I think an engine for higher-level functions on top of a streaming engine of
choice would be a great addition, because adding such logic to only one of the
existing ones seems to be a waste.

Chris


From: Julian Feinauer 
Sent: Friday, December 14, 2018 11:11:40 AM
To: general@incubator.apache.org
Subject: Hello World / CRUNCH Framework



Re: Hello World / CRUNCH Framework

2018-12-14 Thread Julian Feinauer
Hi Chris,

Yes, you got it right.
We do not care about "how does this message get from this processing node to
that".
We "transpile" the higher-level input into a DAG which can then run on
basically every streaming engine (I agree, we do NOT need yet another one); in
that sense it is a bit like Apache Beam.
Thus, I do not see it as a contender to Edgent but more as complementary to it,
because Edgent's focus is more on the engine and cloud communication, while
CRUNCH's focus is more on "what exactly does the pipeline do".
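
The split described above (a higher-level description that is turned into a
DAG, and an engine that only has to execute that DAG) could look roughly like
the following Java sketch. All names here (TranspileSketch, Node, Engine,
InMemoryEngine, transpile) are hypothetical and do not come from CRUNCH, Beam,
Flink or Edgent; the in-memory engine only stands in for a real streaming
engine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class TranspileSketch {

    /** DAG node: one processing step and the nodes that consume its output. */
    static final class Node {
        final String name;
        final Function<Double, Double> step;   // returns null to drop the value
        final List<Node> downstream = new ArrayList<>();

        Node(String name, Function<Double, Double> step) {
            this.name = name;
            this.step = step;
        }
    }

    /** The only thing an execution engine has to provide in this sketch. */
    interface Engine {
        List<String> run(Node root, List<Double> input);
    }

    /** "Transpile" a high-level rule (scale raw values, split at a threshold) into a small DAG. */
    static Node transpile(double scale, double threshold) {
        Node toUnits = new Node("toUnits", raw -> raw * scale);
        Node above = new Node("aboveThreshold", v -> v > threshold ? v : null);
        Node below = new Node("belowOrEqual", v -> v <= threshold ? v : null);
        toUnits.downstream.add(above);
        toUnits.downstream.add(below);
        return toUnits;
    }

    /** Trivial single-threaded engine that pushes each value through the DAG depth-first. */
    static final class InMemoryEngine implements Engine {
        @Override
        public List<String> run(Node root, List<Double> input) {
            List<String> out = new ArrayList<>();
            for (Double value : input) {
                push(root, value, out);
            }
            return out;
        }

        private void push(Node node, Double value, List<String> out) {
            Double result = node.step.apply(value);
            if (result == null) {
                return;                              // value was filtered out on this branch
            }
            if (node.downstream.isEmpty()) {
                out.add(node.name + "=" + result);   // leaf nodes act as sinks
            }
            for (Node next : node.downstream) {
                push(next, result, out);
            }
        }
    }

    public static void main(String[] args) {
        Node dag = transpile(0.5, 10.0);
        Engine engine = new InMemoryEngine();
        System.out.println(engine.run(dag, Arrays.asList(10.0, 30.0)));
    }
}
```

The point of the design is that transpile knows nothing about how values move
between nodes, so a different Engine implementation could map the same DAG onto
another runtime without touching the pipeline logic.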

Julian


Re: Hello World / CRUNCH Framework

2018-12-14 Thread Brian Devins-Suresh
Hi Julian,

This seems like a cool project. I'd like to point out one thing about your
project's name: there is already an Apache Crunch project, see
https://crunch.apache.org

On a couple of collaborative notes, as an (incubating) Zipkin developer I could
see some use for your project in our community. We've had trouble building
reusable analytics components, so if your framework would help facilitate
that, then we might be able to get community buy-in for adopting it.

My other collaboration point is it would be cool if it could be used with
the newly incubating IoTDB once that has all of its source available.

- Brian


Re: Hello World / CRUNCH Framework

2018-12-14 Thread Julian Feinauer
Hi Brian,

Thanks for your email.
Regarding your hint, we know about the "real" Crunch project and are willing to
change the project's name if we enter the incubator (sadly we are
massively lacking creativity...).

Regarding Zipkin, I could imagine that some of the things we are doing fit
well with your use cases, but I have to admit that I never went too deep into
Zipkin because what you are doing there looks too crazy.
And in fact, this is what we are mostly looking for, 

Regarding IoTDB... I was very excited when I heard of the project (and also
looked through the codebase in the "old" GitHub repo) and I really like it. We
use Parquet in some situations but IoTDB is definitely better suited for the
specific workloads we have in mind. Thus, I am really looking forward to the
project getting started, from a PLC4X, Edgent and CRUNCH dev perspective.

Julian

PS.: Do you have some sample questions or analytics that you have in mind for 
Zipkin, to get a feeling?


Re: Hello World / CRUNCH Framework

2018-12-14 Thread Julian Feinauer
Hi Brian,

Thanks for your email!
Regarding your hint, we know about the "real" Crunch project and are willing to
change the project's name if we enter the incubator (sadly we are
massively lacking creativity...).

Regarding Zipkin, I could imagine that some of the things we are doing fit
well with your use cases, but I have to admit that I never went too deep into
Zipkin because what you are doing there looks too crazy.
And in fact, this is what we are mostly looking for: combatants and people who
bring in different perspectives. We have already learned so much while
discussing it with some PLC4X folks.

Regarding IoTDB... I was very excited when I heard of the project (and also
looked through the codebase in the "old" GitHub repo) and I really like it. We
use Parquet in some situations but IoTDB is definitely better suited for the
specific workloads we have in mind. Thus, I am really looking forward to the
project getting started, from a PLC4X, Edgent and CRUNCH dev perspective.

Julian
