Re: [DISCUSSION] METRON-1046 -> Stellar Files for multiple statement execution

2017-07-14 Thread Matt Foley
Yes, if the files are also stored in ZK, Curator can watch them, but it would 
require extension work in our Curator usage.  It currently manages a single 
tree cache.  Managing free-floating files would require careful design work.

Casey, my reference to JSON Pointers was itself the result of a 5-minute 
search; I inferred they might exist and searched for them :-)  But they should 
at least be looked into before we roll our own, especially if they do happen to 
work with Curator.  The initial pointers are:

Structuring a complex schema — Understanding JSON Schema 1.0 
https://spacetelescope.github.io/understanding-json-schema/structuring.html
Likewise in JSON Schema, for anything but the most trivial schema, it's really 
useful ... $ref can also be a relative or absolute URI, so if you prefer to 
include your ...

RapidJSON: Pointer
http://rapidjson.org/md_doc_pointer.html
(This feature was released in v1.1.0). JSON Pointer is a standardized (RFC6901) 
way to select a value inside a JSON Document (DOM). This can be analogous ...

RFC 6901 - JavaScript Object Notation (JSON) Pointer - IETF Tools
https://tools.ietf.org/html/rfc6901
by M Nottingham - 2013
Abstract JSON Pointer defines a string syntax for identifying a specific value 
within a JavaScript Object Notation (JSON) document. Status of This Memo This 
is ...

JSON API — Latest Specification 
(v1.0)
http://jsonapi.org/format/
This page represents the latest published version of JSON API, which is 
currently 1.0 ... This section describes the structure of a JSON API document, 
which is ...
[Note: this references JSON Pointer as a standard entity [RFC6901]  but not as 
part of the JSON spec.]
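
For anyone who has not met JSON Pointers before, the core of RFC 6901 is small enough to sketch. The Python below is only an illustration of how a pointer string selects a value inside a parsed JSON document; it is not Metron code, the function name resolve_pointer is made up for this example, and the sample config is invented. It does not address whether or how pointers could work with Curator.

```python
import re
from typing import Any

# RFC 6901 array indices: "0", or digits without a leading zero.
_INDEX = re.compile(r"0|[1-9][0-9]*")


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Return the value that an RFC 6901 JSON Pointer selects in `document`."""
    if pointer == "":
        return document  # the empty pointer selects the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"pointer must be empty or start with '/': {pointer!r}")

    current = document
    for raw in pointer[1:].split("/"):
        # Unescape in the order the RFC requires: '~1' to '/', then '~0' to '~'.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if not _INDEX.fullmatch(token):
                raise KeyError(f"invalid array index: {token!r}")
            index = int(token)
            if index >= len(current):
                raise KeyError(f"array index out of range: {index}")
            current = current[index]
        elif isinstance(current, dict):
            if token not in current:
                raise KeyError(f"no such member: {token!r}")
            current = current[token]
        else:
            raise KeyError(f"cannot descend into a scalar at {token!r}")
    return current


# Invented config, only to show the pointer syntax.
sample_config = {
    "fieldTransformations": [
        {"transformation": "STELLAR",
         "config": {"full_hostname": "URL_TO_HOST(url)"}}
    ]
}
statement = resolve_pointer(
    sample_config, "/fieldTransformations/0/config/full_hostname")
```

A pointer addresses a location inside one document; reaching into a different document is what JSON Schema's $ref adds on top by combining a URI with a pointer fragment.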


From: Otto Fowler <ottobackwa...@gmail.com>
Date: Friday, July 14, 2017 at 10:42 AM
To: Matt Foley <mfo...@hortonworks.com>, "dev@metron.apache.org" 
<dev@metron.apache.org>
Subject: Re: [DISCUSSION] METRON-1046 -> Stellar Files for multiple statement 
execution

I think the ‘files’ should be stored in zk, and updated with the same mechanism.


On July 14, 2017 at 10:34:36, Casey Stella (ceste...@gmail.com) wrote:

Just chiming in on a part of this: definitely we do not want to lose
automatic config updates (at least, I'd be strongly, strongly STRONGLY
against it).

I definitely agree that JSON files could easily get unwieldy.  I don't know
anything about JSON pointers, could you cover that briefly, Matt?  Even a
URL or two to get started would be great.  Basic googling (while on
vacation) yielded that it was something like xpath for json, but I probably
just googled the wrong thing.

Casey


On July 14, 2017 at 13:27:36, Matt Foley (mfo...@hortonworks.com) wrote:
In the abstract, this is a good idea. I see it as related to METRON-987, which 
was the first step in allowing sequences of Stellar statements (aka "programs" 
:-) ) instead of just unrelated groups of single statements. Your proposal lets 
us really work with programs as first-class entities.

However, some concerns need to be resolved:

1. Syntax.

Currently Stellar syntax and JSON fit neatly together. Where would be the cut 
line for file substitutions? Referencing METRON-987, would you only allow a 
file substitution where we currently allow square-bracketed Stellar string 
sequences? What about Profile config syntax, where several chunks of code are 
intimately related (hence want to be located in the same file), but don't all 
get executed at the same time? (This is not a showstopper question because 
Profile configs are usually simple and don't really need file substitution. The 
need is much greater in Enrichment.)

2. Config Updates.

Currently Metron configuration is stored in ZK, but managed through Curator 
libraries. In return for considerable complexity, this gives instant update 
whenever a config changes, without effort in the BI part of the application. 
This differs sharply from file-based configuration, where updates in response 
to config changes require either a restart, an explicit reload command from the 
user, or frequent state-checking in the application.

So currently people trying to develop a new enrichment can update the config, 
and immediately test the result, without restarting and without any explicit 
reload command. We probably want to not lose this.

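Rather than roll our own file pointer model, can we use JSON Pointers? Will 
they work with Curator? Both of those get into some fairly obscure features, 
that would need to be studied. It also actually relates to the syntax question 
presented above.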

Re: [DISCUSSION] METRON-1046 -> Stellar Files for multiple statement execution

2017-07-14 Thread Ryan Merriman
A couple things I would like to point out.  You can test Stellar statements
without having to send data through parser/enrichment topologies.  There is
a REST endpoint that allows you to pass in a sample message and parser
config and returns a message with Stellar statements applied.  This could
easily be expanded to enrichment configs or testing generic stellar
statements against test messages.

Moving statements to a separate file is going to require a lot of work and
will make our mechanism for managing configuration in bolts more complex.
We would have to also listen for changes in these files and reconcile which
parser/enrichment configs are affected.
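
To make the reconciliation step concrete, here is a rough Python sketch of one way it could be done: keep a reverse index from each shared file to the configs that reference it, so a change notification for a file maps straight to the affected configs. Everything in it is an assumption for illustration. The "stellarFile" key, the config names, and the file paths are hypothetical and do not exist in Metron today, and the sketch says nothing about how the change notifications themselves would be delivered.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Set

REF_KEY = "stellarFile"  # hypothetical marker; not an existing Metron config key


def iter_file_refs(node: Any) -> Iterator[str]:
    """Yield every file path referenced anywhere inside a parsed config."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == REF_KEY and isinstance(value, str):
                yield value
            else:
                yield from iter_file_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_file_refs(item)


def build_reverse_index(configs: Dict[str, Any]) -> Dict[str, Set[str]]:
    """Map each referenced file path to the names of the configs that use it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, config in configs.items():
        for path in iter_file_refs(config):
            index[path].add(name)
    return dict(index)


def affected_configs(index: Dict[str, Set[str]], changed_path: str) -> List[str]:
    """Return, in sorted order, the configs to reload when a file changes."""
    return sorted(index.get(changed_path, ()))


# Invented configs: two share one file, one references nothing.
configs = {
    "parser/bro": {"fieldTransformations": [
        {"transformation": "STELLAR",
         "config": {REF_KEY: "/stellar/common.stellar"}}]},
    "enrichment/bro": {"enrichment": {
        "config": {REF_KEY: "/stellar/common.stellar"}}},
    "parser/snort": {"fieldTransformations": []},
}
index = build_reverse_index(configs)
to_reload = affected_configs(index, "/stellar/common.stellar")
```

The index would also have to be rebuilt whenever a config itself changes, since that can add or remove references; that bookkeeping is part of the extra complexity described above.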



On Fri, Jul 14, 2017 at 12:42 PM, Otto Fowler wrote:

> I think the ‘files’ should be stored in zk, and updated with the same
> mechanism.
>
> On July 14, 2017 at 13:27:36, Matt Foley (mfo...@hortonworks.com) wrote:
>
> In the abstract, this is a good idea. I see it as related to METRON-987,
> which was the first step in allowing sequences of Stellar statements (aka
> "programs" :-) ) instead of just unrelated groups of single statements.
> Your proposal lets us really work with programs as first-class entities.
>
> However, some concerns need to be resolved:
>
> 1. Syntax.
>
> Currently Stellar syntax and JSON fit neatly together. Where would be the
> cut line for file substitutions? Referencing METRON-987, would you only
> allow a file substitution where we currently allow square-bracketed Stellar
> string sequences? What about Profile config syntax, where several chunks of
> code are intimately related (hence want to be located in the same file),
> but don't all get executed at the same time? (This is not a showstopper
> question because Profile configs are usually simple and don't really need
> file substitution. The need is much greater in Enrichment.)
>
> 2. Config Updates.
>
> Currently Metron configuration is stored in ZK, but managed through Curator
> libraries. In return for considerable complexity, this gives instant update
> whenever a config changes, without effort in the BI part of the
> application. This differs sharply from file-based configuration, where
> updates in response to config changes require either a restart, an explicit
> reload command from the user, or frequent state-checking in the
> application.
>
> So currently people trying to develop a new enrichment can update the
> config, and immediately test the result, without restarting and without any
> explicit reload command. We probably want to not lose this.
>
> Rather than roll our own file pointer model, can we use JSON Pointers? Will
> they work with Curator? Both of those get into some fairly obscure
> features, that would need to be studied. It also actually relates to the
> syntax question presented above.
>
>
> On 7/14/17, 6:17 AM, "Otto Fowler"  wrote:
>
> https://issues.apache.org/jira/browse/METRON-1046
>
> I was thinking this morning that managing stellar statements in the config
> json could become, and maybe is, kind of unwieldy.
> To that end, if in say a parser configuration I can refer to a ‘file’ in
> zookeeper as an alternative, we would add the capability to execute and
> manage more complex statements, and even chain multiple statements
> together.
>
> These files could be shared as well.
>
> This could be a Bad Idea™, so I thought I’d throw it out to the list.
>
> Please check out the jira, give some thought, and comment there or on the
> list or both.
>
> O
>


Re: [DISCUSSION] METRON-1046 -> Stellar Files for multiple statement execution

2017-07-14 Thread Otto Fowler
I think the ‘files’ should be stored in zk, and updated with the same
mechanism.

On July 14, 2017 at 13:27:36, Matt Foley (mfo...@hortonworks.com) wrote:

In the abstract, this is a good idea. I see it as related to METRON-987,
which was the first step in allowing sequences of Stellar statements (aka
"programs" :-) ) instead of just unrelated groups of single statements.
Your proposal lets us really work with programs as first-class entities.

However, some concerns need to be resolved:

1. Syntax.

Currently Stellar syntax and JSON fit neatly together. Where would be the
cut line for file substitutions? Referencing METRON-987, would you only
allow a file substitution where we currently allow square-bracketed Stellar
string sequences? What about Profile config syntax, where several chunks of
code are intimately related (hence want to be located in the same file),
but don't all get executed at the same time? (This is not a showstopper
question because Profile configs are usually simple and don't really need
file substitution. The need is much greater in Enrichment.)

2. Config Updates.

Currently Metron configuration is stored in ZK, but managed through Curator
libraries. In return for considerable complexity, this gives instant update
whenever a config changes, without effort in the BI part of the
application. This differs sharply from file-based configuration, where
updates in response to config changes require either a restart, an explicit
reload command from the user, or frequent state-checking in the
application.

So currently people trying to develop a new enrichment can update the
config, and immediately test the result, without restarting and without any
explicit reload command. We probably want to not lose this.

Rather than roll our own file pointer model, can we use JSON Pointers? Will
they work with Curator? Both of those get into some fairly obscure
features, that would need to be studied. It also actually relates to the
syntax question presented above.


On 7/14/17, 6:17 AM, "Otto Fowler"  wrote:

https://issues.apache.org/jira/browse/METRON-1046

I was thinking this morning that managing stellar statements in the config
json could become, and maybe is, kind of unwieldy.
To that end, if in say a parser configuration I can refer to a ‘file’ in
zookeeper as an alternative, we would add the capability to execute and
manage more complex statements, and even chain multiple statements
together.

These files could be shared as well.

This could be a Bad Idea™, so I thought I’d throw it out to the list.

Please check out the jira, give some thought, and comment there or on the
list or both.

O


Re: [DISCUSSION] METRON-1046 -> Stellar Files for multiple statement execution

2017-07-14 Thread Casey Stella
Just chiming in on a part of this: definitely we do not want to lose
automatic config updates (at least, I'd be strongly, strongly STRONGLY
against it).

I definitely agree that JSON files could easily get unwieldy.  I don't know
anything about JSON pointers, could you cover that briefly, Matt?  Even a
URL or two to get started would be great.  Basic googling (while on
vacation) yielded that it was something like xpath for json, but I probably
just googled the wrong thing.

Casey

On Fri, Jul 14, 2017 at 6:27 PM, Matt Foley  wrote:

> In the abstract, this is a good idea.  I see it as related to METRON-987,
> which was the first step in allowing sequences of Stellar statements (aka
> "programs" :-) ) instead of just unrelated groups of single statements.
> Your proposal lets us really work with programs as first-class entities.
>
> However, some concerns need to be resolved:
>
> 1. Syntax.
>
> Currently Stellar syntax and JSON fit neatly together.  Where would be the
> cut line for file substitutions?  Referencing METRON-987, would you only
> allow a file substitution where we currently allow square-bracketed Stellar
> string sequences?  What about Profile config syntax, where several chunks
> of code are intimately related (hence want to be located in the same file),
> but don't all get executed at the same time? (This is not a showstopper
> question because Profile configs are usually simple and don't really need
> file substitution.  The need is much greater in Enrichment.)
>
> 2. Config Updates.
>
> Currently Metron configuration is stored in ZK, but managed through
> Curator libraries.  In return for considerable complexity, this gives
> instant update whenever a config changes, without effort in the BI part of
> the application.  This differs sharply from file-based configuration, where
> updates in response to config changes require either a restart, an explicit
> reload command from the user, or frequent state-checking in the application.
>
> So currently people trying to develop a new enrichment can update the
> config, and immediately test the result, without restarting and without any
> explicit reload command.  We probably want to not lose this.
>
> Rather than roll our own file pointer model, can we use JSON Pointers?
> Will they work with Curator?  Both of those get into some fairly obscure
> features, that would need to be studied.  It also actually relates to the
> syntax question presented above.
>
>
> On 7/14/17, 6:17 AM, "Otto Fowler"  wrote:
>
> https://issues.apache.org/jira/browse/METRON-1046
>
> I was thinking this morning that managing stellar statements in the config
> json could become, and maybe is, kind of unwieldy.
> To that end, if in say a parser configuration I can refer to a ‘file’ in
> zookeeper as an alternative, we would add the capability to execute and
> manage more complex statements, and even chain multiple statements
> together.
>
> These files could be shared as well.
>
> This could be a Bad Idea™, so I thought I’d throw it out to the list.
>
> Please check out the jira, give some thought, and comment there or on the
> list or both.
>
> O
>
>
>