Re: [Bro-Dev] design summary: porting Bro scripts to use Broker
> On Oct 6, 2017, at 5:58 PM, Robin Sommer wrote:
>
> In the most simple version of this, the cluster framework would just
> hard-code a subscription to "bro/cluster/". And then scripts like the
> Intel framework would just publish all their events to "bro/cluster/"
> directly through Broker.

I just noticed that Bro configures Broker to use its new automatic multihop message forwarding, which interacts poorly with a generic "bro/cluster" topic that every node subscribes to.

When configuring a simple cluster of 1 manager, 1 worker, and 1 proxy using the traditional cluster layout (the worker connects to both, and the proxy connects to the manager), I wanted nodes to keep track of which peers are still alive. To do this I have a simple "hello" event, sent on seeing a new connection, that contains the needed information (a Broker node ID mapping to a cluster node name). Sending that event over the "bro/cluster" topic causes it to be routed around until the TTL kills it. In this particular case that's maybe not so bad, since it's expected to happen infrequently, but it doesn't seem like something that's desirable or intuitive in a general sense.

It's trivial to just disable automatic message forwarding via a global flag, though before going that way, I want to check whether I'm missing other context/use-cases. For the current script-porting work, are there plans/expectations to use automatic message forwarding, or to change the traditional cluster topology so it doesn't contain cycles?

- Jon

___
bro-dev mailing list
bro-dev@bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev
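To make the problem concrete, here is a toy Python model of the behavior described above. This is not Broker's actual forwarding algorithm; the topology, forwarding rule, and TTL semantics are simplified assumptions, purely to show why a broadcast-style topic on a cyclic peering keeps a message circulating until the TTL expires:

```python
from collections import deque

# Traditional layout as a cycle of three nodes: the worker peers with
# the manager and the proxy, and the proxy peers with the manager.
peers = {
    "manager": ["worker", "proxy"],
    "worker": ["manager", "proxy"],
    "proxy": ["manager", "worker"],
}

def deliveries(origin, ttl):
    """Count how many times each node handles one published message when
    every node forwards it on to its other peers until the TTL runs out."""
    handled = {n: 0 for n in peers}
    queue = deque([(origin, nbr, ttl) for nbr in peers[origin]])
    while queue:
        sender, node, remaining = queue.popleft()
        handled[node] += 1
        if remaining > 1:
            for nbr in peers[node]:
                if nbr != sender:  # don't bounce straight back to the sender
                    queue.append((node, nbr, remaining - 1))
    return handled

# With TTL 1 each peer sees the "hello" exactly once; with a larger TTL
# the same message is handled several extra times before dying out.
print(deliveries("worker", 1))
print(deliveries("worker", 5))
```

In this toy model a TTL of 1 gives each peer the event once, while a TTL of 5 makes the manager and proxy each handle the same event four times, which matches the "routed around until the TTL kills it" observation.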
Re: [Bro-Dev] design summary: porting Bro scripts to use Broker
> On Oct 10, 2017, at 2:10 PM, Johanna Amann wrote:
>
> it - after you set up a store with Broker::InitStore, how do you interact
> with Software::tracked_store?

Probably best to look at this Broker script API:

https://github.com/bro/bro/blob/topic/actor-system/scripts/base/frameworks/broker/store.bro

e.g. you have get/push/etc. operations you can do on it, like in this example:

https://github.com/bro/bro/blob/topic/actor-system/testing/btest/broker/store/ops.bro

> I am especially curious how this handles the strong typing of Bro.

All data in stores is an opaque Data type, and the store operations (e.g. from the API in the link above) implicitly convert Bro types into that type. Then, when retrieving data from a store, you can convert Data to Bro values using the new 'is' or 'as' operators or a new type-based switch statement. Example:

https://github.com/bro/bro/blob/topic/actor-system/testing/btest/broker/store/type-conversion.bro

- Jon
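As a rough analogy for the opaque-Data pattern described above, here is a plain-Python sketch (not Bro syntax; the `Data` class and its method names are invented for illustration). Values are wrapped on the way into the store, and the reader must check and convert the type on the way out, much like the new 'is'/'as' operators:

```python
class Data:
    """Opaque wrapper around a stored value, loosely mimicking Broker's Data."""
    def __init__(self, value):
        self._value = value

    def is_type(self, t):
        # analogous to Bro's new 'is' operator: a type test
        return isinstance(self._value, t)

    def as_type(self, t):
        # analogous to Bro's new 'as' operator: convert, or fail loudly
        if not isinstance(self._value, t):
            raise TypeError("stored value is not a %s" % t.__name__)
        return self._value

# A put() implicitly wraps the native value; a get() returns the opaque form.
store = {}
store["bro/framework/software/tracked"] = Data(["2.5", "nginx"])

v = store["bro/framework/software/tracked"]
if v.is_type(list):
    print(v.as_type(list))
```

The point of the analogy: the store itself never knows the value's real type, so the strong typing is enforced at the boundary where the script unwraps the value.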
Re: [Bro-Dev] design summary: porting Bro scripts to use Broker
On Mon, Oct 09, 2017 at 19:33 +, you wrote:

> It’s just a matter of where you expect most users to feel comfortable
> making customizations: in Bro scripts or in a broctl config file.

True, though I think that applies to much of Bro's configuration, like the logging for example. Either way, starting with script-only customization and then reevaluating later sounds good.

> Maybe the key point is that these customizations only make sense to
> happen once before init time?

Yeah, that's right; changing store attributes afterwards seems unlikely. From that perspective I get the redef approach. I was thinking more about consistency with other script APIs: we use redef for simple tuning (single-value options, timeouts, etc.), but less these days for more complex setups (see the logging and input frameworks). I'd be interested to hear what other people prefer.

Robin

--
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin
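For comparison, the function-based alternative under discussion (modeled on how the logging framework handles filters: a default is installed, and callers retrieve and update it) can be sketched in plain Python. The `StoreInfo` fields and function names here are illustrative assumptions, not the actual API:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class StoreInfo:
    # hypothetical fields mirroring the proposed record
    backend: str = "MEMORY"
    master_node: str = "manager"

_default = StoreInfo()

def get_default_storeinfo():
    """Retrieve the currently installed default, like getting a log filter."""
    return _default

def set_default_storeinfo(info):
    """Install an updated default before stores are initialized."""
    global _default
    _default = info

# A user tweaks one field, leaving the rest of the default intact:
set_default_storeinfo(replace(get_default_storeinfo(), backend="SQLITE"))
```

The trade-off discussed in the thread is visible here: unlike redef, nothing stops this setter from being called after initialization, so the API would need to document (or enforce) that it is only meaningful before init time.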
Re: [Bro-Dev] design summary: porting Bro scripts to use Broker
> Script-Author Example Usage
> ---------------------------
>
> # Script author that wants to utilize data stores doesn't have to be aware of
> # whether user is running a cluster or if they want to use persistent storage
> # backends.
>
> const Software::tracked_store_name = "bro/framework/software/tracked"
>
> global Software::tracked_store: opaque of Broker::Store;
>
> event bro_init() &priority=+10
>     {
>     Software::tracked_store = Broker::InitStore(Software::tracked_store_name);
>     }

I hope this was not already answered somewhere else and I just missed it - after you set up a store with Broker::InitStore, how do you interact with Software::tracked_store? I am especially curious how this handles the strong typing of Bro.

Johanna
Re: [Bro-Dev] design summary: porting Bro scripts to use Broker
> On Fri, Oct 06, 2017 at 16:53 +, you wrote:
>
> > # contains topic prefixes
> > const Cluster::manager_subscriptions: set[string]
> >
> > # contains (topic string, event name) pairs
> > const Cluster::manager_publications: set[string, string]
>
> I'm wondering if we can simplify this with Broker. With the old comm
> system we needed the event names because that's what was subscribed
> to. Now that we have topics, does the cluster framework still need to
> know about the events at all? I'm thinking we could just go with a
> topic convention and then the various scripts would publish there
> directly.
>
> In the most simple version of this, the cluster framework would just
> hard-code a subscription to "bro/cluster/". And then scripts like the
> Intel framework would just publish all their events to "bro/cluster/"
> directly through Broker.
>
> To allow for distinguishing by node type we can define separate topic
> hierarchies: "bro/cluster/{manager,worker,logger}/". Each node
> subscribes to the hierarchy corresponding to its type, and each script
> publishes according to where it wants to send events to (again
> directly using the Broker API).
>
> I think we could fit in Justin's hashing here too: We add per-node
> topics as well ("bro/cluster/node/worker-1/",
> "bro/cluster/node/worker-2/", etc.) and then the cluster framework can
> provide a function that maps a hash key to a topic that corresponds to
> a currently active node:
>
>     local topic = Cluster::topic_for_key("abcdef");  # Returns, e.g., "bro/cluster/node/worker-1"
>     Broker::publish(topic, event);
>
> And that scheme may suggest that instead of hard-coding topics on the
> sender side, the Cluster framework could generally provide a set of
> functions to retrieve the right topic:
>
>     # In SumStats framework:
>     local topic = Cluster::topic_for_manager();  # Returns "bro/cluster/manager".
>     Broker::publish(topic, event);
>
> Bottom-line: If we can find a way to steer information by setting up
> topics appropriately, we might not need much additional configuration
> at all.

Just to add my two cents here - I like this a whole lot better and agree that mostly steering events through topics seems like a neat choice.

Johanna
Re: [Bro-Dev] design summary: porting Bro scripts to use Broker
> On Oct 6, 2017, at 5:58 PM, Robin Sommer wrote:
>
> In the most simple version of this, the cluster framework would just
> hard-code a subscription to "bro/cluster/". And then scripts like the
> Intel framework would just publish all their events to "bro/cluster/"
> directly through Broker.
>
> To allow for distinguishing by node type we can define separate topic
> hierarchies: "bro/cluster/{manager,worker,logger}/". Each node
> subscribes to the hierarchy corresponding to its type, and each script
> publishes according to where it wants to send events to (again
> directly using the Broker API).

Yeah, that could be a better way to approach it, thanks. I'll try to go back and rework the design around that topic hierarchy/naming convention (that was the part I was most unsure about).

> > Broker Framework API
>
> I'm wondering if these store operations should become part of the
> Cluster framework instead. If we added them to the Broker framework,
> we'd have two separate store APIs there: one low-level version mapping
> directly to the C++ Broker API, and one higher-level one that configures
> things like the location of the DB files. That could be confusing.

Yeah, could be. I'll try moving more stuff into Cluster and see if it still makes sense to me.

> I like this. One additional idea: while I see that it's generally the
> user who wants to configure which backend to use, the script author
> may know already if it's data that should be persistent across
> execution; I'm guessing that's usually implied by the script's
> semantics. We could give InitStore() an additional boolean
> "persistent" to indicate that.

Ack.

> > # User needs to be able to choose data store backends and which cluster node
> > # the master store lives on. They can either do this manually, or BroControl
> > # will autogenerate the following in cluster-layout.bro:
>
> I don't really like the idea of autogenerating this, as it's pretty
> complex information. Usually, the Broker::default_* values should be
> fine, right? For the few cases where one wants to tweak that on a
> per-store basis, using a manual redef on the table sounds fine to me.

It's just a matter of where you expect most users to feel comfortable making customizations: in Bro scripts or in a broctl config file. I think it's fine to first assume it won't be needed often, and so only provide the customization via Bro scripts directly. If we learn later that it's a pain point for users, it's easy to add the "simpler" config-file interface via broctl to help autogenerate it.

> Hmm, actually, what would you think about using functions instead of
> tables? We could model this similar to how the logging framework does
> filters: there's a default filter installed, but you can retrieve and
> update it. Here there'd be a default StoreInfo, which one can update.

I think I went with the 'redef' interface first because it's impossible for a user to screw up order of operations there, whereas with functions you can (technically) have some mishaps in bro_init(), since the InitStore() function is also going to be running in bro_init().

Maybe the key point is that these customizations only make sense to happen once, before init time? I.e., a function would imply that calling it anytime at runtime could yield a useful result, but at the moment we're not allowing changing a store's backend or master node dynamically at runtime, just once before bro_init(). So if you think that's something to anticipate in the future, I'd agree that just using functions from the start would be better.

> > redef Broker::default_store_dir = "/home/jon/stores";
>
> Can the default_store_dir be set to some standard location through
> BroControl? Would be neat if this all just worked in the standard case
> without any custom configuration at all.

Yeah, should be possible; I think I had just given a random example in the above.

- Jon
Re: [Bro-Dev] design summary: porting Bro scripts to use Broker
> On Oct 9, 2017, at 2:08 PM, Siwek, Jon wrote:
>
> > I got send_event_hashed to work via a bit of a hack
> > (https://github.com/JustinAzoff/broker_distributed_events/blob/master/distributed_broker.bro),
> > but it needs support from inside broker or at least the bro/broker
> > integration to work properly in the case of node failure.
> >
> > My ultimate vision is a cluster with 2+ physical datanode/manager/logger
> > boxes where one box can fail and the cluster will continue to function
> > perfectly. The only thing this requires is a send_event_hashed function
> > that does consistent ring hashing and is aware of node failure.
>
> Yeah, that sounds like a good idea that I can try to work into the design.
> What is a "data node" though? We don't currently have that?

We did at one point; see topic/seth/broker-merge / topic/mfischer/broker-integration. The data node replaced the proxies and did stuff related to broker data stores. I think the idea was that a data node process would own the broker data store. My usage of data nodes was for scaling out data aggregation; I never did anything with the data stores. The data nodes were just a place to stream scan attempts to for aggregation.

> More broadly, it sounds like a user needs a way to specify which nodes they
> want to belong to a worker pool, do you still imagine that is done like you
> had in the example broctl.cfg from the earlier thread? Do you need to be
> able to specify more than one type of pool?

People have asked for this now as a solution for fixing an overloaded manager process, but if we get load balancing/failover working as well as QoS/priorities, there may not be a point in statically configuring things like that. Like, someone might want to do:

    # a node for tracking spam
    [spam]
    type = data/spam

    # a node for sumstats
    [sumstats]
    type = data/sumstats

    # a node for known hosts/certs/etc tracking
    [known]
    type = data/known

But I think just having the ability to do:

    [data]
    type = data
    lb_procs = 6

would work better for everyone. Sending one type of data to one type of data node is still going to eventually overload a single process.

> > For things that don't necessarily need consistent partitioning - like
> > maybe logs if you were using Kafka, a way to designate that a topic should
> > be distributed round-robin between subscribers would be useful too.
>
> Yeah, that seems like it would require pretty much the same set of
> functionality to get working, and then the user can just specify a different
> function to use for distributing events (e.g. hash vs. round-robin).

Great! Right now broctl configures this in a 'round-robin' type way by assigning every other worker to a different logger node. With support for this in broker it could just connect every worker to every logger process and broker could handle the load balancing/failover.

— Justin Azoff
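The round-robin distribution discussed above can be sketched as a publisher-side dispatcher in plain Python. `RoundRobinTopic` and its methods are invented names for illustration, not a real Broker API:

```python
from itertools import cycle

class RoundRobinTopic:
    """Deliver each message to exactly one subscriber, rotating in turn."""
    def __init__(self, subscribers):
        self.subscribers = list(subscribers)
        self._next = cycle(self.subscribers)

    def route(self, message):
        # Pick the next subscriber in rotation; return (target, message).
        return (next(self._next), message)

# Every worker could peer with every logger, with the dispatcher
# spreading log entries across them instead of a static assignment.
loggers = RoundRobinTopic(["logger-1", "logger-2"])
routed = [loggers.route("log entry %d" % i)[0] for i in range(4)]
# alternates between logger-1 and logger-2
```

Handling subscriber failure would additionally require rebuilding the rotation when a peer disappears, which is the part that needs support from inside Broker rather than script-land.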
Re: [Bro-Dev] design summary: porting Bro scripts to use Broker
> I got send_event_hashed to work via a bit of a hack
> (https://github.com/JustinAzoff/broker_distributed_events/blob/master/distributed_broker.bro),
> but it needs support from inside broker or at least the bro/broker
> integration to work properly in the case of node failure.
>
> My ultimate vision is a cluster with 2+ physical datanode/manager/logger
> boxes where one box can fail and the cluster will continue to function
> perfectly. The only thing this requires is a send_event_hashed function
> that does consistent ring hashing and is aware of node failure.

Yeah, that sounds like a good idea that I can try to work into the design. What is a "data node" though? We don't currently have that?

More broadly, it sounds like a user needs a way to specify which nodes they want to belong to a worker pool; do you still imagine that is done like you had in the example broctl.cfg from the earlier thread? Do you need to be able to specify more than one type of pool?

> For things that don't necessarily need consistent partitioning - like
> maybe logs if you were using Kafka, a way to designate that a topic should
> be distributed round-robin between subscribers would be useful too.

Yeah, that seems like it would require pretty much the same set of functionality to get working, and then the user can just specify a different function to use for distributing events (e.g. hash vs. round-robin).

- Jon
Re: [Bro-Dev] design summary: porting Bro scripts to use Broker
Nice!

On Fri, Oct 06, 2017 at 16:53 +, you wrote:

> # contains topic prefixes
> const Cluster::manager_subscriptions: set[string]
>
> # contains (topic string, event name) pairs
> const Cluster::manager_publications: set[string, string]

I'm wondering if we can simplify this with Broker. With the old comm system we needed the event names because that's what was subscribed to. Now that we have topics, does the cluster framework still need to know about the events at all? I'm thinking we could just go with a topic convention and then the various scripts would publish there directly.

In the most simple version of this, the cluster framework would just hard-code a subscription to "bro/cluster/". And then scripts like the Intel framework would just publish all their events to "bro/cluster/" directly through Broker.

To allow for distinguishing by node type we can define separate topic hierarchies: "bro/cluster/{manager,worker,logger}/". Each node subscribes to the hierarchy corresponding to its type, and each script publishes according to where it wants to send events to (again directly using the Broker API).

I think we could fit in Justin's hashing here too: we add per-node topics as well ("bro/cluster/node/worker-1/", "bro/cluster/node/worker-2/", etc.) and then the cluster framework can provide a function that maps a hash key to a topic that corresponds to a currently active node:

    local topic = Cluster::topic_for_key("abcdef");  # Returns, e.g., "bro/cluster/node/worker-1"
    Broker::publish(topic, event);

And that scheme may suggest that instead of hard-coding topics on the sender side, the Cluster framework could generally provide a set of functions to retrieve the right topic:

    # In SumStats framework:
    local topic = Cluster::topic_for_manager();  # Returns "bro/cluster/manager".
    Broker::publish(topic, event);

Bottom-line: if we can find a way to steer information by setting up topics appropriately, we might not need much additional configuration at all.
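The Cluster::topic_for_key() idea above can be sketched in plain Python (the node list, hash choice, and naming are illustrative assumptions; a production version would want consistent hashing that tolerates node failure, as discussed elsewhere in the thread):

```python
import hashlib

active_workers = ["worker-1", "worker-2"]  # currently alive nodes

def topic_for_key(key):
    # A stable hash of the key, mapped onto the list of active workers;
    # the result is that node's per-node topic.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    node = active_workers[h % len(active_workers)]
    return "bro/cluster/node/" + node

# The same key always yields the same per-node topic while membership
# is stable, so related events all land on one worker.
print(topic_for_key("abcdef"))
```

Scripts would then publish to the returned topic without ever naming a node explicitly, which is what lets the cluster framework swap nodes in and out behind the function.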
> The old “communication” framework scripts can just go away as most
> of its functions have direct corollaries in the new “broker”
> framework.

Yep, agree.

> The one thing that is missing is the “Communication::nodes” table

Agree that it doesn't look useful from an API perspective. The Broker framework may eventually need an equivalent table internally if we want to offer robustness mechanisms like Justin's hashing.

> Broker Framework API

I'm wondering if these store operations should become part of the Cluster framework instead. If we added them to the Broker framework, we'd have two separate store APIs there: one low-level version mapping directly to the C++ Broker API, and one higher-level one that configures things like the location of the DB files. That could be confusing.

> Software::tracked_store = Broker::InitStore(Software::tracked_store_name);

I like this. One additional idea: while I see that it's generally the user who wants to configure which backend to use, the script author may know already if it's data that should be persistent across execution; I'm guessing that's usually implied by the script's semantics. We could give InitStore() an additional boolean "persistent" to indicate that. If that's true, it'd use the "default_backend" (or now maybe "default_db_backend"); if false, it'd always use the MEMORY backend.

> # User needs to be able to choose data store backends and which cluster node
> # the master store lives on. They can either do this manually, or BroControl
> # will autogenerate the following in cluster-layout.bro:

I don't really like the idea of autogenerating this, as it's pretty complex information. Usually, the Broker::default_* values should be fine, right? For the few cases where one wants to tweak that on a per-store basis, using a manual redef on the table sounds fine to me.

Hmm, actually, what would you think about using functions instead of tables? We could model this similar to how the logging framework does filters: there's a default filter installed, but you can retrieve and update it. Here there'd be a default StoreInfo, which one can update.

> redef Broker::default_master_node = "manager";
> redef Broker::default_backend = Broker::MEMORY;
> redef Broker::default_store_dir = "/home/jon/stores";

Can the default_store_dir be set to some standard location through BroControl? Would be neat if this all just worked in the standard case without any custom configuration at all.

> BroControl Example Usage

I'll skip commenting on this and wait for your response to the above first, as I'm wondering if we need this BroControl functionality at all.

Robin

--
Robin Sommer * ICSI/LBNL * ro...@icir.org * www.icir.org/robin
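The "persistent" flag idea for InitStore() could work roughly as follows. This is a plain-Python sketch with invented names (`init_store`, `default_db_backend`); the real InitStore() signature is still being designed:

```python
# User-tunable default for persistent data, mirroring the proposed
# "default_db_backend"; transient stores always use MEMORY.
default_db_backend = "SQLITE"

def init_store(name, persistent=False):
    # The script author declares whether the data must survive restarts;
    # the concrete backend then follows from that plus the user's defaults.
    backend = default_db_backend if persistent else "MEMORY"
    return {"name": name, "backend": backend}
```

So a script tracking short-lived state would call it with the default, while something like a known-hosts store would opt in with persistent=True and pick up whatever DB backend the user configured.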
Re: [Bro-Dev] design summary: porting Bro scripts to use Broker
> On Oct 6, 2017, at 12:53 PM, Siwek, Jon wrote:
>
> I want to check if there's any feedback on the approach I'm planning to take
> when porting over Bro's scripts to use Broker. There's two major areas to
> consider: (1) how users specify network topology, e.g. either for traditional
> cluster configuration or manually connecting Bro instances, and (2) replacing
> &synchronized with Broker's distributed data storage features.
> ...
> Then subscriptions and auto-publications still get automatically set up by
> the cluster framework in bro_init().
>
> Other Manual/Custom Topologies
> ------------------------------
>
> I don't see anything to do here as the Broker API already has enough to set
> up peerings and subscriptions in arbitrary ways. The old "communication"
> framework scripts can just go away as most of its functions have direct
> corollaries in the new "broker" framework.
>
> The one thing that is missing is the "Communication::nodes" table, which acts
> as both a state-tracking structure and an API that users may use to have the
> comm. framework automatically set up connections between the nodes in the
> table. I find this redundant — there's two APIs to accomplish the same
> thing, with the table being an additional layer of indirection to the actual
> connect/listen functions a user can just as easily use themselves. I also
> think it's not useful for state-tracking, as a user operating at the level of
> this use-case can easily track nodes themselves, or has some other notion
> of the state structures they need to track that is more intuitive for the
> particular problem they're solving. Unless there are arguments against it, or
> I find it's actually needed, I don't plan to port this to Broker.

I had some feedback related to this sort of thing earlier in the year:

http://mailman.icsi.berkeley.edu/pipermail/bro-dev/2017-February/012386.html
http://mailman.icsi.berkeley.edu/pipermail/bro-dev/2017-March/012411.html

I got send_event_hashed to work via a bit of a hack (https://github.com/JustinAzoff/broker_distributed_events/blob/master/distributed_broker.bro), but it needs support from inside broker, or at least the bro/broker integration, to work properly in the case of node failure.

My ultimate vision is a cluster with 2+ physical datanode/manager/logger boxes where one box can fail and the cluster will continue to function perfectly. The only thing this requires is a send_event_hashed function that does consistent ring hashing and is aware of node failure.

For things that don't necessarily need consistent partitioning - like maybe logs, if you were using Kafka - a way to designate that a topic should be distributed round-robin between subscribers would be useful too.

— Justin Azoff
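The key property of the failure-aware consistent ring hashing that send_event_hashed needs can be shown in a short plain-Python sketch (illustrative only; Justin's actual script and any eventual Broker support may differ): when a node dies, only the keys that hashed to that node get reassigned, while keys on surviving nodes stay put.

```python
import bisect
import hashlib

class Ring:
    """Consistent-hash ring: each node owns many points on the ring, and a
    key belongs to the first node point at or after the key's hash,
    wrapping around at the end."""

    def __init__(self, nodes, vnodes=64):
        self._points = []
        for node in nodes:
            for i in range(vnodes):
                self._points.append((self._hash("%s:%d" % (node, i)), node))
        self._points.sort()

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        i = bisect.bisect(self._points, (self._hash(key), ""))
        return self._points[i % len(self._points)][1]

keys = ["key-%d" % i for i in range(1000)]
full = Ring(["data-1", "data-2", "data-3"])
degraded = Ring(["data-1", "data-3"])  # data-2 has failed

# Keys owned by surviving nodes keep their assignment; only keys that
# mapped to the failed node are redistributed among the survivors.
moved_from_survivors = sum(
    1 for k in keys
    if full.node_for(k) != "data-2"
    and full.node_for(k) != degraded.node_for(k))
```

A naive `hash(key) % num_nodes` scheme would instead reshuffle roughly two thirds of all keys on a single node failure, which is why the ring construction matters for the "one box can fail" vision.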
[Bro-Dev] design summary: porting Bro scripts to use Broker
I want to check if there's any feedback on the approach I'm planning to take when porting over Bro's scripts to use Broker. There's two major areas to consider: (1) how users specify network topology, e.g. either for traditional cluster configuration or manually connecting Bro instances, and (2) replacing &synchronized with Broker's distributed data storage features.

Broker-Based Topology
=====================

It's again useful to decompose topology specification into two main use-cases:

Creating Clusters, e.g. w/ BroControl
-------------------------------------

This use-case should look familiar once ported to use Broker: the existing "cluster" framework will be used for specifying the topology of the cluster and for automatically setting up the connections between nodes. The one thing that will differ is the event subscription mechanism, which needs to change since Broker itself handles that differently, but I think the general idea can remain similar.

The current mechanism for handling event subscription/publication:

    const Cluster::manager2worker_events = /Drop::.*/
    # similar patterns follow for all node combinations...

And a script author that is making their script usable on clusters writes:

    redef Cluster::manager2worker_events += /^Intel::(cluster_new_item|purge_item)$/;

The new mechanism:

    # contains topic prefixes
    const Cluster::manager_subscriptions: set[string]

    # contains (topic string, event name) pairs
    const Cluster::manager_publications: set[string, string]

    # similar sets follow for all node types…

And a script author writes:

    # topic naming convention relates to file hierarchy/organization of scripts
    redef Cluster::manager_subscriptions += {
        "bro/event/framework/control",
        "bro/event/framework/intel",
    };

    # not sure how to get around referencing events via strings: can't use 'any'
    # to stick event values directly into the set, maybe that's ok since we can
    # at least detect lookup failures at bro_init time and emit errors/abort.
    redef Cluster::manager_publications += {
        ["bro/event/framework/control/configuration_update_request",
         "Control::configuration_update_request"],
        ["bro/event/framework/intel/cluster_new_item",
         "Intel::cluster_new_item"],
    };

Then subscriptions and auto-publications still get automatically set up by the cluster framework in bro_init().

Other Manual/Custom Topologies
------------------------------

I don't see anything to do here as the Broker API already has enough to set up peerings and subscriptions in arbitrary ways. The old "communication" framework scripts can just go away as most of its functions have direct corollaries in the new "broker" framework.

The one thing that is missing is the "Communication::nodes" table, which acts as both a state-tracking structure and an API that users may use to have the comm. framework automatically set up connections between the nodes in the table. I find this redundant — there's two APIs to accomplish the same thing, with the table being an additional layer of indirection to the actual connect/listen functions a user can just as easily use themselves. I also think it's not useful for state-tracking, as a user operating at the level of this use-case can easily track nodes themselves, or has some other notion of the state structures they need to track that is more intuitive for the particular problem they're solving. Unless there are arguments against it, or I find it's actually needed, I don't plan to port this to Broker.

Broker-Based Data Distribution
==============================

Replacing &synchronized requires completely new APIs that script authors can easily use to work for both cluster and non-cluster use-cases, independently of a user's choice of persistent storage backend.

Broker Framework API
--------------------

    const Broker::default_master_node = "manager"
    const Broker::default_backend = MEMORY

    # Setting a default dir will, for persistent backends that have not
    # been given an explicit file path, automatically create a path within this
    # dir that is based on the name of the data store.
    const Broker::default_store_dir = ""

    type Broker::StoreInfo: record {
        name: string;
        store: opaque of Broker::Store;
        master_node: string &default=Broker::default_master_node;
        master: bool &default=F;
        backend: Broker::BackendType &default=default_backend;
        options: Broker::BackendOptions &default=Broker::BackendOptions();
    };

    # Primarily used by users to set up master store location and backend
    # configuration, but also possible to lookup an existing/open store by name.
    global Broker::stores: table[string] of StoreInfo &default=StoreInfo()

    # Set up data stores to properly function regardless of whether user is
    # operating a cluster. This also automatically sets up the store to
    # be a clone or a master as is