On Fri, Nov 5, 2021 at 9:57 AM Jürgen Kuri <[email protected]> wrote:
> On 05.11.21 at 13:28, Nick Couchman wrote:
> > On Fri, Nov 5, 2021 at 7:50 AM Jürgen Kuri <[email protected]> wrote:
> >>
> >> Hello,
> >>
> >> it would be nice for admin users to have a way in the web frontend to quickly identify current connections which consume a lot of network bandwidth (a kind of ranking by network packet count or so). This is useful and more convenient if you have several simultaneous connections and several Guacamole instances balanced and concentrated with a BGP network router setup. For admins who are not so familiar with tools like netstat, iptraf and friends it is extremely helpful.
> >
> > If you'd like to request a feature, Jira is the place to do it:
> > https://issues.apache.org/jira/browse/GUACAMOLE
> >
> >> Because several Guacamole instances are concentrated via BGP network routers (from outside only one Guacamole access URL is visible), the network bandwidth utilisation values must be stored and updated somewhere CENTRALLY in the Guacamole SQL database. These single and concentrated Guacamole instances (frontend and backend) all share the same database here in our setup. So, "logically", or from the application's "high level" view, it is just one instance with one access URL from the internet. This is, for example, why under "Active Sessions" in the web frontend we do not see all active sessions, just the ones on the internet frontend the admin's web session is routed to, but not the ones from the neighboring internet frontends.
> >
> > This would likely need to be thought out a little bit more thoroughly. I see a couple of issues with this:
> >
> > * Depending on what type of information and how much you plan to store in the database, this could cause a rapid growth in the size of the database.
> > It might be possible to add a couple of fields - total packet count, and total byte count, or total in packets, total out packets, total in bytes, and total out bytes - that could be tracked and updated periodically for active and historical connection information.
>
> Yes, if we want to keep a history of network metrics from past sessions, the database will grow. Of course, I had that in mind when I wrote this feature request; this is what admins want to have and what makes their hearts beat faster. But as a first step, covering the need to identify the "hogs", additional database fields with the network metrics, updated, let's say, every 30 seconds (configurable update interval?), would be sufficient for the need here. And, of course, when the session for a specific connection ends, or at the latest when a new session is initiated for the same connection, the metrics in the database are reset! So these single metric fields just reflect a momentary situation, but this is enough to make a ranking for a quick identification of the hogs. And, in order to reduce database and network strain, especially if we have multiple simultaneous proxy sessions, guacd and the Java application should send the network metrics in transaction aggregates for all current connections. This is good for the network (fewer round trips) and for the database, which performs the updates of the aggregated metrics with a few I/O accesses. For that purpose the network metric information does not need to be very up to date.
>
> A lightweight approach for a kind of history of network consumption could be an extra database table with one row per connection and the network metrics. This table acts like a scoreboard. At the end of a specific connection session the values are updated in that scoreboard table. The web frontend presents that session scoreboard in descending order with the network utilisation hogs at the top.
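
For illustration, the scoreboard idea quoted above could be sketched roughly as follows. This is only a sketch under assumptions: the table `connection_traffic_scoreboard`, its columns, and the helpers `flush_metrics` and `top_hogs` are hypothetical and not part of the Guacamole schema or code, and SQLite stands in for the real database purely to keep the example self-contained.

```python
import sqlite3

# Hypothetical table, NOT part of the real Guacamole schema:
# one row per connection holding its latest traffic totals.
SCHEMA = """
CREATE TABLE connection_traffic_scoreboard (
    connection_id INTEGER PRIMARY KEY,
    bytes_in      INTEGER NOT NULL DEFAULT 0,
    bytes_out     INTEGER NOT NULL DEFAULT 0
)
"""


def flush_metrics(db, batch):
    """Store an aggregated batch {connection_id: (bytes_in, bytes_out)}.

    All rows are written in one transaction, so many connections
    cost a single commit instead of one round trip each.
    """
    rows = [(cid, b_in, b_out) for cid, (b_in, b_out) in batch.items()]
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT OR REPLACE INTO connection_traffic_scoreboard "
            "(connection_id, bytes_in, bytes_out) VALUES (?, ?, ?)",
            rows,
        )


def top_hogs(db, limit):
    """Return (connection_id, total_bytes) pairs, heaviest first."""
    cur = db.execute(
        "SELECT connection_id, bytes_in + bytes_out AS total "
        "FROM connection_traffic_scoreboard "
        "ORDER BY total DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    flush_metrics(db, {1: (100, 50), 2: (5000, 7000), 3: (10, 10)})
    print(top_hogs(db, 2))
```

Because every front-end would write to the same table, a ranking query like `top_hogs` would see connections from all nodes, which is the property the shared history tables already have. It says nothing about how the byte counts are collected in the first place.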
> >
> > But, if you're wanting to store a bunch of historic information about when connections hogged the bandwidth, you're talking about a lot of additional data (RRD-style).
>
> Sorry, I don't fully agree, at least not from the storage space footprint perspective, if that is what you are alluding to. You don't want to keep this data for years. I think for troubleshooting two to four weeks is probably more than enough. That might be different if you want to use this data e.g. for accounting or so.

Maybe I wasn't clear here, but 2-4 weeks of connections, if you're keeping more than just the total byte count - if you're keeping traffic information on a 60 second or 5 minute basis for all connections over that span of time - will add up. I'm not saying it's not doable, just that, depending on what you mean by historical data, this could be a lot of data.

> > * Depending on how often you'd want it updated, this could result in quite a heavy load just tracking this information. If you had 100 active connections, and you wanted the data updated every second, or even every 10 seconds, this would add quite a bit of load to what is otherwise a relatively light-weight and low-utilization database.
>
> See my comment above.
>
> > * As you mentioned, there is currently no synchronization of active connections between multiple web front-ends (Tomcat instances), so tracking this information in a central place would likely require some far-reaching changes to that, as well, so that active connections are synchronized across those front-ends.
>
> Yes, I expected this. But I'm not sure it is that complex. Remember, I described that "logically" it is ONE Guacamole instance. We have several frontend/backend pairs bundled and balanced with BGP network routers (multiplexing incoming sessions from the internet to the Tomcat frontend servers), but ALL frontends share the SAME database.
> This works like a charm, at least for the tables guacamole_connection_history and guacamole_user_history. Under "History" in the web frontend the admin sees all past sessions from all frontends. Why shouldn't it work this way for our network metrics?

It's definitely doable, but just adding fields to the current table that store the total packet and byte count for a connection will only give you historic information; it won't give you information about active connections, as those do not get written out to the DB until the connection is closed. This could be changed; however, it is a bit more complex than just writing the connections to the DB table sooner. There are other factors to consider - like how such changes would (or should) impact limits on concurrent connections. For example, right now, if you have 3 Tomcat instances running Guacamole Client, all pointed at the same database, and you configure a connection that allows a total of 5 concurrent connections, and 1 connection per user, it's actually possible that you could have a total of up to 15 connections to it, and up to 3 per user, as the active connections are not synchronized across the Tomcat nodes.

Beyond that, when you're talking about synchronizing these active connections across multiple nodes, you also have to factor in and handle race conditions - that is, if you're trying to enforce concurrency limits across all three nodes, you need to make sure you handle cases where the connections on two different front-end systems happen so quickly that the time that the field is written to the DB (if that's how it is handled) is indistinguishable between the two. Do you allow both connections? Or how do you choose who wins the race?

Finally, the other piece of this that has to be factored in is Connection Sharing.
If Active Connections are synchronized across multiple front-end nodes (via a DB, etc.), then you have the potential that someone will try to share a connection with another user, and that other user will be forwarded to another front-end node by BGP, the load balancer, etc. When the user attempts to connect to the shared session, you need to make sure that the "clustered" Guacamole instances have some way of handling that - by moving the user to the correct node, or making the connection available across all nodes, etc. And this gets even more complex if you have more than one guacd back-end through which connections are funneled, because then you also have to make sure that Guacamole Client can direct that connection join to the correct guacd instance so that the connection can actually be joined.

I'm not saying all of these things are applicable in your environment or use-case - maybe you don't use connection sharing, or enforce concurrency limits, etc. - but if it's something we're going to add to the overall Guacamole project, these are all things that need to be factored in, else it will break for someone who tries to use it and does require those bits of functionality.

> > I'm not saying this shouldn't be done - I actually think it should be done, eventually, just saying that this makes what you're requesting, for your environment, quite a bit more complex.
>
> Yes, agreed.
>
> > * What you're requesting would likely only take care of one of the two possible legs of bandwidth utilization - you'd be able to see traffic between the clients (web browsers) and Tomcat (and ultimately guacd), but there's also traffic between guacd and the remote servers that is worth consideration, and which this would not be able to capture.
>
> "but there's also traffic between guacd and the remote servers", that puzzles me. What do you mean by "remote servers", the computers out on the internet running the remote desktop in their web browsers?
> I don't understand.

Guacamole works roughly like this:

Browser <---> [Optional Reverse Proxy] <---> Tomcat <---> guacd <---> Remote Desktop Servers (RDP, SSH, VNC, etc)

And, what that really breaks down to is:

Browser <-- Guacamole Protocol tunneled via Tomcat --> guacd <-- RDP, SSH, VNC, Telnet --> Remote Server

My point was that what you're describing above is only going to handle the "Browser <---> Tomcat" or possibly "Browser <---> guacd via Tomcat" traffic, depending on how and where you measure the traffic. This doesn't measure the "guacd <--> Remote Desktop" traffic at all. Maybe that doesn't matter to you - maybe you're only concerned about the traffic between the end clients (browsers) and Guacamole Client instances; my point is just that this isn't a complete picture of network traffic utilization, and it's worth at least considering that. It's generally the case that the "guacd" instance(s) are located close enough to the Remote Desktop Servers that the bandwidth there isn't really a factor; however, if you're operating in a cloud environment this could matter a lot, because you could end up paying for some of that, depending on where the traffic is going (cross-region, for example).

Anyway, I don't think it really impacts your request here too much, I was just pointing it out.

-Nick
