On Fri, Nov 5, 2021 at 9:57 AM Jürgen Kuri <[email protected]> wrote:

> On 05.11.21 at 13:28, Nick Couchman wrote:
> >> On Fri, Nov 5, 2021 at 7:50 AM Jürgen Kuri <[email protected]
> <mailto:[email protected]>> wrote:
> >>
> >>     Hello,
> >>
> >>     it would be nice for admin users to have a possibility in the web
> frontend to quickly identify current connections which consume a lot of
> network bandwidth (kind of ranking of network packet count or so). This is
> useful and more convenient if you have several simultaneous connections and
> several Guacamole instances balanced and concentrated with a BGP network
> router setup. For admins who are not so familiar with tools like netstat,
> iptraf and friends it is extremely helpful.
> >
> >
> > If you'd like to request a feature, Jira is the place to do it:
> > https://issues.apache.org/jira/browse/GUACAMOLE
> >
> >
> >>     Because several Guacamole instances are concentrated via BGP network
> routers (from outside only one Guacamole access URL is visible), the
> network bandwidth utilisation values must be stored and updated somewhere
> CENTRALLY in the Guacamole SQL database. These individual, concentrated
> Guacamole instances (frontend and backend) all share the same database here
> in our setup. So, "logically", or from a "high-level" application view, it is
> just one instance with one access URL from the internet. This is, for
> example, why under "Active Sessions" in the web frontend we do not see all
> active sessions, only the ones on the internet frontend that the admin's
> web session is routed to, and not the ones on the neighboring internet
> frontends.
> >
> >
> > This would likely need to be thought out a little bit more thoroughly. I
> see a couple of issues with this:
> > * Depending on what type of information and how much you plan to store
> in the database, this could cause a rapid growth in the size of the
> database. It might be possible to add a couple of fields - total packet
> count, and total byte count, or total in packets, total out packets, total
> in bytes, and total out bytes - that could be tracked and updated
> periodically for active and historical connection information.
> Yes, if we want to keep a history of network metrics from past sessions, the
> database will grow. Of course, I had that in mind when I wrote this
> feature request; this is what admins want and what makes their hearts beat
> faster. But as a first step, covering the need, the identification of
> the "hogs", additional database fields with the network metrics which are
> updated, let's say, every 30 seconds (configurable update interval?) would
> be sufficient for the need here. And, of course, when the session
> for a specific connection ends, or at the latest when a new session is
> initiated for the same connection, the metrics in the database are reset!
> So these metric fields just reflect a momentary situation, but this
> is enough to make a ranking for a quick identification of the hogs. And, in
> order to reduce database and network strain, especially if we have
> multiple simultaneous proxy sessions, guacd and the Java application should
> send the network metrics in transaction aggregates for all current
> connections. This is good for the network (fewer round trips) and for the
> database, which performs the updates of the aggregated metrics with few
> I/O accesses. For that purpose the network metric information does not
> need to be very up to date.
>
> A lightweight approach to keeping a kind of history of network
> consumption could be an extra database table with one row per connection
> and the network metrics. This table acts like a scoreboard. At the end of a
> specific connection session the values are updated in that scoreboard
> table. The web frontend presents that session scoreboard in descending
> order with the network utilisation hogs at the top.
>
>
> > But, if you're wanting to store a bunch of historic information about
> when connections hogged the bandwidth, you're talking about a lot of
> additional data (RRD-style).
> Sorry, I don't fully agree, at least not from the storage footprint
> perspective, if that is what you are alluding to. You don't want to keep
> this data for years. I think for troubleshooting two to four weeks is
> probably more than enough. That might be different if you want to use this
> data e.g. for accounting or so.
>
>
Maybe I wasn't clear here, but even with only 2-4 weeks of connections, if
you're keeping more than just a total byte count (for example, traffic
information on a 60-second or 5-minute basis for all connections over that
span of time), it will add up. I'm not saying it's not doable, just that,
depending on what you mean by historical data, this could be a lot of data.
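
As a rough back-of-the-envelope sketch of how this adds up: the figures
below (100 connections that are active the whole time, one database row per
connection per sample, 4 weeks of retention) are assumptions chosen purely
for illustration, not measurements from any real deployment.

```python
# Rough row-count estimate for per-interval traffic samples.
# All inputs are illustrative assumptions, not measurements.

def sample_rows(connections: int, interval_seconds: int, days: int) -> int:
    """Rows written if every connection is active for the whole period."""
    samples_per_day = 24 * 60 * 60 // interval_seconds
    return connections * samples_per_day * days

for interval in (60, 300):
    rows = sample_rows(connections=100, interval_seconds=interval, days=28)
    print(f"{interval:>3}s interval: {rows:,} rows over 4 weeks")
```

Under those assumptions, 60-second samples come to about 4 million rows and
5-minute samples to about 800,000, compared with one row per session if only
totals are stored.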


>
> > * Depending on how often you'd want it updated, this could result in
> quite a heavy load just tracking this information. If you had 100 active
> connections, and you wanted the data updated every second, or even every 10
> seconds, this would add quite a bit of load to what is otherwise a
> relatively light-weight and low-utilization database.
> See my comment above.
>
>
> > * As you mentioned, there is currently no synchronization of active
> connections between multiple web front-ends (Tomcat instances), so tracking
> this information in a central place would likely require some far-reaching
> changes to that, as well, so that active connections are synchronized
> across those front-ends.
> Yes, I expected this. But I'm not sure it is that complex. Remember, as I
> described, "logically" it is ONE Guacamole instance. We have several
> frontend/backend pairs bundled and balanced with BGP network routers
> (multiplexing sessions incoming from the internet to the Tomcat frontend
> servers) but ALL frontends share the SAME database. This works like a charm,
> at least for the tables guacamole_connection_history and
> guacamole_user_history. Under "History" in the web frontend, the admin sees
> all past sessions from all frontends. Why shouldn't it work the same way for
> our network metrics?
>
>
It's definitely doable. However, just adding a field to the current table
that stores the total packet and byte count for a connection will give you
historic information, but won't give you information about active
connections, as those do not get written out to the DB until the connection
is closed. This could be changed; however, it is a bit more complex than
just writing the connections to the DB table sooner. There are other
factors to consider - like how such changes would (or should) impact limits
to concurrent connections. For example, right now, if you have 3 Tomcat
instances running Guacamole Client, all pointed at the same database, and
you configure a connection that allows a total of 5 concurrent connections,
and 1 connection per user, it's actually possible that you could have a
total of up to 15 connections to it, and up to 3 per user, as the active
connections are not synchronized across the Tomcat nodes.
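
The arithmetic in that example can be reproduced with a toy model in which
each node checks limits against only its own in-memory list. The class and
parameter names here are hypothetical and are not Guacamole's actual code;
this only illustrates why unsynchronized per-node checks multiply the
effective limits.

```python
# Toy model: each node enforces limits using only its own in-memory view.
# Class and parameter names are hypothetical, not Guacamole's actual code.

class Node:
    def __init__(self, max_total: int, max_per_user: int):
        self.max_total = max_total
        self.max_per_user = max_per_user
        self.active = []  # users with an active connection on THIS node only

    def try_connect(self, user: str) -> bool:
        """Accept the connection if this node's local limits allow it."""
        if len(self.active) >= self.max_total:
            return False
        if self.active.count(user) >= self.max_per_user:
            return False
        self.active.append(user)
        return True

# Three front-end nodes, each configured with the same limits.
nodes = [Node(max_total=5, max_per_user=1) for _ in range(3)]
users = ["u1", "u2", "u3", "u4", "u5"]

# Every user tries to connect once through every node.
accepted = sum(node.try_connect(u) for node in nodes for u in users)
per_user = sum(node.active.count("u1") for node in nodes)
print(accepted, per_user)  # 15 connections in total, 3 of them for u1
```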

Beyond that, when you're talking about synchronizing these active
connections across multiple nodes, you also have to factor in and handle
race conditions. That is, if you're trying to enforce concurrency
limits across all three nodes, you need to make sure you handle cases where
the connections on two different front-end systems happen so quickly that
the times at which the field is written to the DB (if that's how it is
handled) are indistinguishable between the two. Do you allow both
connections? Or how do you choose who wins the race?
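
One possible way to let the database decide the winner is to make the limit
check and the insert a single statement, so that whichever request the
database processes first gets the slot. The sketch below uses SQLite only
because it runs self-contained; the table and function names are
hypothetical and are not the real Guacamole schema. SQLite serializes
writers, so the statement is atomic there. On MySQL or PostgreSQL, two
concurrent statements of this shape may both see a count below the limit,
so explicit locking or a stricter isolation level would likely be needed.

```python
import sqlite3

# Hypothetical table for illustration; NOT the real Guacamole schema.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE active_connection "
    "(connection_id INTEGER, username TEXT, node TEXT)"
)

def try_claim_slot(db, connection_id, username, node, max_total):
    """Insert a row only if the limit is not yet reached; True on success."""
    with db:  # one transaction: commit on success, roll back on error
        cur = db.execute(
            """
            INSERT INTO active_connection (connection_id, username, node)
            SELECT ?, ?, ?
            WHERE (SELECT COUNT(*) FROM active_connection
                   WHERE connection_id = ?) < ?
            """,
            (connection_id, username, node, connection_id, max_total),
        )
    # rowcount is 1 if the row was inserted, 0 if the limit blocked it
    return cur.rowcount == 1

# Seven attempts spread over three nodes against a limit of five.
results = [
    try_claim_slot(db, 1, f"user{i}", f"node{i % 3}", max_total=5)
    for i in range(7)
]
print(results)
```

A row would also have to be deleted when a connection ends or a node dies,
which this sketch does not cover.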

Finally, the other piece of this that has to be factored in is Connection
Sharing. If Active Connections are synchronized across multiple front-end
nodes (via a DB, etc.), then you have the potential that someone will try
to share a connection with another user, and that other user will be
forwarded to another front-end node by the BGP, load balancer, etc. When
the user attempts to connect to the shared session, you need to make sure
that the "clustered" Guacamole instances have some way of handling that -
by moving the user to the correct node, or making the connection available
across all nodes, etc. And, this gets even more complex if you have more
than one guacd back-end through which connections are funneled, because
then you have to also make sure that Guacamole Client can direct that
connection join to the correct guacd instance so that the connection can
actually be joined.

I'm not saying all of these things are applicable in your environment or
use-case - maybe you don't use connection sharing, or enforce concurrency
limits, etc. - but if it's something we're going to add to the overall
Guacamole project, these are all things that need to be factored in, else
it will break for someone who tries to use it and does require those bits
of functionality.


> > I'm not saying this shouldn't be done - I actually think it should be
> done, eventually, just saying that this makes what you're requesting, for
> your environment, quite a bit more complex.
> Yes, agree.
>
> > * What you're requesting would likely only take care of one of the two
> possible legs of bandwidth utilization - you'd be able to see traffic
> between the clients (web browsers) and Tomcat (and ultimately guacd), but
> there's also traffic between guacd and the remote servers that is worth
> consideration, and which this would not be able to capture.
> "but there's also traffic between guacd and the remote servers", that
> puzzles me. What do you mean by "remote servers": the computers out
> on the internet running the remote desktop in their web browsers? I don't
> understand.
>
>
Guacamole works roughly like this:

Browser <---> [Optional Reverse Proxy] <---> Tomcat <---> guacd <--->
Remote Desktop Servers (RDP, SSH, VNC, etc)

And, what that really breaks down to is:

Browser <-- Guacamole Protocol tunneled via Tomcat --> guacd <-- RDP, SSH,
VNC, Telnet --> Remote Server

My point was that what you're describing above is only going to handle the
"Browser <---> Tomcat" or possibly "Browser <---> guacd via Tomcat"
traffic, depending on how and where you measure the traffic. This doesn't
measure the "guacd <--> Remote Desktop" traffic at all. Maybe that doesn't
matter to you - maybe you're only concerned about the traffic between the
end clients (browsers) and Guacamole Client instances. My point is just
that this isn't a complete picture of network traffic utilization, and it's
worth at least considering that. It's generally the case that the "guacd"
instance(s) are located close enough to the Remote Desktop Servers that the
bandwidth there isn't really a factor; however, if you're operating in a
cloud environment this could matter a lot, because you could end up paying
for some of that, depending on where the traffic is going (cross-region,
for example). Anyway, I don't think it really impacts your request here too
much, I was just pointing it out.

-Nick
