Hi John,

As Stephen mentioned, Visor connects to the cluster in a way similar to
server nodes and thick clients. It's connected as a daemon node that is
filtered out from metrics and other public APIs. That's why you don't see
Visor being reported in the cluster topology metrics along with servers or
thick clients:
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/IgniteConfiguration.html#setDaemon-boolean-
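For illustration, here is a minimal sketch of what a daemon node looks like in code (not from the thread; the instance is a placeholder). It joins the topology like any other node but is excluded from public metrics, which is exactly what Visor does internally:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;

public class DaemonNodeExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // The same flag Visor sets internally: the node joins the ring
        // but is filtered out of public topology metrics and APIs.
        cfg.setDaemon(true);

        try (Ignite daemon = Ignition.start(cfg)) {
            // A daemon node sees the full topology but is not reported in it.
            System.out.println("Joined as daemon: "
                + daemon.configuration().isDaemon());
        }
    }
}
```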

As a daemon node, Visor uses the same networking protocols to join the
cluster and communicate with other cluster members:

   - Discovery SPI - Like any server node or thick client, Visor joins
   the cluster by connecting to one of the server nodes, using the IP
   Finder set in your IgniteConfiguration. Once Visor joins the cluster,
   it collects information about the cluster topology and displays these
   basic metrics in a terminal window. Visor receives the topology
   information through the server node it used to join the cluster, and
   the same server node updates Visor on any topology changes.
   - Communication SPI - Whenever Visor needs to get metrics from a
   specific server or thick client, it opens a direct TCP/IP connection
   to that node. In your case, it failed to reach some of the clients and
   hung. Hanging is not the right way to handle this type of issue, and
   I've opened a ticket to address it:
   https://issues.apache.org/jira/browse/IGNITE-13201
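As a sketch of the Discovery SPI side (the host names below are placeholders, not taken from your setup), the static IP finder that server nodes, thick clients, and Visor would all use could look roughly like this:

```java
import java.util.Arrays;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

public class DiscoveryConfigExample {
    public static IgniteConfiguration configure() {
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();

        // Visor joins through one of these server addresses (47500 is the
        // default discovery port), just like a server node or thick client.
        ipFinder.setAddresses(Arrays.asList("server1:47500", "server2:47500"));

        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
        discoSpi.setIpFinder(ipFinder);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(discoSpi);
        return cfg;
    }
}
```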

Given these implementation specifics, I recommend doing one of the
following:

   - List all the thick clients in the AddressResolver configuration. This
   is required for Visor to open direct communication connections to them;
   hopefully my explanation above makes the reason clear.
   - Or, run Visor from inside the private network, by SSHing into one of
   your machines. With this approach, you don't need to deal with
   AddressResolvers at all.
   - Or, use contemporary tools for Ignite cluster monitoring. Ignite
   supports the JMX and OpenCensus protocols, which let you consume metrics
   with tools like Zabbix or Prometheus. You deploy such a tool inside your
   private network so that it can collect metrics from the cluster, and open
   a single port for those who will observe the metrics via the tool's user
   interface. If you need both monitoring and *management* capabilities,
   then have a look at GridGain Control Center.
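As an illustration of the first option (the internal/external addresses below are placeholders — substitute your own), a thick client behind NAT could map its internal communication address to the externally reachable one roughly like this:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.ignite.configuration.BasicAddressResolver;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ClientAddressResolverExample {
    public static IgniteConfiguration configure() throws Exception {
        // Map the client's internal address:port to the externally
        // reachable one, so that Visor (and other cluster members) can
        // open a communication connection back to the client.
        Map<String, String> addrMap = new HashMap<>();
        addrMap.put("10.0.0.5:47100", "203.0.113.7:2389");

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setClientMode(true);
        cfg.setAddressResolver(new BasicAddressResolver(addrMap));
        return cfg;
    }
}
```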

-
Denis


On Wed, Jul 1, 2020 at 8:39 AM John Smith <[email protected]> wrote:

> So this is what I gathered from this experience.
>
> When running commands on Visor's console, Visor will attempt to connect to
> the thick client.
>
> For example if you type the "node" command and attempt to get detailed
> statistics for a specific thick client, Visor will pause on the data region
> stats until it can connect.
>
> Furthermore if you have multiple thick clients and Visor has not
> connected to some of them yet and you call a more global command like
> "cache", this command will also pause until a connection has been made to
> all thick clients.
>
> 1- Whether this is good behaviour or not is up for debate. Especially the
> part when a thick client is listed in the topology/nodes but cannot be
> reached and visor hangs indefinitely.
> 2- Not sure if this behaviour in any way affects the server node if they
> ever attempt to open a connection to a thick client and the protocol
> somehow freezes just like #1 above.
>
> On Tue, 30 Jun 2020 at 09:54, John Smith <[email protected]> wrote:
>
>> Ok so. Is this expected behaviour? From user perspective this seems like
>> a bug.
>>
>> Visor is supposed to be used as a way to monitor...
>>
>> So if as a user we enter a command and it just freezes indefinitely, it
>> just seems unfriendly.
>>
>> In another thread the team mentioned that they are working on
>> something that does not require the protocol to communicate back to a thick
>> client. So wondering if this is in a way related as well...
>>
>> On Tue., Jun. 30, 2020, 6:58 a.m. Ilya Kasnacheev, <
>> [email protected]> wrote:
>>
>>> Hello!
>>>
>>> I can see the following in the thread dump:
>>> "main" #1 prio=5 os_prio=0 tid=0x00007f02c400d800 nid=0x1e43 runnable
>>> [0x00007f02cad1e000]
>>>    java.lang.Thread.State: RUNNABLE
>>> at sun.nio.ch.Net.poll(Native Method)
>>> at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:951)
>>> - locked <0x00000000ec066048> (a java.lang.Object)
>>> at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:121)
>>> - locked <0x00000000ec066038> (a java.lang.Object)
>>> at
>>> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3299)
>>> at
>>> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2987)
>>> at
>>> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2870)
>>> at
>>> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2713)
>>> at
>>> org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2672)
>>> at
>>> org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1656)
>>> at
>>> org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1731)
>>> at
>>> org.apache.ignite.internal.processors.task.GridTaskWorker.sendRequest(GridTaskWorker.java:1436)
>>> at
>>> org.apache.ignite.internal.processors.task.GridTaskWorker.processMappedJobs(GridTaskWorker.java:666)
>>> at
>>> org.apache.ignite.internal.processors.task.GridTaskWorker.body(GridTaskWorker.java:538)
>>> at
>>> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>>> at
>>> org.apache.ignite.internal.processors.task.GridTaskProcessor.startTask(GridTaskProcessor.java:764)
>>> at
>>> org.apache.ignite.internal.processors.task.GridTaskProcessor.execute(GridTaskProcessor.java:392)
>>> at
>>> org.apache.ignite.internal.IgniteComputeImpl.executeAsync0(IgniteComputeImpl.java:528)
>>> at
>>> org.apache.ignite.internal.IgniteComputeImpl.execute(IgniteComputeImpl.java:498)
>>> at org.apache.ignite.visor.visor$.execute(visor.scala:1800)
>>>
>>> It seems that Visor is trying to connect to client node via
>>> Communication, and it fails because the network connection is filtered out.
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> пн, 29 июн. 2020 г. в 23:47, John Smith <[email protected]>:
>>>
>>>> Ok.
>>>>
>>>> I am able to reproduce the "issue" unless we have a misunderstanding
>>>> and we are talking about the same thing...
>>>>
>>>> My thick client runs inside a container in a closed network NOT bridged
>>>> and NOT host. I added a flag to my application that allows it to add the
>>>> address resolver to the config.
>>>>
>>>> 1- If I disable address resolution and I connect with visor to the
>>>> cluster and try to print detailed statistics for that particular client,
>>>> visor freezes indefinitely at the Data Region Snapshot.
>>>> Ctrl-C doesn't kill the visor either. It's just stuck. This also
>>>> happens when running the cache command. It just freezes indefinitely.
>>>>
>>>> I attached the jstack output to the email but it is also here:
>>>> https://www.dropbox.com/s/wujcee1gd87gk6o/jstack.out?dl=0
>>>>
>>>> 2- If I enable address resolution for the thick client then all the
>>>> commands work ok. I also see an "Accepted incoming communication
>>>> connection" log in the client.
>>>>
>>>>
>>>> On Mon, 29 Jun 2020 at 15:30, Ilya Kasnacheev <
>>>> [email protected]> wrote:
>>>>
>>>>> Hello!
>>>>>
>>>>> The easiest way is jstack <process id of visor>
>>>>>
>>>>> Regards,
>>>>> --
>>>>> Ilya Kasnacheev
>>>>>
>>>>>
>>>>> пн, 29 июн. 2020 г. в 20:20, John Smith <[email protected]>:
>>>>>
>>>>>> How?
>>>>>>
>>>>>> On Mon, 29 Jun 2020 at 12:03, Ilya Kasnacheev <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hello!
>>>>>>>
>>>>>>> Try collecting thread dump from Visor as it freezes.
>>>>>>>
>>>>>>> Regards,
>>>>>>> --
>>>>>>> Ilya Kasnacheev
>>>>>>>
>>>>>>>
>>>>>>> пн, 29 июн. 2020 г. в 18:11, John Smith <[email protected]>:
>>>>>>>
>>>>>>>> How though?
>>>>>>>>
>>>>>>>> 1- Entered node command
>>>>>>>> 2- Got list of nodes, including thick clients
>>>>>>>> 3- Selected thick client
>>>>>>>> 4- Entered Y for detailed statistics
>>>>>>>> 5- Snapshot details displayed
>>>>>>>> 6- Data region stats frozen
>>>>>>>>
>>>>>>>> I think the address resolution is working for this as well. I need
>>>>>>>> to confirm. Because I fixed the resolver as per your solution and 
>>>>>>>> visor no
>>>>>>>> longer freezes on #6 above.
>>>>>>>>
>>>>>>>> On Mon, 29 Jun 2020 at 10:54, Ilya Kasnacheev <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hello!
>>>>>>>>>
>>>>>>>>> This usually means there's no connectivity between node and Visor.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> --
>>>>>>>>> Ilya Kasnacheev
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> пн, 29 июн. 2020 г. в 17:01, John Smith <[email protected]>:
>>>>>>>>>
>>>>>>>>>> Also I think for Visor as well?
>>>>>>>>>>
>>>>>>>>>> When I do top or node commands, I can see the thick client. But
>>>>>>>>>> when I look at detailed statistics for that particular thick client 
>>>>>>>>>> it
>>>>>>>>>> freezes "indefinitely". Regular statistics it seems ok.
>>>>>>>>>>
>>>>>>>>>> On Mon, 29 Jun 2020 at 08:08, Ilya Kasnacheev <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello!
>>>>>>>>>>>
>>>>>>>>>>> For thick clients, you need both 47100 and 47500, both
>>>>>>>>>>> directions (perhaps for 47500 only client -> server is sufficient, 
>>>>>>>>>>> but for
>>>>>>>>>>> 47100, both are needed).
>>>>>>>>>>>
>>>>>>>>>>> For thin clients, 10800 is enough. For control.sh, 11211.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> --
>>>>>>>>>>> Ilya Kasnacheev
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> пт, 26 июн. 2020 г. в 22:06, John Smith <[email protected]
>>>>>>>>>>> >:
>>>>>>>>>>>
>>>>>>>>>>>> I'm asking in a separate question so people can search for it if
>>>>>>>>>>>> they ever come across this...
>>>>>>>>>>>>
>>>>>>>>>>>> My server nodes are started as follows, and I also connect the
>>>>>>>>>>>> client the same way.
>>>>>>>>>>>>
>>>>>>>>>>>>                   <bean
>>>>>>>>>>>> class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
>>>>>>>>>>>>                       <property name="addresses">
>>>>>>>>>>>>                           <list>
>>>>>>>>>>>>                             <value>foo:47500</value>
>>>>>>>>>>>> ...
>>>>>>>>>>>>                           </list>
>>>>>>>>>>>>                       </property>
>>>>>>>>>>>>                   </bean>
>>>>>>>>>>>>
>>>>>>>>>>>> In my client code I used the basic address resolver
>>>>>>>>>>>>
>>>>>>>>>>>> And I put in the map
>>>>>>>>>>>>
>>>>>>>>>>>> "{internalHostIP}:47500", "{externalHostIp}:{externalPort}"
>>>>>>>>>>>>
>>>>>>>>>>>> igniteConfig.setAddressResolver(addrResolver);
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> QUESTIONS
>>>>>>>>>>>> ___________________
>>>>>>>>>>>>
>>>>>>>>>>>> 1- Port 47500 is used for discovery only?
>>>>>>>>>>>> 2- Port 47100 is used for actual comms to the nodes?
>>>>>>>>>>>> 3- In my container environment I have only mapped 47100, do I
>>>>>>>>>>>> also need to map for 47500 for the Tcp Discovery SPI?
>>>>>>>>>>>> 4- When I connect with Visor and I try to look at details for
>>>>>>>>>>>> the client node it blocks. I'm assuming that's because visor 
>>>>>>>>>>>> cannot connect
>>>>>>>>>>>> back to the client at 47100?
>>>>>>>>>>>> See logs below
>>>>>>>>>>>>
>>>>>>>>>>>> LOGS
>>>>>>>>>>>> ___________________
>>>>>>>>>>>>
>>>>>>>>>>>> When I look at the client logs I get...
>>>>>>>>>>>>
>>>>>>>>>>>> IgniteConfiguration [
>>>>>>>>>>>> igniteInstanceName=xxxxxx,
>>>>>>>>>>>> ...
>>>>>>>>>>>> discoSpi=TcpDiscoverySpi [
>>>>>>>>>>>>   addrRslvr=null, <--- Do I need to use BasicResolver or here???
>>>>>>>>>>>> ...
>>>>>>>>>>>>   commSpi=TcpCommunicationSpi [
>>>>>>>>>>>> ...
>>>>>>>>>>>>     locAddr=null,
>>>>>>>>>>>>     locHost=null,
>>>>>>>>>>>>     locPort=47100,
>>>>>>>>>>>>     addrRslvr=null, <--- Do I need to use BasicResolver or
>>>>>>>>>>>> here???
>>>>>>>>>>>> ...
>>>>>>>>>>>>     ],
>>>>>>>>>>>> ...
>>>>>>>>>>>>     addrRslvr=BasicAddressResolver [
>>>>>>>>>>>>       inetAddrMap={},
>>>>>>>>>>>>       inetSockAddrMap={/internalIp:47100=/externalIp:2389}
>>>>>>>>>>>> <----
>>>>>>>>>>>>     ],
>>>>>>>>>>>> ...
>>>>>>>>>>>>     clientMode=true,
>>>>>>>>>>>> ...
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
