Thanks, John. That connectivity improvement fixes situations where a server
needs to open a connection to a client but fails. Instead, the client will
open the connection after receiving a special message via the discovery
networking layer. It won’t improve the communication between Visor and
clients.

We’ll document the address resolver in the future. Thanks for the pointers.
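In the meantime, here is a minimal client-side sketch of wiring up BasicAddressResolver (the class name ClientWithResolver and all addresses/ports below are placeholders, not values from this thread):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.BasicAddressResolver;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ClientWithResolver {
    public static void main(String[] args) throws Exception {
        // Map container-internal address:port pairs to the externally
        // reachable ones (placeholder addresses; substitute your own).
        Map<String, String> addrMap = new HashMap<>();
        addrMap.put("10.0.0.5:47100", "203.0.113.10:2389"); // communication SPI
        addrMap.put("10.0.0.5:47500", "203.0.113.10:2390"); // discovery SPI

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setClientMode(true);
        cfg.setAddressResolver(new BasicAddressResolver(addrMap));

        try (Ignite ignite = Ignition.start(cfg)) {
            // The client is now reachable by Visor and server nodes
            // via the mapped external addresses.
        }
    }
}
```

This is a configuration sketch rather than something to run standalone, since it needs the Ignite jars and a running cluster to join.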

Denis

On Wednesday, July 1, 2020, John Smith <[email protected]> wrote:

> Sorry, mixed up the threads; it's the one that asks whether server nodes
> connect back to thick clients, and it was you who mentioned the new
> feature...
>
> On Wed., Jul. 1, 2020, 4:03 p.m. John Smith, <[email protected]>
> wrote:
>
>> Look for the "what does all partition owners have left mean?" thread.
>>
>> There's a mention of improving the protocol so that other nodes don't need
>> to connect to clients running inside containers... It links to another
>> thread indicating that there may be a PR to add a flag of some sort to
>> mark the client as "virtualized" or something like that...
>>
>> As for the docs... There's only this:
>> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/BasicAddressResolver.html
>>
>> And nothing is mentioned elsewhere in the official docs.
>>
>> On Wed., Jul. 1, 2020, 2:22 p.m. Denis Magda, <[email protected]> wrote:
>>
>>>> But you guys also mentioned in my other thread that you are working
>>>> on a feature that doesn't require connecting to the client when it's
>>>> running inside a container.
>>>
>>>
>>> What is the thread you're referring to? Visor will always connect to
>>> the clients regardless of your deployment configuration.
>>>
>>>> Anyway, thanks for creating an issue. Also just wondering if any docs
>>>> should be updated for containers, because I found the
>>>> BasicAddressResolver javadoc by chance.
>>>
>>>
>>> You're always welcome. Could you point out the documentation you used
>>> to configure the AddressResolver? Agreed, we need to document or blog
>>> about best practices.
>>>
>>> -
>>> Denis
>>>
>>>
>>> On Wed, Jul 1, 2020 at 10:49 AM John Smith <[email protected]>
>>> wrote:
>>>
>>>> Hi, yes, I figured that Visor is just another thick client.
>>>>
>>>> By using the address resolver in my thick client applications inside
>>>> containers, everything works fine and Visor also connects properly (no
>>>> need to add all client configs everywhere).
>>>>
>>>> As stated, it just adds a tiny delay when Visor needs to connect to
>>>> the other clients. And of course the "issue" where it fully blocks
>>>> because it can't reach the client even though it knows the client is
>>>> there.
>>>>
>>>> I don't know if I'm the only one using a mixed environment. But you
>>>> guys also mentioned in my other thread that you are working on a
>>>> feature that doesn't require connecting to the client when it's running
>>>> inside a container.
>>>>
>>>> Anyway, thanks for creating an issue. Also just wondering if any docs
>>>> should be updated for containers, because I found the
>>>> BasicAddressResolver javadoc by chance.
>>>>
>>>> On Wed., Jul. 1, 2020, 12:51 p.m. Denis Magda, <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi John,
>>>>>
>>>>> As Stephen mentioned, Visor connects to the cluster in a way similar
>>>>> to server nodes and thick clients. It's connected as a daemon node that is
>>>>> filtered out from metrics and other public APIs. That's why you don't see
>>>>> Visor being reported in the cluster topology metrics along with servers or
>>>>> thick clients:
>>>>> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/IgniteConfiguration.html#setDaemon-boolean-
>>>>>
>>>>> As a daemon node, Visor uses the same networking protocols to join the
>>>>> cluster and communicate with other cluster members:
>>>>>
>>>>>    - Discovery SPI - Like any server node or thick client, Visor joins
>>>>>    the cluster by connecting to one of the server nodes. It uses the
>>>>>    IP finder that you set in your IgniteConfiguration file. Once Visor
>>>>>    joins the cluster, it collects information about the cluster
>>>>>    topology and displays these basic metrics to you in a terminal
>>>>>    window. Visor receives this information about the cluster topology
>>>>>    through the server node used to join the cluster. The same server
>>>>>    node will update Visor on any topology changes.
>>>>>    - Communication SPI - Whenever Visor needs to get metrics from a
>>>>>    specific server or thick client, it opens a direct TCP/IP
>>>>>    connection to that server/client. In your case, it failed to reach
>>>>>    some of the clients and hung. Hanging is not the right way to
>>>>>    handle this type of issue, and I've opened a ticket to address it:
>>>>>    https://issues.apache.org/jira/browse/IGNITE-13201
>>>>>
>>>>> Considering these implementation specifics, I can recommend one of
>>>>> the following:
>>>>>
>>>>>    - List all the thick clients in the AddressResolver configuration.
>>>>>    This is required. Hope my explanation above makes things clear for
>>>>>    you.
>>>>>    - Or, run Visor from inside the private network. You would need to
>>>>>    ssh to one of your machines. With this, you don't need to deal with
>>>>>    AddressResolvers.
>>>>>    - Or, use contemporary tools for Ignite cluster monitoring. Ignite
>>>>>    supports the JMX and OpenCensus protocols, which let you consume
>>>>>    metrics in tools like Zabbix or Prometheus. You deploy such a tool
>>>>>    inside your private network so that it can collect metrics from the
>>>>>    cluster, and open a single port for those who will observe the
>>>>>    metrics via the tool's user interface. If you need both monitoring
>>>>>    and *management* capabilities, have a look at GridGain Control
>>>>>    Center.
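The JMX option can be sketched with the standard javax.management API. The snippet below queries the local platform MBean server as a stand-in; for a remote Ignite node you would obtain the same MBeanServerConnection via JMXConnectorFactory and a service URL specific to your deployment:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;

public class JmxSketch {
    public static void main(String[] args) throws Exception {
        // Stand-in for a remote connection obtained via JMXConnectorFactory;
        // the MBeanServerConnection API is the same either way.
        MBeanServerConnection mbsc = ManagementFactory.getPlatformMBeanServer();

        // Every JVM exposes at least the platform MXBeans (memory, threads,
        // etc.); an Ignite node additionally registers its own beans.
        System.out.println("MBean count: " + mbsc.getMBeanCount());
    }
}
```

A monitoring tool like Zabbix or Prometheus (via a JMX exporter) does essentially this polling for you on a schedule.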
>>>>>
>>>>> -
>>>>> Denis
>>>>>
>>>>>
>>>>> On Wed, Jul 1, 2020 at 8:39 AM John Smith <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> So this is what I gathered from this experience.
>>>>>>
>>>>>> When running commands on Visor's console, Visor will attempt to
>>>>>> connect to the thick client.
>>>>>>
>>>>>> For example, if you type the "node" command and attempt to get
>>>>>> detailed statistics for a specific thick client, Visor will pause on
>>>>>> the data region stats until it can connect.
>>>>>>
>>>>>> Furthermore, if you have multiple thick clients, Visor has not yet
>>>>>> connected to some of them, and you call a more global command like
>>>>>> "cache", that command will also pause until a connection has been
>>>>>> made to all thick clients.
>>>>>>
>>>>>> 1- Whether this is good behaviour or not is up for debate, especially
>>>>>> the part where a thick client is listed in the topology/nodes but
>>>>>> cannot be reached and Visor hangs indefinitely.
>>>>>> 2- Not sure if this behaviour in any way affects the server nodes if
>>>>>> they ever attempt to open a connection to a thick client and the
>>>>>> protocol somehow freezes just like #1 above.
>>>>>>
>>>>>> On Tue, 30 Jun 2020 at 09:54, John Smith <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Ok, so is this expected behaviour? From a user perspective this
>>>>>>> seems like a bug.
>>>>>>>
>>>>>>> Visor is supposed to be used as a way to monitor...
>>>>>>>
>>>>>>> So if, as a user, we enter a command and it just freezes
>>>>>>> indefinitely, it just seems unfriendly.
>>>>>>>
>>>>>>> In another thread the team mentioned that they are working on
>>>>>>> something that does not require the protocol to communicate back to
>>>>>>> a thick client. So wondering if this is related in some way as
>>>>>>> well...
>>>>>>>
>>>>>>> On Tue., Jun. 30, 2020, 6:58 a.m. Ilya Kasnacheev, <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> I can see the following in the thread dump:
>>>>>>>> "main" #1 prio=5 os_prio=0 tid=0x00007f02c400d800 nid=0x1e43
>>>>>>>> runnable [0x00007f02cad1e000]
>>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>> at sun.nio.ch.Net.poll(Native Method)
>>>>>>>> at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:951)
>>>>>>>> - locked <0x00000000ec066048> (a java.lang.Object)
>>>>>>>> at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:121)
>>>>>>>> - locked <0x00000000ec066038> (a java.lang.Object)
>>>>>>>> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.
>>>>>>>> createTcpClient(TcpCommunicationSpi.java:3299)
>>>>>>>> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.
>>>>>>>> createNioClient(TcpCommunicationSpi.java:2987)
>>>>>>>> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.
>>>>>>>> reserveClient(TcpCommunicationSpi.java:2870)
>>>>>>>> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.
>>>>>>>> sendMessage0(TcpCommunicationSpi.java:2713)
>>>>>>>> at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.
>>>>>>>> sendMessage(TcpCommunicationSpi.java:2672)
>>>>>>>> at org.apache.ignite.internal.managers.communication.
>>>>>>>> GridIoManager.send(GridIoManager.java:1656)
>>>>>>>> at org.apache.ignite.internal.managers.communication.
>>>>>>>> GridIoManager.sendToGridTopic(GridIoManager.java:1731)
>>>>>>>> at org.apache.ignite.internal.processors.task.
>>>>>>>> GridTaskWorker.sendRequest(GridTaskWorker.java:1436)
>>>>>>>> at org.apache.ignite.internal.processors.task.GridTaskWorker.
>>>>>>>> processMappedJobs(GridTaskWorker.java:666)
>>>>>>>> at org.apache.ignite.internal.processors.task.GridTaskWorker.body(
>>>>>>>> GridTaskWorker.java:538)
>>>>>>>> at org.apache.ignite.internal.util.worker.GridWorker.run(
>>>>>>>> GridWorker.java:120)
>>>>>>>> at org.apache.ignite.internal.processors.task.
>>>>>>>> GridTaskProcessor.startTask(GridTaskProcessor.java:764)
>>>>>>>> at org.apache.ignite.internal.processors.task.
>>>>>>>> GridTaskProcessor.execute(GridTaskProcessor.java:392)
>>>>>>>> at org.apache.ignite.internal.IgniteComputeImpl.executeAsync0(
>>>>>>>> IgniteComputeImpl.java:528)
>>>>>>>> at org.apache.ignite.internal.IgniteComputeImpl.execute(
>>>>>>>> IgniteComputeImpl.java:498)
>>>>>>>> at org.apache.ignite.visor.visor$.execute(visor.scala:1800)
>>>>>>>>
>>>>>>>> It seems that Visor is trying to connect to the client node via the
>>>>>>>> Communication SPI, and it fails because the network connection is
>>>>>>>> filtered out.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> --
>>>>>>>> Ilya Kasnacheev
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, 29 Jun 2020 at 23:47, John Smith <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Ok.
>>>>>>>>>
>>>>>>>>> I am able to reproduce the "issue", unless we have a
>>>>>>>>> misunderstanding and we aren't talking about the same thing...
>>>>>>>>>
>>>>>>>>> My thick client runs inside a container on a closed network, NOT
>>>>>>>>> bridged and NOT host. I added a flag to my application that allows
>>>>>>>>> it to add the address resolver to the config.
>>>>>>>>>
>>>>>>>>> 1- If I disable address resolution, connect with Visor to the
>>>>>>>>> cluster, and try to print detailed statistics for that particular
>>>>>>>>> client, Visor freezes indefinitely at the data region snapshot.
>>>>>>>>> Ctrl-C doesn't kill Visor either; it's just stuck. This also
>>>>>>>>> happens when running the cache command. It just freezes
>>>>>>>>> indefinitely.
>>>>>>>>>
>>>>>>>>> I attached the jstack output to the email but it is also here:
>>>>>>>>> https://www.dropbox.com/s/wujcee1gd87gk6o/jstack.out?dl=0
>>>>>>>>>
>>>>>>>>> 2- If I enable address resolution for the thick client then all
>>>>>>>>> the commands work ok. I also see an "Accepted incoming
>>>>>>>>> communication connection" log in the client.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, 29 Jun 2020 at 15:30, Ilya Kasnacheev <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hello!
>>>>>>>>>>
>>>>>>>>>> The easiest way is jstack <process id of visor>
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> --
>>>>>>>>>> Ilya Kasnacheev
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, 29 Jun 2020 at 20:20, John Smith <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> How?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 29 Jun 2020 at 12:03, Ilya Kasnacheev <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello!
>>>>>>>>>>>>
>>>>>>>>>>>> Try collecting thread dump from Visor as it freezes.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> --
>>>>>>>>>>>> Ilya Kasnacheev
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 29 Jun 2020 at 18:11, John Smith <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> How though?
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1- Entered node command
>>>>>>>>>>>>> 2- Got list of nodes, including thick clients
>>>>>>>>>>>>> 3- Selected thick client
>>>>>>>>>>>>> 4- Entered Y for detailed statistics
>>>>>>>>>>>>> 5- Snapshot details displayed
>>>>>>>>>>>>> 6- Data region stats frozen
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think the address resolution is working for this as well. I
>>>>>>>>>>>>> need to confirm, because I fixed the resolver as per your
>>>>>>>>>>>>> solution and Visor no longer freezes on #6 above.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, 29 Jun 2020 at 10:54, Ilya Kasnacheev <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This usually means there's no connectivity between node and
>>>>>>>>>>>>>> Visor.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Ilya Kasnacheev
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, 29 Jun 2020 at 17:01, John Smith <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, I think this applies to Visor as well?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When I do the top or node commands, I can see the thick
>>>>>>>>>>>>>>> client. But when I look at detailed statistics for that
>>>>>>>>>>>>>>> particular thick client, it freezes "indefinitely". Regular
>>>>>>>>>>>>>>> statistics seem OK.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, 29 Jun 2020 at 08:08, Ilya Kasnacheev <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For thick clients, you need both 47100 and 47500, in both
>>>>>>>>>>>>>>>> directions (perhaps for 47500 only client -> server is
>>>>>>>>>>>>>>>> sufficient, but for 47100, both are needed).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For thin clients, 10800 is enough. For control.sh, 11211.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Ilya Kasnacheev
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, 26 Jun 2020 at 22:06, John Smith <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm asking in a separate question so people can search for
>>>>>>>>>>>>>>>>> it if they ever come across this...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My server nodes are started as follows, and I connect the
>>>>>>>>>>>>>>>>> client the same way.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
>>>>>>>>>>>>>>>>>     <property name="addresses">
>>>>>>>>>>>>>>>>>         <list>
>>>>>>>>>>>>>>>>>             <value>foo:47500</value>
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>         </list>
>>>>>>>>>>>>>>>>>     </property>
>>>>>>>>>>>>>>>>> </bean>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In my client code I used the basic address resolver, put
>>>>>>>>>>>>>>>>> the mapping
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> "{internalHostIP}:47500" -> "{externalHostIp}:{externalPort}"
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> in its map, and set it with
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> igniteConfig.setAddressResolver(addrResolver);
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> QUESTIONS
>>>>>>>>>>>>>>>>> ___________________
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1- Is port 47500 used for discovery only?
>>>>>>>>>>>>>>>>> 2- Is port 47100 used for the actual communication with the
>>>>>>>>>>>>>>>>> nodes?
>>>>>>>>>>>>>>>>> 3- In my container environment I have only mapped 47100; do
>>>>>>>>>>>>>>>>> I also need to map 47500 for the TCP Discovery SPI?
>>>>>>>>>>>>>>>>> 4- When I connect with Visor and try to look at details for
>>>>>>>>>>>>>>>>> the client node, it blocks. I'm assuming that's because
>>>>>>>>>>>>>>>>> Visor cannot connect back to the client at 47100?
>>>>>>>>>>>>>>>>> See logs below
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> LOGS
>>>>>>>>>>>>>>>>> ___________________
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> When I look at the client logs I get...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> IgniteConfiguration [
>>>>>>>>>>>>>>>>> igniteInstanceName=xxxxxx,
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>> discoSpi=TcpDiscoverySpi [
>>>>>>>>>>>>>>>>>   addrRslvr=null, <--- Do I need to set the BasicAddressResolver here???
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>   commSpi=TcpCommunicationSpi [
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>     locAddr=null,
>>>>>>>>>>>>>>>>>     locHost=null,
>>>>>>>>>>>>>>>>>     locPort=47100,
>>>>>>>>>>>>>>>>>     addrRslvr=null, <--- Do I need to set the BasicAddressResolver here???
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>     ],
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>     addrRslvr=BasicAddressResolver [
>>>>>>>>>>>>>>>>>       inetAddrMap={},
>>>>>>>>>>>>>>>>>       inetSockAddrMap={/internalIp:47100=/externalIp:2389} <----
>>>>>>>>>>>>>>>>>     ],
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>     clientMode=true,
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>

-- 
-
Denis
