Sorry, I mixed up the threads. It's the one that asks whether server nodes connect back to thick clients, and it was you who mentioned the new feature...
On Wed., Jul. 1, 2020, 4:03 p.m. John Smith, <[email protected]> wrote:
> If you look for the "what does all partition owners have left mean?"
> thread.
>
> There is mention of improving the protocol so that other nodes don't need to
> connect to clients running inside containers... It links to another thread
> indicating that there may be a PR to add a flag of some sort to mark the
> client as "virtualized" or something like that...
>
> As for the docs... There's only this....
> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/BasicAddressResolver.html
>
> And nothing is mentioned elsewhere in the official docs.
>
> On Wed., Jul. 1, 2020, 2:22 p.m. Denis Magda, <[email protected]> wrote:
>
>>> But you guys also mentioned in my other thread that you are working on a
>>> feature that doesn't require connecting to the client when it's running
>>> inside a container.
>>
>>
>> What is the thread you're referring to? Visor will always connect to
>> the clients regardless of your deployment configuration.
>>
>>> Anyways thanks for creating an issue and as well just wondering if any
>>> docs should be updated for containers because I found the
>>> BasicAddressResolver java doc by chance.
>>
>>
>> You're always welcome. Could you point out the documentation you used to
>> configure the AddressResolver? Agree, we need to document or blog about best
>> practices.
>>
>> -
>> Denis
>>
>>
>> On Wed, Jul 1, 2020 at 10:49 AM John Smith <[email protected]>
>> wrote:
>>
>>> Hi, yes I figured that visor is just another thick client.
>>>
>>> By using the address resolver in my thick client applications inside a
>>> container, everything works fine and visor also connects properly (no need
>>> to add all client configs everywhere).
>>>
>>> As stated, it just adds a tiny delay when visor needs to connect to the
>>> other clients. And of course the "issue" when it fully blocks because it
>>> can't reach the client even though it knows the client is there.
>>>
>>> I dunno if I'm the only one who is using a mixed environment. But you guys
>>> also mentioned in my other thread that you are working on a feature that
>>> doesn't require connecting to the client when it's running inside a
>>> container.
>>>
>>> Anyways thanks for creating an issue and as well just wondering if any
>>> docs should be updated for containers because I found the
>>> BasicAddressResolver java doc by chance.
>>>
>>> On Wed., Jul. 1, 2020, 12:51 p.m. Denis Magda, <[email protected]>
>>> wrote:
>>>
>>>> Hi John,
>>>>
>>>> As Stephen mentioned, Visor connects to the cluster in a way similar to
>>>> server nodes and thick clients. It's connected as a daemon node that is
>>>> filtered out from metrics and other public APIs. That's why you don't see
>>>> Visor being reported in the cluster topology metrics along with servers or
>>>> thick clients:
>>>> https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/configuration/IgniteConfiguration.html#setDaemon-boolean-
>>>>
>>>> As a daemon node, Visor uses the same networking protocols to join the
>>>> cluster and communicate with other cluster members:
>>>>
>>>> - Discovery SPI - As any server node or a thick client, Visor will
>>>> join the cluster by connecting to one of the server nodes. It will use an
>>>> IP Finder that you set in your IgniteConfiguration file. Once Visor joins
>>>> the cluster, it will collect information about the cluster topology and
>>>> display these basic metrics to you in a terminal window. Visor receives
>>>> this information about the cluster topology through the server node used to
>>>> join the cluster. The same server node will update Visor on any topology
>>>> changes.
>>>> - Communication SPI - Whenever Visor needs to get metrics from a
>>>> specific server or thick client, it will open a direct TCP/IP connection
>>>> with the server/client. In your case, it failed to reach out to some
>>>> clients and hung.
>>>> The hanging is not the right way of handling this type of issue
>>>> and I've opened a ticket to address this:
>>>> https://issues.apache.org/jira/browse/IGNITE-13201
>>>>
>>>> Considering these implementation specifics, I can recommend you do
>>>> one of the following:
>>>>
>>>> - List all the thick clients in the AddressResolver configuration.
>>>> This is required. Hope my explanation above makes things clear for you.
>>>> - Or, run Visor from inside the private network. You would need to
>>>> ssh to one of your machines. With this, you don't need to deal with
>>>> AddressResolvers.
>>>> - Or, use contemporary tools for Ignite cluster monitoring. Ignite
>>>> supports JMX and OpenCensus protocols that allow you to consume metrics
>>>> from tools like Zabbix or Prometheus. You deploy a tool inside of your
>>>> private network so that it can collect metrics from the cluster and open a
>>>> single port number for those who will observe the metrics via a tool's user
>>>> interface. If you need both monitoring and *management* capabilities, then
>>>> have a look at GridGain Control Center.
>>>>
>>>> -
>>>> Denis
>>>>
>>>>
>>>> On Wed, Jul 1, 2020 at 8:39 AM John Smith <[email protected]>
>>>> wrote:
>>>>
>>>>> So this is what I gathered from this experience.
>>>>>
>>>>> When running commands on Visor's console, Visor will attempt to
>>>>> connect to the thick client.
>>>>>
>>>>> For example if you type the "node" command and attempt to get detailed
>>>>> statistics for a specific thick client, Visor will pause on the data region
>>>>> stats until it can connect.
>>>>>
>>>>> Furthermore if you have multiple thick clients and Visor has not
>>>>> connected to some of them yet and you call a more global command like
>>>>> "cache", this command will also pause until a connection has been made to
>>>>> all thick clients.
>>>>>
>>>>> 1- Whether this is good behaviour or not is up for debate.
>>>>> Especially the part when a thick client is listed in the topology/nodes
>>>>> but cannot be reached and visor hangs indefinitely.
>>>>> 2- Not sure if this behaviour in any way affects the server nodes if
>>>>> they ever attempt to open a connection to a thick client and the protocol
>>>>> somehow freezes just like #1 above.
>>>>>
>>>>> On Tue, 30 Jun 2020 at 09:54, John Smith <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Ok so. Is this expected behaviour? From a user perspective this seems
>>>>>> like a bug.
>>>>>>
>>>>>> Visor is supposed to be used as a way to monitor...
>>>>>>
>>>>>> So if as a user we enter a command and it just freezes indefinitely
>>>>>> it just seems unfriendly.
>>>>>>
>>>>>> In another thread the team mentioned that they are working on
>>>>>> something that does not require the protocol to communicate back to a thick
>>>>>> client. So wondering if this is in a way related as well...
>>>>>>
>>>>>> On Tue., Jun. 30, 2020, 6:58 a.m. Ilya Kasnacheev, <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hello!
>>>>>>>
>>>>>>> I can see the following in the thread dump:
>>>>>>> "main" #1 prio=5 os_prio=0 tid=0x00007f02c400d800 nid=0x1e43 runnable [0x00007f02cad1e000]
>>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>>     at sun.nio.ch.Net.poll(Native Method)
>>>>>>>     at sun.nio.ch.SocketChannelImpl.poll(SocketChannelImpl.java:951)
>>>>>>>     - locked <0x00000000ec066048> (a java.lang.Object)
>>>>>>>     at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:121)
>>>>>>>     - locked <0x00000000ec066038> (a java.lang.Object)
>>>>>>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3299)
>>>>>>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioClient(TcpCommunicationSpi.java:2987)
>>>>>>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2870)
>>>>>>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2713)
>>>>>>>     at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2672)
>>>>>>>     at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1656)
>>>>>>>     at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1731)
>>>>>>>     at org.apache.ignite.internal.processors.task.GridTaskWorker.sendRequest(GridTaskWorker.java:1436)
>>>>>>>     at org.apache.ignite.internal.processors.task.GridTaskWorker.processMappedJobs(GridTaskWorker.java:666)
>>>>>>>     at org.apache.ignite.internal.processors.task.GridTaskWorker.body(GridTaskWorker.java:538)
>>>>>>>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
>>>>>>>     at org.apache.ignite.internal.processors.task.GridTaskProcessor.startTask(GridTaskProcessor.java:764)
>>>>>>>     at org.apache.ignite.internal.processors.task.GridTaskProcessor.execute(GridTaskProcessor.java:392)
>>>>>>>     at org.apache.ignite.internal.IgniteComputeImpl.executeAsync0(IgniteComputeImpl.java:528)
>>>>>>>     at org.apache.ignite.internal.IgniteComputeImpl.execute(IgniteComputeImpl.java:498)
>>>>>>>     at org.apache.ignite.visor.visor$.execute(visor.scala:1800)
>>>>>>>
>>>>>>> It seems that Visor is trying to connect to the client node via
>>>>>>> Communication, and it fails because the network connection is filtered
>>>>>>> out.
>>>>>>>
>>>>>>> Regards,
>>>>>>> --
>>>>>>> Ilya Kasnacheev
>>>>>>>
>>>>>>>
>>>>>>> Mon, Jun 29, 2020 at 23:47, John Smith <[email protected]>:
>>>>>>>
>>>>>>>> Ok.
>>>>>>>>
>>>>>>>> I am able to reproduce the "issue" unless we have a
>>>>>>>> misunderstanding and we are talking about the same thing...
>>>>>>>>
>>>>>>>> My thick client runs inside a container in a closed network NOT
>>>>>>>> bridged and NOT host. I added a flag to my application that allows it to
>>>>>>>> add the address resolver to the config.
>>>>>>>>
>>>>>>>> 1- If I disable address resolution and I connect with visor to the
>>>>>>>> cluster and try to print detailed statistics for that particular client,
>>>>>>>> visor freezes indefinitely at the Data Region Snapshot.
>>>>>>>> Control C doesn't kill the visor either. It's just stuck. This also
>>>>>>>> happens when running the cache command. Just freezes indefinitely.
>>>>>>>>
>>>>>>>> I attached the jstack output to the email but it is also here:
>>>>>>>> https://www.dropbox.com/s/wujcee1gd87gk6o/jstack.out?dl=0
>>>>>>>>
>>>>>>>> 2- If I enable address resolution for the thick client then all the
>>>>>>>> commands work ok. I also see an "Accepted incoming communication
>>>>>>>> connection" log in the client.
>>>>>>>>
>>>>>>>> On Mon, 29 Jun 2020 at 15:30, Ilya Kasnacheev <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hello!
>>>>>>>>>
>>>>>>>>> The easiest way is jstack <process id of visor>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> --
>>>>>>>>> Ilya Kasnacheev
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Mon, Jun 29, 2020 at 20:20, John Smith <[email protected]>:
>>>>>>>>>
>>>>>>>>>> How?
>>>>>>>>>>
>>>>>>>>>> On Mon, 29 Jun 2020 at 12:03, Ilya Kasnacheev <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello!
>>>>>>>>>>>
>>>>>>>>>>> Try collecting a thread dump from Visor as it freezes.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> --
>>>>>>>>>>> Ilya Kasnacheev
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Mon, Jun 29, 2020 at 18:11, John Smith <[email protected]>:
>>>>>>>>>>>
>>>>>>>>>>>> How though?
>>>>>>>>>>>>
>>>>>>>>>>>> 1- Entered node command
>>>>>>>>>>>> 2- Got list of nodes, including thick clients
>>>>>>>>>>>> 3- Selected thick client
>>>>>>>>>>>> 4- Entered Y for detailed statistics
>>>>>>>>>>>> 5- Snapshot details displayed
>>>>>>>>>>>> 6- Data region stats frozen
>>>>>>>>>>>>
>>>>>>>>>>>> I think the address resolution is working for this as well. I
>>>>>>>>>>>> need to confirm. Because I fixed the resolver as per your solution and
>>>>>>>>>>>> visor no longer freezes on #6 above.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 29 Jun 2020 at 10:54, Ilya Kasnacheev <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello!
>>>>>>>>>>>>>
>>>>>>>>>>>>> This usually means there's no connectivity between node and
>>>>>>>>>>>>> Visor.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Ilya Kasnacheev
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mon, Jun 29, 2020 at 17:01, John Smith <[email protected]>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also I think for Visor as well?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When I do top or node commands, I can see the thick client.
>>>>>>>>>>>>>> But when I look at detailed statistics for that particular thick client it
>>>>>>>>>>>>>> freezes "indefinitely". Regular statistics seem ok.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, 29 Jun 2020 at 08:08, Ilya Kasnacheev <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For thick clients, you need both 47100 and 47500, both
>>>>>>>>>>>>>>> directions (perhaps for 47500 only client -> server is sufficient, but for
>>>>>>>>>>>>>>> 47100, both are needed).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For thin clients, 10800 is enough. For control.sh, 11211.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Ilya Kasnacheev
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Fri, Jun 26, 2020 at 22:06, John Smith <[email protected]>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm asking in a separate question so people can search for it
>>>>>>>>>>>>>>>> if they ever come across this...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My server nodes are started as follows, and I also connect the
>>>>>>>>>>>>>>>> client the same way.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <bean
>>>>>>>>>>>>>>>> class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
>>>>>>>>>>>>>>>>   <property name="addresses">
>>>>>>>>>>>>>>>>     <list>
>>>>>>>>>>>>>>>>       <value>foo:47500</value>
>>>>>>>>>>>>>>>>       ...
>>>>>>>>>>>>>>>>     </list>
>>>>>>>>>>>>>>>>   </property>
>>>>>>>>>>>>>>>> </bean>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In my client code I used the basic address resolver
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> And I put in the map
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "{internalHostIP}:47500", "{externalHostIp}:{externalPort}"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> igniteConfig.setAddressResolver(addrResolver);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> QUESTIONS
>>>>>>>>>>>>>>>> ___________________
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1- Port 47500 is used for discovery only?
>>>>>>>>>>>>>>>> 2- Port 47100 is used for actual comms to the nodes?
>>>>>>>>>>>>>>>> 3- In my container environment I have only mapped 47100, do
>>>>>>>>>>>>>>>> I also need to map 47500 for the Tcp Discovery SPI?
>>>>>>>>>>>>>>>> 4- When I connect with Visor and I try to look at details
>>>>>>>>>>>>>>>> for the client node it blocks. I'm assuming that's because visor cannot
>>>>>>>>>>>>>>>> connect back to the client at 47100?
>>>>>>>>>>>>>>>> See logs below
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> LOGS
>>>>>>>>>>>>>>>> ___________________
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> When I look at the client logs I get...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> IgniteConfiguration [
>>>>>>>>>>>>>>>> igniteInstanceName=xxxxxx,
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> discoSpi=TcpDiscoverySpi [
>>>>>>>>>>>>>>>> addrRslvr=null, <--- Do I need to use BasicResolver or here???
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> commSpi=TcpCommunicationSpi [
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> locAddr=null,
>>>>>>>>>>>>>>>> locHost=null,
>>>>>>>>>>>>>>>> locPort=47100,
>>>>>>>>>>>>>>>> addrRslvr=null, <--- Do I need to use BasicResolver or here???
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> ],
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> addrRslvr=BasicAddressResolver [
>>>>>>>>>>>>>>>> inetAddrMap={},
>>>>>>>>>>>>>>>> inetSockAddrMap={/internalIp:47100=/externalIp:2389} <----
>>>>>>>>>>>>>>>> ],
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> clientMode=true,
>>>>>>>>>>>>>>>> ...
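
The thread attributes the Visor freeze to a communication-port connection that is filtered out, and notes that a thick client needs 47100 reachable in both directions. A quick way to check this from the machine running Visor (or a server node) is a plain TCP connect with a timeout. The sketch below uses only the JDK. The class name PortProbe, the helper isReachable, and the 3-second timeout are illustrative choices and not part of Ignite. It only confirms that a TCP connection can be opened. It does not perform the Ignite handshake, so a successful probe does not by itself prove that the address resolver mapping is correct.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not an Ignite API): probes whether a TCP port accepts connections.
final class PortProbe {

    /**
     * Returns true if a TCP connection to host:port can be opened within timeoutMs.
     * A timeoutMs of 0 would wait indefinitely, so pass a positive value.
     */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out (e.g. filtered by a firewall), or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: PortProbe <host> <port> [<port> ...]");
            return;
        }
        String host = args[0];
        for (int i = 1; i < args.length; i++) {
            int port = Integer.parseInt(args[i]);
            // 3000 ms is an arbitrary timeout chosen for this sketch.
            System.out.println(host + ":" + port + " reachable=" + isReachable(host, port, 3000));
        }
    }
}
```

Run it against the externally mapped address of the client, for example the external host and port that the address resolver map points to. If the probe returns false from the Visor host, the symptoms described above (a hang at the data region stats) would be expected.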
