Hey Burlen, on the bug report page for 10283, I think you need to fix the command line you are testing with:
$ ssh remote cmd1 && cmd2

will execute cmd1 on remote and cmd2 locally. It should be:

$ ssh remote "cmd1 && cmd2"

Pat

On Fri, Apr 30, 2010 at 9:12 AM, pat marion <[email protected]> wrote:

> I have applied your patch. I agree that paraview should explicitly close
> the child process. But... what I am pointing out is that calling
> QProcess::close() does not help in this situation. What I am saying is
> that, even when paraview does kill the process, any commands run by ssh
> on the remote side of the connection will be orphaned by sshd. Are you
> sure you can't reproduce it?
>
> $ ssh localhost sleep 1d
> $ <press control-c>
> $ pidof sleep
> $ # sleep is still running
>
> Pat
>
> On Fri, Apr 30, 2010 at 2:08 AM, burlen <[email protected]> wrote:
>
>> Hi Pat,
>>
>> From my point of view the issue is philosophical, because practically
>> speaking I couldn't reproduce the orphans without doing something a
>> little odd, namely ssh ... && sleep 1d. Although the fact that a user
>> reported it suggests that it may occur in the real world as well. The
>> question is this: should an application explicitly clean up resources
>> it allocates, or should it rely on the user not only knowing that there
>> is the potential for a resource leak but also knowing enough to do the
>> right thing to avoid it (e.g. ssh -tt ...)? In my opinion, as a matter
>> of principle, if PV spawns a process it should explicitly clean it up
>> and there should be no way it can become an orphan. In this case the
>> fact that the orphan can hold ports open is particularly insidious,
>> because further connection attempts on that port fail with no helpful
>> error information. Also, it is not very difficult to clean up a spawned
>> process. What it comes down to is a little bookkeeping to hang on to
>> the QProcess handle and a few lines of code called from the
>> pqCommandServerStartup destructor to make certain it's cleaned up. This
>> is from the patch I submitted when I filed the bug report.
>>
>> +  // close the process if it is still running
>> +  if (this->Process->state() == QProcess::Running)
>> +    {
>> +    this->Process->close();
>> +    }
>> +  // free the object
>> +  delete this->Process;
>> +  this->Process = NULL;
>>
>> I think if the cluster admins out there knew which ssh options
>> (GatewayPorts etc.) are important for ParaView to work seamlessly, they
>> might be willing to open them up. It's my impression that the folks who
>> build clusters want tools like PV to be easy to use, but they don't
>> necessarily know all the ins and outs of configuring and running PV.
>>
>> Thanks for looking at this again! The -tt option to ssh is indeed a
>> good find.
>>
>> Burlen
>>
>> pat marion wrote:
>>
>>> Hi all!
>>>
>>> I'm bringing this thread back; I have learned a couple of new things...
>>>
>>> -----------------------
>>> No more orphans:
>>>
>>> Here is an easy way to create an orphan:
>>>
>>> $ ssh localhost sleep 1d
>>> $ <press control-c>
>>>
>>> The ssh process is cleaned up, but sshd orphans the sleep process. You
>>> can avoid this by adding '-t' to ssh:
>>>
>>> $ ssh -t localhost sleep 1d
>>>
>>> Works like a charm! But then there is another problem... try this
>>> command from paraview (using QProcess) and it still leaves an orphan,
>>> doh! Go back and re-read ssh's man page and you have the solution: use
>>> '-t' twice, i.e. ssh -tt.
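>>>
>>> To double-check the fix from a terminal (assuming pidof is available):
>>>
>>> $ ssh -t localhost sleep 1d
>>> $ <press control-c>
>>> $ pidof sleep
>>> $ # no output this time: the interrupt went through the allocated tty
>>> $ # to the remote sleep, so nothing was orphaned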
>>>
>>> -------------------------
>>> GatewayPorts and portfwd workaround:
>>>
>>> In this scenario we have 3 machines: workstation, service-node, and
>>> compute-node. I want to ssh from workstation to service-node and submit
>>> a job that will run pvserver on compute-node. When pvserver starts on
>>> compute-node I want it to reverse connect to service-node, and I want
>>> service-node to forward the connection to workstation. So here I go:
>>>
>>> $ ssh -R 11111:localhost:11111 service-node qsub start_pvserver.sh
>>>
>>> Oops, the qsub command returns immediately and closes my ssh tunnel.
>>> Let's pretend that the scheduler doesn't provide an easy way to keep
>>> the command alive, so I have resorted to using 'sleep 1d'. So here I
>>> go, using -tt to prevent orphans:
>>>
>>> $ ssh -tt -R 11111:localhost:11111 service-node "qsub start_pvserver.sh && sleep 1d"
>>>
>>> Well, this will only work if GatewayPorts is enabled in sshd_config on
>>> service-node. If GatewayPorts is not enabled, the ssh tunnel will only
>>> accept connections from localhost; it will not accept a connection from
>>> compute-node. We can ask the sysadmin to enable GatewayPorts, or we can
>>> use portfwd: run portfwd on service-node to forward port 22222 to port
>>> 11111, then have compute-node connect to service-node:22222. So your
>>> job script would launch pvserver like this:
>>>
>>> pvserver -rc -ch=service-node -sp=22222
>>>
>>> Problem solved! Also convenient: we can use portfwd to replace 'sleep
>>> 1d'. So the final command, executed by the paraview client:
>>>
>>> ssh -tt -R 11111:localhost:11111 service-node "qsub start_pvserver.sh && portfwd -g -c fwd.cfg"
>>>
>>> where fwd.cfg contains:
>>>
>>> tcp { 22222 { => localhost:11111 } }
>>>
>>> Hope this helps!
>>>
>>> Pat
>>>
>>> On Fri, Feb 12, 2010 at 7:06 PM, burlen <[email protected]> wrote:
>>>
>>> Incidentally, this brings up an interesting point about ParaView
>>> with client/server. It doesn't try to clean up its child processes,
>>> AFAIK. For example, if you set up this ssh tunnel inside the
>>> ParaView GUI (e.g., using a command instead of a manual connection),
>>> and you cancel the connection, it will leave the ssh running. You
>>> have to track down the ssh process and kill it yourself. It's a
>>> minor thing, but it can also prevent future connections if you don't
>>> realize there's a zombie ssh that kept your ports open.
>>>
>>> I attempted to reproduce this on my Kubuntu 9.10, Qt 4.5.2 system,
>>> with slightly different results, which may be Qt/distro/OS specific.
>>>
>>> On my system, as long as the process ParaView spawns finishes on its
>>> own there is no problem. That's usually how one would expect things
>>> to work out, since when the client disconnects the server closes,
>>> followed by ssh. But you are right that PV never explicitly kills or
>>> otherwise cleans up after the process it starts. So if the spawned
>>> process for some reason doesn't finish, orphan processes are
>>> introduced.
>>>
>>> I was able to produce orphan ssh processes by giving the PV client a
>>> server startup command that doesn't finish, e.g.
>>>
>>> ssh ... pvserver ... && sleep 100d
>>>
>>> I get the situation you described, which prevents further connection
>>> on the same ports. Once PV tries and fails to connect on the open
>>> ports, there is a crash soon after.
>>>
>>> I filed a bug report with a patch:
>>> http://www.paraview.org/Bug/view.php?id=10283
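>>>
>>> To track down which process is still holding a port (assuming lsof is
>>> installed; <pid> stands for whatever PID it reports):
>>>
>>> $ lsof -i :11111   # lists the stale ssh (or pvserver) and its PID
>>> $ kill <pid>       # frees the port for the next connection attempt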
>>>
>>> Sean Ziegeler wrote:
>>>
>>> Most batch systems have an option to wait until the job is finished
>>> before the submit command returns. I know PBS uses "-W block=true",
>>> and SGE and LSF have similar options (but I don't recall the precise
>>> flags).
>>>
>>> If your batch system doesn't provide that, I'd recommend adding some
>>> shell scripting that loops, checking the queue for job completion,
>>> and doesn't return until the job is done (a rough sketch of such a
>>> loop appears further down). The sleep thing would work, but it
>>> wouldn't exit when the server finishes, leaving the ssh tunnels (and
>>> other things like portfwd, if you put them in your scripts) lying
>>> around.
>>>
>>> Incidentally, this brings up an interesting point about ParaView
>>> with client/server. It doesn't try to clean up its child processes,
>>> AFAIK. For example, if you set up this ssh tunnel inside the
>>> ParaView GUI (e.g., using a command instead of a manual connection),
>>> and you cancel the connection, it will leave the ssh running. You
>>> have to track down the ssh process and kill it yourself. It's a
>>> minor thing, but it can also prevent future connections if you don't
>>> realize there's a zombie ssh that kept your ports open.
>>>
>>> On 02/08/10 21:03, burlen wrote:
>>>
>>> I am curious to hear what Sean has to say.
>>>
>>> But say the batch system returns right away after the job is
>>> submitted. I think we can doctor the command so that it will live
>>> for a while longer; what about something like this:
>>>
>>> ssh -R XXXX:localhost:YYYY remote_machine "submit_my_job.sh && sleep 100d"
>>>
>>> pat marion wrote:
>>>
>>> Hey, just checked out the wiki page, nice! One question: wouldn't
>>> this command hang up and close the tunnel after submitting the job?
>>>
>>> ssh -R XXXX:localhost:YYYY remote_machine submit_my_job.sh
>>>
>>> Pat
>>>
>>> On Mon, Feb 8, 2010 at 8:12 PM, pat marion <[email protected]> wrote:
>>>
>>> Actually I didn't write the notes at the hpc.mil link.
>>>
>>> Here is something (and maybe this is the problem that Sean refers
>>> to): in some cases, when I have set up a reverse ssh tunnel from
>>> login node to workstation (command executed from workstation), the
>>> forward does not work when the compute node connects to the login
>>> node. However, if I have the compute node connect to the login node
>>> on port 33333, then use portfwd to forward that to localhost:11111,
>>> where the ssh tunnel is listening on port 11111, it works like a
>>> charm. The portfwd tricks it into thinking the connection is coming
>>> from localhost and allows the ssh tunnel to work. Hope that made a
>>> little sense...
>>>
>>> Pat
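>>>
>>> For the record, that trick would look something like this (same config
>>> syntax as the fwd.cfg above, with this example's port numbers):
>>>
>>> $ cat fwd.cfg
>>> tcp { 33333 { => localhost:11111 } }
>>> $ portfwd -g -c fwd.cfg   # run on the login node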
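>>>
>>> And Sean's queue-polling idea, as a replacement for 'sleep 1d', might
>>> look like this rough sketch (a hypothetical wait_for_job.sh, assuming
>>> PBS-style qsub/qstat; qstat's exact exit behavior varies by scheduler):
>>>
>>> #!/bin/sh
>>> # submit the job, then block until it leaves the queue
>>> jobid=$(qsub start_pvserver.sh)
>>> while qstat "$jobid" >/dev/null 2>&1; do
>>>     sleep 30
>>> done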
>>>
>>> On Mon, Feb 8, 2010 at 6:29 PM, burlen <[email protected]> wrote:
>>>
>>> Nice, thanks for the clarification. I am guessing that your example
>>> should probably be the recommended approach, rather than the portfwd
>>> method suggested on the PV wiki. :) I took the initiative to add it
>>> to the Wiki. KW, let me know if this is not the case!
>>>
>>> http://paraview.org/Wiki/Reverse_connection_and_port_forwarding#Reverse_connection_over_an_ssh_tunnel
>>>
>>> Would you mind taking a look to be sure I didn't miss anything or
>>> bollix it up?
>>>
>>> The sshd config options you mentioned may be why your method doesn't
>>> work on the Pleiades system; either that, or there is a firewall
>>> between the front ends and compute nodes. In either case I doubt the
>>> NAS sysadmins are going to reconfigure for me :) So at least for now
>>> I'm stuck with the two-hop ssh tunnels and interactive batch jobs.
>>> If there were some way to script the ssh tunnel in my batch script I
>>> would be golden...
>>>
>>> By the way, I put the details of the two-hop ssh tunnel on the wiki
>>> as well, and a link to Pat's hpc.mil notes. I don't dare try to
>>> summarize them since I've never used portfwd, and it refuses to
>>> compile on both my workstation and the cluster.
>>>
>>> Hopefully putting these notes on the Wiki will save future ParaView
>>> users some time and headaches.
>>>
>>> Sean Ziegeler wrote:
>>>
>>> Not quite: the pvsc calls ssh with both the tunnel options and the
>>> commands to submit the batch job. You don't even need a pvsc; it
>>> just makes the interface fancier. As long as you or PV executes
>>> something like this from your machine:
>>>
>>> ssh -R XXXX:localhost:YYYY remote_machine submit_my_job.sh
>>>
>>> This means that port XXXX on remote_machine will be the port to
>>> which the server must connect. Port YYYY (e.g., 11111) on your
>>> client machine is the one on which PV listens. You'd have to tell
>>> the server (in the batch submission script, for example) the name of
>>> the node and the port XXXX to which to connect.
>>>
>>> One caveat that might be causing you problems: port forwarding (and
>>> "gateway ports", if the server is running on a different node than
>>> the login node) must be enabled in the remote_machine's sshd_config.
>>> If not, no ssh tunnels will work at all (see: man ssh and man
>>> sshd_config). That's something that an administrator would need to
>>> set up for you.
>>>
>>> On 02/08/10 12:26, burlen wrote:
>>>
>>> So, to be sure about what you're saying: your .pvsc script ssh's to
>>> the front end and submits a batch job; when it's scheduled, your
>>> batch script creates a -R style tunnel and starts pvserver using PV
>>> reverse connection? Or are you using portfwd or a second ssh session
>>> to establish the tunnel?
>>>
>>> If you're doing this all from your .pvsc script, without a second
>>> ssh session and/or portfwd, that's awesome! I haven't been able to
>>> script this; something about the batch system prevents the tunnel
>>> created within the batch job's ssh session from working. I don't
>>> know if that's particular to this system or a general fact of life
>>> about batch systems.
>>>
>>> Question: how are you creating the tunnel in your batch script?
>>>
>>> Sean Ziegeler wrote:
>>>
>>> Both ways will work for me in most cases, i.e. a "forward"
>>> connection with ssh -L or a reverse connection with ssh -R.
>>>
>>> However, I find that the reverse method is more scriptable. You can
>>> set up a .pvsc file that the client can load, which will call ssh
>>> with the appropriate options and commands for the remote host, all
>>> from the GUI. The client will simply wait for the reverse connection
>>> from the server, whether it takes 5 seconds or 5 hours for the
>>> server to get through the batch queue.
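>>>
>>> One quick way to check that sshd caveat up front (assuming the remote
>>> config file is readable):
>>>
>>> $ ssh remote_machine grep -i gatewayports /etc/ssh/sshd_config
>>> # 'GatewayPorts yes' (or 'clientspecified') lets hosts other than the
>>> # login node itself reach the -R listener; no output means the
>>> # default 'no', i.e. only localhost may connect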
>>>
>>> Using the forward connection method, if the server isn't started
>>> soon enough, the client will attempt to connect and then fail. I've
>>> always had to log in separately, wait for the server to start
>>> running, then tell my client to connect.
>>>
>>> -Sean
>>>
>>> On 02/06/10 12:58, burlen wrote:
>>>
>>> Hi Pat,
>>>
>>> My bad. I was looking at the PV wiki, and thought you were talking
>>> about doing this without an ssh tunnel, using only port forwarding
>>> and paraview's --reverse-connection option. Now that I am reading
>>> your hpc.mil post I see what you mean :)
>>>
>>> Burlen
>>>
>>> pat marion wrote:
>>>
>>> Maybe I'm misunderstanding what you mean by local firewall, but
>>> usually as long as you can ssh from your workstation to the login
>>> node you can use a reverse ssh tunnel.
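>>>
>>> That is, something along these lines (a minimal sketch; user@login-node
>>> is a placeholder, and 11111 is the default pvserver port):
>>>
>>> workstation$ ssh -R 11111:localhost:11111 user@login-node
>>> # the firewall sees only an outbound ssh connection; on login-node,
>>> # localhost:11111 now leads back to the waiting client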
