date:20131023

[gridengine users] job not killed after reaching h_vmem

2013-10-23 Thread Arnau Bria

Hi all,

In our cluster we use virtual_free and h_vmem as consumable resources
per job:

# qconf -sc|egrep 'virtual_free|h_vmem|^#'
#name          shortcut  type    relop  requestable  consumable  default  urgency
#---------------------------------------------------------------------------------
h_vmem         h_vmem    MEMORY  <=     YES          JOB         0        0
virtual_free   vf        MEMORY  <=     YES          JOB         0        0


Yesterday I found a parallel job that asked for 64GB of h_vmem that was
using more than 100GB of mem but SGE did not kill it:

# qstat -j 2098938|grep vmem
hard resource_list:         virtual_free=64G,h_vmem=64G,h_rt=172800
usage    1:                 cpu=18:26:24, mem=111455.48587 GBs, io=1735.61545, vmem=196.038G, maxvmem=197.132G

the node ran out of memory and it killed some processes, and finally we
killed (qdel) the job:

# grep 2098938 messages
10/22/2013 18:20:49|worker|ant-master2|W|job 2098938.1 failed on host YY assumedly after job because: job 2098938.1 died through signal KILL (9)


# qacct -j 2098938 -f joao
==============================================================
qname        rg-el6
hostname     YY
group        XX
owner        jcurado
project      NONE
department   defaultdepartment
jobname      ZZ
jobnumber    2098938
taskid       undefined
account      sge
priority     0
qsub_time    Tue Oct 22 12:55:58 2013
start_time   Tue Oct 22 12:59:01 2013
end_time     Tue Oct 22 18:20:48 2013
granted_pe   smp
slots        8
failed       100 : assumedly after job
exit_status  137
ru_wallclock 19307
ru_utime     0.058
ru_stime     1.662
ru_maxrss    5412
ru_ixrss     0
ru_ismrss    0
ru_idrss     0
ru_isrss     0
ru_minflt    14819
ru_majflt    2
ru_nswap     0
ru_inblock   967416
ru_oublock   1298344
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     2324
ru_nivcsw    15165
cpu          67178.120
mem          125116.602
io           1745.077
iow          0.000
maxvmem      197.184G
arid         undefined
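
The exit_status of 137 in the accounting record is consistent with the KILL (9) reported by the qmaster, assuming the usual shell convention that a process terminated by a signal reports 128 plus the signal number (137 - 128 = 9). A minimal sketch of that decoding; the function name is made up for illustration:

```python
import signal

def signal_from_exit_status(status):
    """Return the signal name if status follows the 128+N convention, else None."""
    if status <= 128:
        return None  # ordinary exit code, no signal involved
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        return None  # no signal with that number on this platform

# exit_status taken from the qacct record above
print(signal_from_exit_status(137))
```

This only confirms that the job ended through SIGKILL; it does not say who sent the signal (the kernel's OOM killer, qdel, or execd).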

I'm looking for some extra info on node YY, but I find nothing in
messages.
That node did kill other jobs because they used more memory than
requested in h_vmem:

main|YY|W|job 1993603 exceeds job hard limit h_vmem of queue rg-el6@YY (53771632640.0  limit:53687091200.0) - sending SIGKILL
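
To check whether other jobs on the node were killed at the expected threshold, the execd warning can be parsed and the two byte counts compared. The sketch below assumes the message layout is exactly as in the line quoted above; the regular expression and function name are illustrative, not part of SGE:

```python
import re

# Layout assumed from the single execd warning quoted in this thread.
LIMIT_RE = re.compile(
    r"job (?P<job>\d+) exceeds job hard limit h_vmem of queue (?P<queue>\S+)\s+"
    r"\((?P<used>[\d.]+)\s+limit:(?P<limit>[\d.]+)\)"
)

def parse_limit_line(line):
    """Return (job_id, queue, used_bytes, limit_bytes), or None if no match."""
    m = LIMIT_RE.search(line)
    if m is None:
        return None
    return int(m["job"]), m["queue"], float(m["used"]), float(m["limit"])

line = ("main|YY|W|job 1993603 exceeds job hard limit h_vmem of queue rg-el6@YY "
        "(53771632640.0  limit:53687091200.0) - sending SIGKILL")
job, queue, used, limit = parse_limit_line(line)

GIB = 1024 ** 3
MIB = 1024 ** 2
# Limit in GiB and the amount by which the job exceeded it, in MiB.
print(job, queue, limit / GIB, (used - limit) / MIB)
```

For job 1993603 the limit works out to exactly 50 GiB, and the job was killed when it was about 80 MiB over it.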

So why did it not kill that job? How can I start debugging the problem? (I'm
submitting the exact same job.)


TIA,
Arnau
___
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users

Re: [gridengine users] subordination and RSQ conflict

2013-10-23 Thread Reuti

Hi,

On 23.10.2013 at 03:50, Ian Mortimer wrote:

 On 22/10/13 17:39, Reuti wrote:
 
 Correct - there is no look ahead feature in SGE - i.e. after the suspension
 you would be inside the granted limit again. The suspension is the result of
  another job in another queue being started.  You can try:
 
 limit queues !urgent.q hosts {*} to slots=$num_proc
 
 with feasible limits inside the urgent.q. For a short time until the
  suspension starts you allow an oversubscription (although it might
  be parts of a second only).
 
 If we do that won't it lead to nodes being oversubscribed through the
 high priority queues?

Yes, for a fraction of a second until the subordinations you set up for the 
other queues take effect - this has to be set up anyway. But you have to allow 
to start the urgent jobs, so that as a result the subordination will suspend 
the jobs in the lower queues.

NB: The suspended jobs will still hold the reserved resources like memory and 
stay also in the process tree - just stopped to be continued later.

-- Reuti

 The alternative would be to limit the low priority queue to  $num_proc
 but then there'll be times when the cluster is underutilized with free
 slots and low priority jobs pending.
 
 
 Thanks
 -- 
 Ian
 i.morti...@uq.edu.au Ian Mortimer
 Tel: +61 7 3346 8528 Science IT
  University of Queensland

Re: [gridengine users] job not killed after reaching h_vmem

2013-10-23 Thread Reuti

Hi,

On 23.10.2013 at 08:59, Arnau Bria wrote:

 In our cluster we use virtual_free and h_vmem as consumable resources
 per job:
 
 # qconf -sc|egrep 'virtual_free|h_vmem|^#'
 #name          shortcut  type    relop  requestable  consumable  default  urgency
 #---------------------------------------------------------------------------------
 h_vmem         h_vmem    MEMORY  <=     YES          JOB         0        0
 virtual_free   vf        MEMORY  <=     YES          JOB         0        0
 
 
 Yesterday I found a parallel job that asked for 64GB of h_vmem that was
 using more than 100GB of mem but SGE did not kill it:

More than 100G in total or per slot (as the limit is multiplied)?


 # qstat -j 2098938|grep vmem
 hard resource_list:         virtual_free=64G,h_vmem=64G,h_rt=172800
 usage    1:                 cpu=18:26:24, mem=111455.48587 GBs, io=1735.61545, vmem=196.038G, maxvmem=197.132G

Can you please `grep` the messages file on the executing node for other
entries of job 2098938?

-- Reuti



Re: [gridengine users] job not killed after reaching h_vmem

2013-10-23 Thread Arnau Bria

On Wed, 23 Oct 2013 10:06:12 +0200
Reuti wrote:

 Hi,
Hi Reuti,
 
  # qconf -sc|egrep 'virtual_free|h_vmem|^#'
  #name          shortcut  type    relop  requestable  consumable  default  urgency
  #---------------------------------------------------------------------------------
  h_vmem         h_vmem    MEMORY  <=     YES          JOB         0        0
  virtual_free   vf        MEMORY  <=     YES          JOB         0        0
  
  
  Yesterday I found a parallel job that asked for 64GB of h_vmem that
  was using more than 100GB of mem but SGE did not kill it:
 
 More than 100G in total or per slot (as the limit is multiplied)?
?? 

from sge_complex:

A  consumable  defined  by ’y’ is a per slot consumables which means
the limit is multiplied by the number of slots being used by the job
before being applied.  In case of ’j’ the consumable is a per job
consumable.

Doesn't JOB mean a per-job total?
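
The distinction quoted from sge_complex can be sketched as follows. This only illustrates the documented rule, not what the scheduler or execd actually does; `effective_limit` is a hypothetical helper, and the sizes and slot count are the ones from job 2098938:

```python
GIB = 1024 ** 3

def effective_limit(request_bytes, slots, consumable):
    """Limit implied by the sge_complex text: per slot for YES, per job for JOB."""
    if consumable == "YES":
        return request_bytes * slots  # multiplied by the slot count
    if consumable == "JOB":
        return request_bytes          # applied once for the whole job
    raise ValueError("unsupported consumable setting: %r" % consumable)

request = 64 * GIB      # -l h_vmem=64G
slots = 8               # granted slots in the smp PE
maxvmem = 197.132 * GIB # maxvmem reported by qstat

for setting in ("JOB", "YES"):
    limit = effective_limit(request, slots, setting)
    print(setting, limit // GIB, "GiB, exceeded:", maxvmem > limit)
```

Under the per-job reading the job was far above its 64G limit; under the per-slot reading (8 x 64G = 512G) it would still have been below it. That is arithmetic only and does not by itself show which rule was applied.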
 
 
  # qstat -j 2098938|grep vmem
  hard resource_list:         virtual_free=64G,h_vmem=64G,h_rt=172800
  usage    1:                 cpu=18:26:24, mem=111455.48587 GBs, io=1735.61545, vmem=196.038G, maxvmem=197.132G
 
 Can you please `grep` the messages file for the executing node for
 other entries of job 2098938.
 
# ls
active_jobs  job_scripts   messages-20130630.gz  messages-20130721.gz  
messages-20130811.gz  messages-20130901.gz  messages-20130922.gz  
messages-20131013.gz
execd.pid    messages  messages-20130707.gz  messages-20130728.gz  
messages-20130818.gz  messages-20130908.gz  messages-20130929.gz  
messages-20131020.gz
jobs messages-20130623.gz  messages-20130714.gz  messages-20130804.gz  
messages-20130825.gz  messages-20130915.gz  messages-20131006.gz
# zgrep 2098938 messages*
# 

There are no entries for that job.

 -- Reuti
Thanks,
Arnau


Re: [gridengine users] Welcome Home Grid Engine!

2013-10-23 Thread William Hay


On 22/10/13 13:39, John Kloss wrote:
 Should we assume that, since Univa claims ownership of all
 copyright and trademarks, _including_ the code under the SISSL,
 that Univa will be fighting to shutdown the open source versions of
 Grid Engine (open gridscheduler and Son of Grid Engine)?
 
 John Kloss II.
 
 
That might be excessively paranoid.  IANAL but I don't think owning
copyrights entitles one to revoke licenses already granted by your
predecessors in title.  If that were possible, don't you think Oracle
would have used that to shut Univa down?  Oracle fits the model of a
big, evil corporation much better than Univa.

William

 On Tue, Oct 22, 2013 at 8:12 AM, Fritz Ferstl ffer...@univa.com wrote:
 
 Hello,
 
 At the end of last week Reuti had already picked up that Oracle
 gave notification to customers that support for Oracle Grid Engine
 would transfer to Univa. Today, the transition has become official
 so allow me to provide more details and background.
 
 The Grid Engine engineering team always has been a tightly knit
 group, even prior to the days of joining Sun in 2000 and then
 throughout all the years at Sun, the one year at Oracle and now
 since Jan 2011 at Univa. Our dedication and passion is to evolve
 the Grid Engine technology and help Grid Engine users to apply Grid
 Engine successfully in their various use cases.
 
 The announcement Univa has made public today will allow us to do
 that directly for Oracle Grid Engine customers. Most noteworthy it
 will also remove confusion around Grid Engine as the transition has
 re-united the full intellectual property including trademarks and
 all copyright which my team has built over so many years. This
 encompasses code under the SISSL, the proprietary Oracle code and
 other assets like documentation, the certification and test suite,
 diagnostics tools and similar.
 
 So this is an exciting day for the Grid Engine technology and also
 for the Grid Engine team at Univa.
 
 If you wish to read more about this please see the press release
 here: http://www.univa.com/about/news/press_2013/10222013.php
 
 Best regards,
 
 Fritz
 
 --
 Fritz Ferstl | CTO and Business Development, EMEA
 Univa Corporation http://www.univa.com/ | The Data Center Optimization Company
 E-Mail: ffer...@univa.com | Phone: +49.9471.200.195 | Mobile: +49.170.819.7390
 
 Visit us at SC13 at booth #4014!


Re: [gridengine users] Welcome Home Grid Engine!

2013-10-23 Thread Fritz Ferstl

You are quite correct, Bill, that it is too early for this. We are
currently busy with onboarding and assisting the Oracle customers and with
ingesting the Oracle Grid Engine assets.

Cheers,

Fritz


   	   
 William Deegan, 23 October 2013, 02:19:

 Fritz,

 I know it's probably early to ask this, but will more of the IP be moved
 to SISSL or other open license in the near term or over time?

 Thanks,
 Bill

 Fritz Ferstl, 22 October 2013, 15:51:

 Yes, Mark. This is in essence what I was intending to respond. The SISSL
 is a recognized and liberal open source license. As such, and within the
 rules set forth by the license itself, it allows for use of the code also
 by parties who are not the copyright owners of the code under the SISSL.

 Cheers,

 Fritz

 Mark Dixon, 22 October 2013, 15:17:

 On Tue, 22 Oct 2013, John Kloss wrote:

 Yikes! That's a rather pessimistic reading, isn't it?

 The copyright of any code, even for free and open source software, is
 owned by someone. What matters is what they have licensed it to be used
 for. Last I checked, the OSI considers SISSL to be open source.

 Feel free to call me naive, but this announcement sounds like good news
 to me - Oracle were clearly not interested in gridengine.
 Congratulations to Fritz, the engineers and Univa :)

 Mark
   	   

Re: [gridengine users] job not killed after reaching h_vmem

2013-10-23 Thread Reuti

On 23.10.2013 at 10:29, Arnau Bria wrote:

 On Wed, 23 Oct 2013 10:06:12 +0200
 Reuti Reuti wrote:
 
 Hi,
 Hi Reuti,
 
 # qconf -sc|egrep 'virtual_free|h_vmem|^#'
 #name          shortcut  type    relop  requestable  consumable  default  urgency
 #---------------------------------------------------------------------------------
 h_vmem         h_vmem    MEMORY  <=     YES          JOB         0        0
 virtual_free   vf        MEMORY  <=     YES          JOB         0        0
 
 
 Yesterday I found a parallel job that asked for 64GB of h_vmem that
 was using more than 100GB of mem but SGE did not kill it:
 
 More than 100G in total or per slot (as the limit is multiplied)?
 ?? 
 
 from sge_complex:
 
 A  consumable  defined  by ’y’ is a per slot consumables which means
 the limit is multiplied by the number of slots being used by the job
 before being applied.  In case of ’j’ the consumable is a per job
 consumable.
 
 doesn't JOB mean per job total?

Hehe - I missed that JOB setting. There was already a discussion about this
symptom for 6.2u5, and I don't know whether it has been fixed since:

http://gridengine.org/pipermail/users/2013-January/005419.html

This setting has some flaws: sometimes it works, sometimes it doesn't:

$ qconf -sc
#name    shortcut  type    relop  requestable  consumable  default  urgency
...
h_vmem   h_vmem    MEMORY  <=     YES          JOB         128M     0


reuti@pc15370:~ qsub -pe openmpi 2 -l h_vmem=2M,h=pc15370 test.sh
Your job 10091 (test.sh) has been submitted
reuti@pc15370:~ qsub -pe openmpi 2 -l h_vmem=2M,h=pc15370 test.sh
Your job 10092 (test.sh) has been submitted

reuti@pc15370:~ qstat
job-ID  prior    name     user   state  submit/start at      queue          slots  ja-task-ID
---------------------------------------------------------------------------------------------
  10091 1.05000  test.sh  reuti  r      10/23/2013 16:29:28  all.q@pc15370  2
  10092 1.05000  test.sh  reuti  dr     10/23/2013 16:29:28  all.q@pc15370  2


10/23/2013 16:29:29|  main|pc15370|W|job 10092 exceeds job hard limit h_vmem of queue all.q@pc15370 (6164480.0  limit:4194304.0) - sending SIGKILL
10/23/2013 16:29:29|  main|pc15370|I|SIGNAL jid: 10092 jatask: 1 signal: KILL


But the other job 10091 survived and ended properly - and why is the limit 
4194304 and not 2M, not to mention the ulimit:

reuti@pc15370:~ grep virtual test.sh.o10091 test.sh.o10092
test.sh.o10091:virtual memory  (kbytes, -v) 10240
test.sh.o10092:virtual memory  (kbytes, -v) 10240

(If you don't request h_vmem at all, it is set to the default, but multiplied (!)
by the requested slot count, despite the JOB setting.)
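
One way to read the logged limit is plain arithmetic on the numbers above: 2M is 2097152 bytes, and twice that is exactly 4194304, the figure execd printed for the two-slot job. The sketch below assumes the upper-case suffixes K, M and G are binary multipliers (1024-based), as in SGE memory specifiers; `to_bytes` is a hypothetical helper and handles only those suffixes:

```python
# Binary multipliers for the upper-case suffixes (assumed; lower-case ones are not handled).
MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def to_bytes(spec):
    """Convert a request such as '2M' or '64G' to a byte count."""
    if spec[-1] in MULTIPLIERS:
        return int(float(spec[:-1]) * MULTIPLIERS[spec[-1]])
    return int(spec)

requested = to_bytes("2M")  # from -l h_vmem=2M
slots = 2                   # from -pe openmpi 2
logged_limit = 4194304      # limit printed by execd for job 10092

# Request in bytes, request times slots, and whether that equals the logged limit.
print(requested, requested * slots, requested * slots == logged_limit)
```

So in this test the enforced limit matches the request multiplied by the slot count rather than the per-job value. The arithmetic says nothing about why job 10091 was not killed, or where the ulimit of 10240 kbytes comes from.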

-- Reuti



Re: [gridengine users] Welcome Home Grid Engine!

2013-10-23 Thread John Kloss

On Wed, Oct 23, 2013 at 4:30 AM, William Hay w@ucl.ac.uk wrote:
 That might be excessively paranoid.  IANAL but I don't think owning
 copyrights entitles one to revoke licenses already granted by your
 predecessors in title.  If that were possible don't you think Oracle
 would have used that to shut Univa down?  Oracle fit the model of
 big,evil corporation much better than Univa.


I see Oracle's relation to Grid Engine as an elephant is to a mouse
(which, despite what comics and cartoons may lead you to believe, are
not natural enemies).  An elephant can destroy a mouse, but in general
it doesn't care a whit about the rodent's well-being, and mostly ignores
it.  I think that analogy reflects reality up to this point.

Univa is not an elephant.  It could be seen as more feline in nature
... or it could be seen as the mouse's mother (yes, the analogy breaks
down here).  Univa started off acting more like the latter when it
distributed Grid Engine as open core -- and then it stopped.

Not releasing source code is common when the code was never open in
the first place (no one ever expects Microsoft to release unencumbered
product code to the public, or even its friends ... if it has any).
But, Grid Engine under Sun was open source, and developed an open
source community.  There was at least the hope, if not the expectation
that the next steward would continue that community.  That's not what
Univa did, though.

 The announcement Univa has made public today will allow us to do that
 directly for Oracle Grid Engine customers. Most noteworthy it will also
 remove confusion around Grid Engine as the transition has re-united the
 full intellectual property including trademarks and all copyright which my
 team has built over so many years. This encompasses code under the
 SISSL, the proprietary Oracle code and other assets like documentation,
 the certification and test suite, diagnostics tools and similar.

I understand how trademark and copyright works (though my initial
email might indicate otherwise).  But what does the above mean?  If I
were to document the event model used between the qmaster and the
execd processes and request that Dave Love post that to the Son of
Grid Engine site, am I in violation of trademark, patent, or copyright
(I don't think the SISSL covers documentation or educational
material)?  Am I allowed to say such work covers Grid Engine
operations? For that matter, what about the _current_ documentation,
mailing list archives, and howtos, that Dave Love posts on the SoGE
site?  Can he even call his fork of the Grid Engine source code Son
of Grid Engine?  Doesn't that name cause confusion around Grid
Engine?

If Univa goes open core again, develops a community edition and a
developer community in general, allows and encourages patches and
additions to the Grid Engine core, welcomes documentation in formal
and/or wiki format, and acts like a steward to the Grid Engine code
base and community then, yes, Univa taking full control of the Grid
Engine trademark, copyright, assets, documentation, certification,
test suites, diagnostics tools, and similar is a fantastic change.
Really, it's great!  We all could unite around the Univa Grid Engine
moniker (though, obviously, I can only speak for myself).

Otherwise, I'd prefer the elephant.  It never seemed to notice us.

  John.

Re: [gridengine users] Welcome Home Grid Engine!

2013-10-23 Thread Joseph Farran


Are you kidding me?  NO?

Have you seen what Adaptive Computing did with Moab? They took Maui,
added to and improved it, and are now charging a fortune for Moab.

If a company wants to start from scratch with a product, fine. But to take
a product contributed by the community for free and then repackage it with
bug fixes and added features, that's not good.

We were using Maui, and the price for Moab was ridiculously expensive,
charged by node socket, which is why we ended up with Son of Grid Engine.
The price for Univa Grid Engine was equally expensive when we initially
inquired.

I don't see anything good coming out of this in the long run for us...

Joseph


On 10/22/2013 06:20 AM, ChrisDag wrote:

John Kloss wrote:

Should we assume that, since Univa claims ownership of all copyright and
trademarks, _including_ the code under the SISSL, that Univa will be
fighting to shutdown the open source versions of Grid Engine (open
gridscheduler and Son of Grid Engine)?


No. You should not make that assumption.



Re: [gridengine users] Welcome Home Grid Engine!

2013-10-23 Thread ChrisDag

Just my $.02 ...

Joseph Farran wrote:
 
 If a company wants to start from scratch with a product fine, but to
 take a product contributed
 by the community for free and then repackage it with bug fixes and added
 features, that's not
 good.

Accidental or intentional, that statement trivializes the significant efforts
of lots of people, many of whom were with Grid Engine pre-Sun
Microsystems back when it was called CODINE. The open source community
improved and enhanced the product; these people built the damn thing.

Missing from the above characterization is ...hiring almost the entire
grid engine development team (and now all of the support engineers) and
continuing to ensure that there is a stable of active people being paid
to work on it full time ... Have you looked deep into the codebase? The
learning curve is pretty extreme which can be a major obstacle to long
term success of an OSS effort.

The open source forks are doing well - I was worried that they'd be
'bugfix only' but there is real enhancement work happening. The major
risk in my mind is that I suspect the number of serious active
committers is very small.

I see/deploy numerous Grid Engine systems every year for various people
and entities. I'd say that maybe 80% of them use OGS or SoGE but the
remaining 20% use the commercial flavor and are quite happy. Univa's
roadshows and roadmaps have been impressive enough that I suspect they
will continue to do well with GE.

I'm personally very glad that both options exist and hope this situation
continues.

Sorry for being long-winded. My TL;DR summary:

 - I'm glad both options exist
 - I'm glad all of the various forks are doing well
 - I'm not willing to assume Evil on behalf of Univa. I respect both the
   management and the engineers, and the worst they've ever done to me was
   annoy me a few years back when some of their marketing and PR got a
   little over-aggressive

-dag


