[galaxy-dev] PBSpro Cluster Server possible?

2012-07-24 Thread Stephan Frickenhaus
Hello,

I have tried to couple our GALAXY-server (unified, running on a virtual server, 
ubuntu 12) with our big SGI Cluster-Server uv100 running PBSpro as job system
(pbs_version = PBSPro_11.1.1.112253).

On the GALAXY-server I have installed

ii  pbs-drmaa-dev   1.0.10-2   DRMAA for Torque/PBS Pro - devel
ii  pbs-drmaa1      1.0.10-2   DRMAA for Torque/PBS Pro - runtime

We have the GALAXY-sources installed very recently; job-creation is prepared, 
but no job is scheduled to our uv100.

Did you ever hear that the libdrmaa.so coming with this package is useful to 
couple to a PBSpro-server?

The DRMAA_LIBRARY_PATH  has been set accordingly to  
/usr/lib/libdrmaa.so.1.0.10 .

We set
default_cluster_job_runner = drmaa://uv100.awi.de/slong/

We also checked that the torque-system on the GALAXY-Server could successfully 
submit jobs by "qsub" on the PBSpro-server in the slong queue.
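
Before looking at the runner URL, it can help to confirm that the library named in DRMAA_LIBRARY_PATH loads at all. The following is only a rough sketch: check_drmaa_library is a hypothetical helper, and it checks nothing beyond whether the shared object can be opened and exports two functions from the DRMAA 1.0 C binding.

```python
import ctypes


def check_drmaa_library(path):
    """Return True if `path` loads and looks like a DRMAA 1.0 C library.

    Hypothetical helper for illustration: it only checks that the shared
    object can be opened and that it exports drmaa_init and drmaa_run_job.
    """
    if not path:
        return False
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        # File missing, wrong architecture, or unresolved dependencies
        # (for example a PBS client library that is not installed).
        return False
    return hasattr(lib, "drmaa_init") and hasattr(lib, "drmaa_run_job")
```

Calling check_drmaa_library("/usr/lib/libdrmaa.so.1.0.10") on the Galaxy server would show whether the path from this post is usable. A True result says nothing about whether the library can actually talk to the PBSpro server.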

Best regards and many thanks in advance,

S. Frickenhaus



--
--
Prof. Dr. Stephan Frickenhaus
Hochschule Bremerhaven
An der Karlstadt 8
27568 Bremerhaven
0471-4823-525
0151-1741 1631

Alfred-Wegener-Institut f. Polar- u. Meeresforschung
Am Handelshafen 12
27570 Bremerhaven
stephan.frickenh...@awi.de
0471-4831-1179  0151-1741 1631




___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/

Re: [galaxy-dev] PBSpro Cluster Server possible?

2012-07-24 Thread Sascha Kastens
Hi Stephan,

I am no expert with PBS, but your default_cluster_job_runner looks wrong.

Instead of

default_cluster_job_runner = drmaa://uv100.awi.de/slong/

you should use some PBS-specific parameters for the drmaa:/// part. I am
using SGE and specify some of the qsub parameters:

default_cluster_job_runner = drmaa://-pe make 16 -q bioinf.q/

This submits the job to our bioinf cluster on a node with 16 free slots.
So you should check how you submit jobs via the command line with PBS and
use these parameters for setting drmaa:///
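
As a rough illustration of the advice above, the sketch below builds a runner URL from native options and submits a trivial test job through the drmaa-python bindings, outside of Galaxy. The helper names are hypothetical. The option string "-q slong" is an assumption based on the queue name in the original post, and it should be checked that the PBS DRMAA library accepts it as a native specification.

```python
def runner_url(native_options):
    """Build a runner URL of the form drmaa://<native options>/."""
    return "drmaa://%s/" % native_options.strip()


def submit_test_job(native_options, command="/bin/hostname"):
    """Submit one job via DRMAA and return its job id.

    Requires the drmaa-python package and DRMAA_LIBRARY_PATH pointing at
    the libdrmaa.so to test; imported lazily so runner_url works without it.
    """
    import drmaa

    session = drmaa.Session()
    session.initialize()
    try:
        template = session.createJobTemplate()
        template.remoteCommand = command
        # Scheduler-specific options, assumed to be qsub-like (e.g. "-q slong").
        template.nativeSpecification = native_options
        job_id = session.runJob(template)
        session.deleteJobTemplate(template)
        return job_id
    finally:
        session.exit()
```

If submit_test_job("-q slong") returns a job id and the job appears in the queue, the same option string can be placed in default_cluster_job_runner via runner_url("-q slong"). If it fails, the problem is in the DRMAA layer rather than in Galaxy.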

Cheers,

Sascha Kastens

Project Manager
GATC Biotech AG
Jakob-Stadler-Platz 7
D-78467 Konstanz
Phone: +49 (0) 7531-81604110
Fax: +49 (0) 7531-816081
Email: s.kast...@gatc-biotech.com
http://www.gatc-biotech.com

[galaxy-dev] Local instance is running way too slow!!!

2012-07-24 Thread Di Nguyen

Dear All,

I successfully installed Galaxy on my new MBP with 16 GB of RAM, but when
I tried to use Galaxy, it was painfully slow! The first test I did was to
create an admin user and import data (RNA-seq FASTQ, about 6 GB in size)
into the database and then a history, and that worked fine. The second
test was to run fastqgroomer on this FASTQ file, and it took forever
(3+ hours).


Does anybody have an idea why it is so slow? Is it possible that Galaxy
was set up to run as a single process instead of using the 8-core
processor? If that is the case, how can I fix it?


Please help!

Di Nguyen
Postdoc, U of W, Seattle, WA

Re: [galaxy-dev] Local instance is running way too slow!!!

2012-07-24 Thread Langhorst, Brad
Galaxy just wraps existing tools, so it's probably not Galaxy that is slow
per se, but the fastqgroomer tool. Each tool has its own performance
characteristics.

I don't use fastqgroomer, so I don't know how it can be expected to perform.

Are you sure you need it?

If you know that your quality scores are already on the Sanger scale (Ion
Torrent and CASAVA 1.8 FASTQ files are), then you may not.
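
One way to check this before grooming is to look at the range of quality characters in the file. The sketch below is a common heuristic, not part of Galaxy, and the function names are made up for illustration. Characters below ASCII 59 only occur with the Sanger offset of 33, while characters above ASCII 74 point to an offset of 64. It assumes four lines per FASTQ record.

```python
import itertools


def guess_quality_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from FASTQ quality strings.

    Returns None if the observed characters fit both encodings.
    """
    codes = [ord(c) for line in quality_lines for c in line]
    if not codes:
        return None
    if min(codes) < 59:
        return 33  # only Sanger / Phred+33 uses characters this low
    if max(codes) > 74:
        return 64  # above 'J': older Illumina / Solexa style offset
    return None


def quality_lines_from_fastq(path, max_records=10000):
    """Return the quality line of the first `max_records` records."""
    with open(path) as handle:
        lines = itertools.islice(handle, 4 * max_records)
        # Every fourth line, starting at index 3, is a quality string.
        return [line.rstrip("\n") for i, line in enumerate(lines) if i % 4 == 3]
```

If guess_quality_offset(quality_lines_from_fastq("reads.fastq")) returns 33 (the file name is a placeholder), the file is probably already Sanger-scaled and grooming may be unnecessary. A result of None means the sample was not conclusive.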

If you look at your activity monitor you can see if CPU or disk is the limiting 
factor for the work you are doing.


Brad
--
Brad Langhorst
langho...@neb.com
978-380-7564






[galaxy-dev] Build indexes for BWA, Bowtie, and others in local instance

2012-07-24 Thread Di Nguyen

Dear all,

I just installed my local instance. In order to use the NGS tools, I need
indexes. Do I have to build these indexes species by species and program
by program, or is there a shortcut, such as Galaxy-compatible indexes
that are ready for download?


If I'm not mistaken, building these indexes can take weeks?

Kindest regards,

Di
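
For building indexes by hand, the per-genome step can be scripted. The sketch below only assembles the usual bwa and bowtie-build command lines and one tab-separated line for a Galaxy .loc file. The helper names are hypothetical, and the four-column layout (build id, dbkey, display name, index base path) is an assumption that should be checked against the .loc.sample files shipped with Galaxy.

```python
import subprocess


def index_commands(fasta, prefix):
    """Return the indexing command line for each aligner."""
    return {
        # bwa index -p <prefix> <reference.fa>
        "bwa": ["bwa", "index", "-p", prefix, fasta],
        # bowtie-build <reference.fa> <index base name>
        "bowtie": ["bowtie-build", fasta, prefix],
    }


def loc_line(dbkey, display_name, base_path):
    """Format one .loc entry, using the dbkey as the unique build id."""
    return "\t".join([dbkey, dbkey, display_name, base_path])


def build_index(tool, fasta, prefix):
    """Run one indexer; raises CalledProcessError if it fails."""
    subprocess.check_call(index_commands(fasta, prefix)[tool])
```

For example, build_index("bwa", "hg19.fa", "/data/bwa/hg19") followed by adding loc_line("hg19", "Human (hg19)", "/data/bwa/hg19") to the BWA .loc file would register one genome. The file names and paths here are placeholders.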


Re: [galaxy-dev] Local instance is running way too slow!!!

2012-07-24 Thread Kenny Sabir
Hello all,

We were having the same issues, with the groomer taking up to 12 hours
for large files. I had a look at the code and saw that it was only using
a single core. I changed the code to split the fastq input into multiple
file parts, process them in parallel, and reassemble the results. It also
reassembles the aggregator data (which prints the final summary).

Using 8 cores, we saw a 7x improvement. Naturally, the data output is
identical. One limitation is that it does not support fastq files that
have multiple lines per single sequence. I have read that this practice
is discouraged anyway as it was problematic (though it was in the
original spec), and I haven't seen this occur in our data so far.
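
The general split, process and reassemble idea can be sketched as follows. This is not the code described above. The per-record work (converting Phred+64 qualities to Phred+33) is only a stand-in for what the groomer does, all names are made up, and like the approach above it assumes exactly four lines per record.

```python
import multiprocessing


def chunk_fastq(lines, records_per_chunk):
    """Split a list of FASTQ lines into chunks of whole 4-line records."""
    step = 4 * records_per_chunk
    for start in range(0, len(lines), step):
        yield lines[start:start + step]


def groom_chunk(chunk):
    """Convert the quality line of every record from Phred+64 to Phred+33."""
    out = []
    for i in range(0, len(chunk), 4):
        header, seq, plus, qual = chunk[i:i + 4]
        sanger = "".join(chr(ord(c) - 31) for c in qual)
        out.extend([header, seq, plus, sanger])
    return out


def groom_parallel(lines, workers=8, records_per_chunk=100000):
    """Process chunks in parallel and reassemble them in input order."""
    chunks = list(chunk_fastq(lines, records_per_chunk))
    if workers <= 1:
        parts = [groom_chunk(chunk) for chunk in chunks]
    else:
        with multiprocessing.Pool(workers) as pool:
            # pool.map returns results in the same order as the input.
            parts = pool.map(groom_chunk, chunks)
    return [line for part in parts for line in part]
```

Because the chunks are returned in input order, the concatenated output should match a single-process run. On platforms that start worker processes by spawning, the call to groom_parallel needs to sit under an `if __name__ == "__main__":` guard.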

I believe there is still room to improve, as the Python readline has
suboptimal performance: it does too much file I/O without enough
buffering.
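
As a small illustration of the buffering point, the sketch below opens the file with a larger buffer and reads four lines at a time by iterating over the file object instead of calling readline repeatedly. The 1 MiB buffer size is an arbitrary assumption, read_records is a made-up name, and whether this helps in practice would need to be measured on real data.

```python
import itertools


def read_records(path, buffer_bytes=1 << 20):
    """Yield FASTQ records as lists of four lines, without newlines."""
    with open(path, "r", buffering=buffer_bytes) as handle:
        while True:
            # islice pulls the next four lines from the file iterator.
            record = list(itertools.islice(handle, 4))
            if not record:
                break
            yield [line.rstrip("\n") for line in record]
```

The generator keeps only one record in memory at a time, so it can be used on files much larger than RAM.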

I'm new to bioinformatics, though I come from a background in R&D
computer engineering. If anyone is at the Chicago Galaxy conference, you
can talk to Warren Kaplan about this. I can provide the code.

regards
Kenny

--
Bioinformatics Architect
Garvan Institute


Re: [galaxy-dev] Local instance is running way too slow!!!

2012-07-24 Thread John Chilton
If your problem really is just that fastq groomer is slow, I
implemented several small optimizations for fastq groomer that I think
resulted in a big improvement in performance. It seems it is not
really used at my institution any more, so I never pushed the changes
out to our production server or pushed too hard on the pull request.
But I did some testing as I was making the changes, and none of the
changes broke the functional tests, so there is some chance they don't
break anything. You can pull my changes from here if you are
interested:

https://bitbucket.org/galaxy/galaxy-central/pull-request/20/fastq_groomer-optimizations

-John


John Chilton
Senior Software Developer
University of Minnesota Supercomputing Institute
Office: 612-625-0917
Cell: 612-226-9223
