Hi,

On 09.04.2014 at 07:44, VG wrote:

> Hello Everyone,
> 
> Till now I have been writing simple scripts to do my work. Now I want to 
> automate my task.
> I have no idea about submitting array jobs.
> So here is the explanation of what I want to do, and then you can help me 
> design the script.
> 
> In a directory I have these 4 fastq files
> 
> A-122-3.XX.lane_1_P1_I24.sacCer3.sequence.fastq
> 
> A-122-3.XX.lane_1_P2_I24.sacCer3.sequence.fastq
> 
> A-2-3.XX.lane_1_P1_I47.sacCer3.sequence.fastq
> 
> A-2-3.XX.lane_1_P2_I47.sacCer3.sequence.fastq
> 
> I made this script to align my fastq files with yeast genome and it worked 
> perfectly fine

As far as I can see, there are two questions here: first, how to handle this 
task as an array job; second, how to add the postprocessing. That raises the 
question of whether an array job is suitable at all. Often an array job is 
submitted when you have the same application and input but need a varying 
index, for example to select a point in time or the frame of a movie you want 
to render.

Nevertheless, you can assemble a list of files first and then use the generated 
index to pick a specific line of this list, i.e. the file that should be 
processed in a particular run.


Step 1:

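$ ls *_P1*.fastq > workset

(Only the _P1 files are listed, so that there is one line, and hence one task, 
per read pair; your commands derive the matching _P2 name from it.)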


Step 2:

A jobscript.sh which you will submit:

#!/bin/bash
f=$(sed -n "${SGE_TASK_ID}p" workset)
...
your commands from below without the loop
...


Step 3:

Replace the submission command from below by a plain call to your application.
Add the necessary commands for postprocessing as you outlined below.
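
Steps 2 and 3 could be put together roughly as follows. This is only a sketch 
under a few assumptions: the helper names derive_tags and align_and_index are 
made up for illustration, samtools is assumed to be on the PATH, the samtools 
calls use the legacy syntax from your mail (newer samtools releases expect 
"samtools sort -o out.bam" instead), and the guard on SGE_TASK_ID is my 
addition so that the file can be sourced without starting a run. The bwa and 
reference paths are the ones from your script.

```shell
#!/bin/bash
# jobscript.sh -- sketch of one array task: align one read pair, then sort and index.

BWA=/apps1/bwa/bwa-0.7.5a/bwa                            # path from the original script
REF=/cork/vgupta12/S.cerevisiae/indexes/bwa/sacCer3.fa   # reference from the original script

# derive_tags FILE (hypothetical helper): split the name of a _P1 file into the
# read-group fields, using the same parameter expansions as the original loop.
derive_tags() {
    local f=$1
    LB="${f%%.*}"                        # A-122-3
    SM="$LB"
    PU="${f%%.lane_*}"; PU="${PU#*.}"    # XX
    NUM="${f%%_P*}";    NUM="${NUM##*_}" # lane number
    ID="$LB-$PU-$NUM"
    MATE1="$f"
    MATE2="${f%%_P*}_P2_${f##*_P1_}"     # same name with _P2_ instead of _P1_
}

# align_and_index (hypothetical helper): plain call to bwa, then the postprocessing.
align_and_index() {
    "$BWA" mem -t "$NSLOTS" \
        -R "@RG\tID:${ID}\tPL:illumina\tPU:${PU}\tLB:${LB}\tSM:${SM}" \
        -v 1 -a -M "$REF" "$MATE1" "$MATE2" > "${LB}.sam" || return 1
    # Legacy samtools syntax, as in the question: writes ${LB}.bam
    samtools view -bS "${LB}.sam" | samtools sort - "${LB}" || return 1
    samtools index "${LB}.bam"           # writes ${LB}.bam.bai
}

# Run only inside an array task; SGE reports "undefined" for non-array jobs.
if [[ "${SGE_TASK_ID:-undefined}" != "undefined" ]]; then
    f=$(sed -n "${SGE_TASK_ID}p" workset)
    derive_tags "$f" && align_and_index
fi
```

Because the .sam file is named after $LB, two tasks must not share the same 
library name, otherwise they would overwrite each other's output.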


Step 4:

$ qsub -t 1-$(sed -n '$=' workset)  -cwd -V -j y -pe smp 12 jobscript.sh
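
The submission line and the jobscript rely on two small sed idioms: '$=' prints 
the number of the last line (the task count), and "Np" prints only line N. The 
following dry run, which can be tried on any machine without SGE, illustrates 
both. The scratch directory, the empty stand-in files and the loop that fakes 
SGE_TASK_ID are assumptions made for the demonstration only.

```shell
#!/bin/bash
# Work in a scratch directory so the demo does not touch real data.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Empty stand-ins for the four fastq files from the question.
touch A-122-3.XX.lane_1_P1_I24.sacCer3.sequence.fastq \
      A-122-3.XX.lane_1_P2_I24.sacCer3.sequence.fastq \
      A-2-3.XX.lane_1_P1_I47.sacCer3.sequence.fastq \
      A-2-3.XX.lane_1_P2_I47.sacCer3.sequence.fastq

# One line per read pair: list only the _P1 files, as the original loop does.
ls *_P1*.fastq > workset

# '$=' prints the number of the last line, i.e. the upper bound for "qsub -t".
ntasks=$(sed -n '$=' workset)
echo "would submit tasks 1-${ntasks}"

# Each task picks the line matching its index; here the index is faked in a loop,
# whereas in a real array job SGE sets SGE_TASK_ID for every task.
for SGE_TASK_ID in $(seq 1 "$ntasks"); do
    f=$(sed -n "${SGE_TASK_ID}p" workset)
    echo "task ${SGE_TASK_ID} -> ${f}"
done
```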


Step N:

As you can see above, I replaced "-l num_proc=12" with a request for a PE 
(which you will have to set up). "num_proc" is a feature of a machine, like 
the architecture or OS. Besides being more SGE-like, the advantage of a PE is 
that it's quite easy to change the number of cores. Inside the jobscript the 
variable $NSLOTS is set, and passing "-t $NSLOTS" to your application (bwa's 
thread option, not the "-t" of qsub) instantly uses the granted number, i.e. 
running with only 6 cores needs a change in the submission command, but no 
longer in the script.
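
A small sketch of how the thread count can follow the granted slots. The 
fallback value of 1 and the array name bwa_opts are my assumptions; the 
fallback only matters for test runs outside of SGE, where $NSLOTS is not set.

```shell
#!/bin/bash
# NSLOTS is set by SGE inside a job that requested a PE; the fallback of 1 is
# an assumption for trying the script outside the cluster.
threads="${NSLOTS:-1}"

# Options for the bwa call from the original script, with the fixed "12"
# replaced by the granted number of slots.
bwa_opts=(mem -t "$threads" -v 1 -a -M)
echo "bwa ${bwa_opts[*]}"
```

Submitting with "-pe smp 6" instead of "-pe smp 12" then changes the thread 
count without touching the script.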

HTH -- Reuti



> #!/bin/bash
> 
> for f in *_P1*
> do
>     LB="${f%%.*}"
>     SM="${f%%.*}"
>     PU="${f%%.lane_*}"
>     PU="${PU#*.}"
>     NUM="${f%%_P*}"
>     NUM="${NUM##*_}"
> 
>     ID="$LB-$PU-$NUM"
> 
>     PART1="${f%%_P*}"
>     PART2="${f##*_P1_}"
> 
> qsub -l mf=30G -l num_proc=12 -cwd -V -j y -b y -N $LB 
> "/apps1/bwa/bwa-0.7.5a/bwa mem -t 12 -R 
> '@RG\tID:$ID\tPL:illumina\tPU:$PU\tLB:$LB\tSM:$SM' -v 1 -a -M 
> /cork/vgupta12/S.cerevisiae/indexes/bwa/sacCer3.fa "${PART1}_P1_${PART2}" 
> "${PART1}_P2_${PART2}" > ${LB}.sam"
> 
> done
> 
> 
> 
> Now the result I have is 2 .sam files namely A-122-3.sam and A-2-3.sam
> 
> 
> 
> This is what I want to do in the same script
> 
> Convert all the sam files into bam files. Here is the command for one sam file
> 
> samtools view -bS test.sam | samtools sort - test
> 
> Then make index files for all bam files. Command for one index bam file is
> 
> samtools index test.bam ##output is test.bam.bai
> 
> 
> 
> I have further downstream analysis, but for now I would like to get my 
> above script plus the commands I just mentioned (highlighted part) into 
> one script only.
> 
> 
> 
> Any help would be appreciated.
> 
> 
> 
> Thanks for all the help so far
> 
> 
> 
> Regards
> 
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

