Re: [galaxy-user] Getting reference index files in local galaxy install

2012-07-05 Thread Carlos Borroto
Also make sure you are using TABs to separate the fields in the .loc
file; this has bitten me several times in the past. My vim config
inserts 4 spaces instead of a TAB. To deactivate this option, you can run
:set noexpandtab.
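
A quick way to spot this problem is to count the TAB-separated fields on each line. The sketch below is only an illustration: the function name check_loc_lines is made up, the default of four fields is taken from the bwa_index.loc example quoted further down, and treating lines that start with "#" as comments is an assumption about the file.

```python
def check_loc_lines(lines, expected_fields=4):
    """Return (line_number, message) pairs for lines that are not TAB-separated.

    expected_fields=4 is an assumption based on the bwa_index.loc example
    in this thread; other .loc files may use a different number of columns.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Skip blank lines and (assumed) "#" comment lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != expected_fields:
            problems.append(
                (number, f"expected {expected_fields} TAB-separated fields, found {len(fields)}")
            )
    return problems


# Example: the second entry uses spaces instead of TABs and is reported.
sample = [
    "hg19\thg19\tHG19_BWA\t/root/Ref_INDEX/HG19BWAIndex/base/hg19.fa\n",
    "hg19    hg19    HG19_BWA    /root/Ref_INDEX/HG19BWAIndex/base/hg19.fa\n",
]
for number, message in check_loc_lines(sample):
    print(f"line {number}: {message}")
```

To check a real file, the same function can be given an open file handle instead of the sample list.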

Hope it helps,
Carlos

On Thu, Jul 5, 2012 at 4:39 AM, Avik Datta reach4a...@gmail.com wrote:
 Hi Aarti,

 Check the name of your ref file. If it is hg19.fa, then modify the loc file as follows:
 hg19   hg19   HG19_BWA   /root/Ref_INDEX/HG19BWAIndex/base/hg19.fa
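
 Since an editor may silently turn TABs into spaces, one option is to generate the entry programmatically. This is a minimal sketch: format_loc_entry and its parameter names are hypothetical labels for the four columns shown above, not names taken from Galaxy.

```python
def format_loc_entry(unique_id, dbkey, display_name, base_path):
    """Join the four columns of a .loc entry with literal TAB characters.

    The parameter names are illustrative guesses at what each column means.
    """
    return "\t".join([unique_id, dbkey, display_name, base_path])


entry = format_loc_entry(
    "hg19", "hg19", "HG19_BWA", "/root/Ref_INDEX/HG19BWAIndex/base/hg19.fa"
)
# repr() makes the TAB separators visible as \t when printed.
print(repr(entry))
```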

 Avik Datta

 On Thu, Jul 5, 2012 at 1:42 PM, Aarti Desai aarti_de...@persistent.co.in
 wrote:

 Hi,

 We have a local install of Galaxy and I’m trying to add the reference
 index files for bwa using the information provided at the following link:

 http://wiki.g2.bx.psu.edu/Admin/NGS%20Local%20Setup



 I have modified the bwa_index.loc file in the ../tool-data
 directory by adding the path to where the index is on our server (also
 attached). However, even after restarting the server, the reference genome
 does not show up when choosing the “use a built-in index” option. I’m not sure
 whether the loc file is correctly created or whether any other
 configuration file needs to be changed/updated. Any help in the matter would
 be greatly appreciated.



 Thanks,

 Aarti



 From: galaxy-user-boun...@lists.bx.psu.edu
 [mailto:galaxy-user-boun...@lists.bx.psu.edu] On Behalf Of Jennifer Jackson
 Sent: Thursday, July 05, 2012 1:23 AM
 To: Lindsey Kelly
 Cc: galaxy-user@lists.bx.psu.edu
 Subject: Re: [galaxy-user] Initial QC and grooming for Illumina HiSeq2000
 paired end RNAseq data



 Hello Lindsey,

 Yes, you have this correct. The general path would be to:

  - join forward and reverse data per run
  - run FASTQ Groomer and FastQC
    (note: if your data is already in Sanger FASTQ format with Phred+33
    quality scaled values, the datatype '.fastqsanger' can be directly
    assigned and the FASTQ Groomer step skipped. This is likely true if
    your data is from the latest CASAVA pipeline, but please double check.)
  - discard data as needed based on quality
  - split forward and reverse data that passes QC
  - concatenate all forward reads from a sample into one FASTQ file
  - concatenate all reverse reads from a sample into one FASTQ file
  - for each sample, run TopHat using the two concatenated FASTQ files
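
 For the "please double check" step in the note above, a rough heuristic is to look at the lowest quality character in a sample of reads. The sketch below is not a Galaxy tool; guess_quality_offset is a made-up name, and it assumes plain four-line FASTQ records without wrapped sequence lines. A character below ASCII 59 cannot occur in the older +64 encodings, so it points to Phred+33; anything else is left undecided.

```python
def guess_quality_offset(fastq_lines, max_records=1000):
    """Return 33 if the sampled quality lines can only be Phred+33, else None.

    Assumes four lines per record (header, sequence, '+', quality).
    None means "inconclusive", not "Phred+64": high-quality Phred+33
    data can also consist solely of characters at or above ASCII 59.
    """
    lowest = None
    for index, line in enumerate(fastq_lines):
        if index >= 4 * max_records:
            break
        if index % 4 == 3:  # every fourth line holds the quality string
            quality = line.rstrip("\r\n")
            if quality:
                smallest = min(ord(ch) for ch in quality)
                lowest = smallest if lowest is None else min(lowest, smallest)
    if lowest is not None and lowest < 59:
        return 33
    return None


record = ["@read1\n", "ACGT\n", "+\n", "!5I#\n"]
print(guess_quality_offset(record))
```

 If the function returns None, running FASTQ Groomer remains the safer choice.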

 To manipulate paired end data, please see the tools - NGS: QC and
 manipulation: FASTQ splitter and FASTQ joiner.

 To combine data files head-to-tail from multiple runs into a single FASTQ
 file, please see the tool - Text Manipulation: Concatenate datasets.
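
 Outside of Galaxy, the same head-to-tail concatenation can be sketched in a few lines. This is only a local illustration and not how the Concatenate datasets tool is implemented; concatenate_files is a hypothetical helper, and the inputs are assumed to be uncompressed FASTQ files listed in the same order for forward and reverse reads so that the pairs stay in sync.

```python
import os
import shutil


def concatenate_files(input_paths, output_path):
    """Append each input file to output_path in the order given."""
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
                # If a file lacks a trailing newline, add one so the next
                # file's first record does not fuse with this file's last line.
                if src.tell() > 0:
                    src.seek(-1, os.SEEK_END)
                    if src.read(1) != b"\n":
                        out.write(b"\n")
```

 A call such as concatenate_files(["run1_R1.fastq", "run2_R1.fastq"], "sample_R1.fastq") would then produce one forward file per sample; the file names here are placeholders.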

 I am not sure of the actual volume of data, but if the files start to get
 large or TopHat fails with a memory error, a local or cluster instance
 would be the recommendation: http://getgalaxy.org

 For reference:
 http://tophat.cbcb.umd.edu/manual.html
 http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html

 Hopefully this helps. Others are welcome to post comments/suggestions.

 Jen
 Galaxy team

 On 7/2/12 11:17 AM, Lindsey Kelly wrote:

 I am trying to do RNAseq analysis on paired-end data from the HiSeq2000.
 I have about 50 files for each sample (25 forward and 25 reverse, although
 each sample has a different number of files).

 I think that I need to:

 - convert them into FASTQ Sanger format using the FASTQ Groomer tool
 - check the quality using the FastQC tool

 I don't know how to handle this many files. Do I have to groom and run
 the QC for each file? Should I join the paired files and run both tools on
 each pair, or should I combine all of the data for each sample (which I
 don't know how to do) and then groom and run the QC for all of the reads for
 the sample?


 Thanks in advance for advice

 Lindsey




 ___
 The Galaxy User list should be used for the discussion of
 Galaxy analysis and other features on the public server
 at usegalaxy.org.  Please keep all replies on the list by
 using reply all in your mail client.  For discussion of
 local Galaxy instances and the Galaxy source code, please
 use the Galaxy Development list:

   http://lists.bx.psu.edu/listinfo/galaxy-dev

 To manage your subscriptions to this and other Galaxy lists,
 please use the interface at:

   http://lists.bx.psu.edu/

 --
 Jennifer Jackson
 http://galaxyproject.org



 DISCLAIMER == This e-mail may contain privileged and confidential
 information which is the property of Persistent Systems Ltd. It is intended
 only for the use of the individual or entity to which it is addressed. If
 you are not the intended recipient, you are not authorized to read, retain,
 copy, print, distribute or use this message. If you have received this
 communication in error, please notify the sender and delete all copies of
 this message. Persistent Systems Ltd. does not accept any liability for
 virus infected mails.



Re: [galaxy-user] Getting reference index files in local galaxy install

2012-07-05 Thread Aarti Desai
Hello Carlos,
Thanks a lot for the tip. The tab trick has fixed the problem.

Regards,
Aarti
