Re: [galaxy-user] mm7 chromosome name

2013-10-31 Thread Jennifer Jackson

Hi Jill,
I am fairly certain I found out why mm7 is not extracting: the 
database is not fully set up for use with this tool (although the data is 
present). I'll add this to the list of items to adjust this upcoming 
month (and find and fix any others like it - they would all be older DBs).


And glad the tab file is now working. Whenever you really do have just a 
tabular file, a plain text editor is best, along with the 'Convert 
spaces to tabs' option on the 'Get Data -> Upload File' form. Most 
bioinformatics folks know that any "text" output from Excel should be 
screened carefully, primarily because of inserted 'hidden' or whitespace 
characters (soft returns and such). That is not Excel's fault, nor any 
other editor's, but what you did (cycling the file through a plain text 
editor) is one way to get clean data.
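
A quick way to see what an editor left behind is to scan each line for 
anything other than printable ASCII and tabs. The following is only a 
minimal Python sketch, not part of Galaxy. The function name 
suspicious_characters is made up for illustration, and the rule it applies 
(flag everything outside printable ASCII and tab) is an assumption about 
what counts as 'hidden' here.

```python
def suspicious_characters(line):
    """Return (column, character) pairs that are neither printable ASCII nor a tab.

    Hypothetical helper for illustration. The trailing newline is ignored,
    but a carriage return left over from Windows line endings is reported.
    """
    body = line.rstrip("\n")
    return [
        (column, ch)
        for column, ch in enumerate(body)
        if ch != "\t" and not (" " <= ch <= "~")
    ]


if __name__ == "__main__":
    # A line with a non-breaking space and a Windows line ending.
    sample = "chr1\t100\u00a0200\r\n"
    for column, ch in suspicious_characters(sample):
        print(f"column {column}: {ch!r}")
```

Running this over a file line by line before uploading shows where the 
stray characters sit, which can be easier than guessing from an error 
message.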


That said, never use that upload option on any file that can contain 
internal spaces, such as GFF/GTF or SAM. For plain text tabular data, 
strict BED in particular, it can help clean up stray spaces or tabs. 
Other tools in Text manipulation can also help with data that is 
already loaded (try cutting out the columns you want to use, maybe 
after converting all whitespace to tabs first).
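
For anyone who prefers to clean a file before uploading it, the same idea 
can be sketched in a few lines of Python. This is an illustration only and 
does not show how the Galaxy upload option is implemented. The function 
name normalize_bed_line is hypothetical, and the sketch assumes strict BED 
where no field contains an internal space, so it should not be used on 
GFF/GTF or SAM.

```python
def normalize_bed_line(line):
    """Collapse every run of whitespace into a single tab.

    Hypothetical helper for illustration. Assumes no field contains an
    internal space (strict BED). Leading and trailing whitespace, including
    a Windows carriage return, is dropped.
    """
    return "\t".join(line.split())


if __name__ == "__main__":
    # Mixed spaces and tabs plus a Windows line ending.
    messy = "chr1 4558068  4561910\tregion_0 0 +\r\n"
    print(repr(normalize_bed_line(messy)))
```

Splitting on any whitespace is what makes this unsafe for formats with 
free-text fields: a GFF attribute column with spaces in it would be broken 
into extra columns.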


Thanks and glad you have a working solution. I missed the details of the 
mm7 extract issue originally - sorry if that was confusing!


Jen
Galaxy team

___
The Galaxy User list should be used for the discussion of
Galaxy analysis and other features on the public server
at usegalaxy.org.  Please keep all replies on the list by
using "reply all" in your mail client.  For discussion of
local Galaxy instances and the Galaxy source code, please
use the Galaxy Development list:

   http://lists.bx.psu.edu/listinfo/galaxy-dev

To manage your subscriptions to this and other Galaxy lists,
please use the interface at:

   http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:

   http://galaxyproject.org/search/mailinglists/


-- 
Jennifer Hillman-Jackson


Re: [galaxy-user] mm7 chromosome name

2013-10-31 Thread Kreiling, Jill
Thank you Jen.  You mentioned it may be a formatting problem and that you
were able to successfully convert the coordinates to mm8.  I tried that
several times yesterday and the regions kept coming up in the unmapped file,
flagged as deleted from the newer build.  I opened the tab-delimited text
file I had created in Excel in Notepad++ and just resaved it without changing
anything.  When I uploaded the new file to Galaxy and lifted it over to
mm8 it worked fine.  It still wouldn't pull out genomic sequences from mm7,
but it will from the new file converted to mm8.  Thank you for your help -
it is very much appreciated!

Jill




-- 
Jill Kreiling, Ph.D.
Assistant Professor, Research
Department of Molecular Biology, Cell Biology and Biochemistry
Brown University
Providence, RI 02903

Re: [galaxy-user] mm7 chromosome name

2013-10-30 Thread Jennifer Jackson

Hello Jill,

This is strange. I just pasted the region you noted below into Galaxy 
(in the 'Get Data -> Upload File' tool), assigned it to mm7, and lifted 
to mm8 without any issues. I also checked the data behind the tool - all 
appears to be fine.


Result in mm8 coordinates:

chr1 4552557 4556399 region_0 0 +


Are you certain there is not a format problem with the data? This seems 
to be the only explanation for the problem. But after one more check, 
you can submit a bug report and note that this is the problem. Be sure 
to leave the input and all error outputs undeleted when you report the 
problem or we won't be able to offer the best feedback.


It is true that UCSC only produced liftOver files that go from mm7 to 
mm6 or mm8; from mm8 you can then go to mm7, mm9, or mm10. This is just 
the data available. When lifting data this old, be aware that a genome 
can change quite a bit in some regions over three new revisions. Still, 
lifting this way is certainly something you can try. If a much older 
genome is not in Galaxy, just do the lift at UCSC (the liftOver tool is 
under "Tools" in the top blue banner).
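
To make the two-step route concrete, here is a small Python sketch that 
finds a chain of lifts between builds. It is an illustration only: the 
table lists just the conversions mentioned in this message and is not a 
complete inventory of what UCSC or Galaxy provides, and the names 
LIFTOVER_EDGES and lift_path are made up for the example.

```python
from collections import deque

# Assumption: only the conversions named in this message are listed here.
LIFTOVER_EDGES = {
    "mm7": ["mm6", "mm8"],
    "mm8": ["mm7", "mm9", "mm10"],
}


def lift_path(source, target):
    """Return the shortest list of builds leading from source to target.

    Breadth-first search over LIFTOVER_EDGES. Returns None when no chain
    of listed conversions connects the two builds.
    """
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for next_build in LIFTOVER_EDGES.get(path[-1], []):
            if next_build not in seen:
                seen.add(next_build)
                queue.append(path + [next_build])
    return None


if __name__ == "__main__":
    print(lift_path("mm7", "mm10"))
```

With the table above, the route from mm7 to mm10 goes through mm8, which 
matches the two lifts described in this message. Each extra step can drop 
regions that changed between builds, so the unmapped output is worth 
checking after every lift.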


Hopefully the problem can be sorted out, but if not we can take a look.

Jen
Galaxy team

On 10/30/13 3:04 PM, Kreiling, Jill wrote:
Hello,  I have a set of coordinates for mm7 that I have been using to try 
to extract the genomic sequences.  However, the tool doesn't recognize the 
chromosome name column.  The names are currently listed as chr1, chr2, 
chrX.  This is the error I get each time I try to extract sequences:
Chromosome by name 'chr1' was not found for build 'mm7'. Skipped 1181 
invalid lines, 1st is #1, "chr1 4558068 4561910 region_0 0 +"


However if I change the build to mm10 it works fine -  but the 
coordinates are not the same between builds.  Also, mm7 can't be 
lifted over to mm9 or mm10.


Does anyone know the proper format for chromosome names in mm7?

Thanks,
Jill





--
Jennifer Hillman-Jackson
http://galaxyproject.org
