bug#22001: Is it possible to tab separate concatenated files?

2015-11-27 Thread Erik Auerswald
Hi,

On Thu, Nov 26, 2015 at 08:28:13PM -0700, Eric Blake wrote:
> On 11/26/2015 04:52 PM, Linda Walsh wrote:
> 
> >> Because every plain
> >> text line in a file must be terminated with a newline.
> > 
> >    That's only a recent POSIX definition.  It's not related to
> > real life.  When I looked for a text file definition on google, nothing
> > was mentioned about needing a newline on the last line -- except on
> > 1 site -- and that site was clearly not talking about 'text' files, but
> > Unix-text-record files w/each record terminated by a NL char.
> > 
> 
> Quit spreading FUD about POSIX.  That definition of text file is NOT a
> recent invention; even back in POSIX 2001 the definition read:
> 
> 3.392 Text File
> 
> A file that contains characters organized into one or more lines. The
> lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
> in length, including the <newline>. Although IEEE Std 1003.1-2001 does
> not distinguish between text files and binary files (see the ISO C
> standard), many utilities only produce predictable or meaningful output
> when operating on text files. The standard utilities that have such
> restrictions always specify "text files" in their STDIN or INPUT FILES
> sections.
> http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html

At least the definition of a "line" is needed as well to understand the
above (from the same URL):

 3.205 Line

 A sequence of zero or more non-<newline>s plus a terminating <newline>.

[...]
> 
> No, it has ALWAYS been a problem.  Even 40 years ago, before POSIX was
> invented, the only PORTABLE way to use programs like sed was to use it
> on text files [...]

The sed of Solaris 10 ignores trailing text after the last line, that
is after the last newline. I am quite sure this behavior has been in
older Solaris and SunOS versions as well.

Best regards,
Erik
-- 
http://www.unix-ag.uni-kl.de/~auerswal/





bug#22001: Is it possible to tab separate concatenated files?

2015-11-26 Thread Linda Walsh





Bob Proulx wrote:
> That example shows a completely different problem.  It shows that your
> input plain text files have no terminating newline, making them
> officially[/sic/] not plain text files but binary files.
>
> Because every plain
> text line in a file must be terminated with a newline.


   That's only a recent POSIX definition.  It's not related to
real life.  When I looked for a text file definition on google, nothing
was mentioned about needing a newline on the last line -- except on
1 site -- and that site was clearly not talking about 'text' files, but
Unix-text-record files w/each record terminated by a NL char.

   On a mac, txt files have records separated by 'CR', and on DOS/Win,
txt files have txt records separated by CRLF.  Wikipedia quotes the
Unicode definition of txt files -- which doesn't require the POSIX
txt-record definition.  Also POSIX limits txt format to 'LINE_MAX' bytes --
notice it says 'bytes' and not characters.  Yet a unicode line of 256
characters can easily exceed 1024 bytes.  Yet never in the history of
the English language have lines been restricted to some number of bytes or
characters.  But one could note that the posix definition ONLY refers
to files -- not streams of TEXT (whatever the character set). 


   Specifically, note that 'TEXT COLUMNS' describes text
columns measured in column widths -- yet that conflicts with the
definition of Text File, in that text files use 'bytes' for a maximum
line length, while text columns use 'characters' (which can be
1-4 bytes in Unicode, UTF-8 or UTF-16 encoded).


   Of specific note -- "text" composed of characters MUST
support NUL (as well as the audio bell (control-g), the
backspace (control-h), vertical tab (U+000B), and form feed (U+000C)).

   No standard definition outside POSIX includes any of those
characters -- because text characters are supposed to be readable
and visible.  But POSIX compatibility claims that the Portable
Character Set
(http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html#tag_06_01)
must include those characters.

   The 'text-files-must-have-NL' group ignores the POSIX 2008
definition of a portable character set -- but gloms onto the implied
definition of a text line as part of a 'text file'.

   But as already noted, POSIX has conflicting definitions about what text
is (Unicode measured in chars/columns, or ASCII measured in bytes).  But
POSIX 2008 (same URL as above) clearly states:
"A null character, NUL, which has all bits set to zero, shall be in the
set of [supported] characters."


   In all plain-text definitions, it is mentioned that 'text' is a
set of displayable characters that can be broken into lines with the
text-line separator definition.  The last line of the file Needs No
separation character at the end of the line as it doesn't need to be
separated from anything.

   The GNU standard should not limit itself to an *arcane* (and not well
known outside of POSIX-fans) definition of text, as it makes text files
created before 2008, potentially incompatible.

   POSIX was supposed to be about portability... it certainly doesn't
follow the internet-design meme of "Accept input liberally, and generate
output conservatively."


> If they are
> not then it isn't a text line.  Must be binary.

---
   Whereas I maintain that Newlines are required to break plain-text
into records -- but not at the end-of-file, since there is no record
following.



> Why isn't there a newline at the end of the file?  Fix that and all of
> your problems and many others go away.

---
   Didn't used to be a requirement -- it was added because of a broken
interpretation of the posix standard.  Please remember that a posixified
definition of 'X' (for any X), may not be the same as a real-life 'X'.

   In this case,  we have a file containing *text* by the POSIX
def, which you claim doesn't meet the POSIX definition of "text file".
It's similar to Orwellian-speak -- redefining common terms to mean
something else, so people don't notice the requirement change, then later
telling others to clean-up their old input code/data that doesn't
meet the newly created definition.  Text files have been around a lot
longer than 8 years.  Posix disqualifies most text files, for example,
those created on the most widely used laptop/desktop/commercial computer OS
in the world (Windows).


   I think what may be true is that 'POSIX text files' describe a data
format that may not be how it is stored on disk.  I find it very
interesting in how 'NUL' is defined to be part of any POSIX text character
set definition where such apps claim to support or process 'text'.

   It's sad to see the GNU utils becoming less flexible and more
restricted over time -- much like the trend in computers to steer
the public away from general purpose processing (and computers that
can do such), to a tightly controlled, walled garden where consumers
are only allowed to do what the manufacturer tells them to do.

   

bug#22001: Is it possible to tab separate concatenated files?

2015-11-26 Thread Eric Blake
On 11/26/2015 04:52 PM, Linda Walsh wrote:

>> Because every plain
>> text line in a file must be terminated with a newline.
> 
>    That's only a recent POSIX definition.  It's not related to
> real life.  When I looked for a text file definition on google, nothing
> was mentioned about needing a newline on the last line -- except on
> 1 site -- and that site was clearly not talking about 'text' files, but
> Unix-text-record files w/each record terminated by a NL char.
> 

Quit spreading FUD about POSIX.  That definition of text file is NOT a
recent invention; even back in POSIX 2001 the definition read:

3.392 Text File

A file that contains characters organized into one or more lines. The
lines do not contain NUL characters and none can exceed {LINE_MAX} bytes
in length, including the <newline>. Although IEEE Std 1003.1-2001 does
not distinguish between text files and binary files (see the ISO C
standard), many utilities only produce predictable or meaningful output
when operating on text files. The standard utilities that have such
restrictions always specify "text files" in their STDIN or INPUT FILES
sections.
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html

That was POSIX Issue 6; the more recent POSIX Issue 7 corrected the
definition to also allow a completely empty file to be considered as a
text file.  But the point is that POSIX has always required a text file
to end in a newline.

>    On a mac, txt files have records separated by 'CR', and on DOS/Win,
> txt files have txt records separated by CRLF.

And those systems aren't POSIX.  So they aren't relevant to a discussion
about POSIX.


>> Why isn't there a newline at the end of the file?  Fix that and all of
>> your problems and many others go away.
>>   
> ---
>    Didn't used to be a requirement -- it was added because of a broken
> interpretation of the posix standard.  Please remember that a posixified
> definition of 'X' (for any X), may not be the same as a real-life 'X'.

No, it has ALWAYS been a problem.  Even 40 years ago, before POSIX was
invented, the only PORTABLE way to use programs like sed was to use it
on text files - namely, files where no line exceeded LINE_MAX bytes,
where no lines contained NUL bytes, and where ALL lines ended in
newline.  Because there were vendor implementations of sed (not GNU
coreutils, mind you, but other vendors) that really were hardcoded to
some rather small limits, and understandably so in a day when computers
did not have as much memory as they do today.  POSIX just standardized
existing practice on what formed a text file, when it came to existing
Unix systems at that time.
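
The three conditions named above (no line longer than LINE_MAX bytes, no
NUL bytes, every line ending in a newline) can be checked from the shell.
What follows is only a minimal sketch, not an official conformance test:
the function name is_posix_text is made up for illustration, and it
assumes that getconf, od, cmp and awk are available. An empty file is
accepted, following the Issue 7 wording mentioned earlier.

```shell
# is_posix_text: hypothetical helper (not a standard utility).
# Returns 0 if FILE passes the three checks below, non-zero otherwise.
is_posix_text() {
  f=$1
  [ -r "$f" ] || return 2
  # An empty file is accepted (Issue 7 allows zero lines).
  [ -s "$f" ] || return 0
  # 1. The last byte must be a newline (0x0a).
  [ "$(tail -c 1 "$f" | od -An -tx1 | tr -d ' \n')" = 0a ] || return 1
  # 2. No NUL bytes: deleting NULs must leave the data unchanged.
  tr -d '\000' < "$f" | cmp -s - "$f" || return 1
  # 3. No line longer than LINE_MAX bytes, counting its newline.
  #    LC_ALL=C makes awk count bytes rather than characters.
  max=$(getconf LINE_MAX)
  LC_ALL=C awk -v max="$max" '
    length($0) + 1 > max { bad = 1; exit }
    END { exit (bad ? 1 : 0) }
  ' "$f"
}
```

For example, "is_posix_text 1.txt || echo '1.txt is not a POSIX text file'"
reports a file whose last line lacks a newline.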

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org





bug#22001: Is it possible to tab separate concatenated files?

2015-11-23 Thread Macdonald, Kim - BCCDC
Thanks Assaf, 

Sorry for the confusion - I wanted to add a tab (or even a new line) after each 
file that was concatenated. Actually a new line may be better. 

For Example:
Concatenate the files like so:
>gi|452742846|ref|NZ_CAFD01001.1| Salmonella enterica subsp., whole genome shotgun sequenceTTTCAGCATATATATAGGCCATCATACATAGCCATATAT
>gi|452742846|ref|NZ_CAFD01002.1| Salmonella enterica subsp., whole genome shotgun sequenceCATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTACGTCGACTGACGTC
>gi|452742846|ref|NZ_CAFD01003.1| Salmonella enterica subsp., whole genome shotgun sequenceTATATAGATACATATATCGCGATATCAGACTGCATAGCGTCAG

Right now - just using cat - they look like:
>gi|452742846|ref|NZ_CAFD01001.1| Salmonella enterica subsp., whole genome shotgun sequenceTTTCAGCATATATATAGGCCATCATACATAGCCATATAT>gi|452742846|ref|NZ_CAFD01002.1| Salmonella enterica subsp., whole genome shotgun sequenceCATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTACGTCGACTGACGTC>gi|452742846|ref|NZ_CAFD01003.1| Salmonella enterica subsp., whole genome shotgun sequenceTATATAGATACATATATCGCGATATCAGACTGCATAGCGTCAG


Kim









bug#22001: Is it possible to tab separate concatenated files?

2015-11-23 Thread Assaf Gordon

Correcting myself:

On 11/23/2015 05:02 PM, Assaf Gordon wrote:
> If you have a file (one file) with spaces and you wish to convert
> them to tabs, consider the 'expand' command (then pipe to 'cat' if
> needed).



"unexpand" will convert spaces to tabs,
"expand" will convert tabs to spaces.

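A quick way to see the two directions, as a small sketch (the sample
strings are made up; note that without -a, unexpand only converts
leading blanks):

```shell
# "a", seven spaces, then "b": the spaces end at the default tab stop
# (column 8), so unexpand -a can replace them with a single tab.
printf 'a%7sb\n' '' | unexpand -a | od -c

# The reverse direction: expand turns the tab back into spaces.
printf 'a\tb\n' | expand | od -c
```

Piping through od -c makes the tab (shown as \t) and the spaces visible.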





bug#22001: Is it possible to tab separate concatenated files?

2015-11-23 Thread Bob Proulx
Macdonald, Kim - BCCDC wrote:
> Sorry for the confusion - I wanted to add a tab (or even a new line)
> after each file that was concatenated. Actually a new line may be
> better.
>
> For Example:
> Concatenate the files like so:
> >gi|452742846|ref|NZ_CAFD01001.1| Salmonella enterica subsp., whole genome shotgun sequenceTTTCAGCATATATATAGGCCATCATACATAGCCATATAT
> >gi|452742846|ref|NZ_CAFD01002.1| Salmonella enterica subsp., whole genome shotgun sequenceCATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTACGTCGACTGACGTC
> >gi|452742846|ref|NZ_CAFD01003.1| Salmonella enterica subsp., whole genome shotgun sequenceTATATAGATACATATATCGCGATATCAGACTGCATAGCGTCAG
> 
> Right now - just using cat - they look like:
> >gi|452742846|ref|NZ_CAFD01001.1| Salmonella enterica subsp., whole genome shotgun sequenceTTTCAGCATATATATAGGCCATCATACATAGCCATATAT>gi|452742846|ref|NZ_CAFD01002.1| Salmonella enterica subsp., whole genome shotgun sequenceCATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTACGTCGACTGACGTC>gi|452742846|ref|NZ_CAFD01003.1| Salmonella enterica subsp., whole genome shotgun sequenceTATATAGATACATATATCGCGATATCAGACTGCATAGCGTCAG

That example shows a completely different problem.  It shows that your
input plain text files have no terminating newline, making them
officially not plain text files but binary files.  Because every plain
text line in a file must be terminated with a newline.  If they are
not then it isn't a text line.  Must be binary.

Why isn't there a newline at the end of the file?  Fix that and all of
your problems and many others go away.

Getting ahead of things 1...

If you just can't fix the lack of a newline at the end of those files
then you must handle it explicitly.

  for f in *.txt; do
    cat "$f"
    echo
  done
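
A loop that always runs echo also adds a blank line after any file that
already ends in a newline. Here is a minimal sketch of a variant that
adds the newline only where it is missing. The function name cat_nl is
made up for illustration, and the sketch assumes the files contain no
NUL bytes.

```shell
# cat_nl: hypothetical helper, not part of coreutils.
# Concatenate the named files, adding a newline after any file whose
# last byte is not already a newline.
cat_nl() {
  for f in "$@"; do
    cat "$f"
    # Command substitution strips a trailing newline, so the result is
    # non-empty only when the file ends in some other character.
    if [ -n "$(tail -c 1 "$f")" ]; then
      echo
    fi
  done
}
```

Usage would be something like "cat_nl *.txt > combined.txt".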

Getting ahead of things 2...

Sometimes people just want a separator between files.
Actually 'tail' will already do this rather well.

  tail -n+0 *.txt
  ==> 1.txt <==
  foo

  ==> 2.txt <==
  bar

Bob





bug#22001: Is it possible to tab separate concatenated files?

2015-11-23 Thread Assaf Gordon

tag 22001 notabug
close 22001
stop

Hello Kim,

On 11/23/2015 03:50 PM, Macdonald, Kim - BCCDC wrote:
> I’m just looking at the options for the cat command – I see there’s a
> way to ignore tabs when they exist – but is there a way to tab
> separate the files you’re concatenating with the cat command?


It is unclear (to me) what you're trying to achieve - could you provide a few
more details (perhaps a short example)?

If you have a file (one file) with spaces and you wish to convert them to tabs, 
consider the 'expand' command (then pipe to 'cat' if needed).

If you have multiple files and you wish to print them side-by-side, separated 
by tabs (as opposed to one-after-the-other, as with 'cat'),
consider using 'paste':

  $ cat 1.txt
  a
  b
  c
  d

  $ cat 2.txt
  1
  2
  3
  4

  $ cat 3.txt
  w
  x
  y
  z

  $ paste 1.txt 2.txt 3.txt
  a       1       w
  b       2       x
  c       3       y
  d       4       z

regards,
 - assaf






bug#22001: Is it possible to tab separate concatenated files?

2015-11-23 Thread Assaf Gordon

Hello Kim,

On 11/23/2015 06:09 PM, Bob Proulx wrote:
> Macdonald, Kim - BCCDC wrote:
>> For Example:
>> Concatenate the files like so:
>> >gi|452742846|ref|NZ_CAFD01001.1| Salmonella enterica subsp., whole genome shotgun sequenceTTTCAGCATATATATAGGCCATCATACATAGCCATATAT
>> >gi|452742846|ref|NZ_CAFD01002.1| Salmonella enterica subsp., whole genome shotgun sequenceCATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTACGTCGACTGACGTC
>> >gi|452742846|ref|NZ_CAFD01003.1| Salmonella enterica subsp., whole genome shotgun sequenceTATATAGATACATATATCGCGATATCAGACTGCATAGCGTCAG
>
> That example shows a completely different problem.  It shows that your
> input plain text files have no terminating newline, making them
> officially not plain text files but binary files.


Based on the content of your files, I'm guessing that you are working with
mangled FASTA files.
In that case, it is possible that fixing the original files might be more 
efficient than trying to amend them later on.

The original FASTA files likely looked like so:

>gi|452742846|ref|NZ_CAFD01001.1| Salmonella enterica subsp., whole genome shotgun sequence
TTTCAGCATATATATAGGCCATCATACATAGCCATATAT

And I'm also guessing that with some script you've removed the ">" prefix and 
joined the two lines into one.

First,
I suggest ensuring the original files have unix-style new-lines (LF) and not 
windows style (CR-LF) or Mac-style (CR).
The programs 'dos2unix' and 'mac2unix' can fix this:
simply run the appropriate program on each file, and it will convert the file
in place.
I would also recommend ensuring each file does end with a newline.
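
If dos2unix is not installed, both steps can be approximated with
standard tools. The following is a minimal sketch that uses tr rather
than dos2unix; the function name to_unix_text is made up for
illustration, and it writes a new file instead of changing the input.

```shell
# to_unix_text: hypothetical helper. Usage: to_unix_text INPUT OUTPUT
# Writes a copy of INPUT with CR-LF line endings changed to LF and with
# a final newline added if the last line lacks one.
to_unix_text() {
  # Delete every carriage return. This suits CR-LF files only; a file
  # that uses a bare CR as its line ending would need: tr '\r' '\n'
  tr -d '\r' < "$1" > "$2"
  # Add the missing final newline, if any. Command substitution strips
  # a trailing newline, so the test is true only for other last bytes.
  if [ -n "$(tail -c 1 "$2")" ]; then
    echo >> "$2"
  fi
}
```

For example, "to_unix_text 1.fa 1.unix.fa" leaves 1.fa untouched and
writes the cleaned copy to 1.unix.fa.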


Second,
The FASTA id (the long text before your nucleotide sequence) contains spaces, 
and this will make downstream processing a bit of a pain.
I would recommend trimming the FASTA identifier and keeping only the first part 
(since it contains your IDs, you should have no problem
recovering the organism name later).

Example:

  $ cat 1.fa
  >gi|452742846|ref|NZ_CAFD01001.1|  Salmonella enterica subsp., whole genome shotgun sequence
  TTTCAGCATATATATAGGCCATCATACATAGCCATATAT

  $ sed '/^>/s/ .*$//' 1.fa > 2.fa

  $ cat 2.fa
  >gi|452742846|ref|NZ_CAFD01001.1|
  TTTCAGCATATATATAGGCCATCATACATAGCCATATAT

Or do it in place for all your FASTA files (be sure to have a backup, though):

   for i in *.fa ; do sed -i '/^>/s/ .*$//' "$i" ; done


Third,
To combine and convert the files into a table (i.e. 1st column=ID, 2nd 
column=sequence),
then, assuming all your sequences are short and contained on one line, the 
following would work:

  $ cat 2.fa
  >gi|452742846|ref|NZ_CAFD01001.1|
  TTTCAGCATATATATAGGCCATCATACATAGCCATATAT

  $ cat 3.fa
  >gi|452742846|ref|NZ_CAFD01002.1|
  CATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTACGTCGACTGACGTC

  $ cat *.fa | paste - - | sed 's/^>//' > final.txt

  $ cat final.txt
  gi|452742846|ref|NZ_CAFD01001.1|  TTTCAGCATATATATAGGCCATCATACATAGCCATATAT
  gi|452742846|ref|NZ_CAFD01002.1|  CATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTACGTCGACTGACGTC

The resulting 'final.txt' will be an easy-to-work-with tabular file.


Fourth,
If your FASTA files contain long multi-line sequences, like so:

   >gi|452742846|ref|NZ_CAFD01002.1|
   CATAGCCATATATACTAGCTGACTGACGTCGCAGCTGGTCAGACTGACGTAC
   GTCGACTGACGTCTGTACACCACACGTTGTGACGAGCATCGACTAGCATCAG
   TTGAGCGACATCATCAGCGACGAGATCACGAGCACTAGCACTACGACTACGA

You might consider using a specialized tool to convert them to a table, such as:
 http://manpages.ubuntu.com/manpages/trusty/man1/fasta_formatter.1.html (*)
 or http://kirill-kryukov.com/study/tools/fasta-formatter/ .
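
If installing an extra tool is not an option, the same conversion can be
approximated with awk. This is a rough sketch and not how either tool
above works internally: the function name fasta_to_table is made up, and
it assumes LF line endings and that every record starts with a ">" line.

```shell
# fasta_to_table: hypothetical helper, not part of any FASTA toolkit.
# Prints one line per record: the ID (without ">"), a tab, then the
# sequence with its continuation lines joined together.
fasta_to_table() {
  awk '
    /^>/ {
      # A new header: flush the previous record, if there was one.
      if (seen) print id "\t" seq
      seen = 1
      id = substr($0, 2)   # drop the leading ">"
      seq = ""
      next
    }
    { seq = seq $0 }       # append a sequence line to the current record
    END { if (seen) print id "\t" seq }
  ' "$@"
}
```

Something like "fasta_to_table *.fa > final.txt" would then give the
same two-column layout as the paste example.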

Hope this helps,
 - assaf



(* shameless plug: I wrote fasta_formatter long ago)






bug#22001: Is it possible to tab separate concatenated files?

2015-11-23 Thread Macdonald, Kim - BCCDC
Thanks so much!!! I'll try these out now

Kim








bug#22001: Is it possible to tab separate concatenated files?

2015-11-23 Thread Macdonald, Kim - BCCDC
Hi!

I'm just looking at the options for the cat command - I see there's a way to 
ignore tabs when they exist - but is there a way to tab separate the files 
you're concatenating with the cat command?

Thanks,
Kim