On Thu, May 3, 2012 at 11:24 AM, Matt Oates (Home) <[email protected]> wrote:
> On 2 May 2012 15:23, Ole Tange <[email protected]> wrote:
>> On Tue, May 1, 2012 at 9:42 AM, Matt Oates (Home) <[email protected]>
>> wrote:
>>> On 30 April 2012 21:51, Ole Tange <[email protected]> wrote:
>>>> On Thu, Apr 26, 2012 at 12:20 AM, Matt Oates (Home) <[email protected]>
>>>> wrote:
>> :
>>>>> I then want to run something of the form:
>>>>>
>>>>> parallel -C '\t' -N 1 --pipe "myprogram /dev/stdin | cat <(echo {1}) -" < file.tab | output-processing-program > results.tab
:
>> 21501699 MSAFFPVISSLNPAVPSVAAP
>> 21501700 MIGGILSCGITHTGITPLDVV
>> 21501701 MVIAIAKYFGWPLDQLDVVTA
>> 21501702 MKWHPDKNKNNLVEAQYRFQE
>>
>> If I understand you correctly, you want:
>>
>> echo 21501699
>> printf "21501699\tMSAFFPVISSLNPAVPSVAAP" | myprogram /dev/stdin
:
> > cat foo.tab | parallel -C '\t' 'echo {1}; printf "{1}\t{2}\n" | myprogram /dev/stdin'
:
> Yes, this is what I want to achieve in theory, but the protein sequences
> are too long to be used as command-line arguments: they overflow the
> UNIX command-line buffer.
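A rough way to check whether a file really hits the limit is to compare its longest record with `getconf ARG_MAX`. The helper names `longest_record` and `too_long_for_args` are hypothetical. ARG_MAX covers all arguments plus the environment, so it is only an upper bound: on Linux a single argument is also capped separately (MAX_ARG_STRLEN, 128 KiB with 4 KiB pages), and a smaller limit like that can be passed as the optional second argument.

```shell
# Print the length in bytes of the longest line in a file.
# LC_ALL=C makes awk count bytes rather than multibyte characters.
longest_record() {
    LC_ALL=C awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1"
}

# Succeed (exit status 0) if some record in FILE is at least LIMIT bytes long.
# LIMIT defaults to ARG_MAX as reported by getconf; this is an assumption
# about which limit matters, so pass a smaller one if your system has it.
# Usage: too_long_for_args FILE [LIMIT]
too_long_for_args() {
    limit=${2:-$(getconf ARG_MAX)}
    [ "$(longest_record "$1")" -ge "$limit" ]
}
```

For example, `too_long_for_args file.tab 131072 && echo "some records are too long"` tests against the Linux per-argument cap instead of ARG_MAX.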
Use awk to print the short argument (the ID in column 1) on a line of its
own, followed by the unchanged full line:
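cat file.tab | awk '{ print $1; print }' | parallel --pipe -N2 'read -r A; echo "$A"; myprogram'
Each job gets two lines on stdin: read takes the ID, and myprogram reads the
full record from what is left, so the sequence never becomes an argument.
This works for sequences up to at least 50 MB.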
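The job body can be tried without GNU parallel. In this sketch `myprogram` is a hypothetical stand-in that prints the sequence length, `add_id_lines` is a made-up name for the awk step above, and `job` is the snippet parallel runs for each two-line chunk. It relies on the shell's `read` builtin consuming only the first line of a pipe, which bash and dash do by reading one byte at a time from non-seekable input.

```shell
# Hypothetical stand-in for the real myprogram: reads tab-separated
# records from stdin and prints the length of the sequence in column 2.
myprogram() {
    awk -F'\t' '{ print length($2) }'
}

# The awk step: for every record, print the ID (column 1) on a line of
# its own, followed by the unchanged full line.
add_id_lines() {
    awk '{ print $1; print }' "$1"
}

# The job body run by `parallel --pipe -N2` for each two-line chunk:
# read consumes the ID line only, and the full record is left on stdin
# for myprogram, so the long sequence is never passed as an argument.
job() {
    read -r A
    echo "$A"
    myprogram
}
```

For a single record, `add_id_lines file.tab | head -n 2 | job` prints the ID followed by whatever `myprogram` outputs; with the real program installed, the stand-in is dropped and the parallel pipeline above does the chunking.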
/Ole