Hello everyone,
I am new to this list and to the parallel command. I hope the answer to the
following question is not too obvious, but that it is clear enough to get
some advice :)
I have to process a big file, and I have been reading about the parallel
command so that I can use more than one processor core when running sed,
sort and so on. As a first step, I want to change the first line of every
group of four lines (because of the naming conventions of this kind of
file, FastQ format).
For example, this would be a group of four, and I want to modify the first
line:
cat sbcc073_pcm_ill_all.musket_default.fastq | head -4
@HWUSI-EAS1752R:29:FC64CL3AAXX:8:65:16525:4289*_1:N:0:ACTTGA*
GCGAGAGAAT
+
GHHHHHHHHHH
The following command does the job:
cat sbcc073_pcm_ill_all.musket_default.fastq | head -4 | sed 's#^\(@.*\)_\([12]\).*#\1/\2#'
@HWUSI-EAS1752R:29:FC64CL3AAXX:8:65:16525:4289*/1*
GCGAGAGAAT
+
GHHHHHHHHHH
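
Since the goal is to touch only the first line of every four, here is a small self-contained sketch of how that restriction could be checked. It assumes GNU sed (the `1~4` address, meaning line 1 and every fourth line after it, is a GNU extension) and that the asterisks in the listings above are only emphasis, not part of the data. The second record is made up for illustration: its quality line starts with `@` and contains `_1`, so the unrestricted expression would rewrite it as well.

```shell
#!/usr/bin/env bash
# Two FastQ records. The second one is hypothetical: its quality line
# starts with '@' and contains '_1', which the plain expression also matches.
sample=$(printf '%s\n' \
  '@HWUSI-EAS1752R:29:FC64CL3AAXX:8:65:16525:4289_1:N:0:ACTTGA' \
  'GCGAGAGAAT' \
  '+' \
  'GHHHHHHHHHH' \
  '@HYPOTHETICAL:1:2:3_2:N:0:ACTTGA' \
  'GCGAGAGAAT' \
  '+' \
  '@HHH_1HHHH')

# Expression applied to every line: the quality line on line 8 is changed too.
unrestricted=$(printf '%s\n' "$sample" | sed 's#^\(@.*\)_\([12]\).*#\1/\2#')

# Expression applied only to lines 1, 5, 9, ... (GNU sed "first~step" address).
restricted=$(printf '%s\n' "$sample" | sed '1~4 s#^\(@.*\)_\([12]\).*#\1/\2#')

printf '%s\n' "$restricted"
```

Note that the line numbers are counted from the start of whatever input sed receives, so this only works as intended if each sed process starts reading at a record boundary.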
However, when I use parallel, it does not seem to recognize the capture
group parentheses:
cat sbcc073_pcm_ill_all.musket_default.fastq | head -4 | parallel --pipe sed 's#^\(@.*\)_\([12]\).*#\1/\2#'
@HWUSI-EAS1752R:29:FC64CL3AAXX:8:65:16525:4289*_1:N:0:ACTTGA*
GCGAGAGAAT
+
GHHHHHHHHHH
When I remove the backslashes or use sed -r, the command fails with:
/bin/bash: -c: line 3: syntax error near unexpected token `('
/bin/bash: -c: line 3: ` (cat /tmp/60xrxvCIRX.chr; rm /tmp/60xrxvCIRX.chr; cat - ) | (sed s#^(@.*)_([12]).*#\1/\2# );'
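
In that error text the single quotes around the sed expression are gone, and the `/bin/bash: -c:` prefix suggests the command is being parsed by a second shell. The sketch below reproduces both symptoms without parallel, under exactly that assumption: it passes the command as a string to `bash -c`, which is only a stand-in and not a claim about how parallel works internally. The header is the one from the example, assuming the asterisks are emphasis only; the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
header='@HWUSI-EAS1752R:29:FC64CL3AAXX:8:65:16525:4289_1:N:0:ACTTGA'
bre='s#^\(@.*\)_\([12]\).*#\1/\2#'   # expression with backslashes
ere='s#^(@.*)_([12]).*#\1/\2#'       # expression for sed -r, no backslashes

# Parsed by one shell only: sed receives the expression intact.
once=$(printf '%s\n' "$header" | sed "$bre")

# Parsed again by a second shell without quotes: that shell consumes the
# backslashes, sed receives s#^(@.*)_([12]).*#1/2# and nothing matches.
twice=$(printf '%s\n' "$header" | bash -c "sed $bre")

# Same without backslashes: the second shell sees bare parentheses and
# reports a syntax error before sed is even started.
err=$(printf '%s\n' "$header" | bash -c "sed -r $ere" 2>&1) || true

# With an extra layer of quotes the second shell passes the expression on intact.
requoted=$(printf '%s\n' "$header" | bash -c "sed '$bre'")

printf 'once:     %s\ntwice:    %s\nerror:    %s\nrequoted: %s\n' \
  "$once" "$twice" "$err" "$requoted"
```

If parallel does re-parse the command this way, an extra layer of quoting around the sed expression would be the thing to try, but that has not been confirmed here.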
Could anyone shed some light on this?
Thank you very much.