Re: [R] parsing text files

2012-03-09 Thread jim holtman
Here is one way of doing it; it reads the file and creates a 'long' version.

##
input <- file("/temp/ClinicalReports.txt", 'r')
outFile <- '/temp/output.txt'  #  tempfile()
output <- file(outFile, 'w')
writeLines("ID,Date,variable,value", output)
ID <- NULL      # set once an 'Acc.ne' line has been seen
dataSw <- NULL  # set while we are inside a block of data lines
repeat{
    line <- readLines(input, n = 1)
    if (length(line) == 0) break  # end of file
    if (!is.null(dataSw)){
        if (line == ''){  # end of data
            ID <- NULL
            dataSw <- NULL
            next
        }
        # now write CSV output file
        cat(ID
          , ','
          , Date
          , ','
          , substring(line, 1, 31)
          , ','
          , substring(line, 32, 43)
          , '\n'
          , sep = ''
          , file = output
          )
        next
    }
    if (grepl("Acc.ne", line)){
        # ID and date sit at fixed positions on this line
        ID <- (substring(line, 29, 35))
        Date <- (substring(line, 52, 61))
        next
    }
    if (!is.null(ID)){  # looking for Esame
        if (grepl("Esame", line)){
            # skip two lines
            readLines(input, n = 2)
            dataSw <- 1
            next
        }
    }

}

# now read in the data in a long format
close(input)
close(output)
result <- read.csv(outFile, as.is = TRUE)


The result from your test data is:

> str(result)
'data.frame':   43 obs. of  4 variables:
 $ ID      : int  185 185 185 185 185 185 185 185 185 185 ...
 $ Date    : chr  "05/12/2011" "05/12/2011" "05/12/2011" "05/12/2011" ...
 $ variable: chr  "AZOTEMIA" "CREATININEMIA" "SODIEMIA" "POTASSIEMIA" ...
 $ value   : num  33.6 0.99 136 4.22 94.2 8.68 1.87 1.79 189 118 ...
> head(result)
   ID       Date      variable  value
1 185 05/12/2011      AZOTEMIA  33.60
2 185 05/12/2011 CREATININEMIA   0.99
3 185 05/12/2011      SODIEMIA 136.00
4 185 05/12/2011   POTASSIEMIA   4.22
5 185 05/12/2011      CLOREMIA  94.20
6 185 05/12/2011      CALCEMIA   8.68
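
The long table can be pivoted into the one-row-per-report layout asked for in the original post. What follows is only a minimal sketch using base R's reshape(), run on a small hand-made data frame rather than on the real file. The first three values are taken from the head(result) output; the second report (ID 186) and its values are invented for illustration. Trimming the variable names is an assumption, made because fixed-width substring() fields may carry padding.

```r
# Toy long-format data shaped like `result`.
# Rows for ID 186 are invented for illustration only.
long <- data.frame(
  ID       = c(185, 185, 185, 186, 186, 186),
  Date     = c(rep("05/12/2011", 3), rep("06/12/2011", 3)),
  variable = c("AZOTEMIA   ", "CREATININEMIA ", "SODIEMIA",
               "AZOTEMIA   ", "CREATININEMIA ", "SODIEMIA"),
  value    = c(33.6, 0.99, 136, 40.1, 1.10, 140),
  stringsAsFactors = FALSE
)

# Remove padding so each parameter yields exactly one column name.
long$variable <- trimws(long$variable)

# One row per (ID, Date); one column per parameter.
wide <- reshape(long,
                idvar     = c("ID", "Date"),
                timevar   = "variable",
                direction = "wide")

# reshape() prefixes the new columns with "value."; drop that prefix.
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```

With the real data, `long` would simply be replaced by `result`; a parameter missing from a report would show up as NA in that report's row.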



On Thu, Mar 8, 2012 at 8:24 AM, ginger bi...@igm.cnr.it wrote:
 Ooops,
 I forgot to specify that for each row, containing the records of one clinical
 report, the values of the 22 parameter measurements have to be reported.
 For example, first row, first six columns:
 ID    DATE         GLICEMIA   AZOTEMIA   CREATININEMIA   SODIEMIA   ...
 185   05/12/2011   115        33.6       0.99            136        ...

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/parsing-text-files-tp4456355p4456389.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



[R] parsing text files

2012-03-08 Thread ginger
Hello, I have a .txt file with many clinical exam reports (two examples of
which are attached to this message).
I have to create a data frame with as many rows as there are clinical exam
reports in the text file, and 24 columns:
- the first (to be labelled ID) with an identification code: the number on
  the 13th line of the clinical report, following the string "Acc.ne n."
- the second (to be labelled DATE) with the date of blood sampling: the date
  that follows the identification code, again on the 13th line
- the following 22 columns (to be labelled with the names of the parameters
  on lines 20 to 41, GLICEMIA ... COLESTEROLO LDL)

I searched the mailing list and tried to begin with something like:

# read the text file
reports <- readLines("ClinicalReports.txt")
# process the file starting at each "Acc.ne n." line
serologic <- lapply(which(grepl("^Acc.ne n.", reports)), function(.line)

but I'm a biostatistician with almost no expertise in programming and I
really need your help! Please!!!
http://r.789695.n4.nabble.com/file/n4456355/ClinicalReports.txt
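
The idea of starting from each "Acc.ne n." line can be carried a little further by splitting the lines into one block per report. This is only a rough sketch under stated assumptions: the sample lines below are invented stand-ins for readLines("ClinicalReports.txt"), and the real layout of the attached file may well differ, so the regular expressions would need adjusting.

```r
# Invented sample lines; the real report layout is not shown in this thread.
reports <- c(
  "header text",
  "Acc.ne n. 185 del 05/12/2011",
  "GLICEMIA 115",
  "",
  "header text",
  "Acc.ne n. 186 del 06/12/2011",
  "GLICEMIA 98"
)

# TRUE on every line that opens a report (literal match, no regex).
starts <- grepl("Acc.ne n.", reports, fixed = TRUE)

# Running count of report starts: lines sharing a count belong together.
block_id <- cumsum(starts)

# Drop anything before the first report, then split into one block each.
keep   <- block_id > 0
blocks <- split(reports[keep], block_id[keep])

# Pull the ID and the date out of the first line of each block.
ids <- unname(sapply(blocks, function(b)
  sub(".*Acc\\.ne n\\. *([0-9]+).*", "\\1", b[1])))
dates <- unname(sapply(blocks, function(b)
  regmatches(b[1], regexpr("[0-9]{2}/[0-9]{2}/[0-9]{4}", b[1]))))

print(data.frame(ID = ids, DATE = dates, stringsAsFactors = FALSE))
```

Each element of `blocks` can then be processed on its own, for example to pick out the parameter lines of that report.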

--
View this message in context: 
http://r.789695.n4.nabble.com/parsing-text-files-tp4456355p4456355.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] parsing text files

2012-03-08 Thread ginger
Ooops,
I forgot to specify that for each row, containing the records of one clinical
report, the values of the 22 parameter measurements have to be reported.
For example, first row, first six columns:
ID    DATE         GLICEMIA   AZOTEMIA   CREATININEMIA   SODIEMIA   ...
185   05/12/2011   115        33.6       0.99            136        ...
