Christoph Lehmann wrote:
entry from html:
<tr bgcolor=#9090f0><td align="right"><b>BM</b></td><td> 0.952</td><td> 0.136</td><td> 6.984</td><td>0.000000</td></tr>
<tr bgcolor=#9090f0><td align="right"><b>BH</b></td><td> 1.338</td><td> 0.136</td><td> 9.821</td><td>0.000000</td></tr>
using left.data<- scan(paste(path, left.file, sep = ""), what = 'character', sep=c("<td>", "</td>"))
yields
> left.data
 [1] " "                   "tr bgcolor=#9090f0>" "td align=right>"
 [4] "b>BM"                "/b>"                 "/td>"
 [7] "td> 0.952"           "/td>"                "td> 0.136"
[10] "/td>"                "td> 6.984"           "/td>"
[13] "td>0.000000"         "/td>"                "/tr>"
[16] " "                   "tr bgcolor=#9090f0>" "td align=right>"
[19] "b>BH"                "/b>"                 "/td>"
[22] "td> 1.338"           "/td>"                "td> 0.136"
[25] "/td>"                "td> 9.821"           "/td>"
[28] "td>0.000000"         "/td>"                "/tr>"
Why doesn't it detect the whole '<td>' as sep?
Uwe Ligges wrote:
Christoph Lehmann wrote:
Hi
I am trying to import HTML text and I need to split the fields at each <td> or </td> entry.
How can I do this? sep = '<td>' doesn't yield the right result.
If it fits pairwise together, use sep=c("<td>", "</td>")
Apologies, one should not send untested code.
"sep" must be a single character, not a string containing more than one character.
So you may want to try out my second suggestion.
Uwe Ligges
If not, you can read the whole lot with readLines and then use strsplit on both patterns, for example.
Uwe Ligges
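For the archive, the readLines/strsplit suggestion could look roughly like the sketch below. It is only an illustration under some assumptions: the two sample rows are copied from the original post (in practice they would come from readLines() on the file), split_row is a made-up helper name, and the regular expression assumes simple, well-formed <td> tags as in the example. It is not a general HTML parser.

```r
# Two sample rows copied from the original post; with a real file one would
# use something like: html <- readLines(paste(path, left.file, sep = ""))
html <- c(
  '<tr bgcolor=#9090f0><td align="right"><b>BM</b></td><td> 0.952</td><td> 0.136</td><td> 6.984</td><td>0.000000</td></tr>',
  '<tr bgcolor=#9090f0><td align="right"><b>BH</b></td><td> 1.338</td><td> 0.136</td><td> 9.821</td><td>0.000000</td></tr>'
)

# Hypothetical helper (not from the thread): split one line at every opening
# or closing td tag, strip any remaining tags, and drop empty pieces.
split_row <- function(line) {
  parts <- strsplit(line, "</?td[^>]*>")[[1]]   # regex matches <td>, <td ...>, </td>
  parts <- trimws(gsub("<[^>]*>", "", parts))   # remove <tr>, <b>, </b>, </tr>
  parts[nzchar(parts)]
}

# One row of the matrix per HTML row; first column is the label.
cells <- do.call(rbind, lapply(html, split_row))

# Convert the remaining columns to numbers and keep the labels as row names.
values <- matrix(as.numeric(cells[, -1]), nrow = nrow(cells))
rownames(values) <- cells[, 1]
print(values)
```

Unlike scan(), strsplit() takes a regular expression, so a multi-character separator such as <td> or </td> is matched as a whole.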
thanks for hints
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
