Christoph Lehmann wrote:

entry from html:

<tr bgcolor=#9090f0><td align="right"><b>BM</b></td><td> 0.952</td><td> 0.136</td><td> 6.984</td><td>0.000000</td></tr>
<tr bgcolor=#9090f0><td align="right"><b>BH</b></td><td> 1.338</td><td> 0.136</td><td> 9.821</td><td>0.000000</td></tr>




 using
left.data<- scan(paste(path, left.file, sep = ""), what = 'character',
               sep=c("<td>", "</td>"))


yields

 > left.data
 [1] "  "                  "tr bgcolor=#9090f0>" "td align=right>"
 [4] "b>BM"                "/b>"                 "/td>"
 [7] "td> 0.952"           "/td>"                "td> 0.136"
[10] "/td>"                "td> 6.984"           "/td>"
[13] "td>0.000000"         "/td>"                "/tr>"
[16] "  "                  "tr bgcolor=#9090f0>" "td align=right>"
[19] "b>BH"                "/b>"                 "/td>"
[22] "td> 1.338"           "/td>"                "td> 0.136"
[25] "/td>"                "td> 9.821"           "/td>"
[28] "td>0.000000"         "/td>"                "/tr>"

Why doesn't it detect the whole '<tr>' as sep?


Uwe Ligges wrote:

Christoph Lehmann wrote:

Hi
I am trying to import HTML text and need to split the fields at each <td> or </td> entry.


How can I succeed? sep = '<td>' doesn't yield the right result.


If it fits pairwise together, use
  sep=c("<td>", "</td>")

Apologies, one should not send untested code.
"sep" must be a single character rather than a string containing more than one character.


So you may want to try out my second suggestion.
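
A small sketch of why the fields in the scan() output quoted earlier in this thread look the way they do. It assumes, judging only from that output, that just the first character of the separator, "<", ended up being used. strsplit() with fixed = TRUE stands in for scan() here purely for illustration, and the names cell and pieces are made up for the example:

```r
# One table cell from the HTML quoted in this thread.
cell <- "<td> 0.952</td>"

# Split at the single character "<" only (fixed = TRUE: no regex involved).
pieces <- strsplit(cell, "<", fixed = TRUE)[[1]]

# pieces is c("", "td> 0.952", "/td>"): the "<" is consumed as the
# separator, while "td>" and "/td>" remain part of the fields.
print(pieces)
```

The "td> 0.952" and "/td>" entries match what left.data shows above.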

Uwe Ligges





If not, you can read the whole lot with readLines() and then use strsplit() on both patterns, for example.
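
A minimal sketch of that suggestion, using the two rows quoted in this thread. The lines are given inline as a stand-in for readLines() on the real file, and the names html, cells, tab and values are made up for the example. The pattern also allows attributes inside the opening tag (as in <td align="right">), which goes slightly beyond a literal "<td>":

```r
# Stand-in for: html <- readLines(paste(path, left.file, sep = ""))
html <- c(
  '<tr bgcolor=#9090f0><td align="right"><b>BM</b></td><td> 0.952</td><td> 0.136</td><td> 6.984</td><td>0.000000</td></tr>',
  '<tr bgcolor=#9090f0><td align="right"><b>BH</b></td><td> 1.338</td><td> 0.136</td><td> 9.821</td><td>0.000000</td></tr>'
)

# strsplit() takes a regular expression, so "<td ...>" and "</td>"
# can both be handled by one multi-character pattern.
cells <- lapply(strsplit(html, "</?td[^>]*>"), function(x) {
  x <- trimws(gsub("<[^>]+>", "", x))  # drop leftover tags such as <tr>, <b>
  x[nzchar(x)]                         # discard empty pieces between tags
})

# One row per HTML line: first column is the label, the rest are numbers.
tab <- do.call(rbind, cells)
values <- matrix(as.numeric(tab[, -1]), nrow = nrow(tab),
                 dimnames = list(tab[, 1], NULL))
print(values)
```

This assumes every row has the same number of cells; if that does not hold, do.call(rbind, ...) will not give a clean matrix and the rows need to be handled one at a time.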

Uwe Ligges



Thanks for the hints.

______________________________________________
R-help mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



