Quick-n-dirty way: after you get your whole p string:

    <p class="contentBody">FOO <a name="f"></a> </p>

remove any tags delimited by '<' and '>' with a regex. Your short example doesn't show anything between the <a> and </a> tags, so I assume either there won't be anything there or, if there is, you want it included in the final text, as in:

    '<p class="contentBody">FOO <a name="f">URLNAME</a> </p>' ==> 'FOO URLNAME'
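A minimal sketch of that idea, assuming the simple non-greedy pattern <.*?> is good enough for your markup. The name strip_tags is only illustrative; it is not part of BeautifulSoup or the standard library:

```python
import re

# Simple non-greedy pattern: anything between '<' and the next '>'.
# This is a quick-and-dirty approach, not a real HTML parser.
TAG_RE = re.compile(r'<.*?>')

def strip_tags(html):
    """Remove anything delimited by '<' and '>' and trim outer whitespace."""
    return TAG_RE.sub('', html).strip()

p = '<p class="contentBody">FOO <a name="f"></a> </p>'
print(strip_tags(p))  # FOO

# Text inside the <a> tag is kept in the result.
print(strip_tags('<p class="contentBody">FOO <a name="f">URLNAME</a> </p>'))  # FOO URLNAME
```

One caveat: '.' does not match newlines by default, so a tag that wraps across lines would need re.DOTALL, or a pattern like <[^>]*> instead.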
For the regex, start with something simple like <.*?>, see if it works, and then improve it. Kiki and Kodos, two visual regex helpers for Python, are handy for this.

Hope this helps,
Nick V.

GinTon wrote:
> I'm trying to get the 'FOO' string but the problem is that inner 'P'
> tag there is another tag, 'a'. So:
>
> > from BeautifulSoup import BeautifulSoup
> > s = '<td width="88%" valign="TOP"> <p class="contentBody">FOO <a
> > name="f"></a> </p></td>'
> > tree = BeautifulSoup(s)
>
> > print tree.first('p')
> <p class="contentBody">FOO <a name="f"></a> </p>
>
> So if I run 'print tree.first('p').string' to get the 'FOO' string it
> shows Null value because it's the 'a' tag:
>
> > print tree.first('p').string
> Null
>
> Any solution?

--
http://mail.python.org/mailman/listinfo/python-list