This isn’t BaseX’s fault. The parser (TagSoup, which is used when you choose HTML parsing) inserts the default values for attributes. You should be able to suppress this by adding nodefaults=true to HTMLOPT.
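
As a rough sketch of what I mean, this is your own example with nodefaults=true appended to the HTMLOPT string. I haven't run it against your installation, so treat it as a starting point. The HTMLOPT shell variable is only there for readability, and the basex calls are skipped if the command isn't on your PATH:

```shell
# Recreate the sample input from the original report.
cat > bad.html <<\EOF
<html>
  <ul>
    <li>A<a href="o">z</a>
    <li>B<br>
  </ul>
</html>
EOF

# Options from the original command, plus nodefaults=true.
# HTMLOPT is just a shell variable used here for illustration.
HTMLOPT='method=html,nons=true,nodefaults=true'

# Only call basex if it is installed.
if command -v basex >/dev/null 2>&1; then
  basex -c "set parser html; set htmlopt $HTMLOPT; create db htmldb bad.html"
  basex -q "doc('htmldb')"
fi
```

If the option takes effect, the shape and clear attributes should no longer show up in the serialized document.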


On 2012-05-09 17:07, jida...@jidanni.org wrote:
"AH" == Alexander Holupirek <alexander.holupi...@uni-konstanz.de> writes:
AH> Please post a small snippet or example, so that we are able to test the problem.

Taking the example from the Debian basex man page, we add an innocent
<br> and <a>:

cat > bad.html <<\EOF
           <html>
             <ul>
               <li>A<a href="o">z</a>
               <li>B<br>
             </ul>
           </html>
EOF
basex -c 'set parser html; set htmlopt method=html,nons=true; create db htmldb bad.html'
basex -q "doc('htmldb')"

<html>
   <body>
     <ul>
       <li>A<a shape="rect" href="o">z</a>  HORRIBLE
       </li>
       <li>B<br clear="none"/>  TERRIBLE
       </li>
     </ul>
   </body>
</html>

How can I stop basex from insisting on adding such atrocious junk?
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk

--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsi...@le-tex.de, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler