reading HTML input-files
Dear Wget-team,

the NEWS file coming with Wget 1.7 says:

  ** The HTML parser has been rewritten. The new one works more
  reliably, allows finer-grained control over which tags and
  attributes are detected, and has better support for some features
  like correctly skipping comments and declarations, decoding
  entities, etc. It is also more general.

While calling Wget 1.5.2 by

  wget -F -O 69_4_522_Ref.res -i 69_4_522_Ref.mrq

on the attached file 69_4_522_Ref.mrq has worked very well, I am left with the error message

  No URLs found in 69_4_522_Ref.mrq

whenever I try the same command using Wget 1.7. Even embedding the content of 69_4_522_Ref.mrq into an HTML4 frame (i.e. DOCTYPE header, html, head and body tags) did not help.

Can you tell me what I am doing wrong?

Thanks in advance,
Mathias

--
Dr. Mathias Kratzer        | I_nstitute for
E-Mail: [EMAIL PROTECTED]  | E_xperimental
Phone : +49-201-183-7680   | M_athematics
                             Ellernstr. 29
Visit : IEM, Room 205      | D-45326 ESSEN
reading HTML input-files (WITH ATTACHMNT!)
Dear Wget-team,

sorry for forgetting about the attachment to my first mail:

-- original message --
Date: Thu, 7 Mar 2002 17:41:53 +0100 (MEZ)
From: Mathias Kratzer [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: reading HTML input-files

Dear Wget-team,

the NEWS file coming with Wget 1.7 says:

  ** The HTML parser has been rewritten. The new one works more
  reliably, allows finer-grained control over which tags and
  attributes are detected, and has better support for some features
  like correctly skipping comments and declarations, decoding
  entities, etc. It is also more general.

While calling Wget 1.5.2 by

  wget -F -O 69_4_522_Ref.res -i 69_4_522_Ref.mrq

on the attached file 69_4_522_Ref.mrq has worked very well, I am left with the error message

  No URLs found in 69_4_522_Ref.mrq

whenever I try the same command using Wget 1.7. Even embedding the content of 69_4_522_Ref.mrq into an HTML4 frame (i.e. DOCTYPE header, html, head and body tags) did not help.

Can you tell me what I am doing wrong?

Thanks in advance,
Mathias

--
Dr. Mathias Kratzer        | I_nstitute for
E-Mail: [EMAIL PROTECTED]  | E_xperimental
Phone : +49-201-183-7680   | M_athematics
                             Ellernstr. 29
Visit : IEM, Room 205      | D-45326 ESSEN

a href=http://www.ams.org/batchmrlookup?api=xrefqdata=|London Math. Soc. Lecture Note Ser.|ELIASHBERG, Y.|151||45|1991Filling by holomorphic discs and its applications/a
a href=http://www.ams.org/batchmrlookup?api=xrefqdata=|Ann. Inst. Fourier (Grenoble)|ELIASHBERG, Y.|42||165|1992Contact 3-manifolds twenty years since J. Martinet's work/a
a href=http://www.ams.org/batchmrlookup?api=xrefqdata=||GIROUX, E.|B|||/a
a href=http://www.ams.org/batchmrlookup?api=xrefqdata=|Invent. Math.|GROMOV, M.|82||307|1985Pseudo-holomorphic curves in symplectic manifolds/a
a href=http://www.ams.org/batchmrlookup?api=xrefqdata=||GROMOV, M.|B|||/a
a href=http://www.ams.org/batchmrlookup?api=xrefqdata=|Ann. of Math. (2)|HIRSCH, M.|73||566|1961On imbedding differential manifolds into Euclidean space/a
a href=http://www.ams.org/batchmrlookup?api=xrefqdata=||LUTTINGER, K.|B|||/a
a href=http://www.ams.org/batchmrlookup?api=xrefqdata=|Astérisque|LAUDENBACH, F.|12|||1974Topologie de la dimension trois: homotopie et isotopie/a
a href=http://www.ams.org/batchmrlookup?api=xrefqdata=|J. Amer. Math. Soc.|McDuFF, D.|3||679|1990The structure of rational and ruled symplectic 4-manifolds/a
a href=http://www.ams.org/batchmrlookup?api=xrefqdata=||POLTEROVICH, L.|B|||/a
a href=http://www.ams.org/batchmrlookup?api=xrefqdata=|Math. Notes|POLTEROVICH, L.|45||152|1989Strongly optical Lagrange manifolds/a
a href=http://www.ams.org/batchmrlookup?api=xrefqdata=||SIKORAV, J.|B|||/a
Re: reading HTML input-files (WITH ATTACHMNT!)
On 7 Mar 2002 at 17:50, Mathias Kratzer wrote:

> While calling Wget 1.5.2 by
>
>   wget -F -O 69_4_522_Ref.res -i 69_4_522_Ref.mrq
>
> on the attached file 69_4_522_Ref.mrq has worked very well I am left
> with the error message
>
>   No URLs found in 69_4_522_Ref.mrq
>
> whenever I try the same command using Wget 1.7. Even embedding the
> content of 69_4_522_Ref.mrq into a HTML4 frame (i.e. DOCTYPE-header,
> html-, head- and body-tags) did not help.
>
> Can you tell me what I am doing wrong?

The file 69_4_522_Ref.mrq contains several lines of the form:

  <a href="url"/a>

which looks pretty invalid to me. Perhaps you need to change them to:

  <a href="url"/> (XML format)

or:

  <a href="url"></a> (SGML format)
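One way to sanity-check an input file before handing it to Wget is to run it through some other HTML parser and see which links come out. The sketch below uses Python's standard `html.parser`, which is not Wget's parser, so it only shows whether the markup is readable by a typical tolerant parser and says nothing definite about Wget 1.7. The names `HrefCollector` and `extract_hrefs` and the example.org URL are made up for illustration.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag that is seen."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <a href="..."/>,
        # because the default handle_startendtag() calls this method.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hrefs(markup):
    """Return all anchor hrefs found in the given markup string."""
    collector = HrefCollector()
    collector.feed(markup)
    collector.close()
    return collector.hrefs


if __name__ == "__main__":
    # The two well-formed variants suggested in the reply.
    print(extract_hrefs('<a href="http://example.org/x"/>'))
    print(extract_hrefs('<a href="http://example.org/x"></a>'))
```

If a file that is supposed to contain links yields an empty list here, the markup is a more likely culprit than the Wget version.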
bug with .html?
I'm using --html-extension to append the .html extension to downloaded files. The debug log is below. I'm not getting the expected result, and I'm hoping someone can determine the problem.

For testing purposes, I've got a CGI script that generates the HTML for a page. The server the CGI runs on has its MIME type set to text/html.

As an aside, the page requisites aren't loaded either. Could this be because the file doesn't initially have the html extension, so wget doesn't go back and grab all images, etc.?

Thanks,
Picot

* The wget call looks like this:

  ./wget -da extlog --html-extension --convert-links --page-requisites --tries=3 --timeout=60 --ignore-length --no-http-keep-alive --cookies=off -i ID_11

** The output from my debug log looks like this:

  DEBUG output created by Wget 1.8.1 on solaris2.8.

  Loaded ID_11 (size 28).
  Enqueuing http://host:port/PET.cgi at depth 0
  Queue count 1, maxcount 1.
  Dequeuing http://host:port/PET.cgi at depth 0
  Queue count 0, maxcount 1.
  Caching host = ip.address
  Created socket 18.
  Releasing 121ee8 (new refcount 1).
  ---request begin---
  GET /PET.cgi HTTP/1.0
  User-Agent: Wget/1.8.1
  Host: host:port
  Accept: */*

  ---request end---
  HTTP/1.1 200 OK
  Server: Netscape-Enterprise/3.5.1
  Date: Thu, 07 Mar 2002 11:38:10 GMT
  content-type: text/html; charset=ISO-8859-1
  Connection: close

  Closing fd 18

*** The page that's generated looks like this:

  <html>
  <head>
  <title>PET</title>
  </head>
  <!-- This is the linked style sheet -->
  <link rel=stylesheet href="http://host:port/style.css" TYPE=text/css>
  <!-- Begin HTML Body -->
  <body class=White>
  :
  : Stuff
  :
  </body>
  </html>

begin:vcard
n:Chappell;Picot
tel;cell:571.214.2874
tel;fax:703.902.3697
tel;work:703.902.5297
x-mozilla-html:FALSE
org:Booz Allen Hamilton;Visit us on the Internet: <a href="http://boozallen.com">BoozOnline</a>
adr:;;
version:2.1
email;internet:[EMAIL PROTECTED]
fn:Picot Chappell
end:vcard
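For reference, the renaming rule that --html-extension is documented to follow can be modelled in a few lines. This is only a sketch of the rule as the Wget manual describes it (append .html when the document type is text/html and the name does not already end in .htm or .html); it is not Wget's source code, and the function name `local_name` is hypothetical. Under this model the PET.cgi response in the log above, served as text/html, would be expected to be saved as PET.cgi.html.

```python
import re

# Matches names that already end in .htm or .html, in any letter case.
_HTML_SUFFIX = re.compile(r"\.[Hh][Tt][Mm][Ll]?$")


def local_name(name, content_type):
    """Model of the documented --html-extension rule (hypothetical helper).

    name         -- file name derived from the URL, e.g. "PET.cgi"
    content_type -- raw Content-Type header value from the server
    """
    # Drop parameters such as "; charset=ISO-8859-1" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html" and not _HTML_SUFFIX.search(name):
        return name + ".html"
    return name


if __name__ == "__main__":
    # Header value taken from the debug log in the report.
    print(local_name("PET.cgi", "text/html; charset=ISO-8859-1"))
```

If the saved file keeps its original name even though the header says text/html, comparing against this model at least separates "the rule did not fire" from "the server sent a different type than expected".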