Bug#513638: wget: --follow-tags not following

2013-08-16 Thread Noël Köthe
tags 513638 + moreinfo
thanks

Hello Tim and Tong,

Tim: Thanks for the information.

Tong: Is your report still valid, or, given Tim's comments, do you
agree that this report http://bugs.debian.org/513638 can be closed?

Thanks.

On Wednesday, 14.08.2013, at 10:58 +0200, Tim Ruehsen wrote:
 For a conversion to local URLs you will definitely need --span-hosts since 
 www.cnn.com refers to images on other hosts.
 
 Also, you do not need curl to fetch the index.html (you named it test.htm).
 
 
 This will work for you (at least here with wget 1.14-2):
 
 wget --span-hosts --no-clobber --convert-links --no-directories --page-requisites --follow-tags=img http://www.cnn.com
 
 
 If you want to parse a local HTML file, use e.g.
   --force-html -i test.htm
 Of course, this only works for absolute URLs found in test.htm.
 If you want to download relative URLs as well, add the --base option.



-- 
Noël Köthe noel debian.org
Debian GNU/Linux, www.debian.org




Bug#513638: wget: --follow-tags not following

2013-08-16 Thread Tong Sun
On Fri, Aug 16, 2013 at 8:35 AM, Noël Köthe n...@debian.org wrote:
 Tong: Is your report still valid, or, given Tim's comments, do you
 agree that this report http://bugs.debian.org/513638 can be closed?

I tried it again today, and ... yes, you can close it now.

 If you want to parse a local HTML file, use e.g.
   --force-html -i test.htm

Thanks.


--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#513638: wget: --follow-tags not following

2013-08-14 Thread Tim Ruehsen
For a conversion to local URLs you will definitely need --span-hosts since 
www.cnn.com refers to images on other hosts.
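
To see why --span-hosts matters here, one can list the hosts that the img tags of the saved page point to. This is only a rough sketch: it assumes the page is saved as test.htm, that the img tags are lowercase, and that src attributes use double quotes and absolute URLs. list_img_hosts is a hypothetical helper name, not part of wget.

```shell
#!/bin/sh
# Sketch: print the distinct hosts referenced by <img src="http..."> in a file.
# list_img_hosts is an illustrative helper, not a wget feature.
list_img_hosts() {
    # 1. pick out each img tag up to the end of its src attribute
    # 2. keep only the URL
    # 3. strip the scheme and the path, leaving the host
    grep -o '<img[^>]*src="http[^"]*"' "$1" \
        | sed -e 's/.*src="//' -e 's/"$//' \
        | sed -e 's|^[a-z]*://||' -e 's|/.*||' \
        | sort -u
}

# Usage: list_img_hosts test.htm
```

Any host in the output other than the page's own host is one that wget will not fetch images from unless --span-hosts is given.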

Also, you do not need curl to fetch the index.html (you named it test.htm).


This will work for you (at least here with wget 1.14-2):

wget --span-hosts --no-clobber --convert-links --no-directories --page-requisites --follow-tags=img http://www.cnn.com


If you want to parse a local HTML file, use e.g.
--force-html -i test.htm
Of course, this only works for absolute URLs found in test.htm.
If you want to download relative URLs as well, add the --base option.
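
Put together, the local-file variant might look like the sketch below. It only prints the command so it can be checked before running. The base URL http://www.cnn.com/ is an assumption taken from where test.htm was fetched, and local_page_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print (not run) a wget command that parses a saved HTML file.
# $1 = local HTML file, $2 = URL the page came from, used to resolve
# relative links via --base. local_page_cmd is an illustrative name.
local_page_cmd() {
    printf 'wget --force-html --input-file=%s --base=%s\n' "$1" "$2"
}

# Prints: wget --force-html --input-file=test.htm --base=http://www.cnn.com/
local_page_cmd test.htm http://www.cnn.com/
```

Other options from the command above (for example --no-directories) can be appended as needed.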


Regards, Tim





Bug#513638: wget: --follow-tags not following

2009-01-30 Thread Tong Sun

Package: wget
Version: 1.11.4-2
Severity: minor

Hi, 

I found that I can't get wget --follow-tags to work when using the
--no-clobber option.

First of all, all I want is to download all images from a
*downloaded* html file, and then convert the links locally. I believed
that --no-clobber was all I needed:

,-
| ... when -nc is specified, files with the suffixes .html or .htm
| will be loaded from the local disk and parsed as if they had been
| retrieved from the Web.
`-

so I tried,

  curl http://www.cnn.com/ -o test.htm
  wget --no-directories --no-clobber --convert-links --follow-tags=img http://localhost/test.htm

I expected wget to load test.htm as if it had been retrieved from
the web, parse it, then follow and download all the img tags.
However, it didn't work (see below). Why is that?
(Note that if I replace the --follow-tags option with
--page-requisites, I at least get something.)

PS.

- If I drop --no-clobber and use --force-html --input-file=test.htm
  instead, --follow-tags=img works fine, but --convert-links does
  nothing, and I want the img links converted to local ones.

- Is there a better way for wget to load local files than using
  the fake http://localhost/...?

- Is it possible to avoid --span-hosts in this case?

Thanks

Tong


  $ wget -d --no-directories --no-clobber --convert-links --follow-tags=img --span-hosts http://localhost/test.htm
  Setting --directories (dirstruct) to 0
  Setting --no-clobber (noclobber) to 1
  Setting --convert-links (convertlinks) to 1
  Setting --follow-tags (followtags) to img
  Setting --span-hosts (spanhosts) to 1
  DEBUG output created by Wget 1.11.4 on linux-gnu.

  File `test.htm' already there; not retrieving.

  Scanning test.htm (from http://localhost/test.htm)
  Loaded test.htm (size 91418).
  test.htm: merge("http://localhost/test.htm", "http://localhost/header_cnn_com_logo.gif") -> http://localhost/header_cnn_com_logo.gif
  appending "http://localhost/header_cnn_com_logo.gif" to urlpos.
  test.htm: merge("http://localhost/test.htm", "http://localhost/header_google_logo.gif") -> http://localhost/header_google_logo.gif
  appending "http://localhost/header_google_logo.gif" to urlpos.
  test.htm: merge("http://localhost/test.htm", "http://i.cdn.turner.com/cnn/images/1.gif") -> http://i.cdn.turner.com/cnn/images/1.gif
  appending "http://i.cdn.turner.com/cnn/images/1.gif" to urlpos.
  [ . . .]
  appending "http://metrics.cnn.com/b/ss/cnn2global/1/H.1--NS/0?pageName=No%20Javascript" to urlpos.
  no-follow in test.htm: 0
  will convert url http://localhost/header_cnn_com_logo.gif to complete
  will convert url http://localhost/header_google_logo.gif to complete
  will convert url http://i.cdn.turner.com/cnn/images/1.gif to complete
  [ . . .]
  will convert url http://i.cdn.turner.com/cnn/images/1.gif to complete
  will convert url http://i.cdn.turner.com/cnn/images/1.gif to complete
  will convert url http://metrics.cnn.com/b/ss/cnn2global/1/H.1--NS/0?pageName=No%20Javascript to complete
  Converting test.htm... nothing to do.
  Converted 1 files in 0.02 seconds.


-- System Information:
Debian Release: lenny/sid
  APT prefers testing
  APT policy: (300, 'testing'), (50, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.26-grml (SMP w/1 CPU core; PREEMPT)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/bash

Versions of packages wget depends on:
ii  libc6         2.7-13     GNU C Library: Shared libraries
ii  libssl0.9.8   0.9.8g-13  SSL shared libraries

wget recommends no packages.

wget suggests no packages.

-- no debconf information


