[Bug-wget] [bug #56648] Add HTTP request header to file

2019-07-19 Thread Tim Ruehsen
Follow-up Comment #3, bug #56648 (project wget):

Good that you can work on...

On Debian GNU/Linux:
apt-get install xattr
wget --xattr www.example.com
xattr -p user.xdg.origin.url index.html >out.txt
cat out.txt


___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




[Bug-wget] [bug #56648] Add HTTP request header to file

2019-07-19 Thread anonymous
Follow-up Comment #2, bug #56648 (project wget):

Thanks a lot for the fast reply. 

xattr does save the URL to a file attribute, but I couldn't find a way to get
the attributes back when using cat to merge all the files into one. I didn't
even find a way to copy the attribute and append it to the file.
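One way to carry the attribute along when concatenating would be to read
`user.xdg.origin.url` back per file and prepend it. A rough, untested sketch
(the `merge_with_urls` name and the `==== ... ====` marker format are invented
here for illustration):

```shell
#!/bin/sh
# Sketch: concatenate files, prefixing each with the origin URL that
# `wget --xattr` stored in the user.xdg.origin.url extended attribute.
# Files without the attribute (or without xattr support) get "unknown".
merge_with_urls() {
    out=$1; shift
    : > "$out"
    for f in "$@"; do
        url=$(xattr -p user.xdg.origin.url "$f" 2>/dev/null || true)
        printf '==== %s (from %s) ====\n' "$f" "${url:-unknown}" >> "$out"
        cat "$f" >> "$out"
    done
}

# e.g.: merge_with_urls combined.txt *.html
```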

WARC seems helpful.
The manual doesn't explain what it is:
https://www.gnu.org/software/wget/manual/wget.html
Maybe add some explanation based on this:
https://www.archiveteam.org/index.php?title=Wget_with_WARC_output
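For reference, a minimal WARC invocation could look like this (the site and
the `site` prefix are placeholders; the zgrep line assumes the default
gzip-compressed output):

```shell
# Record every request/response pair into site.warc.gz while also saving
# the files as usual; --warc-file takes a name without extension.
wget --recursive --warc-file=site www.example.com

# Each WARC record carries a WARC-Target-URI header naming the original
# URL, so the mapping from content to link is preserved:
zgrep -a 'WARC-Target-URI' site.warc.gz
```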

-o also works, but I saved to the same file as -O, so I get all the data in
one file and don't need to look for a log file and try to merge the data.

Thanks again.

P.S. I still think it might be easier to save the original URL through
‘--save-headers’.





[Bug-wget] [bug #56648] Add HTTP request header to file

2019-07-19 Thread Tim Ruehsen
Follow-up Comment #1, bug #56648 (project wget):

If you copy all files and headers into one file - did you play with the WARC
options?

With --xattr the original URL is saved as an extended file attribute, if your
file system supports it.

You could use -d -o log and let a script extract the URL and filename from it.
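As a sketch of such a script (untested; the exact debug-log wording varies
between wget versions and real headers end in CR, so the patterns below are
assumptions to adapt):

```shell
#!/bin/sh
# Sketch: pull URL -> filename pairs out of a `wget -d -o log` debug log.
# Assumes request lines like "GET /path HTTP/1.1", a "Host:" header, and
# a "Saving to: ..." line per download; adjust patterns to your version.
extract_urls() {
    awk '
        /^GET /        { path = $2 }     # request line: method, path, version
        /^Host: /      { host = $2 }     # Host header of the same request
        /^Saving to: / {                 # output filename line
            file = $0
            sub(/^Saving to: /, "", file)
            printf "http://%s%s\t%s\n", host, path, file
        }
    ' "$1"
}
```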





[Bug-wget] [bug #56648] Add HTTP request header to file

2019-07-19 Thread anonymous
URL:
  

 Summary: Add HTTP request header to file
 Project: GNU Wget
Submitted by: None
Submitted on: Fri 19 Jul 2019 01:26:19 PM UTC
Category: Feature Request
Severity: 3 - Normal
Priority: 5 - Normal
  Status: None
 Privacy: Public
 Assigned to: None
 Originator Name: Asaf
Originator Email: 3023023...@gmail.com
 Open/Closed: Open
 Discussion Lock: Any
 Release: None
Operating System: GNU/Linux
 Reproducibility: Every Time
   Fixed Release: None
 Planned Release: None
  Regression: None
   Work Required: None
  Patch Included: None

___

Details:

I use wget to download a website to one folder, copy all the files into one
file, and process the full text of the website.
I need to know which link each part of the text was taken from. I thought
‘--save-headers’ would help me, but it only contains the HTTP response
headers, and I guess the link appears in the HTTP request.

I am planning to move to httrack due to this issue. There, the original link
is noted in each file by default.

Thanks. 



