[Bug-wget] [bug #53322] Add option to let page-requisites bypass no-parent

2018-11-12 Thread Darshit Shah
Update of bug #53322 (project wget):

                  Status: None => Invalid
             Open/Closed: Open => Closed


___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




[Bug-wget] [bug #53322] Add option to let page-requisites bypass no-parent

2018-04-15 Thread David
Follow-up Comment #9, bug #53322 (project wget):

I must have missed something, because using `--recursive
--page-requisites --no-parent` does indeed do what I want. Strange, I could
swear it was behaving differently before.
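
For reference, the combination reported to work here can be wrapped as shown below. This is only a sketch: `mirror_subtree` and the `WGET` override variable are hypothetical names, and `--convert-links` and `--adjust-extension` are carried over from the command in comment #7 rather than being required.

```shell
# Hypothetical wrapper (the name is made up for this example).
# WGET may be overridden, e.g. WGET=echo, to print the arguments
# instead of starting a download.
mirror_subtree() {
  "${WGET:-wget}" --recursive --no-parent --page-requisites \
    --convert-links --adjust-extension "$1"
}

# Usage, with the URL from comment #7:
#   mirror_subtree https://addyosmani.com/resources/essentialjsdesignpatterns/book/
```

Setting `WGET=echo` before calling the function gives a dry run that shows exactly which options would be passed.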





[Bug-wget] [bug #53322] Add option to let page-requisites bypass no-parent

2018-04-08 Thread Tim Ruehsen
Follow-up Comment #8, bug #53322 (project wget):

Just remove --recursive and you get what you want.
Keep in mind that wget doesn't run JavaScript, so dynamically created
requisites cannot be downloaded.

$ tree addyosmani.com/
addyosmani.com/
├── cdn-cgi
│   └── scripts
│       └── d07b1474
│           └── cloudflare-static
│               └── email-decode.min.js
└── resources
    └── essentialjsdesignpatterns
        ├── book
        │   ├── images
        │   │   ├── base.png
        │   │   └── ns1.png
        │   ├── index.html
        │   ├── scripts
        │   │   └── vendor.js
        │   └── styles
        │       └── vendor.css
        └── cover
            └── cover.jpg

11 directories, 7 files






[Bug-wget] [bug #53322] Add option to let page-requisites bypass no-parent

2018-04-07 Thread David
Follow-up Comment #7, bug #53322 (project wget):

Okay, I've got an example:

wget --recursive --page-requisites --convert-links --adjust-extension
https://addyosmani.com/resources/essentialjsdesignpatterns/book/

I would like to download everything under this path:
"/resources/essentialjsdesignpatterns/book/". In this example index.html is
the only page, but if there were a second HTML page linked from the index, it
would be downloaded too.

Plus any page requisites from other paths, like "../../../cdn-cgi/", but
not other pages, like HTML documents in "/blog/".
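
The selection described above can be written down as a small decision rule. The following is a minimal sketch, not wget's implementation: `would_fetch` is a hypothetical helper, and the `requisite`/`link` labels and the `/blog/post.html` path are made up for illustration.

```shell
# Hypothetical helper (not part of wget): would this URL path be fetched
# under the rule asked for in this report?
#   $1 = start directory
#   $2 = candidate path
#   $3 = "requisite" (image, CSS, JS used by a page) or "link" (another page)
would_fetch() {
  case "$2" in
    "$1"*) return 0 ;;      # at or below the start directory: always fetch
  esac
  [ "$3" = "requisite" ]    # above it: fetch only page requisites
}

start=/resources/essentialjsdesignpatterns/book/
would_fetch "$start" "${start}index.html" link && echo "fetch index.html"
would_fetch "$start" /cdn-cgi/scripts/d07b1474/cloudflare-static/email-decode.min.js requisite \
  && echo "fetch email-decode.min.js"
would_fetch "$start" /blog/post.html link || echo "skip /blog/post.html"
```

The prefix match stands in for `--no-parent`, and the second test is the bypass for page requisites that the report asks for.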





[Bug-wget] [bug #53322] Add option to let page-requisites bypass no-parent

2018-04-05 Thread Tim Ruehsen
Follow-up Comment #6, bug #53322 (project wget):

A small correction: "...is no reference..." should read "... no page
requisite..."






[Bug-wget] [bug #53322] Add option to let page-requisites bypass no-parent

2018-04-05 Thread Tim Ruehsen
Follow-up Comment #5, bug #53322 (project wget):

Maybe I was a bit unclear, sorry for that.

In your document (http://www.oreilly.com/openbook/osfreesoft/book/) there is no
reference to any other resource on the same domain (www.oreilly.com).

So to download anything with --page-requisites, you'll need -H.

Without it, you just get your index.html and that's it. You didn't request
anything else.

So what *exactly* do you want? Please make a list of documents/resources you
would like to see in the end. With that we also have a potential test case, if
it comes to implementing a new feature.
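
The host restriction mentioned above can be illustrated with a rough check. This is a simplified sketch, not how wget parses URLs: `url_host` and `same_host` are hypothetical helpers, ports and user info are not handled, and `cdn.example.net` is a made-up host.

```shell
# Hypothetical helpers, for illustration only.
url_host() {
  host=${1#*://}     # strip the scheme
  host=${host%%/*}   # strip the path
  printf '%s\n' "$host"
}

# Succeeds when both URLs are on the same host, i.e. the case that
# does not need -H (--span-hosts).
same_host() {
  [ "$(url_host "$1")" = "$(url_host "$2")" ]
}

page=http://www.oreilly.com/openbook/osfreesoft/book/
same_host "$page" http://www.oreilly.com/openbook/ && echo "same host"
same_host "$page" http://cdn.example.net/style.css || echo "other host: needs -H"
```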






[Bug-wget] [bug #53322] Add option to let page-requisites bypass no-parent

2018-04-04 Thread David
Follow-up Comment #4, bug #53322 (project wget):

Thanks for your answer, but it's missing the point of the original question.

I basically want to download everything under the current folder (recursive,
no parent), but I also want to download css/js that are on the same domain but
higher in the hierarchy (page requisites, recursive).

Do you understand what I mean? For example, in the previous example I'd like to
download css/js and the linked pdf files.





[Bug-wget] [bug #53322] Add option to let page-requisites bypass no-parent

2018-04-04 Thread Tim Ruehsen
Follow-up Comment #3, bug #53322 (project wget):

Then you won't need --recursive.
And looking at that page: the links all go to different domains, so you will
need -H as well.

wget -H --page-requisites http://www.oreilly.com/openbook/osfreesoft/book/

does the job for me:

FINISHED --2018-04-04 14:17:29--
Total wall clock time: 15s
Downloaded: 41 files, 569K in 0.6s (888 KB/s)







[Bug-wget] [bug #53322] Add option to let page-requisites bypass no-parent

2018-04-04 Thread David
Follow-up Comment #2, bug #53322 (project wget):

Well, I don't want to download a whole site!

Example:
wget --recursive --page-requisites
http://www.oreilly.com/openbook/osfreesoft/book/

It will start downloading everything that is linked on the page, not just the
current directory.





[Bug-wget] [bug #53322] Add option to let page-requisites bypass no-parent

2018-04-04 Thread Tim Ruehsen
Follow-up Comment #1, bug #53322 (project wget):

Why don't you leave out --no-parent then?

--page-requisites alone should do what you want.





[Bug-wget] [bug #53322] Add option to let page-requisites bypass no-parent

2018-03-11 Thread David
URL:
  

         Summary: Add option to let page-requisites bypass no-parent
         Project: GNU Wget
    Submitted by: mcdado
    Submitted on: Sun 11 Mar 2018 11:43:25 AM UTC
        Category: Feature Request
        Severity: 3 - Normal
        Priority: 5 - Normal
          Status: None
         Privacy: Public
     Assigned to: None
 Originator Name: 
Originator Email: 
     Open/Closed: Open
 Discussion Lock: Any
         Release: None
Operating System: None
 Reproducibility: Every Time
   Fixed Release: None
 Planned Release: None
      Regression: None
   Work Required: None
  Patch Included: None

___

Details:

When using `--no-parent` and `--page-requisites`, if the page requires images
or other requisites from the same domain but higher in the hierarchy, wget
will not download those requisites because of the `--no-parent` option.
`--no-parent` is useful for web pages (without it you download a whole site,
especially if you use `--mirror`), so it would be good to have an extra option
to allow downloading page requisites that are higher in the hierarchy.



