[Bug 1937874] [NEW] one --accept-regex expression negates another

Bill Yikes Fri, 23 Jul 2021 12:50:47 -0700

Public bug reported:

This command should theoretically fetch all PDFs on a page:


$ wget -v -d -r --level 1 --adjust-extension --no-clobber --no-directories\
       --accept-regex 'administrative-orders/.*/administrative-order-matter-'\
       --accept-regex 'administrative-orders.*.pdf'\
       --accept-regex 'administrative-orders.page[^&]*$'\
       --directory-prefix=/tmp\
       'https://www.ncua.gov/regulation-supervision/enforcement-actions/administrative-orders?page=56'

But it fails to grab any of them, giving the output:

---
Deciding whether to enqueue "https://www.ncua.gov/files/administrative-orders/AO14-0241-R4.pdf".
https://www.ncua.gov/files/administrative-orders/AO14-0241-R4.pdf is excluded/not-included through regex.
Decided NOT to load it.
---

That's bogus: the URL matches the second --accept-regex expression.  The
workaround is to remove this option:

--accept-regex 'administrative-orders.page[^&]*$'

But that should not be necessary.  Adding an --accept-* clause should
never invalidate another --accept-* clause, and it should never shrink
the set of fetched files.
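
The log above would be consistent with only the last --accept-regex
taking effect, although that is an assumption and not something the
output confirms.  The sketch below is a rough offline check of that
idea: it uses grep -E as a stand-in for wget's regex matching (also an
assumption) and tests the rejected URL against the last expression alone
and against all three joined with '|'.  The names `matches`, `last_only`
and `combined` are made up for this illustration.

```shell
#!/bin/sh
# URL that the debug log reports as "excluded/not-included through regex".
url='https://www.ncua.gov/files/administrative-orders/AO14-0241-R4.pdf'

# The last expression from the command, on its own.
last_only='administrative-orders.page[^&]*$'

# All three expressions from the command, joined into one alternation.
combined='administrative-orders/.*/administrative-order-matter-|administrative-orders.*.pdf|administrative-orders.page[^&]*$'

# Hypothetical helper: prints "accept" if the URL ($2) matches the
# extended regex ($1), "reject" otherwise.
matches() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo accept
    else
        echo reject
    fi
}

last_result=$(matches "$last_only" "$url")
combined_result=$(matches "$combined" "$url")

echo "last expression only: $last_result"
echo "combined expression:  $combined_result"
```

If the assumption holds, passing the combined expression as a single
--accept-regex to the original command may be worth trying as an
alternative workaround that keeps the page-link pattern.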

** Affects: wget (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1937874

Title:
  one --accept-regex expression negates another

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/wget/+bug/1937874/+subscriptions


-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
