Re: Links Not Parsing Correctly?

2023-11-15 Thread Stephane Ascoet

On 14/11/2023 at 19:22, Derek Tombrello wrote:

I appreciate that. I'll check that out. In the meantime, I came up with
a bash script to fix the issue with the ones I've already downloaded. In
case anyone else is interested or needs it, two simple commands run in
the same directory as the index.html files:


rename 's/index\.html\?page=([0-9]+)\&/index$1.html/' *
sed -Ei 's/index\.html\?page=([0-9]+)\&/index\1.html/' *.html


Hi, that's the sort of thing I do too. In many of the large online
archives of old websites that I've made over the last few years, even
when the download has mostly worked, there are always some corrections
like this to make.



/"First they came for the Communists, but I was not a Communist so I did



Very long quotations (longer than the actual content). In French, we
have a little book telling this story, called "Les matins bruns". It sold
well, but sadly without much effect on people's minds. A very strange
short film with the same title has also been adapted from it.

--
Sincerely, Stephane Ascoet




Re: Links Not Parsing Correctly?

2023-11-14 Thread Derek Tombrello
I appreciate that. I'll check that out. In the meantime, I came up with
a bash script to fix the issue with the ones I've already downloaded. In
case anyone else is interested or needs it, two simple commands run in
the same directory as the index.html files:



rename 's/index\.html\?page=([0-9]+)\&/index$1.html/' *
sed -Ei 's/index\.html\?page=([0-9]+)\&/index\1.html/' *.html
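
For anyone adapting the two commands above, here is a minimal sketch of the
same fix as a bash function that uses only mv and sed, so it does not depend
on which rename implementation is installed. The function name
fix_paged_index is made up for this example. It assumes the saved pages are
named exactly 'index.html?page=N&', that the links inside the HTML use that
same literal text, and that GNU sed is available for -i. If the links were
written in an escaped form (for example with %3F or &amp;), the sed pattern
would need adjusting.

```shell
# Hypothetical helper (not part of wget): renames "index.html?page=N&" to
# "indexN.html" and rewrites the matching links in every .html file of a
# directory. Defaults to the current directory.
fix_paged_index() {
    local dir=${1:-.} f base num

    # Step 1: rename the saved pages.
    for f in "$dir"/index.html\?page=*\&; do
        [ -e "$f" ] || continue            # the glob matched nothing
        base=${f##*/}                      # file name without the directory
        num=${base#index.html\?page=}      # strip the prefix, leaving "2&"
        num=${num%\&}                      # strip the trailing "&", leaving "2"
        case $num in
            ''|*[!0-9]*) continue ;;       # skip anything that is not a plain number
        esac
        mv -- "$f" "$dir/index$num.html"
    done

    # Step 2: rewrite the links inside every HTML file (GNU sed assumed).
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue
        sed -E -i 's/index\.html\?page=([0-9]+)&/index\1.html/g' "$f"
    done
}
```

Running 'fix_paged_index /path/to/mirror' on a copy of the mirror first is
probably wise, since both steps modify files in place.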





✞ Derek Tombrello (KM4JAG)
www.RobotsAndComputers.com


/"First they came for the Communists, but I was not a Communist so I did
not speak out.
Then they came for the Socialists and the Trade Unionists, but I was
neither, so I did not speak out.
Then they came for the Jews, but I was not a Jew so I did not speak out.
And when they came for me, there was no one left to speak out for me."/


/"Every record has been destroyed or falsified, every book rewritten,
every picture has been repainted,
every statue and street building has been renamed, every date has been
altered. And the process is continuing
day by day and minute by minute. History has stopped. Nothing exists
except an endless present in which the Party
is always right." - George Orwell, "1984"/




Re: Links Not Parsing Correctly?

2023-11-13 Thread Stephane Ascoet

On 12/11/2023 at 18:00, bug-wget-requ...@gnu.org wrote:

From: Derek Tombrello 
To: bug-wget@gnu.org
Subject: Links Not Parsing Correctly?

From the main 'index.html' page, if you click on 'page 2', the address
bar reflects that it is displaying 'index.html?page=2&' but the actual
content is still that of the original 'index.html' page. I can double
click on the 'index.html?page=2&' file itself in the file manager and it
does, in fact, display the page associated with page 2.




Hi, I had almost exactly the same problem a few months ago and found no
solution except migrating to WebHTTrack. You can probably find the
thread in the archives, beginning on 19/8/2023.

--
Regards, Stephane Ascoet




Links Not Parsing Correctly?

2023-11-11 Thread Derek Tombrello
I don't know if this is the right place to ask this, but I can't find 
anywhere else to turn, so


So, I'm trying to mirror a site. I'm using 'wget -r -l 0 -k www.site.com'
as the command. This works great... almost. The
site is paginated in such a way that each successive page is linked
using 'index.html?page=2&', where the number is incremented for each
page. The index pages are being stored this way on my drive:


index.html
index.html?page=2&
index.html?page=3&
index.html?page=4&
...etc...


From the main 'index.html' page, if you click on 'page 2', the address 
bar reflects that it is displaying 'index.html?page=2&' but the actual 
content is still that of the original 'index.html' page. I can double 
click on the 'index.html?page=2&' file itself in the file manager and it 
does, in fact, display the page associated with page 2.
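
The behaviour described above is consistent with how URLs are parsed: in a
link such as 'index.html?page=2&', everything after the '?' is treated as a
query string, so a browser reading local files opens 'index.html' and ignores
the rest. A link can only reach a file whose name really contains '?' if that
character is percent-encoded as %3F. A minimal sketch of that encoding in
bash, where encode_local_link is a made-up name and only the '?' character is
handled:

```shell
# Hypothetical helper: percent-encode "?" so that a relative link points at a
# local file whose name contains a literal question mark.
encode_local_link() {
    local name
    name=${1//\?/%3F}     # replace every literal "?" with "%3F"
    printf '%s\n' "$name"
}

encode_local_link 'index.html?page=2&'
# prints: index.html%3Fpage=2&
```

Inside an HTML attribute the '&' would additionally be written as '&amp;',
which this sketch does not do.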


What I am trying to figure out is: is there any EASY way to get the page
links to work from within the web page? Or am I going to have to
manually rename the 'index.html?page=2&' files and edit the html files
to reflect the new names? That's really more than I want to have to do.


Or... is there anything I can do to the command parameters that would 
correct this behaviour?
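
One thing that might be worth trying, offered as an untested suggestion
rather than a confirmed fix: wget's documented --restrict-file-names=windows
mode uses '@' instead of '?' when it builds local file names, and -E
(--adjust-extension) appends '.html' to HTML pages whose names do not already
end in it. Combined with -k, the saved names would then contain no '?' for a
browser to misread. The wrapper name mirror_site below is made up, and
www.site.com is the placeholder from the command above.

```shell
# Hypothetical wrapper around the original command with two extra options.
mirror_site() {
    # -r -l 0 -k  : as in the original command (recursive, no depth limit,
    #               convert links for local viewing)
    # -E          : --adjust-extension, append ".html" where it is missing
    # --restrict-file-names=windows : use "@" instead of "?" in local names
    wget -r -l 0 -k -E --restrict-file-names=windows "$1"
}

# Example call (needs network access, so it is left commented out):
# mirror_site www.site.com
```

Whether this resolves the pagination links on a given site is an assumption;
it only changes how the local files are named.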


I hope all of this makes sense. It does in my head, but... it's
cluttered up there.


--



✞ Derek Tombrello (KM4JAG)
www.RobotsAndComputers.com