The redirect page appears to be a browser check that sets a `human=1` cookie, which then allows access to the underlying file. The robots.txt on that subdomain also disallows bots across the whole site, so wget's robots.txt handling has to be turned off as well.
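
Since the question was about curl: the same cookie can presumably be sent from curl too. A minimal sketch, not tested against the server, assuming the gate accepts `human=1` from curl the same way it does from wget; `BASE_URL`, `zeros_url` and `fetch_zeros` are names made up for illustration:

```shell
# Sketch only: assumes the gate accepts a plain `human=1` cookie from curl
# just as it does from wget (not verified against the server).
BASE_URL='https://beta.lmfdb.org/data/riemann-zeta-zeros'

# Print the URL for a given file index, e.g. 14 -> .../zeros_14.dat
zeros_url() {
    printf '%s/zeros_%s.dat\n' "$BASE_URL" "$1"
}

# Download one file into the current directory.
#   -b 'human=1'  send the cookie with the request
#   -f            fail on HTTP errors instead of saving an error page
#   -L            follow redirects, if any remain
#   -O            save under the remote file name
fetch_zeros() {
    curl -fL -b 'human=1' -O "$(zeros_url "$1")"
}

# Usage (performs a network request): fetch_zeros 14
```

If the assumption holds, `fetch_zeros 14` should save zeros_14.dat in the current directory. curl does not read robots.txt, so there is nothing to switch off on that side.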

This wget command works here:

wget --header 'Cookie: human=1' -e robots=off https://beta.lmfdb.org/data/riemann-zeta-zeros/zeros_14.dat

or, for all of the files:

wget --header 'Cookie: human=1' -e robots=off --recursive --no-parent --reject 'index.*' https://beta.lmfdb.org/data/riemann-zeta-zeros/

On Sat, Nov 1, 2025 at 6:48 PM American Citizen <[email protected]> wrote:

> Hello:
>
> I can manually download the numeric data from the LMFDB website page
> https://beta.lmfdb.org/data/riemann-zeta-zeros/ by clicking on one of
> the zeros_nnnn.dat files and the browser asks if I want to save the file.
>
> However if I attempt to automate this procedure (and yes, the reason I
> am attempting to do this, is the fact that 14,580 dat files exist on
> this webpage, way beyond the normal human ability to click all 14,580
> dat files (I took 2 hours to do 53 files due to the slow download speed)
> but using curl fails.
>
> Here's the curl session logged with the -v option
>
> owner@localhost:~> curl -A "Mozilla/5.0 (X11; Linux x86_64; rv:144.0)
> Gecko/20100101 Firefox/144.0" -L -O
> "beta.lmfdb.org/data/riemann-zeta-zeros/zeros_14.dat" -v
> % Total % Received % Xferd Average Speed Time Time Time
> Current
> Dload Upload Total Spent Left Speed
> 0 0 0 0 0 0 0 0 --:--:-- --:--:--
> --:--:-- 0* Trying 18.1.37.31:443...
> * Connected to beta.lmfdb.org (18.1.37.31) port 443 (#0)
> * ALPN: offers h2,http/1.1
> } [5 bytes data]
> * TLSv1.3 (OUT), TLS handshake, Client hello (1):
> } [512 bytes data]
> * TLSv1.3 (IN), TLS handshake, Server hello (2):
> { [122 bytes data]
> * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
> { [25 bytes data]
> * TLSv1.3 (IN), TLS handshake, Certificate (11):
> { [2683 bytes data]
> * TLSv1.3 (IN), TLS handshake, CERT verify (15):
> { [264 bytes data]
> * TLSv1.3 (IN), TLS handshake, Finished (20):
> { [52 bytes data]
> * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
> } [1 bytes data]
> * TLSv1.3 (OUT), TLS handshake, Finished (20):
> } [52 bytes data]
> * SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
> * ALPN: server accepted http/1.1
> * Server certificate:
> * subject: CN=alpha.lmfdb.org
> * start date: Sep 1 16:25:06 2025 GMT
> * expire date: Nov 30 16:25:05 2025 GMT
> * subjectAltName: host "beta.lmfdb.org" matched cert's "beta.lmfdb.org"
> * issuer: C=US; O=Let's Encrypt; CN=R13
> * SSL certificate verify ok.
> * using HTTP/1.1
> } [5 bytes data]
> > GET /data/riemann-zeta-zeros/zeros_14.dat HTTP/1.1
> > Host: beta.lmfdb.org
> > User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:144.0) Gecko/20100101
> Firefox/144.0
> > Accept: */*
> >
> { [5 bytes data]
> * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
> { [57 bytes data]
> * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
> { [57 bytes data]
> * old SSL session ID is stale, removing
> { [5 bytes data]
> < HTTP/1.1 302 Found
> < Date: Sun, 02 Nov 2025 01:32:38 GMT
> < Server: Apache/2.4.52 (Ubuntu)
> < Location:
> beta.lmfdb.org/gate.html?gateorig=/data/riemann-zeta-zeros/zeros_14.dat
> < Cache-Control: max-age=0
> < Expires: Sun, 02 Nov 2025 01:32:38 GMT
> < Content-Length: 344
> < Content-Type: text/html; charset=iso-8859-1
> <
> * Ignoring the response-body
> { [344 bytes data]
> 100 344 100 344 0 0 807 0 --:--:-- --:--:--
> --:--:-- 809
> * Connection #0 to host beta.lmfdb.org left intact
> * Issue another request to this URL:
> 'beta.lmfdb.org/gate.html?gateorig=/data/riemann-zeta-zeros/zeros_14.dat'
> * Found bundle for host: 0x564bce5df2d0 [serially]
> * Can not multiplex, even if we wanted to
> * Re-using existing connection #0 with host beta.lmfdb.org
> } [5 bytes data]
> > GET /gate.html?gateorig=/data/riemann-zeta-zeros/zeros_14.dat HTTP/1.1
> > Host: beta.lmfdb.org
> > User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:144.0) Gecko/20100101
> Firefox/144.0
> > Accept: */*
> >
> { [5 bytes data]
> < HTTP/1.1 200 OK
> < Date: Sun, 02 Nov 2025 01:32:38 GMT
> < Server: Apache/2.4.52 (Ubuntu)
> < Last-Modified: Thu, 14 Aug 2025 10:49:47 GMT
> < ETag: "28a-63c51082bdba9"
> < Accept-Ranges: bytes
> < Content-Length: 650
> < Cache-Control: no-cache, no-store, must-revalidate
> < Expires: 0
> < Vary: Accept-Encoding,User-Agent
> < Pragma: no-cache
> < Content-Type: text/html
> <
> { [650 bytes data]
> 100 650 100 650 0 0 1224 0 --:--:-- --:--:--
> --:--:-- 1224
>
> NO!!! the zeros_14.dat file is NOT 650 bytes in size! It is at least 57K
> bytes in size and a binary file too.
>
> Apparently I am being redirected, but then land on an html page when
> using the curl command.
>
> LMFDB claims that anyone can download their data, they're not trying to
> keep this a big secret.
>
> How can I fix the curl command and its options, so I can successfully
> download a selected zeros_nnnn.dat file?
>
> Randall
>
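
On the 650-byte result quoted above: a batch job can save the gate page under a .dat name without reporting any error, so it may be worth checking each file after it is downloaded. A rough sketch, assuming the gate page is ordinary HTML that opens with a doctype or an <html> tag (I have not inspected its actual contents); `is_gate_page` is a made-up helper name:

```shell
# Hypothetical helper: exit status 0 if the file looks like an HTML page
# rather than binary data. Heuristic only: it searches the first 512 bytes
# for "<!doctype" or "<html", case-insensitively.
is_gate_page() {
    head -c 512 "$1" | LC_ALL=C grep -qiE '<!doctype|<html'
}

# Usage: report any downloaded .dat file that is really an HTML page.
#   for f in zeros_*.dat; do
#       is_gate_page "$f" && echo "HTML instead of data: $f"
#   done
```

The check is deliberately loose: it can only flag files that look like HTML, and it says nothing about whether a binary file is complete.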
