Re: [Slackbuilds-users] sbog ping: invalid download urls in .info files
On Fri, 2 Mar 2018 13:49:55 -0500 B Watson wrote:

> On 3/2/18, rundstut...@gmx.de wrote:
> > i didn't even know that sbosrcarch does exist (even though i have
> > seen slackware.uk before). so the functionality i programmed does
> > already exist.
>
> ...or: git clone git://urchlay.naptime.net/sbostuff.git
>
> It's in perl, and not what I'd call beautiful code, so you might want
> to put on your goggles before looking at it :)

thank you, this will be an interesting read. i don't care about code
quality - i care more about the knowledge that usually lies within the
code. you probably know more about the sbo stuff than i do.

> You might at least look through your logs and make a list of the
> servers that don't do HEAD requests, and add some code that logs
> "this server doesn't support HEAD requests", so the user knows it's
> the server's fault.

please read my starter post, there is a pastebin link to the output of
"sbog ping". i print all errors to stdout, easy to parse. errors caused
by HEAD request failures are only a few. "forbidden" errors are about 10
or so. they can easily be found with grep.

> You're right. How much overhead depends on how the server/proxy is
> configured, but it seems to be 64K bytes per request usually. I
> decided that wasn't a problem, partly because the guy who hosts the
> archive said so (I wrote the script, he runs it and lets people
> download the files it collects).

there used to be times when starting several downloads simultaneously
from the same server was frowned upon. i don't know if this has changed.

> Make the client pool smart enough to serialize requests to the same
> server, and do them in parallel only when they're for different
> servers? That seems like it'd be worth doing even with regular HEAD
> requests like you use now.

i did a lot of "curl -LIv" testing on download urls - and *many* of
them have redirects. there is no way to tell the real download server
from the url found in the .info file. but i will find a way.

right now i do some runs, printing a lot of data and collecting
statistics - so i know what i am actually dealing with. that said, i
have no problem sending 5 or 10 HEAD requests to the same server
simultaneously. GET is another story.

___
SlackBuilds-users mailing list
SlackBuilds-users@slackbuilds.org
https://lists.slackbuilds.org/mailman/listinfo/slackbuilds-users
Archives - https://lists.slackbuilds.org/pipermail/slackbuilds-users/
FAQ - https://slackbuilds.org/faq/
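On the redirect problem raised in this message: one possible way to learn the real download server is to follow the redirect chain hop by hop with HEAD requests and look at the host of the last URL. The following is only a rough sketch in Python, not how sbog does it. The names `head_once` and `resolve_final_url`, the hop limit, and the set of redirect codes are assumptions made for illustration.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch (an assumption).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def head_once(url, timeout=10.0):
    """Send one HEAD request; return (status, Location header or None)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def resolve_final_url(url, head=head_once, max_hops=10):
    """Follow redirects one hop at a time; return the URL that finally answers."""
    for _ in range(max_hops):
        status, location = head(url)
        if status not in REDIRECT_CODES or not location:
            return url
        url = urljoin(url, location)  # the Location header may be relative
    raise RuntimeError("too many redirects")
```

`urlsplit(resolve_final_url(url)).hostname` then gives the server that actually serves the file, which could be used as the key when deciding which requests may run at the same time. Servers that reject HEAD would still need separate handling.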
Re: [Slackbuilds-users] sbog ping: invalid download urls in .info files
On 3/2/18, rundstut...@gmx.de wrote:
> i didn't even know that sbosrcarch does exist (even though i have seen
> slackware.uk before). so the functionality i programmed does already
> exist.

Not exactly: it's the archive creation/maintenance script that does all
the "fake HEAD" requests. It's looking at the headers to decide whether
the file has changed. If it thinks the file's different, it downloads a
new copy for the archive. There's no "check links, but don't download
files" mode, so it doesn't do what "sbog ping" does.

> i can't find the sources of your ping script, though.

Here: http://urchlay.naptime.net/repos/sbostuff/tree/

...or: git clone git://urchlay.naptime.net/sbostuff.git

It's in perl, and not what I'd call beautiful code, so you might want
to put on your goggles before looking at it :)

> i hate workarounds with a passion. it's one of the banes of the
> software industry. what i am trying to do is actually what HEAD
> requests were made for.
> ...
> but alas - these servers are not supported by "sbog
> ping". if a server is not able to properly handle a HEAD request (which
> is not hard) then this is not the fault of sbog.

Right. In a perfect world, all web servers would comply with the spec,
and would support HEAD requests. Your approach is valid, but I took a
different approach, since I wanted sbosrcarch to be as complete an
archive as possible. The only things it doesn't have are files hidden
behind a click-through license (like Oracle's jdk download).

You might at least look through your logs and make a list of the
servers that don't do HEAD requests, and add some code that logs
"this server doesn't support HEAD requests", so the user knows it's
the server's fault.

> if i read the output of "curl -v --head -X GET $url" correctly: the
> server will get a GET request, so the server (or proxy) will start
> sending the body (before the request is closed/cancelled), causing
> traffic overhead. correct me if i'm wrong. i don't want to do that.

You're right. How much overhead depends on how the server/proxy is
configured, but it seems to be 64K bytes per request usually. I
decided that wasn't a problem, partly because the guy who hosts the
archive said so (I wrote the script, he runs it and lets people
download the files it collects).

> there is another reason i don't want to use GET requests: sbog uses a
> client pool, sending requests concurrently to servers. this speeds
> things up *a lot*. i don't want to send several GET requests to a
> server simultaneously.

Make the client pool smart enough to serialize requests to the same
server, and do them in parallel only when they're for different
servers? That seems like it'd be worth doing even with regular HEAD
requests like you use now.

sbosrcarch doesn't do anything in parallel, it's one request after
another. Normally it's run non-interactively (via cron job) so it
doesn't matter if it takes a long time to finish.
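To make the "serialize per server, parallelize across servers" idea concrete, here is a minimal sketch in Python. It is an illustration only: sbog's actual client pool is not shown in this thread, and `group_by_host`, `check_all`, the `check` callback and the worker count are all made-up names and values. Each host gets one task that walks its URLs in order, and the tasks for different hosts run in a thread pool.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def group_by_host(urls):
    """Group URLs by hostname, keeping their original order per host."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).hostname].append(url)
    return dict(groups)


def check_all(urls, check, max_workers=8):
    """Call check(url) for every URL and return {url: result}.

    URLs on the same host are checked one after another by a single
    worker; different hosts are handled in parallel.
    """
    def run_host(host_urls):
        return [(url, check(url)) for url in host_urls]

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for pairs in pool.map(run_host, group_by_host(urls).values()):
            results.update(pairs)
    return results
```

Grouping by the hostname in the .info file is the simplest choice here. It does not account for redirects, so two different hostnames could still end up on the same real server.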
Re: [Slackbuilds-users] sbog ping: invalid download urls in .info files
i didn't even know that sbosrcarch does exist (even though i have seen
slackware.uk before). so the functionality i programmed does already
exist. i can't find the sources of your ping script, though.

On Thu, 1 Mar 2018 14:21:12 -0500 B Watson wrote:

> On 3/1/18, rundstut...@gmx.de wrote:
> >
> > to understand the cause of some errors one has to know
> > how "sbog ping" works. for http urls a http "HEAD" request is sent
> > to test if the url is valid. this can cause trouble, as some
> > servers are picky about the request method ("GET" vs. "HEAD").
>
> I ran into this when working on sbosrcarch. Not only do some servers
> not allow HEAD, there are also some that allow it but don't report the
> same info as GET (missing Content-length header, usually). The way I
> work around it is to do an actual GET request, but close the
> connection after the headers are received (before the content).

i hate workarounds with a passion. it's one of the banes of the
software industry. what i am trying to do is actually what HEAD
requests were made for. i had to fiddle but i got it working - most of
the time. i had to add some code for amazon s3 servers (you probably
know why), this saved me from using GET, so this is the price i'm
willing to pay. i am happy with the result (except the amazon s3 code).
there are a few servers that will fail - but alas - these servers are
not supported by "sbog ping". if a server is not able to properly
handle a HEAD request (which is not hard) then this is not the fault of
sbog.

> sbosrcarch calls the curl command line tool, so it does this:
>
> curl --head -X GET $url

if i read the output of "curl -v --head -X GET $url" correctly: the
server will get a GET request, so the server (or proxy) will start
sending the body (before the request is closed/cancelled), causing
traffic overhead. correct me if i'm wrong. i don't want to do that.

there is another reason i don't want to use GET requests: sbog uses a
client pool, sending requests concurrently to servers. this speeds
things up *a lot*. i don't want to send several GET requests to a
server simultaneously.

regards, heiko
Re: [Slackbuilds-users] sbog ping: invalid download urls in .info files
On 3/1/18, rundstut...@gmx.de wrote:
>
> to understand the cause of some errors one has to know
> how "sbog ping" works. for http urls a http "HEAD" request is sent to
> test if the url is valid. this can cause trouble, as some servers are
> picky about the request method ("GET" vs. "HEAD").

I ran into this when working on sbosrcarch. Not only do some servers
not allow HEAD, there are also some that allow it but don't report the
same info as GET (missing Content-length header, usually). The way I
work around it is to do an actual GET request, but close the
connection after the headers are received (before the content).

sbosrcarch calls the curl command line tool, so it does this:

curl --head -X GET $url

sbog is written in a compiled language and presumably doesn't call
external commands like this, but you can do the same thing. Send a GET
request, read the headers until you get a blank line, then close the
connection.
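The "send GET, read headers up to the blank line, close" recipe could look roughly like this. It is a sketch in Python rather than in sbog's language, and it is not taken from sbosrcarch (which shells out to curl). The function names `read_header_block` and `get_headers_only` are invented for the example, and it handles plain http only, with no TLS and no redirects.

```python
import socket
from urllib.parse import urlsplit


def read_header_block(stream):
    """Read the status line and headers from a binary stream.

    Stops at the blank line that ends the headers; the body is not read.
    """
    status_line = stream.readline().decode("iso-8859-1").strip()
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):  # blank line or EOF: headers are done
            break
        name, _, value = line.decode("iso-8859-1").partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers


def get_headers_only(url, timeout=10.0):
    """Send a GET over plain HTTP, read only the headers, then close."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host_header = host if port == 80 else f"{host}:{port}"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    # Both with-blocks exit on return, closing the socket with the body unread.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request.encode("ascii"))
        with sock.makefile("rb") as stream:
            return read_header_block(stream)
```

Closing the socket does not stop the server from having already sent the first part of the body, so some traffic overhead remains with this approach.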