Re: [Slackbuilds-users] sbog ping: invalid download urls in .info files

2018-03-02 Thread rundstutzen
On Fri, 2 Mar 2018 13:49:55 -0500
B Watson  wrote:

> On 3/2/18, rundstut...@gmx.de  wrote:
> > i didn't even know that sbosrcarch does exist (even though i have
> > seen slackware.uk before). so the functionality i programmed does
> > already exist.  
> 
> ...or: git clone git://urchlay.naptime.net/sbostuff.git
> 
> It's in perl, and not what I'd call beautiful code, so you might want
> to put on your goggles before looking at it :)

thank you, this will be an interesting read. i don't care about code
quality - i care more about the knowledge that usually lies within the
code. you probably know more about the sbo stuff than i do.
> 
> You might at least look through your logs and make a list of the
> servers that don't do HEAD requests, and add some code that logs
> "this server doesn't support HEAD requests", so the user knows it's
> the server's fault.
> 

please read my starter post, there is a pastebin link to the output of
"sbog ping". i print all errors to stdout, so they are easy to parse.
there are only a few errors caused by HEAD request failures, and about
10 or so "forbidden" errors. they can easily be found with grep.

> 
> You're right. How much overhead depends on how the server/proxy is
> configured, but it seems to be 64K bytes per request usually. I
> decided that wasn't a problem, partly because the guy who hosts the
> archive said so (I wrote the script, he runs it and lets people
> download the files it collects).

there was a time when starting several downloads simultaneously from
the same server was frowned upon. i don't know if this has changed.

> 
> Make the client pool smart enough to serialize requests to the same
> server, and do them in parallel only when they're for different
> servers? That seems like it'd be worth doing even with regular HEAD
> requests like you use now.

i did a lot of "curl -LIv" testing on download urls - and *many* of them
have redirects. there is no way to tell the real download server from
the url found in the .info file. but i will find a way. right now i am
doing some runs, printing a lot of data and collecting statistics - so i
know what i am actually dealing with.
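
One way to find the real download server is to follow the redirect chain
hop by hop and look at the host of the final url. The following is only a
rough Python sketch, not sbog's code: resolve_final_url and the injected
head callable are hypothetical names, and head is assumed to return a
(status, headers) pair with lower-cased header names.

```python
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_final_url(url, head, max_hops=10):
    """Follow redirects via head(url) and return (final_url, status).

    head is a caller-supplied function (hypothetical) that performs one
    HEAD request without following redirects and returns
    (status_code, headers_dict) with lower-case header names.
    """
    seen = set()
    for _ in range(max_hops):
        if url in seen:
            raise ValueError("redirect loop at " + url)
        seen.add(url)
        status, headers = head(url)
        location = headers.get("location")
        if status not in REDIRECT_CODES or not location:
            return url, status
        # Location may be relative, so resolve it against the current url.
        url = urljoin(url, location)
    raise ValueError("too many redirects")


def final_host(url, head):
    """Host that actually serves the file, usable as a per-server key."""
    final_url, _status = resolve_final_url(url, head)
    return urlsplit(final_url).hostname
```

The host returned by final_host could then be used to decide which
requests go to the same server.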

that said, i have no problem sending 5 or 10 HEAD requests to the same
server simultaneously. GET is another story.
___
SlackBuilds-users mailing list
SlackBuilds-users@slackbuilds.org
https://lists.slackbuilds.org/mailman/listinfo/slackbuilds-users
Archives - https://lists.slackbuilds.org/pipermail/slackbuilds-users/
FAQ - https://slackbuilds.org/faq/



Re: [Slackbuilds-users] sbog ping: invalid download urls in .info files

2018-03-02 Thread B Watson
On 3/2/18, rundstut...@gmx.de  wrote:
> i didn't even know that sbosrcarch does exist (even though i have seen
> slackware.uk before). so the functionality i programmed does already
> exist.

Not exactly: it's the archive creation/maintenance script that does all
the "fake HEAD" requests. It's looking at the headers to decide whether
the file has changed. If it thinks the file's different, it downloads
a new copy for the archive. There's no "check links, but don't download
files" mode, so it doesn't do what "sbog ping" does.
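
The header comparison described above might look roughly like this in
Python. This is an illustration only, not sbosrcarch's code (which is
perl): the thread does not say which headers it compares, so
Content-Length and Last-Modified are assumptions, and looks_changed is a
hypothetical name.

```python
def looks_changed(stored, current):
    """Return True if the headers suggest the remote file has changed.

    stored and current are dicts of lower-cased header names to values:
    the headers saved at the last download and the ones just received.
    The choice of headers is an assumption made for this sketch.
    """
    for name in ("content-length", "last-modified"):
        old, new = stored.get(name), current.get(name)
        if old is None or new is None:
            # Header missing on one side: no evidence either way.
            continue
        if old != new:
            return True
    return False
```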

> i can't find the sources of your ping script, though.

Here: http://urchlay.naptime.net/repos/sbostuff/tree/

...or: git clone git://urchlay.naptime.net/sbostuff.git

It's in perl, and not what I'd call beautiful code, so you might want
to put on your goggles before looking at it :)

> i hate workarounds with a passion. it's one of the banes of the software
> industry. what i am trying to do is actually what HEAD requests were
> made for.
> ...
> but alas - these servers are not supported by "sbog
> ping". if a server is not able to properly handle a HEAD request (which
> is not hard) then this is not the fault of sbog.

Right. In a perfect world, all web servers would comply with the spec,
and would support HEAD requests. Your approach is valid, but I took
a different approach, since I wanted sbosrcarch to be as complete an
archive as possible. The only things it doesn't have are files hidden
behind a click-through license (like Oracle's jdk download).

You might at least look through your logs and make a list of the servers
that don't do HEAD requests, and add some code that logs "this server
doesn't support HEAD requests", so the user knows it's the server's fault.
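
A possible shape for that log message, sketched in Python. The function
name is hypothetical and this is not taken from sbog. It relies on 405
(Method Not Allowed) and 501 (Not Implemented) being the standard status
codes for an unsupported method; servers that reject HEAD in some other
way would need their own cases.

```python
from urllib.parse import urlsplit


def explain_head_failure(url, status):
    """Return a log line for a failed HEAD request, or None on success."""
    host = urlsplit(url).hostname
    if status in (405, 501):
        return f"{host}: this server doesn't support HEAD requests (HTTP {status})"
    if status == 403:
        return f"{host}: forbidden (HTTP 403)"
    if status >= 400:
        return f"{host}: request failed (HTTP {status})"
    return None
```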

> if i read the output of "curl -v --head -X GET $url" correctly: the
> server will get a GET request, so the server (or proxy) will start
> sending the body (before the request is closed/cancelled), causing
> traffic overhead. correct me if i'm wrong. i don't want to do that.

You're right. How much overhead depends on how the server/proxy is
configured, but it seems to be 64K bytes per request usually. I decided
that wasn't a problem, partly because the guy who hosts the archive said
so (I wrote the script, he runs it and lets people download the files
it collects).

> there is another reason i don't want to use GET requests: sbog uses a
> client pool, sending requests concurrently to servers. this speeds
> things up *a lot*. i don't want to send several GET requests to a
> server simultaneously.

Make the client pool smart enough to serialize requests to the same
server, and do them in parallel only when they're for different servers?
That seems like it'd be worth doing even with regular HEAD requests like
you use now.
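
A minimal Python sketch of that idea, assuming the server can be
identified by the host part of the url (which ignores redirects).
group_by_host, check_all and the injected check callable are hypothetical
names, not part of sbog: each host gets one task, urls for the same host
are checked one after another inside it, and different hosts run in
parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def group_by_host(urls):
    """Map each host name to the list of urls that point at it."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).hostname].append(url)
    return dict(groups)


def check_all(urls, check, max_workers=8):
    """Run check(url) for every url and return {url: result}.

    Requests to the same host are serialized; different hosts are
    handled concurrently by the thread pool.
    """
    def run_host(host_urls):
        # One worker walks through all urls of a single host in order.
        return [(url, check(url)) for url in host_urls]

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for pairs in pool.map(run_host, group_by_host(urls).values()):
            results.update(pairs)
    return results
```

With this layout a host that has many urls becomes the slowest task, so
the total run time is bounded by the busiest server rather than by the
number of urls.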

sbosrcarch doesn't do anything in parallel, it's one request after
another. Normally it's run non-interactively (via cron job) so it doesn't
matter if it takes a long time to finish.



Re: [Slackbuilds-users] sbog ping: invalid download urls in .info files

2018-03-02 Thread rundstutzen
i didn't even know that sbosrcarch does exist (even though i have seen
slackware.uk before). so the functionality i programmed does already
exist. i can't find the sources of your ping script, though.

On Thu, 1 Mar 2018 14:21:12 -0500
B Watson  wrote:

> On 3/1/18, rundstut...@gmx.de  wrote:
> >
> > to understand the cause of some errors one has to know
> > how "sbog ping" works. for http urls a http "HEAD" request is sent
> > to test if the url is valid. this can cause trouble, as some
> > servers are picky about the request method ("GET" vs. "HEAD").  
> 
> I ran into this when working on sbosrcarch. Not only do some servers
> not allow HEAD, there are also some that allow it but don't report the
> same info as GET (missing Content-length header, usually). The way I
> work around it is to do an actual GET request, but close the
> connection after the headers are received (before the content).

i hate workarounds with a passion. it's one of the banes of the software
industry. what i am trying to do is actually what HEAD requests were
made for.
i had to fiddle but i got it working - most of the time. i had to add
some code for amazon s3 servers (you probably know why), but this saved
me from using GET, so this is the price i'm willing to pay. i am happy
with the result (except the amazon s3 code). there are a few servers
that will fail - but alas - these servers are not supported by "sbog
ping". if a server is not able to properly handle a HEAD request (which
is not hard) then this is not the fault of sbog.

> 
> sbosrcarch calls the curl command line tool, so it does this:
> 
> curl --head -X GET $url

if i read the output of "curl -v --head -X GET $url" correctly: the
server will get a GET request, so the server (or proxy) will start
sending the body (before the request is closed/cancelled), causing
traffic overhead. correct me if i'm wrong. i don't want to do that.

there is another reason i don't want to use GET requests: sbog uses a
client pool, sending requests concurrently to servers. this speeds
things up *a lot*. i don't want to send several GET requests to a
server simultaneously.

regards,

heiko



Re: [Slackbuilds-users] sbog ping: invalid download urls in .info files

2018-03-01 Thread B Watson
On 3/1/18, rundstut...@gmx.de  wrote:
>
> to understand the cause of some errors one has to know
> how "sbog ping" works. for http urls a http "HEAD" request is sent to
> test if the url is valid. this can cause trouble, as some servers are
> picky about the request method ("GET" vs. "HEAD").

I ran into this when working on sbosrcarch. Not only do some servers
not allow HEAD, there are also some that allow it but don't report the
same info as GET (missing Content-length header, usually). The way I
work around it is to do an actual GET request, but close the connection
after the headers are received (before the content).

sbosrcarch calls the curl command line tool, so it does this:

curl --head -X GET $url

sbog is written in a compiled language and presumably doesn't call
external commands like this, but you can do the same thing: send a GET
request, read the headers until you get a blank line, then close the
connection.
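
As a rough illustration of those three steps, here is a Python sketch
using plain sockets. It is not sbosrcarch's or sbog's code; the function
names are made up for this example, and redirects, chunked responses and
non-default ports in the Host header are deliberately ignored.

```python
import socket
import ssl
from urllib.parse import urlsplit


def read_header_block(fp):
    """Read the status line and headers from a binary stream.

    Stops at the blank line that ends the headers, so the body is
    never read from the stream.
    """
    status_line = fp.readline().decode("iso-8859-1").rstrip("\r\n")
    headers = {}
    while True:
        line = fp.readline()
        if line in (b"\r\n", b"\n", b""):
            break
        name, _, value = line.decode("iso-8859-1").partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers


def get_headers_only(url, timeout=10.0):
    """Send a GET request, read only the headers, then close."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.hostname}\r\n"
        "Connection: close\r\n\r\n"
    )
    with socket.create_connection((parts.hostname, port), timeout=timeout) as sock:
        if parts.scheme == "https":
            context = ssl.create_default_context()
            sock = context.wrap_socket(sock, server_hostname=parts.hostname)
        with sock:
            sock.sendall(request.encode("ascii"))
            with sock.makefile("rb") as fp:
                # Leaving the with blocks closes the connection
                # without reading the body.
                return read_header_block(fp)
```

The server may still have sent part of the body by the time the
connection is closed, so this reduces the traffic of a GET but does not
eliminate it.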