Bug#326429: ITP: webcheck -- website link and structure checker

2005-09-09 Thread Arthur de Jong
On Wed, 2005-09-07 at 00:47 -0500, Peter Samuelson wrote:
 [Arthur de Jong]
  I'm not sure if I need some statement on the copyrights on the
  generated html files. The css file that is just copied has a BSD
  license.
 
 Generally, output from a program is not considered to be copyrighted.
 The templates from which it is built could be copyrighted, and if
 significant bits of a template are copied in verbatim, you may wish to
 copy in a license statement from the template too.

The templates are embedded in the Python code (e.g. write('html
code')), except for the CSS file mentioned above. The Python code is GPL.
Most of the content (links, titles, other gathered information) is from
the crawled website.
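The sketch below shows what "templates embedded in the python code" means in practice. It is a hypothetical example, not webcheck's actual code: the function name `write_link_report` and the stylesheet name `webcheck.css` are invented for illustration. The fixed markup comes from string literals in the program. The titles and URLs come from the crawled site.

```python
import html
import io


def write_link_report(fp, site_title, links):
    """Write a minimal HTML link report to the file-like object fp."""
    write = fp.write  # stands in for the write('html code') calls
    # Fixed markup: this text comes from the (GPL'd) program itself.
    write("<html>\n<head><title>%s</title>\n" % html.escape(site_title))
    # Hypothetical stylesheet name; the real CSS file is copied separately.
    write('<link rel="stylesheet" type="text/css" href="webcheck.css" />\n')
    write("</head>\n<body>\n<ul>\n")
    for url, title in links:
        # Variable content: URLs and titles gathered from the crawled site.
        write('<li><a href="%s">%s</a></li>\n'
              % (html.escape(url, quote=True), html.escape(title)))
    write("</ul>\n</body>\n</html>\n")


out = io.StringIO()
write_link_report(out, "Example site", [("http://example.com/", "Home & away")])
print(out.getvalue())
```

Only the short literal fragments belong to the program. Everything between them is data taken from the site being checked.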

  The old package provides, conflicts with and replaces linbot (the
  name of webcheck a long time ago). Should I keep that or just drop
  it? (linbot was in slink, potato and woody but neither linbot nor
  webcheck was in sarge)
 
 Completely your call.  You do not need to support upgrades from woody
 or prior, but you can if you wish.  Three lines in debian/control
 which you'll never need to change is a pretty cheap price, but it *is*
 untidy if you want a minimalist control file.

I think I'll keep those lines for a while then (they're not in the way
for now).
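The "three lines" are the Provides, Conflicts and Replaces relationships on linbot in the binary package stanza. The small sketch below prints them. The helper `render_stanza` is hypothetical, and only the fields relevant here are shown. Description, Depends and the other fields are omitted.

```python
def render_stanza(fields):
    """Render (name, value) pairs as a deb822-style stanza, one field per line."""
    return "".join("%s: %s\n" % (name, value) for name, value in fields)


# Partial binary-package stanza for webcheck (other fields omitted).
webcheck_fields = [
    ("Package", "webcheck"),
    ("Architecture", "all"),
    # The three transitional lines that let woody systems with linbot
    # upgrade to webcheck.
    ("Provides", "linbot"),
    ("Conflicts", "linbot"),
    ("Replaces", "linbot"),
]

stanza = render_stanza(webcheck_fields)
print(stanza, end="")
```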

  The old package has a configuration file in /etc/webcheck and the
  new package no longer provides that. What would be the best way to
  get rid of it? (policy 10.7.3 has a note about removing conffiles
  but I'm not sure it's relevant) Should I delete it on upgrade?
 
 Is the package configured in some other way, or have you dropped
 support for any site-wide configuration?  If you still have a
 configuration mechanism, it's best if you can migrate /etc/webcheck to
 the new scheme automatically, then delete it, at upgrade time.  If
 not, you can just delete it.

The new package does not use a configuration file at all any more (the
config.py file is still there, but it is not really meant to be edited
and is not in /etc). I don't think I want to keep a site-wide config
file. Maybe I'll support specifying a config file on the command line
one day.

I'm going for completely removing the /etc/webcheck directory on
upgrades. Does anyone think there should be a debconf question about
this at install time (e.g. test whether /etc/webcheck exists and ask the
user to remove it)?
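The upgrade step amounts to "on configure, delete the old directory if it is still there". The sketch below is written in Python only to show that logic. In a real package it would live in a maintainer script, typically a POSIX shell postinst. The function name is hypothetical. The sketch does not check whether the user modified the old files.

```python
import shutil
import tempfile
from pathlib import Path

# Location of the old package's configuration, as named in the thread.
OLD_CONF_DIR = Path("/etc/webcheck")


def remove_obsolete_config(action, conf_dir=OLD_CONF_DIR):
    """Remove the leftover config directory when postinst runs "configure".

    Returns True if a directory was removed, False otherwise.
    """
    if action != "configure":
        return False  # other maintainer-script actions leave it alone
    if not conf_dir.is_dir():
        return False  # nothing left over from the old package
    shutil.rmtree(conf_dir)
    return True


# Demo against a throwaway directory instead of the real /etc/webcheck.
demo_dir = Path(tempfile.mkdtemp()) / "webcheck"
demo_dir.mkdir()
(demo_dir / "webcheck.conf").write_text("# old settings\n")  # hypothetical file
removed = remove_obsolete_config("configure", demo_dir)
print(removed, demo_dir.exists())  # True False
```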

  Btw, I'm packaging this as a native Debian package because I just
  want to release one version and have one source tarball.
 
 Not recommended - you'll have to release a whole new upstream
 version any time you fix a trivial Debian bug, or even just to
 recompile against a newer sid library.  Providing backports or forks
 (for etch after etch is frozen) will require new upstream version
 numbers, which will confuse your non-Debian audience (wait, what's
 the current release?  Upstream 3.1.15 and 3.1.15~etch1 were released
 at the same time, but 3.0.4.etch2 was just added to the debian ftp
 site)

I think this would avoid confusion, since every Debian version is also a
released version. If a release changes just the Debian packaging
(unlikely at the moment since it is in development), that will be
documented in the NEWS file. Since this is a Python package with
Architecture: all, the risk of needing recompiles is minimal. I'm not too
worried about the version numbers of backports or forks because the
priority is extra (not likely to be affected by major freeze problems),
and adding a .etch# or .sarge# suffix shouldn't cause too much confusion.
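The choice of suffix matters because dpkg compares versions piece by piece and treats `~` specially. Under these rules `3.1.15~etch1` sorts before `3.1.15`, while `3.1.15.etch1` sorts after it. The sketch below re-implements that comparison for upstream version strings only. It ignores epochs and Debian revisions. The name `compare_upstream` is hypothetical. On a real system, `dpkg --compare-versions` is the authoritative check.

```python
import string
from functools import cmp_to_key


def _order(c):
    """Weight of a single character in the non-digit part, per dpkg's rules."""
    if c == "" or c in string.digits:
        return 0  # end of string (or a digit) is neutral
    if c in string.ascii_letters:
        return ord(c)
    if c == "~":
        return -1  # tilde sorts before everything, even the end of string
    return ord(c) + 256  # other punctuation sorts after letters


def _is_digit(s, i):
    return i < len(s) and s[i] in string.digits


def compare_upstream(a, b):
    """Return <0, 0 or >0 comparing two upstream version strings."""
    i = j = 0
    while i < len(a) or j < len(b):
        # 1. Compare the non-digit run character by character.
        while (i < len(a) and not _is_digit(a, i)) or (j < len(b) and not _is_digit(b, j)):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # 2. Compare the following digit runs numerically (skip leading zeros).
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while _is_digit(a, i) and _is_digit(b, j):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if _is_digit(a, i):
            return 1  # a's number has more digits, so it is larger
        if _is_digit(b, j):
            return -1
        if first_diff:
            return first_diff
    return 0


versions = ["3.1.15", "3.1.15~etch1", "3.0.4.etch2", "3.1.15.etch1"]
print(sorted(versions, key=cmp_to_key(compare_upstream)))
# ['3.0.4.etch2', '3.1.15~etch1', '3.1.15', '3.1.15.etch1']
```

A `~etchN` suffix therefore marks a version that an upgrade to the plain upstream release will replace. A `.etchN` suffix sorts above the plain release.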

 And there's the bandwidth issue - you and the build daemons have to
 transfer the whole source tarball every time you make a trivial change
 to debian/*.

The current tarball is pretty small (45k, plus maybe 5k for the debian
directory), and since the package is Architecture: all (no buildds
involved), I don't think bandwidth will be much of an issue.

Thanks for your comments!

-- 
-- arthur - [EMAIL PROTECTED] - http://people.debian.org/~adejong --





2005-09-07 Thread Peter Samuelson

[Arthur de Jong]
 * I'm not sure if I need some statement on the copyrights on the
   generated html files. The css file that is just copied has a BSD
   license.

Generally, output from a program is not considered to be copyrighted.
The templates from which it is built could be copyrighted, and if
significant bits of a template are copied in verbatim, you may wish to
copy in a license statement from the template too.

 * The old package provides, conflicts with and replaces linbot (the
   name of webcheck a long time ago). Should I keep that or just drop
   it?  (linbot was in slink, potato and woody but neither linbot nor
   webcheck was in sarge)

Completely your call.  You do not need to support upgrades from woody
or prior, but you can if you wish.  Three lines in debian/control which
you'll never need to change is a pretty cheap price, but it *is* untidy
if you want a minimalist control file.

 * The old package has a configuration file in /etc/webcheck and the
   new package no longer provides that. What would be the best way to
   get rid of it? (policy 10.7.3 has a note about removing conffiles
   but I'm not sure it's relevant) Should I delete it on upgrade?

Is the package configured in some other way, or have you dropped
support for any site-wide configuration?  If you still have a
configuration mechanism, it's best if you can migrate /etc/webcheck to
the new scheme automatically, then delete it, at upgrade time.  If not,
you can just delete it.

 Btw, I'm packaging this as a native Debian package because I just
 want to release one version and have one source tarball.

Not recommended - you'll have to release a whole new upstream version
any time you fix a trivial Debian bug, or even just to recompile
against a newer sid library.  Providing backports or forks (for etch
after etch is frozen) will require new upstream version numbers, which
will confuse your non-Debian audience (wait, what's the current
release?  Upstream 3.1.15 and 3.1.15~etch1 were released at the same
time, but 3.0.4.etch2 was just added to the debian ftp site)

And there's the bandwidth issue - you and the build daemons have to
transfer the whole source tarball every time you make a trivial change
to debian/*.

But again, this one's your call.

Peter





2005-09-03 Thread Arthur de Jong
Subject: ITP: webcheck -- website link and structure checker
Package: wnpp
Owner: Arthur de Jong [EMAIL PROTECTED]
Severity: wishlist

* Package name: webcheck
  Version : 1.9.3
  Upstream Author : Arthur de Jong [EMAIL PROTECTED]
* URL : http://ch.tudelft.nl/~arthur/webcheck/
* License : GPL
  Description : website link and structure checker
 webcheck is a website checking tool for webmasters. It crawls a given
 website and generates a number of reports in the form of html pages.
 It is easy to use and generates simple, clear and readable reports.
 .
 Features of webcheck include:
  * support for http, https, ftp and file schemes
  * view the structure of a site
  * track down broken links
  * find potentially outdated and new pages
  * list links pointing to external sites
  * can run without user intervention
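The first feature in the list means a link is only followed when its scheme is one of the four listed. The sketch below is a hypothetical filter. The name `is_checkable` is invented, and webcheck's own handling of other schemes may differ.

```python
from urllib.parse import urlsplit

# Schemes listed in the package description.
SUPPORTED_SCHEMES = {"http", "https", "ftp", "file"}


def is_checkable(url):
    """Return True if the URL uses one of the supported schemes."""
    return urlsplit(url).scheme.lower() in SUPPORTED_SCHEMES


candidates = [
    "http://example.com/",
    "FTP://ftp.example.org/pub/",
    "mailto:someone@example.com",
    "file:///var/www/index.html",
]
checkable = [u for u in candidates if is_checkable(u)]
print(checkable)
```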

Webcheck (release 1.0) was in Debian before, but was removed on
2005-02-02 (orphaned on 2004-05-31, request for removal on 2004-11-02,
see bug #251931) because it was no longer maintained and there was no
upstream development. Since I have taken over development and will
maintain this package, this is no longer the case.

Bugs that were open at the time webcheck was removed:
#71419 request for internationalization of output
  this is on the TODO list as a wishlist item
#192868 handle relative links to file:/ urls nicely
  using file:/// urls is supported, and relative or absolute paths are
  now also translated into file:/// urls automatically
#201154 please handle url()'s in stylesheets
  this is also implemented in the current version of webcheck
#253424 recursion problem in sitemap module (key not found)
  this should be solved as most of the code is rewritten
#271085 produce a validation report (using wdg-html-validator)
  there is an item on the TODO list to implement more html validation
  checking, however I do not think that submitting a large number of
  html pages to a web-based checker is a good idea
#286017 problem with recent python version
  webcheck is developed using python 2.3 but is tested regularly with
  python 2.4 so this should be solved
(this leaves two wishlist bugs in the TODO list)
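Two of the fixes above (#192868 and #201154) can be illustrated with a short sketch. Neither function is taken from webcheck's source, and the names are hypothetical. The regular expression is a simplification that ignores CSS escapes and comments.

```python
import re
from pathlib import Path

# Matches url(...) with optional single or double quotes (simplified).
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""")


def path_to_file_url(path):
    """Turn a relative or absolute filesystem path into a file:/// URL."""
    return Path(path).resolve().as_uri()


def css_urls(stylesheet_text):
    """Return the targets of all url(...) references in a stylesheet."""
    return [m.group(2) for m in CSS_URL_RE.finditer(stylesheet_text)]


css = 'body { background: url("img/bg.png") } li { list-style: url(dot.gif); }'
print(css_urls(css))  # ['img/bg.png', 'dot.gif']
print(path_to_file_url("index.html"))
```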

There are a couple of questions left though:
* I'm not sure if I need some statement on the copyrights on the
  generated html files. The css file that is just copied has a BSD
  license.
* The old package provides, conflicts with and replaces linbot (the name
  of webcheck a long time ago). Should I keep that or just drop it?
  (linbot was in slink, potato and woody but neither linbot nor webcheck
  was in sarge)
* The old package has a configuration file in /etc/webcheck and the new
  package no longer provides that. What would be the best way to get rid
  of it? (policy 10.7.3 has a note about removing conffiles but I'm not
  sure it's relevant) Should I delete it on upgrade?

Btw, I'm packaging this as a native Debian package because I just want
to release one version and have one source tarball.

-- 
-- arthur - [EMAIL PROTECTED] - http://people.debian.org/~adejong --

