Re: [PROPOSAL] improvements to site-scan.rb/site.cgi

2018-04-11 Thread Sam Ruby
One other consideration: it probably continues to make sense for the
CGI to describe the check that is being made to the end user.  A
regular expression is barely adequate for that, but better than no
indication.
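
One way to show the end user more than a bare regular expression is to keep a short description next to each pattern. The following is only a sketch under assumptions: the `Check` struct, the `CHECK_DESCRIPTIONS` table, and the description wording are hypothetical and are not existing Whimsy code; only the regex comes from the proposal quoted below.

```ruby
# Hypothetical structure: pair each compliance regex with text for the end user.
Check = Struct.new(:description, :pattern)

CHECK_DESCRIPTIONS = {
  'license' => Check.new(
    'Link to the Apache licenses page',     # assumed wording
    %r{^https?://.*apache.org/licenses/$}   # regex taken from the proposal
  )
}

# Text the CGI could display: description first, regex as supporting detail.
def describe_check(name)
  check = CHECK_DESCRIPTIONS.fetch(name)
  "#{check.description} (expected to match #{check.pattern.inspect})"
end

puts describe_check('license')
```

The regex stays visible for anyone who wants the exact rule, while the description carries the meaning.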

- Sam Ruby

On Wed, Apr 11, 2018 at 9:46 AM, Shane Curcuru wrote:
> I'd like to simplify some of the site-scan.rb/site.cgi processing by
> centralizing some of the core things that the scripts are searching for
> into site-scan.rb.  While I appreciate the original design motivation,
> we currently have duplicate regexes - and we have more people interested
> in using the results of the site scan (esp. with events) and officers
> potentially requesting changes to the requirements.
>
> Roughly, I'd like to move most of CHECKS into site-scan.rb for
> simplicity and use those to implement most of the link scans.  Some of
> the scans still have more logic (which would still be custom), but some
> of them can be mechanical.
>
> CHECKS = {
>   'events' => [
>     '',
>     # a_text regex to scan for - for events, we don't care, so blank
>     '/apache.org/events',
>     # a_href minimal regex to capture - for events, this tells us what
>     # link to capture from the page
>     %r{^https?://.*apache.org/events/current-event}
>     # a_href full regex to expect for compliance (used in site.cgi)
>   ],
>
>   'license' => [
>     '/licenses?/',
>     # a_text regex to scan for - for license, this is required
>     'apache.org',
>     # a_href minimal regex to capture - for license, we only capture
>     # the link if it points to apache.org
>     %r{^https?://.*apache.org/licenses/$}
>     # a_href full regex to expect for compliance; it must point to one
>     # of our actual licenses to pass
>   ],
>   # ...etc.
> }
>
> Any overall objections?  It's making me twitchy seeing most of the
> regexes we use for scanning in separate places.
>
> --
>
> - Shane
>   Director & Member
>   The Apache Software Foundation


[jira] [Commented] (WHIMSY-191) site checker ignore proper license link

2018-04-11 Thread Sebb (JIRA)

[ 
https://issues.apache.org/jira/browse/WHIMSY-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434546#comment-16434546
 ] 

Sebb commented on WHIMSY-191:
-

Agreed.
Whimsy needs to distinguish the footer license link from the code license link.

> site checker ignore proper license link
> ---
>
> Key: WHIMSY-191
> URL: https://issues.apache.org/jira/browse/WHIMSY-191
> Project: Whimsy
> Issue Type: Bug
> Reporter: Owen O'Malley
> Priority: Major
>
> The ORC website has:
> {code:html}
> <a href="https://www.apache.org/licenses/">license</a>
> 
> <a href="https://www.apache.org/licenses/LICENSE-2.0.html">
>   ApacheLicensev2
> </a>
> {code}
> which the Whimsy site check flags as yellow for license. It seems to be
> because the second link, which is in the footer as the license for the site
> content, is the one that the script checks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Apache Corporate Contributor License Agreement (CCLA) for UShareSoft

2018-04-11 Thread Craig Russell
I did not find any indication that this message with attachment was put into 
the workbench for processing.

Any ideas why not?

Craig

> On Apr 9, 2018, at 7:09 AM, James Weir wrote:
> 
> Dear Sir/Madam,
> 
> Please find enclosed the Apache Corporate Contributor License Agreement 
> (CCLA) for UShareSoft.  As a company we wish to be able to contribute to 
> Apache projects (notably Apache Brooklyn and Apache jClouds).
> 
> We understand that each developer will also have to sign an ICLA.  We will do 
> this once we are ready to submit our first PR.
> 
> Please let me know if you require any further information regarding this 
> request.
> 
> Kind Regards
> James Weir
> 
>  -- 
> James Weir
> Chief Technology Officer
> ja...@usharesoft.com 
> @jamesgweir
> Linkedin: http://www.linkedin.com/in/jamesweir 
> 
> Tel: +33 (0)675 23 80 23
> www.usharesoft.com 
> @usharesoft
> 

Craig L Russell
Secretary, Apache Software Foundation
c...@apache.org  http://db.apache.org/jdo 



[jira] [Created] (WHIMSY-192) copyright site checker rule should accept ©

2018-04-11 Thread Owen O'Malley (JIRA)
Owen O'Malley created WHIMSY-192:


Summary: copyright site checker rule should accept ©
Key: WHIMSY-192
URL: https://issues.apache.org/jira/browse/WHIMSY-192
Project: Whimsy
Issue Type: Bug
Reporter: Owen O'Malley


The current copyright site checker rule requires "[Cc]opyright". It would be 
nice for it to accept '©' as well.
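
A minimal sketch of what such a rule could look like, assuming the check is a single regular expression applied to page text. The constant and method names are hypothetical, and accepting the `&copy;` entity is an extra assumption for the case where the scanner sees unparsed markup.

```ruby
# Assumed replacement pattern: the word "copyright", the © character (U+00A9),
# or the HTML entity &copy;.
COPYRIGHT_PATTERN = /[Cc]opyright|\u00A9|&copy;/

# True when the text carries any accepted form of copyright marker.
def copyright_notice?(text)
  text.match?(COPYRIGHT_PATTERN)
end

puts copyright_notice?('Copyright 2018 The Apache Software Foundation')
puts copyright_notice?("\u00A9 2018 The Apache Software Foundation")
puts copyright_notice?('All rights reserved')
```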





[jira] [Created] (WHIMSY-191) site checker ignore proper license link

2018-04-11 Thread Owen O'Malley (JIRA)
Owen O'Malley created WHIMSY-191:


Summary: site checker ignore proper license link
Key: WHIMSY-191
URL: https://issues.apache.org/jira/browse/WHIMSY-191
Project: Whimsy
Issue Type: Bug
Reporter: Owen O'Malley


The ORC website has:

{code:html}
<a href="https://www.apache.org/licenses/">license</a>

<a href="https://www.apache.org/licenses/LICENSE-2.0.html">
  ApacheLicensev2
</a>
{code}

which the Whimsy site check flags as yellow for license. It seems to be
because the second link, which is in the footer as the license for the site
content, is the one that the script checks.
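
One possible fix, sketched here under assumptions, is to consider every license-like link on the page and prefer one that satisfies the compliance regex, rather than keeping whichever link was seen last. The method name and the `{ text:, href: }` hashes are hypothetical stand-ins for whatever the scanner extracts; the case-insensitive text match is also an assumption.

```ruby
LICENSE_TEXT = /licenses?/i                            # assumed link-text pattern
LICENSE_HREF = %r{^https?://.*apache.org/licenses/$}   # compliance regex from the list proposal

# Return the first license-like link that satisfies the compliance regex.
# If none does, fall back to the last license-like link (which would still
# be flagged), or nil when the page has no such link at all.
def pick_license_link(links)
  candidates = links.select { |link| link[:text] =~ LICENSE_TEXT }
  candidates.find { |link| link[:href] =~ LICENSE_HREF } || candidates.last
end

# The two links from the report above.
links = [
  { text: 'license',         href: 'https://www.apache.org/licenses/' },
  { text: 'ApacheLicensev2', href: 'https://www.apache.org/licenses/LICENSE-2.0.html' }
]

puts pick_license_link(links)[:href]
```

With the two ORC links, this selects the first one, so the footer link no longer masks the compliant link.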





Re: [PROPOSAL] improvements to site-scan.rb/site.cgi

2018-04-11 Thread Shane Curcuru
Also - any objection to taking most of the code out of www/pods.cgi and
replacing it with methods shared with site.cgi somehow?  Obviously
pods.cgi needs to display some different text, and may have additional
checks, but really most of the code should be identical to site.cgi now
that all of the scanning is done by site-scan.rb.
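
As a rough illustration of the kind of sharing meant here, the common logic could live in one module that both CGIs call with their own display text. Everything in this sketch is hypothetical: the `SiteChecks` module, its method names, and the status symbols are assumptions and do not exist in Whimsy.

```ruby
# Hypothetical shared module for site.cgi and pods.cgi.
module SiteChecks
  # Classify one scanned value against the regex it is expected to match.
  def self.status(value, expected)
    return :missing if value.nil? || value.empty?
    value.match?(expected) ? :ok : :warning
  end

  # Build display rows; each caller supplies its own label text per check.
  def self.rows(results, expectations, labels)
    expectations.map do |name, expected|
      { label: labels.fetch(name, name), status: status(results[name], expected) }
    end
  end
end

expectations = {
  'license' => %r{^https?://.*apache.org/licenses/$},
  'events'  => %r{^https?://.*apache.org/events/current-event}
}
results = { 'license' => 'https://www.apache.org/licenses/' }

SiteChecks.rows(results, expectations, 'license' => 'License link').each do |row|
  puts "#{row[:label]}: #{row[:status]}"
end
```

pods.cgi would pass a different `labels` hash and could append its own additional checks to the rows.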

-- 

- Shane
  Director & Member
  The Apache Software Foundation


[PROPOSAL] improvements to site-scan.rb/site.cgi

2018-04-11 Thread Shane Curcuru
I'd like to simplify some of the site-scan.rb/site.cgi processing by
centralizing some of the core things that the scripts are searching for
into site-scan.rb.  While I appreciate the original design motivation,
we currently have duplicate regexes - and we have more people interested
in using the results of the site scan (esp. with events) and officers
potentially requesting changes to the requirements.

Roughly, I'd like to move most of CHECKS into site-scan.rb for
simplicity and use those to implement most of the link scans.  Some of
the scans still have more logic (which would still be custom), but some
of them can be mechanical.

CHECKS = {
  'events' => [
    '',
    # a_text regex to scan for - for events, we don't care, so blank
    '/apache.org/events',
    # a_href minimal regex to capture - for events, this tells us what
    # link to capture from the page
    %r{^https?://.*apache.org/events/current-event}
    # a_href full regex to expect for compliance (used in site.cgi)
  ],

  'license' => [
    '/licenses?/',
    # a_text regex to scan for - for license, this is required
    'apache.org',
    # a_href minimal regex to capture - for license, we only capture
    # the link if it points to apache.org
    %r{^https?://.*apache.org/licenses/$}
    # a_href full regex to expect for compliance; it must point to one
    # of our actual licenses to pass
  ],
  # ...etc.
}

Any overall objections?  It's making me twitchy seeing most of the
regexes we use for scanning in separate places.
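
To make the "mechanical" scans concrete, here is a sketch of how a table like CHECKS could drive both the capture step and the compliance step. It rests on several assumptions: the first two entries are written as Regexp objects (with `nil` for "don't care") rather than the strings shown above, links are plain `{ text:, href: }` hashes standing in for parsed anchor elements, and `scan_links` and `compliant?` are hypothetical names.

```ruby
# Each entry: [a_text regex or nil, a_href capture regex, a_href compliance regex].
CHECKS = {
  'events'  => [nil, %r{apache\.org/events},
                %r{^https?://.*apache.org/events/current-event}],
  'license' => [/licenses?/i, /apache\.org/,
                %r{^https?://.*apache.org/licenses/$}]
}

# Capture step (site-scan.rb side): record the first link matching each check.
def scan_links(links)
  CHECKS.each_with_object({}) do |(name, (a_text, a_href, _full)), found|
    link = links.find do |l|
      (a_text.nil? || l[:text] =~ a_text) && l[:href] =~ a_href
    end
    found[name] = link[:href] if link
  end
end

# Compliance step (site.cgi side): judge a captured href against the full regex.
def compliant?(name, href)
  !!(href =~ CHECKS[name][2])
end

links = [
  { text: 'Events',  href: 'https://www.apache.org/events/current-event' },
  { text: 'License', href: 'https://www.apache.org/licenses/' }
]

scan_links(links).each do |name, href|
  puts "#{name}: #{href} compliant=#{compliant?(name, href)}"
end
```

Keeping both regexes in one table means a change to a requirement touches a single place, which is the point of the proposal.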

--

- Shane
  Director & Member
  The Apache Software Foundation