Re: Cassandra site broken links

2021-11-08 Thread Melissa Logan
Michael, thank you for bringing this to our attention. I'm reviewing it with
the folks who created the site and I/they will reply with next steps.


On Sat, Nov 6, 2021 at 10:38 AM Michael Shuler wrote:

> I overwrote the result link - much better, no more 429s.
>
> https://12.am/tmp/cassandra.apache.org_muffet.log.txt
>
> - lots of page anchor problems
> - quite a few busted links
> - quite a few hosts that are gone
> - one link timeout
> - (a few "error" reports are each 200s, just headers)
>
> $ egrep '^\sid #' c*.log.txt |wc -l
> 1416
> $ egrep '^\s4' c*.log.txt |wc -l
> 55
> $ egrep '^\slookup' c*.log.txt |wc -l
> 20
> $ egrep '^\stimeout' c*.log.txt |wc -l
> 1
>
> Warm regards,
> Michael
>
> On 11/6/21 11:59 AM, Michael Shuler wrote:
> > FYI - I'm going to try to slow down the checks, since I just noticed a
> > bunch of the 4xx errors are "HTTP 429 Too Many Requests"
> >
> > Kind regards,
> > Michael
> >
> > On 11/6/21 11:52 AM, Michael Shuler wrote:
> >> (Sending to dev@ which seems a better place to discuss; updated
> >> subject. Thanks OP!)
> >>
> >> I ran a couple link checking tools on the site and there are lots more
> >> problems than the couple noted. This seems like a good task for a
> >> non-dev to make a substantial project impact. Muffet [0] seemed the
> >> quickest way to get some decent output. I grabbed the v2.4.4 binary
> >> release [1]; tar xzvf .., and:
> >>
> >> $ ./muffet https://cassandra.apache.org/ \
> >>   | tee -a cassandra.apache.org_muffet.log.txt
> >>
> >> result (2950 lines):
> >> https://12.am/tmp/cassandra.apache.org_muffet.log.txt
> >>
> >> $ egrep '^\s4' cassandra.apache.org_muffet.log.txt \
> >>   | wc -l
> >> 841
> >> $ egrep '^\sid #' cassandra.apache.org_muffet.log.txt \
> >>   | wc -l
> >> 1401
> >>
> >> [0] https://github.com/raviqqe/muffet
> >> [1] https://github.com/raviqqe/muffet/releases
> >>
> >> Kind regards,
> >> Michael
> >>
> >> On 11/5/21 4:09 PM, Greg Stein wrote:
> >>> see below:
> >>>
> >>> -- Forwarded message -
> >>> From: *Hubert Kulas*
> >>> Date: Fri, Nov 5, 2021 at 1:29 PM
> >>> Subject: Not working links
> >>> To: webmas...@apache.org
> >>>
> >>>
> >>> Hi,
> >>>
> >>> I am writing my thesis about big data and I was doing some research
> >>> about real-world use cases of Cassandra. While doing that I found
> >>> that clicking "read more" under 'Coursera' leads to the DataStax
> >>> website, where we are greeted with a "You do not have access to
> >>> view this page" message. To reproduce it, just go to
> >>> https://cassandra.apache.org/_/case-studies.html
> >>> and then find Coursera and click "read more". Then, while trying to
> >>> find a way to contact you about the problem, I encountered another
> >>> problem on this part of the website:
> >>> https://cassandra.apache.org/doc/3.11.5/contactus.html
> >>> Clicking the icon there leads to
> >>> https://cassandra.apache.org/feed.xml
> >>> which gives a 404 Not Found message.
> >>> 2021-11-05_19h26_44.png
> >>>
> >>> Best Regards,
> >>> Hubert Kulas
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: dev-h...@cassandra.apache.org
>
>

-- 
Melissa Logan (she/her)
Principal, Constantia.io
meli...@constantia.io
Cell: 503-317-8498



Re: Cassandra site broken links

2021-11-06 Thread Michael Shuler

I overwrote the result link - much better, no more 429s.

https://12.am/tmp/cassandra.apache.org_muffet.log.txt

- lots of page anchor problems
- quite a few busted links
- quite a few hosts that are gone
- one link timeout
- (a few "error" reports are each 200s, just headers)

$ egrep '^\sid #' c*.log.txt |wc -l
1416
$ egrep '^\s4' c*.log.txt |wc -l
55
$ egrep '^\slookup' c*.log.txt |wc -l
20
$ egrep '^\stimeout' c*.log.txt |wc -l
1
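
The four counts above can also be produced in one pass over the log. This is only a sketch: the function name tally_failures is made up for illustration, and it assumes what the egrep patterns imply, namely that each failure line is indented and begins with the reason (an HTTP status, "id #...", "lookup ...", or "timeout").

```shell
#!/bin/sh
# Tally muffet failures by category in a single pass.
# Assumption: failure lines are indented and start with the reason,
# as the egrep patterns in this message suggest.
tally_failures() {
    awk '
        /^[ \t]+id #/    { anchors++;  next }   # missing page anchors
        /^[ \t]+4/       { http4xx++;  next }   # any 4xx status
        /^[ \t]+lookup/  { lookups++;  next }   # DNS lookup failures
        /^[ \t]+timeout/ { timeouts++; next }   # request timeouts
        END {
            printf "anchors=%d http4xx=%d lookup=%d timeout=%d\n",
                   anchors, http4xx, lookups, timeouts
        }
    ' "$@"
}
```

It would be called as: tally_failures cassandra.apache.org_muffet.log.txt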

Warm regards,
Michael



Re: Cassandra site broken links

2021-11-06 Thread Michael Shuler
FYI - I'm going to try to slow down the checks, since I just noticed a 
bunch of the 4xx errors are "HTTP 429 Too Many Requests"
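
To see how much of the 4xx total is rate limiting rather than real breakage, the log can be broken down by status code. A rough sketch, assuming each failure line is indented and starts with the numeric status followed by the URL; status_breakdown is a hypothetical helper name.

```shell
#!/bin/sh
# Count failure lines per HTTP status code (e.g. 404 vs 429).
# Assumption: a failure line looks like "<indent><status><whitespace><url>".
status_breakdown() {
    awk '
        # Keep only indented lines whose first field is a 3-digit status.
        /^[ \t]+[1-5][0-9][0-9]([ \t]|$)/ { count[$1]++ }
        END { for (code in count) print code, count[code] }
    ' "$@" | sort
}
```

Usage: status_breakdown cassandra.apache.org_muffet.log.txt prints one "code count" pair per line, so a large 429 row points at throttling by the remote hosts and not at dead links.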


Kind regards,
Michael



Cassandra site broken links

2021-11-06 Thread Michael Shuler
(Sending to dev@ which seems a better place to discuss; updated subject. 
Thanks OP!)


I ran a couple link checking tools on the site and there are lots more 
problems than the couple noted. This seems like a good task for a 
non-dev to make a substantial project impact. Muffet [0] seemed the 
quickest way to get some decent output. I grabbed the v2.4.4 binary 
release [1]; tar xzvf .., and:


$ ./muffet https://cassandra.apache.org/ \
 | tee -a cassandra.apache.org_muffet.log.txt

result (2950 lines):
https://12.am/tmp/cassandra.apache.org_muffet.log.txt

$ egrep '^\s4' cassandra.apache.org_muffet.log.txt \
 | wc -l
841
$ egrep '^\sid #' cassandra.apache.org_muffet.log.txt \
 | wc -l
1401
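
For anyone picking this up, it may help to know which pages carry the most failures. The sketch below assumes the log lists each checked page on an unindented line followed by its indented failure lines; that layout and the name failures_per_page are assumptions, not something confirmed in this thread.

```shell
#!/bin/sh
# Rank pages by number of reported failures, worst first.
# Assumption: unindented line = page URL, indented lines = its failures.
failures_per_page() {
    awk '
        /^[^ \t]/ { page = $1; next }                     # start of a new page
        /^[ \t]+[^ \t]/ && page != "" { count[page]++ }   # failure on that page
        END { for (p in count) print count[p], p }
    ' "$@" | sort -rn
}
```

Usage: failures_per_page cassandra.apache.org_muffet.log.txt | head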

[0] https://github.com/raviqqe/muffet
[1] https://github.com/raviqqe/muffet/releases

Kind regards,
Michael

On 11/5/21 4:09 PM, Greg Stein wrote:

see below:

-- Forwarded message -
From: *Hubert Kulas*
Date: Fri, Nov 5, 2021 at 1:29 PM
Subject: Not working links
To: webmas...@apache.org


Hi,

I am writing my thesis about big data and I was doing some research
about real-world use cases of Cassandra. While doing that I found that
clicking "read more" under 'Coursera' leads to the DataStax website,
where we are greeted with a "You do not have access to view this page"
message. To reproduce it, just go to
https://cassandra.apache.org/_/case-studies.html
and then find Coursera and click "read more". Then, while trying to
find a way to contact you about the problem, I encountered another
problem on this part of the website:
https://cassandra.apache.org/doc/3.11.5/contactus.html
Clicking the icon there leads to
https://cassandra.apache.org/feed.xml
which gives a 404 Not Found message.

2021-11-05_19h26_44.png

Best Regards,
Hubert Kulas

