Re: [Bug-wget] Hello again

2018-10-12 Thread michael


Hello Darshit Shah,

Converting a CMS-driven site to static HTML pages is not a solution that suits
everyone. Some sites that want to stay "dynamic" and keep their interactive,
back-and-forth features might not use wget2 and would keep their CMS or
server-side software as it is.

Many people use a CMS to build their website because it keeps the site uniform
and lets them make every change site-wide from a GUI. Those people might still
want a static website, as it downloads faster (a Google SEO factor) and is much
more secure: the CMS location stays hidden and login attempts are prevented.
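One way to check the "CMS location stays hidden" point after exporting is to scan the static copy for paths that only exist on the CMS. This is a minimal Python sketch, assuming a WordPress site mirrored into a local directory; the marker strings and the function name `find_cms_references` are illustrative assumptions, not part of wget2.

```python
import os
from typing import Iterable, List, Tuple

# Hypothetical markers: paths that typically reveal a WordPress back end.
# Adjust them for the CMS actually in use.
DEFAULT_MARKERS = ("wp-admin", "wp-login.php", "xmlrpc.php")

# Only text files that a browser would load are scanned.
TEXT_SUFFIXES = (".html", ".htm", ".css", ".js", ".xml")


def find_cms_references(root: str,
                        markers: Iterable[str] = DEFAULT_MARKERS,
                        suffixes: Tuple[str, ...] = TEXT_SUFFIXES
                        ) -> List[Tuple[str, str]]:
    """Return (file path, marker) pairs for every marker found under root."""
    markers = tuple(markers)
    hits: List[Tuple[str, str]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(suffixes):
                continue  # skip images, archives and other non-text files
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                text = fh.read()
            for marker in markers:
                if marker in text:
                    hits.append((path, marker))
    return hits
```

Calling `find_cms_references("mirror")` on the exported directory lists any page that still links back to the CMS; an empty list only means none of the chosen markers appear, not that the site is secure.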

If those people want to keep features such as RSS feeds, we might be able to
tell them how.

If a website contains hidden pages that are only linked from JavaScript code,
the programmer could write a shell script that calls wget2 with the location of
each hidden page.
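A sketch of that approach, written in Python instead of a shell script: it builds one wget2 command line from a hand-maintained list of JavaScript-only pages and runs it. The URLs, the output directory and the helper names are hypothetical; the options used (`--page-requisites`, `--convert-links`, `--adjust-extension`, `--directory-prefix`) are the usual wget/wget2 options, but check `wget2 --help` for your version.

```python
import shlex
import subprocess
from typing import List, Sequence

# Hypothetical pages that are only reachable through JavaScript,
# so a recursive crawl would never discover them on its own.
HIDDEN_PAGES = [
    "https://example.com/gallery/",
    "https://example.com/search-results/",
]


def wget2_command(urls: Sequence[str], output_dir: str = "mirror") -> List[str]:
    """Build one wget2 invocation that fetches each listed page and its requisites."""
    return [
        "wget2",
        "--page-requisites",   # also fetch the CSS, images and scripts each page needs
        "--convert-links",     # rewrite links so the copy works from the local files
        "--adjust-extension",  # save HTML responses with an .html suffix
        "--directory-prefix", output_dir,
        *urls,
    ]


def fetch_hidden_pages(urls: Sequence[str], output_dir: str = "mirror") -> int:
    """Run wget2 (it must be installed and on PATH) and return its exit status."""
    cmd = wget2_command(urls, output_dir)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

Running it into the same directory as the main mirror keeps the hidden pages next to the rest of the static copy.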

Have a good weekend!

Michael




Re: [Bug-wget] Hello again

2018-10-11 Thread 'Darshit Shah'
* mich...@cyber-dome.com  [181009 17:12]:
> 
> Hello Darshit Shah,
> 
> Thank you for your welcome message. I am glad to be part of your project!
> 
> I don't understand the term "javascript engine". AFAIK, JavaScript is code that
> runs on the browser side, and we have no problem fetching it.
>
Exactly! JavaScript is code that is executed on the client side and hence
requires a JavaScript engine, which interprets the code and executes it.
However, Wget does not and will not package a JavaScript engine in order to run
those scripts. This means that sites where JavaScript is used to create
hyperlinks won't work well when scraped through Wget.
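To illustrate the point about JavaScript-generated hyperlinks, here is a rough Python stand-in for a static link extractor (it is not wget2's own HTML parser): it only sees links written into the markup, so a link that a script adds at run time never shows up. The sample page and the class name are made up for the example.

```python
from html.parser import HTMLParser
from typing import List


class HrefCollector(HTMLParser):
    """Collect href/src attribute values, roughly what a static crawler can see."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


PAGE = """
<a href="/about.html">About</a>
<div id="menu"></div>
<script src="/menu.js"></script>
<script>
  // This link only exists after a browser runs the script.
  var a = document.createElement("a");
  a.href = "/hidden.html";
  document.getElementById("menu").appendChild(a);
</script>
"""


def static_links(html: str) -> List[str]:
    """Return the links visible in the markup without executing any script."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

`static_links(PAGE)` finds `/about.html` and `/menu.js`, but never `/hidden.html`: the script body is treated as opaque text, which is exactly what happens without a JavaScript engine.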
> 
> There might be "AJAX" issues with sites that rely on it. AJAX is used heavily by
> programmers, and they will have to take some action on their site to
> incorporate the engine.

Similarly, sites that use JavaScript to show menus or make AJAX requests are
usually not amenable to being scraped into static HTML pages.
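A rough way to spot such pages before mirroring them is to look for run-time request calls in their scripts. This Python sketch is a heuristic only, and the pattern names are hypothetical; a match does not prove the page breaks as static HTML, and minified or bundled code can slip past it.

```python
import re
from typing import List

# Hypothetical heuristic: calls that usually mean content or links are
# loaded at run time, which a static copy will not reproduce.
AJAX_PATTERNS = {
    "fetch": re.compile(r"\bfetch\s*\("),
    "XMLHttpRequest": re.compile(r"\bXMLHttpRequest\b"),
    "jQuery ajax": re.compile(r"\$\.(?:ajax|get|post|getJSON)\s*\("),
}


def ajax_indicators(script_text: str) -> List[str]:
    """Return the names of the patterns found in a piece of JavaScript."""
    return [name for name, pattern in AJAX_PATTERNS.items()
            if pattern.search(script_text)]
```

Pages whose scripts return a non-empty list are candidates for manual review (or for the "hidden pages" list fed to wget2 by hand).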
> 
> POST requests for comments and mail will need to be taken care of so that they
> work on a static site. One solution is a hosted provider that handles this
> task and also filters spam.
> I think I will be able to write a howto document on that.
> 
> Michael

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6




Re: [Bug-wget] Hello again

2018-10-09 Thread michael


Hello Darshit Shah,

Thank you for your welcome message. I am glad to be part of your project!

I don't understand the term "javascript engine". AFAIK, JavaScript is code that
runs on the browser side, and we have no problem fetching it.

There might be "AJAX" issues with sites that rely on it. AJAX is used heavily by
programmers, and they will have to take some action on their site to incorporate
the engine.

POST requests for comments and mail will need to be taken care of so that they
work on a static site. One solution is a hosted provider that handles this task
and also filters spam.
I think I will be able to write a howto document on that.
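Before handing comments and mail to a hosted service, it helps to know which forms in the static copy still POST back to the CMS. This Python sketch lists the action URLs of POST forms in a page; the sample markup and the `/wp-comments-post.php` endpoint are examples assumed for a WordPress site, and the function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class PostFormFinder(HTMLParser):
    """Record the action URL of every <form method="post"> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.actions: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "form":
            return
        attributes = dict(attrs)
        # Forms default to GET when no method attribute is given.
        method = (attributes.get("method") or "get").lower()
        if method == "post":
            # An empty or missing action means "post back to this same page".
            self.actions.append(attributes.get("action") or "")


def post_form_actions(html: str) -> List[str]:
    """Return the action URLs of all POST forms found in the HTML."""
    finder = PostFormFinder()
    finder.feed(html)
    finder.close()
    return finder.actions
```

Each action URL it reports is an endpoint that no longer exists once only the static copy is served, so it has to be pointed at the hosted handler instead.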

Michael


Re: [Bug-wget] Hello again

2018-10-09 Thread michael
Thank you!


Re: [Bug-wget] Hello again

2018-10-09 Thread Darshit Shah
Hi Michael,

Nice to hear from you again. I vaguely remember a mention of someone who wanted
to work on this feature. When deciding to make this work, please remember that
any of this can only work if the site does not rely on JavaScript, which, given
WordPress, is difficult. The reason for this is that we do _not_ intend to ship
a JavaScript engine along with Wget2. It is too large, unwieldy and too much of
a maintenance nightmare. However, if the site can work without JavaScript, then
I would assume that Wget2 can already handle making a static copy. If it can't
handle something, please let us know / file a bug report about it.
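For a site that does work without JavaScript, a static copy can usually be made with ordinary mirroring options. This Python wrapper is a sketch under that assumption: it only assembles and runs a wget2 command line, and the chosen flags, output directory and helper names are one plausible combination, not a recipe endorsed by the wget2 developers.

```python
import subprocess
from typing import List


def wget2_mirror_command(site_url: str, output_dir: str = "static-site") -> List[str]:
    """Assemble a wget2 command line that mirrors a JavaScript-free site."""
    return [
        "wget2",
        "--mirror",            # turn on recursive mirroring
        "--page-requisites",   # also fetch the CSS, images and scripts pages use
        "--convert-links",     # rewrite links to point at the local copies
        "--adjust-extension",  # save HTML pages with an .html suffix
        "--no-parent",         # never ascend above the starting directory
        "--directory-prefix", output_dir,
        site_url,
    ]


def mirror_site(site_url: str, output_dir: str = "static-site") -> int:
    """Run the mirror and return wget2's exit status (wget2 must be on PATH)."""
    cmd = wget2_mirror_command(site_url, output_dir)
    return subprocess.run(cmd, check=False).returncode
```

If the resulting copy is missing pages or links, that is the point at which a bug report with the failing URL is most useful.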

Of course, I welcome you to work on Wget2 as you see fit. And we would love to
look at any contributions you can make. We will also try and help you out as
much as possible when dealing with the codebase.

About the dev setup, I only use vim and gdb to work with Wget. As Tim has
already mentioned, he uses Netbeans and might be able to help you out.

You also mentioned something about the lib/ directory. That is an
auto-generated directory of compatibility libraries that you don't need to care
about. All the code for the Wget2 tool is in src/ and the code for the library
is in libwget/. Those are the two main directories you need to care about, plus
of course tests/ for the tests.

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6




Re: [Bug-wget] Hello again

2018-10-09 Thread Tim Rühsen
On 10/8/18 10:27 PM, mich...@cyber-dome.com wrote:
> The issue I have is this:
> 
> Since the source code is split across several directories (src, lib), Netbeans
> loses track of the source code in the lib directory.
> I verified it using gdb. (You can see how deep I went.)

lib/ is an automatically created directory (gnulib stuff, created by
'bootstrap') and normally you are not interested in its contents.

You might have the same issue with the test directories and fuzz/. I
normally right-click on the file I am interested in and enable 'Code
Assistance'.

> 
> So, can you send me your Netbeans project settings?

Not the private/ stuff, but here are nbproject/configurations.xml and
nbproject/project.xml.

Regards, Tim


  

  
[Attachment: nbproject/configurations.xml. Its XML markup was lost in the archive; only a bare list of source file names survives (covering the fuzz/, lib/, libwget/, src/ and tests/ sources, among others) plus a Makefile entry.]
  

Re: [Bug-wget] Hello again

2018-10-08 Thread michael
The issue I have is this:

Since the source code is split across several directories (src, lib), Netbeans
loses track of the source code in the lib directory.
I verified it using gdb. (You can see how deep I went.)

So, can you send me your Netbeans project settings?

Thank you,

Michael


Re: [Bug-wget] Hello again

2018-10-08 Thread Tim Rühsen
On 10/8/18 7:57 PM, mich...@cyber-dome.com wrote:
> I have started to configure the Netbeans IDE, as using a debugger can help me
> delve into the code much faster. There are some issues with Netbeans. Do
> you use Id? Which one?

Id ? it ?

I use stock Netbeans 8.2 from https://netbeans.org/downloads/ (the All
option). But you can take any 'version' and install the C/C++ plugin
afterwards.

These are the JDK packages I have installed:

default-jdk 2:1.10-68
default-jdk-headless 2:1.10-68
openjdk-10-jdk:amd64 10.0.2+13-1
openjdk-10-jdk-headless:amd64 10.0.2+13-1
openjdk-10-jre:amd64 10.0.2+13-1
openjdk-10-jre-headless:amd64 10.0.2+13-1
openjdk-7-jre-lib 7u95-2.6.4-1
openjdk-8-demo 8u181-b13-1
openjdk-8-doc 8u181-b13-1
openjdk-8-jdk:amd64 8u181-b13-1
openjdk-8-jdk-headless:amd64 8u181-b13-1
openjdk-8-jre:amd64 8u181-b13-1
openjdk-8-jre-headless:amd64 8u181-b13-1
openjdk-8-source 8u181-b13-1

What issues do you have?

Regards, Tim





[Bug-wget] Hello again

2018-10-08 Thread michael


Hello again,

My name is Michael. I approached you about a year ago.

I am interested in making wget2 a tool that can convert the output of content
management systems (like WordPress) to static HTML. The content management
system would then only be used to regenerate the website whenever it changes,
and the pages would be served by the HTTP server alone.

This is an important feature because it reduces security risks: hackers
penetrating the site, installing viruses or stealing data.
It also lets the website be delivered much faster, as no PHP code needs to run
in order to serve the content. Google has already announced that site speed is
a factor in its SEO evaluation.

I will be able to work for 3 hours every week on the project. I do need some
guidance from you.

I have started to configure the Netbeans IDE, as using a debugger can help me
delve into the code much faster. There are some issues with Netbeans. Do
you use Id? Which one?

Best regards,

Michael