Re: [Bug-wget] Google Summer of Code 2016

2016-03-01 Thread Darshit Shah

On 03/02, Kushagra Singh wrote:

Hi,

Thanks for the quick reply. I went through the repository and the issues,
and found a couple of things I would like to work on.

I have a couple of questions about Wget2. Is it a complete rewrite of the
Wget project, available at git://git.savannah.gnu.org/wget.git, or are we
using existing code and extending functionality? I guess it is the second
one because I saw `libwget` in the repo. However if such is the case, then
how do we change existing functions in wget? For example, implementing [2]
would require making changes to the file cookies.c, which is present in
/src in the wget repo, but not in /src in the wget2 repo.


Wget2 is a complete rewrite of GNU Wget. It is also available on the 
savannah server as its own repository at [1]. Wget2 is meant to be a 
modern (almost) drop-in replacement for Wget. It strives to maintain 
backward compatible command line options and behaviour as far as it 
makes sense. The codebase for the two projects has diverged by 
significant amounts and hence new features need to be implemented 
separately for each.


I was looking at #43 [1], and have already submitted a patch for
consideration for the first suggestion [2]. The second suggestion mentioned
[3] is one of the things I'd like to work on, however this is not something
which will take three months :)


You submitted a patch for Wget. This is the Wget2 repository. Anyways, I 
already have a working patch for most of that issue, got sidetracked 
when writing the tests and eventually forgot about it. I think I'll 
spend some time on it this week and have that patch merged. Don't spend 
time on that part.


Another thing to remember is that not all GitHub issues are valid GSoC 
projects. Since there are only a few issues, it is easy to scout out the 
larger ones. Some issues are pretty tiny and just need someone willing to 
spend time working on them.




Another project I am interested in, is implementing FTPS. I saw this listed
under one of the ideas of GSoC 2015, but I'm not sure whether it was
implemented, as I didn't see it under 'Development Status' in the wget2
readme on Github.

Wget2 as far as I'm aware is still lacking FTPS support. Remember that 
Wget and Wget2 are two different projects.


Also, in #67 [4], we are talking about adhering to some specific parts of
RFC 7230. I'm not sure which all parts would be right, as the discussion
thread mentions that it won't be good to stick to each point of the RFC.
WDYT?


This is a minor grievance I raised. We stick to most of it anyways. As 
Tim points out, being completely RFC compliant may make the tool 
unusable thanks to the number of bad servers out there. If anything,  
that issue needs to be split into multiple smaller issues about specific 
parts of the RFC that we want to adhere to.


Open projects I currently see are:
1. FTP / FTPS support
2. SOCKS5 Proxy support (This may be too small.)
3. Progress Bar implementation (Looks deceptively simple, but isn't)
4. WARC support and tests
5. Brotli compression (May be too small)

The README file also has more pointers on features not implemented in 
Wget2. You may get some ideas from there. Request pipelining and DNSSEC 
are two features I'd be interested in seeing implemented.


Moreover, you are always welcome to submit your own ideas for either 
Wget or Wget2.


Tim can add more details or comment on whether something is too small to 
work on for a GSoC project.


[1]: git://git.savannah.gnu.org/wget/wget2.git


[1] https://github.com/rockdaboot/wget2/issues/43
[2] https://tools.ietf.org/html/draft-west-leave-secure-cookies-alone-04
[3] https://tools.ietf.org/html/draft-west-cookie-prefixes-05
[4] https://github.com/rockdaboot/wget2/issues/67

On Tue, Mar 1, 2016 at 9:57 PM, Giuseppe Scrivano  wrote:


Kushagra Singh  writes:

> Hi,
>
> Will we be taking part in GSoC this year? I would really like to work on a
> project related to Wget this summer. Any specific ideas that are of
> importance to the community presently?

Yes, we will be taking part in GSoC.  I think we would like to see more
work happening on wget2. At the moment there is a list of issues on
GitHub that you can use to pick some ideas to work on:

  https://github.com/rockdaboot/wget2/issues

Could you take a look at it?  Do you see anything interesting that you
would like to work on?

Regards,
Giuseppe



--
Thanking You,
Darshit Shah




Re: [Bug-wget] Google Summer of Code 2016

2016-03-01 Thread Kushagra Singh
Hi,

Thanks for the quick reply. I went through the repository and the issues,
and found a couple of things I would like to work on.

I have a couple of questions about Wget2. Is it a complete rewrite of the
Wget project, available at git://git.savannah.gnu.org/wget.git, or are we
using existing code and extending functionality? I guess it is the second
one because I saw `libwget` in the repo. However if such is the case, then
how do we change existing functions in wget? For example, implementing [2]
would require making changes to the file cookies.c, which is present in
/src in the wget repo, but not in /src in the wget2 repo.

I was looking at #43 [1], and have already submitted a patch for
consideration for the first suggestion [2]. The second suggestion mentioned
[3] is one of the things I'd like to work on, however this is not something
which will take three months :)

Another project I am interested in, is implementing FTPS. I saw this listed
under one of the ideas of GSoC 2015, but I'm not sure whether it was
implemented, as I didn't see it under 'Development Status' in the wget2
readme on Github.

Also, in #67 [4], we are talking about adhering to some specific parts of
RFC 7230. I'm not sure which all parts would be right, as the discussion
thread mentions that it won't be good to stick to each point of the RFC.
WDYT?


[1] https://github.com/rockdaboot/wget2/issues/43
[2] https://tools.ietf.org/html/draft-west-leave-secure-cookies-alone-04
[3] https://tools.ietf.org/html/draft-west-cookie-prefixes-05
[4] https://github.com/rockdaboot/wget2/issues/67

On Tue, Mar 1, 2016 at 9:57 PM, Giuseppe Scrivano  wrote:

> Kushagra Singh  writes:
>
> > Hi,
> >
> > Will we be taking part in GSoC this year? I would really like to work on a
> > project related to Wget this summer. Any specific ideas that are of
> > importance to the community presently?
>
> Yes, we will be taking part in GSoC.  I think we would like to see more
> work happening on wget2. At the moment there is a list of issues on
> GitHub that you can use to pick some ideas to work on:
>
>   https://github.com/rockdaboot/wget2/issues
>
> Could you take a look at it?  Do you see anything interesting that you
> would like to work on?
>
> Regards,
> Giuseppe
>


Re: [Bug-wget] Google Summer of Code 2016

2016-03-01 Thread Giuseppe Scrivano
Kushagra Singh  writes:

> Hi,
>
> Will we be taking part in GSoC this year? I would really like to work on a
> project related to Wget this summer. Any specific ideas that are of
> importance to the community presently?

Yes, we will be taking part in GSoC.  I think we would like to see more
work happening on wget2. At the moment there is a list of issues on
GitHub that you can use to pick some ideas to work on:

  https://github.com/rockdaboot/wget2/issues

Could you take a look at it?  Do you see anything interesting that you
would like to work on?

Regards,
Giuseppe



Re: [Bug-wget] Patch for understanding srcset= on img tags.

2016-03-01 Thread Maksim Orlovich
> thanks for your patch!  I have some comments.  Please amend this:
>
> diff --git a/src/html-url.c b/src/html-url.c
> index dff8d57..2f205c7 100644
> --- a/src/html-url.c
> +++ b/src/html-url.c
> @@ -692,8 +692,8 @@ tag_handle_img (int tagid, struct taginfo *tag, struct 
> map_context *ctx) {
>if (srcset)
>  {
>/* These are relative to the input text. */
> -  int base_ind = ATTR_POS(tag,attrind,ctx);
> -  int size = strlen(srcset);
> +  int base_ind = ATTR_POS (tag, attrind, ctx);
> +  int size = strlen (srcset);

Done.


> should the condition be (c == ')' && in_paren)  ?

Indeed.

Thanks,
Maks

From 49933c84012536388e1f9d0bc4070e377d824309 Mon Sep 17 00:00:00 2001
From: Maks Orlovich 
Date: Tue, 1 Mar 2016 09:43:56 -0500
Subject: Parse <img srcset> attributes, they have image URLs.

* src/convert.h: Add link_noquote_html_p to permit rewriting URLs deep
 inside attributes without adding extraneous quoting
* src/convert.c (convert_links): Honor link_noquote_html_p
* src/html-url.c (tag_handle_img): New function. Add srcset parsing.

diff --git a/src/convert.c b/src/convert.c
index df8d58d..509923e 100644
--- a/src/convert.c
+++ b/src/convert.c
@@ -308,7 +308,7 @@ convert_links (const char *file, struct urlpos *links)
 char *quoted_newname = local_quote_string (newname,
link->link_css_p);
 
-if (link->link_css_p)
+if (link->link_css_p || link->link_noquote_html_p)
   p = replace_plain (p, link->size, fp, quoted_newname);
 else if (!link->link_refresh_p)
   p = replace_attr (p, link->size, fp, quoted_newname);
@@ -329,7 +329,7 @@ convert_links (const char *file, struct urlpos *links)
 char *newname = convert_basename (p, link);
 char *quoted_newname = local_quote_string (newname, 
link->link_css_p);
 
-if (link->link_css_p)
+if (link->link_css_p || link->link_noquote_html_p)
   p = replace_plain (p, link->size, fp, quoted_newname);
 else if (!link->link_refresh_p)
   p = replace_attr (p, link->size, fp, quoted_newname);
@@ -352,7 +352,7 @@ convert_links (const char *file, struct urlpos *links)
 char *newlink = link->url->url;
 char *quoted_newlink = html_quote_string (newlink);
 
-if (link->link_css_p)
+if (link->link_css_p || link->link_noquote_html_p)
   p = replace_plain (p, link->size, fp, newlink);
 else if (!link->link_refresh_p)
   p = replace_attr (p, link->size, fp, quoted_newlink);
diff --git a/src/convert.h b/src/convert.h
index b3cd196..e3ff6f0 100644
--- a/src/convert.h
+++ b/src/convert.h
@@ -69,6 +69,7 @@ struct urlpos {
   unsigned int link_base_p  :1; /* the url came from  */
   unsigned int link_inline_p:1; /* needed to render the page */
   unsigned int link_css_p   :1; /* the url came from CSS */
+  unsigned int link_noquote_html_p :1; /* from HTML, but doesn't need " */
   unsigned int link_expect_html :1; /* expected to contain HTML */
   unsigned int link_expect_css  :1; /* expected to contain CSS */
 
diff --git a/src/html-url.c b/src/html-url.c
index 0743587..ab04204 100644
--- a/src/html-url.c
+++ b/src/html-url.c
@@ -56,6 +56,7 @@ typedef void (*tag_handler_t) (int, struct taginfo *, struct 
map_context *);
 DECLARE_TAG_HANDLER (tag_find_urls);
 DECLARE_TAG_HANDLER (tag_handle_base);
 DECLARE_TAG_HANDLER (tag_handle_form);
+DECLARE_TAG_HANDLER (tag_handle_img);
 DECLARE_TAG_HANDLER (tag_handle_link);
 DECLARE_TAG_HANDLER (tag_handle_meta);
 
@@ -105,7 +106,7 @@ static struct known_tag {
   { TAG_FORM,"form",tag_handle_form },
   { TAG_FRAME,   "frame",   tag_find_urls },
   { TAG_IFRAME,  "iframe",  tag_find_urls },
-  { TAG_IMG, "img", tag_find_urls },
+  { TAG_IMG, "img", tag_handle_img },
   { TAG_INPUT,   "input",   tag_find_urls },
   { TAG_LAYER,   "layer",   tag_find_urls },
   { TAG_LINK,"link",tag_handle_link },
@@ -183,7 +184,8 @@ static const char *additional_attributes[] = {
   "name",   /* used by tag_handle_meta  */
   "content",/* used by tag_handle_meta  */
   "action", /* used by tag_handle_form  */
-  "style"   /* used by check_style_attr */
+  "style",  /* used by check_style_attr */
+  "srcset", /* used by tag_handle_img */
 };
 
 static struct hash_table *interesting_tags;
@@ -674,6 +676,88 @@ tag_handle_meta (int tagid _GL_UNUSED, struct taginfo 
*tag, struct map_context *
 }
 }
 
+/* Handle the IMG tag.  This requires special handling for the srcset attr,
+   while the traditional src/lowsrc/href attributes can be handled generically.
+*/
+
+static void
+tag_handle_img (int tagid, struct taginfo *tag, struct 

Re: [Bug-wget] Patch for understanding srcset= on img tags.

2016-03-01 Thread Giuseppe Scrivano
Hi Maksim,

Maksim Orlovich  writes:

> Hi... wget currently doesn't understand HTML5's srcset= attribute on
> images. The attached adds support for it.
> This is under Google copyright, so should be covered by the company's
> copyright assignment with the FSF.
>
> If you might be interested in incorporating this in some form, please
> let me know if you want any changes (e.g. tests, etc.), ---
> not really familiar with how you folks do things.
>
> Hoping this may be of some use to someone else,
> Maks

thanks for your patch!  I have some comments.  Please amend this:

diff --git a/src/html-url.c b/src/html-url.c
index dff8d57..2f205c7 100644
--- a/src/html-url.c
+++ b/src/html-url.c
@@ -692,8 +692,8 @@ tag_handle_img (int tagid, struct taginfo *tag, struct 
map_context *ctx) {
   if (srcset)
 {
   /* These are relative to the input text. */
-  int base_ind = ATTR_POS(tag,attrind,ctx);
-  int size = strlen(srcset);
+  int base_ind = ATTR_POS (tag, attrind, ctx);
+  int size = strlen (srcset);
 
   /* These are relative to srcset. */
   int offset, url_start, url_end;


> +  /* If the URL wasn't terminated by a , there may also be a 
> descriptor
> + which we just skip. */
> +  if (has_descriptor)
> +{
> +  /* This is comma-terminated, except there may be one level of
> + parentheses escaping that. */
> +  bool in_paren = false;
> +  for (offset = url_end; offset < size; ++offset)
> +{
> +  char c = srcset[offset];
> +  if (c == '(')
> +in_paren = true;
> +  else if (c == '(' && in_paren)
> +in_paren = false;

should the condition be (c == ')' && in_paren)  ?
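
For reference, here is a minimal sketch of that loop with the closing
parenthesis checked. `skip_descriptor` is a hypothetical standalone helper,
not a function from the patch. The comma test that ends the loop is an
assumption based on the patch comment that the descriptor is
comma-terminated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: starting at START, return the
   index of the comma that ends the current srcset candidate, or SIZE if no
   such comma exists.  Commas inside one level of parentheses are ignored.  */
static size_t
skip_descriptor (const char *srcset, size_t start, size_t size)
{
  bool in_paren = false;
  size_t offset;

  for (offset = start; offset < size; ++offset)
    {
      char c = srcset[offset];
      if (c == '(')
        in_paren = true;
      else if (c == ')' && in_paren)   /* the corrected condition */
        in_paren = false;
      else if (c == ',' && !in_paren)  /* assumed terminator check */
        break;
    }
  return offset;
}
```

With the original `c == '('` test in the second branch, `in_paren` would
never be reset, so every comma after the first opening parenthesis would be
skipped.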


Thanks,
Giuseppe



[Bug-wget] Google Summer of Code 2016

2016-03-01 Thread Kushagra Singh
Hi,

Will we be taking part in GSoC this year? I would really like to work on a
project related to Wget this summer. Any specific ideas that are of
importance to the community presently?

A quick introduction: I'm Kushagra Singh, a second year student at IIIT
Delhi, India. My major is Computer Science. I have gone through a
particular chunk of wget's source code and understand it well, and have
submitted a patch for consideration. I successfully completed GSoC, working
with lmonade last summer.

Looking forward to coding on this project this summer!

Thank you,
Kushagra Singh