Hi,

Attached is a patch for urlgrabber that preserves query parameters in
URLs.

Some CDNs do token authentication by appending a token to the URL
as a query parameter, so the baseurl could be something like:

https://host.domain.top/path/?abcdef1234567890

Simply appending the relative part to it results in something like this:

  https://host.domain.top/path/?abcdef1234567890/requested/file.txt

which is wrong: the requested file ends up inside the query string
instead of the path.
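
To make the difference concrete, here is a minimal standalone sketch, not the
urlgrabber code itself. The names naive_join and join_keep_query are
hypothetical and used only for illustration, and the sketch assumes Python 3's
urllib.parse (on Python 2 the same functions live in the urlparse module).
The first function mimics plain string concatenation, the second splits the
base URL and appends the relative part to the path component only.

```python
from urllib.parse import urlsplit, urlunsplit  # Python 2: module `urlparse`


def naive_join(base_url, rel_url):
    # Plain string concatenation: the query string of base_url is
    # treated as if it were part of the path.
    if base_url.endswith('/') or rel_url.startswith('/'):
        return base_url + rel_url
    return base_url + '/' + rel_url


def join_keep_query(base_url, rel_url):
    # Split the base URL, extend only the path, then reassemble so the
    # query string (for example a CDN token) stays at the end.
    scheme, netloc, path, query, fragment = urlsplit(base_url)
    if not (path.endswith('/') or rel_url.startswith('/')):
        path += '/'
    return urlunsplit((scheme, netloc, path + rel_url, query, fragment))


base = 'https://host.domain.top/path/?abcdef1234567890'
print(naive_join(base, 'requested/file.txt'))
print(join_keep_query(base, 'requested/file.txt'))
```

With the example baseurl from above, the second call should yield
https://host.domain.top/path/requested/file.txt?abcdef1234567890, i.e. the
token stays in the query string.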

-- 
Regards

        Michael Calmer

--------------------------------------------------------------------------
Michael Calmer
SUSE LINUX Products GmbH, Maxfeldstr. 5, D-90409 Nuernberg
T: +49 (0) 911 74053 0
F: +49 (0) 911 74053575  - e-mail: michael.cal...@suse.com
--------------------------------------------------------------------------
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
HRB 16746 (AG Nürnberg)
From 6166930c6d5d1823b0b4e47f86078c5a76418c23 Mon Sep 17 00:00:00 2001
From: Michael Calmer <m...@suse.de>
Date: Fri, 12 Sep 2014 13:01:55 +0200
Subject: [PATCH] preserve queryparams in urls

Some CDNs do token authentication by appending a token to the URL
as a query parameter, so the baseurl could be something like:

https://host.domain.top/path/?abcdef1234567890

Simply appending the relative part to it results in an invalid URL.
---
 urlgrabber/mirror.py | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/urlgrabber/mirror.py b/urlgrabber/mirror.py
index f3c2664..6a7cdb2 100644
--- a/urlgrabber/mirror.py
+++ b/urlgrabber/mirror.py
@@ -94,6 +94,7 @@ CUSTOMIZATION
 import sys
 import six
 import random
+from six.moves.urllib import parse as urlparse  # urlparse on py2, urllib.parse on py3
 from six.moves import _thread  as thread # needed for locking to make this threadsafe
 
 from urlgrabber.grabber import URLGrabError, CallbackObject, DEBUG, _to_utf8
@@ -395,11 +396,12 @@ class MirrorGroup:
     # by overriding the configuration methods :)
 
     def _join_url(self, base_url, rel_url):
-        if base_url.endswith('/') or rel_url.startswith('/'):
-            return base_url + rel_url
+        (scheme, netloc, path, query, fragid) = urlparse.urlsplit(base_url)
+        if path.endswith('/') or rel_url.startswith('/'):
+            return urlparse.urlunsplit((scheme, netloc, path + rel_url, query, fragid))
         else:
-            return base_url + '/' + rel_url
-        
+            return urlparse.urlunsplit((scheme, netloc, path + '/' + rel_url, query, fragid))
+
     def _mirror_try(self, func, url, kw):
         gr = GrabRequest()
         gr.func = func
-- 
1.8.1.4

_______________________________________________
Yum-devel mailing list
Yum-devel@lists.baseurl.org
http://lists.baseurl.org/mailman/listinfo/yum-devel
