Re: [Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2012-02-28 Thread Sorin Marian Nasoi
This branch was superseded by lp:~zorba-coders/zorba/web_crawler_tutorial.
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/78243
Your team Zorba Coders is subscribed to branch lp:zorba.

-- 
Mailing list: https://launchpad.net/~zorba-coders
Post to : zorba-coders@lists.launchpad.net
Unsubscribe : https://launchpad.net/~zorba-coders
More help   : https://help.launchpad.net/ListHelp


[Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2011-12-14 Thread Daniel Turcanu
The proposal to merge lp:~danielturcanu/zorba/web_crawler_tutorial into 
lp:zorba has been updated.

Status: Needs review => Approved

For more details, see:
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/85669
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/85669
Your team Zorba Coders is requested to review the proposed merge of 
lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba.



[Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2011-12-14 Thread Zorba Build Bot
The proposal to merge lp:~danielturcanu/zorba/web_crawler_tutorial into 
lp:zorba has been updated.

Status: Approved => Needs review

For more details, see:
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/85669
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/85669
Your team Zorba Coders is requested to review the proposed merge of 
lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba.



[Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2011-10-14 Thread Daniel Turcanu
Daniel Turcanu has proposed merging 
lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba.

Requested reviews:
  Sorin Marian Nasoi (sorin.marian.nasoi)

For more details, see:
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/79407

Updated the web crawler tutorial with the latest updates in link_crawler2.xq
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/79407
Your team Zorba Coders is subscribed to branch lp:zorba.
=== added file 'doc/zorba/link_crawler2.dox'
--- doc/zorba/link_crawler2.dox	1970-01-01 00:00:00 +0000
+++ doc/zorba/link_crawler2.dox	2011-10-14 15:00:48 +0000
@@ -0,0 +1,238 @@
+/**
+\page link_crawler2  Web Crawler example in XQuery
+\code
+(:
+ : Copyright 2006-2011 The FLWOR Foundation.
+ :
+ : Licensed under the Apache License, Version 2.0 (the License);
+ : you may not use this file except in compliance with the License.
+ : You may obtain a copy of the License at
+ :
+ : http://www.apache.org/licenses/LICENSE-2.0
+ :
+ : Unless required by applicable law or agreed to in writing, software
+ : distributed under the License is distributed on an AS IS BASIS,
+ : WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ : See the License for the specific language governing permissions and
+ : limitations under the License.
+:)
+
+import module namespace http = "http://www.zorba-xquery.com/modules/http-client";
+import module namespace map = "http://www.zorba-xquery.com/modules/store/data-structures/unordered-map";
+import module namespace html = "http://www.zorba-xquery.com/modules/converters/html";
+import module namespace parse-xml = "http://www.zorba-xquery.com/modules/xml";
+import module namespace file = "http://expath.org/ns/file";
+
+declare namespace ann = "http://www.zorba-xquery.com/annotations";
+declare namespace xhtml="http://www.w3.org/1999/xhtml";
+declare namespace output="http://www.w3.org/2010/xslt-xquery-serialization";
+declare namespace err="http://www.w3.org/2005/xqt-errors";
+declare namespace httpsch = "http://expath.org/ns/http-client";
+
+declare variable $top-uri  as xs:string := "http://www.zorba-xquery.com/site2/html/index.html";
+declare variable $uri-host as xs:string := "http://www.zorba-xquery.com";
+
+
+
+declare variable $local:processed-internal-links := xs:QName("processed-internal-links");
+declare variable $local:processed-external-links := xs:QName("processed-external-links");
+
+
+declare %ann:sequential function local:create-containers()
+{
+  map:create($local:processed-internal-links, xs:QName("xs:string"));
+  map:create($local:processed-external-links, xs:QName("xs:string"));
+};
+
+declare %ann:sequential function local:delete-containers(){
+  for $x in map:available-maps()
+  return map:delete($x);
+};
+
+declare function local:is-internal($x as xs:string) as xs:boolean
+{
+ starts-with($x, $uri-host)
+};
+
+declare function local:my-substring-before($s1 as xs:string, $s2 as xs:string) as xs:string
+{
+let $sb := fn:substring-before($s1, $s2)
+return  if($sb = "") then  $s1 else $sb
+};
+
+declare %ann:sequential function local:get-real-link($href as xs:string, $start-uri as xs:string) as xs:string?
+{
+   variable $absuri;
+   try{
+$absuri := local:my-substring-before(resolve-uri(fn:normalize-space($href), $start-uri), "#");
+   }
+   catch *
+   { 
+ map:insert($local:processed-external-links, (<FROM>{$start-uri}</FROM>, 
+  <MESSAGE>malformed</MESSAGE>,
+  <RESULT>broken</RESULT>), $href);
+   }
+   $absuri
+};
+
+
+declare  function local:get-media-type ($http-call as node()) as xs:string
+{
+   local:my-substring-before($http-call/httpsch:header[@name = 'Content-Type'][1]/string(@value), ";")
+};
+
+declare function local:alive($http-call as item()*) as xs:boolean
+{
+ if((count($http-call) ge 1) and 
+($http-call[1]/@status eq 200)) 
+   then true() else fn:trace(false(), "alive")
+};
+
+
+declare %ann:sequential function local:get-out-links-parsed($content as node()*, $uri as xs:string) as xs:string*
+{  distinct-values( for $y in  ($content//*:a/string(@href),
+  $content//*:link/string(@href),
+  $content//*:script/string(@src),
+  $content//*:img/string(@src),
+  $content//*:area/string(@href)
+  )
+return  local:get-real-link($y, $uri))
+};
+
+
+declare %ann:sequential function local:get-out-links-unparsed($content as xs:string, $uri as xs:string) as xs:string*{
+
+  distinct-values( 
+ let $search := fn:analyze-string($content, "(&lt;|&amp;lt;|<)(((a|link|area).+?href)|((script|img).+?src))=([""'])(.*?)\7")
+ for $other-uri2 in  $search//group[@nr=8]/string()
+ return local:get-real-link($other-uri2, $uri)
+ )
+};
+
+
+declare %ann:sequential function local:map-insert-result($map-name as xs:QName, $url as xs:string, 

Re: [Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2011-10-05 Thread Daniel Turcanu
The link crawler is added in html module as a test for compilation.
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/77179
Your team Zorba Coders is subscribed to branch lp:zorba.



[Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2011-10-05 Thread Daniel Turcanu
Daniel Turcanu has proposed merging 
lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba.

Requested reviews:
  Zorba Coders (zorba-coders)

For more details, see:
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/78243
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/78243
Your team Zorba Coders is requested to review the proposed merge of 
lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba.
=== added file 'doc/zorba/link_crawler2.dox'
--- doc/zorba/link_crawler2.dox	1970-01-01 00:00:00 +0000
+++ doc/zorba/link_crawler2.dox	2011-10-05 12:23:32 +0000
@@ -0,0 +1,208 @@
+/**
+\page link_crawler2  Web Crawler example in XQuery
+\code
+(:
+ : Copyright 2006-2011 The FLWOR Foundation.
+ :
+ : Licensed under the Apache License, Version 2.0 (the License);
+ : you may not use this file except in compliance with the License.
+ : You may obtain a copy of the License at
+ :
+ : http://www.apache.org/licenses/LICENSE-2.0
+ :
+ : Unless required by applicable law or agreed to in writing, software
+ : distributed under the License is distributed on an AS IS BASIS,
+ : WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ : See the License for the specific language governing permissions and
+ : limitations under the License.
+:)
+
+import module namespace http = "http://www.zorba-xquery.com/modules/http-client";
+import module namespace map = "http://www.zorba-xquery.com/modules/store/data-structures/unordered-map";
+import module namespace html = "http://www.zorba-xquery.com/modules/converters/html";
+import module namespace parse-xml = "http://www.zorba-xquery.com/modules/xml";
+
+declare namespace ann = "http://www.zorba-xquery.com/annotations";
+declare namespace xhtml="http://www.w3.org/1999/xhtml";
+declare namespace output="http://www.w3.org/2010/xslt-xquery-serialization";
+declare namespace err="http://www.w3.org/2005/xqt-errors";
+declare namespace httpsch = "http://expath.org/ns/http-client";
+
+declare variable $top-uri  as xs:string := "http://www.zorba-xquery.com/site2/html/index.html";
+declare variable $uri-host as xs:string := "http://www.zorba-xquery.com/site2/";
+
+
+declare variable $supported-media-types as xs:string+ := ("text/xml", "application/xml", "text/xml-external-parsed-entity", "application/xml-external-parsed-entity",
+ "application/atom+xml", "text/html");
+
+
+declare variable $local:processed-internal-links:=xs:QName("processed-internal-links");
+declare variable $local:processed-external-links  :=xs:QName("processed-external-links");
+
+
+declare %ann:sequential function local:create-containers()
+{
+  map:create($local:processed-internal-links, xs:QName("xs:string"));
+  map:create($local:processed-external-links, xs:QName("xs:string"));
+};
+
+declare %ann:sequential function local:delete-containers(){
+  for $x in map:available-maps()
+  return map:delete($x);
+};
+
+declare function local:is-internal($x as xs:string) as xs:boolean
+{
+ starts-with($x, $uri-host)
+};
+
+declare function local:my-substring-before($s1 as xs:string, $s2 as xs:string) as xs:string
+{
+let $sb := fn:substring-before($s1, $s2)
+return  if($sb = "") then  $s1 else $sb
+};
+
+declare function local:get-real-link($href as xs:string, $start-uri as xs:string) as xs:string
+{
+   local:my-substring-before(resolve-uri($href, $start-uri), "#")
+};
+
+
+declare  function local:get-media-type ($http-call as node()) as xs:string
+{
+   local:my-substring-before($http-call/httpsch:header[@name = 'Content-Type'][1]/string(@value), ";")
+};
+
+declare function local:alive($http-call as node()*) as xs:boolean
+{
+ if(($http-call[1]/@status eq 200)) then true() else false()
+};
+
+
+declare function local:get-out-links-parsed($content as node()*, $uri as xs:string) as xs:string*
+{  distinct-values( for $y in  ($content//*:a/string(@href),
+  $content//*:link/string(@href),
+  $content//*:script/string(@src),
+  $content//*:img/string(@src),
+  $content//*:area/string(@href)
+  )
+return  local:get-real-link($y, $uri))
+};
+
+
+declare function local:get-out-links-unparsed($content as xs:string, $uri as xs:string) as xs:string*{
+
+  distinct-values( 
+ let $search := fn:analyze-string($content, "(&lt;|&amp;lt;|<)(((a|link|area).+?href)|((script|img).+?src))=([""'])(.*?)\7")
+ for $other-uri2 in  $search//group[@nr=8]/string()
+ let $y:= fn:normalize-space($other-uri2)
+ return local:get-real-link($y, $uri)
+ )
+};
+
+
+
+declare  %ann:sequential function local:process-external-link($x as xs:string){
+  if(not(empty(map:get($local:processed-external-links, $x))))
+ then   exit returning false();
+ else {}
+ variable $http-call:=();
+  try{
+$http-call:=http:send-request(<httpsch:request method="HEAD" href="{$x}"/>, (), ());
+  }
+  catch * {}
+  if( 

[Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2011-10-05 Thread Zorba Build Bot
Validation queue starting for merge proposal.
Log at: 
http://zorbatest.lambda.nu:8080/remotequeue/web_crawler_tutorial-2011-10-05T12-23-57.066Z/log.html
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/78243
Your team Zorba Coders is requested to review the proposed merge of 
lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba.



Re: [Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2011-10-05 Thread Zorba Build Bot
Voting does not meet specified criteria. Required: Approve > 0, Disapprove < 1. 
Got: 1 Pending.
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/78243
Your team Zorba Coders is requested to review the proposed merge of 
lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba.



[Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2011-10-05 Thread Zorba Build Bot
The proposal to merge lp:~danielturcanu/zorba/web_crawler_tutorial into 
lp:zorba has been updated.

Status: Approved => Needs review

For more details, see:
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/78243
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/78243
Your team Zorba Coders is requested to review the proposed merge of 
lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba.



Re: [Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2011-10-05 Thread Sorin Marian Nasoi
Review: Approve

I have checked the changes.
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/78243
Your team Zorba Coders is subscribed to branch lp:zorba.



Re: [Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2011-10-05 Thread Sorin Marian Nasoi
Review: Needs Fixing

You could add a link to the script in the existing Doxy page instead of adding a 
new Doxy page.

Something like: 

\include zorba/store/sc2_ex1.xq

First you need to add the path to the WebCrawler script in the Doxygen example 
search path.

Edit doc/zorba/doxy.config.in
line 504, EXAMPLE_PATH
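
A minimal sketch of that configuration change, assuming the crawler script 
sits in a directory Doxygen can read. The directory name below is a 
placeholder (the thread does not give the real path), and `+=` is Doxygen's 
syntax for appending to a tag that already has a value:

```
# doc/zorba/doxy.config.in
# Append the directory that contains the crawler script to the example
# search path used by \include.
# NOTE: path/to/crawler/dir is a hypothetical placeholder, not the real location.
EXAMPLE_PATH += path/to/crawler/dir
```

With that directory on the search path, the Doxy page could pull the script 
in with `\include link_crawler2.xq` rather than carrying a pasted copy.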
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/78243
Your team Zorba Coders is subscribed to branch lp:zorba.



Re: [Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2011-10-04 Thread Sorin Marian Nasoi
Review: Abstain

The tutorial is nice, but I am not sure the index page in our Doxygen 
documentation is the best place to put it.
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/77179
Your team Zorba Coders is subscribed to branch lp:zorba.



Re: [Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2011-10-04 Thread Chris Hillery
Review: Approve

I like it. I'd leave the link from the index page there - having a specific 
section marked tutorials will maybe encourage folks to write some more over 
time. If not, we can easily move that later.
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/77179
Your team Zorba Coders is subscribed to branch lp:zorba.



[Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2011-10-04 Thread Chris Hillery
The proposal to merge lp:~danielturcanu/zorba/web_crawler_tutorial into 
lp:zorba has been updated.

Status: Needs review => Approved

For more details, see:
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/77179
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/77179
Your team Zorba Coders is subscribed to branch lp:zorba.



Re: [Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2011-10-04 Thread Matthias Brantner
I think that the code in the tutorial should be literally included and be 
tested as such to make sure that we don't regress.

The tutorial should be linked from a blog entry. Also, the tutorial should 
provide a link to download the source code.

Daniel, could you please provide Dana with the HTML version of the tutorial. 
I'm sure she is also interested in reading it before it gets published.
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/77179
Your team Zorba Coders is subscribed to branch lp:zorba.



[Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2011-10-04 Thread noreply
The proposal to merge lp:~danielturcanu/zorba/web_crawler_tutorial into 
lp:zorba has been updated.

Status: Approved => Merged

For more details, see:
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/77179
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/77179
Your team Zorba Coders is subscribed to branch lp:zorba.



[Zorba-coders] [Merge] lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba

2011-09-27 Thread Daniel Turcanu
Daniel Turcanu has proposed merging 
lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba.

Requested reviews:
  Zorba Coders (zorba-coders)

For more details, see:
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/77179

Added tutorial for web crawler script from html module (or script directory in 
zorba).
-- 
https://code.launchpad.net/~danielturcanu/zorba/web_crawler_tutorial/+merge/77179
Your team Zorba Coders is requested to review the proposed merge of 
lp:~danielturcanu/zorba/web_crawler_tutorial into lp:zorba.
=== modified file 'doc/zorba/indexpage.dox.in'
--- doc/zorba/indexpage.dox.in	2011-09-06 16:39:46 +0000
+++ doc/zorba/indexpage.dox.in	2011-09-27 15:05:56 +0000
@@ -127,6 +127,14 @@
 <!--<li>\ref extensions_update</li>-->
 
 
+</td></tr>
+<tr><td class="tdDocIndexTable">
+
+
+<h2>Tutorials</h2>
+
+\ref web_crawler_tutorial
+
 </td><tr>
 </table>
 

=== added file 'doc/zorba/web_crawler.dox'
--- doc/zorba/web_crawler.dox	1970-01-01 00:00:00 +0000
+++ doc/zorba/web_crawler.dox	2011-09-27 15:05:56 +0000
@@ -0,0 +1,173 @@
+/**
+\page web_crawler_tutorial  Web Crawler example in XQuery
+
+Description of a web crawler example in XQuery.
+
+The idea is to crawl through the pages of a website, keep one list of internal pages and one of external pages, and check whether each page works.
+This example uses Zorba's http module for accessing the web pages, and the html module for converting the HTML to XML.
+The complete code can be found in the test directory of the html converter module.
+
+\code
+import module namespace http = "http://www.zorba-xquery.com/modules/http-client";
+import module namespace map = "http://www.zorba-xquery.com/modules/store/data-structures/unordered-map";
+import module namespace html = "http://www.zorba-xquery.com/modules/converters/html";
+import module namespace parse-xml = "http://www.zorba-xquery.com/modules/xml";
+\endcode
+
+The internal pages are checked recursively, while the external ones are only checked for existence.
+The distinction between internal and external links is made by comparing the URI with a global string variable $uri-host.
+Change this variable to point to your website, or a subdirectory on your website.
+
+\code
+declare variable $top-uri  as xs:string := "http://www.zorba-xquery.com/site2/html/index.html";
+declare variable $uri-host as xs:string := "http://www.zorba-xquery.com/site2/";
+
+declare function local:is-internal($x as xs:string) as xs:boolean
+{
+ starts-with($x, $uri-host)
+};
+
+\endcode
+
+The crawling starts from the URI pointed to by $top-uri.
+
+Visited links are stored as nodes in two maps, one for internal pages and one for external pages.
+The keys are the URIs, and the values are the strings "broken" or "clean".
+The maps are used to avoid parsing the same page twice.
+
+\code
+declare variable $local:processed-internal-links := xs:QName("processed-internal-links");
+declare variable $local:processed-external-links := xs:QName("processed-external-links");
+
+declare %ann:sequential function local:create-containers()
+{
+  map:create($local:processed-internal-links, xs:QName("xs:string"));
+  map:create($local:processed-external-links, xs:QName("xs:string"));
+};
+
+declare %ann:sequential function local:delete-containers(){
+  for $x in map:available-maps()
+  return map:delete($x);
+};
+
+\endcode
+
+After an internal page is parsed with the html module, all of its links are extracted and processed recursively, unless they have already been processed.
+The html module uses the tidy library, so we use tidy options to set up the conversion from HTML to XML.
+Some HTML tags are marked to be ignored in the new-inline-tags param; this is specific to this website.
+You can add or remove tags to suit your website's needs.
+
+\code
+declare function local:get-out-links-parsed($content as node()*, $uri as xs:string) as xs:string*
+{  distinct-values( for $y in  ($content//*:a/string(@href),
+  $content//*:link/string(@href),
+  $content//*:script/string(@src),
+  $content//*:img/string(@src),
+  $content//*:area/string(@href)
+  )
+return  local:get-real-link($y, $uri))
+};
+
+declare function local:tidy-options()
+{<options xmlns="http://www.zorba-xquery.com/modules/converters/html-options" >
+ <tidyParam name="output-xml" value="yes" />
+ <tidyParam name="doctype" value="omit" />
+ <tidyParam name="quote-nbsp" value="no" />
+ <tidyParam name="char-encoding" value="utf8" />
+ <tidyParam name="newline" value="LF" />
+ <tidyParam name="tidy-mark" value="no" />
+ <tidyParam name="new-inline-tags" value="nav header section article footer xqdoc:custom d c options json-param" />
+