William Candillon has proposed merging lp:~zorba-coders/zorba/data-cleaning-module-doc into lp:zorba/data-cleaning-module.

Commit message:
Minor documentation improvements.

Requested reviews:
  William Candillon (wcandillon)
  Matthias Brantner (matthias-brantner)

For more details, see:
https://code.launchpad.net/~zorba-coders/zorba/data-cleaning-module-doc/+merge/126964

Minor documentation improvements.
-- 
https://code.launchpad.net/~zorba-coders/zorba/data-cleaning-module-doc/+merge/126964
Your team Zorba Coders is subscribed to branch lp:zorba/data-cleaning-module.

=== modified file 'src/com/zorba-xquery/www/modules/data-cleaning/character-based-string-similarity.xq' --- src/com/zorba-xquery/www/modules/data-cleaning/character-based-string-similarity.xq 2011-10-19 02:03:22 +0000 +++ src/com/zorba-xquery/www/modules/data-cleaning/character-based-string-similarity.xq 2012-09-28 13:37:23 +0000 @@ -27,7 +27,7 @@ : The logic contained in this module is not specific to any particular XQuery implementation. : : @author Bruno Martins and Diogo Simões - : @project data processing/data cleaning + : @project Zorba/Data Cleaning/Character-Based String Similarity :) module namespace simc = "http://www.zorba-xquery.com/modules/data-cleaning/character-based-string-similarity"; @@ -43,9 +43,9 @@ : being insertion, deletion, or substitution of a single character. : : <br/> - : Example usage : <pre> edit-distance("FLWOR", "FLOWER") </pre> + : Example usage : <code>edit-distance("FLWOR", "FLOWER")</code> : <br/> - : The function invocation in the example above returns : <pre> 2 </pre> + : The function invocation in the example above returns : <code>2</code> : : @param $s1 The first string. : @param $s2 The second string. @@ -71,9 +71,9 @@ : normalized such that 0 equates to no similarity and 1 is an exact match. : : <br/> - : Example usage : <pre> jaro("FLWOR Found.", "FLWOR Foundation") </pre> + : Example usage : <code>jaro("FLWOR Found.", "FLWOR Foundation")</code> : <br/> - : The function invocation in the example above returns : <pre> 0.5853174603174603 </pre> + : The function invocation in the example above returns : <code>0.5853174603174603</code> : : @param $s1 The first string. : @param $s2 The second string. @@ -103,9 +103,9 @@ : penalizes strings based on their similarity at the beginning of the string, up to a given prefix size. 
: : <br/> - : Example usage : <pre> jaro-winkler("DWAYNE", "DUANE", 4, 0.1 ) </pre> + : Example usage : <code>jaro-winkler("DWAYNE", "DUANE", 4, 0.1 )</code> : <br/> - : The function invocation in the example above returns : <pre> 0.8577777777777778 </pre> + : The function invocation in the example above returns : <code>0.8577777777777778</code> : : @param $s1 The first string. : @param $s2 The second string. @@ -129,9 +129,9 @@ : distance metric. : : <br/> - : Example usage : <pre> needleman-wunsch("KAK", "KQRK", 1, 1) </pre> + : Example usage : <code>needleman-wunsch("KAK", "KQRK", 1, 1)</code> : <br/> - : The function invocation in the example above returns : <pre> 0 </pre> + : The function invocation in the example above returns : <code>0</code> : : @param $s1 The first string. : @param $s2 The second string. @@ -155,9 +155,9 @@ : Returns the Smith-Waterman distance between two strings. : : <br/> - : Example usage : <pre> smith-waterman("ACACACTA", "AGCACACA", 2, 1) </pre> + : Example usage : <code>smith-waterman("ACACACTA", "AGCACACA", 2, 1)</code> : <br/> - : The function invocation in the example above returns : <pre> 12 </pre> + : The function invocation in the example above returns : <code>12</code> : : @param $s1 The first string. : @param $s2 The second string. === modified file 'src/com/zorba-xquery/www/modules/data-cleaning/consolidation.xq' --- src/com/zorba-xquery/www/modules/data-cleaning/consolidation.xq 2012-04-27 15:19:46 +0000 +++ src/com/zorba-xquery/www/modules/data-cleaning/consolidation.xq 2012-09-28 13:37:23 +0000 @@ -22,11 +22,10 @@ : : The logic contained in this module is not specific to any particular XQuery implementation, : although the consolidation functions based on matching sequences against XPath expressions require - : some form of dynamic evaluation for XPath expressions, - : such as the x:eval() function provided in the Qizx XQuery Engine. + : some form of dynamic evaluation for XPath expressions. 
: : @author Bruno Martins - : @project data processing/data cleaning + : @project Zorba/Data Cleaning/Consolidation :) module namespace con = "http://www.zorba-xquery.com/modules/data-cleaning/consolidation"; @@ -42,9 +41,9 @@ : If more then one answer is possible, returns the first node according to the order of the input sequence. : : <br/> - : Example usage : <pre> most-frequent( ( "a", "a", "b") ) </pre> + : Example usage : <code>most-frequent( ( "a", "a", "b") )</code> : <br/> - : The function invocation in the example above returns : <pre> ("a") </pre> + : The function invocation in the example above returns : <code>("a")</code> : : @param $s A sequence of nodes. : @return The most frequent node in the input sequence. @@ -59,9 +58,9 @@ : If more then one answer is possible, return the first node according to the order of the input sequence. : : <br/> - : Example usage : <pre> least-frequent( ( "a", "a", "b") ) </pre> + : Example usage : <code>least-frequent( ( "a", "a", "b") )</code> : <br/> - : The function invocation in the example above returns : <pre> ("b") </pre> + : The function invocation in the example above returns : <code>("b")</code> : : @param $s A sequence of nodes. : @return The least frequent node in the input sequence. @@ -77,9 +76,9 @@ : If more then one answer is possible, return the first string according to the order of the input sequence. : : <br/> - : Example usage : <pre> con:longest( ( "a", "aa", "aaa") ) </pre> + : Example usage : <code>con:longest( ( "a", "aa", "aaa") )</code> : <br/> - : The function invocation in the example above returns : <pre> ("aaa") </pre> + : The function invocation in the example above returns : <code>("aaa")</code> : : @param $s A sequence of strings. : @return The longest string in the input sequence. @@ -95,9 +94,9 @@ : If more then one answer is possible, return the first string according to the order of the input sequence. 
: : <br/> - : Example usage : <pre> shortest( ( "a", "aa", "aaa") ) </pre> + : Example usage : <code>shortest( ( "a", "aa", "aaa") )</code> : <br/> - : The function invocation in the example above returns : <pre> ("a") </pre> + : The function invocation in the example above returns : <code>("a")</code> : : @param $s A sequence of strings. : @return The shortest string in the input sequence. @@ -113,9 +112,9 @@ : If more then one answer is possible, return the first string according to the order of the input sequence. : : <br/> - : Example usage : <pre> most-tokens( ( "a b c", "a b", "a"), " +" ) </pre> + : Example usage : <code>most-tokens( ( "a b c", "a b", "a"), " +" )</code> : <br/> - : The function invocation in the example above returns : <pre> ("a b c") </pre> + : The function invocation in the example above returns : <code>("a b c")</code> : : @param $s A sequence of strings. : @param $r A regular expression forming the delimiter character(s) which mark the boundaries between adjacent tokens. @@ -132,9 +131,9 @@ : If more then one answer is possible, return the first string according to the order of the input sequence. : : <br/> - : Example usage : <pre> least-tokens( ( "a b c", "a b", "a"), " +" ) </pre> + : Example usage : <code>least-tokens( ( "a b c", "a b", "a"), " +" )</code> : <br/> - : The function invocation in the example above returns : <pre> ("a") </pre> + : The function invocation in the example above returns : <code>("a")</code> : : @param $s A sequence of strings. : @param $r A regular expression forming the delimiter character(s) which mark the boundaries between adjacent tokens. @@ -150,9 +149,9 @@ : Returns the strings from an input sequence of strings that match a particular regular expression. 
: : <br/> - : Example usage : <pre> matching( ( "a A b", "c AAA d", "e BB f"), "A+" ) </pre> + : Example usage : <code>matching( ( "a A b", "c AAA d", "e BB f"), "A+" )</code> : <br/> - : The function invocation in the example above returns : <pre> ( "a A b", "c AAA d") </pre> + : The function invocation in the example above returns : <code>( "a A b", "c AAA d")</code> : : @param $s A sequence of strings. : @param $r The regular expression to be used in the matching. @@ -169,9 +168,9 @@ : If more then one answer is possible, the function returns the first string according to the order of the input sequence. : : <br/> - : Example usage : <pre> super-string( ( "aaa bbb ccc", "aaa bbb", "aaa ddd", "eee fff" ) ) </pre> + : Example usage : <code>super-string( ( "aaa bbb ccc", "aaa bbb", "aaa ddd", "eee fff" ) )</code> : <br/> - : The function invocation in the example above returns : <pre> ( "aaa bbb" ) </pre> + : The function invocation in the example above returns : <code>( "aaa bbb" )</code> : : @param $s A sequence of strings. : @return The string that appears more frequently as part of the other strings in the sequence. @@ -194,9 +193,9 @@ : input sequence. : : <br/> - : Example usage : <pre> most-similar-edit-distance( ( "aaabbbccc", "aaabbb", "eeefff" ), "aaab" ) </pre> + : Example usage : <code>most-similar-edit-distance( ( "aaabbbccc", "aaabbb", "eeefff" ), "aaab" )</code> : <br/> - : The function invocation in the example above returns : <pre> ( "aaabbb" ) </pre> + : The function invocation in the example above returns : <code>( "aaabbb" )</code> : : @param $s A sequence of strings. : @param $m The string towards which we want to measure the edit distance. @@ -214,9 +213,9 @@ : value for the edit distance metric), return the first string according to the order of the input sequence. 
: : <br/> - : Example usage : <pre> least-similar-edit-distance( ( "aaabbbccc", "aaabbb", "eeefff" ), "aaab" ) </pre> + : Example usage : <code>least-similar-edit-distance( ( "aaabbbccc", "aaabbb", "eeefff" ), "aaab" )</code> : <br/> - : The function invocation in the example above returns : <pre> ( "eeefff" ) </pre> + : The function invocation in the example above returns : <code>( "eeefff" )</code> : : @param $s A sequence of strings. : @param $m The string towards which we want to measure the edit distance. @@ -234,9 +233,9 @@ : If more then one answer is possible, return the first node according to the order of the input sequence. : : <br/> - : Example usage : <pre> most-elements( ( <a><b/></a>, <a/>, <b/>) ) </pre> + : Example usage : <code>most-elements( ( <a><b/></a>, <a/>, <b/>) )</code> : <br/> - : The function invocation in the example above returns : <pre> (<a><b/></a>) </pre> + : The function invocation in the example above returns : <code>(<a><b/></a>)</code> : : @param $s A sequence of nodes. : @return The node having the largest number of descending elements in the input sequence. @@ -252,9 +251,9 @@ : If more then one answer is possible, return the first node according to the order of the input sequence. : : <br/> - : Example usage : <pre> most-attributes( ( <a att1="a1" att2="a2"/>, <b att1="a1" />, <c/> ) ) </pre> + : Example usage : <code>most-attributes( ( <a att1="a1" att2="a2"/>, <b att1="a1" />, <c/> ) )</code> : <br/> - : The function invocation in the example above returns : <pre> (<a att1="a1" att2="a2"/>) </pre> + : The function invocation in the example above returns : <code>(<a att1="a1" att2="a2"/>)</code> : : @param $s A sequence of nodes. : @return The node having the largest number of descending attributes in the input sequence. @@ -270,9 +269,9 @@ : If more then one answer is possible, return the first node according to the order of the input sequence. 
: : <br/> - : Example usage : <pre> most-nodes( ( <a><b/></a>, <a/>, <b/>) ) </pre> + : Example usage : <code>most-nodes( ( <a><b/></a>, <a/>, <b/>) )</code> : <br/> - : The function invocation in the example above returns : <pre> (<a><b/></a>) </pre> + : The function invocation in the example above returns : <code>(<a><b/></a>)</code> : : @param $s A sequence of nodes. : @return The node having the largest number of descending nodes in the input sequence. @@ -288,9 +287,9 @@ : If more then one answer is possible, return the first node according to the order of the input sequence. : : <br/> - : Example usage : <pre> least-elements( ( <a><b/></a>, <b><c/></b>, <d/>) ) </pre> + : Example usage : <code>least-elements( ( <a><b/></a>, <b><c/></b>, <d/>) )</code> : <br/> - : The function invocation in the example above returns : <pre> (<d/>) </pre> + : The function invocation in the example above returns : <code>(<d/>)</code> : : @param $s A sequence of nodes. : @return The node having the smallest number of descending elements in the input sequence. @@ -306,9 +305,9 @@ : If more then one answer is possible, return the first node according to the order of the input sequence. : : <br/> - : Example usage : <pre> least-attributes( ( <a att1="a1" att2="a2"/>, <b att1="a1" />, <c/> ) ) </pre> + : Example usage : <code>least-attributes( ( <a att1="a1" att2="a2"/>, <b att1="a1" />, <c/> ) )</code> : <br/> - : The function invocation in the example above returns : <pre> (<c/>) </pre> + : The function invocation in the example above returns : <code>(<c/>)</code> : : @param $s A sequence of nodes. : @return The node having the smallest number of descending attributes in the input sequence. @@ -324,9 +323,9 @@ : If more then one answer is possible, return the first node according to the order of the input sequence. 
: : <br/> - : Example usage : <pre> least-nodes( ( <a><b/></a>, <b><c/></b>, <d/>) ) </pre> + : Example usage : <code>least-nodes( ( <a><b/></a>, <b><c/></b>, <d/>) )</code> : <br/> - : The function invocation in the example above returns : <pre> (<d/>) </pre> + : The function invocation in the example above returns : <code>(<d/>)</code> : : @param $s A sequence of nodes. : @return The node having the smallest number of descending nodes in the input sequence. @@ -342,9 +341,9 @@ : If more then one answer is possible, return the first node according to the order of the input sequence. : : <br/> - : Example usage : <pre> most-distinct-elements( ( <a><b/><c/><d/></a>, <a><b/><b/><c/></a>, <a/> ) ) </pre> + : Example usage : <code>most-distinct-elements( ( <a><b/><c/><d/></a>, <a><b/><b/><c/></a>, <a/> ) )</code> : <br/> - : The function invocation in the example above returns : <pre> (<a><b/><c/><d/></a>) </pre> + : The function invocation in the example above returns : <code>(<a><b/><c/><d/></a>)</code> : : @param $s A sequence of nodes. : @return The node having the largest number of distinct descending elements in the input sequence. @@ -360,9 +359,9 @@ : If more then one answer is possible, return the first node according to the order of the input sequence. : : <br/> - : Example usage : <pre> most-distinct-attributes( ( <a att1="a1" att2="a2" att3="a3"/>, <a att1="a1" att2="a2"><b att2="a2" /></a>, <c/> ) ) </pre> + : Example usage : <code>most-distinct-attributes( ( <a att1="a1" att2="a2" att3="a3"/>, <a att1="a1" att2="a2"><b att2="a2" /></a>, <c/> ) )</code> : <br/> - : The function invocation in the example above returns : <pre> (<a att1="a1" att2="a2" att3="a3"/>) </pre> + : The function invocation in the example above returns : <code>(<a att1="a1" att2="a2" att3="a3"/>)</code> : : @param $s A sequence of nodes. : @return The node having the largest number of distinct descending attributes in the input sequence. 
@@ -378,9 +377,9 @@ : If more then one answer is possible, return the first node according to the order of the input sequence. : : <br/> - : Example usage : <pre> most-distinct-nodes( ( <a><b/></a>, <a><a/></a>, <b/>) ) </pre> + : Example usage : <code>most-distinct-nodes( ( <a><b/></a>, <a><a/></a>, <b/>) )</code> : <br/> - : The function invocation in the example above returns : <pre> (<a><b/></a>) </pre> + : The function invocation in the example above returns : <code>(<a><b/></a>)</code> : : @param $s A sequence of nodes. : @return The node having the largest number of distinct descending nodes in the input sequence. @@ -396,9 +395,9 @@ : If more then one answer is possible, return the first node according to the order of the input sequence. : : <br/> - : Example usage : <pre> least-distinct-elements( ( <a><b/></a>, <b><c/></b>, <d/>) ) </pre> + : Example usage : <code> least-distinct-elements( ( <a><b/></a>, <b><c/></b>, <d/>) ) </code> : <br/> - : The function invocation in the example above returns : <pre> (<d/>) </pre> + : The function invocation in the example above returns : <code> (<d/>) </code> : : @param $s A sequence of nodes. : @return The node having the smallest number of distinct descending elements in the input sequence. @@ -414,9 +413,9 @@ : If more then one answer is possible, return the first node according to the order of the input sequence. : : <br/> - : Example usage : <pre> least-distinct-attributes( ( <a att1="a1" att2="a2"/>, <b att1="a1" />, <c/> ) ) </pre> + : Example usage : <code> least-distinct-attributes( ( <a att1="a1" att2="a2"/>, <b att1="a1" />, <c/> ) ) </code> : <br/> - : The function invocation in the example above returns : <pre> (<c/>) </pre> + : The function invocation in the example above returns : <code> (<c/>) </code> : : @param $s A sequence of nodes. : @return The node having the smallest number of distinct descending attributes in the input sequence. 
@@ -432,9 +431,9 @@ : If more then one answer is possible, return the first node according to the order of the input sequence. : : <br/> - : Example usage : <pre> least-distinct-nodes( ( <a><b/></a>, <b><c/></b>, <d/>) ) </pre> + : Example usage : <code> least-distinct-nodes( ( <a><b/></a>, <b><c/></b>, <d/>) ) </code> : <br/> - : The function invocation in the example above returns : <pre> (<d/>) </pre> + : The function invocation in the example above returns : <code> (<d/>) </code> : : @param $s A sequence of nodes. : @return The node having the smallest number of distinct descending nodes in the input sequence. @@ -449,9 +448,9 @@ : produce a non-empty set of nodes in all the cases. : : <br/> - : Example usage : <pre> all-xpaths( ( <a><b/></a>, <c><d/></c>, <d/>), (".//b") ) </pre> + : Example usage : <code> all-xpaths( ( <a><b/></a>, <c><d/></c>, <d/>), (".//b") ) </code> : <br/> - : The function invocation in the example above returns : <pre> (<a><b/></a>) </pre> + : The function invocation in the example above returns : <code> (<a><b/></a>) </code> : : @param $s A sequence of elements. : @param $paths A sequence of strings denoting XPath expressions. @@ -475,9 +474,9 @@ : produce a non-empty set of nodes for some of the cases. : : <br/> - : Example usage : <pre> some-xpaths( ( <a><b/></a>, <d><c/></d>, <d/>), (".//b", ".//c") ) </pre> + : Example usage : <code> some-xpaths( ( <a><b/></a>, <d><c/></d>, <d/>), (".//b", ".//c") ) </code> : <br/> - : The function invocation in the example above returns : <pre> ( <a><b/></a> , <d><c/></d> ) </pre> + : The function invocation in the example above returns : <code> ( <a><b/></a> , <d><c/></d> ) </code> : : @param $s A sequence of elements. : @param $paths A sequence of strings denoting XPath expressions. @@ -503,9 +502,9 @@ : If more then one answer is possible, return the first element according to the order of the input sequence. 
: : <br/> - : Example usage : <pre> most-xpaths( ( <a><b/></a>, <d><c/><b/></d>, <d/>) , (".//b", ".//c") ) </pre> + : Example usage : <code> most-xpaths( ( <a><b/></a>, <d><c/><b/></d>, <d/>) , (".//b", ".//c") ) </code> : <br/> - : The function invocation in the example above returns : <pre> ( <d><c/><b/></d> ) </pre> + : The function invocation in the example above returns : <code> ( <d><c/><b/></d> ) </code> : : @param $s A sequence of elements. : @param $paths A sequence of strings denoting XPath expressions. @@ -534,9 +533,9 @@ : If more then one answer is possible, return the first element according to the order of the input sequence. : : <br/> - : Example usage : <pre> least-xpaths( ( <a><b/></a>, <d><c/><b/></d>, <d/>) , (".//b", ".//c") ) </pre> + : Example usage : <code> least-xpaths( ( <a><b/></a>, <d><c/><b/></d>, <d/>) , (".//b", ".//c") ) </code> : <br/> - : The function invocation in the example above returns : <pre> ( $lt;d/> ) </pre> + : The function invocation in the example above returns : <code> ( $lt;d/> ) </code> : : @param $s A sequence of elements. : @param $paths A sequence of strings denoting XPath expressions. @@ -563,9 +562,9 @@ : Returns the nodes from an input sequence of nodes that validate against a given XML Schema. : : <br/> - : Example usage : <pre> validating-schema ( ( <a/> , <b/> ), <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"><xs:element name="a" /></xs:schema> ) </pre> + : Example usage : <code> validating-schema ( ( <a/> , <b/> ), <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"><xs:element name="a" /></xs:schema> ) </code> : <br/> - : The function invocation in the example above returns : <pre> ( <a/> ) </pre> + : The function invocation in the example above returns : <code> ( <a/> ) </code> : : @param $s A sequence of elements. : @param $schema An element encoding an XML Schema. 
=== modified file 'src/com/zorba-xquery/www/modules/data-cleaning/conversion.xq' --- src/com/zorba-xquery/www/modules/data-cleaning/conversion.xq 2012-04-25 23:27:59 +0000 +++ src/com/zorba-xquery/www/modules/data-cleaning/conversion.xq 2012-09-28 13:37:23 +0000 @@ -23,7 +23,7 @@ : The logic contained in this module is not specific to any particular XQuery implementation. : : @author Bruno Martins and Diogo Simões - : @project data processing/data cleaning + : @project Zorba/Data Cleaning/Conversion :) module namespace conversion = "http://www.zorba-xquery.com/modules/data-cleaning/conversion"; === modified file 'src/com/zorba-xquery/www/modules/data-cleaning/hybrid-string-similarity.xq' --- src/com/zorba-xquery/www/modules/data-cleaning/hybrid-string-similarity.xq 2012-05-16 17:27:36 +0000 +++ src/com/zorba-xquery/www/modules/data-cleaning/hybrid-string-similarity.xq 2012-09-28 13:37:23 +0000 @@ -25,7 +25,7 @@ : function such as sqrt($x as numeric) for computing the square root. : : @author Bruno Martins and Diogo Simões - : @project data processing/data cleaning + : @project Zorba/Data Cleaning/Hybrid String Similarity :) module namespace simh = "http://www.zorba-xquery.com/modules/data-cleaning/hybrid-string-similarity"; @@ -51,9 +51,9 @@ : this function returns the cosine similarity coefficient between sets of Soundex keys. : : <br/> - : Example usage : <pre> soft-cosine-tokens-soundex("ALEKSANDER SMITH", "ALEXANDER SMYTH", " +") </pre> + : Example usage : <code> soft-cosine-tokens-soundex("ALEKSANDER SMITH", "ALEXANDER SMYTH", " +") </code> : <br/> - : The function invocation in the example above returns : <pre> 1.0 </pre> + : The function invocation in the example above returns : <code> 1.0 </code> : : @param $s1 The first string. : @param $s2 The second string. @@ -75,9 +75,9 @@ : this function returns the cosine similarity coefficient between sets of Metaphone keys. 
: : <br/> - : Example usage : <pre> soft-cosine-tokens-metaphone("ALEKSANDER SMITH", "ALEXANDER SMYTH", " +" ) </pre> + : Example usage : <code> soft-cosine-tokens-metaphone("ALEKSANDER SMITH", "ALEXANDER SMYTH", " +" ) </code> : <br/> - : The function invocation in the example above returns : <pre> 1.0 </pre> + : The function invocation in the example above returns : <code> 1.0 </code> : : @param $s1 The first string. : @param $s2 The second string. @@ -99,9 +99,9 @@ : bellow a given threshold are considered as matching tokens. : : <br/> - : Example usage : <pre> soft-cosine-tokens-edit-distance("The FLWOR Foundation", "FLWOR Found.", " +", 0 ) </pre> + : Example usage : <code> soft-cosine-tokens-edit-distance("The FLWOR Foundation", "FLWOR Found.", " +", 0 ) </code> : <br/> - : The function invocation in the example above returns : <pre> 0.408248290463863 </pre> + : The function invocation in the example above returns : <code> 0.408248290463863 </code> : : @param $s1 The first string. : @param $s2 The second string. @@ -135,9 +135,9 @@ : a given threshold are considered as matching tokens. : : <br/> - : Example usage : <pre> soft-cosine-tokens-jaro("The FLWOR Foundation", "FLWOR Found.", " +", 1 ) </pre> + : Example usage : <code> soft-cosine-tokens-jaro("The FLWOR Foundation", "FLWOR Found.", " +", 1 ) </code> : <br/> - : The function invocation in the example above returns : <pre> 0.5 </pre> + : The function invocation in the example above returns : <code> 0.5 </code> : : @param $s1 The first string. : @param $s2 The second string. @@ -169,9 +169,9 @@ : similarity above a given threshold are considered as matching tokens. 
: : <br/> - : Example usage : <pre> soft-cosine-tokens-jaro-winkler("The FLWOR Foundation", "FLWOR Found.", " +", 1, 4, 0.1 ) </pre> + : Example usage : <code> soft-cosine-tokens-jaro-winkler("The FLWOR Foundation", "FLWOR Found.", " +", 1, 4, 0.1 ) </code> : <br/> - : The function invocation in the example above returns : <pre> 0.45 </pre> + : The function invocation in the example above returns : <code> 0.45 </code> : : @param $s1 The first string. : @param $s2 The second string. @@ -202,9 +202,9 @@ : similarity function to discover token identity. : : <br/> - : Example usage : <pre> monge-elkan-jaro-winkler("Comput. Sci. and Eng. Dept., University of California, San Diego", "Department of Computer Scinece, Univ. Calif., San Diego", 4, 0.1) </pre> + : Example usage : <code> monge-elkan-jaro-winkler("Comput. Sci. and Eng. Dept., University of California, San Diego", "Department of Computer Scinece, Univ. Calif., San Diego", 4, 0.1) </code> : <br/> - : The function invocation in the example above returns : <pre> 0.992 </pre> + : The function invocation in the example above returns : <code> 0.992 </code> : : @param $s1 The first string. : @param $s2 The second string. === modified file 'src/com/zorba-xquery/www/modules/data-cleaning/normalization.xq' --- src/com/zorba-xquery/www/modules/data-cleaning/normalization.xq 2012-04-11 09:50:34 +0000 +++ src/com/zorba-xquery/www/modules/data-cleaning/normalization.xq 2012-09-28 13:37:23 +0000 @@ -25,7 +25,7 @@ : The logic contained in this module is not specific to any particular XQuery implementation. : : @author Bruno Martins and Diogo Simões - : @project data processing/data cleaning + : @project Zorba/Data Cleaning/Normalization :) module namespace normalization = "http://www.zorba-xquery.com/modules/data-cleaning/normalization"; @@ -157,20 +157,20 @@ : letter or 'O' or 'E' and then a single letter. 
Any character in the format string that is not part of a conversion specification : is interpreted literally, and the string '%%' gives '%'. The supported conversion specifications are as follows: : - : <pre> - : '%H' Hours as decimal number (00-23).<br/> - : '%I' Hours as decimal number (01-12).<br/> - : '%M' Minute as decimal number (00-59).<br/> - : '%p' AM/PM indicator in the locale. Used in conjunction with '%I' and *not* with '%H'.<br/> + : <pre class="ace-static"> + : '%H' Hours as decimal number (00-23). + : '%I' Hours as decimal number (01-12). + : '%M' Minute as decimal number (00-59). + : '%p' AM/PM indicator in the locale. Used in conjunction with '%I' and *not* with '%H'. : '%S' Second as decimal number (00-61), allowing for up to two leap-seconds.<br/> - : '%X' Time, locale-specific.<br/> - : '%z' Offset from Greenwich, so '-0900' is 9 hours west of Greenwich.<br/> - : '%Z' Time zone as a character string.<br/> - : '%k' The 24-hour clock time with single digits preceded by a blank.<br/> - : '%l' The 12-hour clock time with single digits preceded by a blank.<br/> - : '%r' The 12-hour clock time (using the locale's AM or PM).<br/> - : '%R' Equivalent to '%H:%M'.<br/> - : '%T' Equivalent to '%H:%M:%S'.<br/> + : '%X' Time, locale-specific. + : '%z' Offset from Greenwich, so '-0900' is 9 hours west of Greenwich. + : '%Z' Time zone as a character string. + : '%k' The 24-hour clock time with single digits preceded by a blank. + : '%l' The 12-hour clock time with single digits preceded by a blank. + : '%r' The 12-hour clock time (using the locale's AM or PM). + : '%R' Equivalent to '%H:%M'. + : '%T' Equivalent to '%H:%M:%S'. :</pre> : : @return The time value resulting from the conversion. @@ -534,36 +534,36 @@ : letter or 'O' or 'E' and then a single letter. Any character in the format string that is not part of a conversion specification : is interpreted literally, and the string '%%' gives '%'. 
The supported conversion specifications are as follows: : - : <pre> - : '%b' Abbreviated month name in the current locale.<br/> - : '%B' Full month name in the current locale.<br/> - : '%c' Date and time, locale-specific.<br/> - : '%C' Century (00-99): the integer part of the year divided by 100.<br/> - : '%d' Day of the month as decimal number (01-31).<br/> - : '%H' Hours as decimal number (00-23).<br/> - : '%I' Hours as decimal number (01-12).<br/> - : '%j' Day of year as decimal number (001-366).<br/> - : '%m' Month as decimal number (01-12).<br/> - : '%M' Minute as decimal number (00-59).<br/> - : '%p' AM/PM indicator in the locale. Used in conjunction with '%I' and *not* with '%H'.<br/> - : '%S' Second as decimal number (00-61), allowing for up to two leap-seconds.<br/> - : '%x' Date, locale-specific.<br/> - : '%X' Time, locale-specific.<br/> - : '%y' Year without century (00-99).<br/> - : '%Y' Year with century.<br/> - : '%z' Offset from Greenwich, so '-0900' is 9 hours west of Greenwich.<br/> - : '%Z' Time zone as a character string.<br/> - : '%D' Locale-specific date format such as '%m/%d/%y': ISO C99 says it should be that exact format.<br/> - : '%e' Day of the month as decimal number (1-31), with a leading pace for a single-digit number.<br/> - : '%F' Equivalent to %Y-%m-%d (the ISO 8601 date format).<br/> - : '%g' The last two digits of the week-based year (see '%V').<br/> - : '%G' The week-based year (see '%V') as a decimal number.<br/> - : '%h' Equivalent to '%b'.<br/> - : '%k' The 24-hour clock time with single digits preceded by a blank.<br/> - : '%l' The 12-hour clock time with single digits preceded by a blank.<br/> - : '%r' The 12-hour clock time (using the locale's AM or PM).<br/> - : '%R' Equivalent to '%H:%M'.<br/> - : '%T' Equivalent to '%H:%M:%S'.<br/> + : <pre class="ace-static"> + : '%b' Abbreviated month name in the current locale. + : '%B' Full month name in the current locale. + : '%c' Date and time, locale-specific. 
+ : '%C' Century (00-99): the integer part of the year divided by 100. + : '%d' Day of the month as decimal number (01-31). + : '%H' Hours as decimal number (00-23). + : '%I' Hours as decimal number (01-12). + : '%j' Day of year as decimal number (001-366). + : '%m' Month as decimal number (01-12). + : '%M' Minute as decimal number (00-59). + : '%p' AM/PM indicator in the locale. Used in conjunction with '%I' and *not* with '%H'. + : '%S' Second as decimal number (00-61), allowing for up to two leap-seconds. + : '%x' Date, locale-specific. + : '%X' Time, locale-specific. + : '%y' Year without century (00-99). + : '%Y' Year with century. + : '%z' Offset from Greenwich, so '-0900' is 9 hours west of Greenwich. + : '%Z' Time zone as a character string. + : '%D' Locale-specific date format such as '%m/%d/%y': ISO C99 says it should be that exact format. + : '%e' Day of the month as decimal number (1-31), with a leading pace for a single-digit number. + : '%F' Equivalent to %Y-%m-%d (the ISO 8601 date format). + : '%g' The last two digits of the week-based year (see '%V'). + : '%G' The week-based year (see '%V') as a decimal number. + : '%h' Equivalent to '%b'. + : '%k' The 24-hour clock time with single digits preceded by a blank. + : '%l' The 12-hour clock time with single digits preceded by a blank. + : '%r' The 12-hour clock time (using the locale's AM or PM). + : '%R' Equivalent to '%H:%M'. + : '%T' Equivalent to '%H:%M:%S'. :</pre> : : @return The dateTime value resulting from the conversion. === modified file 'src/com/zorba-xquery/www/modules/data-cleaning/phonetic-string-similarity.xq' --- src/com/zorba-xquery/www/modules/data-cleaning/phonetic-string-similarity.xq 2011-11-08 21:16:29 +0000 +++ src/com/zorba-xquery/www/modules/data-cleaning/phonetic-string-similarity.xq 2012-09-28 13:37:23 +0000 @@ -25,7 +25,7 @@ : The logic contained in this module is not specific to any particular XQuery implementation. 
: : @author Bruno Martins - : @project data processing/data cleaning + : @project Zorba/Data Cleaning/Phonectic String Similarity :) module namespace simp = "http://www.zorba-xquery.com/modules/data-cleaning/phonetic-string-similarity"; @@ -37,9 +37,9 @@ : Returns the Soundex key for a given string. : : <br/> - : Example usage : <pre> soundex-key("Robert") </pre> + : Example usage : <code>soundex-key("Robert")</code> : <br/> - : The function invocation in the example above returns : <pre> "R163" </pre> + : The function invocation in the example above returns : <code>"R163"</code> : : @param $s1 The string. : @return The Soundex key for the given input string. @@ -59,9 +59,9 @@ : Checks if two strings have the same Soundex key. : : <br/> - : Example usage : <pre> soundex( "Robert" , "Rupert" ) </pre> + : Example usage : <code>soundex( "Robert" , "Rupert" )</code> : <br/> - : The function invocation in the example above returns : <pre> true </pre> + : The function invocation in the example above returns : <code>true</code> : : @param $s1 The first string. : @param $s2 The second string. @@ -77,9 +77,9 @@ : The Metaphone algorithm produces variable length keys as its output, as opposed to Soundex's fixed-length keys. : : <br/> - : Example usage : <pre> metaphone-key("ALEKSANDER") </pre> + : Example usage : <code>metaphone-key("ALEKSANDER")</code> : <br/> - : The function invocation in the example above returns : <pre> "ALKSNTR" </pre> + : The function invocation in the example above returns : <code>"ALKSNTR"</code> : : @param $s1 The string. : @return The Metaphone key for the given input string. @@ -103,9 +103,9 @@ : Checks if two strings have the same Metaphone key. 
: : <br/> - : Example usage : <pre> metaphone("ALEKSANDER", "ALEXANDRE") </pre> + : Example usage : <code>metaphone("ALEKSANDER", "ALEXANDRE")</code> : <br/> - : The function invocation in the example above returns : <pre> true </pre> + : The function invocation in the example above returns : <code>true</code> : : @param $s1 The first string. : @param $s2 The second string. === modified file 'src/com/zorba-xquery/www/modules/data-cleaning/set-similarity.xq' --- src/com/zorba-xquery/www/modules/data-cleaning/set-similarity.xq 2012-04-26 16:11:48 +0000 +++ src/com/zorba-xquery/www/modules/data-cleaning/set-similarity.xq 2012-09-28 13:37:23 +0000 @@ -25,7 +25,7 @@ : The logic contained in this module is not specific to any particular XQuery implementation. : : @author Bruno Martins - : @project data processing/data cleaning + : @project Zorba/Data Cleaning/Set Similarity :) module namespace set = "http://www.zorba-xquery.com/modules/data-cleaning/set-similarity"; @@ -37,9 +37,9 @@ : Returns the union between two sets, using the deep-equal() function to compare the XML nodes from the sets. : : <br/> - : Example usage : <pre> deep-union ( ( "a", "b", "c") , ( "a", "a", <d/> ) ) </pre> + : Example usage : <code> deep-union ( ( "a", "b", "c") , ( "a", "a", <d/> ) ) </code> : <br/> - : The function invocation in the example above returns : <pre> ("a", "b", "c", <d/> ) </pre> + : The function invocation in the example above returns : <code> ("a", "b", "c", <d/> ) </code> : : @param $s1 The first set. : @param $s2 The second set. @@ -57,9 +57,9 @@ : Returns the intersection between two sets, using the deep-equal() function to compare the XML nodes from the sets. 
: : <br/> - : Example usage : <pre> deep-intersect ( ( "a", "b", "c") , ( "a", "a", <d/> ) ) </pre> + : Example usage : <code> deep-intersect ( ( "a", "b", "c") , ( "a", "a", <d/> ) ) </code> : <br/> - : The function invocation in the example above returns : <pre> ("a") </pre> + : The function invocation in the example above returns : <code> ("a") </code> : : @param $s1 The first set. : @param $s2 The second set. @@ -78,9 +78,9 @@ : Removes exact duplicates from a set, using the deep-equal() function to compare the XML nodes from the sets. : : <br/> - : Example usage : <pre> distinct ( ( "a", "a", <b/> ) ) </pre> + : Example usage : <code> distinct ( ( "a", "a", <b/> ) ) </code> : <br/> - : The function invocation in the example above returns : <pre> ("a", <b/> ) </pre> + : The function invocation in the example above returns : <code> ("a", <b/> ) </code> : : @param $s A set. : @return The set provided as input without the exact duplicates (i.e., returns the distinct nodes from the set provided as input). @@ -98,9 +98,9 @@ : (i.e., the size of the intersection) over the size of the smallest input set. : : <br/> - : Example usage : <pre> overlap ( ( "a", "b", <c/> ) , ( "a", "a", "b" ) ) </pre> + : Example usage : <code> overlap ( ( "a", "b", <c/> ) , ( "a", "a", "b" ) ) </code> : <br/> - : The function invocation in the example above returns : <pre> 1.0 </pre> + : The function invocation in the example above returns : <code> 1.0 </code> : : @param $s1 The first set. : @param $s2 The second set. @@ -117,9 +117,9 @@ : (i.e., the size of the intersection) over the sum of the cardinalities for the input sets. : : <br/> - : Example usage : <pre> dice ( ( "a", "b", <c/> ) , ( "a", "a", "d") ) </pre> + : Example usage : <code> dice ( ( "a", "b", <c/> ) , ( "a", "a", "d") ) </code> : <br/> - : The function invocation in the example above returns : <pre> 0.4 </pre> + : The function invocation in the example above returns : <code> 0.4 </code> : : @param $s1 The first set. 
: @param $s2 The second set. @@ -136,9 +136,9 @@ : union of the input sets. : : <br/> - : Example usage : <pre> jaccard ( ( "a", "b", <c/> ) , ( "a", "a", "d") ) </pre> + : Example usage : <code> jaccard ( ( "a", "b", <c/> ) , ( "a", "a", "d") ) </code> : <br/> - : The function invocation in the example above returns : <pre> 0.25 </pre> + : The function invocation in the example above returns : <code> 0.25 </code> : : @param $s1 The first set. : @param $s2 The second set. === modified file 'src/com/zorba-xquery/www/modules/data-cleaning/token-based-string-similarity.xq' --- src/com/zorba-xquery/www/modules/data-cleaning/token-based-string-similarity.xq 2012-05-16 17:27:36 +0000 +++ src/com/zorba-xquery/www/modules/data-cleaning/token-based-string-similarity.xq 2012-09-28 13:37:23 +0000 @@ -30,7 +30,7 @@ : function such as sqrt($x as numeric) for computing the square root. : : @author Bruno Martins - : @project data processing/data cleaning + : @project Zorba/Data Cleaning/Token Based String Similarity :) module namespace simt = "http://www.zorba-xquery.com/modules/data-cleaning/token-based-string-similarity"; @@ -48,9 +48,9 @@ : Returns the individual character n-grams forming a string. : : <br/> - : Example usage : <pre> ngrams("FLWOR", 2 ) </pre> + : Example usage : <code> ngrams("FLWOR", 2 ) </code> : <br/> - : The function invocation in the example above returns : <pre> ("_F" , "FL" , "LW" , "WO" , "LW" , "WO" , "OR" , "R_") </pre> + : The function invocation in the example above returns : <code> ("_F" , "FL" , "LW" , "WO" , "LW" , "WO" , "OR" , "R_") </code> : : @param $s The input string. : @param $n The number of characters to consider when extracting n-grams. @@ -77,9 +77,9 @@ : using stringdescriptors based on sets of character n-grams or sets of tokens extracted from two strings. 
: : <br/> - : Example usage : <pre> cosine( ("aa","bb") , ("bb","aa")) </pre> + : Example usage : <code> cosine( ("aa","bb") , ("bb","aa")) </code> : <br/> - : The function invocation in the example above returns : <pre> 1.0 </pre> + : The function invocation in the example above returns : <code> 1.0 </code> : : @param $desc1 The descriptor for the first string. : @param $desc2 The descriptor for the second string. @@ -100,9 +100,9 @@ : Returns the Dice similarity coefficient between sets of character n-grams extracted from two strings. : : <br/> - : Example usage : <pre> dice-ngrams("DWAYNE", "DUANE", 2 ) </pre> + : Example usage : <code> dice-ngrams("DWAYNE", "DUANE", 2 ) </code> : <br/> - : The function invocation in the example above returns : <pre> 0.4615384615384616 </pre> + : The function invocation in the example above returns : <code> 0.4615384615384616 </code> : : @param $s1 The first string. : @param $s2 The second string. @@ -118,9 +118,9 @@ : Returns the overlap similarity coefficient between sets of character n-grams extracted from two strings. : : <br/> - : Example usage : <pre> overlap-ngrams("DWAYNE", "DUANE", 2 ) </pre> + : Example usage : <code> overlap-ngrams("DWAYNE", "DUANE", 2 ) </code> : <br/> - : The function invocation in the example above returns : <pre> 0.5 </pre> + : The function invocation in the example above returns : <code> 0.5 </code> : : @param $s1 The first string. : @param $s2 The second string. @@ -136,9 +136,9 @@ : Returns the Jaccard similarity coefficient between sets of character n-grams extracted from two strings. : : <br/> - : Example usage : <pre> jaccard-ngrams("DWAYNE", "DUANE", 2 ) </pre> + : Example usage : <code> jaccard-ngrams("DWAYNE", "DUANE", 2 ) </code> : <br/> - : The function invocation in the example above returns : <pre> 0.3 </pre> + : The function invocation in the example above returns : <code> 0.3 </code> : : @param $s1 The first string. : @param $s2 The second string. 
@@ -156,9 +156,9 @@ : the term-frequency heuristic from Information Retrieval). : : <br/> - : Example usage : <pre> cosine-ngrams("DWAYNE", "DUANE", 2 ) </pre> + : Example usage : <code> cosine-ngrams("DWAYNE", "DUANE", 2 ) </code> : <br/> - : The function invocation in the example above returns : <pre> 0.2401922307076307 </pre> + : The function invocation in the example above returns : <code> 0.2401922307076307 </code> : : @param $s1 The first string. : @param $s2 The second string. @@ -176,9 +176,9 @@ : Returns the Dice similarity coefficient between sets of tokens extracted from two strings. : : <br/> - : Example usage : <pre> dice-tokens("The FLWOR Foundation", "FLWOR Found.", " +" ) </pre> + : Example usage : <code> dice-tokens("The FLWOR Foundation", "FLWOR Found.", " +" ) </code> : <br/> - : The function invocation in the example above returns : <pre> 0.4 </pre> + : The function invocation in the example above returns : <code> 0.4 </code> : : @param $s1 The first string. : @param $s2 The second string. @@ -194,9 +194,9 @@ : Returns the overlap similarity coefficient between sets of tokens extracted from two strings. : : <br/> - : Example usage : <pre> overlap-tokens("The FLWOR Foundation", "FLWOR Found.", " +" ) </pre> + : Example usage : <code> overlap-tokens("The FLWOR Foundation", "FLWOR Found.", " +" ) </code> : <br/> - : The function invocation in the example above returns : <pre> 0.5 </pre> + : The function invocation in the example above returns : <code> 0.5 </code> : : @param $s1 The first string. : @param $s2 The second string. @@ -212,9 +212,9 @@ : Returns the Jaccard similarity coefficient between sets of tokens extracted from two strings. 
: : <br/> - : Example usage : <pre> jaccard-tokens("The FLWOR Foundation", "FLWOR Found.", " +" ) </pre> + : Example usage : <code> jaccard-tokens("The FLWOR Foundation", "FLWOR Found.", " +" ) </code> : <br/> - : The function invocation in the example above returns : <pre> 0.25 </pre> + : The function invocation in the example above returns : <code> 0.25 </code> : : @param $s1 The first string. : @param $s2 The second string. @@ -232,9 +232,9 @@ : term-frequency heuristic from Information Retrieval). : : <br/> - : Example usage : <pre> cosine-tokens("The FLWOR Foundation", "FLWOR Found.", " +" ) </pre> + : Example usage : <code> cosine-tokens("The FLWOR Foundation", "FLWOR Found.", " +" ) </code> : <br/> - : The function invocation in the example above returns : <pre> 0.408248290463863 </pre> + : The function invocation in the example above returns : <code> 0.408248290463863 </code> : : @param $s1 The first string. : @param $s2 The second string.
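Reviewer note: the four token-based examples in the last hunks can be cross-checked independently of the module. The following is a minimal Python sketch, not the module's XQuery implementation. The helper names (tokens, dice, overlap, jaccard, cosine) are hypothetical, inputs are assumed to yield non-empty token sets, and the cosine uses binary term weights, which matches the documented value here only because every token occurs once in each string.

```python
import math
import re


def tokens(s, pattern):
    """Split s on the regular expression and keep the distinct, non-empty tokens."""
    return {t for t in re.split(pattern, s) if t}


def dice(a, b):
    # Twice the intersection size over the sum of the set sizes.
    return 2 * len(a & b) / (len(a) + len(b))


def overlap(a, b):
    # Intersection size over the size of the smaller set.
    return len(a & b) / min(len(a), len(b))


def jaccard(a, b):
    # Intersection size over the size of the union.
    return len(a & b) / len(a | b)


def cosine(a, b):
    # Binary-weight simplification (assumption): shared tokens over the
    # geometric mean of the set sizes. A term-frequency weighting would
    # differ when tokens repeat.
    return len(a & b) / math.sqrt(len(a) * len(b))


# Same inputs as the documented examples: split on runs of spaces.
t1 = tokens("The FLWOR Foundation", " +")
t2 = tokens("FLWOR Found.", " +")

print(dice(t1, t2), overlap(t1, t2), jaccard(t1, t2), cosine(t1, t2))
```

Under these assumptions the sketch agrees with the values quoted in the doc comments for dice-tokens, overlap-tokens, jaccard-tokens and cosine-tokens (0.4, 0.5, 0.25 and approximately 0.408248290463863).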

