Re: [basex-talk] How to apply array:for-each on a - sequence - of arrays? SOLVED

2020-03-31 Thread Ben Engbers
Hi,

> To insert the third value into each array I think you want
> 
>   let $result := $idf ! array:append(., math:log($count div .(2) ))

This works!

Martin and Graydon, thanks for the help and the explanation.

Ben

import module namespace tidyTM = 'http://www.be-logical.nl';

declare function local:step_one($nodes as node()*) as array(*)*
{
  (: distinct tokens per node, concatenated into one sequence :)
  let $text := for $node in $nodes
               return $node/text() => tokenize() => distinct-values()
  (: one [word, document-count] array per distinct word :)
  let $idf := $text => tidyTM:wordCount_arr()
  return $idf
};

declare function local:wordFreq_idf($nodes as node()*) as array(*)*
{
  let $count := count($nodes)
  let $idf := local:step_one($nodes)
  (: append ln($count / document-count) to each array :)
  let $result := $idf ! array:append(., math:log($count div .(2)))
  return $result
};

let $nodes :=
collection('IncidentRemarks/Incidenten-180101-190630.csv')/csv/record/INC_RM
let $Stoppers := doc('TextMining/Stopwoorden.txt')/text/line/text()

return local:wordFreq_idf(
  tidyTM:remove_Stopwords($nodes, "Stp", $Stoppers))

--

declare function tidyTM:wordCount_arr(
  $Words as xs:string*)
  as array(*)* {
for $w in $Words
  let $f := $w
  group by $f
  order by count($w) descending
return ([$f, count($w)])
} ;

---

["probleem", 703, 9.362885817944681e-1]
["opgelost.", 248, 1.9782167274401508e0]
...



Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-31 Thread Martin Honnen

On 31.03.2020 18:32, Ben Engbers wrote:

> Hi,
>
> For (my personal) clarity, I have split up the original function in two
> parts:
>
> declare function local:step_one($nodes as node()*) as array(*)*
> {
>    let $text := for $node in $nodes
>   return $node/text() =>
>   tokenize() => distinct-values()
>    let $idf := $text   =>
>   tidyTM:wordCount_arr()
>    return $idf
> };
>
> In local:step_one(), I first create a sequence with the distinct tokens
> for each $node. All the sequences are joined in $text.
> I then call wordCount_arr to count the occurrences of each word in $text:
>
> declare function tidyTM:wordCount_arr(
>    $Words as xs:string*)
>    as array(*) {
> for $w in $Words
>    let $f := $w
>    group by $f
>    order by count($w) descending
> return ([$f, count($w)])
> } ;
>
> I would say that tidyTM:wordCount_arr returns a sequence of arrays but I
> am not certain if I have specified the correct return-type?


Reading the code I agree that the return type seems to be a sequence of
arrays but therefore I wonder why you don't get a similar error as later
on with declaring
  array(*)
and not
  array(*)*


> Calling local:step_one(tidyTM:remove_Stopwords($nodes, "Stp", $Stoppers))
> returns:
> ["probleem", 703]
> ["opgelost.", 248]
>
>
> I had hoped that calling the following local:wordFreq_idf would add the
> idf to each element, but instead I get an error:
>
> declare function local:wordFreq_idf($nodes as node()*)  as array(*)
> {
>    let $count := count($nodes)
>    let $idf := local:step_one($nodes)
>    let $result := for-each( $idf,
>  function($z) {array:append ($z, math:log($count div $z(2) ) ) } )
>    return $result
> };
> [XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): $idf
> := ([ "probleem", 703 ], [ "opgelost.", 248 ], ...).


The message tries to tell you that the declared return type
  array(*)
is a single array while the function returns a (non-empty) sequence of
arrays so using
  declare function local:wordFreq_idf($nodes as node()*)  as array(*)*
would remove that error.

To insert the third value into each array I think you want

  let $result := $idf ! array:append(., math:log($count div .(2) ))
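
A minimal, self-contained sketch of both points (the sequence return type and
the simple map operator). It assumes a hard-coded document count and two
sample arrays in place of the real output of local:step_one(); the function
name local:add-idf is hypothetical and not part of the thread's code:

```xquery
xquery version "3.1";

(: Hypothetical helper: the return type array(*)* allows a sequence of arrays. :)
declare function local:add-idf(
  $idf   as array(*)*,
  $count as xs:integer
) as array(*)* {
  (: "!" evaluates the right-hand side once per array; "." is the current array :)
  $idf ! array:append(., math:log($count div .(2)))
};

(: Sample data, assumed for illustration only :)
local:add-idf((["probleem", 703], ["opgelost.", 248]), 1780)
```

With these inputs each array should come back with a third member, roughly
0.929 for "probleem", which matches the value Ben computed by hand.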


Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-31 Thread Ben Engbers
Hi,

For (my personal) clarity, I have split up the original function in two
parts:

declare function local:step_one($nodes as node()*) as array(*)*
{
  let $text := for $node in $nodes
 return $node/text() =>
 tokenize() => distinct-values()
  let $idf := $text   =>
 tidyTM:wordCount_arr()
  return $idf
};

In local:step_one(), I first create a sequence with the distinct tokens
for each $node. All the sequences are joined in $text.
I then call wordCount_arr to count the occurrences of each word in $text:

declare function tidyTM:wordCount_arr(
  $Words as xs:string*)
  as array(*) {
for $w in $Words
  let $f := $w
  group by $f
  order by count($w) descending
return ([$f, count($w)])
} ;

I would say that tidyTM:wordCount_arr returns a sequence of arrays but I
am not certain if I have specified the correct return-type?

Calling local:step_one(tidyTM:remove_Stopwords($nodes, "Stp", $Stoppers))
returns:
["probleem", 703]
["opgelost.", 248]


I had hoped that calling the following local:wordFreq_idf would add the
idf to each element, but instead I get an error:

declare function local:wordFreq_idf($nodes as node()*)  as array(*)
{
  let $count := count($nodes)
  let $idf := local:step_one($nodes)
  let $result := for-each( $idf,
function($z) {array:append ($z, math:log($count div $z(2) ) ) } )
  return $result
};
[XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): $idf
:= ([ "probleem", 703 ], [ "opgelost.", 248 ], ...).


Cheers, Ben

On 31-03-2020 at 16:29, Martin Honnen wrote:
> So does the working function return a sequence of arrays? That doesn't
> match the
>   as array(*)
> return type declaration, it seems.
> 
> What does tidyTM:wordCount_arr() return, a single array (of atomic items)?




Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-31 Thread Ben Engbers
Hi,

> => means "take the thing on the left and substitute it for the first
> parameter of the function on the right", so
I thought it meant "The first parameter on the right will be substituted
with the thing on the left"?

> ('weasels') => replace('weasels','mustelids')  works
> 
> ('weasels','badgers') => replace('weasels','mustelids')  DOES NOT work
> 
> This is because a one-item sequence can be treated as the single string
> value the first parameter of replace() requires, but a
> greater-than-one-item sequence can't be.  (This one gives you "item
> expected, sequence found" if you try it from the GUI.)

The following is quite similar to the 'piping' mechanism in R.
I'll start experimenting with it.

Thanx,
Ben
> ! means "take each item of the sequence on the left and pass it to the
> thing on the right in turn", so
> 
> ('weasels','badgers') ! replace(.,'weasels','mustelids')  works.
> 
> (note that replace() got its first parameter back as the context item
> dot.)
> 
> so if you take
> 
> => array:for-each(function($idf) {array:append($idf,math:log($count div 
> $idf[2]) )})
> 
> and replace it with 
> ! array:for-each(.,function($idf) {array:append($idf,math:log($count div 
> $idf[2]) )})
> 
> (note the context-item dot!)
> 
> you should at least get a different error message.
> 
> -- Graydon
> 



Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-31 Thread Graydon
On Tue, Mar 31, 2020 at 04:21:52PM +0200, Ben Engbers scripsit:
> On 31-03-2020 at 01:18, Graydon wrote:
> > On Mon, Mar 30, 2020 at 11:16:23PM +0200, Ben Engbers scripsit:
> > [snip]
> >> For "probleem", the idf should be calculated as ln($count/703). Since
> >> there are 1780 nodes this would result in 0.929011751.
> >> I tried to extend the 'let $idf' line with:
> >>   => array:for-each(function($idf) {array:append($idf,
> >> math:log($count div $idf[2]) )})
> >> which should result in ["probleem", 703, 0.929011751]
> >>
> >> but no matter what I do, every time I get this error:
> >> [XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): ([
> >> "probleem", 703 ], [ "opgelost.", 248 ], ...).
> > 
> > The error says you're trying to feed a sequence of arrays to an array
> > function; maybe you want ! where you have => ?
> 
> Upon your remark about feeding a sequence of arrays, I first tried to
> apply 'for-each' instead of 'array:for-each'. Alas, that didn't help
> ;-(, the error was still the same.

array:for-each takes a single array and gives you back a new array based
on what the anonymous function passed as the second parameter does to
each member of the original array.

So you have to make sure you're feeding a single array to it.  (and
you're not; that's what the error message is telling you, you've got a
sequence of arrays on the left of the => operator.)
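
As a small illustration of that signature (the sample values are made up),
array:for-each maps a function over the members of one array and returns one
array:

```xquery
xquery version "3.1";

(: One array in, one array out; the function sees each member in turn. :)
array:for-each([1, 2, 3], function($m) { $m * 2 })
(: yields [2, 4, 6] :)
```

Passing a sequence such as ([1, 2], [3, 4]) as the first argument raises a
type error instead, which is the situation described in this thread.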

> I then tried to understand what you mean with the '!'.
> In the book from Priscilla Walmsley, the ! is mentioned as a simple map
> operator. How is that related to this problem?

=> means "take the thing on the left and substitute it for the first
parameter of the function on the right", so

('weasels') => replace('weasels','mustelids')  works

('weasels','badgers') => replace('weasels','mustelids')  DOES NOT work

This is because a one-item sequence can be treated as the single string
value the first parameter of replace() requires, but a
greater-than-one-item sequence can't be.  (This one gives you "item
expected, sequence found" if you try it from the GUI.)

! means "take each item of the sequence on the left and pass it to the
thing on the right in turn", so

('weasels','badgers') ! replace(.,'weasels','mustelids')  works.

(note that replace() got its first parameter back as the context item
dot.)
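
The two working calls above can be put side by side in one query; this is
only a sketch that reuses the sample strings from this message:

```xquery
xquery version "3.1";

(
  (: "=>" passes the whole left-hand value as the first argument :)
  'weasels' => replace('weasels', 'mustelids'),
  (: "!" evaluates the right-hand side once per item, with "." as that item :)
  ('weasels', 'badgers') ! replace(., 'weasels', 'mustelids')
)
(: yields "mustelids", "mustelids", "badgers" :)
```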

so if you take

=> array:for-each(function($idf) {array:append($idf,math:log($count div 
$idf[2]) )})

and replace it with 
! array:for-each(.,function($idf) {array:append($idf,math:log($count div 
$idf[2]) )})

(note the context-item dot!)

you should at least get a different error message.
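
One probable reason the error changes rather than disappears is the way the
second member is addressed: on a single array, a predicate like $idf[2]
filters a one-item sequence, whereas $idf(2) or $idf?2 looks up a member.
A short sketch with an assumed sample array:

```xquery
xquery version "3.1";

let $a := ["probleem", 703]
return (
  $a(2),         (: function-call lookup, gives 703 :)
  $a?2,          (: lookup operator, gives 703 :)
  count($a[2])   (: predicate on a one-item sequence, gives 0 :)
)
```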

-- Graydon


Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-31 Thread Martin Honnen

On 30.03.2020 at 23:16, Ben Engbers wrote:

> Hi,
>
> In textmining, the 'idf' or inverse document frequency is defined as
> idf(term)=ln(ndocuments / ndocuments containing term). I am working on a
> function that should return this idf.
>
> This function:
>
> declare function local:wordFreq_idf($nodes as node()*) as array(*) {
>    let $count := count($nodes)
>    let $text := for $node in $nodes
>   return $node/text() => tokenize() => distinct-values()
>   let $idf := $text   => tidyTM:wordCount_arr()
>    return $idf
> };
>
> returns:
>
> ["probleem", 703]
> ["opgelost.", 248]
> ["dictu", 235]
> ["opgelost", 217]
> ["medewerker", 193]
> ...


So does the working function return a sequence of arrays? That doesn't
match the
  as array(*)
return type declaration, it seems.

What does tidyTM:wordCount_arr() return, a single array (of atomic items)?
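
For reference, a sketch of the idf definition quoted above as a stand-alone
function. The name local:idf and its parameters are hypothetical, and the
sample numbers are the ones mentioned elsewhere in this thread:

```xquery
xquery version "3.1";

(: idf(term) = ln(number of documents / number of documents containing term) :)
declare function local:idf(
  $n-docs      as xs:integer,
  $n-with-term as xs:integer
) as xs:double {
  math:log($n-docs div $n-with-term)
};

(: 1780 documents, 703 of them containing "probleem"; roughly 0.929 :)
local:idf(1780, 703)
```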






Re: [basex-talk] How to apply array:for-each on an array of arrays?

2020-03-31 Thread Ben Engbers
On 31-03-2020 at 01:18, Graydon wrote:
> On Mon, Mar 30, 2020 at 11:16:23PM +0200, Ben Engbers scripsit:
> [snip]
>> For "probleem", the idf should be calculated as ln($count/703). Since
>> there are 1780 nodes this would result in 0.929011751.
>> I tried to extend the 'let $idf' line with:
>>   => array:for-each(function($idf) {array:append($idf,
>> math:log($count div $idf[2]) )})
>> which should result in ["probleem", 703, 0.929011751]
>>
>> but no matter what I do, every time I get this error:
>> [XPTY0004] Cannot promote (array(xs:anyAtomicType))+ to array(*): ([
>> "probleem", 703 ], [ "opgelost.", 248 ], ...).
> 
> The error says you're trying to feed a sequence of arrays to an array
> function; maybe you want ! where you have => ?
> 
> -- Graydon
> 

Hi,
Upon your remark about feeding a sequence of arrays, I first tried to
apply 'for-each' instead of 'array:for-each'. Alas, that didn't help
;-(, the error was still the same.
I then tried to understand what you mean with the '!'.
In the book from Priscilla Walmsley, the ! is mentioned as a simple map
operator. How is that related to this problem?

Cheers,
Ben


Re: [basex-talk] Use of function-lookup in REST-XQ environment

2020-03-31 Thread Mauro Mugnaini
Dear Christian,
Thank you so much, it also worked very well in version 9.3.1, which is the one
we are using at the moment.
We plan to move to the next stable release when it becomes available.

Thanks also for your words; all the best to all: to Italy and also to people 
all over the world.

Mauro


> On 31 Mar 2020, at 14:14, Christian Grün wrote:
> 
> Dear Mauro,
> 
> The combination of higher-order functions and MIXUPDATES led to a wrong
> assignment of static properties of your updating expression. I have
> made the compilation step more rigid: If a dynamic function call is
> found, and if mixing updates is enabled, the function will now always
> be tagged as an updating function. Feel free to check out the latest
> stable snapshot [1].
> 
> If you want to stick with 9.3.2, you can prefix the function lookup
> call with an additional "updating" keyword:
> 
>  updating fn:function-lookup(fn:QName('urn:ns:impl2', 'store'), 1)($resource)
> 
> Cari saluti all’ Italia; our hearts are with you,
> Christian
> 
> [1] http://files.basex.org/releases/latest/
> 
> 
> 
>> Hi all,
>> In order to have a "dynamic backend” implementation to REST-XQ invocations 
>> we used the fn:function-lookup() function but we are experiencing a strange 
>> behavior if the target looked-up function is an updating function that for 
>> example adds a resource to a database.
>> 
>> Putting the provided files in the webapp folder of a fresh new BaseX latest 
>> (9.3.2) downloaded instance (adding "MIXUPDATES = true” setting in the 
>> .basex and with a database named DB already created inside) and invoking the 
>> REST function that calls directly the backend function with [1] the resource 
>> is correctly created on the database, by invoking the REST function that 
>> performs the dynamic lookup by using the fn:function-lookup() with [2] it 
>> completes without errors but the resource is not added to the DB.
>> In our implementation the implementation modules are stored in the repo but 
>> the same behavior can be reproduced with “direct” function call as per my 
>> example.
>> 
>> Is there any limitation in the looked-up function related to updating 
>> databases?
>> 
>> Regards,
>> 
>> Mauro
>> 
>> [1]
>> http:send-request(
>>  
>>
>>  ,
>>  "http://localhost:8984/store",
>>  
>> )
>> [2]
>> http:send-request(
>>  
>>
>>  ,
>>  "http://localhost:8984/dyn-store",
>>  
>> )
>> 



Re: [basex-talk] Use of function-lookup in REST-XQ environment

2020-03-31 Thread Christian Grün
Dear Mauro,

The combination of higher-order functions and MIXUPDATES led to a wrong
assignment of static properties of your updating expression. I have
made the compilation step more rigid: If a dynamic function call is
found, and if mixing updates is enabled, the function will now always
be tagged as an updating function. Feel free to check out the latest
stable snapshot [1].

If you want to stick with 9.3.2, you can prefix the function lookup
call with an additional "updating" keyword:

  updating fn:function-lookup(fn:QName('urn:ns:impl2', 'store'), 1)($resource)

Cari saluti all’ Italia; our hearts are with you,
Christian

[1] http://files.basex.org/releases/latest/



> Hi all,
> In order to have a "dynamic backend” implementation to REST-XQ invocations we 
> used the fn:function-lookup() function but we are experiencing a strange 
> behavior if the target looked-up function is an updating function that for 
> example adds a resource to a database.
>
> Putting the provided files in the webapp folder of a fresh new BaseX latest 
> (9.3.2) downloaded instance (adding "MIXUPDATES = true” setting in the .basex 
> and with a database named DB already created inside) and invoking the REST 
> function that calls directly the backend function with [1] the resource is 
> correctly created on the database, by invoking the REST function that 
> performs the dynamic lookup by using the fn:function-lookup() with [2] it 
> completes without errors but the resource is not added to the DB.
> In our implementation the implementation modules are stored in the repo but 
> the same behavior can be reproduced with “direct” function call as per my 
> example.
>
> Is there any limitation in the looked-up function related to updating 
> databases?
>
> Regards,
>
> Mauro
>
> [1]
> http:send-request(
>   
> 
>   ,
>   "http://localhost:8984/store",
>   
> )
> [2]
> http:send-request(
>   
> 
>   ,
>   "http://localhost:8984/dyn-store",
>   
> )
>


[basex-talk] Use of function-lookup in REST-XQ environment

2020-03-31 Thread Mauro Mugnaini
Hi all,

In order to have a "dynamic backend" implementation to REST-XQ invocations we
used the fn:function-lookup() function but we are experiencing a strange
behavior if the target looked-up function is an updating function that for
example adds a resource to a database.

Putting the provided files in the webapp folder of a fresh new BaseX latest
(9.3.2) downloaded instance (adding "MIXUPDATES = true" setting in the .basex
and with a database named DB already created inside) and invoking the REST
function that calls directly the backend function with [1] the resource is
correctly created on the database, by invoking the REST function that
performs the dynamic lookup by using the fn:function-lookup() with [2] it
completes without errors but the resource is not added to the DB.

In our implementation the implementation modules are stored in the repo but
the same behavior can be reproduced with "direct" function call as per my
example.

Is there any limitation in the looked-up function related to updating
databases?

Regards,

Mauro

[1]
http:send-request(
  ,
  "http://localhost:8984/store",
)
[2]
http:send-request(
  ,
  "http://localhost:8984/dyn-store",
)

impl2.xqm
Description: Binary data


restxq.xqm
Description: Binary data