Re: [basex-talk] Too aggressive optimizations?

2023-02-07 Thread Marco Lettere

Dear Christian,

I have to admit that everything you say makes sense, and I was even digging
in my swapped-out long-term memory for something like the annotation you
mention.


The background information you share makes me feel more comfortable and 
I'll take your advice.


Thanks!

Marco.


Re: [basex-talk] Too aggressive optimizations?

2023-02-07 Thread Christian Grün
Hi Marco,

let $ops := (
  for $i in (1 to 5)
  let $url := "http://www.google.com"
  return function() {
    (file:write(file:create-temp-file("ttt", string($i)), fetch:content-type($url)))
  }
)
let $download := xquery:fork-join($ops)
return count($ops)

> I've noticed that the archive often arrives empty. After some investigation,
> I've found that query [1] is not predictable. It is often optimized to "count(0)".

That’s surprising, and I didn’t manage to reproduce it. count($ops)
should consistently yield 5, as the number of (non-executed) function
items attached to $ops is 5, regardless of what is supposed to happen
before or after the variable declaration. Maybe you wrote
"count($download)" or something similar? Is there any other way to get
it reproduced?

However, I can confirm that xquery:fork-join($ops) is not evaluated,
as its result, which is bound to $download, is never referenced.
Here’s a better way to write it:

let $ops := (... for $i in ...)
return (
  xquery:fork-join($ops),
  count($ops)
)

Another solution to enforce the evaluation of the function call is the
basex:non-deterministic pragma [1]…

let $ops := (... for $i in ...)
let $download := (# basex:non-deterministic #) { xquery:fork-join($ops) }
return count($ops)

…but in general, it’s better to get rid of unreferenced variables in
the code whenever possible.

Some background noise: Non-deterministic and side-effecting functions
carve out a niche existence in the official W3C standards, as they
contradict the nature of functional languages. It’s tricky for the
optimizer to treat them properly: Function items are deterministic,
but when they are evaluated, they may trigger side effects.
Deterministic code that seems irrelevant is removed from the original
query whenever possible, so the ways to circumvent this are either to
wrap the expression with the pragma (thus annotating it as
non-deterministic) or to move it to a result sequence.
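
A related variant, as a minimal sketch that is not taken from the original
query: if each function item returns something the rest of the query needs
(here, the path of the temporary file it wrote), the result of
xquery:fork-join is referenced, so the call is no longer dead code. The
variable names $path and $paths are assumptions made for illustration.

```xquery
(: Sketch: every function item writes its file and returns the path,
   so the fork-join result is required to compute the final value. :)
let $url := "http://www.google.com"
let $ops :=
  for $i in 1 to 5
  return function() {
    let $path := file:create-temp-file("ttt", string($i))
    return (
      (: file:write returns the empty sequence; only $path is returned :)
      file:write($path, fetch:content-type($url)),
      $path
    )
  }
let $paths := xquery:fork-join($ops)
return count($paths)
```

Here count($paths) counts one path per function item, and the paths can be
passed on to whatever step consumes the files.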

And a monologic side note: Maybe we should internally annotate
xquery:fork-join as non-deterministic. Even if it may contain purely
deterministic code, it’s almost always used for non-deterministic
operations in practice.

Hope this helps,
Christian

[1] https://docs.basex.org/wiki/XQuery_Extensions


[basex-talk] Too aggressive optimizations?

2023-02-07 Thread Marco Lettere

Dear all,

my scenario is a RESTXQ function that should:

- download resources and store them in a temporary directory,

- do this with fork-join in order to reduce latency,

- compress the files into a ZIP archive and return the archive data.

I've noticed that the archive often arrives empty. After some
investigation, I've found that query [1] is not predictable. It is often
optimized to "count(0)".

With [2], I manage to get results from time to time, but not
consistently.

[3] seems to be the safer solution.

The behavior is the same with 9.x and 10.

Since I do not feel very comfortable with this, can someone tell me
whether I'm doing it wrong, whether there is a reliable solution, or
whether I should abandon fork-join altogether?


Thanks a lot.

Regards,

Marco.

[1]

let $ops := (
  for $i in (1 to 5)
  let $url := "http://www.google.com"
  return function() {
    (file:write(file:create-temp-file("ttt", string($i)), fetch:content-type($url)))
  }
)
let $download := xquery:fork-join($ops)
return count($ops)

[2]

let $ops := (
  for $i in (1 to 5)
  let $url := "http://www.google.com"
  return function() {
    (file:write(file:create-temp-file("ttt", string($i)), fetch:content-type($url)), 1)
  }
)
let $download := xquery:fork-join($ops)
return count($ops)

[3]

let $ops := xquery:fork-join(
  for $i in (1 to 5)
  let $url := "http://www.google.com"
  return function() {
    (1, file:write(file:create-temp-file("ttt", string($i)), fetch:content-type($url)))
  }
)
return count($ops)
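
For the RESTXQ scenario described at the top of this message, the following
is a rough sketch of how the three steps might fit together, assuming the
function items return the paths of the files they wrote. The module
namespace, the /archive path, the function name page:archive and the URL
list are hypothetical and only serve as placeholders.

```xquery
(: Hypothetical module namespace, path and URL list; adapt as needed. :)
module namespace page = "http://example.org/page";

declare
  %rest:GET
  %rest:path("/archive")
function page:archive() {
  let $urls := ("http://www.google.com")
  let $ops :=
    for $url at $i in $urls
    return function() {
      (: Download one resource into a temporary file, return its path. :)
      let $path := file:create-temp-file("ttt", string($i))
      return (
        file:write-binary($path, fetch:binary($url)),
        $path
      )
    }
  (: The paths are used below, so the fork-join result is referenced. :)
  let $paths := xquery:fork-join($ops)
  return (
    web:response-header(map { "media-type": "application/zip" }),
    archive:create(
      $paths ! file:name(.),
      $paths ! file:read-binary(.)
    )
  )
};
```

The sketch does not delete the temporary files afterwards. Because the
archive is built from the paths returned by xquery:fork-join, the downloads
are part of the result and not an unreferenced variable.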