Re: [basex-talk] Global lock = false and parallel update processes to different DBs

2019-02-06 Thread France Baril
Using RESTXQ. I was hoping to speed things up with parallel processing :-).

We are using some new indices to speed things up, and more can be done. The
main issue is that we process a lot of files and there are multiple levels
of processing:

1- Apply 1st level
2- Save to db
3- Apply 2nd level
4- Save to db
5- Apply 3rd level

We work by level so that content can be searched after it has been processed
in a level, which means the indices need to be refreshed. For each level, I
apply everything I can before I need to re-index.

The levels look something like this (with some variations):
1- Add ids to all elements (content coming from authors through WebDAV
doesn't always have all the required ids)
2- Aggregate content for a publication... That means resolving references
recursively until all the pieces that create a larger publication are
aggregated
3- Filter out content that doesn't apply to the current configuration (done
after aggregation because we may use the same aggregate for multiple filter
combinations - for example, we may have a publication for 2 similar products
where the same content is used but a few lines here and there are
different... Getting the same publication out for 2 different OS versions
would be a good example. Same content, tiny differences here and there.)
4- Apply transformations to the filtered aggregate (to one or more formats:
HTML, PDF, CSV, RSS, or whatever is needed)
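A single level pass over one language database might be sketched roughly like this (the function name local:apply-level and the per-language db layout are assumptions for illustration):

```xquery
(: One level pass: read each document, apply this level's transformation,
   and write the result back to the same database.
   'local:apply-level' is a hypothetical placeholder for the real logic. :)
declare %updating function local:run-level($db as xs:string) {
  for $doc in db:open($db)
  return db:replace($db, db:path($doc), local:apply-level($doc))
};
```

Between levels, a separate query calling db:optimize on the database would refresh the indexes before the next level searches them.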

If I am outputting the publication in HTML and PDF for 26 of the 52
languages, I was hoping to be able to apply filters and aggregates on the 26
db pairs (base + staging) at once. Maybe I need 26 instances of BaseX, where
each instance has a lang... Then my .js could call each instance
individually. That's a lot of ports... and also, again, not easy for
clients to just add a language. If it means parallel processing, it may be
worth it.

Then I'd need to figure out how to handle processes that use more than one
instance of BaseX... like the translation processes. A lot of files would
need to go through outside of BaseX via the .js. I might need a node.js
layer. I can't imagine the .js client doing all the work... So far the
client was pretty light, so control was split between .js and .xqm. I
thought moving the lang loop outside of the .xqm would mean parallel
processing, just because each call to the .xqm function would be separate,
each with its own $lang. As you know, that didn't do it. Oops.

Optimizing performance is key for us at this point... so any clue is
welcome.

The 2 most time-intensive processes are creating the aggregates and
transforming files to XLIFF for translation. What these processes have in
common is that they hold the dbs; if I can stop holding the dbs while these
run, I'm good.

I'm even considering processing all the small outputs to the file system
and then import the result back once the process is over. Most operations
would become read-only as far as BaseX is concerned... not my favorite
approach, but it might do the trick...
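The file-system detour described above could look something like this (the output path and local:transform are placeholders, and the target directory is assumed to exist):

```xquery
(: Read-only pass: serialize results to disk instead of updating a db,
   so BaseX only needs read access while the heavy work runs.
   '/tmp/staging/es-us/' and 'local:transform' are assumptions. :)
let $out := '/tmp/staging/es-us/'
for $doc in db:open('es-us')
return file:write($out || db:path($doc), local:transform($doc))
```

Once the batch has finished, one short updating query could import the directory back (e.g. db:replace per file, or db:create with the directory as input), so the staging database is only locked for the import itself.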





On Wed, Feb 6, 2019 at 9:24 AM Christian Grün 
wrote:

> Hi France,
>
> I agree that duplicating the same code more than once is not a good
> idea. I surely know too little about your use case, as I guessed you
> were sending custom query strings to BaseX via one of our APIs. Are
> you using REST or RESTXQ?
>
> It seems that your current update operation is pretty costly. Do you
> think there are chances to speed it up?
>
> Best,
> Christian
>
>
>
>
>
> On Wed, Feb 6, 2019 at 9:12 AM France Baril
>  wrote:
> >
> > Argh, we have 52 languages and all our system is based on being able to
> work with any language and let clients add/remove languages without having
> to call developers. I can't imagine the domino effect of having to build a
> shell function per language per process that accesses the DB.
> >
> > Plus as we are running batch processes, I think we'll just run out of
> memory.
> >
> > I'm thinking one function like this per language is what you propose:
> >
> > declare %rest:path("/base/filter-es-us") %updating function local:filter-es-us() {
> >   let $src-db := db:open('es-us')
> >   let $results := local:apply-non-updating-processes($src-db)
> >   (: $results is a map of filename → xml :)
> >   return
> >     for $filename in map:keys($results)
> >     return db:replace('staging-es-us', $filename, $results($filename))
> > };
> >
> > declare function local:apply-non-updating-processes($src-db) {
> >   map:merge(
> >     for $file in $src-db/*
> >     let $res := $file (: do x :)
> >     return map:entry(base-uri($file), $res)
> >   )
> > };
> >
> >
> > Since we run batch processes, I'm also thinking we'll run out of memory
> with processes like that... or maybe we need to also split into small
> functions so each tiny update is in its own function... then maintaining
> functions for 52 languages becomes even harder... or I add an extra layer
> of abstraction and build the .xqm functions dynamically based on a central
> code base and the dynamic language names... hmmm
> >
> > I'm thinking out loud here, trying to find my way outside of dynamic
> names... but static naming of databases doesn't sound like a good idea in
> our case. Dynamic naming is at the core of our approach... or maybe I'm so
> locked into it that I can't see the easy way in?

Re: [basex-talk] Global lock = false and parallel update processes to different DBs

2019-02-06 Thread Christian Grün
Hi France,

I agree that duplicating the same code more than once is not a good
idea. I surely know too little about your use case, as I guessed you
were sending custom query strings to BaseX via one of our APIs. Are
you using REST or RESTXQ?

It seems that your current update operation is pretty costly. Do you
think there are chances to speed it up?

Best,
Christian





On Wed, Feb 6, 2019 at 9:12 AM France Baril
 wrote:
>
> Argh, we have 52 languages and all our system is based on being able to work
> with any language and let clients add/remove languages without having to call
> developers. I can't imagine the domino effect of having to build a shell
> function per language per process that accesses the DB.
>
> Plus as we are running batch processes, I think we'll just run out of memory.
>
> I'm thinking one function like this per language is what you propose:
>
> declare %rest:path("/base/filter-es-us") %updating function local:filter-es-us() {
>   let $src-db := db:open('es-us')
>   let $results := local:apply-non-updating-processes($src-db)
>   (: $results is a map of filename → xml :)
>   return
>     for $filename in map:keys($results)
>     return db:replace('staging-es-us', $filename, $results($filename))
> };
>
> declare function local:apply-non-updating-processes($src-db) {
>   map:merge(
>     for $file in $src-db/*
>     let $res := $file (: do x :)
>     return map:entry(base-uri($file), $res)
>   )
> };
>
>
> Since we run batch processes, I'm also thinking we'll run out of memory with
> processes like that... or maybe we need to also split into small functions so
> each tiny update is in its own function... then maintaining functions for 52
> languages becomes even harder... or I add an extra layer of abstraction and
> build the .xqm functions dynamically based on a central code base and the
> dynamic language names... hmmm
>
> I'm thinking out loud here, trying to find my way outside of dynamic names...
> but static naming of databases doesn't sound like a good idea in our case.
> Dynamic naming is at the core of our approach... or maybe I'm so locked into
> it that I can't see the easy way in?
>
>
>
>
>
> On Mon, Feb 4, 2019 at 11:46 AM Christian Grün  
> wrote:
>>
>> Hi France,
>>
>> > I noticed that the latest version of BaseX lost this feature and nothing 
>> > seems to replace it. I'm trying to improve performance of batch processes 
> > and I was counting on that feature a lot. Any chance it will come back or
>> > that something equivalent will come?
>>
>> With BaseX 9, we removed the classical GLOBALLOCK option (i.e.,
>> GLOBALLOCK = false is standard now).
>>
>> > get db:open($lang)/*
>> > process
>> > save to db:open('staging-' || $lang)
>>
>> The name of your database may be specified as static string in your
>> query (no matter if you use BaseX 8 or 9):
>>
>>   get db:open('de')/*
>>   process
>>   save to db:open('staging-de')
>>
>> Did you try this already?
>> Christian
>
>
>
> --
> France Baril
> Architecte documentaire / Documentation architect
> france.ba...@architextus.com


Re: [basex-talk] Global lock = false and parallel update processes to different DBs

2019-02-06 Thread Marco Lettere
Hi France,
I recall that I was once successful in generating XQuery strings by patching
the database name into them and then processing them with xquery:eval.
Might this be an approach for you?
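A minimal sketch of that string-patching idea (the language code is just an example):

```xquery
(: build a query with the database name embedded as a literal string,
   then evaluate it; 'es-us' is illustrative :)
let $lang  := 'es-us'
let $query := "db:open('" || $lang || "')/*"
return xquery:eval($query)
```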
M.

On Wed, Feb 6, 2019 at 9:12 AM France Baril 
wrote:

> Argh, we have 52 languages and all our system is based on being able to
> work with any language and let clients add/remove languages without having
> to call developers. I can't imagine the domino effect of having to build a
> shell function per language per process that accesses the DB.
>
> Plus as we are running batch processes, I think we'll just run out of
> memory.
>
> I'm thinking one function like this per language is what you propose:
>
> declare %rest:path("/base/filter-es-us") %updating function local:filter-es-us() {
>   let $src-db := db:open('es-us')
>   let $results := local:apply-non-updating-processes($src-db)
>   (: $results is a map of filename → xml :)
>   return
>     for $filename in map:keys($results)
>     return db:replace('staging-es-us', $filename, $results($filename))
> };
>
> declare function local:apply-non-updating-processes($src-db) {
>   map:merge(
>     for $file in $src-db/*
>     let $res := $file (: do x :)
>     return map:entry(base-uri($file), $res)
>   )
> };
>
>
> Since we run batch processes, I'm also thinking we'll run out of memory
> with processes like that... or maybe we need to also split into small
> functions so each tiny update is in its own function... then maintaining
> functions for 52 languages becomes even harder... or I add an extra layer
> of abstraction and build the .xqm functions dynamically based on a central
> code base and the dynamic language names... hmmm
>
> I'm thinking out loud here, trying to find my way outside of dynamic
> names... but static naming of databases doesn't sound like a good idea in
> our case. Dynamic naming is at the core of our approach... or maybe I'm so
> locked into it that I can't see the easy way in?
>
>
>
>
>
> On Mon, Feb 4, 2019 at 11:46 AM Christian Grün 
> wrote:
>
>> Hi France,
>>
>> > I noticed that the latest version of BaseX lost this feature and
>> nothing seems to replace it. I'm trying to improve performance of batch
>> processes and I was counting on that feature a lot. Any chance it will come
>> back or that something equivalent will come?
>>
>> With BaseX 9, we removed the classical GLOBALLOCK option (i.e.,
>> GLOBALLOCK = false is standard now).
>>
>> > get db:open($lang)/*
>> > process
>> > save to db:open('staging-' || $lang)
>>
>> The name of your database may be specified as static string in your
>> query (no matter if you use BaseX 8 or 9):
>>
>>   get db:open('de')/*
>>   process
>>   save to db:open('staging-de')
>>
>> Did you try this already?
>> Christian
>>
>
>
> --
> France Baril
> Architecte documentaire / Documentation architect
> france.ba...@architextus.com
>


Re: [basex-talk] Global lock = false and parallel update processes to different DBs

2019-02-06 Thread France Baril
Argh, we have 52 languages and all our system is based on being able to
work with any language and let clients add/remove languages without having
to call developers. I can't imagine the domino effect of having to build a
shell function per language per process that accesses the DB.

Plus as we are running batch processes, I think we'll just run out of
memory.

I'm thinking one function like this per language is what you propose:

declare %rest:path("/base/filter-es-us") %updating function local:filter-es-us() {
  let $src-db := db:open('es-us')
  let $results := local:apply-non-updating-processes($src-db)
  (: $results is a map of filename → xml :)
  return
    for $filename in map:keys($results)
    return db:replace('staging-es-us', $filename, $results($filename))
};

declare function local:apply-non-updating-processes($src-db) {
  map:merge(
    for $file in $src-db/*
    let $res := $file (: do x :)
    return map:entry(base-uri($file), $res)
  )
};


Since we run batch processes, I'm also thinking we'll run out of memory with
processes like that... or maybe we need to also split into small functions so
each tiny update is in its own function... then maintaining functions for 52
languages becomes even harder... or I add an extra layer of abstraction and
build the .xqm functions dynamically based on a central code base and the
dynamic language names... hmmm

I'm thinking out loud here, trying to find my way outside of dynamic
names... but static naming of databases doesn't sound like a good idea in
our case. Dynamic naming is at the core of our approach... or maybe I'm so
locked into it that I can't see the easy way in?





On Mon, Feb 4, 2019 at 11:46 AM Christian Grün 
wrote:

> Hi France,
>
> > I noticed that the latest version of BaseX lost this feature and nothing
> seems to replace it. I'm trying to improve performance of batch processes
> and I was counting on that feature a lot. Any chance it will come back or
> that something equivalent will come?
>
> With BaseX 9, we removed the classical GLOBALLOCK option (i.e.,
> GLOBALLOCK = false is standard now).
>
> > get db:open($lang)/*
> > process
> > save to db:open('staging-' || $lang)
>
> The name of your database may be specified as static string in your
> query (no matter if you use BaseX 8 or 9):
>
>   get db:open('de')/*
>   process
>   save to db:open('staging-de')
>
> Did you try this already?
> Christian
>


-- 
France Baril
Architecte documentaire / Documentation architect
france.ba...@architextus.com


Re: [basex-talk] Global lock = false and parallel update processes to different DBs

2019-02-04 Thread Christian Grün
Hi France,

> I noticed that the latest version of BaseX lost this feature and nothing 
> seems to replace it. I'm trying to improve performance of batch processes and 
> I was counting on that feature a lot. Any chance it will come back or that 
> something equivalent will come?

With BaseX 9, we removed the classical GLOBALLOCK option (i.e.,
GLOBALLOCK = false is standard now).

> get db:open($lang)/*
> process
> save to db:open('staging-' || $lang)

The name of your database may be specified as a static string in your
query (no matter if you use BaseX 8 or 9):

  get db:open('de')/*
  process
  save to db:open('staging-de')

Did you try this already?
Christian
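Spelled out as a complete updating query, that suggestion might look like this (the processing step is a placeholder):

```xquery
(: static database names ('de', 'staging-de'), so BaseX can determine the
   affected databases at parse time and take local instead of global locks :)
for $doc in db:open('de')
let $processed := $doc  (: 'process' would go here :)
return db:replace('staging-de', db:path($doc), $processed)
```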