Re: [basex-talk] Global lock = false and parallel update processes to different DBs
Using RESTXQ. I was hoping to speed things up with parallel processing :-). We are using some new indices to speed things up, and more can be done. The main issue is that we process a lot of files and there are multiple levels of processing:

1- Apply 1st level
2- Save to db
3- Apply 2nd level
4- Save to db
5- Apply 3rd level

We work by level to be able to search content after it has been processed in a level, so we need the indices to be refreshed. For each level I apply everything I can before I need to re-index. The levels look something like this (with some variations):

1- Add ids to all elements (content coming from authors through WebDAV doesn't always have all the required ids)
2- Aggregate content for a publication... That means resolving references recursively until all the pieces that make up a larger publication are aggregated
3- Filter out content that doesn't apply to the current configuration (done after aggregation because we may use the same aggregate for multiple filter combinations - for example we may have a publication for 2 similar products where the same content is used but a few lines here and there are different... Getting the same publication out for 2 different OS versions would be a good example. Same content, tiny differences here and there.)
4- Apply a transformation to the filtered aggregate (to one or more formats: HTML, PDF, CSV, RSS, or whatever is needed)

If I am outputting the publication in HTML and PDF for 26 of the 52 languages, I was hoping to be able to apply filters and aggregates on the 26 db pairs (base + staging) at once. Maybe I need 26 instances of BaseX where each instance has a lang... Then my .js could call each instance individually. That's a lot of ports... and also, again... not easy for clients to just add a language. If it means parallel processing, it may be worth it. Then I'd need to figure out how to handle processes that use more than one instance of BaseX... like the translation processes.
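[Editor's note: for the read-only part of the per-language work, BaseX's xquery:fork-join function might be worth a look. A minimal sketch, assuming the processing can be factored into a non-updating function; the language list and local:apply-non-updating-processes are hypothetical names, not from the thread:]

```xquery
(: xquery:fork-join evaluates the supplied zero-argument functions in
   parallel. It is restricted to non-updating code, so it could cover the
   read-only aggregation/filter steps; the writes back to the staging dbs
   would still have to happen in a separate, serial step. :)
let $langs := ('en-us', 'fr-fr', 'es-us')  (: hypothetical language list :)
return xquery:fork-join(
  for $lang in $langs
  return function() {
    local:apply-non-updating-processes(db:open($lang))
  }
)
```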
A lot of files would need to go through outside of BaseX, via the .js. I might need a node.js layer. I can't imagine the .js client doing all the work... So far the client was pretty light, so the control was split between .js and .xqm. I thought moving the lang loop outside of the .xqm would mean parallel processing, just because each call to the .xqm function would be separate, each with its own $lang. As you know, that didn't do it. Oops.

Optimizing performance is key for us at this point... so any clue is welcome. The 2 most time-intensive processes: creating the aggregates and transforming files to XLIFF for translation. What these processes have in common... if I can stop holding the dbs when these run, I'm good. I'm even considering processing all the small outputs to the file system and then importing the results back once the process is over. Most operations would become read-only as far as BaseX is concerned... not my favorite approach, but it might do the trick...

On Wed, Feb 6, 2019 at 9:24 AM Christian Grün wrote:
>
> Hi France,
>
> I agree that duplicating the same code more than once is not a good
> idea. I surely know too little about your use case, as I guessed you
> were sending custom query strings to BaseX via one of our APIs. Are
> you using REST or RESTXQ?
>
> It seems that your current update operation is pretty costly. Do you
> think there are chances to speed it up?
>
> Best,
> Christian
>
> On Wed, Feb 6, 2019 at 9:12 AM France Baril wrote:
> >
> > Irsh, we have 52 languages and our whole system is based on being able to
> > work with any language and let clients add/remove languages without having
> > to call developers. I can't imagine the domino effect of having to build a
> > shell function per language per process that accesses the DB.
> >
> > Plus, as we are running batch processes, I think we'll just run out of
> > memory.
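[Editor's note: the "write to the file system, import back later" idea could look roughly like this. A sketch only; the output path, language name, and local:transform are placeholders:]

```xquery
(: Read-only pass: the heavy transformation reads from the db and writes
   results to disk, so no database is write-locked while it runs. :)
for $doc in db:open('en-us')/*
return file:write('/tmp/staging/en-us/' || db:path($doc),
                  local:transform($doc))

(: Later, a separate, short updating query imports the results back:
   for $path in file:list('/tmp/staging/en-us/')
   return db:replace('staging-en-us', $path,
                     doc('/tmp/staging/en-us/' || $path))        :)
```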
> > I'm thinking one function like this per language is what you propose:
> >
> > rest-path /base/filter-es-us
> > declare function filter-es-us() {
> >   let $src-db := db:open('es-us')
> >   (: $results is a map of (filename, xml) :)
> >   let $results := apply-non-updating-processes($src-db)
> >   return
> >     for $filename in map:keys($results)
> >     return db:replace('staging-es-us', $filename, $results($filename))
> > };
> >
> > declare function apply-non-updating-processes($src-db) {
> >   map:merge(
> >     for $file in $src-db/*
> >     let $res := local:do-x($file)  (: placeholder for the per-file processing :)
> >     return map:entry(base-uri($file), $res)
> >   )
> > };
> >
> > Since we run batch processes I'm also thinking we'll run out of memory
> > with processes like that... or maybe we need to also split small functions
> > so each tiny update is in its own function... then maintaining functions for
> > 52 languages becomes even harder... or I add an extra layer of abstraction
> > and build the .xqm functions dynamically from a central code base and the
> > dynamic language names... hmmm
> >
> > I'm thinking out loud here trying to find my way outside of dynamic
> > names... but static naming of databases doesn't sound like a good
Re: [basex-talk] Global lock = false and parallel update processes to different DBs
Hi France,

I agree that duplicating the same code more than once is not a good idea. I
surely know too little about your use case, as I guessed you were sending
custom query strings to BaseX via one of our APIs. Are you using REST or
RESTXQ?

It seems that your current update operation is pretty costly. Do you think
there are chances to speed it up?

Best,
Christian

On Wed, Feb 6, 2019 at 9:12 AM France Baril wrote:
>
> Irsh, we have 52 languages and our whole system is based on being able to
> work with any language and let clients add/remove languages without having
> to call developers. I can't imagine the domino effect of having to build a
> shell function per language per process that accesses the DB.
>
> Plus, as we are running batch processes, I think we'll just run out of memory.
>
> I'm thinking one function like this per language is what you propose:
>
> rest-path /base/filter-es-us
> declare function filter-es-us() {
>   let $src-db := db:open('es-us')
>   (: $results is a map of (filename, xml) :)
>   let $results := apply-non-updating-processes($src-db)
>   return
>     for $filename in map:keys($results)
>     return db:replace('staging-es-us', $filename, $results($filename))
> };
>
> declare function apply-non-updating-processes($src-db) {
>   map:merge(
>     for $file in $src-db/*
>     let $res := local:do-x($file)  (: placeholder for the per-file processing :)
>     return map:entry(base-uri($file), $res)
>   )
> };
>
> Since we run batch processes I'm also thinking we'll run out of memory with
> processes like that... or maybe we need to also split small functions so each
> tiny update is in its own function... then maintaining functions for 52
> languages becomes even harder... or I add an extra layer of abstraction and
> build the .xqm functions dynamically from a central code base and the
> dynamic language names... hmmm
>
> I'm thinking out loud here trying to find my way outside of dynamic names...
> but static naming of databases doesn't sound like a good idea in our case.
> Dynamic naming is at the core of our approach... or maybe I'm so laced in it
> that I can't see the easy way in?
>
> On Mon, Feb 4, 2019 at 11:46 AM Christian Grün wrote:
>>
>> Hi France,
>>
>> > I noticed that the latest version of BaseX lost this feature and nothing
>> > seems to replace it. I'm trying to improve performance of batch processes
>> > and I was counting on that feature a lot. Any chance it will come back or
>> > that something equivalent will come?
>>
>> With BaseX 9, we removed the classical GLOBALLOCK option (i.e.,
>> GLOBALLOCK = false is standard now).
>>
>> > get db:open($lang)/*
>> > process
>> > save to db:open('staging-' || $lang)
>>
>> The name of your database may be specified as a static string in your
>> query (no matter if you use BaseX 8 or 9):
>>
>> get db:open('de')/*
>> process
>> save to db:open('staging-de')
>>
>> Did you try this already?
>> Christian
>
> --
> France Baril
> Architecte documentaire / Documentation architect
> france.ba...@architextus.com
Re: [basex-talk] Global lock = false and parallel update processes to different DBs
Hi France,

I recall that I was once successful in generating XQuery strings by patching
the database name into them and then processing them with xquery:eval. Might
this be an approach for you?

M.

On Wed, Feb 6, 2019 at 09:12, France Baril wrote:
>
> Irsh, we have 52 languages and our whole system is based on being able to
> work with any language and let clients add/remove languages without having
> to call developers. I can't imagine the domino effect of having to build a
> shell function per language per process that accesses the DB.
>
> Plus, as we are running batch processes, I think we'll just run out of
> memory.
>
> I'm thinking one function like this per language is what you propose:
>
> rest-path /base/filter-es-us
> declare function filter-es-us() {
>   let $src-db := db:open('es-us')
>   (: $results is a map of (filename, xml) :)
>   let $results := apply-non-updating-processes($src-db)
>   return
>     for $filename in map:keys($results)
>     return db:replace('staging-es-us', $filename, $results($filename))
> };
>
> declare function apply-non-updating-processes($src-db) {
>   map:merge(
>     for $file in $src-db/*
>     let $res := local:do-x($file)  (: placeholder for the per-file processing :)
>     return map:entry(base-uri($file), $res)
>   )
> };
>
> Since we run batch processes I'm also thinking we'll run out of memory
> with processes like that... or maybe we need to also split small functions
> so each tiny update is in its own function... then maintaining functions for
> 52 languages becomes even harder... or I add an extra layer of abstraction
> and build the .xqm functions dynamically from a central code base and the
> dynamic language names... hmmm
>
> I'm thinking out loud here trying to find my way outside of dynamic
> names... but static naming of databases doesn't sound like a good idea in
> our case. Dynamic naming is at the core of our approach... or maybe I'm so
> laced in it that I can't see the easy way in?
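[Editor's note: a minimal sketch of the string-patching approach described above. The language, db name, and query body are placeholders:]

```xquery
(: The database name is spliced into the query string, so each generated
   query contains a static name; the string is then run with xquery:eval. :)
let $lang := 'es-us'
let $query := 'db:open("' || $lang || '")/*'
return xquery:eval($query)
```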
>
> On Mon, Feb 4, 2019 at 11:46 AM Christian Grün wrote:
>>
>> Hi France,
>>
>> > I noticed that the latest version of BaseX lost this feature and
>> > nothing seems to replace it. I'm trying to improve performance of batch
>> > processes and I was counting on that feature a lot. Any chance it will
>> > come back or that something equivalent will come?
>>
>> With BaseX 9, we removed the classical GLOBALLOCK option (i.e.,
>> GLOBALLOCK = false is standard now).
>>
>> > get db:open($lang)/*
>> > process
>> > save to db:open('staging-' || $lang)
>>
>> The name of your database may be specified as a static string in your
>> query (no matter if you use BaseX 8 or 9):
>>
>> get db:open('de')/*
>> process
>> save to db:open('staging-de')
>>
>> Did you try this already?
>> Christian
>
> --
> France Baril
> Architecte documentaire / Documentation architect
> france.ba...@architextus.com
Re: [basex-talk] Global lock = false and parallel update processes to different DBs
Irsh, we have 52 languages and our whole system is based on being able to work
with any language and let clients add/remove languages without having to call
developers. I can't imagine the domino effect of having to build a shell
function per language per process that accesses the DB.

Plus, as we are running batch processes, I think we'll just run out of memory.

I'm thinking one function like this per language is what you propose:

rest-path /base/filter-es-us
declare function filter-es-us() {
  let $src-db := db:open('es-us')
  (: $results is a map of (filename, xml) :)
  let $results := apply-non-updating-processes($src-db)
  return
    for $filename in map:keys($results)
    return db:replace('staging-es-us', $filename, $results($filename))
};

declare function apply-non-updating-processes($src-db) {
  map:merge(
    for $file in $src-db/*
    let $res := local:do-x($file)  (: placeholder for the per-file processing :)
    return map:entry(base-uri($file), $res)
  )
};

Since we run batch processes I'm also thinking we'll run out of memory with
processes like that... or maybe we need to also split small functions so each
tiny update is in its own function... then maintaining functions for 52
languages becomes even harder... or I add an extra layer of abstraction and
build the .xqm functions dynamically from a central code base and the dynamic
language names... hmmm

I'm thinking out loud here trying to find my way outside of dynamic names...
but static naming of databases doesn't sound like a good idea in our case.
Dynamic naming is at the core of our approach... or maybe I'm so laced in it
that I can't see the easy way in?

On Mon, Feb 4, 2019 at 11:46 AM Christian Grün wrote:
>
> Hi France,
>
> > I noticed that the latest version of BaseX lost this feature and nothing
> > seems to replace it. I'm trying to improve performance of batch processes
> > and I was counting on that feature a lot. Any chance it will come back or
> > that something equivalent will come?
>
> With BaseX 9, we removed the classical GLOBALLOCK option (i.e.,
> GLOBALLOCK = false is standard now).
> > get db:open($lang)/*
> > process
> > save to db:open('staging-' || $lang)
>
> The name of your database may be specified as a static string in your
> query (no matter if you use BaseX 8 or 9):
>
> get db:open('de')/*
> process
> save to db:open('staging-de')
>
> Did you try this already?
> Christian

--
France Baril
Architecte documentaire / Documentation architect
france.ba...@architextus.com
Re: [basex-talk] Global lock = false and parallel update processes to different DBs
Hi France,

> I noticed that the latest version of BaseX lost this feature and nothing
> seems to replace it. I'm trying to improve performance of batch processes and
> I was counting on that feature a lot. Any chance it will come back or that
> something equivalent will come?

With BaseX 9, we removed the classical GLOBALLOCK option (i.e.,
GLOBALLOCK = false is standard now).

> get db:open($lang)/*
> process
> save to db:open('staging-' || $lang)

The name of your database may be specified as a static string in your
query (no matter if you use BaseX 8 or 9):

get db:open('de')/*
process
save to db:open('staging-de')

Did you try this already?
Christian
[basex-talk] Global lock = false and parallel update processes to different DBs
Hi! I noticed that the latest version of BaseX lost this feature and nothing
seems to replace it. I'm trying to improve performance of batch processes and
I was counting on that feature a lot. Any chance it will come back, or that
something equivalent will come?

Case:

.js:
  for $lang in $langs
    call BaseX updating function

.xqm updating function:
  get db:open($lang)/*
  process
  save to db:open('staging-' || $lang)

This is a case where there is no chance of conflict, because each thread saves
to a different db. Any suggestion on how to apply parallel processing for this
case is welcome.

--
France Baril
Architecte documentaire / Documentation architect
france.ba...@architextus.com