Re: [basex-talk] Creating backups manually
Perfect, thank you, Christian!

On Wed, Jan 3, 2024 at 11:25 AM Christian Grün wrote:

> Hi Matt,
>
> Your assumption is correct: Database backups are nothing else than zipped
> archives of the corresponding database subdirectory located in the 'data'
> directory.
>
> If you create backups on your own, you should ensure that no updates are
> currently running during your backup operation. If you use CREATE BACKUP or
> db:create-backup, BaseX will take care of that.
>
> Hope this helps,
> Christian
[basex-talk] Creating backups manually
Hi all,

I'm interested in creating backups manually so I can use a different compression algorithm. Based on the source code [1], it looks like backups are just created by adding each file (excluding upd.basex) in the database directory to a .zip file, so I could do the same using tar and my compression algorithm of choice. Is my understanding correct or am I missing some other logic?

Thanks,
Matt

[1] https://github.com/BaseXdb/basex/blob/main/basex-core/src/main/java/org/basex/core/cmd/CreateBackup.java#L90-L119
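For readers who want to try the manual approach discussed in this thread, here is a minimal sketch in Python, not BaseX's own implementation. It archives one database subdirectory with tar and xz and skips upd.basex, as described above. The function name, the example paths, the choice of xz and the recursive walk are assumptions for illustration. As the reply in this thread points out, nothing here prevents concurrent updates, so it should only be run while no updates are in progress.

```python
import tarfile
from pathlib import Path


def create_manual_backup(data_dir, db_name, out_path):
    """Archive data_dir/db_name as an xz-compressed tarball, skipping upd.basex.

    Assumption: the caller guarantees that no updates run on the database
    while the files are being read.
    """
    db_dir = Path(data_dir) / db_name
    if not db_dir.is_dir():
        raise FileNotFoundError(f"no database directory at {db_dir}")
    with tarfile.open(str(out_path), "w:xz") as archive:
        for path in sorted(db_dir.rglob("*")):
            # Only regular files are added; upd.basex is left out on purpose.
            if path.is_file() and path.name != "upd.basex":
                arcname = (Path(db_name) / path.relative_to(db_dir)).as_posix()
                archive.add(str(path), arcname=arcname)
    return out_path
```

A call such as create_manual_backup("basex/data", "MyDatabase", "MyDatabase.tar.xz") would write the archive next to the current directory. Any other compressor could be swapped in by changing the mode string or by piping an uncompressed tar through an external tool.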
Re: [basex-talk] Text index on uncommon node
Hi Christian,

Good thinking! Indeed, 'false' appears 7.25 million times in the database. I'll look into selective indexing. Thanks for the link, and thanks again for the help -- this gives me a much better idea of how the text index works in general.

Best,
Matt

On Mon, Oct 30, 2023 at 5:23 PM Christian Grün wrote:

> Hi Matt,
>
> I assume the culprit is the common string you're looking up. It probably
> occurs very often in your database. You can e.g. verify this via
> index:texts('MyDatabase', 'false') or
> count(db:get('MyDatabase')//text()[. = 'false']).
>
> If you don't need to perform exact queries on arbitrary elements, you
> could think about restricting the text index to specific element names to
> reduce the number of intermediate hits [1].
>
> Hope this helps,
> Christian
>
> [1] https://docs.basex.org/wiki/Indexes#Selective_Indexing
>
> Matthew Dziuban wrote on Mon, Oct 30, 2023, 22:09:
>
>> Hi Christian,
>>
>> Thanks for the quick response, and sure thing. It does look like the text
>> index is applied in both cases. While I was writing up an example of the
>> slow query, I realized that the way I'll actually be querying it is by
>> wrapping the condition in not(...). After doing so, it now only takes 6
>> seconds to run -- still slower but better. The slow query looks like this:
>>
>>     for $x in db:open('MyDatabase')/data/element
>>     where not($x/child1/child2/valid = 'false')
>>     return $x
>>
>> And the fast query looks like this:
>>
>>     for $x in db:open('MyDatabase')/data/element
>>     where $x/child3/child4/id = '123'
>>     return $x
>>
>> Let me know if this is helpful -- if not, I could share more info in a
>> direct email about the actual structure of the database and the Info output.
>>
>> Thanks,
>> Matt

-- 
Matthew R. Dziuban
mattdziuban.com
703-973-6717
mrdziu...@gmail.com
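To illustrate the explanation in this thread, here is a toy model in Python of a value-keyed text index. It is not how BaseX stores its index; the data, the paths and the function names are invented for illustration. It only shows why looking up a very common string such as 'false' produces many intermediate hits that must then be filtered by path, while a rare string such as '123' does not.

```python
from collections import defaultdict


def build_text_index(text_nodes):
    """Map each text value to the positions of all text nodes holding it."""
    index = defaultdict(list)
    for pos, (_path, value) in enumerate(text_nodes):
        index[value].append(pos)
    return index


def lookup(text_nodes, index, path, value):
    """Return (matches on `path`, number of index hits that were inspected)."""
    hits = index.get(value, [])
    matches = [pos for pos in hits if text_nodes[pos][0] == path]
    return matches, len(hits)


# Hypothetical data: 'false' is common across the document, '123' is rare.
nodes = [("data/element/flags/active", "false")] * 1000
nodes += [("data/element/child1/child2/valid", "false")] * 3
nodes += [("data/element/child3/child4/id", "123")]
index = build_text_index(nodes)

valid_matches, valid_hits = lookup(
    nodes, index, "data/element/child1/child2/valid", "false")
id_matches, id_hits = lookup(
    nodes, index, "data/element/child3/child4/id", "123")

print(len(valid_matches), valid_hits)  # 3 matches found among 1003 hits
print(len(id_matches), id_hits)        # 1 match found among 1 hit
```

In this model, restricting the index to selected element names corresponds to never adding the 1000 unrelated 'false' entries in the first place, which is the effect the selective indexing suggestion aims for.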
Re: [basex-talk] Text index on uncommon node
Hi Christian,

Thanks for the quick response, and sure thing. It does look like the text index is applied in both cases. While I was writing up an example of the slow query, I realized that the way I'll actually be querying it is by wrapping the condition in not(...). After doing so, it now only takes 6 seconds to run -- still slower but better. The slow query looks like this:

    for $x in db:open('MyDatabase')/data/element
    where not($x/child1/child2/valid = 'false')
    return $x

And the fast query looks like this:

    for $x in db:open('MyDatabase')/data/element
    where $x/child3/child4/id = '123'
    return $x

Let me know if this is helpful -- if not, I could share more info in a direct email about the actual structure of the database and the Info output.

Thanks,
Matt

On Mon, Oct 30, 2023 at 4:40 PM Christian Grün wrote:

> Hi Matt,
>
> In general, all nodes are treated identically, no matter what the
> hierarchy is or how regular the target path is.
>
> Could you share some more information with us? What do the queries look
> like (the slow and the fast one)? Is the text index applied in both cases?
>
> Thanks in advance,
> Christian
>
> Matthew Dziuban wrote on Mon, Oct 30, 2023, 21:33:
>
>> Hi all,
>>
>> I'm working with a database structured like so:
>>
>> ...
>> ...
>> ...
>>
>> There are a total of about 1.5 million <element> nodes in the database.
>> Each <element> has many child nodes, one of which is uncommon -- it only
>> appears in 727 <element>s.
>>
>> I'm writing a query that has a condition on this uncommon field, but the
>> query takes about 20 seconds to run, whereas another with a condition on a
>> child node that appears in every <element> only takes about 20 milliseconds
>> to run.
>>
>> Based on the Info in the GUI, it does appear that the text index is being
>> used -- I see 'apply text index for "..."'. Is it expected that the query
>> time would be this much longer? Is the text index somehow built differently
>> for nodes that don't appear often in the database?
>>
>> Thanks in advance,
>> Matt
[basex-talk] Text index on uncommon node
Hi all,

I'm working with a database structured like so:

...
...
...

There are a total of about 1.5 million <element> nodes in the database. Each <element> has many child nodes, one of which is uncommon -- it only appears in 727 <element>s.

I'm writing a query that has a condition on this uncommon field, but the query takes about 20 seconds to run, whereas another with a condition on a child node that appears in every <element> only takes about 20 milliseconds to run.

Based on the Info in the GUI, it does appear that the text index is being used -- I see 'apply text index for "..."'. Is it expected that the query time would be this much longer? Is the text index somehow built differently for nodes that don't appear often in the database?

Thanks in advance,
Matt
Re: [basex-talk] Support non-admin user access to database admin interface (DBA)
Hi Christian,

Thanks for the feedback! If I'm reading the code correctly, the permissions you mentioned should already be enforced:

- admin:logs() specifies Perm::ADMIN [1]
- db:list() calls ctx.listDBs(), which says it should return the databases for which the current user has read access [2]
- job:list-details() specifies Perm::ADMIN [3]

I can update my fork to disallow access to the Logs and Jobs panels, but is it an issue in the Java code that the relevant permissions aren't being enforced?

Thanks again,
Matt

[1] https://github.com/BaseXdb/basex/blob/10.7/basex-core/src/main/java/org/basex/query/func/Function.java#L871
[2] https://github.com/BaseXdb/basex/blob/10.7/basex-core/src/main/java/org/basex/core/Context.java#L283
[3] https://github.com/BaseXdb/basex/blob/10.7/basex-core/src/main/java/org/basex/query/func/Function.java#L1534

On Wed, Aug 30, 2023 at 7:41 AM Christian Grün wrote:

> Hi Matthew,
>
> Thanks for providing me access to your fork. I’ve done some quick tests,
> and I noticed the following:
>
> • The Database panel should only list those databases that a particular
>   user has access to.
> • It must not be allowed to run queries like admin:logs() unless you have
>   'admin' permissions. More generally, the permissions used for running
>   queries must not be more powerful than those of the current user.
> • The Jobs panel must be limited to Admin users; at least that’s how our
>   current permission model is designed (the current solution could possibly
>   be enhanced, such that users with fewer permissions could see their own
>   jobs).
>
> You can either try the BaseX client to find out what users with fewer
> permissions are allowed to do, or you can look into the code [1].
>
> Hope this helps; feel free to ask for more details,
> Christian
>
> [1] https://github.com/BaseXdb/basex/blob/main/basex-core/src/main/java/org/basex/query/func/Function.java
Re: [basex-talk] Support non-admin user access to database admin interface (DBA)
Sounds good, thanks Christian! Let me know if I can provide any more details that would be helpful.

Matt

On Tue, Aug 22, 2023 at 3:25 AM Christian Grün wrote:

> Hi Matt,
>
> Providing a non-admin version of the DBA is certainly a good idea. We
> mostly didn’t have time and resources to clarify what the implications
> will be for the particular views.
>
> I’ll be happy to have a closer look at your fork next week.
>
> Best,
> Christian
>
> Matthew Dziuban wrote on Mon, Aug 21, 2023, 19:34:
>
>> Hi all,
>>
>> While the subject might sound contradictory, I'm curious what you think
>> about opening up the DBA code to allow non-admin users to access it and
>> perform actions for which they have permissions?
>>
>> I currently maintain and run a fork of the DBA web app at work to make
>> this possible, but I'd love to have the behavior built into BaseX if
>> possible. You can view the changes I've made against BaseX 10.7 here:
>> https://github.com/mblink/basex-webapp/compare/upstream-webapp...webapp-10.7
>>
>> If you're open to this, I'd be happy to open a pull request with my
>> changes!
>>
>> Thanks,
>> Matt
[basex-talk] Support non-admin user access to database admin interface (DBA)
Hi all,

While the subject might sound contradictory, I'm curious what you think about opening up the DBA code to allow non-admin users to access it and perform actions for which they have permissions?

I currently maintain and run a fork of the DBA web app at work to make this possible, but I'd love to have the behavior built into BaseX if possible. You can view the changes I've made against BaseX 10.7 here:
https://github.com/mblink/basex-webapp/compare/upstream-webapp...webapp-10.7

If you're open to this, I'd be happy to open a pull request with my changes!

Thanks,
Matt
Re: [basex-talk] Text index requires `/text()` in query
Good to know -- thanks for the help, Christian!
Re: [basex-talk] Text index requires `/text()` in query
As I was trying to come up with a simple example to reproduce it, I rediscovered that the top-level element specifies an XML namespace. Apologies, I failed to mention that initially. Would that affect whether the index is used or not?

I'm able to reproduce by loading this data into a new database named ElementsTest:

    http://www.w3.org/2001/XMLSchema-instance;> 1

And then running this query:

    for $x in db:open('ElementsTest')/data/element
    where $x/id = '1'
    return $x/id

The GUI shows the following as the optimized query:

    db:open-pre("ElementsTest", 0)/data/element[(id = "1")]/id
Re: [basex-talk] Text index requires `/text()` in query
Hi Christian,

Thanks for the quick response! That query returns the following:

Out of curiosity, is there a way to see index utilization through the DBA web app or via the ClientSession Java class [1] instead of the GUI? I'm using the client/server architecture, so I mainly run queries these ways.

Best,
Matt

On Fri, Apr 29, 2022 at 1:52 PM Christian Grün wrote:

> Hi Matthew,
>
> If you run your query on the following document …
>
> 123
> 456
>
> … and if you look into the Info View in the GUI, you will notice that
> the index will be utilized:
>
> Optimized Query:
> db:text("data", "DatabaseName")/parent::id/parent::element
>
> The query optimizer detects that all “data/element/id” elements are
> leaf elements (i.e., have a single text child node), and the resulting
> query will be rewritten for index.
>
> Maybe there are “id” elements in your document that are no leaf
> elements? Could you share the result of the following query with us?
>
> index:facets('data')/*/element[@name='data']/element[@name='element']/element[@name='id']
>
> Best,
> Christian
[basex-talk] Text index requires `/text()` in query
Hi all,

I was recently debugging the performance of a query with an exact string comparison and discovered that the query seems to be rewritten to use the text index [1] only if I explicitly add `/text()` to the path I'm comparing. My data looks like this:

    123

And my original query was:

    for $el in db:open('DatabaseName')/data/element
    where $el/id = '123'
    return $el

With 3 million nodes in the database, this query took about 4 seconds, which made me question whether the text index was being used. I then changed the query to add `/text()` to the `where` clause, like so:

    for $el in db:open('DatabaseName')/data/element
    where $el/id/text() = '123'
    return $el

With this change, the query only takes 0.4 seconds. Is it expected that `/text()` is required to get the text index to kick in?

Thanks in advance,
Matt

[1] https://docs.basex.org/wiki/Indexes#Text_Index
[basex-talk] Outdated milton-api version
Hi all,

While auditing the library dependencies of an application, I found that org.basex:basex-api [1] depends on a very old version of com.ettrema:milton-api -- v1.8.1.4, released in 2014 [2]. The package seems to have been migrated to the io.milton namespace, with v3.1.0.301 released recently (December 2021) [3].

This stood out to me because my application is using the latest version of Apache commons-io, v2.11.0, while the old version of milton-api depends on Apache commons-io v1.4, released in 2008.

Would it be feasible to migrate BaseX to a newer version of the milton-api library?

Thanks in advance!
Matt

[1] https://mvnrepository.com/artifact/org.basex/basex-api/9.6.4
[2] https://mvnrepository.com/artifact/com.ettrema/milton-api/1.8.1.4
[3] https://mvnrepository.com/artifact/io.milton/milton-api/3.1.0.301
[basex-talk] org.basex.core.BaseXException with no error message
Hi all,

I'm using the Java ClientSession class (package org.basex.api.client, which the link below shows lives in the basex-core module) and I'm seeing intermittent org.basex.core.BaseXException errors with an empty error message when executing long-running read queries or CREATE DB commands. The stack trace of the most recent error points to ClientSession.receive (https://github.com/BaseXdb/basex/blob/9.5.2/basex-core/src/main/java/org/basex/api/client/ClientSession.java#L190):

    org.basex.core.BaseXException:
      org.basex.api.client.ClientSession.receive(ClientSession.java:190)
      org.basex.api.client.ClientSession.send(ClientSession.java:178)
      org.basex.api.client.ClientSession.send(ClientSession.java:215)
      org.basex.api.client.ClientSession.create(ClientSession.java:128)
      ...

Unfortunately, since the message is empty, it's difficult to debug what's going wrong. I've checked the log files in basex/data/.logs but haven't found any information about the failure. Is there another place I can look? Or do you have any other recommendations for debugging the errors?

Thanks in advance,
Matt
Re: [basex-talk] Merge/combine databases in memory constrained environment
Thanks again, Christian.

Regardless of whether I have the UPDINDEX and AUTOOPTIMIZE options enabled, I'm seeing that my first set of updates runs pretty quickly (in about 90 seconds) but any subsequent set of updates hangs indefinitely -- I let it run for over 2 hours and it never completed. Do you have any idea what could be going on?

On Mon, Jul 26, 2021 at 9:41 PM Christian Grün wrote:

> > Thank you Christian and Graydon! I've got a solution working by calling
> > the script multiple times to do only a subset of the updates each time. On
> > a different (though related) note, do I understand correctly that I should
> > run a "db:optimize($db)" after each time I perform a significant amount of
> > updates?
>
> It’s definitely advisable if you perform queries that take advantage
> of the BaseX index structures – which is already the case for
> root/element[@id = $se/@id]. If the UPDINDEX option is enabled, the
> index structures will always be kept up-to-date (but again, a database
> optimization might be recommendable to minimize the index structures).
Re: [basex-talk] Merge/combine databases in memory constrained environment
Thank you Christian and Graydon! I've got a solution working by calling the script multiple times to do only a subset of the updates each time.

On a different (though related) note, do I understand correctly that I should run a "db:optimize($db)" each time after I perform a significant number of updates?

Matt

On Mon, Jul 26, 2021 at 1:32 PM Graydon wrote:

> On Mon, Jul 26, 2021 at 01:08:19PM -0400, Matthew Dziuban wrote:
> > Is there any way to accomplish this? Thanks in advance!
>
> It might be easier to generate the merged content, create a db from
> that, then replace the old target DB with that whole.
>
> -- Graydon
[basex-talk] Merge/combine databases in memory constrained environment
Hi all,

I have two databases in BaseX, `source_db` and `target_db`, and would like to merge them by matching on the id attribute of each element and upserting the element with a `replace` or an `insert`, depending on whether the element was found in `target_db`. `source_db` has about 100,000 elements, and `target_db` has about 1,000,000 elements.

The databases look like this:

And my query to merge the two looks like this:

    for $e in (db:open("source_db")/root/element)
    return (
      if (exists(db:open("target_db")/root/element[@id = data($e/@id)]))
      then replace node db:open("target_db")/root/element[@id = data($e/@id)] with $e
      else insert node $e into db:open("target_db")/root
    )

When running the query, however, I keep getting memory constraint errors. Using a POST request to BaseX's REST interface I get "Out of Main Memory", and using the BaseX Java client (https://github.com/BaseXdb/basex/blob/master/basex-examples/src/main/java/org/basex/examples/api/BaseXClient.java) I get "java.io.IOException: GC overhead limit exceeded".

Ideally I would like to just process one element from source_db at a time to avoid memory issues, but it seems like my query isn't doing this. I've tried using the `db:copynode false` pragma but it did not make a difference.

Is there any way to accomplish this? Thanks in advance!

Matt
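One way to apply the updates in smaller slices, in the spirit of the "subset of the updates each time" workaround reported in this thread, is to generate one updating query per window of source elements. The Python sketch below only builds the query strings; how they are sent (REST, a client session, a script) is left open. The function name and the batch size are made up for illustration, and the query assumes that each id occurs at most once in target_db, since `replace node` expects a single target node. This is a sketch under those assumptions, not a confirmed fix for the memory errors.

```python
def batch_queries(total, batch_size):
    """Yield one updating XQuery string per slice of source_db elements.

    Positions are 1-based because XQuery's subsequence() counts from 1.
    """
    template = (
        'for $e in subsequence(db:open("source_db")/root/element, {start}, {size})\n'
        'let $t := db:open("target_db")/root/element[@id = $e/@id]\n'
        'return if (exists($t))\n'
        '  then replace node $t with $e\n'
        '  else insert node $e into db:open("target_db")/root'
    )
    for start in range(1, total + 1, batch_size):
        yield template.format(start=start, size=batch_size)


# Example: about 100,000 source elements, processed 25,000 at a time.
queries = list(batch_queries(100_000, 25_000))
print(len(queries))  # 4
```

Because the updates only touch target_db, the element positions in source_db stay stable between runs, so each slice covers a distinct set of source elements.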