Re: [basex-talk] Creating backups manually

2024-01-04 Thread Matthew Dziuban
Perfect, thank you, Christian!

On Wed, Jan 3, 2024 at 11:25 AM Christian Grün wrote:

> Hi Matt,
>
> Your assumption is correct: Database backups are nothing else than zipped
> archives of the corresponding database subdirectory located in the 'data'
> directory.
>
> If you create backups on your own, you should ensure that no updates are
> running during your backup operation. If you use CREATE BACKUP or
> db:create-backup, BaseX will take care of that.
>
> Hope this helps,
> Christian
>
>
>
>> I'm interested in creating backups manually so I can use a different
>> compression algorithm. Based on the source code[1], it looks like backups
>> are just created by adding each file (excluding upd.basex) in the
>> database directory to a .zip file, so I could do the same using tar and
>> my compression algorithm of choice. Is my understanding correct or am I
>> missing some other logic?
>>
>> Thanks,
>> Matt
>>
>> [1]
>> https://github.com/BaseXdb/basex/blob/main/basex-core/src/main/java/org/basex/core/cmd/CreateBackup.java#L90-L119
>>
>
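
The approach confirmed in this thread (archive every file of the database directory except upd.basex, using a compressor of your choice) can be sketched in Python as follows. This is only an illustration, not BaseX's own backup code: the function name, the archive layout and the choice of tar with xz compression are assumptions. The sketch does not check whether updates are running; as noted in the thread, that remains the caller's responsibility.

```python
import tarfile
from pathlib import Path


def create_manual_backup(db_dir: Path, archive: Path) -> list[str]:
    """Pack every file below db_dir, except upd.basex, into a tar.xz archive.

    Hypothetical helper: the caller must ensure that no updates are
    running on the database while the files are being read.
    """
    added = []
    with tarfile.open(archive, "w:xz") as tar:
        for path in sorted(db_dir.rglob("*")):
            # upd.basex is skipped, as described for CreateBackup in the thread
            if path.is_file() and path.name != "upd.basex":
                # store entries as "<database name>/<relative file path>"
                arcname = (Path(db_dir.name) / path.relative_to(db_dir)).as_posix()
                tar.add(path, arcname=arcname)
                added.append(arcname)
    return added
```

A call such as create_manual_backup(Path("data/MyDatabase"), Path("MyDatabase.tar.xz")) would write the archive next to the working directory. Both paths are placeholders, and the archive should be written outside the database directory so that it does not end up inside itself.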


[basex-talk] Creating backups manually

2024-01-02 Thread Matthew Dziuban
Hi all,

I'm interested in creating backups manually so I can use a different
compression algorithm. Based on the source code[1], it looks like backups
are just created by adding each file (excluding upd.basex) in the database
directory to a .zip file, so I could do the same using tar and my
compression algorithm of choice. Is my understanding correct or am I
missing some other logic?

Thanks,
Matt

[1]
https://github.com/BaseXdb/basex/blob/main/basex-core/src/main/java/org/basex/core/cmd/CreateBackup.java#L90-L119


Re: [basex-talk] Text index on uncommon node

2023-10-30 Thread Matthew Dziuban
Hi Christian,

Good thinking! Indeed, 'false' appears 7.25 million times in the database.
I'll look into selective indexing, thanks for the link, and thanks again
for the help -- this gives me a much better idea of how the text index
works in general.

Best,
Matt

On Mon, Oct 30, 2023 at 5:23 PM Christian Grün wrote:

> Hi Matt,
>
> I assume the culprit is the common string you're looking up. It probably
> occurs very often in your database. You can e.g. verify this via
> index:texts('MyDatabase', 'false') or count(db:get('MyDatabase')//text()[.
> = 'false']).
>
> If you don't need to perform exact queries on arbitrary elements, you
> could think about restricting the text index to specific element names to
> reduce the number of intermediate hits [1].
>
> Hope this helps,
> Christian
>
> [1] https://docs.basex.org/wiki/Indexes#Selective_Indexing
>
>
> On Mon, Oct 30, 2023 at 22:09, Matthew Dziuban wrote:
>
>> Hi Christian,
>>
>> Thanks for the quick response, and sure thing. It does look like the text
>> index is applied in both cases. While I was writing up an example of the
>> slow query, I realized that the way I'll actually be querying it is by
>> wrapping the condition in not(...). After doing so, it now only takes 6
>> seconds to run -- still slower but better. The slow query looks like this:
>>
>> for $x in db:open('MyDatabase')/data/element
>> where not($x/child1/child2/valid = 'false')
>> return $x
>>
>> And the fast query looks like this:
>>
>> for $x in db:open('MyDatabase')/data/element
>> where $x/child3/child4/id = '123'
>> return $x
>>
>> Let me know if this is helpful -- if not, I could share more info in a
>> direct email about the actual structure of the database and the Info output.
>>
>> Thanks,
>> Matt
>>
>> On Mon, Oct 30, 2023 at 4:40 PM Christian Grün wrote:
>>
>>> Hi Matt,
>>>
>>> In general, all nodes are treated identically, no matter what the
>>> hierarchy is or how regular the target path is.
>>>
>>> Could you share some more information with us? What do the queries look
>>> like (the slow and the fast one)? Is the text index applied in both cases?
>>>
>>> Thanks in advance,
>>> Christian
>>>
>>>
>>>
>>> On Mon, Oct 30, 2023 at 21:33, Matthew Dziuban wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm working with a database structured like so:
>>>>
>>>> <data>
>>>>   <element>...</element>
>>>>   <element>...</element>
>>>>   <element>...</element>
>>>> </data>
>>>>
>>>> There are a total of about 1.5 million <element> nodes in the database.
>>>> Each <element> has many child nodes, one of which is uncommon -- it only
>>>> appears in 727 <element>s.
>>>>
>>>> I'm writing a query that has a condition on this uncommon field, but
>>>> the query takes about 20 seconds to run, whereas another with a condition
>>>> on a child node that appears in every <element> only takes about 20
>>>> milliseconds to run.
>>>>
>>>> Based on the Info in the GUI, it does appear that the text index is
>>>> being used -- I see 'apply text index for "..."'. Is it expected that the
>>>> query time would be this much longer? Is the text index somehow built
>>>> differently for nodes that don't appear often in the database?
>>>>
>>>> Thanks in advance,
>>>> Matt
>>>>
>>>
>>
>> --
>> Matthew R. Dziuban
>> mattdziuban.com
>> 703-973-6717
>> mrdziu...@gmail.com
>>
>

-- 
Matthew R. Dziuban
mattdziuban.com
703-973-6717
mrdziu...@gmail.com
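
Christian's diagnosis above (a very common string yields many intermediate index hits, and restricting the index to specific element names reduces them) can be illustrated with a toy model. This is a deliberately simplified sketch, not how BaseX implements its text index: the class ToyTextIndex, the element names "active", "deleted" and "archived", and the document sizes are all made up for the illustration.

```python
from collections import defaultdict


class ToyTextIndex:
    """Toy model: maps a text value to all text nodes carrying that value."""

    def __init__(self, include=None):
        # include: optional set of element names to index (selective indexing)
        self.include = include
        self.hits = defaultdict(list)

    def add(self, node_id, element_name, text):
        if self.include is None or element_name in self.include:
            self.hits[text].append((node_id, element_name))

    def lookup(self, text, element_name):
        """Return (matching node ids, number of index hits inspected)."""
        candidates = self.hits.get(text, [])
        # every hit for the string must be checked against the requested path
        matches = [nid for nid, name in candidates if name == element_name]
        return matches, len(candidates)


def build(index, elements=1000):
    """Fill the index with hypothetical data: 'false' is common, 'valid' is rare."""
    node_id = 0
    for i in range(elements):
        fields = [("id", str(i)), ("active", "false"),
                  ("deleted", "false"), ("archived", "false")]
        if i % 100 == 0:
            fields.append(("valid", "false"))
        for name, text in fields:
            index.add(node_id, name, text)
            node_id += 1
    return index
```

In this model, build(ToyTextIndex()).lookup("false", "valid") has to inspect 3010 hits to find 10 matches, whereas an index built with include={"valid"} inspects only those 10. A lookup of a unique string such as an id inspects a single hit, which mirrors the fast query in the thread.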


Re: [basex-talk] Text index on uncommon node

2023-10-30 Thread Matthew Dziuban
Hi Christian,

Thanks for the quick response, and sure thing. It does look like the text
index is applied in both cases. While I was writing up an example of the
slow query, I realized that the way I'll actually be querying it is by
wrapping the condition in not(...). After doing so, it now only takes 6
seconds to run -- still slower but better. The slow query looks like this:

for $x in db:open('MyDatabase')/data/element
where not($x/child1/child2/valid = 'false')
return $x

And the fast query looks like this:

for $x in db:open('MyDatabase')/data/element
where $x/child3/child4/id = '123'
return $x

Let me know if this is helpful -- if not, I could share more info in a
direct email about the actual structure of the database and the Info output.

Thanks,
Matt

On Mon, Oct 30, 2023 at 4:40 PM Christian Grün wrote:

> Hi Matt,
>
> In general, all nodes are treated identically, no matter what the
> hierarchy is or how regular the target path is.
>
> Could you share some more information with us? What do the queries look
> like (the slow and the fast one)? Is the text index applied in both cases?
>
> Thanks in advance,
> Christian
>
>
>
> On Mon, Oct 30, 2023 at 21:33, Matthew Dziuban wrote:
>
>> Hi all,
>>
>> I'm working with a database structured like so:
>>
>> <data>
>>   <element>...</element>
>>   <element>...</element>
>>   <element>...</element>
>> </data>
>>
>> There are a total of about 1.5 million <element> nodes in the database.
>> Each <element> has many child nodes, one of which is uncommon -- it only
>> appears in 727 <element>s.
>>
>> I'm writing a query that has a condition on this uncommon field, but the
>> query takes about 20 seconds to run, whereas another with a condition on a
>> child node that appears in every <element> only takes about 20 milliseconds
>> to run.
>>
>> Based on the Info in the GUI, it does appear that the text index is being
>> used -- I see 'apply text index for "..."'. Is it expected that the query
>> time would be this much longer? Is the text index somehow built differently
>> for nodes that don't appear often in the database?
>>
>> Thanks in advance,
>> Matt
>>
>

-- 
Matthew R. Dziuban
mattdziuban.com
703-973-6717
mrdziu...@gmail.com


[basex-talk] Text index on uncommon node

2023-10-30 Thread Matthew Dziuban
Hi all,

I'm working with a database structured like so:

<data>
  <element>...</element>
  <element>...</element>
  <element>...</element>
</data>

There are a total of about 1.5 million <element> nodes in the database.
Each <element> has many child nodes, one of which is uncommon -- it only
appears in 727 <element>s.

I'm writing a query that has a condition on this uncommon field, but the
query takes about 20 seconds to run, whereas another with a condition on a
child node that appears in every <element> only takes about 20 milliseconds
to run.

Based on the Info in the GUI, it does appear that the text index is being
used -- I see 'apply text index for "..."'. Is it expected that the query
time would be this much longer? Is the text index somehow built differently
for nodes that don't appear often in the database?

Thanks in advance,
Matt


Re: [basex-talk] Support non-admin user access to database admin interface (DBA)

2023-08-30 Thread Matthew Dziuban
Hi Christian,

Thanks for the feedback! If I'm reading the code correctly, my
understanding was that the permissions you mentioned should already be
enforced:

   - admin:logs() specifies Perm::ADMIN [1]
   - db:list() calls ctx.listDBs() which says it should return the
   databases for which the current user has read access [2]
   - job:list-details() specifies Perm::ADMIN [3]

I can update my fork to disallow access to the Logs and Jobs panels, but is
it an issue in the Java code that the relevant permissions aren't being
enforced?

Thanks again,
Matt

[1]
https://github.com/BaseXdb/basex/blob/10.7/basex-core/src/main/java/org/basex/query/func/Function.java#L871
[2]
https://github.com/BaseXdb/basex/blob/10.7/basex-core/src/main/java/org/basex/core/Context.java#L283
[3]
https://github.com/BaseXdb/basex/blob/10.7/basex-core/src/main/java/org/basex/query/func/Function.java#L1534

On Wed, Aug 30, 2023 at 7:41 AM Christian Grün wrote:

> Hi Matthew,
>
> Thanks for providing me access to your fork. I’ve done some quick tests,
> and I noticed the following:
>
> • The Database panel should only list those databases that a particular
> user has access to.
> • It must not be allowed to run queries like admin:logs() unless you have
> 'admin' permissions. More generally, the permissions used for running
> queries must not be more powerful than those of the current user.
> • The Jobs panel must be limited to Admin users; at least that’s how our
> current permission model is designed (the current solution could possibly
> be enhanced, such that users with fewer permissions could see their own
> jobs).
>
> You can either try the BaseX client to find out what users with fewer
> permissions are allowed to do, or you can look into the code [1].
>
> Hope this helps; feel free to ask for more details,
> Christian
>
> [1]
> https://github.com/BaseXdb/basex/blob/main/basex-core/src/main/java/org/basex/query/func/Function.java
>
>
>
> On Mon, Aug 21, 2023 at 7:34 PM Matthew Dziuban wrote:
>
>> Hi all,
>>
>> While the subject might sound contradictory, I'm curious what you think
>> about opening up the DBA code to allow non-admin users to access it and
>> perform actions for which they have permissions?
>>
>> I currently maintain and run a fork of the DBA web app at work to make
>> this possible, but I'd love to have the behavior built into BaseX if
>> possible. You can view the changes I've made against BaseX 10.7 here:
>> https://github.com/mblink/basex-webapp/compare/upstream-webapp...webapp-10.7
>>
>> If you're open to this, I'd be happy to open a pull request with my
>> changes!
>>
>> Thanks,
>> Matt
>>
>>

-- 
Matthew R. Dziuban
mattdziuban.com
703-973-6717
mrdziu...@gmail.com
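
The permission rules discussed in this message can be summarized in a small model. It is a simplified sketch rather than BaseX code: the ordering of the levels (none, read, write, create, admin) follows the BaseX user model, the required levels for admin:logs and job:list-details are the ones cited above, and the helper names as well as the way database-specific permissions override the global one are assumptions made for the illustration.

```python
from enum import IntEnum


class Perm(IntEnum):
    """Permission levels, ordered from weakest to strongest."""
    NONE = 0
    READ = 1
    WRITE = 2
    CREATE = 3
    ADMIN = 4


# Required permissions for the functions cited in the message
REQUIRED = {
    "admin:logs": Perm.ADMIN,
    "job:list-details": Perm.ADMIN,
}


def may_call(user_perm, function_name):
    """A user may call a function only with at least the required permission."""
    return user_perm >= REQUIRED[function_name]


def visible_databases(databases, global_perm, local_perms):
    """List the databases for which the user has at least read access.

    local_perms maps a database name to a database-specific permission;
    in this sketch it simply replaces the global permission.
    """
    visible = []
    for name in databases:
        perm = local_perms.get(name, global_perm)
        if perm >= Perm.READ:
            visible.append(name)
    return visible
```

Under this model, a user with the create permission is still refused admin:logs, and a user without global permissions sees only the databases that were explicitly opened up to them.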


Re: [basex-talk] Support non-admin user access to database admin interface (DBA)

2023-08-23 Thread Matthew Dziuban
Sounds good, thanks Christian! Let me know if I can provide any more
details that would be helpful.

Matt

On Tue, Aug 22, 2023 at 3:25 AM Christian Grün wrote:

> Hi Matt,
>
> Providing a non-admin version of the DBA is certainly a good idea. So far,
> we mostly haven’t had the time and resources to clarify what the
> implications would be for the individual views.
>
> I’ll be happy to have a closer look at your fork next week.
>
> Best,
> Christian
>
>
>
> On Mon, Aug 21, 2023 at 19:34, Matthew Dziuban wrote:
>
>> Hi all,
>>
>> While the subject might sound contradictory, I'm curious what you think
>> about opening up the DBA code to allow non-admin users to access it and
>> perform actions for which they have permissions?
>>
>> I currently maintain and run a fork of the DBA web app at work to make
>> this possible, but I'd love to have the behavior built into BaseX if
>> possible. You can view the changes I've made against BaseX 10.7 here:
>> https://github.com/mblink/basex-webapp/compare/upstream-webapp...webapp-10.7
>>
>> If you're open to this, I'd be happy to open a pull request with my
>> changes!
>>
>> Thanks,
>> Matt
>>
>>

-- 
Matthew R. Dziuban
mattdziuban.com
703-973-6717
mrdziu...@gmail.com


[basex-talk] Support non-admin user access to database admin interface (DBA)

2023-08-21 Thread Matthew Dziuban
Hi all,

While the subject might sound contradictory, I'm curious what you think
about opening up the DBA code to allow non-admin users to access it and
perform actions for which they have permissions?

I currently maintain and run a fork of the DBA web app at work to make this
possible, but I'd love to have the behavior built into BaseX if possible.
You can view the changes I've made against BaseX 10.7 here:
https://github.com/mblink/basex-webapp/compare/upstream-webapp...webapp-10.7

If you're open to this, I'd be happy to open a pull request with my changes!

Thanks,
Matt


Re: [basex-talk] Text index requires `/text()` in query

2022-05-02 Thread Matthew Dziuban
Good to know -- thanks for the help, Christian!


Re: [basex-talk] Text index requires `/text()` in query

2022-04-29 Thread Matthew Dziuban
As I was trying to come up with a simple example to reproduce it, I
rediscovered that the top-level <data> element specifies an XML namespace.
Apologies, I failed to mention that initially. Would that affect whether the
index is used or not?

I'm able to reproduce by loading this data into a new database named
ElementsTest:

<data xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <element><id>1</id></element>
</data>

And then running this query:

for $x in db:open('ElementsTest')/data/element
where $x/id = '1'
return $x/id

The GUI shows the following as the optimized query:

db:open-pre("ElementsTest", 0)/data/element[(id = "1")]/id


Re: [basex-talk] Text index requires `/text()` in query

2022-04-29 Thread Matthew Dziuban
Hi Christian,

Thanks for the quick response! That query returns the following:


  


Out of curiosity, is there a way to see index utilization through the DBA
web app or via the ClientSession Java class [1] instead of the GUI? I'm
using the client/server architecture, so I mainly run queries these ways.

Best,
Matt

On Fri, Apr 29, 2022 at 1:52 PM Christian Grün wrote:

> Hi Matthew,
>
> If you run your query on the following document …
>
> <data>
>   <element><id>123</id></element>
>   <element><id>456</id></element>
> </data>
>
> … and if you look into the Info View in the GUI, you will notice that
> the index will be utilized:
>
> Optimized Query:
> db:text("data", "DatabaseName")/parent::id/parent::element
>
> The query optimizer detects that all “data/element/id” elements are
> leaf elements (i.e., have a single text child node), and the query
> will be rewritten for index access.
>
> Maybe there are “id” elements in your document that are not leaf
> elements? Could you share the result of the following query with us?
>
>
> index:facets('data')/*/element[@name='data']/element[@name='element']/element[@name='id']
>
> Best,
> Christian
>
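
The leaf-element condition Christian describes can also be checked for a sample document outside BaseX. The following sketch uses Python's xml.etree and only approximates the idea: it reports elements on a given path that contain child elements, and says nothing about how the BaseX optimizer itself makes this decision. The function name and the sample path are hypothetical.

```python
import xml.etree.ElementTree as ET


def non_leaf_elements(xml_text, path):
    """Return the elements found at `path` that contain child elements.

    `path` is evaluated relative to the root element, e.g. "element/id"
    for a document whose root is <data>.
    """
    root = ET.fromstring(xml_text)
    # len(el) is the number of child elements of el
    return [el for el in root.findall(path) if len(el) > 0]
```

An empty result for non_leaf_elements(doc, "element/id") means that every id element on that path contains text only, which is the situation in which the rewrite shown above is expected to apply.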


[basex-talk] Text index requires `/text()` in query

2022-04-29 Thread Matthew Dziuban
Hi all,

I was recently debugging performance of a query with an exact string
comparison and discovered that it seems the query was only rewritten to use
the text index [1] if I explicitly added `/text()` to the path I was
comparing.

My data looks like this:

<data>
  <element><id>123</id></element>
</data>

And my original query was:

for $el in db:open('DatabaseName')/data/element
where $el/id = '123'
return $el

With 3 million <element> nodes in the database, this query took about 4
seconds, which made me question whether the text index was being used. I
then changed the query to add `/text()` to the `where` clause, like so:

for $el in db:open('DatabaseName')/data/element
where $el/id/text() = '123'
return $el

With this change, the query only takes 0.4 seconds. Is it expected that
`/text()` is required to get the text index to kick in?

Thanks in advance,
Matt

[1] https://docs.basex.org/wiki/Indexes#Text_Index


[basex-talk] Outdated milton-api version

2022-01-21 Thread Matthew Dziuban
Hi all,

While auditing the library dependencies of an application, I found that
org.basex:basex-api [1] depends on a very old version of
com.ettrema:milton-api -- v1.8.1.4 released in 2014 [2]. The package seems
to have been migrated to the io.milton namespace and have a recently
(December 2021) released v3.1.0.301 [3]. This stood out to me because my
application is using the latest version of Apache commons-io, v2.11.0,
while the old version of milton-api depends on Apache commons-io v1.4,
released in 2008.

Would it be feasible to migrate BaseX to a newer version of the milton-api
library? Thanks in advance!

Matt

[1] https://mvnrepository.com/artifact/org.basex/basex-api/9.6.4
[2] https://mvnrepository.com/artifact/com.ettrema/milton-api/1.8.1.4
[3] https://mvnrepository.com/artifact/io.milton/milton-api/3.1.0.301


[basex-talk] org.basex.core.BaseXException with no error message

2021-08-12 Thread Matthew Dziuban
Hi all,

I'm using the Java ClientSession class defined in the basex-api module and
I'm seeing intermittent org.basex.core.BaseXException errors with an empty
error message when executing long-running read queries or CREATE DB
commands. The stack trace of the most recent error points to
ClientSession.receive (
https://github.com/BaseXdb/basex/blob/9.5.2/basex-core/src/main/java/org/basex/api/client/ClientSession.java#L190
)

org.basex.core.BaseXException:
org.basex.api.client.ClientSession.receive(ClientSession.java:190)
org.basex.api.client.ClientSession.send(ClientSession.java:178)
org.basex.api.client.ClientSession.send(ClientSession.java:215)
org.basex.api.client.ClientSession.create(ClientSession.java:128)
...

Unfortunately since the message is empty it's difficult to debug what's
going wrong. I've checked the log files in basex/data/.logs but haven't
found any information about the failure. Is there another place I can look?
Or do you have any other recommendations for debugging the errors?

Thanks in advance,
Matt


Re: [basex-talk] Merge/combine databases in memory constrained environment

2021-07-27 Thread Matthew Dziuban
Thanks again, Christian. Regardless of whether I have the UPDINDEX and
AUTOOPTIMIZE options enabled, I'm seeing that my first set of updates runs
pretty quickly (in about 90 seconds) but any subsequent set of updates
hangs indefinitely -- I let it run for over 2 hours and it never completed.
Do you have any idea what could be going on?

On Mon, Jul 26, 2021 at 9:41 PM Christian Grün wrote:

> > Thank you Christian and Graydon! I've got a solution working by calling
> > the script multiple times to do only a subset of the updates each time. On
> > a different (though related) note, do I understand correctly that I should
> > run a "db:optimize($db)" after each time I perform a significant amount of
> > updates?
>
> It’s definitely advisable if you perform queries that take advantage
> of the BaseX index structures – which is already the case for
> root/element[@id = $se/@id]. If the UPDINDEX option is enabled, the
> index structures will always be kept up-to-date (but again, a database
> optimization may still be advisable to minimize the index structures).
>


Re: [basex-talk] Merge/combine databases in memory constrained environment

2021-07-26 Thread Matthew Dziuban
Thank you Christian and Graydon! I've got a solution working by calling the
script multiple times to do only a subset of the updates each time. On a
different (though related) note, do I understand correctly that I should
run a "db:optimize($db)" after each time I perform a significant amount of
updates?

Matt

On Mon, Jul 26, 2021 at 1:32 PM Graydon wrote:

> On Mon, Jul 26, 2021 at 01:08:19PM -0400, Matthew Dziuban scripsit:
> > Is there any way to accomplish this? Thanks in advance!
>
> It might be easier to generate the merged content, create a db from
> that, then replace the old target DB with that whole.
>
> -- Graydon
>


-- 
Matthew R. Dziuban
mattdziuban.com
703-973-6717
mrdziu...@gmail.com


[basex-talk] Merge/combine databases in memory constrained environment

2021-07-26 Thread Matthew Dziuban
Hi all,

I have two databases in BaseX, source_db and target_db, and would like to
merge them by matching on the id attribute of each element and upserting
the element with a `replace` or an `insert` depending on whether the
element was found in the `target_db`. `source_db` has about 100,000
elements, and `target_db` has about 1,000,000 elements. The databases look
like this:



  
  



  


And my query to merge the two looks like this:

for $e in (db:open("source_db")/root/element)
return (
  if (exists(db:open("target_db")/root/element[@id = data($e/@id)]))
  then replace node db:open("target_db")/root/element[@id = data($e/@id)]
with $e
  else insert node $e into db:open("target_db")/root
)

When running the query, however, I keep getting memory constraint errors.
Using a POST request to BaseX's REST interface I get "Out of Main Memory"
and using the BaseX Java client (
https://github.com/BaseXdb/basex/blob/master/basex-examples/src/main/java/org/basex/examples/api/BaseXClient.java)
I get "java.io.IOException: GC overhead limit exceeded".

Ideally I would like to just process one element from source_db at a time
to avoid memory issues, but it seems like my query isn't doing this. I've
tried using the `db:copynode false` pragma but it did not make a difference.

Is there any way to accomplish this? Thanks in advance!

Matt