Re: [basex-talk] Text index on uncommon node

2023-10-30 Thread Matthew Dziuban
Hi Christian,

Good thinking! Indeed, 'false' appears 7.25 million times in the database.
I'll look into selective indexing, thanks for the link, and thanks again
for the help -- this gives me a much better idea of how the text index
works in general.

Best,
Matt

On Mon, Oct 30, 2023 at 5:23 PM Christian Grün 
wrote:

> Hi Matt,
>
> I assume the culprit is the common string you're looking up. It probably
> occurs very often in your database. You can e.g. verify this via
> index:texts('MyDatabase', 'false') or count(db:get('MyDatabase')//text()[.
> = 'false']).
>
> If you don't need to perform exact queries on arbitrary elements, you
> could think about restricting the text index to specific element names to
> reduce the number of intermediate hits [1].
>
> Hope this helps,
> Christian
>
> [1] https://docs.basex.org/wiki/Indexes#Selective_Indexing
>
>
> Matthew Dziuban  schrieb am Mo., 30. Okt. 2023,
> 22:09:
>
>> Hi Christian,
>>
>> Thanks for the quick response, and sure thing. It does look like the text
>> index is applied in both cases. While I was writing up an example of the
>> slow query, I realized that the way I'll actually be querying it is by
>> wrapping the condition in not(...). After doing so, it now only takes 6
>> seconds to run -- still slower but better. The slow query looks like this:
>>
>> for $x in db:open('MyDatabase')/data/element
>> where not($x/child1/child2/valid = 'false')
>> return $x
>>
>> And the fast query looks like this:
>>
>> for $x in db:open('MyDatabase')/data/element
>> where $x/child3/child4/id = '123'
>> return $x
>>
>> Let me know if this is helpful -- if not, I could share more info in a
>> direct email about the actual structure of the database and the Info output.
>>
>> Thanks,
>> Matt
>>
>> On Mon, Oct 30, 2023 at 4:40 PM Christian Grün 
>> wrote:
>>
>>> Hi Matt,
>>>
>>> In general, all nodes are treated identically, no matter what the
>>> hierarchy is or regular the target path is.
>>>
>>> Could you share some more information with us? How do the queries look
>>> like (the slow and the fast one)? Is the text index applied in both cases?
>>>
>>> Thanks in advance,
>>> Christian
>>>
>>>
>>>
>>> Matthew Dziuban  schrieb am Mo., 30. Okt. 2023,
>>> 21:33:
>>>
 Hi all,

 I'm working with a database structured like so:

 
   ...
   ...
   ...
 

 There are a total of about 1.5 million  nodes in the database.
 Each  has many child nodes, one of which is uncommon -- it only
 appears in 727 s.

 I'm writing a query that has a condition on this uncommon field, but
 the query takes about 20 seconds to run, whereas another with a condition
 on a child node that appears in every  only takes about 20
 milliseconds to run.

 Based on the Info in the GUI, it does appear that the text index is
 being used -- I see 'apply text index for "..."'. Is it expected that the
 query time would be this much longer? Is the text index somehow built
 differently for nodes that don't appear often in the database?

 Thanks in advance,
 Matt

>>>
>>
>> --
>> Matthew R. Dziuban
>> mattdziuban.com
>> 703-973-6717
>> mrdziu...@gmail.com
>>
>

-- 
Matthew R. Dziuban
mattdziuban.com
703-973-6717
mrdziu...@gmail.com


Re: [basex-talk] Text index on uncommon node

2023-10-30 Thread Christian Grün
Hi Matt,

I assume the culprit is the common string you're looking up. It probably
occurs very often in your database. You can e.g. verify this via
index:texts('MyDatabase', 'false') or count(db:get('MyDatabase')//text()[.
= 'false']).

If you don't need to perform exact queries on arbitrary elements, you could
think about restricting the text index to specific element names to reduce
the number of intermediate hits [1].

Hope this helps,
Christian

[1] https://docs.basex.org/wiki/Indexes#Selective_Indexing


Matthew Dziuban  schrieb am Mo., 30. Okt. 2023, 22:09:

> Hi Christian,
>
> Thanks for the quick response, and sure thing. It does look like the text
> index is applied in both cases. While I was writing up an example of the
> slow query, I realized that the way I'll actually be querying it is by
> wrapping the condition in not(...). After doing so, it now only takes 6
> seconds to run -- still slower but better. The slow query looks like this:
>
> for $x in db:open('MyDatabase')/data/element
> where not($x/child1/child2/valid = 'false')
> return $x
>
> And the fast query looks like this:
>
> for $x in db:open('MyDatabase')/data/element
> where $x/child3/child4/id = '123'
> return $x
>
> Let me know if this is helpful -- if not, I could share more info in a
> direct email about the actual structure of the database and the Info output.
>
> Thanks,
> Matt
>
> On Mon, Oct 30, 2023 at 4:40 PM Christian Grün 
> wrote:
>
>> Hi Matt,
>>
>> In general, all nodes are treated identically, no matter what the
>> hierarchy is or regular the target path is.
>>
>> Could you share some more information with us? How do the queries look
>> like (the slow and the fast one)? Is the text index applied in both cases?
>>
>> Thanks in advance,
>> Christian
>>
>>
>>
>> Matthew Dziuban  schrieb am Mo., 30. Okt. 2023,
>> 21:33:
>>
>>> Hi all,
>>>
>>> I'm working with a database structured like so:
>>>
>>> 
>>>   ...
>>>   ...
>>>   ...
>>> 
>>>
>>> There are a total of about 1.5 million  nodes in the database.
>>> Each  has many child nodes, one of which is uncommon -- it only
>>> appears in 727 s.
>>>
>>> I'm writing a query that has a condition on this uncommon field, but the
>>> query takes about 20 seconds to run, whereas another with a condition on a
>>> child node that appears in every  only takes about 20 milliseconds
>>> to run.
>>>
>>> Based on the Info in the GUI, it does appear that the text index is
>>> being used -- I see 'apply text index for "..."'. Is it expected that the
>>> query time would be this much longer? Is the text index somehow built
>>> differently for nodes that don't appear often in the database?
>>>
>>> Thanks in advance,
>>> Matt
>>>
>>
>
> --
> Matthew R. Dziuban
> mattdziuban.com
> 703-973-6717
> mrdziu...@gmail.com
>


Re: [basex-talk] Text index on uncommon node

2023-10-30 Thread Matthew Dziuban
Hi Christian,

Thanks for the quick response, and sure thing. It does look like the text
index is applied in both cases. While I was writing up an example of the
slow query, I realized that the way I'll actually be querying it is by
wrapping the condition in not(...). After doing so, it now only takes 6
seconds to run -- still slower but better. The slow query looks like this:

for $x in db:open('MyDatabase')/data/element
where not($x/child1/child2/valid = 'false')
return $x

And the fast query looks like this:

for $x in db:open('MyDatabase')/data/element
where $x/child3/child4/id = '123'
return $x

Let me know if this is helpful -- if not, I could share more info in a
direct email about the actual structure of the database and the Info output.

Thanks,
Matt

On Mon, Oct 30, 2023 at 4:40 PM Christian Grün 
wrote:

> Hi Matt,
>
> In general, all nodes are treated identically, no matter what the
> hierarchy is or regular the target path is.
>
> Could you share some more information with us? How do the queries look
> like (the slow and the fast one)? Is the text index applied in both cases?
>
> Thanks in advance,
> Christian
>
>
>
> Matthew Dziuban  schrieb am Mo., 30. Okt. 2023,
> 21:33:
>
>> Hi all,
>>
>> I'm working with a database structured like so:
>>
>> 
>>   ...
>>   ...
>>   ...
>> 
>>
>> There are a total of about 1.5 million  nodes in the database.
>> Each  has many child nodes, one of which is uncommon -- it only
>> appears in 727 s.
>>
>> I'm writing a query that has a condition on this uncommon field, but the
>> query takes about 20 seconds to run, whereas another with a condition on a
>> child node that appears in every  only takes about 20 milliseconds
>> to run.
>>
>> Based on the Info in the GUI, it does appear that the text index is being
>> used -- I see 'apply text index for "..."'. Is it expected that the query
>> time would be this much longer? Is the text index somehow built differently
>> for nodes that don't appear often in the database?
>>
>> Thanks in advance,
>> Matt
>>
>

-- 
Matthew R. Dziuban
mattdziuban.com
703-973-6717
mrdziu...@gmail.com


Re: [basex-talk] Text index on uncommon node

2023-10-30 Thread Christian Grün
Hi Matt,

In general, all nodes are treated identically, no matter what the hierarchy
is or regular the target path is.

Could you share some more information with us? How do the queries look like
(the slow and the fast one)? Is the text index applied in both cases?

Thanks in advance,
Christian



Matthew Dziuban  schrieb am Mo., 30. Okt. 2023, 21:33:

> Hi all,
>
> I'm working with a database structured like so:
>
> 
>   ...
>   ...
>   ...
> 
>
> There are a total of about 1.5 million  nodes in the database.
> Each  has many child nodes, one of which is uncommon -- it only
> appears in 727 s.
>
> I'm writing a query that has a condition on this uncommon field, but the
> query takes about 20 seconds to run, whereas another with a condition on a
> child node that appears in every  only takes about 20 milliseconds
> to run.
>
> Based on the Info in the GUI, it does appear that the text index is being
> used -- I see 'apply text index for "..."'. Is it expected that the query
> time would be this much longer? Is the text index somehow built differently
> for nodes that don't appear often in the database?
>
> Thanks in advance,
> Matt
>


[basex-talk] Text index on uncommon node

2023-10-30 Thread Matthew Dziuban
Hi all,

I'm working with a database structured like so:


  ...
  ...
  ...


There are a total of about 1.5 million  nodes in the database.
Each  has many child nodes, one of which is uncommon -- it only
appears in 727 s.

I'm writing a query that has a condition on this uncommon field, but the
query takes about 20 seconds to run, whereas another with a condition on a
child node that appears in every  only takes about 20 milliseconds
to run.

Based on the Info in the GUI, it does appear that the text index is being
used -- I see 'apply text index for "..."'. Is it expected that the query
time would be this much longer? Is the text index somehow built differently
for nodes that don't appear often in the database?

Thanks in advance,
Matt