Re: [basex-talk] Performance of ft:search function

2022-04-29 Thread Christian Grün
Exactly: The longer you run a BaseX instance, the faster it gets. That’s
particularly noticeable when using the client/server or HTTP architecture.

There are various reasons for that: BaseX caches, OS & main-memory caching,
JIT optimizations, …
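
To watch the warm-up happen, the runs can be split into batches and averaged
per batch. This is only a rough sketch: the database name 'names-a' and the
search term are hypothetical placeholders, and the optimizer may rewrite the
loops, so the numbers are indicative at best.

```xquery
(: Sketch only: 'names-a' and 'Thompson' are hypothetical placeholders. :)
declare variable $index := 'names-a';
declare variable $text := 'Thompson';

(: Five batches of 200 runs each; later batches tend to be faster. :)
for $batch in 1 to 5
let $times :=
  for $i in 1 to 200
  return prof:track(
    ft:search($index, $text)/../..,
    (: Discard the result; only the time in milliseconds is of interest. :)
    map { 'value': false() }
  )?time
return 'batch ' || $batch || ': ' || round(avg($times), 2) || ' ms'
```

If the first batch is clearly slower than the later ones, the difference is
most likely warm-up rather than a property of the query itself.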



Tim Thompson wrote on Fri, Apr 29, 2022, 22:40:

> Oh, I see--thanks for the tip; I wasn't aware of the SET RUNS feature,
> which is really helpful! With 1000 runs, the average execution time is more
> in line with expectations: 38.96ms for expression #1 and 12.44ms for #2.
> But I notice that with successive executions, #1 gets faster: 38.96ms,
> 17.73ms, 12.82ms. Is this a result of caching?


Re: [basex-talk] Performance of ft:search function

2022-04-29 Thread Tim Thompson
Oh, I see--thanks for the tip; I wasn't aware of the SET RUNS feature,
which is really helpful! With 1000 runs, the average execution time is more
in line with expectations: 38.96ms for expression #1 and 12.44ms for #2.
But I notice that with successive executions, #1 gets faster: 38.96ms,
17.73ms, 12.82ms. Is this a result of caching?

Best,
Tim


-- 
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library





Re: [basex-talk] Performance of ft:search function

2022-04-27 Thread Christian Grün
>
> 2. Direct lookup against subindex
> Time: 3.3ms
> Expression: ft:search($index, $text)/../..
>
> 3. Lookup against subindex file with reference to large index
> Time: 2.9ms
> Expression:
> let $s :=
>   ft:search($index, $text)/../..
> return db:open-id($db, $s/id)/../..
>
> My question is: why would the third expression be slightly faster (or at
> least not slower) than the second one, if it involves additional
> computation?
>

I assume it's due to slight variations during your measurements. How many
items will be returned by ft:search? Do you get the same runtime if you run
the code 100 or 1000 times?

In the GUI, you can type and execute SET RUNS 100 in the top input bar (in
command mode). Your query will then be executed multiple times, and the
average runtime will be shown in the Info View.
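
Outside the GUI, a comparable average can be approximated in XQuery itself.
The following is a minimal sketch, not a definitive benchmark: local:average
is a hypothetical helper, and the database names and search term are
placeholders for your own data.

```xquery
(: Hypothetical helper: average time in ms of $f over $runs evaluations. :)
declare function local:average(
  $f    as function() as item()*,
  $runs as xs:integer
) {
  avg(
    for $i in 1 to $runs
    return prof:track($f(), map { 'value': false() })?time
  )
};

(: Placeholder names; replace them with your own databases and term. :)
let $index := 'names-a'
let $db := 'names'
let $text := 'Thompson'
return map {
  'expression 2': local:average(function() {
    ft:search($index, $text)/../..
  }, 1000),
  (: Taken from the original post; assumes the search yields a single
     entry with one id child. :)
  'expression 3': local:average(function() {
    let $s := ft:search($index, $text)/../..
    return db:open-id($db, $s/id)/../..
  }, 1000)
}
```

Running both expressions through the same helper in one query keeps the
conditions comparable, although whichever expression runs first may still
pay more of the warm-up cost.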


[basex-talk] Performance of ft:search function

2022-04-27 Thread Tim Thompson
Hello,

I have a largish (5.4G) file with a full-text index that I am using to
reconcile names in a local dataset. I've been experimenting with splitting
the file into many smaller index files to improve performance. I group the
entries by initial character and create a new index file for each distinct
initial character. Each smaller file then gets its own full-text index.

I've been following the approach outlined in the documentation for custom
index structures. Using prof:track, I've noticed the following performance
for different uses of ft:search.

(Here, $db refers to the 5.4G file, and $index refers to a smaller 159MB
subindex. Times are averaged across 10 runs of 1000 iterations for each
expression.)

1. Direct lookup against large index
Time: 23ms
Expression: ft:search($db, $text)/../..

2. Direct lookup against subindex
Time: 3.3ms
Expression: ft:search($index, $text)/../..

3. Lookup against subindex file with reference to large index
Time: 2.9ms
Expression:
let $s :=
  ft:search($index, $text)/../..
return db:open-id($db, $s/id)/../..

My question is: why would the third expression be slightly faster (or at
least not slower) than the second one, if it involves additional
computation?

Thanks in advance,
Tim


-- 
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library