Re: [basex-talk] Text index on uncommon node
Hi Christian, Good thinking! Indeed, 'false' appears 7.25 million times in the database. I'll look into selective indexing, thanks for the link, and thanks again for the help -- this gives me a much better idea of how the text index works in general. Best, Matt On Mon, Oct 30, 2023 at 5:23 PM Christian Grün wrote: > Hi Matt, > > I assume the culprit is the common string you're looking up. It probably > occurs very often in your database. You can e.g. verify this via > index:texts('MyDatabase', 'false') or count(db:get('MyDatabase')//text()[. > = 'false']). > > If you don't need to perform exact queries on arbitrary elements, you > could think about restricting the text index to specific element names to > reduce the number of intermediate hits [1]. > > Hope this helps, > Christian > > [1] https://docs.basex.org/wiki/Indexes#Selective_Indexing > > > Matthew Dziuban schrieb am Mo., 30. Okt. 2023, > 22:09: > >> Hi Christian, >> >> Thanks for the quick response, and sure thing. It does look like the text >> index is applied in both cases. While I was writing up an example of the >> slow query, I realized that the way I'll actually be querying it is by >> wrapping the condition in not(...). After doing so, it now only takes 6 >> seconds to run -- still slower but better. The slow query looks like this: >> >> for $x in db:open('MyDatabase')/data/element >> where not($x/child1/child2/valid = 'false') >> return $x >> >> And the fast query looks like this: >> >> for $x in db:open('MyDatabase')/data/element >> where $x/child3/child4/id = '123' >> return $x >> >> Let me know if this is helpful -- if not, I could share more info in a >> direct email about the actual structure of the database and the Info output. >> >> Thanks, >> Matt >> >> On Mon, Oct 30, 2023 at 4:40 PM Christian Grün >> wrote: >> >>> Hi Matt, >>> >>> In general, all nodes are treated identically, no matter what the >>> hierarchy is or regular the target path is. >>> >>> Could you share some more information with us? How do the queries look >>> like (the slow and the fast one)? Is the text index applied in both cases? >>> >>> Thanks in advance, >>> Christian >>> >>> >>> >>> Matthew Dziuban schrieb am Mo., 30. Okt. 2023, >>> 21:33: >>> Hi all, I'm working with a database structured like so: ... ... ... There are a total of about 1.5 million nodes in the database. Each has many child nodes, one of which is uncommon -- it only appears in 727 s. I'm writing a query that has a condition on this uncommon field, but the query takes about 20 seconds to run, whereas another with a condition on a child node that appears in every only takes about 20 milliseconds to run. Based on the Info in the GUI, it does appear that the text index is being used -- I see 'apply text index for "..."'. Is it expected that the query time would be this much longer? Is the text index somehow built differently for nodes that don't appear often in the database? Thanks in advance, Matt >>> >> >> -- >> Matthew R. Dziuban >> mattdziuban.com >> 703-973-6717 >> mrdziu...@gmail.com >> > -- Matthew R. Dziuban mattdziuban.com 703-973-6717 mrdziu...@gmail.com
Re: [basex-talk] Text index on uncommon node
Hi Matt, I assume the culprit is the common string you're looking up. It probably occurs very often in your database. You can e.g. verify this via index:texts('MyDatabase', 'false') or count(db:get('MyDatabase')//text()[. = 'false']). If you don't need to perform exact queries on arbitrary elements, you could think about restricting the text index to specific element names to reduce the number of intermediate hits [1]. Hope this helps, Christian [1] https://docs.basex.org/wiki/Indexes#Selective_Indexing Matthew Dziuban schrieb am Mo., 30. Okt. 2023, 22:09: > Hi Christian, > > Thanks for the quick response, and sure thing. It does look like the text > index is applied in both cases. While I was writing up an example of the > slow query, I realized that the way I'll actually be querying it is by > wrapping the condition in not(...). After doing so, it now only takes 6 > seconds to run -- still slower but better. The slow query looks like this: > > for $x in db:open('MyDatabase')/data/element > where not($x/child1/child2/valid = 'false') > return $x > > And the fast query looks like this: > > for $x in db:open('MyDatabase')/data/element > where $x/child3/child4/id = '123' > return $x > > Let me know if this is helpful -- if not, I could share more info in a > direct email about the actual structure of the database and the Info output. > > Thanks, > Matt > > On Mon, Oct 30, 2023 at 4:40 PM Christian Grün > wrote: > >> Hi Matt, >> >> In general, all nodes are treated identically, no matter what the >> hierarchy is or regular the target path is. >> >> Could you share some more information with us? How do the queries look >> like (the slow and the fast one)? Is the text index applied in both cases? >> >> Thanks in advance, >> Christian >> >> >> >> Matthew Dziuban schrieb am Mo., 30. Okt. 2023, >> 21:33: >> >>> Hi all, >>> >>> I'm working with a database structured like so: >>> >>> >>> ... >>> ... >>> ... >>> >>> >>> There are a total of about 1.5 million nodes in the database. >>> Each has many child nodes, one of which is uncommon -- it only >>> appears in 727 s. >>> >>> I'm writing a query that has a condition on this uncommon field, but the >>> query takes about 20 seconds to run, whereas another with a condition on a >>> child node that appears in every only takes about 20 milliseconds >>> to run. >>> >>> Based on the Info in the GUI, it does appear that the text index is >>> being used -- I see 'apply text index for "..."'. Is it expected that the >>> query time would be this much longer? Is the text index somehow built >>> differently for nodes that don't appear often in the database? >>> >>> Thanks in advance, >>> Matt >>> >> > > -- > Matthew R. Dziuban > mattdziuban.com > 703-973-6717 > mrdziu...@gmail.com >
Re: [basex-talk] Text index on uncommon node
Hi Christian, Thanks for the quick response, and sure thing. It does look like the text index is applied in both cases. While I was writing up an example of the slow query, I realized that the way I'll actually be querying it is by wrapping the condition in not(...). After doing so, it now only takes 6 seconds to run -- still slower but better. The slow query looks like this: for $x in db:open('MyDatabase')/data/element where not($x/child1/child2/valid = 'false') return $x And the fast query looks like this: for $x in db:open('MyDatabase')/data/element where $x/child3/child4/id = '123' return $x Let me know if this is helpful -- if not, I could share more info in a direct email about the actual structure of the database and the Info output. Thanks, Matt On Mon, Oct 30, 2023 at 4:40 PM Christian Grün wrote: > Hi Matt, > > In general, all nodes are treated identically, no matter what the > hierarchy is or regular the target path is. > > Could you share some more information with us? How do the queries look > like (the slow and the fast one)? Is the text index applied in both cases? > > Thanks in advance, > Christian > > > > Matthew Dziuban schrieb am Mo., 30. Okt. 2023, > 21:33: > >> Hi all, >> >> I'm working with a database structured like so: >> >> >> ... >> ... >> ... >> >> >> There are a total of about 1.5 million nodes in the database. >> Each has many child nodes, one of which is uncommon -- it only >> appears in 727 s. >> >> I'm writing a query that has a condition on this uncommon field, but the >> query takes about 20 seconds to run, whereas another with a condition on a >> child node that appears in every only takes about 20 milliseconds >> to run. >> >> Based on the Info in the GUI, it does appear that the text index is being >> used -- I see 'apply text index for "..."'. Is it expected that the query >> time would be this much longer? Is the text index somehow built differently >> for nodes that don't appear often in the database? >> >> Thanks in advance, >> Matt >> > -- Matthew R. Dziuban mattdziuban.com 703-973-6717 mrdziu...@gmail.com
Re: [basex-talk] Text index on uncommon node
Hi Matt, In general, all nodes are treated identically, no matter what the hierarchy is or regular the target path is. Could you share some more information with us? How do the queries look like (the slow and the fast one)? Is the text index applied in both cases? Thanks in advance, Christian Matthew Dziuban schrieb am Mo., 30. Okt. 2023, 21:33: > Hi all, > > I'm working with a database structured like so: > > > ... > ... > ... > > > There are a total of about 1.5 million nodes in the database. > Each has many child nodes, one of which is uncommon -- it only > appears in 727 s. > > I'm writing a query that has a condition on this uncommon field, but the > query takes about 20 seconds to run, whereas another with a condition on a > child node that appears in every only takes about 20 milliseconds > to run. > > Based on the Info in the GUI, it does appear that the text index is being > used -- I see 'apply text index for "..."'. Is it expected that the query > time would be this much longer? Is the text index somehow built differently > for nodes that don't appear often in the database? > > Thanks in advance, > Matt >
[basex-talk] Text index on uncommon node
Hi all, I'm working with a database structured like so: ... ... ... There are a total of about 1.5 million nodes in the database. Each has many child nodes, one of which is uncommon -- it only appears in 727 s. I'm writing a query that has a condition on this uncommon field, but the query takes about 20 seconds to run, whereas another with a condition on a child node that appears in every only takes about 20 milliseconds to run. Based on the Info in the GUI, it does appear that the text index is being used -- I see 'apply text index for "..."'. Is it expected that the query time would be this much longer? Is the text index somehow built differently for nodes that don't appear often in the database? Thanks in advance, Matt