Re: SQL query is slow
Hi Mihaela, Index is not used in your case because you specify function-based condition. Usually this is resolved by adding functional index, but Ignite doesn't support it at the moment unfortunately. Is it possible to "materialize" the condition "POSITION ('Z',manufacturerCode)>0" as additional attribute and add an index on it? In this case SQL would look like this and index will be used: SELECT COUNT(_KEY) FROM IgniteProduct AS product WHERE manufacturerCodeZ=1 Another important thing is selectivity - which fraction of records fall under this condition? Also I would recommend to change "COUNT(_KEY)" to "COUNT(*)". Vladimir. On Tue, Aug 29, 2017 at 6:05 PM, Andrey Mashenkov < andrey.mashen...@gmail.com> wrote: > It is possible returned dataset is too large and cause high network > pressure that results in large query execution time. > > There is no recommendation for grid nodes count. > Simple SQL queries can work slower on large grid as most of time is spent > in inter-node communication. > Heavy SQL queries may show better results on larger grid as every node > will have smaller dataset. > > You can try to look at page memory statistics [1] to get estimate numbers. > > Really, there is an issue with large OFFSET as Ignite can't just skip > entries and have to fetch all of them from nodes. > OFFSET makes no sense without ORDER as Ignite fetch rows from other nodes > in async way and row order should be preserved between such queries. > OFFSET applies on query initiator node (reduce side) after results merged > as there is no way to understand on map side what rows should be skiped. > > > Looks like underlying H2 tries to use index scan, but I don't think index > can help in case of functional condition. > You can try to make Ignite to have inline values in index or use separate > field with smaller type that can be inlined. By default, index inlining is > enabled for 10 byte length values. > See IGNITE_MAX_INDEX_PAYLOAD_SIZE_DEFAULT system property docs and [2]. > > [1] https://apacheignite.readme.io/v2.1/docs/memory-metrics > [2] https://issues.apache.org/jira/browse/IGNITE-6060 > > On Tue, Aug 29, 2017 at 3:59 PM, mhetea <mihaela.he...@gmail.com> wrote: > >> Thank you for your response. >> I used query parallelizm and the time reduced to ~2.3s, which is still too >> much. >> Regarding 1. is there any documentation about configuration parameters >> (recommended number of nodes, how much data should be stored on each >> node). >> We currently have 2 nodes with 32GB RAM each. Every 1 million records from >> our cache occupy about 1GB (is there a way to see how much memory a cache >> actually occupies? we look now at the Allocated next memory segment log >> info) >> For 3. it seems that the index is hit from the execution plan: >> /* "productCache".IGNITEPRODUCT_MANUFACTURERCODE_IDX */ >> No? >> >> We have this issue also when we use a large OFFSET (we execute this kind >> of >> query because we want paginated results) >> >> Also, this cache will be updated frequently so we expect it to grow in >> size. >> >> Thank you! >> >> >> >> -- >> View this message in context: http://apache-ignite-users.705 >> 18.x6.nabble.com/SQL-query-is-slow-tp16475p16487.html >> Sent from the Apache Ignite Users mailing list archive at Nabble.com. >> > > > > -- > Best regards, > Andrey V. Mashenkov >
Re: SQL query is slow
It is possible returned dataset is too large and cause high network pressure that results in large query execution time. There is no recommendation for grid nodes count. Simple SQL queries can work slower on large grid as most of time is spent in inter-node communication. Heavy SQL queries may show better results on larger grid as every node will have smaller dataset. You can try to look at page memory statistics [1] to get estimate numbers. Really, there is an issue with large OFFSET as Ignite can't just skip entries and have to fetch all of them from nodes. OFFSET makes no sense without ORDER as Ignite fetch rows from other nodes in async way and row order should be preserved between such queries. OFFSET applies on query initiator node (reduce side) after results merged as there is no way to understand on map side what rows should be skiped. Looks like underlying H2 tries to use index scan, but I don't think index can help in case of functional condition. You can try to make Ignite to have inline values in index or use separate field with smaller type that can be inlined. By default, index inlining is enabled for 10 byte length values. See IGNITE_MAX_INDEX_PAYLOAD_SIZE_DEFAULT system property docs and [2]. [1] https://apacheignite.readme.io/v2.1/docs/memory-metrics [2] https://issues.apache.org/jira/browse/IGNITE-6060 On Tue, Aug 29, 2017 at 3:59 PM, mhetea <mihaela.he...@gmail.com> wrote: > Thank you for your response. > I used query parallelizm and the time reduced to ~2.3s, which is still too > much. > Regarding 1. is there any documentation about configuration parameters > (recommended number of nodes, how much data should be stored on each node). > We currently have 2 nodes with 32GB RAM each. Every 1 million records from > our cache occupy about 1GB (is there a way to see how much memory a cache > actually occupies? we look now at the Allocated next memory segment log > info) > For 3. it seems that the index is hit from the execution plan: > /* "productCache".IGNITEPRODUCT_MANUFACTURERCODE_IDX */ > No? > > We have this issue also when we use a large OFFSET (we execute this kind of > query because we want paginated results) > > Also, this cache will be updated frequently so we expect it to grow in > size. > > Thank you! > > > > -- > View this message in context: http://apache-ignite-users. > 70518.x6.nabble.com/SQL-query-is-slow-tp16475p16487.html > Sent from the Apache Ignite Users mailing list archive at Nabble.com. > -- Best regards, Andrey V. Mashenkov
Re: SQL query is slow
Hi, You can try to 1. add more nodes to reduce data set on each node. 2. use query parallelizm [1]. 3. add a indexed field to be used in WHERE clause instead of function. [1] https://apacheignite.readme.io/docs/sql-performance-and-debugging#query-parallelism On Tue, Aug 29, 2017 at 12:09 PM, mhetea <mihaela.he...@gmail.com> wrote: > We have a cache with ~3million entries > > We execute the following query: > select count(_key) from IgniteProduct as product where POSITION > ('Z',manufacturerCode)>0 > > The query takes about 7s though the index is hit. > > Query execution is too long [time=7103 ms, sql='SELECT > COUNT(PRODUCT__Z0._KEY) __C0_0 > FROM "productCache".IGNITEPRODUCT PRODUCT__Z0 > WHERE POSITION('Z', PRODUCT__Z0.MANUFACTURERCODE) > 0', plan= > SELECT > COUNT(PRODUCT__Z0._KEY) AS __C0_0 > FROM "productCache".IGNITEPRODUCT PRODUCT__Z0 > /* "productCache".IGNITEPRODUCT_MANUFACTURERCODE_IDX */ > WHERE POSITION('Z', PRODUCT__Z0.MANUFACTURERCODE) > 0 > , parameters=[]] > , parameters=[]] > > Can somebody help/offer an alternative? > > > > -- > View this message in context: http://apache-ignite-users. > 70518.x6.nabble.com/SQL-query-is-slow-tp16475.html > Sent from the Apache Ignite Users mailing list archive at Nabble.com. > -- Best regards, Andrey V. Mashenkov
SQL query is slow
We have a cache with ~3million entries We execute the following query: select count(_key) from IgniteProduct as product where POSITION ('Z',manufacturerCode)>0 The query takes about 7s though the index is hit. Query execution is too long [time=7103 ms, sql='SELECT COUNT(PRODUCT__Z0._KEY) __C0_0 FROM "productCache".IGNITEPRODUCT PRODUCT__Z0 WHERE POSITION('Z', PRODUCT__Z0.MANUFACTURERCODE) > 0', plan= SELECT COUNT(PRODUCT__Z0._KEY) AS __C0_0 FROM "productCache".IGNITEPRODUCT PRODUCT__Z0 /* "productCache".IGNITEPRODUCT_MANUFACTURERCODE_IDX */ WHERE POSITION('Z', PRODUCT__Z0.MANUFACTURERCODE) > 0 , parameters=[]] , parameters=[]] Can somebody help/offer an alternative? -- View this message in context: http://apache-ignite-users.70518.x6.nabble.com/SQL-query-is-slow-tp16475.html Sent from the Apache Ignite Users mailing list archive at Nabble.com.