Eugene, Just want to be sure you know about the existence of the following pages which elaborate on Ignite memory architecture in details:
- https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Entriesandpagesindurablememory - https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood > 1) Are indexs loaded into heap (when used)? > Something might be copied to disk but in most of the cases we perform comparisons and other operations directly off-heap. See org.apache.ignite.internal.processors.query.h2.database.InlineIndexHelper and related classes. 2) Are full pages loaded into heap, or only the matching records? > Matching records (result set) are presently loaded. The pages are not. > 3) When the query needs more processing than the exisiting index > (non-indexed columns, groupBy, aggreag) where/how does it happen? > We will be doing a full scan. Grouping and aggregations are finalized on the query coordinator which needs to get a full result set. 4) How is the query coordinator chosen? Is it the client node? How about > when using the web console? > That's your application. Web Console uses Ignite SQL APIs as well. > 5) What paralalism settings would your recomend, we were thinking to set > parallelJobsNumber to 1 and task parallelism to number of cores * 2 - > this way we can make sure that each job gets al the heap memory instead of > all jobs fighting each other. Not sure if it makes sense, and it will also > prevent us from making real time transactional transactional queries.(we > are hoping to use ignite for both olap and simple real time queries) I would start a separate discussion for this bringing this question to the attention of our SQL experts. I'm not the one of them. -- Denis On Mon, Aug 27, 2018 at 8:54 PM eugene miretsky <eugene.miret...@gmail.com> wrote: > Denis, thanks for the detailed response. > > A few more follow up questions > 1) Are indexs loaded into heap (when used)? > 2) Are full pages loaded into heap, or only the matching records? > 3) When the query needs more processing than the exisiting index > (non-indexed columns, groupBy, aggreag) where/how does it happen? > 4) How is the query coordinator chosen? Is it the client node? How about > when using the web console? > 5) What paralalism settings would your recomend, we were thinking to set > parallelJobsNumber to 1 and task parallelism to number of cores * 2 - > this way we can make sure that each job gets al the heap memory instead of > all jobs fighting each other. Not sure if it makes sense, and it will also > prevent us from making real time transactional transactional queries.(we > are hoping to use ignite for both olap and simple real time queries) > > Cheers, > Eugene > > > On Sat, Aug 25, 2018 at 3:25 AM Denis Magda <dma...@apache.org> wrote: > >> Hello Eugene, >> >> 1) In what format is data stored off heap? >> >> >> Data is always stored in the binary format let it be on-heap, off-heap or >> Ignite persistence. >> https://apacheignite.readme.io/docs/binary-marshaller >> >> 2) What happens when a SQL query is executed, in particular >> >>> >>> - How is H2 used? How is data loaded in H2? What if some of the >>> data is on disk? >>> >>> H2 is used to build execution plans for SELECTs. H2 calls Ignite's >> B+Tree based indexing implementation to see which indexes are set. All the >> data and indexes are always stored in Ignite (off-heap + disk). >> >>> >>> - When is data loaded into heap, and how much? Is only the output of >>> H2 loaded, or everything? >>> >>> Queries results are stored in Java heap temporarily. Once the result set >> is read by your application, it will be garbage collected. >> >>> >>> - How is the reduce stage performed? Is it performed only on one >>> node (hence that node needs to load all the data into memory) >>> >>> Correct, the final result set is reduced on a query coordinator - your >> application that executed a SELECT. >> >> 3) What happens when Ingite runs out of memory during execution? Is data >>> evictied to disk (if persistence is enabled)? >> >> >> I guess you mean what happens if a result set doesn't fit in RAM during >> the execution, right? If so, then OOM will occur. We're working on an >> improvement that will offload the result set to disk to avoid OOM for all >> the scenarious: >> https://issues.apache.org/jira/browse/IGNITE-7526 >> >> >> >>> 4) Based on the code, it looks like I need to set my data region size to >>> at most 50% of available memory (to avoid the warning), this seems a bit >>> wastefull. >> >> >> There is no such a requirement. I know many deployments use cases when >> one data region is given 20% of RAM, the other is given 40% and everything >> else is persisted to disk. >> >> 5) Do you have any general advice on benchmarking the memory >>> requirpement? So far I have not been able to find a way to check how much >>> memory each table takes on and off heap, and how much memory each query >>> takes. >> >> >> We use Yardstick for performance benchmarking: >> https://apacheignite.readme.io/docs/perfomance-benchmarking >> >> -- >> Denis >> >> On Fri, Aug 24, 2018 at 7:06 AM eugene miretsky < >> eugene.miret...@gmail.com> wrote: >> >>> Thanks! >>> >>> I am trying to understand when and how data is moved from off-heap to on >>> heap, particularly when using SQL. I took a look at the wiki >>> <https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood> >>> but >>> still have a few questions >>> >>> My understanding is that data is always store off-heap >>> >>> 1) In what format is data stored off heap? >>> 2) What happens when a SQL query is executed, in particular >>> >>> - How is H2 used? How is data loaded in H2? What if some of the >>> data is on disk? >>> - When is data loaded into heap, and how much? Is only the output of >>> H2 loaded, or everything? >>> - How is the reduce stage performed? Is it performed only on one >>> node (hence that node needs to load all the data into memory) >>> >>> 3) What happens when Ingite runs out of memory during execution? Is data >>> evictied to disk (if persistence is enabled)? >>> 4) Based on the code, it looks like I need to set my data region size to >>> at most 50% of available memory (to avoid the warning), this seems a bit >>> wastefull. >>> 5) Do you have any general advice on benchmarking the memory >>> requirpement? So far I have not been able to find a way to check how much >>> memory each table takes on and off heap, and how much memory each query >>> takes. >>> >>> Cheers, >>> Eugene >>> >>> On Fri, Aug 24, 2018 at 8:06 AM, NSAmelchev <nsamelc...@gmail.com> >>> wrote: >>> >>>> Hi Eugene, >>>> >>>> Yes, it's a misprint as Dmitry wrote. >>>> >>>> Ignite print this warning if nodes on local machine require more than >>>> 80% of >>>> physical RAM. >>>> >>>> From code, you can see that total heap/offheap memory summing >>>> from nodes having the same mac address. This way calculates total memory >>>> used >>>> by the local machine. >>>> >>>> -- >>>> Best wishes, >>>> Amelchev Nikita >>>> >>>> >>>> >>>> -- >>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/ >>>> >>> >>>