Eugene,

Just want to make sure you know about the following pages, which describe
the Ignite memory architecture in detail:

   - https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Entriesandpagesindurablememory
   - https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood



> 1) Are indexes loaded into heap (when used)?
>

Something might be copied to heap, but in most cases we perform
comparisons and other operations directly off-heap.
See org.apache.ignite.internal.processors.query.h2.database.InlineIndexHelper
and related classes.
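
To illustrate what "directly off-heap" means here, below is a minimal sketch
in plain Java. It is not Ignite's implementation: the class name, the
length-prefixed layout and both helper methods are assumptions made up for
this example. It only shows the general idea of comparing a search key with
bytes that live in off-heap memory, without first building a heap object from
the stored value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; Ignite's real logic lives in InlineIndexHelper.
class OffHeapCompareSketch {
    /** Writes a value at the given offset as a 2-byte length followed by UTF-8 bytes. */
    static void putInlined(ByteBuffer page, int offset, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        page.putShort(offset, (short) bytes.length);
        for (int i = 0; i < bytes.length; i++)
            page.put(offset + 2 + i, bytes[i]);
    }

    /**
     * Compares the stored value with the search key byte by byte.
     * The stored value is read straight from the buffer and is never
     * turned into a String or byte[] on the heap.
     */
    static int compareInlined(ByteBuffer page, int offset, byte[] key) {
        int len = page.getShort(offset);
        int common = Math.min(len, key.length);
        for (int i = 0; i < common; i++) {
            int a = page.get(offset + 2 + i) & 0xFF;
            int b = key[i] & 0xFF;
            if (a != b)
                return Integer.compare(a, b);
        }
        // All shared bytes are equal, so the shorter value sorts first.
        return Integer.compare(len, key.length);
    }

    public static void main(String[] args) {
        // allocateDirect reserves memory outside the Java heap.
        ByteBuffer page = ByteBuffer.allocateDirect(64);
        putInlined(page, 0, "apple");

        byte[] key = "apricot".getBytes(StandardCharsets.UTF_8);
        System.out.println(compareInlined(page, 0, key) < 0); // prints true
    }
}
```

Only the search key is a heap object in this sketch; the indexed value stays
in the direct buffer for the whole comparison.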

> 2) Are full pages loaded into heap, or only the matching records?
>

Matching records (result set) are presently loaded. The pages are not.


> 3) When the query needs more processing than the existing index
> (non-indexed columns, groupBy, aggregation), where/how does it happen?
>

In that case we do a full scan. Grouping and aggregations are finalized on
the query coordinator, which needs to receive a full result set.
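
As a rough mental model of "finalized on the query coordinator", here is a
small plain-Java sketch of a two-phase COUNT ... GROUP BY. It is an
assumption-laden illustration, not Ignite code: the class, both method names
and the sample rows are invented for this example. Each data node produces a
partial result, and the coordinator merges everything it receives on its own
heap.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a map/reduce style GROUP BY; not Ignite's implementation.
class MapReduceAggregationSketch {
    /** Map phase: one data node counts its local rows per group key. */
    static Map<String, Long> mapOnNode(List<String> localRows) {
        Map<String, Long> partial = new HashMap<>();
        for (String city : localRows)
            partial.merge(city, 1L, Long::sum);
        return partial;
    }

    /** Reduce phase: the coordinator merges the partial results from all nodes. */
    static Map<String, Long> reduceOnCoordinator(List<Map<String, Long>> partials) {
        Map<String, Long> result = new TreeMap<>(); // sorted for stable output
        for (Map<String, Long> partial : partials)
            partial.forEach((city, count) -> result.merge(city, count, Long::sum));
        return result;
    }

    public static void main(String[] args) {
        List<String> node1 = Arrays.asList("Paris", "Rome", "Paris");
        List<String> node2 = Arrays.asList("Rome", "Oslo");

        Map<String, Long> total = reduceOnCoordinator(
            Arrays.asList(mapOnNode(node1), mapOnNode(node2)));

        System.out.println(total); // prints {Oslo=1, Paris=2, Rome=2}
    }
}
```

The point of the sketch is the memory shape: whatever the data nodes send
back has to fit on the coordinator while it is being merged.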

> 4) How is the query coordinator chosen? Is it the client node? How about
> when using the web console?
>

The query coordinator is your application, i.e. the node that executed the
query. Web Console uses the Ignite SQL APIs as well.


> 5) What parallelism settings would you recommend? We were thinking of setting
> parallelJobsNumber to 1 and task parallelism to number of cores * 2 -
> this way we can make sure that each job gets all the heap memory instead of
> all jobs fighting each other. Not sure if it makes sense, and it will also
> prevent us from making real-time transactional queries (we
> are hoping to use Ignite for both OLAP and simple real-time queries).


I would start a separate discussion for this to bring the question to the
attention of our SQL experts. I'm not one of them.

--
Denis

On Mon, Aug 27, 2018 at 8:54 PM eugene miretsky <eugene.miret...@gmail.com>
wrote:

> Denis, thanks for the detailed response.
>
> A few more follow up questions
> 1) Are indexes loaded into heap (when used)?
> 2) Are full pages loaded into heap, or only the matching records?
> 3) When the query needs more processing than the existing index
> (non-indexed columns, groupBy, aggregation), where/how does it happen?
> 4) How is the query coordinator chosen? Is it the client node? How about
> when using the web console?
> 5) What parallelism settings would you recommend? We were thinking of setting
> parallelJobsNumber to 1 and task parallelism to number of cores * 2 -
> this way we can make sure that each job gets all the heap memory instead of
> all jobs fighting each other. Not sure if it makes sense, and it will also
> prevent us from making real-time transactional queries (we
> are hoping to use Ignite for both OLAP and simple real-time queries).
>
> Cheers,
> Eugene
>
>
> On Sat, Aug 25, 2018 at 3:25 AM Denis Magda <dma...@apache.org> wrote:
>
>> Hello Eugene,
>>
>>> 1) In what format is data stored off heap?
>>
>>
>> Data is always stored in the binary format, whether it is on-heap, off-heap
>> or in Ignite persistence.
>> https://apacheignite.readme.io/docs/binary-marshaller
>>
>>> 2) What happens when a SQL query is executed, in particular
>>
>>>
>>>    - How is H2 used? How is data loaded in H2? What if some of the
>>>    data is on disk?
>>>
>> H2 is used to build execution plans for SELECTs. H2 calls Ignite's
>> B+Tree based indexing implementation to see which indexes are set. All the
>> data and indexes are always stored in Ignite (off-heap + disk).
>>
>>>
>>>    - When is data loaded into heap, and how much? Is only the output of
>>>    H2 loaded, or everything?
>>>
>> Query results are stored in the Java heap temporarily. Once the result set
>> is read by your application, it will be garbage collected.
>>
>>>
>>>    - How is the reduce stage performed? Is it performed only on one
>>>    node (hence that node needs to load all the data into memory)
>>>
>> Correct, the final result set is reduced on a query coordinator - your
>> application that executed a SELECT.
>>
>>> 3) What happens when Ignite runs out of memory during execution? Is data
>>> evicted to disk (if persistence is enabled)?
>>
>>
>> I guess you mean what happens if a result set doesn't fit in RAM during
>> the execution, right? If so, then an OOM will occur. We're working on an
>> improvement that will offload the result set to disk to avoid OOM in all
>> scenarios:
>> https://issues.apache.org/jira/browse/IGNITE-7526
>>
>>
>>
>>> 4) Based on the code, it looks like I need to set my data region size to
>>> at most 50% of available memory (to avoid the warning), this seems a bit
>>> wasteful.
>>
>>
>> There is no such requirement. I know of many deployments where
>> one data region is given 20% of RAM, another is given 40%, and everything
>> else is persisted to disk.
>>
>>> 5) Do you have any general advice on benchmarking the memory
>>> requirement? So far I have not been able to find a way to check how much
>>> memory each table takes on and off heap, and how much memory each query
>>> takes.
>>
>>
>> We use Yardstick for performance benchmarking:
>> https://apacheignite.readme.io/docs/perfomance-benchmarking
>>
>> --
>> Denis
>>
>> On Fri, Aug 24, 2018 at 7:06 AM eugene miretsky <
>> eugene.miret...@gmail.com> wrote:
>>
>>> Thanks!
>>>
>>> I am trying to understand when and how data is moved from off-heap to on
>>> heap, particularly when using SQL.  I took a look at the wiki
>>> <https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood>
>>>  but
>>> still have a few questions
>>>
>>> My understanding is that data is always stored off-heap
>>>
>>> 1) In what format is data stored off heap?
>>> 2) What happens when a SQL query is executed, in particular
>>>
>>>    - How is H2 used? How is data loaded in H2? What if some of the
>>>    data is on disk?
>>>    - When is data loaded into heap, and how much? Is only the output of
>>>    H2 loaded, or everything?
>>>    - How is the reduce stage performed? Is it performed only on one
>>>    node (hence that node needs to load all the data into memory)
>>>
>>> 3) What happens when Ignite runs out of memory during execution? Is data
>>> evicted to disk (if persistence is enabled)?
>>> 4) Based on the code, it looks like I need to set my data region size to
>>> at most 50% of available memory (to avoid the warning), this seems a bit
>>> wasteful.
>>> 5) Do you have any general advice on benchmarking the memory
>>> requirement? So far I have not been able to find a way to check how much
>>> memory each table takes on and off heap, and how much memory each query
>>> takes.
>>>
>>> Cheers,
>>> Eugene
>>>
>>> On Fri, Aug 24, 2018 at 8:06 AM, NSAmelchev <nsamelc...@gmail.com>
>>> wrote:
>>>
>>>> Hi Eugene,
>>>>
>>>> Yes, it's a misprint as Dmitry wrote.
>>>>
>>>> Ignite prints this warning if the nodes on the local machine require more
>>>> than 80% of physical RAM.
>>>>
>>>> From the code, you can see that the total heap/off-heap memory is summed
>>>> across nodes that have the same MAC address. This is how the total memory
>>>> used by the local machine is calculated.
>>>>
>>>> --
>>>> Best wishes,
>>>> Amelchev Nikita
>>>>
>>>>
>>>>
>>>> --
>>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>>
>>>
>>>
