Thanks against for the detailed response!

Our main use case is preforming large SQL queries over tables with 200M+
rows  - wanted to give you a bit more details and context you can pass along

A simple example would be:

   - Table: customer_id, date, category, views, clicks ( pkey =
   "customer_id, date", affinity key = date )
   - Query: SELECT count(*) where date < X AND categroy in (C1, C2, C3)
   GROUP BY customer_id HAVING SUM(views) > 20

My main concernse are
1) How is the group by performed. You mentioend that it is performend on
the coordinator, I was coping that singe we are grouping using an colomn
that is an affintiy key, each node will be able to do it's own group by
2) How much heap should I allocate for the group by stage
3) General performance tips

Cheers,
Eugene


On Thu, Aug 30, 2018 at 1:32 AM Denis Magda <[email protected]> wrote:

> Eugene,
>
> Just want to be sure you know about the existence of the following pages
> which elaborate on Ignite memory architecture in details:
>
>    -
>    
> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood#IgniteDurableMemory-underthehood-Entriesandpagesindurablememory
>    -
>    
> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
>
>
>
>> 1) Are indexs loaded into heap (when used)?
>>
>
> Something might be copied to disk but in most of the cases we perform
> comparisons and other operations directly off-heap.
> See org.apache.ignite.internal.processors.query.h2.database.InlineIndexHelper
> and related classes.
>
> 2) Are full pages loaded into heap, or only the matching records?
>>
>
> Matching records (result set) are presently loaded. The pages are not.
>
>
>> 3) When the query needs more processing than the exisiting index
>> (non-indexed columns, groupBy, aggreag) where/how does it happen?
>>
>
> We will be doing a full scan. Grouping and aggregations are finalized on
> the query coordinator which needs to get a full result set.
>
> 4) How is the query coordinator chosen? Is it the client node? How about
>> when using the web console?
>>
>
> That's your application. Web Console uses Ignite SQL APIs as well.
>
>
>> 5) What paralalism settings would your recomend, we were thinking to set
>> parallelJobsNumber  to 1  and task parallelism to number of cores * 2 -
>> this way we can make sure that each job gets al the heap memory instead of
>> all jobs fighting each other. Not sure if it makes sense, and it will also
>> prevent us from making real time transactional transactional queries.(we
>> are hoping to use ignite for both olap and simple real time queries)
>
>
> I would start a separate discussion for this bringing this question to the
> attention of our SQL experts. I'm not the one of them.
>
> --
> Denis
>
> On Mon, Aug 27, 2018 at 8:54 PM eugene miretsky <[email protected]>
> wrote:
>
>> Denis, thanks for the detailed response.
>>
>> A few more follow up questions
>> 1) Are indexs loaded into heap (when used)?
>> 2) Are full pages loaded into heap, or only the matching records?
>> 3) When the query needs more processing than the exisiting index
>> (non-indexed columns, groupBy, aggreag) where/how does it happen?
>> 4) How is the query coordinator chosen? Is it the client node? How about
>> when using the web console?
>> 5) What paralalism settings would your recomend, we were thinking to set
>> parallelJobsNumber  to 1  and task parallelism to number of cores * 2 -
>> this way we can make sure that each job gets al the heap memory instead of
>> all jobs fighting each other. Not sure if it makes sense, and it will also
>> prevent us from making real time transactional transactional queries.(we
>> are hoping to use ignite for both olap and simple real time queries)
>>
>> Cheers,
>> Eugene
>>
>>
>> On Sat, Aug 25, 2018 at 3:25 AM Denis Magda <[email protected]> wrote:
>>
>>> Hello Eugene,
>>>
>>> 1) In what format is data stored off heap?
>>>
>>>
>>> Data is always stored in the binary format let it be on-heap, off-heap
>>> or Ignite persistence.
>>> https://apacheignite.readme.io/docs/binary-marshaller
>>>
>>> 2) What happens when a SQL query is executed, in particular
>>>
>>>>
>>>>    - How is H2 used? How is data loaded in H2? What if some of the
>>>>    data is on disk?
>>>>
>>>> H2 is used to build execution plans for SELECTs. H2 calls Ignite's
>>> B+Tree based indexing implementation to see which indexes are set. All the
>>> data and indexes are always stored in Ignite (off-heap + disk).
>>>
>>>>
>>>>    - When is data loaded into heap, and how much? Is only the output
>>>>    of H2 loaded, or everything?
>>>>
>>>> Queries results are stored in Java heap temporarily. Once the result
>>> set is read by your application, it will be garbage collected.
>>>
>>>>
>>>>    - How is the reduce stage performed? Is it performed only on one
>>>>    node (hence that node needs to load all the data into memory)
>>>>
>>>> Correct, the final result set is reduced on a query coordinator - your
>>> application that executed a SELECT.
>>>
>>> 3) What happens when Ingite runs out of memory during execution? Is data
>>>> evictied to disk (if persistence is enabled)?
>>>
>>>
>>> I guess you mean what happens if a result set doesn't fit in RAM during
>>> the execution, right? If so, then OOM will occur. We're working on an
>>> improvement that will offload the result set to disk to avoid OOM for all
>>> the scenarious:
>>> https://issues.apache.org/jira/browse/IGNITE-7526
>>>
>>>
>>>
>>>> 4) Based on the code, it looks like I need to set my data region size
>>>> to at most 50% of available memory (to avoid the warning), this seems a bit
>>>> wastefull.
>>>
>>>
>>> There is no such a requirement. I know many deployments use cases when
>>> one data region is given 20% of RAM, the other is given 40% and everything
>>> else is persisted to disk.
>>>
>>> 5) Do you have any general advice on benchmarking the memory
>>>> requirpement? So far I have not been able to find a way to check how much
>>>> memory each table takes on and off heap, and how much memory each query
>>>> takes.
>>>
>>>
>>> We use Yardstick for performance benchmarking:
>>> https://apacheignite.readme.io/docs/perfomance-benchmarking
>>>
>>> --
>>> Denis
>>>
>>> On Fri, Aug 24, 2018 at 7:06 AM eugene miretsky <
>>> [email protected]> wrote:
>>>
>>>> Thanks!
>>>>
>>>> I am trying to understand when and how data is moved from off-heap to
>>>> on heap, particularly when using SQL.  I took a look at the wiki
>>>> <https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Durable+Memory+-+under+the+hood>
>>>>  but
>>>> still have a few questions
>>>>
>>>> My understanding is that data is always store off-heap
>>>>
>>>> 1) In what format is data stored off heap?
>>>> 2) What happens when a SQL query is executed, in particular
>>>>
>>>>    - How is H2 used? How is data loaded in H2? What if some of the
>>>>    data is on disk?
>>>>    - When is data loaded into heap, and how much? Is only the output
>>>>    of H2 loaded, or everything?
>>>>    - How is the reduce stage performed? Is it performed only on one
>>>>    node (hence that node needs to load all the data into memory)
>>>>
>>>> 3) What happens when Ingite runs out of memory during execution? Is
>>>> data evictied to disk (if persistence is enabled)?
>>>> 4) Based on the code, it looks like I need to set my data region size
>>>> to at most 50% of available memory (to avoid the warning), this seems a bit
>>>> wastefull.
>>>> 5) Do you have any general advice on benchmarking the memory
>>>> requirpement? So far I have not been able to find a way to check how much
>>>> memory each table takes on and off heap, and how much memory each query
>>>> takes.
>>>>
>>>> Cheers,
>>>> Eugene
>>>>
>>>> On Fri, Aug 24, 2018 at 8:06 AM, NSAmelchev <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Eugene,
>>>>>
>>>>> Yes, it's a misprint as Dmitry wrote.
>>>>>
>>>>> Ignite print this warning if nodes on local machine require more than
>>>>> 80% of
>>>>> physical RAM.
>>>>>
>>>>> From code, you can see that total heap/offheap memory summing
>>>>> from nodes having the same mac address. This way calculates total
>>>>> memory
>>>>> used
>>>>> by the local machine.
>>>>>
>>>>> --
>>>>> Best wishes,
>>>>> Amelchev Nikita
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>>>>>
>>>>
>>>>

Reply via email to