Great news! I am still wondering how the query works when the alias r isn't defined. Is there a field named r in the object? I'm probably misreading something but that part of the query still confuses me.
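For anyone following along in the archive: assuming the alias was meant to be r1, here is a minimal, self-contained sketch of the Java-side computation Mags describes below (summing price * quantity after fetching the joined rows in a server function). The Row record is a hypothetical stand-in for the fetched values, not a Geode type:

```java
import java.math.BigDecimal;
import java.util.List;

public class PriceAggregator {

    // Hypothetical holder for one joined row fetched from Region1/Region2;
    // not a Geode class, just a stand-in for illustration.
    record Row(BigDecimal price, int quantity) {}

    // Plain-Java equivalent of the OQL sum(r1.price.multiply(r1.quantity)),
    // computed after the rows have been fetched in the server function.
    static BigDecimal total(List<Row> rows) {
        BigDecimal sum = BigDecimal.ZERO;
        for (Row r : rows) {
            sum = sum.add(r.price().multiply(BigDecimal.valueOf(r.quantity())));
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(new BigDecimal("10.50"), 2),  // 21.00
                new Row(new BigDecimal("3.25"), 4));  // 13.00
        System.out.println(total(rows)); // prints 34.00
    }
}
```

Doing the multiply in Java also avoids calling a method on the stored object inside OQL, which (as Jason notes below) can force deserialization of the PDX values.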
On Mon, Apr 30, 2018 at 5:17 PM Mageswaran Muthukumar <mages...@thoughtworks.com> wrote:

> Hi Wes / Jason,
>
> Thanks for getting back. This was sorted once we changed the index on the
> productId column from Key to Range Index. We also queried both regions with
> a join and did the computation of price * quantity (and the other columns)
> in Java after fetching the results in a server function. We are now getting
> the query back for 100K records in less than a second.
>
> Regards,
> Mags
>
> On Mon, Apr 30, 2018 at 9:02 PM, Jason Huynh <jhu...@pivotal.io> wrote:
>
>> Just to be sure, are the two regions colocated by productId?
>>
>> Looking at the provided query, what is the r alias? Is it supposed to be
>> r1 or r2?
>>
>> select sum(r.price.multiply(r.quantity)) from Region1 r1, Region2 r2
>> where r1.productId = r2.productId and r2.eventId = '123'
>>
>> The index on productId for both regions is really important and probably
>> needs to be there for the fastest execution.
>>
>> If you continue to see this problem after fixing the alias, you can break
>> up the query as you described: query against r2.eventId and run a separate
>> query against r1, if that works out better. You should be able to do that
>> in a function and target only local data by calling the
>> query.execute(context) method and passing in the function context.
>>
>> I would be interested in knowing whether fixing the alias fixes the
>> problem.
>>
>> With r.price.multiply you are calling a method on the object, which I
>> think forces a deserialization of the object. I would look there next if
>> the alias and index changes do not solve the issue.
>>
>> On Sat, Apr 28, 2018 at 5:33 AM Real Wes <thereal...@outlook.com> wrote:
>>
>>> Based on your description the query should be fast, but obviously
>>> something is wrong, and from the facts you conveyed there is not enough
>>> information to determine the problem. It could be GCs, etc.; who knows.
>>> However, here is the general way I approach this scenario:
>>>
>>> Put <trace> and <hint ...> before the "select" in your query to guide it
>>> to use the index. It will tell you whether the index was used and exactly
>>> how long the query took.
>>>
>>> Make productId + marketId the key of Region1, separated by a "|".
>>> Colocate Region2 with Region1 based on productId + "|" + marketId.
>>> Consider a key of productId + "|" + marketId + "|" + counter.
>>> Index productId.
>>> That will work.
>>>
>>> If you don't want to colocate, consider making your smaller Region2 a
>>> replicated region, with Region1 partitioned. Index on productId.
>>>
>>> Wes Williams
>>>
>>> On Apr 28, 2018, at 2:16 AM, Mageswaran Muthukumar <mages...@thoughtworks.com> wrote:
>>>
>>> Hi,
>>>
>>> I am new to Geode and am trying to implement a use case that aggregates
>>> data from two regions.
>>>
>>> One region holds the data received from source system one. It is updated
>>> on a daily basis and has around 2.7 million records, organized by market.
>>> I am planning to partition this region on market data across two servers.
>>>
>>> The other region holds the data received from source system two, has
>>> around 1 million records, and is also partitioned on market data.
>>>
>>> The requirement is to compute an aggregate joining both regions on a
>>> common key, with a where clause.
>>>
>>> Region1 attributes:
>>>
>>> productId
>>> marketId
>>> price
>>> quantity
>>>
>>> Region2 attributes:
>>>
>>> productId
>>> marketId
>>> eventId
>>>
>>> select sum(r.price.multiply(r.quantity)) from Region1 r1, Region2 r2
>>> where r1.productId = r2.productId and r2.eventId = '123'
>>>
>>> I am currently running the above query in a server function with only
>>> 50K records ingested into each region, yet it is taking a long time and I
>>> can see a spike in CPU usage.
>>> We are already using PDX serialization, and I have tried creating an
>>> index on productId in both regions.
>>>
>>> Any suggestions for improving the performance of this query? Also,
>>> please advise whether we should use a join for this, or instead fetch the
>>> Region2 records based on eventId, look up the matching Region1 records,
>>> and do the aggregation in the server function.
>>>
>>> Thanks!
>>> Mags
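Regarding the last question in the original post, a rough sketch of the two-step alternative (fetch the Region2 entries matching the eventId, then look up each productId in Region1 and aggregate) is below. Plain Java collections stand in for the actual regions, and the Product/Event record shapes are illustrative assumptions, not the real domain classes:

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;

public class TwoStepAggregation {

    record Product(BigDecimal price, int quantity) {}  // stand-in for a Region1 value
    record Event(String productId, String eventId) {}  // stand-in for a Region2 value

    // Step 1: filter the Region2 entries by eventId.
    // Step 2: look up each productId in Region1 and accumulate price * quantity.
    static BigDecimal aggregate(Map<String, Product> region1,
                                List<Event> region2, String eventId) {
        BigDecimal sum = BigDecimal.ZERO;
        for (Event e : region2) {
            if (!eventId.equals(e.eventId())) continue;
            Product p = region1.get(e.productId());
            if (p != null) {
                sum = sum.add(p.price().multiply(BigDecimal.valueOf(p.quantity())));
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Product> region1 = Map.of(
                "P1", new Product(new BigDecimal("2.00"), 5),   // 10.00
                "P2", new Product(new BigDecimal("1.50"), 2));  // 3.00
        List<Event> region2 = List.of(
                new Event("P1", "123"),
                new Event("P2", "123"),
                new Event("P1", "999"));  // different event, skipped
        System.out.println(aggregate(region1, region2, "123")); // prints 13.00
    }
}
```

In an actual server function the two lookups would run against local region data (as Jason suggests, via query.execute with the function context) rather than plain maps, so each member only aggregates its own partition.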