Great news! I am still wondering how the query works when the alias r isn't defined. Is there a field named r in the object? I'm probably misreading something but that part of the query still confuses me.
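For anyone following along in the archive: assuming the alias was meant to be r1, here is a minimal, self-contained sketch of the Java-side computation Mags describes below (summing price * quantity after fetching the joined rows in a server function). The Row record is a hypothetical stand-in for the fetched values, not a Geode type:

```java
import java.math.BigDecimal;
import java.util.List;

public class PriceAggregator {

    // Hypothetical holder for one joined row fetched from Region1/Region2;
    // not a Geode class, just a stand-in for illustration.
    record Row(BigDecimal price, int quantity) {}

    // Plain-Java equivalent of the OQL sum(r1.price.multiply(r1.quantity)),
    // computed after the rows have been fetched in the server function.
    static BigDecimal total(List<Row> rows) {
        BigDecimal sum = BigDecimal.ZERO;
        for (Row r : rows) {
            sum = sum.add(r.price().multiply(BigDecimal.valueOf(r.quantity())));
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(new BigDecimal("10.50"), 2),  // 21.00
                new Row(new BigDecimal("3.25"), 4));  // 13.00
        System.out.println(total(rows)); // prints 34.00
    }
}
```

Doing the multiply in Java also avoids calling a method on the stored object inside OQL, which (as Jason notes below) can force deserialization of the PDX values.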
On Mon, Apr 30, 2018 at 5:17 PM Mageswaran Muthukumar <mages...@thoughtworks.com> wrote:

> Hi Wes / Jason,
>
> Thanks for getting back. This was sorted once we changed the index on the
> productId column from Key to Range Index. We also queried both regions with
> a join and did the computation of price * quantity (and the other columns)
> in Java after fetching the results in a server function. We are now getting
> the query back for 100K records in less than a second.
>
> Regards,
> Mags
>
> On Mon, Apr 30, 2018 at 9:02 PM, Jason Huynh <jhu...@pivotal.io> wrote:
>
>> Just to be sure, are the two regions colocated by productId?
>>
>> Looking at the provided query, what is the r alias? Is it supposed to be
>> r1 or r2?
>>
>> select sum(r.price.multiply(r.quantity)) from Region1 r1, Region2 r2
>> where r1.productId = r2.productId and r2.eventId = '123'
>>
>> The index on productId for both regions is really important and probably
>> needs to be there for the fastest execution.
>>
>> If you continue to see this problem after fixing the alias, you can break
>> up the query as you described: query against r2.eventId and run a separate
>> query against r1, if that works out better. You should be able to do that
>> in a function and target only local data by calling the
>> query.execute(context) method and passing in the function context.
>>
>> I would be interested in knowing whether fixing the alias fixes the
>> problem.
>>
>> With r.price.multiply you are calling a method on the object, which I
>> think forces a deserialization of the object. I would look there next if
>> the alias and index changes do not solve the issue.
>>
>> On Sat, Apr 28, 2018 at 5:33 AM Real Wes <thereal...@outlook.com> wrote:
>>
>>> Based on your description the query should be fast, but obviously
>>> something is wrong, and from the facts you conveyed there is not enough
>>> information to determine the problem. It could be GCs, etc.; who knows.
>>> However, here is the general way I approach this scenario:
>>>
>>> Put <trace> and <hint ...> before the "select" in your query to guide it
>>> to use the index. It will tell you whether the index was used and exactly
>>> how long the query took.
>>>
>>> Make productId + marketId the key of Region1, separated by a "|".
>>> Colocate Region2 with Region1 based on productId + "|" + marketId.
>>> Consider a key of productId + "|" + marketId + "|" + counter.
>>> Index productId.
>>> That will work.
>>>
>>> If you don't want to colocate, consider making your smaller Region2 a
>>> replicated region, with Region1 partitioned. Index on productId.
>>>
>>> Wes Williams
>>>
>>> On Apr 28, 2018, at 2:16 AM, Mageswaran Muthukumar <mages...@thoughtworks.com> wrote:
>>>
>>> Hi,
>>>
>>> I am new to Geode and am trying to implement a use case that aggregates
>>> data from two regions.
>>>
>>> One region holds the data received from source system one. It is updated
>>> on a daily basis and has around 2.7 million records, organized by market.
>>> I am planning to partition this region on market data across two servers.
>>>
>>> The other region holds the data received from source system two, has
>>> around 1 million records, and is also partitioned on market data.
>>>
>>> The requirement is to compute an aggregate joining both regions on a
>>> common key, with a where clause.
>>>
>>> Region1 attributes:
>>>
>>> productId
>>> marketId
>>> price
>>> quantity
>>>
>>> Region2 attributes:
>>>
>>> productId
>>> marketId
>>> eventId
>>>
>>> select sum(r.price.multiply(r.quantity)) from Region1 r1, Region2 r2
>>> where r1.productId = r2.productId and r2.eventId = '123'
>>>
>>> I am currently running the above query in a server function with only
>>> 50K records ingested into each region, yet it is taking a long time and I
>>> can see a spike in CPU usage.
>>> We are already using PDX serialization, and I have tried creating an
>>> index on productId in both regions.
>>>
>>> Any suggestions for improving the performance of this query? Also,
>>> please advise whether we should use a join for this, or instead fetch the
>>> Region2 records based on eventId, look up the matching Region1 records,
>>> and do the aggregation in the server function.
>>>
>>> Thanks!
>>> Mags
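Regarding the last question in the original post, a rough sketch of the two-step alternative (fetch the Region2 entries matching the eventId, then look up each productId in Region1 and aggregate) is below. Plain Java collections stand in for the actual regions, and the Product/Event record shapes are illustrative assumptions, not the real domain classes:

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;

public class TwoStepAggregation {

    record Product(BigDecimal price, int quantity) {}  // stand-in for a Region1 value
    record Event(String productId, String eventId) {}  // stand-in for a Region2 value

    // Step 1: filter the Region2 entries by eventId.
    // Step 2: look up each productId in Region1 and accumulate price * quantity.
    static BigDecimal aggregate(Map<String, Product> region1,
                                List<Event> region2, String eventId) {
        BigDecimal sum = BigDecimal.ZERO;
        for (Event e : region2) {
            if (!eventId.equals(e.eventId())) continue;
            Product p = region1.get(e.productId());
            if (p != null) {
                sum = sum.add(p.price().multiply(BigDecimal.valueOf(p.quantity())));
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Product> region1 = Map.of(
                "P1", new Product(new BigDecimal("2.00"), 5),   // 10.00
                "P2", new Product(new BigDecimal("1.50"), 2));  // 3.00
        List<Event> region2 = List.of(
                new Event("P1", "123"),
                new Event("P2", "123"),
                new Event("P1", "999"));  // different event, skipped
        System.out.println(aggregate(region1, region2, "123")); // prints 13.00
    }
}
```

In an actual server function the two lookups would run against local region data (as Jason suggests, via query.execute with the function context) rather than plain maps, so each member only aggregates its own partition.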