Re: Inputs for efficient querying

2021-02-09 Thread ankit Soni
Can someone please advise how the functionality of a "BETWEEN" operator can be
achieved in Geode OQL (for Date fields)?

Thanks
Ankit

On Tue, Feb 9, 2021, 11:53 PM ankit Soni  wrote:

> Thanks Dan for your input. I am able to try this at my end and it's
> working as expected.
>
> As a next step I need to support somewhat more complex queries, so I updated
> ValueObject as follows:
>
> public class ValueObject implements PdxSerializable {
> private static final long serialVersionUID = -754645299372860596L;
> private int versionId;   //1 for latest record; 0 for previous latest
> private String date;
> private String col_1;
> private String col_2;
> private String col_3;
> private String type;
> private Map map;
>
> public ValueObject() {
> }
>
> *Region* : *Key:* random-string and *Value:*
> ValueObject
>
> I need to support queries that fetch columns from the *latest record (the
> one whose versionId is the maximum)*, with filters and aggregation, like
> "SELECT date, col_1, col_2, col_3   <--- must be fetched from a
> row where MAX(versionId)
>  FROM /data-region d
>  WHERE d.type='t1'
>  AND d.date BETWEEN '2021-01-11' AND '2021-01-12'
>  AND d.versionId BETWEEN 0 AND 1
>  AND d.col_1 IN SET ('11', '22')
>  GROUP BY d.col_1"
>
> *Team, kindly advise on the following:*
> 1. How can I form an OQL query (syntax) to fetch the latest row based on
> MAX(versionId)?
> 2. It seems *BETWEEN* is not supported; how can this be achieved?
> 3. What is the *recommended index creation here* for this query
> to perform well?
> 4. Any recommendation for the key? Currently it's a random string.
>
> Any suggestions on the above would be really helpful.
>
> Thank you
> Ankit.
>
> On Fri, 29 Jan 2021 at 23:23, Dan Smith  wrote:
>
>> For the best performance, you should store column2 as a Java Map instead
>> of a String which contains a JSON document. If column2 was a Map<String,
>> String>, you could do a query like this:
>>
>>
>> SELECT * FROM /exampleRegion r WHERE r.column2['k1'] IN SET('v10', 'v15',
>> 'v7')
>>
>> You can create an index on the map to optimize this sort of query
>>
>> gfsh>create index --name="IndexName" --expression="r.column2[*]"
>> --region="/exampleRegion r"
>>
>> This page might be helpful
>>
>>
>> https://geode.apache.org/docs/guide/112/developing/query_index/creating_map_indexes.html
>>
>> In addition, I noticed that your value implements Serializable. You will
>> get better performance out of the query engine if you configure PDX
>> serialization for your object, either by configuring the auto serializer or
>> implementing PdxSerializable. That avoids the need to deserialize your
>> entire value on the server to query/index it.
>>
>> -Dan
>>
>>
>> 
>> From: ankit Soni 
>> Sent: Friday, January 29, 2021 9:32 AM
>> To: dev@geode.apache.org 
>> Subject: Inputs for efficient querying
>>
>> Hello Team,
>>
>> I am loading data into Geode (V 1.12) with the following *Key (of type
>> String)* and *value (custom java object - ValueObject)*.
>>
>> public class ValueObject implements Serializable {
>>   private int id;
>>   private String keyColumn;   // <- Region key
>>   private String column_2;    // <- JSON document
>>   private String column_3;
>>   private String column_4;
>>   // few more String-type members
>> }
>>
>> *keyColumn* is a normal string of around 8 chars, like "12345678",
>> "23456789" etc.
>>
>> *In ValueObject, column_2 is of type String and holds a valid JSON
>> document, as below:*
>> {"k1" : "v1", "k3" : "v3", "k6" : "v6", "k7" : "v7", *"k10" : "v7"*, "k12"
>> : "v12", "k13" : "v13"}
>> {"k2" : "v2", "k3" : "v3", "k4" : "v4", "k6" : "v6", *"k10" : "v10"*,
>> "k13"
>> : "v13", "k14" : "v14"}
>> {"k1" : "v1", "k2" : "v2", "k6" : "v6", "k8" : "v8", "k10" : "v7", "k12" :
>> "v12", "k13" : "v13", "k14" : "v14"}
>> .
>>
>> After storing the data in Geode, I need to run the following two queries.
>> *Queries to be supported:*
>>
>>
>> *Q1. //query with filter on keyColumn*
>> "select d.keyColumn, d.column_2, d.column_3, d.column_4
>> from /DATA_REGION.keyset key
>> where (key IN SET('12345678', '23456789', '34567890'))"
>>
>>
>> *Q2. //query with filter on a column_2 attribute, something like "where
>> d.column_2.k10 IN SET('v10', 'v15', 'v7')"*
>> "select d.keyColumn, d.column_2, d.column_3, d.column_4
>> from /DATA_REGION v
>> where v.column_2.k10 IN SET('v10', 'v15', 'v7')"
>>
>> I am able to run Q1 but am not sure *how to achieve Q2 (how to form an OQL
>> query for this case)*.
>>
>> I request the team's help: how can I efficiently form and execute the above
>> kinds of queries with Geode OQL?
>>
>> Please also advise what kinds of indexes are recommended to get higher query
>> performance for the above queries.
>>
>> Thanks
>> Ankit.
>>
>
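
On the BETWEEN question above, a minimal sketch of one possible workaround: express the range as a pair of >= and <= comparisons. This assumes the OQL engine accepts those operators on the date field and that dates are stored as ISO-8601 strings (yyyy-MM-dd), in which case lexicographic order matches chronological order. The names OqlRange, between, inRange and sameOrder are hypothetical, and the query string is only built and printed here, not run against a Geode cluster.

```java
import java.time.LocalDate;

/** Sketch: a BETWEEN-style filter written as two inclusive range comparisons. */
final class OqlRange {

    /** Builds "(field >= $lo AND field <= $hi)" with numbered bind-parameter placeholders. */
    static String between(String field, int loParam, int hiParam) {
        return "(" + field + " >= $" + loParam + " AND " + field + " <= $" + hiParam + ")";
    }

    /** The same inclusive range check in plain Java, on ISO-8601 date strings. */
    static boolean inRange(String isoDate, String lo, String hi) {
        return isoDate.compareTo(lo) >= 0 && isoDate.compareTo(hi) <= 0;
    }

    /** Checks that string order and calendar order agree for two ISO-8601 dates. */
    static boolean sameOrder(String a, String b) {
        int byString = Integer.signum(a.compareTo(b));
        int byDate = Integer.signum(LocalDate.parse(a).compareTo(LocalDate.parse(b)));
        return byString == byDate;
    }

    public static void main(String[] args) {
        // Region and field names are taken from the question; $1 and $2 would be
        // supplied as bind parameters when the query is executed.
        String query = "SELECT d.date, d.col_1 FROM /data-region d WHERE d.type = 't1' AND "
            + between("d.date", 1, 2);
        System.out.println(query);
        System.out.println(inRange("2021-01-11", "2021-01-11", "2021-01-12"));
        System.out.println(sameOrder("2020-12-31", "2021-01-01"));
    }
}
```

If the field were a java.util.Date rather than a String, the same two-comparison shape would apply, with Date values passed as the bind parameters instead of strings.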


I propose to remove ProductUseLog from Geode

2021-02-09 Thread Bruce Schuchardt
This is a little-known class that creates a file in a Locator’s directory that 
logs who is in the cluster and a bit about the load on servers.  These 
artifacts have never been useful for debugging problems and there’s no reason 
to keep the log.  All of the information it provides is already in stats and 
the regular logs.

Here’s an example of the server information it provides:

[info 2021/02/02 04:11:24.491 PST :41003 shared ordered uid=10 
port=45440> tid=0x5f] server count: 5 connected client count: 4 client 
subscription queue count: 40
  current servers : bilbo(server1_host1_19897:19897):41001 
bilbo(server2_host1_19901:19901):41002 
bilbo(server3_host1_19907:19907):41004 
bilbo(server4_host1_19918:19918):41003 
bilbo(server5_host1_19941:19941):41005

Does anyone object to my removing it?


Re: Geode Quarterly Report DRAFT for your review

2021-02-09 Thread Karen Miller
I've filed the Geode report.  Thanks, Dave!
Karen Miller
I work for VMware.
This email is written in my capacity as Chair of the Apache Geode PMC.

On Mon, Feb 8, 2021 at 6:47 PM Dave Barnes  wrote:
>
> The numbers come from Apache - I'm not sure how they count 'em up. It was
> our practice for the first years of opensourcehood to appoint people to
> both roles, but that's no longer automatic.
>  I'm going with 110 / 53 ~= 2. Not perfectly in tune, but close enough for
> rock'n'roll.
>
> On Mon, Feb 8, 2021 at 4:17 PM Mark Hanson  wrote:
>
> > Hi Dave,
> >
> > Does the 110 committers number include PMC members? If so, that 2:1
> > doesn't line up. It would be more like 1:1.
> >
> > Thanks,
> > Mark
> >
> > On 2/8/21, 12:13 PM, "Dave Barnes"  wrote:
> >
> > Revised Committer-to-PMC ratio to 2:1 (thanks, Karen)
> >
> > On Mon, Feb 8, 2021 at 12:09 PM Dave Barnes 
> > wrote:
> >
> > > Please respond by noon (PT) Tuesday. Thanks!
> > >
> > > ## Description:
> > >
> > > > The mission of Apache Geode is the creation and maintenance of software
> > > > related to a data management platform that provides real-time,
> > > > consistent access to data-intensive applications throughout widely
> > > > distributed cloud architectures.
> > >
> > >
> > > ## Issues:
> > >
> > > There are no issues requiring board attention.
> > >
> > >
> > > ## Membership Data:
> > >
> > > Apache Geode was founded 2016-11-15 (4 years ago)
> > >
> > > > There are currently 110 committers and 53 PMC members in this project.
> > >
> > > The Committer-to-PMC ratio is roughly 7:4.
> > >
> > >
> > > Community changes, past quarter:
> > >
> > > > - No new PMC members. One member resigned due to retirement (but
> > > > remains a committer).
> > > >
> > > > - No new committers. Two candidates have been proposed and discussed,
> > > > and are in the voting process.
> > >
> > >
> > > ## Project Activity:
> > >
> > > - Apache Geode v1.13.1 was released on 2020-11-18.
> > >
> > > - We're actively working on v1.14, which will contain many bug fixes.
> > >
> > > - PMC Member Barry Oglesby published two articles this quarter:
> > >
> > >   - ["Calculating Apache Geode GatewaySender Event Queue,
> > Transmission
> > > and Processing Times"](
> > >
> > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmedium.com%2Fswlh%2Fcalculating-apache-geode-gatewaysender-event-queue-transmission-and-p%2Fdata=04%7C01%7Chansonm%40vmware.com%7C3bf16ee091fc47b456ed08d8cc6df06f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637484119850376975%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=obCBhMTpwpEzmcj239Ju5MCgld5eyMm3XoQpZ0m4zVU%3Dreserved=0
> > >
> > > rocessing-times-c39839bd45a7) in November, 2020.
> > >
> > >   - ["Calculating the Size of an Apache Geode GatewaySender Queue"](
> > >
> > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmedium.com%2F%40boglesby_2508%2Fcalculating-the-size-of-an-apache-geode-gatewaysender-queue-7c41e2f6ba83data=04%7C01%7Chansonm%40vmware.com%7C3bf16ee091fc47b456ed08d8cc6df06f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637484119850376975%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000sdata=6gANHVqIfSg0jv6dPRPk1FLhJ8aFm2lg1DSbyN89FRk%3Dreserved=0
> > )
> > > in January,\
> > >
> > >  2021.
> > >
> > >
> > > ## Community Health:
> > >
> > > - 211 issues opened in JIRA, past quarter (-26%)
> > >
> > > - 203 issues closed in JIRA, past quarter (-15%)
> > >
> > > > - 416 commits in the past quarter (-21%)
> > >
> > > - 56 code contributors in the past quarter (-1%)
> > >
> > > - 322 PRs opened on GitHub, past quarter (-1%)
> > >
> > > - 321 PRs closed on GitHub, past quarter (-1%)
> > >
> >
> >


[DISCUSS] Cutting a Geode 1.14.0 release branch

2021-02-09 Thread Alexander Murmann
Hi everyone,

We aren't seeing many issues that would prevent a 1.14.0 release on our blocker 
board, and all issues have an owner. No new issues seem to be coming in either. This 
seems like a good time to finally cut a 1.14 release branch and get us on track 
to ship Apache Geode 1.14.0 in early March.

I propose we cut the branch at the end of this week and aim to ship 4 weeks 
later, on March 12th.


Re: Inputs for efficient querying

2021-02-09 Thread ankit Soni
Thanks Dan for your input. I am able to try this at my end and it's working
as expected.

As a next step I need to support somewhat more complex queries, so I updated
ValueObject as follows:

public class ValueObject implements PdxSerializable {
private static final long serialVersionUID = -754645299372860596L;
private int versionId;   //1 for latest record; 0 for previous latest
private String date;
private String col_1;
private String col_2;
private String col_3;
private String type;
private Map map;

public ValueObject() {
}

*Region* : *Key:* random-string and *Value:*
ValueObject

I need to support queries that fetch columns from the *latest record (the one
whose versionId is the maximum)*, with filters and aggregation, like
"SELECT date, col_1, col_2, col_3   <--- must be fetched from a row
where MAX(versionId)
 FROM /data-region d
 WHERE d.type='t1'
 AND d.date BETWEEN '2021-01-11' AND '2021-01-12'
 AND d.versionId BETWEEN 0 AND 1
 AND d.col_1 IN SET ('11', '22')
 GROUP BY d.col_1"

*Team, kindly advise on the following:*
1. How can I form an OQL query (syntax) to fetch the latest row based on
MAX(versionId)?
2. It seems *BETWEEN* is not supported; how can this be achieved?
3. What is the *recommended index creation here* for this query to
perform well?
4. Any recommendation for the key? Currently it's a random string.

Any suggestions on the above would be really helpful.

Thank you
Ankit.

On Fri, 29 Jan 2021 at 23:23, Dan Smith  wrote:

> For the best performance, you should store column2 as a Java Map instead
> of a String which contains a JSON document. If column2 was a Map<String,
> String>, you could do a query like this:
>
>
> SELECT * FROM /exampleRegion r WHERE r.column2['k1'] IN SET('v10', 'v15',
> 'v7')
>
> You can create an index on the map to optimize this sort of query
>
> gfsh>create index --name="IndexName" --expression="r.column2[*]"
> --region="/exampleRegion r"
>
> This page might be helpful
>
>
> https://geode.apache.org/docs/guide/112/developing/query_index/creating_map_indexes.html
>
> In addition, I noticed that your value implements Serializable. You will
> get better performance out of the query engine if you configure PDX
> serialization for your object, either by configuring the auto serializer or
> implementing PdxSerializable. That avoids the need to deserialize your
> entire value on the server to query/index it.
>
> -Dan
>
>
> 
> From: ankit Soni 
> Sent: Friday, January 29, 2021 9:32 AM
> To: dev@geode.apache.org 
> Subject: Inputs for efficient querying
>
> Hello Team,
>
> I am loading data into Geode (V 1.12) with the following *Key (of type
> String)* and *value (custom java object - ValueObject)*.
>
> public class ValueObject implements Serializable {
>   private int id;
>   private String keyColumn;   // <- Region key
>   private String column_2;    // <- JSON document
>   private String column_3;
>   private String column_4;
>   // few more String-type members
> }
>
> *keyColumn* is a normal string of around 8 chars, like "12345678",
> "23456789" etc.
>
> *In ValueObject, column_2 is of type String and holds a valid JSON
> document, as below:*
> {"k1" : "v1", "k3" : "v3", "k6" : "v6", "k7" : "v7", *"k10" : "v7"*, "k12"
> : "v12", "k13" : "v13"}
> {"k2" : "v2", "k3" : "v3", "k4" : "v4", "k6" : "v6", *"k10" : "v10"*, "k13"
> : "v13", "k14" : "v14"}
> {"k1" : "v1", "k2" : "v2", "k6" : "v6", "k8" : "v8", "k10" : "v7", "k12" :
> "v12", "k13" : "v13", "k14" : "v14"}
> .
>
> After storing the data in Geode, I need to run the following two queries.
> *Queries to be supported:*
>
>
> *Q1. //query with filter on keyColumn*
> "select d.keyColumn, d.column_2, d.column_3, d.column_4
> from /DATA_REGION.keyset key
> where (key IN SET('12345678', '23456789', '34567890'))"
>
>
> *Q2. //query with filter on a column_2 attribute, something like "where
> d.column_2.k10 IN SET('v10', 'v15', 'v7')"*
> "select d.keyColumn, d.column_2, d.column_3, d.column_4
> from /DATA_REGION v
> where v.column_2.k10 IN SET('v10', 'v15', 'v7')"
>
> I am able to run Q1 but am not sure *how to achieve Q2 (how to form an OQL
> query for this case)*.
>
> I request the team's help: how can I efficiently form and execute the above
> kinds of queries with Geode OQL?
>
> Please also advise what kinds of indexes are recommended to get higher query
> performance for the above queries.
>
> Thanks
> Ankit.
>
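
On question 1 in the message above (fetching the row with the highest versionId), a sketch of one possible approach rather than a confirmed OQL idiom. Since the ValueObject comment says versionId is 1 for the latest record, a filter such as d.versionId = 1 may already be enough. Where that does not hold, the filtered rows could be reduced on the client, keeping the row with the highest versionId for each col_1 value. Row and latestByCol1 are hypothetical names, and the sample data is made up for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

/** Sketch: client-side reduction to the latest row per group. */
final class LatestRows {

    /** Hypothetical stand-in for the ValueObject fields used by the query. */
    static final class Row {
        final int versionId;
        final String date;
        final String col1;

        Row(int versionId, String date, String col1) {
            this.versionId = versionId;
            this.date = date;
            this.col1 = col1;
        }
    }

    /** Keeps, for each col1 value, the row with the highest versionId. */
    static Map<String, Row> latestByCol1(List<Row> rows) {
        return rows.stream().collect(Collectors.toMap(
            r -> r.col1,                       // group key
            r -> r,                            // candidate row
            BinaryOperator.maxBy(Comparator.comparingInt((Row r) -> r.versionId)),
            TreeMap::new));                    // sorted keys for stable output
    }

    public static void main(String[] args) {
        // Made-up rows standing in for the results of an already filtered query.
        List<Row> rows = List.of(
            new Row(0, "2021-01-11", "11"),
            new Row(1, "2021-01-12", "11"),
            new Row(1, "2021-01-11", "22"));
        latestByCol1(rows).forEach((key, row) ->
            System.out.println(key + " -> versionId " + row.versionId + ", date " + row.date));
    }
}
```

The trade-off is that every matching row travels to the client before the reduction, so this only makes sense when the WHERE clause already narrows the result set considerably.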