Re: Geode data colocation partitioned regions

Udo Kohlmeyer Fri, 17 Jul 2020 10:36:15 -0700

Hi there Ashish,

I’m sorry if I gave the impression that the example I provided would not hit 
the resolver every time. It will. The only difference is that your example does a 
string operation and mine does not.


I might be wrong (and I’d really have to write my own app to confirm this), 
but the PartitionResolver HAS to be on the server side. I’m not sure about the 
client side, but DEFINITELY on the server side. I think you might be safe when 
it comes to clients that only run functions and don’t have region definitions 
on the client side. But please excuse me if I’m not 100% sure about the 
ones where you do put/get operations. Testing it out would help.

Given your explanation, if you have so many apps that create / read data, 
changing to a different key might be more complicated. All I wanted to point 
out was that, in the example you provided, you do a string manipulation EVERY TIME 
you do a get or put, because the resolver needs to determine which bucket in the 
partitioned region to address. This gets REALLY expensive really quickly.
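
If switching to a different key type is not practical, one possible way to reduce the per-call work is to avoid allocating an array with split() and cut the prefix out directly. The following is only a sketch under assumptions: the class and method names are hypothetical, keys are assumed to look like prefix_rest, and whether this makes a measurable difference would need to be tested.

```java
public class PrefixRouting {
  // Hypothetical helper: returns the part of the key before the first '_',
  // or the whole key when it contains no underscore.
  public static String routingPrefix(String key) {
    int idx = key.indexOf('_');
    return idx < 0 ? key : key.substring(0, idx);
  }

  public static void main(String[] args) {
    // Both keys share the prefix "cust42", so they would get the same routing object.
    System.out.println(routingPrefix("cust42_order7"));
    System.out.println(routingPrefix("cust42_profile"));
  }
}
```

In a resolver, the body of getRoutingObject would then return routingPrefix(key) instead of key.split("_")[0]. The string work still happens on every get and put; it is just cheaper per call.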

Hope this answers your question.

—Udo
On Jul 17, 2020, 10:02 AM -0700, aashish choudhary 
<aashish.choudha...@gmail.com>, wrote:
Thanks Udo. I am also looking at this example, which is implemented the 
same way you explained earlier.

https://www.github.com/apache/geode-examples/tree/develop/colocation%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fgeode_examples%2Fcolocation%2FOrderPartitionResolver.java

I have some more questions; please help to clarify.

Does this partition resolver class need to be present on the classpath of the 
applications that do gets and puts on the region? We actually have 
one set of applications that ingest data into Geode regions and 
another set that only reads data from Geode regions via 
functions.

There is no need to deploy this partition resolver class on the server side?

As you said in your previous email, with the implementation that I shared 
the resolver will be hit on every request, which may degrade performance. So I am 
trying to understand how the example and the code snippet you shared with me 
avoid hitting the resolver every time, and how that is handled internally.

With best regards,
Ashish

On Fri, Jul 17, 2020, 1:33 AM Udo Kohlmeyer 
<u...@vmware.com<mailto:u...@vmware.com>> wrote:
Hi Ashish,

I think (purely from the perspective of not doing an unnecessary String op) a 
complex Key object would be better.

But your way could work as well. I feel you might hit a performance 
wall with the string manipulation. Also, for every get you will hit the 
resolver to determine which bucket to route to. So once again, more performance 
problems (and memory).

—Udo
On Jul 16, 2020, 12:01 PM -0700, aashish choudhary 
<aashish.choudha...@gmail.com<mailto:aashish.choudha...@gmail.com>>, wrote:
Thanks, Udo, for your input. I am planning to do something like this in the 
PartitionResolver implementation.


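import org.apache.geode.cache.EntryOperation;
import org.apache.geode.cache.PartitionResolver;

public class CustomPartitionResolver implements PartitionResolver<String, Object> {
  // Route on the part of the key before the first underscore.
  @Override
  public Object getRoutingObject(EntryOperation<String, Object> opDetails) {
    String key = opDetails.getKey();
    return key.split("_")[0];
  }

  @Override
  public String getName() {
    return getClass().getName();
  }
}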



On Thu, 16 Jul 2020 at 10:56 PM, Udo Kohlmeyer 
<u...@vmware.com<mailto:u...@vmware.com>> wrote:
Hi there Ashish,

I think it is safe to assume that once you change the PartitionResolver 
strategy that you might have to reload the data.

I will not commit to a definitive, “Yes, you have to reload the data and cannot 
load it again from disk” answer, but I think that answer will become 
self-evident when you change the region configuration, as some settings on the 
region cannot be amended after creation.

I don’t know if you have considered this yet, but it sounds like you have some 
“complex” string key that you try to parse for the common part. Have you considered 
using an object like

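import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.geode.DataSerializable;

public class ComplexKey implements DataSerializable {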
  private String commonPartitioningKey;
  private String key;

  public ComplexKey() {}

  public ComplexKey(String commonPartitioningKey, String key) {
    this.commonPartitioningKey = commonPartitioningKey;
    this.key = key;
  }

  @Override
  public int hashCode() {
    return key.hashCode();
  }

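  @Override
  public boolean equals(Object obj) {
    // Guard against null and foreign types before casting.
    if (this == obj) return true;
    if (!(obj instanceof ComplexKey)) return false;
    return this.key.equals(((ComplexKey) obj).key);
  }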

  public Object getCommonPartitioningKey() {
    return commonPartitioningKey;
  }

  public void setCommonPartitioningKey(String commonPartitioningKey) {
    this.commonPartitioningKey = commonPartitioningKey;
  }

  public Object getKey() {
    return key;
  }

  public void setKey(String key) {
    this.key = key;
  }

  @Override
  public void toData(DataOutput out) throws IOException {
    out.writeUTF(commonPartitioningKey);
    out.writeUTF(key);
  }

  @Override
  public void fromData(DataInput in) throws IOException, ClassNotFoundException {
    commonPartitioningKey = in.readUTF();
    key = in.readUTF();
  }
}

Where you can still do a get using the natural key of the object but the 
PartitionResolver can partition according to the partitioningKey. Imo, it just 
cleanly separates the partitioning and natural key logic.
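
A minimal, Geode-free sketch of that separation follows. It uses a trimmed copy of ComplexKey (serialization omitted) and a plain HashMap as a stand-in for a region; the class ComplexKeyRouting and the helper routingObjectFor are hypothetical names used only for illustration. In an actual resolver, the same field read would be the body of getRoutingObject, applied to the key returned by opDetails.getKey().

```java
import java.util.HashMap;
import java.util.Map;

public class ComplexKeyRouting {
  // Trimmed copy of ComplexKey: identity is based on the natural key only.
  public static final class ComplexKey {
    final String commonPartitioningKey;
    final String key;

    public ComplexKey(String commonPartitioningKey, String key) {
      this.commonPartitioningKey = commonPartitioningKey;
      this.key = key;
    }

    @Override
    public int hashCode() {
      return key.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
      return obj instanceof ComplexKey && key.equals(((ComplexKey) obj).key);
    }
  }

  // Hypothetical helper: the routing object is a plain field read, no parsing.
  public static Object routingObjectFor(ComplexKey k) {
    return k.commonPartitioningKey;
  }

  public static void main(String[] args) {
    // HashMap stands in for a region here; this is not Geode code.
    Map<ComplexKey, String> region = new HashMap<>();
    region.put(new ComplexKey("cust42", "order7"), "widget");

    // Lookup succeeds because equals/hashCode use the natural key.
    System.out.println(region.get(new ComplexKey("cust42", "order7")));
    System.out.println(routingObjectFor(new ComplexKey("cust42", "order7")));
  }
}
```

The resolver is still invoked for every operation; the difference is that it returns an existing field instead of splitting a string each time.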

BE AWARE, you should not use PDX serialization for keys, so stick to 
Serializable or DataSerializable.

As for functions, you should see no difference. Colocation just means that the 
same bucket numbers of colocated regions are stored on the same server. What you 
can now use is the notion of “local” data across colocated regions, so you 
don’t need to go across the network to access colocated data. 
Functions can possibly run using local data only, without a 
network hop when they need data from another region. It might improve performance a 
little.
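
To make the "same bucket number" idea concrete, here is a small sketch. It assumes a simplified bucket rule (hash of the routing object modulo the bucket count), which is an illustration and not a copy of Geode's internal code. 113 is Geode's default total-num-buckets; the class name, helper names, and sample keys are hypothetical.

```java
public class BucketDemo {
  // Default number of buckets for a Geode partitioned region.
  static final int TOTAL_BUCKETS = 113;

  // Simplified assumption: bucket id = |hash(routingObject) mod bucket count|.
  public static int bucketFor(Object routingObject) {
    return Math.abs(routingObject.hashCode() % TOTAL_BUCKETS);
  }

  // Routing object as discussed in this thread: the key part before the first '_'.
  public static String routingPrefix(String key) {
    int idx = key.indexOf('_');
    return idx < 0 ? key : key.substring(0, idx);
  }

  public static void main(String[] args) {
    // Keys from two different regions that share the prefix "cust42".
    int customerBucket = bucketFor(routingPrefix("cust42_profile"));
    int orderBucket = bucketFor(routingPrefix("cust42_order7"));

    // Same routing object, hence the same bucket id in both regions.
    System.out.println(customerBucket == orderBucket); // prints "true"
  }
}
```

Because colocated regions keep equal bucket numbers on the same member, a function running against the "cust42" bucket of one region can read the matching entries of the other region locally.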

Anyway, lots of information. Reach out if you get stuck or don’t understand 
something.


—Udo
On Jul 16, 2020, 9:38 AM -0700, aashish choudhary 
<aashish.choudha...@gmail.com<mailto:aashish.choudha...@gmail.com>>, wrote:
Hi,

We are seeing some performance issues with partitioned regions: when we 
execute a data-aware function, some of the calls to other regions inside 
the function go to different nodes for further processing. So we are trying to 
implement data colocation between those regions.

We will be using custom partitioning of data by implementing PartitionResolver 
interface.

Questions

I believe we would need to import/export data again after creating regions with 
colocation. Please confirm.

Our regions have different keys, but the first part of the key (separated by _) 
is common to all regions, so in the class implementing the partition resolver we 
just take the first part of the key for routing. Will this partition the data 
correctly?

Do we need to make any changes when reading data in functions after enabling 
data colocation?


With best regards,
Ashish
--
With Best Regards,
Ashish
