Re: Maximum number of children

2009-01-13 Thread Mahadev Konar
Thanks, Joshua.

mahadev





RE: Maximum number of children

2009-01-13 Thread Joshua Tuberville
Thanks to everyone for the proposed schemes. I created ZOOKEEPER-272 per your
request, Mahadev.

Joshua





Re: Maximum number of children

2009-01-12 Thread Mahadev Konar
I was going to suggest bucketing with predefined hashes:
/root/template/date/hashbucket/hash

Regarding the issue Joshua raised about the length of the server's response --
this is a bug. We allow a node to have any number of children (up to the int
limit), yet the getChildren call then fails to return them. This leads to a
chicken-and-egg problem: you cannot get rid of the children if you cannot
list them.
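
With a bucket level such as the one suggested above
(/root/template/date/hashbucket/hash), yesterday's tree can instead be pruned
bucket by bucket, so no single getChildren reply has to carry millions of
names. A rough sketch against the Java client (the path layout and helper
names here are only illustrative, and it assumes each bucket stays small
enough to list):

import java.util.List;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class DatePruner {

    /** Delete /root/<template>/<date> and everything below it, bucket by bucket. */
    static void pruneDate(ZooKeeper zk, String template, String date)
            throws KeeperException, InterruptedException {
        String datePath = "/root/" + template + "/" + date;
        // The bucket level is small (e.g. 256 fixed buckets), so this list is tiny.
        List<String> buckets = zk.getChildren(datePath, false);
        for (String bucket : buckets) {
            String bucketPath = datePath + "/" + bucket;
            // Each bucket holds only a few thousand hashes, well under the 4 MB limit.
            for (String hash : zk.getChildren(bucketPath, false)) {
                zk.delete(bucketPath + "/" + hash, -1);   // -1 matches any version
            }
            zk.delete(bucketPath, -1);
        }
        zk.delete(datePath, -1);
    }
}

With a few million hashes spread over 256 buckets, each getChildren call
returns only a few thousand names, so no single response comes anywhere near
the 4 MB limit.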

The limit isn't saving us anything here, since the server has already
processed the request and sent us the data. We should get rid of this
hard-coded limit; I am not sure why we added it in the first place.

Can you open a JIRA for this, Joshua?

thanks
mahadev





RE: Maximum number of children

2009-01-12 Thread Stu Hood
To continue with your current design, you could create a trie based on shared 
hash prefixes.

/root/template/date/1a5e67/2b45dc
/root/template/date/1a5e67/3d4a1f
/root/template/date/3d4a1f/1a5e67
/root/template/date/3d4a1f/2b45dc

Alternatively, you could use what the maildir mail storage format uses:
/root/template/date/eh/eharmony.com/jo/joshuatuberville

With the second scheme, just check that all of the characters you allow in
email addresses are also legal in znode names.
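
As one possible reading of the first scheme, the bucket can simply be the
first couple of hex characters of the hash itself. A small sketch of the path
construction (the hash function and helper names are only illustrative):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BucketedPaths {

    /** Hex-encoded MD5 of the (lower-cased) email address. */
    static String emailHash(String email) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(email.toLowerCase().getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** /root/<template>/<date>/<first-two-hex-chars>/<full-hash> */
    static String znodePath(String template, String date, String email)
            throws NoSuchAlgorithmException {
        String hash = emailHash(email);
        String bucket = hash.substring(0, 2);   // 256 fixed buckets per day
        return "/root/" + template + "/" + date + "/" + bucket + "/" + hash;
    }

    public static void main(String[] args) throws Exception {
        // e.g. /root/welcome/2009-01-12/<bucket>/<hash>
        System.out.println(znodePath("welcome", "2009-01-12", "user@example.com"));
    }
}

Since the znode names in the hash-prefix scheme are plain hex, the
character-set caveat really only matters for the maildir-style layout.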

Thanks,
Stu


-Original Message-
From: "Joshua Tuberville" 
Sent: Monday, January 12, 2009 7:53pm
To: "'zookeeper-user@hadoop.apache.org'" 
Subject: Maximum number of children

Hello,

We are attempting to use ZooKeeper to coordinate daily email thresholds.  To do 
this we created a node hierarchy of

/root/template/date/email_hash

The idea is that we send a given template to an email address only once per
day, and this is intended to support millions of email hashes per day. From
the ZooKeeper perspective we simply attempt a create: if it succeeds we
proceed, and if we get a NodeExistsException we stop processing. This has
operated fine for over 2 million email hashes so far in testing. However, we
also want to prune all of the previous days' nodes to conserve memory, and
there we have run into a hard limit with the getChildren method for a given
/root/template/date: if the serialized list of children exceeds the hard-coded
4,194,304-byte limit, ClientCnxn$SendThread.readLength() throws an exception
(line 490). So we cannot delete a node that still has children, and we cannot
list (and therefore delete) the children of a node whose child names total
more than 4 MB.
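
For reference, the create-or-skip check itself is just the following (a
minimal sketch assuming an already-connected ZooKeeper handle and pre-created
parent nodes; the class and method names are only placeholders):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class DailySendGuard {

    private final ZooKeeper zk;

    public DailySendGuard(ZooKeeper zk) {
        this.zk = zk;
    }

    /**
     * Returns true if this is the first send of this template to this email
     * hash today; false if the marker node already exists.
     */
    public boolean tryMarkSent(String template, String date, String emailHash)
            throws KeeperException, InterruptedException {
        String path = "/root/" + template + "/" + date + "/" + emailHash;
        try {
            // Persistent, empty marker node; the create succeeds only once per day.
            zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                    CreateMode.PERSISTENT);
            return true;
        } catch (KeeperException.NodeExistsException e) {
            // Already sent today: skip further processing.
            return false;
        }
    }
}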

Any feedback or guidance is appreciated.

Joshua Tuberville