--
Ted Dunning, CTO
DeepDyve
zookeeper is not really what you would call a scalable system because
all transactions that are updates go through the leader for
serialization. Zookeeper is, instead, a high-throughput HA system.
That said, the throughput of a modest zookeeper cluster is fairly
prodigious, so for the
Chubby and Zookeeper have very different ways of getting to similar
purposes. Chubby is a locking service, while zookeeper is all about
avoiding locks. Zookeeper is better described as a coordination service.
Regarding performance, I am pretty sure that Zookeeper could keep up with
some pretty
There is an earlier thread in the archives on SessionExpired.
Perhaps this might have something to do with what you're seeing.
Cheers,
-n
On Tue, Apr 14, 2009 at 5:48 PM, Ted Dunning ted.dunn...@gmail.com
wrote:
We have been using EC2 as a substrate for our search cluster
Absolutely.
Katta did this, at least initially.
Just spawn a thread and mimic the launching of a Zookeeper server.
On Thu, Apr 16, 2009 at 6:38 AM, David Pollak feeder.of.the.be...@gmail.com
wrote:
Is it possible to start ZooKeeper programmatically from inside my web app?
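Spawning a thread to launch the server, as suggested above, can be sketched like this (a sketch, not production code: it assumes the ZooKeeper server jar is on the web app's classpath and drives the same standalone entry point that zkServer.sh uses):

```java
// A sketch (not production code): run a standalone ZooKeeper server inside
// the host JVM by driving the same entry point the zkServer.sh script uses.
import org.apache.zookeeper.server.ServerConfig;
import org.apache.zookeeper.server.ZooKeeperServerMain;

public class EmbeddedZooKeeper {
    // configPath points at an ordinary zoo.cfg (clientPort, dataDir, tickTime).
    public static Thread startInBackground(final String configPath) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    ServerConfig config = new ServerConfig();
                    config.parse(configPath);
                    // Blocks until the server is shut down.
                    new ZooKeeperServerMain().runFromConfig(config);
                } catch (Exception e) {
                    throw new RuntimeException("embedded zookeeper died", e);
                }
            }
        }, "embedded-zookeeper");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

One caveat, echoed elsewhere on this list: if ZooKeeper lives inside your application, outside observers can no longer use ZooKeeper to judge whether the application itself is up.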
Patrick,
Thanks enormously.
This hasn't helped yet, but that is just because it was a very large bite of
the apple. Once I digest it, I can tell that it will be very helpful.
I did have a chance to look at the stat output and maximum latency was
300ms. How that connects with what you are
I would expect Ben's method to be slightly faster, but they should be
comparable.
And, of course you are correct about rewind. Such are the perils of writing
code in the email program.
On Fri, Apr 24, 2009 at 10:01 AM, Satish Bhatti cthd2...@gmail.com wrote:
... Your approach appears to be
on this that they could share
with us. Also, we don't want to duplicate the effort, so we would appreciate
it if you let us know if anyone is already working on a design proposal for
this feature.
Thanks
Raghu
to be delicate
for other reasons as well.
On Mon, May 4, 2009 at 2:35 PM, Mahadev Konar maha...@yahoo-inc.com wrote:
So, zookeeper would work fine if you are careful with the above, but I would
vote against doing this for production since the above is pretty easy to mess
up.
On Fri, May 8, 2009 at 1:31 PM, Javier Vegas jav...@beboinc.com wrote:
Sorry, what I meant is issuing the new method watchChildren() on the
parent node (basically the same as getChildren() but returning just a
boolean instead of a list of children, because I already know the
paths of the
that do these repeated
mundane tasks for you to handle those use cases where the verbosity of the
API is a hindrance to quality and productivity.
Thanks very much.
I have found a few oversights in the code as well and will post a new
version shortly (with your suggested changes).
On Wed, Jun 3, 2009 at 8:17 AM, Eric Bowman ebow...@boboco.ie wrote:
Ted Dunning wrote:
Please add comments, suggestions and improvements to the JIRA ticket
Isn't the max file size a megabyte?
On Wed, Jun 3, 2009 at 9:01 AM, Eric Bowman ebow...@boboco.ie wrote:
On the client, I see this when trying to write a node with 7,641,662 bytes:
Remember that the patch is almost trivial. Add a configuration option
acceptConnectionsOnlyFromLocalHost, and then in the server connect logic
reject non-localhost attempts (and log a security note).
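The localhost check itself could be a small helper like the following (names here are my own invention to match the proposal; acceptConnectionsOnlyFromLocalHost is the suggested option, not an existing one):

```java
// Sketch of the "reject non-localhost attempts" check: a client address is
// accepted only if it resolves to a loopback address.
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LocalHostFilter {
    // Returns true when the client address is a loopback address, i.e. the
    // connection originated on this host.
    public static boolean isLocalClient(String clientAddress) {
        try {
            return InetAddress.getByName(clientAddress).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false;  // unresolvable addresses are rejected
        }
    }
}
```

The server connect logic would consult this when the (hypothetical) option is set, log a security note, and close the connection on a false result.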
On Tue, Jun 16, 2009 at 2:53 PM, Gustavo Niemeyer gust...@niemeyer.netwrote:
but the stunnel
In general for changes like this, you need to be running more than one
server in a cluster to avoid losing state such as the ephemeral nodes.
I can't say for certain that the 3.1.1 to 3.2 change can be done this way,
but most upgrades can be done by stopping one server at a time, changing
the
I don't think you should be very nervous at all.
There are two questions:
1) can 3.1.1 go to 3.2 with no downtime. This is very likely, but a wiser
head than mine should have final say
2) can 3.1.1 go to 3.2 with 1 minute of downtime. This is for sure.
Neither option involves data loss.
ZK
A rolling update works very well for that. You can also change the number
of nodes in the cluster.
To do this, you replace the config files on the surviving servers and on the
new server.
Then take down the one that is leaving the cluster, and then one by one
restart the servers that will remain.
On Mon, Jul 6, 2009 at 12:58 PM, Gustavo Niemeyer gust...@niemeyer.netwrote:
can make the ZK servers appear a bit less connected. You have to plan
for
ConnectionLoss events.
Interesting.
Note that most of these seem to be related to client issues, especially GC.
If you configure in such
for extremely large queues of pending tasks.
On Fri, Jul 17, 2009 at 1:20 PM, Mahadev Konar maha...@yahoo-inc.comwrote:
Also, are there any performance numbers for zookeeper-based queues? How
does it compare with JMS?
to...@audiencescience.comwrote:
Ted, could you elaborate a bit more on this? I was under the (mis)
impression that each ZK server in an ensemble only needed connectivity
to another member in the ensemble, not to each member in the ensemble.
It sounds like you are saying the latter is true.
the performance of the ensemble, provided
large blobs of traffic were not being sent across the network.
That would be a great way to get really good feedback.
On Thu, Aug 13, 2009 at 4:13 PM, Stefan Groschupf s...@101tec.com wrote:
If we have something clean and stable running we might contribute it back
to the apache zk project.
in receiving
notifications.
Cheers
Avinash
It's been running for about 48 hours.
On Tue, Sep 1, 2009 at 5:12 PM, Ted Dunning ted.dunn...@gmail.com
wrote:
Do you have long GC delays?
On Tue, Sep 1, 2009 at 4:51 PM, Satish Bhatti cthd2...@gmail.com
wrote:
Session timeout is 30 seconds.
On Tue, Sep 1, 2009 at 4
in
mailing list archives, but got nothing helpful.
I need your help, thanks and best regards!
Good points.
On the other hand, it could still be firewall issues.
On Wed, Sep 23, 2009 at 8:30 AM, Benjamin Reed br...@yahoo-inc.com wrote:
The connection refused message, as opposed to no route to host or unknown
host, indicates that zookeeper has not been started on the other machines.
there is a good reason for using this approach, but it is the
first time I have come across this type of non-automatic way of
administrating replicas.
Regards, Orjan
Regards, Orjan
) somewhere that totally ignores that this would reset the
interrupt flag, if e is an InterruptedException. Therefore we'd better avoid
having all of the methods throw that exception.
is back and check if the znode is there. There is no way of
knowing whether it was us who created the node or somebody else, right?
sessionid. As you say, it's highly implementation dependent. It's also
something we recognize is a problem for users, we've slated it for 3.3.0
http://issues.apache.org/jira/browse/ZOOKEEPER-22
but that
is
not exposed.
Rob Baccus
425-201-3812
, I know it makes more sense to
run an odd number of zookeeper nodes but I just want to make sure it works
first). Any suggestions?
a delay and restarting it on the same port. But the server doesn't start
up. When I re-start on a different port, it starts up correctly.
Can you let me know how I can make this one work?
Thank you.
Regards,
Siddharth
/24/09 4:18 PM, Hamoun gh hamoun...@gmail.com wrote:
I am looking for the zookeeper viewer. It seems the link is broken. Can
somebody please help?
Thank you,
Hamoun Ghanbari
in the future to do this?
TIA
A
not restarting. Start/Stop the new/old process and then
start
a round of consensus for adding/removing a machine. I guess if one can do
that then there is no stopping of processes required. Am I missing something
here?
A
On Thu, Nov 5, 2009 at 11:14 AM, Ted Dunning ted.dunn...@gmail.com
wrote
the experience there? Are there more
timeouts, leader re-elections, etc? Thanks,
Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099
jun...@almaden.ibm.com
Ted Dunning ted.dunn...@gmail.com wrote on 11/09/2009 04:24:16 PM, Re: ZK on EC2, to zookeeper-user:
Worked pretty well for me. We did extend all of our timeouts. The
biggest
in the wiki page on say, EC2
small/large nodes? I'd do it myself but I've not used ec2. If anyone could
try these and report I'd appreciate it.
Patrick
Ted Dunning wrote:
Worked pretty well for me. We did extend all of our timeouts. The
biggest
worry for us was timeouts on the client side
to get
double what I got for incoming transfer.
On Mon, Nov 9, 2009 at 9:47 PM, Patrick Hunt ph...@apache.org wrote:
Could you test networking - scping data between hosts? (I was seeing
64.1MB/s for a 512mb file - the one created by dd, random data)
on the wiki for others interested in running in EC2.
collector?
Patrick
Ted Dunning wrote:
The server side is a fairly standard (but old) config:
tickTime=2000
dataDir=/home/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
Most of our clients now use 5 seconds as the timeout, but I think that we
went to longer timeouts in the past. Without
13:06:39 -0600 (Wed, 18 Nov 2009) | 1 line
ZOOKEEPER-368. Observers: core functionality (henry robinson via mahadev)
Sweet! Congratulations, and thanks Henry.
--
Gustavo Niemeyer
http://niemeyer.net
?
Solr now uses it, as does Avro I believe, and other parts of Hadoop.
-Yonik
http://www.lucidimagination.com
).
Well, which limits the throughput first, the disk IO or the network?
Thanks for your quick response. I'm studying Zookeeper in my master's
thesis, for coordinating distributed index structures.
only an idea. The world is changing to SSDs too!
, but with a database, I would wonder if there are others.
On Tue, Jan 19, 2010 at 11:30 PM, xeoshow xeos...@gmail.com wrote:
I am wondering whether this monitor part can be replaced by zookeeper,
using a zookeeper watch or something else?
Take a look here at the recipes:
http://hadoop.apache.org/zookeeper/docs/r3.0.0/recipes.html
On Wed, Jan 20, 2010 at 12:15 AM, xeoshow xeos...@gmail.com wrote:
Ted, thank you very much for your reply. I think A will exit and so ZK can
help ..
Not sure if any further link can help on how to
processing the corresponding task (if something goes wrong, just
kill itself and the node will be gone)
if not, we go back to wait for watcher.
Will this work?
according to Zab's FIFO nature... just want to hear some
clarification about it.
Thanks a lot!
--
With Regards!
Ye, Qian
Made in Zhejiang University
!?
Thanks for any help.
Cheers,
Michael
--
Michael Bauland
michael.baul...@knipp.de
bauland.tel
For example, hardware misconfiguration - a NIC caused one system to
basically work, but with huge numbers of connection losses, especially
whenever there was load (and I've seen this particular issue twice now).
On Thu, Feb 4, 2010 at 2:20 PM, Yonik Seeley yo...@lucidimagination.comwrote:
There's no way to hand over responsibility for an ephemeral znode, right?
Right.
We have solr nodes create ephemeral znodes (name based on host and port).
The ephemeral znode takes some time to remove of course,
, Patrick Hunt ph...@apache.org wrote:
Ah, excellent idea [jvm shutdownhooks], won't always work but may help. I
think in this case (ephemerals) all Yonik would need to do is close the
session. That will remove all ephemerals.
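The close-the-session idea can be sketched like so (a sketch under assumed names, requiring a live ensemble and the org.apache.zookeeper jar; the empty watcher and the paths are placeholders):

```java
// Sketch: register an ephemeral znode and close the session on clean JVM
// shutdown so the znode disappears immediately rather than at session expiry.
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class EphemeralRegistration {
    public static ZooKeeper register(String connectString, String path, byte[] data)
            throws Exception {
        // 30s session timeout; the watcher ignores events for brevity.
        final ZooKeeper zk = new ZooKeeper(connectString, 30000, event -> { });
        zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        // Closing the session removes all of its ephemerals promptly.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try { zk.close(); } catch (InterruptedException ignored) { }
        }));
        return zk;
    }
}
```

As noted above, a shutdown hook won't always run (e.g. on kill -9), in which case the ephemeral lingers until the session times out.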
https://launchpad.net/~ttx/+archive/ppa
This is a Personal Package Archive at the moment, but these packages
may end up being promoted depending on how relevant they are.
Please let me know if these work or do not work for you.
Not sure this helps at all, but these times are remarkably asymmetrical. I
would expect members of a ZK cluster to have very comparable times.
Additionally, 345 ms is nowhere near large enough to cause a session to
expire. My take is that ZK doesn't think it caused the timeout.
On Mon, Feb
for a relatively short time (1 second on average), and by the time I have
blundered through all the possible locks, ids that were locked at the start
might be available by the time I finished.
No feature, but it does sound interesting. Are there any tools that allow
one to set up slow pipes a la stunnel, but here for latency rather than
encryption? I believe freebsd has this feature at the os (firewall?) level;
I don't know if linux does.
Waite waite@googlemail.comwrote:
I really do not follow the delegator approach. Is this something I would
patch into Zookeeper ? Or the client ?
locks to keep the size of the lock
table
small.
The trouble with managing these locks in a database is that the tables are
getting hot and becoming one of the main sources of contention. Also,
SQL
is not necessarily fast for doing the required updates.
to ensure
the node FN was up-to-date - assuming I do not know if I am connected to
a
primary ZK instance ? Would 10K sync calls within a 2 minute period be
excessive ?
That is one of the strengths of ZK. Your client would do this:
1) create node, if success client has lock
2) get current node (you get the current version when you do this); if the
lease is current and ours, we have the lock; if the lease is current and not
ours, we have failed to get the lock
3) try to
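The steps above can be sketched in client code like this (my own construction and naming, not a standard recipe; it assumes a live ensemble, stores the owner id in the znode data, and uses the znode's mtime as the lease start):

```java
// Sketch of the lease-style lock described above. Requires a running
// ensemble and the org.apache.zookeeper client jar.
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class LeaseLock {
    // Returns true if we hold the lock after this attempt.
    public static boolean tryAcquire(ZooKeeper zk, String path, byte[] ourId,
                                     long leaseMillis) throws Exception {
        try {
            // Step 1: create the node; success means we hold the lock.
            zk.create(path, ourId, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            return true;
        } catch (KeeperException.NodeExistsException e) {
            // Step 2: node exists; read it (the Stat carries the version).
            Stat stat = new Stat();
            byte[] holder = zk.getData(path, false, stat);
            boolean leaseCurrent =
                System.currentTimeMillis() - stat.getMtime() < leaseMillis;
            if (leaseCurrent) {
                // Current and ours => locked; current and not ours => failed.
                return java.util.Arrays.equals(holder, ourId);
            }
            // Step 3: lease expired; try a conditional delete, then retry.
            try {
                zk.delete(path, stat.getVersion());
            } catch (KeeperException.BadVersionException
                   | KeeperException.NoNodeException e2) {
                return false;  // somebody else got there first
            }
            return tryAcquire(zk, path, ourId, leaseMillis);
        }
    }
}
```

The conditional delete on the version read in step 2 is what keeps the takeover race-free: if another client renewed or took the lock in the meantime, the delete fails and we back off.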
What other examples are you looking for?
On Tue, Mar 2, 2010 at 1:04 PM, David Rosenstrauch dar...@darose.netwrote:
Is there a library of higher-level zookeeper utilities that people have
contributed, beyond the barrier and queue examples provided in the docs?
Am I taking Zookeeper out of its application
domain and just asking for trouble?
Your network admin is correct. Multicast often doesn't work.
ZK does not use multicast at the network level. Where events or
notifications must go to many places (that SOUNDS like multicast, I know) it
uses very standard TCP connections.
For almost any known modern network, ZK should be just
If you can stand the latency for updates then zk should work well for
you. It is unlikely that you will be able to do better than zk does and
still maintain correctness.
Do note that you can probably bias a client to use a local server.
That should make things more efficient.
I have used 5 and 3 in different clusters. Moderate amounts of sharing is
reasonable, but sharing with less intensive applications is definitely
better. Sharing with the job tracker, for instance is likely fine since it
doesn't abuse disk so much. The namenode is similar, but not quite as
nice.
Your understanding is correct. But if you set a heap size nearly as big as
your physical memory (or larger) then java may allocate that heap which will
cause swapping.
So swapping is definitely done by the OS, but it is the applications like
Java that can cause the OS to do it.
On Mon, Mar 15,
I don't think that you have considered the impact of ordered updates here.
On Mon, Mar 15, 2010 at 6:19 PM, Maxime Caron maxime.ca...@gmail.comwrote:
So this is all about the operation log, so if a node is in the minority but
has a more recent committed value, this node has a veto over the others
I like to say that the cost of "now" goes up dramatically with diameter.
On Mon, Mar 15, 2010 at 7:50 PM, Henry Robinson he...@cloudera.com wrote:
There is
a fundamental tension between synchronicity of updates and scale.
Hmm... this inspires me to have a thought as well.
Łukasz, there isn't any fancy network stuff going on here is there? No
NATing or fancy load balancing or reassignment of IP addresses of servers,
right?
On Tue, Mar 16, 2010 at 4:51 PM, Patrick Hunt ph...@apache.org wrote:
It will be good to
This kind of sounds strange to me.
My typical idiom is to create a watcher but not retain any references to it
outside the client. It sounds to me like your change will cause my watchers
to be collected and deactivated when GC happens.
On Thu, Mar 18, 2010 at 3:32 AM, Dominic Williams
This is not a good thing. ZK gains lots of its power and reliability by not
trying to do atomic updates to multiple znodes at once.
Can you say more about the update that you want to do? It is common for
updates like this to be such that you can order the updates and do without a
truly atomic
I perhaps should not have said power, except insofar as ZK's strengths are
in reliability which derives from simplicity.
There are essentially two common ways to implement multi-node update. The
first is the traditional db style with begin-transaction paired with either a
commit or a rollback
as a whole.
2010-03-30
Will
From: Ted Dunning ted.dunn...@gmail.com
Sent: 2010-03-30 10:11
Subject: Re: How to ensure transaction create-and-update
To: zookeeper-user@hadoop.apache.org
This is not a good thing. ZK gains lots of its power and reliability by
not
trying to do atomic updates
As usual, Ben says better what I was trying to say.
Henry's point that a very limited multi-update would be useful is also true,
though. If somebody can come up with a way to do that without making things
unreasonably complicated, it would be really nice to have.
In the meantime, I will try to
Suppose a machine has probability of soft-failure p_1 and catastrophic
failure p_2 << p_1. Assume that two machines have independent failure modes.
Probably of soft failure of a one machine cluster = p_1, two machine cluster
= probability of soft failure of 1 or 2 machines + probability of one
machine
As I pointed out in my response, you should distinguish hard and soft
failures. If one machine fails even catastrophically, you can provide a new
machine to replace it, thus converting a hard failure into a soft one.
The conclusion is the same. Three machines is vastly better than one or
two.
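Filling in the arithmetic (a worked example of my own, not from the thread): for a majority-quorum ensemble of n machines, each failing independently with probability p, the ensemble is down when more than n - (n/2 + 1) machines fail. A small calculation shows why three beats one or two:

```java
// Probability that a majority-quorum ensemble of n machines is unavailable,
// given that each machine fails independently with probability p.
public class QuorumAvailability {
    // n choose k, computed in doubles to avoid overflow for small n.
    static double binomial(int n, int k) {
        double c = 1.0;
        for (int i = 0; i < k; i++) c = c * (n - i) / (i + 1);
        return c;
    }

    public static double pDown(int n, double p) {
        int quorum = n / 2 + 1;
        int tolerable = n - quorum;   // failures the ensemble survives
        double down = 0.0;
        for (int k = tolerable + 1; k <= n; k++) {
            down += binomial(n, k) * Math.pow(p, k) * Math.pow(1 - p, n - k);
        }
        return down;
    }
}
```

With p = 0.01: a single machine is down with probability 0.01; a two-machine ensemble needs both up (about 0.02, worse than one); a three-machine ensemble needs two failures (about 0.0003, vastly better).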
On Thu, Apr 1, 2010 at 7:27 PM, li li liqiyuan...@gmail.com wrote:
Now I can handle about 300 clients with one server, when I set the
session timeout to 3.
In your opinion, which session timeout value is more suitable?
5-30 seconds is a much more typical value.
We have just done an upgrade of ZK to 3.3.0. Previous to this, ZK has been
up for about a year with no problems.
On two nodes, we killed the previous instance and started the 3.3.0
instance. The first node was a follower and the second a leader.
All went according to plan and no clients seemed
I can't comment on the details of your code (but I have run in-process ZK's
in the past without problem)
Operationally, however, this isn't a great idea. The problem is two-fold:
a) firstly, somebody would probably like to look at Zookeeper to understand
the state of your service. If the
It is, of course, your decision, but a key coordination function is to
determine whether your application is up or not. That is very hard to do if
Zookeeper is inside your application.
On Fri, Apr 23, 2010 at 10:28 AM, Asankha C. Perera asan...@apache.orgwrote:
However, I believe that both the
The general way to do this is either
a) have lots of watchers who all try to create a single file when a watched
file changes. This is very simple to code, but leads to a lot of
notifications when you have thousands of watchers.
b) arrange the watchers in a chain. This is similar to the
Lei,
A contrary question for you is why you don't just share zk sessions within a
single process.
On Tue, Apr 27, 2010 at 5:17 PM, Lei Zhang lzvoya...@gmail.com wrote:
I am
in the process of changing to each thread of each daemon maintaining a zk
session. That means we will hit this 10
In general, the guarantee is that B will do exactly as you say: it will
read the new value or the old value. Your question depends on a definition
of "now" that spans several machines. That is a dangerous concept and if
your reasoning requires it, you are headed for trouble.
On Thu, Apr 29,
, this is my browser homepage ;-)
http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
Patrick
On 04/29/2010 09:14 AM, Ted Dunning wrote:
In general, the guarantee is that B will do exactly as you say: it will
read the new value or the old value. Your question depends
and Slave(s) are broken while all other connections are still alive,
would
my system hang after some point? Because no new leader election will be
initiated by slaves and the leader can't get the work to slave(s).
Thanks,
Lei
On 4/30/10 1:54 PM, Ted Dunning ted.dunn...@gmail.com
This is used as part of katta where it gets a fair bit of exercise at low
update rates with small data. It is used for managing the state of the
search cluster.
I don't think it has had much external review or use for purposes apart from
katta. Katta generally has pretty decent code, though.
I don't think that zk is hard to get right.
What is hard is to layer a very different model on top of ZK that changes
the semantics significantly and then to get that translation right.
One of the very cool things about ZK is how easy it is to write correct
code. I know that Ben and co put a lot of
, 2010 at 2:21 PM, Ted Dunning ted.dunn...@gmail.com
wrote:
In general, writing this sort of layer on top of ZK is very, very hard
to
get really right for general use. In a simple use-case, you can
probably
nail it but distributed systems are a Zoo, to coin a phrase. The
problem
Impressive number here, especially at your quoted few per second rate.
Are you sure that you haven't inadvertently synchronized GC on multiple
machines?
On Wed, May 12, 2010 at 8:30 PM, Aaron Crow dirtyvagab...@yahoo.com wrote:
Right now we're at
1.9 million. This isn't a bug of our
Yes. That is roughly what I mean.
If one server starts a GC, it can effectively go offline. That might
pressure the other servers enough that one of them starts a GC.
This is unlikely with your GC settings, but you should turn on the verbose
GC logging to be sure.
On Wed, May 12, 2010 at
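For reference, verbose GC logging on the JVMs of that era (pre-Java 9 HotSpot; flag names worth double-checking against your JVM version) can be enabled with something like:

```
java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
     -Xloggc:/var/log/zookeeper-gc.log ...
```

The timestamps in the resulting log make it easy to correlate collection pauses across servers and spot the synchronized-GC pattern described above.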
You may actually be swapping. That can be even worse than GC!
On Fri, May 21, 2010 at 11:32 AM, Stephen Green eelstretch...@gmail.comwrote:
Right. The system can be very memory-intensive, but at the time these
are occurring, it's not under a really heavy load, and there's plenty
of heap
Which version of maven do you have?
I have heard some versions don't follow redirects well. You can try
deleting these defective files in your local repository under .m2 and try
again. You may need to try with a newer maven to get things right.
Another option is to explicitly remove those
The only one that I think is important is the jmx which enables monitoring
of the servers.
On Mon, May 24, 2010 at 2:51 PM, Jack Orenstein j...@akiban.com wrote:
This at least gets me through the build/install phase. My usage of
zookeeper is pretty minimal right now -- just on a single node.
Same version I use.
On Mon, May 24, 2010 at 2:51 PM, Jack Orenstein j...@akiban.com wrote:
Ted Dunning wrote:
Which version of maven do you have?
2.2.1.
This looks a bit like a small bobble we had when upgrading a bit ago.
I THINK that the answer here is to mind-wipe the misbehaving node and have
it resynch from scratch from the other nodes.
Wait for confirmation from somebody real.
On Wed, Jun 2, 2010 at 11:11 AM, Charity Majors
I knew Patrick would remember to add an important detail.
On Wed, Jun 2, 2010 at 11:49 AM, Patrick Hunt ph...@apache.org wrote:
As Ted suggested you can remove the datadir -- *only on the affected
server* -- and then restart it.