I understand that no strategy will work perfectly in all circumstances;
we just need better documentation so developers can make correct
assumptions. Previously I assumed that delivery of the session expiration
event and the disappearance of the ephemeral nodes would occur together -
not at exactly the same time, but within some bounded time frame...
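
In the meantime, here is a rough sketch of the defensive handling Dave
suggests below, against the ZooKeeper Java client. To be clear, the class
name SelfFencingWatcher, the fenceDelayMs value, and the onSelfFence
callback are placeholders of my own, not part of the ZooKeeper API: on
Disconnected the client arms a local timer a bit longer than the session
timeout, and if it has not reconnected by the time the timer fires it
fences itself, so a zombie cannot keep working even if the Expired event
is never delivered.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

// Hypothetical sketch: do not trust that an Expired event will ever arrive.
// On Disconnected, start a local timer; if still disconnected when it fires,
// fence ourselves as if the session had expired.
public class SelfFencingWatcher implements Watcher {

    private final long fenceDelayMs;        // session timeout plus some slack (assumption)
    private final Runnable onSelfFence;     // application callback: stop writing, exit, etc.
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> pendingFence;

    public SelfFencingWatcher(long fenceDelayMs, Runnable onSelfFence) {
        this.fenceDelayMs = fenceDelayMs;
        this.onSelfFence = onSelfFence;
    }

    @Override
    public synchronized void process(WatchedEvent event) {
        switch (event.getState()) {
            case Disconnected:
                // We may never reconnect, so Expired may never be delivered.
                if (pendingFence == null) {
                    pendingFence = timer.schedule(onSelfFence, fenceDelayMs, TimeUnit.MILLISECONDS);
                }
                break;
            case SyncConnected:
                // Reconnected within the session lifetime; cancel the pending fence.
                if (pendingFence != null) {
                    pendingFence.cancel(false);
                    pendingFence = null;
                }
                break;
            case Expired:
                // The server has confirmed the session is gone; our ephemerals
                // are (or will shortly be) deleted. Fence immediately.
                onSelfFence.run();
                break;
            default:
                break;
        }
    }
}

Such a watcher would simply be passed as the default watcher when creating
the handle, e.g. new ZooKeeper(connectString, sessionTimeout,
new SelfFencingWatcher(sessionTimeout + 5000, shutdownHook)).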

BTW, here is the thread dump of the zombie client:

Full thread dump Java HotSpot(TM) 64-Bit Server VM (11.3-b02 mixed mode):

"Attach Listener" daemon prio=10 tid=0x0000000054aad800 nid=0x6813
waiting on condition [0x0000000000000000..0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"IPC Client (47) connection to hdpnn/10.249.54.101:9000 from taobao"
daemon prio=10 tid=0x00002aaadc31c800 nid=0x67f8 in Object.wait()
[0x00000000427fa000..0x00000000427faa90]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aaaaf9360e0> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:396)
        - locked <0x00002aaaaf9360e0> (a org.apache.hadoop.ipc.Client$Connection)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:438)

"ResponseProcessor for block blk_-7997360194615811589_639843163"
daemon prio=10 tid=0x0000000054aae000 nid=0x67ec runnable
[0x00000000429fc000..0x00000000429fcd10]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x00002aaaaf9b1de0> (a sun.nio.ch.Util$1)
        - locked <0x00002aaaaf9b1dc8> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00002aaaaf9818a0> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:260)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2318)

"DataStreamer for file
/group/tbads/TimeTunnel2/merge_pv/20100815/02/35/tt2yunti2.sds.cnz.alimama.com/43_040500a8-5aa3-4816-9ab5-31ffd70bf899.log.tmp
block blk_-7997360194615811589_639843163" daemon prio=10
tid=0x00000000549cc400 nid=0x67c9 in Object.wait()
[0x00000000423f6000..0x00000000423f6c90]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aaaaf6faf80> (a java.util.LinkedList)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2166)
        - locked <0x00002aaaaf6faf80> (a java.util.LinkedList)

"LeaseChecker" daemon prio=10 tid=0x0000000054692800 nid=0x5882
waiting on condition [0x00000000428fb000..0x00000000428fbb90]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:978)
        at java.lang.Thread.run(Thread.java:619)

"DestroyJavaVM" prio=10 tid=0x00002aaac022d000 nid=0x585f waiting on
condition [0x0000000000000000..0x00000000415c9d00]
   java.lang.Thread.State: RUNNABLE

"Thread-5" prio=10 tid=0x00002aaac022b800 nid=0x5880 waiting on
condition [0x00000000426f9000..0x00000000426f9a90]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aaaaf6e8460> (a java.util.concurrent.Semaphore$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
        at java.util.concurrent.Semaphore.acquire(Semaphore.java:286)
        at com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher$WatchDataOperation.execute(PathDataWatcher.java:37)
        at com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher$WatchDataOperation.execute(PathDataWatcher.java:28)
        at org.apache.zookeeper.recipes.lock.ProtocolSupport.retryOperation(ProtocolSupport.java:120)
        at com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher.watch(PathDataWatcher.java:45)
        at com.taobao.timetunnel2.cluster.zookeeper.ZooKeeperClient$2.run(ZooKeeperClient.java:82)

"Thread-4" prio=10 tid=0x00002aaac02a0c00 nid=0x587f waiting on
condition [0x00000000425f8000..0x00000000425f8d10]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aaaaf6ec150> (a java.util.concurrent.Semaphore$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
        at java.util.concurrent.Semaphore.acquire(Semaphore.java:286)
        at com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher$WatchDataOperation.execute(PathDataWatcher.java:37)
        at com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher$WatchDataOperation.execute(PathDataWatcher.java:28)
        at org.apache.zookeeper.recipes.lock.ProtocolSupport.retryOperation(ProtocolSupport.java:120)
        at com.taobao.timetunnel2.cluster.zookeeper.operation.PathDataWatcher.watch(PathDataWatcher.java:45)
        at com.taobao.timetunnel2.cluster.zookeeper.ZooKeeperClient$2.run(ZooKeeperClient.java:82)

"Thread-3" prio=10 tid=0x00002aaac0264800 nid=0x587e runnable
[0x00000000424f7000..0x00000000424f7d90]
   java.lang.Thread.State: RUNNABLE
        at java.util.zip.Deflater.deflateBytes(Native Method)
        at java.util.zip.Deflater.deflate(Deflater.java:290)
        - locked <0x00002aaaaf8ce280> (a org.apache.hadoop.io.compress.zlib.BuiltInZlibDeflater)
        at org.apache.hadoop.io.compress.zlib.BuiltInZlibDeflater.compress(BuiltInZlibDeflater.java:47)
        - locked <0x00002aaaaf8ce280> (a org.apache.hadoop.io.compress.zlib.BuiltInZlibDeflater)
        at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:76)
        at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:71)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        - locked <0x00002aaaaf73b1b0> (a java.io.BufferedOutputStream)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        - locked <0x00002aaaaf90e210> (a java.io.DataOutputStream)
        at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1247)
        - locked <0x00002aaaaf6eeac8> (a org.apache.hadoop.io.SequenceFile$BlockCompressWriter)
        at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1270)
        - locked <0x00002aaaaf6eeac8> (a org.apache.hadoop.io.SequenceFile$BlockCompressWriter)
        at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.append(SequenceFile.java:1321)
        - locked <0x00002aaaaf6eeac8> (a org.apache.hadoop.io.SequenceFile$BlockCompressWriter)
        at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
        - locked <0x00002aaaaf6eeac8> (a org.apache.hadoop.io.SequenceFile$BlockCompressWriter)
        at com.taobao.timetunnel2.savefile.util.HDFSWriter.write(HDFSWriter.java:42)
        at com.taobao.timetunnel2.savefile.reader.HDFSHandler.handleRecord(HDFSHandler.java:46)
        at com.taobao.timetunnel2.savefile.reader.FileReader.processFile(FileReader.java:151)
        at com.taobao.timetunnel2.savefile.reader.FileReader.doProcessFile(FileReader.java:130)
        at com.taobao.timetunnel2.savefile.reader.FileReader.execute(FileReader.java:82)
        at com.taobao.timetunnel2.savefile.app.StoppableService.run(StoppableService.java:37)
        at java.lang.Thread.run(Thread.java:619)

"main-EventThread" daemon prio=10 tid=0x00002aaac01c7400 nid=0x587c
waiting on condition [0x00000000422f5000..0x00000000422f5c90]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00002aaaaf6e1538> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)

"main-SendThread" daemon prio=10 tid=0x00002aaac01ad000 nid=0x587b
runnable [0x00000000421f4000..0x00000000421f4b10]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x00002aaaaf6efb68> (a sun.nio.ch.Util$1)
        - locked <0x00002aaaaf6efb80> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00002aaaaf6f7ff0> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:921)

"Low Memory Detector" daemon prio=10 tid=0x00002aaac0026800 nid=0x5879
runnable [0x0000000000000000..0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread1" daemon prio=10 tid=0x00002aaac0024400 nid=0x5878
waiting on condition [0x0000000000000000..0x00000000418cb320]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x00002aaac0022400 nid=0x5877
waiting on condition [0x0000000000000000..0x00000000417ca5b0]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00002aaac0020800 nid=0x5876
runnable [0x0000000000000000..0x00000000416caa20]
   java.lang.Thread.State: RUNNABLE

"Surrogate Locker Thread (CMS)" daemon prio=10 tid=0x00002aaac001ec00
nid=0x5875 waiting on condition
[0x0000000000000000..0x0000000041472ec8]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00002aaac0000c00 nid=0x5874 in
Object.wait() [0x0000000041ff2000..0x0000000041ff2c90]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aaaaf6efb98> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
        - locked <0x00002aaaaf6efb98> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x0000000054301000 nid=0x5873
in Object.wait() [0x0000000041ef1000..0x0000000041ef1b10]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00002aaaaf6ef670> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
        - locked <0x00002aaaaf6ef670> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=10 tid=0x00000000542fb800 nid=0x5872 runnable

"Gang worker#0 (Parallel GC Threads)" prio=10 tid=0x0000000053ff7c00
nid=0x5860 runnable

"Gang worker#1 (Parallel GC Threads)" prio=10 tid=0x0000000053ff9400
nid=0x5861 runnable

"Gang worker#2 (Parallel GC Threads)" prio=10 tid=0x0000000053ffb000
nid=0x5862 runnable

"Gang worker#3 (Parallel GC Threads)" prio=10 tid=0x0000000053ffc800
nid=0x5863 runnable

"Gang worker#4 (Parallel GC Threads)" prio=10 tid=0x0000000053ffe000
nid=0x5864 runnable

"Gang worker#5 (Parallel GC Threads)" prio=10 tid=0x0000000053fffc00
nid=0x5865 runnable

"Gang worker#6 (Parallel GC Threads)" prio=10 tid=0x0000000054001400
nid=0x5866 runnable

"Gang worker#7 (Parallel GC Threads)" prio=10 tid=0x0000000054002c00
nid=0x5867 runnable

"Gang worker#8 (Parallel GC Threads)" prio=10 tid=0x0000000054004800
nid=0x5868 runnable

"Gang worker#9 (Parallel GC Threads)" prio=10 tid=0x0000000054006000
nid=0x5869 runnable

"Gang worker#10 (Parallel GC Threads)" prio=10 tid=0x0000000054007800
nid=0x586a runnable

"Gang worker#11 (Parallel GC Threads)" prio=10 tid=0x0000000054009400
nid=0x586b runnable

"Gang worker#12 (Parallel GC Threads)" prio=10 tid=0x000000005400ac00
nid=0x586c runnable

"Concurrent Mark-Sweep GC Thread" prio=10 tid=0x0000000054144400
nid=0x5871 runnable

"Gang worker#0 (Parallel CMS Threads)" prio=10 tid=0x000000005413d800
nid=0x586d runnable

"Gang worker#1 (Parallel CMS Threads)" prio=10 tid=0x000000005413f000
nid=0x586e runnable

"Gang worker#2 (Parallel CMS Threads)" prio=10 tid=0x0000000054140c00
nid=0x586f runnable

"Gang worker#3 (Parallel CMS Threads)" prio=10 tid=0x0000000054142400
nid=0x5870 runnable

"VM Periodic Task Thread" prio=10 tid=0x00002aaac0029000 nid=0x587a
waiting on condition

JNI global references: 637

On Tue, Aug 17, 2010 at 12:03 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Ben or somebody else will have to repeat some of the detailed logic for
> this, but it has
> to do with the fact that you can't be sure what has happened during the
> network partition.
> One possibility is the one you describe, but another is that the partition
> happened because
> a majority of the ZK cluster lost power and you can't see the remaining
> nodes.  Those nodes
> will continue to serve any files in a read-only fashion.  If the partition
> involves you losing
> contact with the entire cluster at the same time a partition of the cluster
> into a quorum and
> a minority happens, then your ephemeral files could continue to exist at
> least until the breach
> in the cluster itself is healed.
>
> Suffice it to say that there are only a few strategies that leave you with a
> coherent picture
> of the universe.  Importantly, you shouldn't assume that the ephemerals will
> disappear at
> the same time as the session expiration event is delivered.
>
> On Mon, Aug 16, 2010 at 8:31 PM, Qing Yan <qing...@gmail.com> wrote:
>
>> Ouch, is this the current ZK behavior? This is unexpected, if the
>> client get partitioned from ZK cluster, he should
>> get notified and take some action(e.g. commit suicide) otherwise how
>> to tell a ephemeral node is really
>> up or down? Zombie can create synchronization nightmares..
>>
>>
>>
>> On Mon, Aug 16, 2010 at 7:22 PM, Dave Wright <wrig...@gmail.com> wrote:
>> > Another possible cause for this that I ran into recently with the c client -
>> > you don't get the session expired notification until you are reconnected to
>> > the quorum and it informs you the session is lost.  If you get disconnected
>> > and can't reconnect you won't get the notification.  Personally I think the
>> > client api should track the session expiration time locally and inform
>> > you once it's expired.
>> >
>> > On Aug 16, 2010 2:09 AM, "Qing Yan" <qing...@gmail.com> wrote:
>> >
>> > Hi Ted,
>> >
>> >  Do you mean GC problem can prevent delivery of SESSION EXPIRE event?
>> > Hum...so you have met this problem before?
>> > I didn't see any OOM though, will look into it more.
>> >
>> >
>> > On Mon, Aug 16, 2010 at 12:46 PM, Ted Dunning <ted.dunn...@gmail.com>
>> wrote:
>> >> I am assuming that y...
>> >
>>
>
