[jira] [Created] (ARTEMIS-2806) deployQueue missing address argument

2020-06-15 Thread Qihong Xu (Jira)
Qihong Xu created ARTEMIS-2806:
--

 Summary: deployQueue missing address argument
 Key: ARTEMIS-2806
 URL: https://issues.apache.org/jira/browse/ARTEMIS-2806
 Project: ActiveMQ Artemis
  Issue Type: Bug
Affects Versions: 2.12.0
Reporter: Qihong Xu


In ActiveMQServerControlImpl, the deployQueue method does not pass the address 
argument through, which results in mismatched addresses and queues being 
created. (For example, using deployQueue to create a subscriber queue A_0 under 
an existing address A ends up creating a new topic address A_0 with a 
subscriber A_0 under it.)
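
A rough reproduction sketch via JMX is below. The JMX URL, the broker 
ObjectName, and the exact deployQueue overload/signature are assumptions and 
may need adjusting for a given broker; the point is only the expected versus 
observed binding of the queue.

{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DeployQueueRepro {

   public static void main(String[] args) throws Exception {
      // Assumed JMX endpoint and broker name; adjust for the broker under test.
      JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
      try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
         MBeanServerConnection mbsc = connector.getMBeanServerConnection();
         ObjectName broker =
               new ObjectName("org.apache.activemq.artemis:broker=\"0.0.0.0\"");

         // Address "A" already exists as a multicast (topic) address.
         // Expected: subscriber queue "A_0" is created under address "A".
         // Observed with this bug: the address argument is ignored and the
         // queue "A_0" ends up bound to a newly created address "A_0" instead.
         mbsc.invoke(broker, "deployQueue",
               new Object[]{"A", "A_0", null, true},
               new String[]{"java.lang.String", "java.lang.String",
                            "java.lang.String", "boolean"});
      }
   }
}
{code}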



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARTEMIS-2251) Large messages might not be deleted when server crashed

2019-02-12 Thread Qihong Xu (JIRA)
Qihong Xu created ARTEMIS-2251:
--

 Summary: Large messages might not be deleted when server crashed
 Key: ARTEMIS-2251
 URL: https://issues.apache.org/jira/browse/ARTEMIS-2251
 Project: ActiveMQ Artemis
  Issue Type: Bug
Reporter: Qihong Xu


When deleting large messages, Artemis uses storePendingLargeMessage to insert a 
temporary record into the journal so the pending deletion can be reloaded, in 
case the server crashes and the large message files would otherwise stay on 
disk forever. However, inside storePendingLargeMessage the appendAddRecord call 
inserts the record asynchronously, so a task queued on the executor can be lost 
if the server crashes, which may leave large messages that can never be 
deleted. To solve this, a boolean is added to storePendingLargeMessage so that 
it is forced to use SimpleWaitIOCallback (i.e. wait for the write to complete) 
in the delete case.
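
Below is a minimal, self-contained sketch of the idea. The types are simplified 
stand-ins for the real journal/IOCallback classes, not the actual Artemis API: 
with the new flag set, the delete path blocks on a wait-callback until the 
journal write completes, instead of returning before the record is durable.

{code:java}
import java.util.concurrent.CountDownLatch;

// Illustrative sketch only: these types mimic the idea of an IO callback and a
// "wait" callback, they are not the real Artemis classes.
interface IoCallback {
   void done();
   void onError(int code, String msg);
}

final class WaitCallback implements IoCallback {
   private final CountDownLatch latch = new CountDownLatch(1);
   private volatile String error;

   @Override public void done() { latch.countDown(); }
   @Override public void onError(int code, String msg) { error = msg; latch.countDown(); }

   /** Blocks the caller until the journal confirms the record was written. */
   void waitCompletion() throws Exception {
      latch.await();
      if (error != null) throw new IllegalStateException(error);
   }
}

final class JournalSketch {
   // Hypothetical stand-in for appendAddRecord(...): the callback fires once
   // the record is actually persisted; without waiting, a crash can lose it.
   void appendAddRecord(long id, byte[] record, IoCallback callback) {
      // ... hand off to an asynchronous writer; callback.done() runs later ...
      callback.done();
   }

   // With the proposed flag, the delete path waits for the pending-large-message
   // record to be durable before continuing, so a crash cannot lose it.
   void storePendingLargeMessage(long messageId, boolean waitForSync) throws Exception {
      WaitCallback callback = new WaitCallback();
      appendAddRecord(messageId, new byte[0], callback);
      if (waitForSync) {
         callback.waitCompletion();
      }
   }
}
{code}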



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARTEMIS-2214) ARTEMIS-2214 Cache durable in PagedReference

2019-01-04 Thread Qihong Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qihong Xu updated ARTEMIS-2214:
---
Summary: ARTEMIS-2214 Cache durable in PagedReference  (was: 
Cache durable in PagedReference to avoid blocks in consuming paged 
messages)

> ARTEMIS-2214 Cache durable in PagedReference
> -
>
> Key: ARTEMIS-2214
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2214
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 2.6.3
>Reporter: Qihong Xu
>Priority: Major
> Attachments: stacks.txt
>
>
> We recently performed a test on the Artemis broker and found a severe 
> performance issue.
> When paged messages are being consumed, decrementMetrics in 
> QueuePendingMessageMetrics calls 'getMessage' to check whether each message 
> is durable or not. This can lock the queue for a long time, because the page 
> may have been GCed and has to be reloaded entirely. Other operations that 
> rely on the queue are blocked during this time, which causes a significant 
> TPS drop. Detailed stacks are attached below.
> This also happens when a consumer is closed and its messages are pushed back 
> to the queue: Artemis checks the priority on return if these messages are 
> paged.
> To solve the issue, durable and priority need to be cached in PagedReference, 
> just like messageID, transactionID and so on. I have attached a patch to fix 
> the issue. Any review is appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2018-12-29 Thread Qihong Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-2216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qihong Xu updated ARTEMIS-2216:
---
Description: 
Improving throughput in paging mode is one of our concerns, since our cluster 
uses paging a lot.

We found that pageSyncTimer in PagingStoreImpl shares the same thread-pool 
executor as pageCursorProvider. In heavy-load scenarios, such as hundreds of 
consumers receiving messages simultaneously, it becomes difficult for 
pageSyncTimer to obtain the executor because of this contention. Page sync is 
therefore delayed and producers suffer low throughput.

 

To achieve higher performance we assign a dedicated executor to pageSyncTimer 
to avoid the contention, and we ran a small-scale test on a single modified 
broker.

 

Broker: 4C/8G/500G SSD

Producer: 200 threads, non-transactional send

Consumer: 200 threads, transactional receive

Message text size: 100-200 bytes, randomly chosen

AddressFullPolicy: PAGE

 

Test result:
| |Only Send TPS|Only Receive TPS|Concurrent Send/Receive TPS|
|Original ver|38k|33k|3k/30k|
|Modified ver|38k|34k|30k/12.5k|

 

The table above shows that on the modified broker the concurrent send TPS 
improves from 3k to 30k, while the concurrent receive TPS drops from 30k to 
12.5k under heavy load. Considering that consuming systems usually have a long 
processing chain after receiving messages, we do not need extremely fast 
receive TPS. Instead, we want to guarantee send TPS to cope with traffic peaks 
and lower the producer's delay. Moreover, the combined send and receive TPS 
rises from 33k to about 43k. Given all of the above, this trade-off seems 
beneficial and acceptable.
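
For illustration, here is a minimal sketch of the idea using plain JDK 
executors (not the real PagingStoreImpl/PageSyncTimer code): page-sync work 
gets its own executor so it never queues behind cursor work on the shared pool.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: the class names are hypothetical and the workloads
// are simulated; it just shows a dedicated executor removing the contention.
public class DedicatedSyncExecutorSketch {

   public static void main(String[] args) throws Exception {
      // Shared pool used by page cursors (and, before the change, by page sync too).
      ExecutorService sharedPool = Executors.newFixedThreadPool(8);

      // Dedicated, single-threaded executor reserved for page-sync work
      // (mirrors the proposal of giving pageSyncTimer its own executor).
      ExecutorService pageSyncExecutor = Executors.newSingleThreadExecutor();

      // Heavy cursor work saturating the shared pool.
      for (int i = 0; i < 1_000; i++) {
         sharedPool.submit(() -> simulateCursorWork());
      }

      // Page sync no longer waits behind the cursor backlog.
      pageSyncExecutor.submit(() -> System.out.println("page sync executed promptly"));

      sharedPool.shutdown();
      pageSyncExecutor.shutdown();
      sharedPool.awaitTermination(1, TimeUnit.MINUTES);
      pageSyncExecutor.awaitTermination(1, TimeUnit.MINUTES);
   }

   private static void simulateCursorWork() {
      try {
         Thread.sleep(5); // stand-in for page cursor processing
      } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
      }
   }
}
{code}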

  was:
Improve paging throughput by using a specific executor for pageSyncTimer

 

Improving throughput in paging mode is one of our concerns, since our cluster 
uses paging a lot.

We found that pageSyncTimer in PagingStoreImpl shares the same thread-pool 
executor as pageCursorProvider. In heavy-load scenarios, such as hundreds of 
consumers receiving messages simultaneously, it becomes difficult for 
pageSyncTimer to obtain the executor because of this contention. Page sync is 
therefore delayed and producers suffer low throughput.

 

To achieve higher performance we assign a dedicated executor to pageSyncTimer 
to avoid the contention, and we ran a small-scale test on a single modified 
broker.

 

Broker: 4C/8G/500G SSD

Producer: 200 threads, non-transactional send

Consumer: 200 threads, transactional receive

Message text size: 100-200 bytes, randomly chosen

AddressFullPolicy: PAGE

 

Test result:
| |Only Send TPS|Only Receive TPS|Concurrent Send/Receive TPS|
|Original ver|38k|33k|3k/30k|
|Modified ver|38k|34k|30k/12.5k|

 

The table above shows that on the modified broker the concurrent send TPS 
improves from 3k to 30k, while the concurrent receive TPS drops from 30k to 
12.5k under heavy load. Considering that consuming systems usually have a long 
processing chain after receiving messages, we do not need extremely fast 
receive TPS. Instead, we want to guarantee send TPS to cope with traffic peaks 
and lower the producer's delay. Moreover, the combined send and receive TPS 
rises from 33k to about 43k. Given all of the above, this trade-off seems 
beneficial and acceptable.


> Use a specific executor for pageSyncTimer
> -
>
> Key: ARTEMIS-2216
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2216
> Project: ActiveMQ Artemis
>  Issue Type: Improvement
>Affects Versions: 2.6.3
>Reporter: Qihong Xu
>Priority: Major
>
> Improving throughput in paging mode is one of our concerns, since our cluster 
> uses paging a lot.
> We found that pageSyncTimer in PagingStoreImpl shares the same thread-pool 
> executor as pageCursorProvider. In heavy-load scenarios, such as hundreds of 
> consumers receiving messages simultaneously, it becomes difficult for 
> pageSyncTimer to obtain the executor because of this contention. Page sync is 
> therefore delayed and producers suffer low throughput.
>  
> To achieve higher performance we assign a dedicated executor to pageSyncTimer 
> to avoid the contention, and we ran a small-scale test on a single modified 
> broker.
>  
> Broker: 4C/8G/500G SSD
> Producer: 200 threads, non-transactional send
> Consumer: 200 threads, transactional receive
> Message text size: 100-200 bytes, randomly chosen
> AddressFullPolicy: PAGE
>  
> Test result:
> | |Only Send TPS|Only Receive TPS|Concurrent Send/Receive TPS|
> |Original ver|38k|33k|3k/30k|
> |Modified ver|38k|34k|30k/12.5k|
>  
> The table above shows that on the modified broker the concurrent send TPS 
> improves from 3k to 30k, while the concurrent receive TPS drops from 30k to 
> 12.5k under heavy load. Considering that consuming systems usually have a 
> long processing chain after receiving messages, we do not need extremely fast 
> receive TPS. Instead, we want to guarantee send TPS to cope with traffic 
> peaks and lower the producer's delay. Moreover, the combined send and receive 
> TPS rises from 33k to about 43k. Given all of the above, this trade-off seems 
> beneficial and acceptable.

[jira] [Created] (ARTEMIS-2216) Use a specific executor for pageSyncTimer

2018-12-29 Thread Qihong Xu (JIRA)
Qihong Xu created ARTEMIS-2216:
--

 Summary: Use a specific executor for pageSyncTimer
 Key: ARTEMIS-2216
 URL: https://issues.apache.org/jira/browse/ARTEMIS-2216
 Project: ActiveMQ Artemis
  Issue Type: Improvement
Affects Versions: 2.6.3
Reporter: Qihong Xu


Improve paging throughput by using a specific executor for pageSyncTimer

 

Improving throughput in paging mode is one of our concerns, since our cluster 
uses paging a lot.

We found that pageSyncTimer in PagingStoreImpl shares the same thread-pool 
executor as pageCursorProvider. In heavy-load scenarios, such as hundreds of 
consumers receiving messages simultaneously, it becomes difficult for 
pageSyncTimer to obtain the executor because of this contention. Page sync is 
therefore delayed and producers suffer low throughput.

 

To achieve higher performance we assign a dedicated executor to pageSyncTimer 
to avoid the contention, and we ran a small-scale test on a single modified 
broker.

 

Broker: 4C/8G/500G SSD

Producer: 200 threads, non-transactional send

Consumer: 200 threads, transactional receive

Message text size: 100-200 bytes, randomly chosen

AddressFullPolicy: PAGE

 

Test result:
| |Only Send TPS|Only Receive TPS|Concurrent Send/Receive TPS|
|Original ver|38k|33k|3k/30k|
|Modified ver|38k|34k|30k/12.5k|

 

The table above shows that on the modified broker the concurrent send TPS 
improves from 3k to 30k, while the concurrent receive TPS drops from 30k to 
12.5k under heavy load. Considering that consuming systems usually have a long 
processing chain after receiving messages, we do not need extremely fast 
receive TPS. Instead, we want to guarantee send TPS to cope with traffic peaks 
and lower the producer's delay. Moreover, the combined send and receive TPS 
rises from 33k to about 43k. Given all of the above, this trade-off seems 
beneficial and acceptable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARTEMIS-2214) Cache durable in PagedReference to avoid blocks in consuming paged messages

2018-12-25 Thread Qihong Xu (JIRA)
Qihong Xu created ARTEMIS-2214:
--

 Summary: Cache durable in PagedReference to avoid blocks 
in consuming paged messages
 Key: ARTEMIS-2214
 URL: https://issues.apache.org/jira/browse/ARTEMIS-2214
 Project: ActiveMQ Artemis
  Issue Type: Bug
  Components: Broker
Affects Versions: 2.6.3
Reporter: Qihong Xu


We recently performed a test on the Artemis broker and found a severe 
performance issue.

When paged messages are being consumed, decrementMetrics in 
QueuePendingMessageMetrics calls 'getMessage' to check whether each message is 
durable or not. This can lock the queue for a long time, because the page may 
have been GCed and has to be reloaded entirely. Other operations that rely on 
the queue are blocked during this time, which causes a significant TPS drop. 
Detailed stacks are attached below.

This also happens when a consumer is closed and its messages are pushed back 
to the queue: Artemis checks the priority on return if these messages are 
paged.

To solve the issue, durable and priority need to be cached in PagedReference, 
just like messageID, transactionID and so on. I have attached a patch to fix 
the issue. Any review is appreciated.
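
For illustration, a minimal sketch of the caching idea is below. The classes 
are simplified stand-ins, not the real PagedReference/PagedMessage: they only 
show that durable (and priority) are read from the message at most once and 
then served from the cached field.

{code:java}
// Illustrative sketch only, not the actual Artemis classes.
final class PagedMessageSketch {
   private final boolean durable;
   private final byte priority;

   PagedMessageSketch(boolean durable, byte priority) {
      this.durable = durable;
      this.priority = priority;
   }

   boolean isDurable() {
      // In the real broker this may require reloading a whole page from disk
      // while holding the queue lock, which is the expensive part.
      return durable;
   }

   byte getPriority() {
      return priority;
   }
}

final class PagedReferenceSketch {
   private final PagedMessageSketch message;

   // Cached copies, populated once so later reads never touch the page again.
   private Boolean durableCache;
   private Byte priorityCache;

   PagedReferenceSketch(PagedMessageSketch message) {
      this.message = message;
   }

   boolean isDurable() {
      if (durableCache == null) {
         durableCache = message.isDurable(); // pay the page access at most once
      }
      return durableCache;
   }

   byte getPriority() {
      if (priorityCache == null) {
         priorityCache = message.getPriority();
      }
      return priorityCache;
   }
}
{code}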



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARTEMIS-2214) Cache durable in PagedReference to avoid blocks in consuming paged messages

2018-12-25 Thread Qihong Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qihong Xu updated ARTEMIS-2214:
---
Attachment: (was: 0001-Add-durable-and-priority-to-pagedReference.patch)

> Cache durable in PagedReference to avoid blocks in consuming paged 
> messages
> 
>
> Key: ARTEMIS-2214
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2214
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 2.6.3
>Reporter: Qihong Xu
>Priority: Major
> Attachments: stacks.txt
>
>
> We recently performed a test on the Artemis broker and found a severe 
> performance issue.
> When paged messages are being consumed, decrementMetrics in 
> QueuePendingMessageMetrics calls 'getMessage' to check whether each message 
> is durable or not. This can lock the queue for a long time, because the page 
> may have been GCed and has to be reloaded entirely. Other operations that 
> rely on the queue are blocked during this time, which causes a significant 
> TPS drop. Detailed stacks are attached below.
> This also happens when a consumer is closed and its messages are pushed back 
> to the queue: Artemis checks the priority on return if these messages are 
> paged.
> To solve the issue, durable and priority need to be cached in PagedReference, 
> just like messageID, transactionID and so on. I have attached a patch to fix 
> the issue. Any review is appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARTEMIS-2214) Cache durable in PagedReference to avoid blocks in consuming paged messages

2018-12-25 Thread Qihong Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qihong Xu updated ARTEMIS-2214:
---
Attachment: 0001-Add-durable-and-priority-to-pagedReference.patch

> Cache durable in PagedReference to avoid blocks in consuming paged 
> messages
> 
>
> Key: ARTEMIS-2214
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2214
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 2.6.3
>Reporter: Qihong Xu
>Priority: Major
> Attachments: 0001-Add-durable-and-priority-to-pagedReference.patch, 
> stacks.txt
>
>
> We recently performed a test on the Artemis broker and found a severe 
> performance issue.
> When paged messages are being consumed, decrementMetrics in 
> QueuePendingMessageMetrics calls 'getMessage' to check whether each message 
> is durable or not. This can lock the queue for a long time, because the page 
> may have been GCed and has to be reloaded entirely. Other operations that 
> rely on the queue are blocked during this time, which causes a significant 
> TPS drop. Detailed stacks are attached below.
> This also happens when a consumer is closed and its messages are pushed back 
> to the queue: Artemis checks the priority on return if these messages are 
> paged.
> To solve the issue, durable and priority need to be cached in PagedReference, 
> just like messageID, transactionID and so on. I have attached a patch to fix 
> the issue. Any review is appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARTEMIS-2214) Cache durable in PagedReference to avoid blocks in consuming paged messages

2018-12-25 Thread Qihong Xu (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARTEMIS-2214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qihong Xu updated ARTEMIS-2214:
---
Attachment: stacks.txt

> Cache durable in PagedReference to avoid blocks in consuming paged 
> messages
> 
>
> Key: ARTEMIS-2214
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2214
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 2.6.3
>Reporter: Qihong Xu
>Priority: Major
> Attachments: stacks.txt
>
>
> We recently performed a test on the Artemis broker and found a severe 
> performance issue.
> When paged messages are being consumed, decrementMetrics in 
> QueuePendingMessageMetrics calls 'getMessage' to check whether each message 
> is durable or not. This can lock the queue for a long time, because the page 
> may have been GCed and has to be reloaded entirely. Other operations that 
> rely on the queue are blocked during this time, which causes a significant 
> TPS drop. Detailed stacks are attached below.
> This also happens when a consumer is closed and its messages are pushed back 
> to the queue: Artemis checks the priority on return if these messages are 
> paged.
> To solve the issue, durable and priority need to be cached in PagedReference, 
> just like messageID, transactionID and so on. I have attached a patch to fix 
> the issue. Any review is appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARTEMIS-1700) Server stopped responding and killed itself while exiting paging state

2018-02-23 Thread Qihong Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/ARTEMIS-1700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375225#comment-16375225
 ] 

Qihong Xu commented on ARTEMIS-1700:


[~nigro@gmail.com] Yes, we just use the default setting here.

> Server stopped responding and killed itself while exiting paging state
> --
>
> Key: ARTEMIS-1700
> URL: https://issues.apache.org/jira/browse/ARTEMIS-1700
> Project: ActiveMQ Artemis
>  Issue Type: Bug
>  Components: Broker
>Affects Versions: 2.4.0
>Reporter: Qihong Xu
>Priority: Major
> Attachments: artemis.log
>
>
> We are currently experiencing this error while running a stress test on 
> Artemis.
>  
> Basic configuration:
> 1 broker, 1 topic, pub-sub mode.
> Journal type = MAPPED.
> Threadpool max size = 60.
>  
> In order to test the throughput of Artemis we use 300 producers and 300 
> consumers. However, we found that sometimes when Artemis exits paging state, 
> it stops responding and kills itself. This situation happened on some 
> specific servers.
>  
> Details can be found in the attached dump file.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ARTEMIS-1700) Server stopped responding and killed itself while exiting paging state

2018-02-23 Thread Qihong Xu (JIRA)
Qihong Xu created ARTEMIS-1700:
--

 Summary: Server stopped responding and killed itself while exiting 
paging state
 Key: ARTEMIS-1700
 URL: https://issues.apache.org/jira/browse/ARTEMIS-1700
 Project: ActiveMQ Artemis
  Issue Type: Bug
  Components: Broker
Affects Versions: 2.4.0
Reporter: Qihong Xu
 Attachments: artemis.log

We are currently experiencing this error while running a stress test on 
Artemis.

 

Basic configuration:

1 broker, 1 topic, pub-sub mode.

Journal type = MAPPED.

Threadpool max size = 60.

 

In order to test the throughput of Artemis we use 300 producers and 300 
consumers. However, we found that sometimes when Artemis exits paging state, it 
stops responding and kills itself. This situation happened on some specific 
servers.

 

Details can be found in the attached dump file.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-2964) "Conf" command returns dataDir and dataLogDir opposingly

2018-01-08 Thread Qihong Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qihong Xu updated ZOOKEEPER-2964:
-
Summary: "Conf" command returns dataDir and dataLogDir opposingly  (was: 
dataDir and dataLogDir are printed opposingly)

> "Conf" command returns dataDir and dataLogDir opposingly
> 
>
> Key: ZOOKEEPER-2964
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2964
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3
>Reporter: Qihong Xu
>Priority: Minor
> Attachments: ZOOKEEPER-2964.patch
>
>
> I found a bug where the "conf" command returns dataDir and dataLogDir 
> swapped.
> This bug only exists in versions newer than 3.5. I only found that dumpConf in 
> [ZookeeperServer.java|https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L188]
>  prints these two paths swapped. Unlike ZOOKEEPER-2960, the actual paths 
> are not affected and the server functions correctly.
> I made a small patch to fix this bug. Any review is appreciated.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-2964) dataDir and dataLogDir are printed opposingly

2018-01-07 Thread Qihong Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qihong Xu updated ZOOKEEPER-2964:
-
Attachment: ZOOKEEPER-2964.patch

> dataDir and dataLogDir are printed opposingly
> -
>
> Key: ZOOKEEPER-2964
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2964
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3
>Reporter: Qihong Xu
>Priority: Minor
> Attachments: ZOOKEEPER-2964.patch
>
>
> I found a bug where the "conf" command returns dataDir and dataLogDir 
> swapped.
> This bug only exists in versions newer than 3.5. I only found that dumpConf in 
> [ZookeeperServer.java|https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L188]
>  prints these two paths swapped. Unlike ZOOKEEPER-2960, the actual paths 
> are not affected and the server functions correctly.
> I made a small patch to fix this bug. Any review is appreciated.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-2964) dataDir and dataLogDir are printed opposingly

2018-01-07 Thread Qihong Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qihong Xu updated ZOOKEEPER-2964:
-
Description: 
I found a bug where the "conf" command returns dataDir and dataLogDir 
swapped.

This bug only exists in versions newer than 3.5. I only found that dumpConf in 
[ZookeeperServer.java|https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L188]
 prints these two paths swapped. Unlike ZOOKEEPER-2960, the actual paths are 
not affected and the server functions correctly.

I made a small patch to fix this bug. Any review is appreciated.
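
For illustration, a minimal sketch of the label/value mix-up is below; it is 
not the actual ZooKeeperServer.dumpConf code, and the directory values are 
placeholders.

{code:java}
import java.io.PrintWriter;
import java.io.StringWriter;

// Illustrative sketch only: shows the kind of swapped output the "conf"
// four-letter-word command produced, versus the corrected pairing.
public class DumpConfSketch {

   static final String SNAP_DIR = "/data/zookeeper/snapshots"; // backs dataDir
   static final String LOG_DIR  = "/data/zookeeper/txnlog";    // backs dataLogDir

   // Buggy behaviour: each label is printed with the other directory's value.
   static void dumpConfSwapped(PrintWriter pw) {
      pw.println("dataDir=" + LOG_DIR);
      pw.println("dataLogDir=" + SNAP_DIR);
   }

   // Fixed behaviour: labels and values match.
   static void dumpConfFixed(PrintWriter pw) {
      pw.println("dataDir=" + SNAP_DIR);
      pw.println("dataLogDir=" + LOG_DIR);
   }

   public static void main(String[] args) {
      StringWriter buggy = new StringWriter();
      StringWriter fixed = new StringWriter();
      dumpConfSwapped(new PrintWriter(buggy, true));
      dumpConfFixed(new PrintWriter(fixed, true));
      System.out.println("buggy output:\n" + buggy);
      System.out.println("fixed output:\n" + fixed);
   }
}
{code}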

  was:
I found a bug where the "conf" command returns dataDir and dataLogDir 
swapped.

This bug only exists in versions newer than 3.5. I only found that dumpConf in 
[ZookeeperServer.java|https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L188]
 prints these two paths swapped. Unlike ZOOKEEPER-2960, the actual paths are 
not affected and the server functions correctly.


> dataDir and dataLogDir are printed opposingly
> -
>
> Key: ZOOKEEPER-2964
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2964
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3
>Reporter: Qihong Xu
>Priority: Minor
>
> I found a bug where the "conf" command returns dataDir and dataLogDir 
> swapped.
> This bug only exists in versions newer than 3.5. I only found that dumpConf in 
> [ZookeeperServer.java|https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L188]
>  prints these two paths swapped. Unlike ZOOKEEPER-2960, the actual paths 
> are not affected and the server functions correctly.
> I made a small patch to fix this bug. Any review is appreciated.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ZOOKEEPER-2964) dataDir and dataLogDir are printed opposingly

2018-01-07 Thread Qihong Xu (JIRA)
Qihong Xu created ZOOKEEPER-2964:


 Summary: dataDir and dataLogDir are printed opposingly
 Key: ZOOKEEPER-2964
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2964
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.5.3
Reporter: Qihong Xu
Priority: Minor


I found a bug where the "conf" command returns dataDir and dataLogDir 
swapped.

This bug only exists in versions newer than 3.5. I only found that dumpConf in 
[ZookeeperServer.java|https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L188]
 prints these two paths swapped. Unlike ZOOKEEPER-2960, the actual paths are 
not affected and the server functions correctly.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (ZOOKEEPER-2927) Local session reconnect validation not forward to leader

2017-10-28 Thread Qihong Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qihong Xu updated ZOOKEEPER-2927:
-
Description: 
When zookeeper quorum recovers from shutdown/crash, a client with a local 
session will reconnect to a random server in quorum. If this random-chosen 
server is not leader and does not own the local session previously, it will 
forward this session to leader for validation. And then if this is a global 
session, leader will update its owner, if not, leader adds Boolean false to 
packet and does nothing. 

Since our system involves mostly local session and has a large amount of 
connections, this procedure may be redundant and add potential pressure to 
leader. Is this reasonable for the reconnect scenario that local session 
validation not forward to leader, instead return by follower directly? 
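
A conceptual sketch of the proposed behaviour is below; all class and method 
names are hypothetical (they are not the real ZooKeeper server classes) and 
only illustrate the local-versus-global branch.

{code:java}
// Conceptual sketch of the proposal only; names are hypothetical.
interface SessionTrackerSketch {
   boolean isGlobalSession(long sessionId);
   boolean touchLocalSession(long sessionId, int sessionTimeout);
}

final class ReconnectValidationSketch {

   private final SessionTrackerSketch tracker;

   ReconnectValidationSketch(SessionTrackerSketch tracker) {
      this.tracker = tracker;
   }

   /** Returns true if the reconnecting session should be accepted. */
   boolean validateOnReconnect(long sessionId, int sessionTimeout) {
      if (tracker.isGlobalSession(sessionId)) {
         // Global sessions are owned by the leader, so validation (and the
         // ownership update) still has to be forwarded to it.
         return forwardToLeader(sessionId, sessionTimeout);
      }
      // Proposed behaviour: local sessions are validated on the follower
      // directly, avoiding an extra round trip and load on the leader.
      return tracker.touchLocalSession(sessionId, sessionTimeout);
   }

   private boolean forwardToLeader(long sessionId, int sessionTimeout) {
      // Placeholder for the existing leader round trip.
      return true;
   }
}
{code}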


  was:
When the ZooKeeper quorum recovers from a shutdown/crash, a client with a local 
session will reconnect to a random server in the quorum. If this randomly 
chosen server is not the leader and did not previously own the local session, 
it will forward the session to the leader for validation. If the session is a 
global session, the leader then updates its owner; if not, the leader simply 
adds a boolean false to the packet and does nothing else.

Since our system involves mostly local sessions and has a large number of 
connections, this procedure may be redundant and adds unnecessary pressure on 
the leader. Would it be reasonable, in the reconnect scenario, for the local 
session not to be forwarded to the leader and instead be answered by the 
follower directly?



> Local session reconnect validation not forward to leader
> 
>
> Key: ZOOKEEPER-2927
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2927
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client, quorum, server
>Affects Versions: 3.5.3
> Environment: configuration management system based on zookeeper 3.5.3
>Reporter: Qihong Xu
>Priority: Minor
>
> When the ZooKeeper quorum recovers from a shutdown/crash, a client with a 
> local session will reconnect to a random server in the quorum. If this 
> randomly chosen server is not the leader and did not previously own the local 
> session, it will forward the session to the leader for validation. If the 
> session is a global session, the leader then updates its owner; if not, the 
> leader simply adds a boolean false to the packet and does nothing else.
> Since our system involves mostly local sessions and has a large number of 
> connections, this procedure may be redundant and adds unnecessary pressure on 
> the leader. Would it be reasonable, in the reconnect scenario, not to forward 
> local-session validation to the leader and instead have the follower answer 
> directly?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)