[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-13 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328028#comment-15328028 ]

ASF GitHub Bot commented on SOLR-8744:
--

Github user dragonsinth closed the pull request at:

https://github.com/apache/lucene-solr/pull/42


> Overseer operations need more fine grained mutual exclusion
> -----------------------------------------------------------
>
> Key: SOLR-8744
> URL: https://issues.apache.org/jira/browse/SOLR-8744
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Affects Versions: 5.4.1
> Reporter: Scott Blum
> Assignee: Noble Paul
> Priority: Blocker
> Labels: sharding, solrcloud
> Fix For: 6.1
>
> Attachments: SOLR-8744.patch, SOLR-8744.patch, SOLR-8744.patch,
> SOLR-8744.patch, SOLR-8744.patch, SOLR-8744.patch, SOLR-8744.patch,
> SOLR-8744.patch, SOLR-8744.patch, SmileyLockTree.java, SmileyLockTree.java
>
>
> SplitShard creates a mutex over the whole collection, but, in practice, this
> is a big scaling problem. Multiple split-shard operations could happen at the
> same time, as long as different shards are being split. In practice, those
> shards often reside on different machines, so there's no I/O bottleneck in
> those cases, just the mutex in Overseer forcing the operations to be done
> serially.
> Given that a single split can take many minutes on a large collection, this
> is a bottleneck at scale.
> Here is the proposed new design:
> There are various Collection operations performed at the Overseer. They may
> need exclusive access at various levels. Each operation must define the
> access level at which it requires exclusion. The access level is an enum:
> CLUSTER(0)
> COLLECTION(1)
> SHARD(2)
> REPLICA(3)
> The Overseer node maintains a tree of these locks. The lock tree would look
> as follows. The tree can be created lazily as and when tasks come up.
> {code}
> Legend:
> C1, C2 -> Collections
> S1, S2 -> Shards
> R1,R2,R3,R4 -> Replicas
>
>              Cluster
>             /       \
>            /         \
>           C1          C2
>          /  \        /  \
>         /    \      /    \
>       S1      S2  S1      S2
>     R1,R2  R3,R4  R1,R2  R3,R4
> {code}
> When the overseer receives a message, it tries to acquire the appropriate
> lock from the tree. For example, if an operation needs a lock at the
> Collection level and it needs to operate on Collection C1, the node C1 and
> all child nodes of C1 must be free.
> h2. Lock acquiring logic
> Each operation starts from the root of the tree (Level 0 -> Cluster) and
> moves down depending on the operation. After it reaches the right node, it
> checks whether all the children are free of locks. If it fails to acquire a
> lock, it remains in the work queue. A scheduler thread waits for
> notification from the current set of tasks. Every task does a {{notify()}}
> on the monitor of the scheduler thread. The thread then starts from the head
> of the queue and checks each task to see whether it can acquire the right
> lock. If yes, it is executed; if not, the task is left in the work queue.
> When a new task arrives in the work queue, the scheduler thread wakes and
> just tries to schedule that task.
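
To make the locking rule concrete, here is a minimal Java sketch of such a lock tree (the names {{LockLevel}} and {{LockNode}} are illustrative assumptions; this is neither the attached SmileyLockTree.java nor the class that was eventually committed). A path can be locked only when no ancestor on the way down is already locked and the target node plus all of its descendants are free:

{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The access levels from the description; the depth of the path determines the level.
enum LockLevel { CLUSTER, COLLECTION, SHARD, REPLICA }

class LockNode {
  private final Map<String, LockNode> children = new HashMap<>();
  private boolean locked;

  // Call on the root, e.g. tryLock(Arrays.asList("C1", "S1")) for a SHARD-level
  // operation on shard S1 of collection C1. Fails if an ancestor on the path is
  // locked, or if the target node or any of its descendants is locked.
  synchronized boolean tryLock(List<String> path) {
    LockNode node = this;
    for (String name : path) {
      if (node.locked) return false;                                   // an ancestor holds the lock
      node = node.children.computeIfAbsent(name, k -> new LockNode()); // lazy tree creation
    }
    if (node.locked || node.anyDescendantLocked()) return false;
    node.locked = true;
    return true;
  }

  synchronized void unlock(List<String> path) {
    LockNode node = this;
    for (String name : path) {
      node = node.children.get(name);
      if (node == null) return; // path was never locked
    }
    node.locked = false;
  }

  private boolean anyDescendantLocked() {
    for (LockNode child : children.values()) {
      if (child.locked || child.anyDescendantLocked()) return true;
    }
    return false;
  }
}
{code}

Under this scheme, two SHARD-level splits on C1/S1 and C1/S2 lock disjoint paths and can run concurrently, while a COLLECTION-level operation on C1 is refused until both shard locks are released.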






[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-12 Thread ASF subversion and git services (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326290#comment-15326290 ]

ASF subversion and git services commented on SOLR-8744:
---

Commit 4726c5b2d2efa9ba160b608d46a977d0a6b83f94 in lucene-solr's branch 
refs/heads/branch_6_1 from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4726c5b ]

SOLR-8744: Minimize the impact on ZK when there are a lot of blocked tasks





[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-12 Thread ASF subversion and git services (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326247#comment-15326247 ]

ASF subversion and git services commented on SOLR-8744:
---

Commit ff6475c3a7229a27f7e97057fbbfadd2600032dd in lucene-solr's branch 
refs/heads/branch_6x from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ff6475c ]

SOLR-8744: Minimize the impact on ZK when there are a lot of blocked tasks





[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-12 Thread ASF subversion and git services (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326246#comment-15326246 ]

ASF subversion and git services commented on SOLR-8744:
---

Commit 232b44e283dfd01f9ec01b4e68b09b3755a1b17a in lucene-solr's branch 
refs/heads/master from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=232b44e ]

SOLR-8744: Minimize the impact on ZK when there are a lot of blocked tasks





[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-10 Thread Scott Blum (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325119#comment-15325119 ]

Scott Blum commented on SOLR-8744:
--

That said, if you'd feel more comfortable committing your patch as-is, I can 
toss my tweaks down after.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-10 Thread Scott Blum (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324928#comment-15324928 ]

Scott Blum commented on SOLR-8744:
--

Attached a slightly modified patch file with the changes I suggested.
Otherwise LGTM.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-10 Thread Scott Blum (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324896#comment-15324896 ]

Scott Blum commented on SOLR-8744:
--

Also, add blockedTasks to printTrackingMaps()?




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-10 Thread Scott Blum (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324891#comment-15324891 ]

Scott Blum commented on SOLR-8744:
--

LG.  I only have one suggestion left, to formulate the "fetch" section like 
this:

{code}
ArrayList<QueueEvent> heads = new ArrayList<>(blockedTasks.size() + MAX_PARALLEL_TASKS);
heads.addAll(blockedTasks.values());

// If we have enough items in the blocked tasks already, it makes
// no sense to read more items from the work queue. It makes sense
// to clear out at least a few items in the queue before we read more items.
if (heads.size() < MAX_BLOCKED_TASKS) {
  // instead of reading MAX_PARALLEL_TASKS items always, we should
  // only fetch as much as we can execute
  int toFetch = Math.min(MAX_BLOCKED_TASKS - heads.size(), MAX_PARALLEL_TASKS - runningTasks.size());
  List<QueueEvent> newTasks = workQueue.peekTopN(toFetch, excludedTasks, 2000L);
  log.debug("Got {} tasks from work-queue : [{}]", newTasks.size(), newTasks.toString());
  heads.addAll(newTasks);
} else {
  Thread.sleep(1000);
}

if (isClosed) break;

if (heads.isEmpty()) {
  continue;
}

blockedTasks.clear(); // clear it now; may get refilled below.
{code}

This prevents two problems:

1) The log.debug message "Got {} tasks from work-queue" won't keep reporting 
blockedTasks as if they were freshly fetched.
2) When the blockedTask map gets completely full, the Thread.sleep() will 
prevent a free-spin (the only other real pause in the loop is peekTopN)


[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-10 Thread Noble Paul (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324026#comment-15324026 ]

Noble Paul commented on SOLR-8744:
--

That is true, they may not necessarily be blocked. They are essentially items
read from the work queue which we don't want to refetch.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-10 Thread Noble Paul (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324024#comment-15324024 ]

Noble Paul commented on SOLR-8744:
--

I will have to beast this test and see if I can reproduce this.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-10 Thread Noble Paul (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324022#comment-15324022 ]

Noble Paul commented on SOLR-8744:
--

Yes, but I'm not sure if it's worth it. The other counts and assumptions will
still be valid if I do this.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-09 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323589#comment-15323589 ]

ASF GitHub Bot commented on SOLR-8744:
--

Github user dragonsinth commented on the issue:

https://github.com/apache/lucene-solr/pull/42
  
FYI: both of these SHAs passed the test suite





[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-09 Thread Scott Blum (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323478#comment-15323478 ]

Scott Blum commented on SOLR-8744:
--

See second commit in https://github.com/apache/lucene-solr/pull/42




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-09 Thread ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323476#comment-15323476 ]

ASF GitHub Bot commented on SOLR-8744:
--

GitHub user dragonsinth opened a pull request:

https://github.com/apache/lucene-solr/pull/42

SOLR-8744 blockedTasks

@noblepaul 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fullstorydev/lucene-solr SOLR-8744

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/42.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #42


commit 668d426633fe5551b24ab38036e14c15e7ed4cdf
Author: Scott Blum 
Date:   2016-06-09T21:28:07Z

WIP: SOLR-8744 blockedTasks

commit 4b2512abcc5e55c653c923eba9762988cf1faae8
Author: Scott Blum 
Date:   2016-06-09T22:11:26Z

Simplifications







[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-09 Thread Scott Blum (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323447#comment-15323447 ]

Scott Blum commented on SOLR-8744:
--

One other comment:

{code}
// We are breaking out if we have already reached the number of parallel tasks running.
// By doing so we may end up discarding the old list of blocked tasks. But we have
// no means to know if they would still be blocked after some of the items ahead
// were cleared.
if (runningTasks.size() >= MAX_PARALLEL_TASKS) break;
{code}

In this case, can't we just shove all the remaining tasks into blockedTasks?  
It's a slight redefinition to mean either "tasks that are blocked because of 
locks" or "tasks blocked because too many are running".




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-09 Thread Scott Blum (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323436#comment-15323436 ]

Scott Blum commented on SOLR-8744:
--

Actually, I may have a fix.  You need a Thread.sleep() in the final loop or you 
can burn through 500 iterations on a fast machine.  Here's a patch.
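
For context, the shape of the change being described is roughly the following (illustrative only, not the exact patch; {{queueIsDrained()}} is a hypothetical stand-in for the test's real exit condition):

{code}
// Without the sleep, a fast machine can burn through all 500 iterations
// before the queued tasks make any progress.
for (int i = 0; i < 500; i++) {
  if (queueIsDrained()) break; // hypothetical check for the test's condition
  Thread.sleep(100);
}
{code}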




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-09 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323393#comment-15323393
 ] 

Scott Blum commented on SOLR-8744:
--

I got one test failure patching this into master:

MultiThreadedOCPTest fails reliably:

java.lang.AssertionError
  at __randomizedtesting.SeedInfo.seed([7F3C69CF3DAFFDD5:F76856159353902D]:0)
  at org.junit.Assert.fail(Assert.java:92)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at org.junit.Assert.assertTrue(Assert.java:54)
  at org.apache.solr.cloud.MultiThreadedOCPTest.testFillWorkQueue(MultiThreadedOCPTest.java:106)





[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-09 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323207#comment-15323207
 ] 

Scott Blum commented on SOLR-8744:
--

BTW, I landed SOLR-9191 in master and 6x, so you should be good to go on that 
front.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-09 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323205#comment-15323205
 ] 

Scott Blum commented on SOLR-8744:
--

Mostly LG.  One completely minor comment:

{code}
+  if (blockedTasks.size() < MAX_BLOCKED_TASKS) {
+    blockedTasks.put(head.getId(), head);
+  }
{code}

Maybe unnecessary?  It doesn't actually matter if blockedTasks.size() gets to 
1100 items; the check at the top of the loop will keep it from growing 
endlessly.  If you drop these tasks on the floor, you'll just end up 
re-fetching the bytes from ZK later.
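
To spell out why the top-of-loop check is the one that matters, here is a 
schematic of the loop as discussed in this thread; the names mirror the 
discussion, not necessarily the actual patch. The outer check already bounds 
blockedTasks, so the inner guard only decides whether an over-limit task is 
cached locally or dropped and re-read from ZK later:

{code}
// Schematic only; names follow this discussion, not the real Overseer code.
// InterruptedException handling elided.
while (!isClosed) {
  if (blockedTasks.size() >= MAX_BLOCKED_TASKS) {
    Thread.sleep(100);  // top-of-loop bound: stop accumulating until locks free up
    continue;
  }
  QueueEvent head = fetchNextTask();         // hypothetical fetch from the work queue
  if (!tryAcquireLock(head)) {
    blockedTasks.put(head.getId(), head);    // already bounded by the check above
  } else {
    runTask(head);
  }
}
{code}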




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-07 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319849#comment-15319849
 ] 

Noble Paul commented on SOLR-8744:
--

Sure Scott. Let's commit that first. Let's make these two a blocker for 6.1.0 
release.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-07 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319289#comment-15319289
 ] 

Scott Blum commented on SOLR-8744:
--

btw: mind if I land SOLR-9191 first?  I will potentially need to backport that 
change to a bunch of branches which don't have the LockTree stuff, so it'll be 
a little cleaner port if I land first.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-07 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15319026#comment-15319026
 ] 

Scott Blum commented on SOLR-8744:
--

Yeah 1000 seems totally reasonable.  Beyond that, it seems kind of pointless; 
if you have 1000 tasks blocked, chances are that subsequent tasks aren't going 
to be able to run either.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-07 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318967#comment-15318967
 ] 

Noble Paul commented on SOLR-8744:
--

bq. I think we need a cap on the total size of blockedTasks; otherwise in the 
face of a cluster level task, you'll read everything in ZK into memory 
continually and grow blockedTasks without bound.

It's a tricky situation. It is very cheap to hold a few thousand tasks in 
memory (or even more), while we have only a limited number of threads to 
process them. It still makes sense to read newer tasks in the hope that they 
can be processed without getting blocked. I shall address these issues (as much 
as possible) and post a fresh patch.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-07 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318816#comment-15318816
 ] 

Scott Blum commented on SOLR-8744:
--

Some quick feedback:

- I think we need a cap on the total size of blockedTasks; otherwise in the 
face of a cluster level task, you'll read everything in ZK into memory 
continually and grow blockedTasks without bound.

1) "while (runningTasks.size() > MAX_PARALLEL_TASKS) { ...wait... }" should 
account for blockedTasks.size() as well, or else maybe there should be a 
separate sleep check on blockedTasks being too full.

2) "workQueue.peekTopN(MAX_PARALLEL_TASKS, ...)" could also peek for a smaller 
number of tasks when blockedTasks is almost full (see the sketch after this 
list).  Something like:

workQueue.peekTopN(Math.min(MAX_PARALLEL_TASKS - runningTasks.size(), 
MAX_BLOCKED_TASKS - blockedTasks.size()), ...)

- excludedTasks could be a Sets.union() of the other two if you want to 
automatically retain things like toString() and size() instead of a Function.
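
Putting points 1) and 2) together, a sketch of the capped peek; the field names, 
constants, and peekTopN signature here all follow this discussion rather than 
the actual patch:

{code}
// Sketch only: peek no more tasks than we have both execution headroom and
// blocked-cache headroom for, so a flood of unlockable tasks can't grow
// blockedTasks without bound.
int parallelHeadroom = MAX_PARALLEL_TASKS - runningTasks.size();
int blockedHeadroom  = MAX_BLOCKED_TASKS - blockedTasks.size();
int n = Math.min(parallelHeadroom, blockedHeadroom);
if (n > 0) {
  List<QueueEvent> heads = workQueue.peekTopN(n, excludedTasks, waitMillis);
  // ... try to acquire a lock for each head and schedule or block it ...
}
{code}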






[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-07 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317982#comment-15317982
 ] 

Noble Paul commented on SOLR-8744:
--

We have another problem because of tasks which cannot be run because they can't 
acquire a lock.

Imagine we have around 2000 items (this is the current limit) in the 
work-queue, and new tasks are added to the work queue. The call to {{peekTopN()}} 
would return only the same set of 2000 items; newly added items do not get 
fetched. This leads to 2 problems (a sketch of one way out follows below):

# New tasks which could be processed do not get a chance
# ZK is unnecessarily hammered with requests that fetch the same set of items 
again and again.
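
One possible direction, sketched with illustrative names: pass the ids of tasks 
we have already fetched (running or blocked) as an exclusion set, so 
{{peekTopN()}} can reach past them to newly added items instead of re-reading 
the same 2000 entries from ZK:

{code}
// Illustrative sketch, not the actual patch: exclude ids we already hold
// locally so the peek returns genuinely new work.
Set<String> excludedTasks = new HashSet<>();
excludedTasks.addAll(runningTasks.keySet());
excludedTasks.addAll(blockedTasks.keySet());
List<QueueEvent> heads = workQueue.peekTopN(n, excludedTasks, waitMillis);
{code}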






[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312017#comment-15312017
 ] 

ASF subversion and git services commented on SOLR-8744:
---

Commit 734dcb99fcdd557ef046166193ae804824023db3 in lucene-solr's branch 
refs/heads/branch_6x from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=734dcb9 ]

SOLR-8744: Overseer operations performed with fine grained mutual exclusion





[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-02 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15312007#comment-15312007
 ] 

ASF subversion and git services commented on SOLR-8744:
---

Commit 459a9c77a6a9b807deb98c58225d4d0ec1f75bac in lucene-solr's branch 
refs/heads/master from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=459a9c7 ]

SOLR-8744: Overseer operations performed with fine grained mutual exclusion





[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-01 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311583#comment-15311583
 ] 

Noble Paul commented on SOLR-8744:
--

I'm planning to commit this soon




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-06-01 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15311044#comment-15311044
 ] 

Scott Blum commented on SOLR-8744:
--

Hi, any update on this?  It seemed like it was really close.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-25 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300417#comment-15300417
 ] 

Scott Blum commented on SOLR-8744:
--

Good stuff, the new LockTree is looking good.

When does TaskBatch.batchId get assigned?  It looks like it's always 0 right 
now.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-24 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299479#comment-15299479
 ] 

Noble Paul commented on SOLR-8744:
--

bq. One more big question for me: why does the LockTree need a 
clusterStateProvider?

Exactly. I could create a lock blindly and not bother about whether that 
entity actually exists.

bq. One more comment, instead of making LockTree.LockObject public

True




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-24 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298841#comment-15298841
 ] 

Scott Blum commented on SOLR-8744:
--

Relatedly, lockTask should probably ensure that the appropriate number of path 
elements are non-null for the given lock level.  E.g. a shard-level lock should 
ensure that message.getStr(ZkStateReader.SHARD_ID_PROP) is non-null.
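
A minimal sketch of that validation, assuming the numeric lock levels from the 
proposal (CLUSTER=0 through REPLICA=3); the helper name and error messages are 
illustrative, not the actual OverseerCollectionMessageHandler code:

{code}
// Illustrative sketch: reject a message whose path is too short for its lock level.
private void validateLockPath(ZkNodeProps message, int lockLevel) {
  if (lockLevel >= 1 && message.getStr(ZkStateReader.COLLECTION_PROP) == null)
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
        "collection is required for this lock level");
  if (lockLevel >= 2 && message.getStr(ZkStateReader.SHARD_ID_PROP) == null)
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
        "shard is required for this lock level");
  if (lockLevel >= 3 && message.getStr(ZkStateReader.REPLICA_PROP) == null)
    throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
        "replica is required for this lock level");
}
{code}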




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-24 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298839#comment-15298839
 ] 

Scott Blum commented on SOLR-8744:
--

[~noble.paul] One more big question for me: why does the LockTree need a 
clusterStateProvider?  I don't follow the need to call findChildren in order to 
prepopulate locks at a given level.  In many cases, you might need to lock 
something that doesn't really even exist.  For example, create collection or 
create shard or create replica should acquire the lock for the thing they are 
about to try to create, even if the thing doesn't exist yet.  This code:

{code}
if (child == null) {
  LOG.info("locktree_Unable to get path for " + currentAction + " path " + path);
  return FREELOCK; // no such entity, so no need to lock
}
{code}

It seems to me that this premise basically isn't true: I don't think the 
LockTree needs knowledge of the actual elements in the real cluster state tree. 
I'm also not sure the Level actually matters beyond OCMH.lockTask, where we 
really just need to translate the lock level into the right-length path.  Does 
that make any sense?
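
To make the point concrete, here is a minimal sketch (not the actual patch; all 
names are illustrative) of a lock tree whose nodes are created lazily from the 
requested path alone, so no clusterStateProvider is needed and a 
not-yet-existing entity can still be locked:

{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LazyLockTree {
  private final Node root = new Node();

  static final class Node {
    final Map<String, Node> children = new HashMap<>();
    boolean locked; // lock bookkeeping elided in this sketch
  }

  // Returns the node for the given path, creating missing nodes on demand.
  // Note: no cluster-state lookup; a path for an entity that does not exist
  // yet (e.g. a collection about to be created) simply materializes a node.
  synchronized Node nodeFor(List<String> path) {
    Node cur = root;
    for (String part : path) {
      cur = cur.children.computeIfAbsent(part, k -> new Node());
    }
    return cur;
  }
}
{code}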





[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-24 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298822#comment-15298822
 ] 

Scott Blum commented on SOLR-8744:
--

Or make "OverseerLock" a top-level interface type in the package.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-24 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298821#comment-15298821
 ] 

Scott Blum commented on SOLR-8744:
--

[~noblepaul] One more comment: instead of making LockTree.LockObject public, I 
think you could just have LockObject implement the Lock interface and return 
the interface.  Then you don't have to do the "return lock == null ? null : 
lock::unlock" weirdness.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-24 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298781#comment-15298781
 ] 

David Smiley commented on SOLR-8744:


Maybe my ideas to simplify should really be their own issue, because there's 
apparently more to it than the locking -- it's interrelated with 
OverseerTaskProcessor itself.  I think the SmileyLockTree.java could be part of 
a solution (assuming I'm not missing requirements of the OTP overall), but by 
itself, with OTP's present design, it is perhaps not enough / not right.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-24 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298756#comment-15298756
 ] 

David Smiley commented on SOLR-8744:


bq. (Noble) David , It works. But it is not yet complete. But, I miss the 
point. What are we trying to solve? Is the current implementation buggy? We 
just need one correct implementation.

I am merely offering an alternative implementation to part of the task, one 
that I feel is simpler / more elegant (as I stated).  In part I'm doing this 
because it's fun/interesting :-)  Other than the complexity of it and of the 
related code already in OverseerTaskProcessor, I'm not sure what bugs may or 
may not be in your patch or the existing code.

bq. (Scott) Have you run your impl against the test Noble Paul wrote? Curious 
if it passes.

No; I'd like to do that.  I suggest [~noble.paul] commit to a branch, push, and 
I commit on top (assuming tests pass).  Of course it will be reverted if you 
all don't like it.

bq. (Scott) LayeredLock.tryLock() doesn't really implement markBusy() because 
it doesn't retain readLocks on the parent chain if you fail to lock the child. 
To implement markBusy you'd need to leave them locked (and have a way to later 
completely reset all the read locks).

Thanks for clarifying on IRC why the Overseer would want to use tryLock() 
instead of lock() -- something that confused me from your question/statement.  
I _had thought_ OverseerTaskProcessor was going to spawn off a task/thread 
(more likely an Executor impl) that would simply call lock() in its own thread, 
waiting as long as needed to acquire the lock and do its task.  I am quite 
ready to admit I don't know what's actually going on here ;-P  which is 
apparently the case.  Let me ask you this then... why *doesn't* it work in this 
manner?  It sounds simple enough.  I do see an executor (the tpe field), but 
there's so much surrounding code that it's hard for me to grok, and thus to see 
why tryLock vs lock might be wanted.  In such a design (in my simple conceptual 
model), fairness=true should be set on the ReentrantReadWriteLocks in 
SmileyLockTree to prevent starvation and get FIFO behavior on the write lock.  
If it's too complicated to explain, maybe let me have at it on a branch.  And 
again, I may very well not appreciate (yet) why OverseerTaskProcessor with its 
proposed LockTree needs to be so complicated, so please don't take any of my 
questions or suggestions as a slight.
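
As a concrete illustration of that simple conceptual model (a sketch only; the 
task bodies and pool size are made up), each task just blocks on a fair write 
lock in its own thread:

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BlockingModelSketch {
  public static void main(String[] args) {
    // fairness=true hands the write lock off in FIFO order, avoiding starvation
    ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock(true);
    ExecutorService executor = Executors.newFixedThreadPool(4);
    for (int i = 0; i < 4; i++) {
      final int taskId = i;
      executor.submit(() -> {
        rwLock.writeLock().lock(); // blocks as long as needed
        try {
          System.out.println("task " + taskId + " runs exclusively");
        } finally {
          rwLock.writeLock().unlock();
        }
      });
    }
    executor.shutdown();
  }
}
{code}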


[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-24 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298486#comment-15298486
 ] 

Noble Paul commented on SOLR-8744:
--

David, it works, but it is not yet complete.  Still, I miss the point: what are 
we trying to solve?  Is the current implementation buggy?  We just need one 
correct implementation, and I have no reason to reimplement everything unless 
there is something wrong with the current one.  I'm mostly done with the 
current patch and plan to commit it as soon as I clean up the cruft.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-24 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298482#comment-15298482
 ] 

Scott Blum commented on SOLR-8744:
--

Have you run your impl against the test Noble Paul wrote?  Curious if it passes.

I think I grok it, but I found a few problems on inspection:

- LayeredLock.tryLock() doesn't really implement markBusy(), because it doesn't 
retain readLocks on the parent chain if you fail to lock the child.  To 
implement markBusy you'd need to leave them locked (and have a way to later 
completely reset all the read locks).

- I still think implementing the j.u.c. Lock interface is silly here; 
tryLock() is the only operation that makes sense in context.

- You can move the synchronized(locks) from the private recursive method into 
the public one and acquire it once.
- You can move the ArrayList copy from the private recursive method into the 
public one and only make a single copy that you later take sublists of.  (Both 
are sketched after this list.)

- Just as a note: you'd need to occasionally throw away the entire tree, as it 
only grows and never shrinks.
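
Here's a hedged sketch of the refactor in those two bullets -- acquire the 
monitor and copy the path once in the public entry point, then let the private 
recursion work on subList views (class and method names are illustrative, not 
from the patch):

{code}
import java.util.ArrayList;
import java.util.List;

class SketchLockTree {
  private final Object locks = new Object();

  boolean tryLock(List<String> path) {
    synchronized (locks) {                       // acquired once, not per level
      List<String> copy = new ArrayList<>(path); // single defensive copy
      return tryLockRecursive(copy);
    }
  }

  private boolean tryLockRecursive(List<String> path) {
    if (path.isEmpty()) {
      return true;
    }
    // ... per-level lock bookkeeping for path.get(0) would go here ...
    return tryLockRecursive(path.subList(1, path.size())); // a view, not a copy
  }
}
{code}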






[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-24 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297771#comment-15297771
 ] 

Noble Paul commented on SOLR-8744:
--

Thanks [~dsmiley]

The implementation is *wrong*.

It's not enough to ensure that your parent is not locked; it is also required 
that none of your children are locked.  For example, {{C1}} cannot be locked 
if {{C1/S1/R1}} or {{C1/S2/R1}} is locked.

The flat map and substring-based approach is expensive in terms of resources: 
every substring() call creates a new String, and it does not have to be that 
expensive when you can achieve the same with a map lookup.  It may look simple, 
but I would say it's inefficient and lazy.

Using the java Lock is contrived because we don't need any of that 
functionality.  We just need mark()/unmark() functionality.  There are no real 
locks required here.
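
A minimal sketch of that mark()/unmark() idea under the constraint above -- a 
node can be marked only if no ancestor and no descendant is marked; a 
busy-descendant counter makes the downward check O(1) (all names here are 
illustrative, not the patch's):

{code}
import java.util.HashMap;
import java.util.Map;

class MarkTree {
  static final class Node {
    final Node parent;
    final Map<String, Node> children = new HashMap<>();
    boolean marked;
    int busyDescendants; // number of marked nodes anywhere below this one
    Node(Node parent) { this.parent = parent; }
  }

  // Marks the node iff neither it, an ancestor, nor a descendant is marked.
  static synchronized boolean mark(Node n) {
    for (Node a = n; a != null; a = a.parent) {
      if (a.marked) return false;            // self or an ancestor is busy
    }
    if (n.busyDescendants > 0) return false; // some descendant is busy
    n.marked = true;
    for (Node a = n.parent; a != null; a = a.parent) a.busyDescendants++;
    return true;
  }

  static synchronized void unmark(Node n) {
    n.marked = false;
    for (Node a = n.parent; a != null; a = a.parent) a.busyDescendants--;
  }
}
{code}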




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-23 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297713#comment-15297713
 ] 

Noble Paul commented on SOLR-8744:
--

bq. Node would keep track of its parent only, not children.

I do not know how to traverse down a tree without keeping track of the 
children.

bq. No Session or LockObject.

The session is created for a purpose: it abstracts out the starvation logic.

I'm not sure I understand your design.  Maybe an alternate patch with an 
implementation of LockTree would help.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-23 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297710#comment-15297710
 ] 

Noble Paul commented on SOLR-8744:
--

bq. In OverseerTaskProcessor, handing off the lock object (up to and including 
the tpe.execute call) should probably move inside the try block; on 
successfully executing the task, null out the lock local, and then put an "if 
lock != null unlock" into a finally block.

The runner is always executed in a different thread, so the finally block will 
always evaluate {{lock != null}} to true.  Actually, OverseerTaskProcessor 
should unlock it only in case of an exception.

bq.Do we need some kind of complete reset in the event stuff really blows up? 
Should there be a session-level clear that just unlocks everything?

Yes, we need it, but not at the session level.  If I clear up everything for 
each session, it defeats the purpose: the session is valid for only one batch, 
while the locks should survive multiple batches.

A better solution is to periodically check whether the running tasks are empty 
and, if yes, just create a new {{LockTree}}.  I was planning to do it anyway; a 
sketch follows.
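
Something like this (a hedged sketch of the idea only; {{runningTasks}} and the 
check's call site are stand-ins, not the patch's actual fields):

{code}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class OverseerResetSketch {
  private volatile LockTree lockTree = new LockTree();
  private final Set<String> runningTasks = ConcurrentHashMap.newKeySet();

  // Called periodically: when nothing is running, no locks can be held,
  // so the whole (ever-growing) tree can simply be replaced.
  void maybeResetLockTree() {
    if (runningTasks.isEmpty()) {
      lockTree = new LockTree();
    }
  }

  static class LockTree { /* lock-tree internals elided */ }
}
{code}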

bq. I think MigrateStateFormat could be collection level rather than cluster 
level?

I have set the lock levels more pessimistically.  We should re-evaluate them.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-23 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297100#comment-15297100
 ] 

Scott Blum commented on SOLR-8744:
--

Digging into the LockTree now to understand it better.  Some preliminary notes:

- In OverseerTaskProcessor, handing off the lock object (up to and including 
the tpe.execute call) should probably move inside the try block; on 
successfully executing the task, null out the lock local, and then put an "if 
lock != null unlock" into a finally block.  (See the sketch after this list.)

- Do we need some kind of complete reset in the event stuff really blows up?  
Should there be a session-level clear that just unlocks everything?

- I think MigrateStateFormat could be collection level rather than cluster 
level?
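
A minimal sketch of that try/finally structure, with hypothetical names (the 
Lock interface and schedule() signature below are illustrative):

{code}
import java.util.concurrent.ExecutorService;

class ProcessorSketch {
  interface Lock { void unlock(); }

  void schedule(ExecutorService tpe, Runnable task, Lock acquired) {
    Lock lock = acquired;
    try {
      tpe.execute(task); // the task now owns the lock and unlocks when done
      lock = null;       // hand-off succeeded; the finally block must not unlock
    } finally {
      if (lock != null) {
        lock.unlock();   // only reached if the hand-off itself failed
      }
    }
  }
}
{code}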





[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-23 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297086#comment-15297086
 ] 

David Smiley commented on SOLR-8744:


The main purpose of my suggestion is simplicity: LockTree (public) and 
internally LockTree.Node (private) -- that's it; no new abstractions, and it'd 
have very little code.  Node would keep track of its parent only, not 
children.  No Session or LockObject.  Returning a JDK Lock helps in that regard 
(it is an existing abstraction, and if we wanted to it'd be easy to support all 
its methods), but it's not critical -- my suggested LockTree.Node could be 
public and we don't implement Lock, if you feel that's better.  I disagree with 
you on the semantics being different vs the same, but whatever.  It just so 
happens that my suggestion might yield something general purpose, but that's 
not my objective; it's simplicity.  Locking on fewer objects is nice too, but 
perhaps it won't become a real issue with your proposal.  Perhaps I'm not fully 
appreciating the task at hand -- I admit I glossed over the starvation issue 
and the code here concerned with marking busy/not.  I hoped the "fairness" 
boolean configuration to JDK's ReentrantReadWriteLock would address that, but 
maybe not?

bq. (me) Getting the key is a simple matter of stringKey.substring(0, 
stringKey.lastIndexOf('/'))

Just a minor implementation detail to get a parent key instead of working with 
a List -- never mind.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-23 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296569#comment-15296569
 ] 

Noble Paul commented on SOLR-8744:
--

Thanks [~dsmiley] for your comments.

bq. which we only obtain a fixed small number of locks, which is for each parent.

I missed that.  Only one operation should be able to lock at a given level.

bq. LockTree maintains a cache/map of string keys to weakly reference-able 
LockNode.

It's an optimisation that could be done.  Here we are just talking about a few 
hundred (maybe a few thousand?) extremely lightweight objects.

bq. Getting the key is a simple matter of stringKey.substring(0, 
stringKey.lastIndexOf('/'))

I didn't get that one.

bq. LockNode implements the JDK Lock interface

The Java {{Lock}} interface makes me implement too many things.  In this 
design, we don't even expose the {{lock()}} method.  The semantics are much 
different here; implementing/using the Java {{Lock}} interface sets a wrong set 
of expectations.

{{LockTree}} is not a general-purpose utility.  I wrote it as an independent 
class because it is complex enough functionality that it should be tested 
independently.





[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-23 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296426#comment-15296426
 ] 

David Smiley commented on SOLR-8744:


I looked over the patch just a little bit, but the suggestion I'm about to give 
is mostly based on the issue description & comments.  This is an 
interesting/fun programming problem.  The main thing I don't like about the 
current design (and this is not a blocker, I'm not vetoing) is that we obtain 
locks recursively on down to every object of lower depths (ultimately to 
replicas).  Instead, I propose we consider an alternative design (to follow) in 
which we only obtain a fixed, small number of locks: one for each parent.  
This lowers the number of locks, and also allows the actual lock nodes to be 
much more dynamic, being GC'ed when the locks aren't in use.
Proposal:
* LockTree maintains a cache/map of string keys to weakly reference-able 
LockNodes.  That's right; they'll get GC'ed when not in use.  The root node 
will be strongly referenced apart from the map so it's always there.  It's just 
a string key, with the expectation that we'll use some separator to denote 
hierarchy.  When looking up a lock, we simply synchronize, as obtaining locks 
will be infrequent.  When looking up a Lock, it will recursively look up its 
parent, creating it first if not there.  Getting the parent key is a simple 
matter of stringKey.substring(0, stringKey.lastIndexOf('/')), assuming all keys 
start with a leading "/" (thus the root key is the empty string).
* LockNode implements the JDK Lock interface, but only implementing {{lock()}} 
and {{unlock()}}.  This makes the API easy for consumers -- it's a known 
abstraction for code using this utility.
* LockNode's fields consist of a parent LockNode and a JDK 
ReentrantReadWriteLock.  The ReentrantReadWriteLock is instantiated with 
fairness=true.
* LockNode.lock: call parent.readLock() (see below), then call 
readWriteLock.writeLock().lock().
* LockNode.unlock: call readWriteLock.writeLock().unlock(), then 
parent.readUnlock() (see below).
* LockNode.readLock (private method): call readLock on the parent first (this 
is recursive), and then after that call readWriteLock.readLock().lock().  Thus 
we lock from the top down (from the root down), in effect.
* LockNode.readUnlock (private method): call readWriteLock.readLock().unlock(), 
then recursively call parent.readUnlock().

What do you think?  A sketch of this in code follows.
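
Here is a runnable sketch of the above (an illustration under the stated 
assumptions, not code from any attached patch; only lock()/unlock() are 
implemented, and keys are '/'-separated with the root keyed by the empty 
string):

{code}
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SmileyLockTreeSketch {
  private final LockNode root = new LockNode(null); // strongly referenced
  private final Map<String, WeakReference<LockNode>> nodes = new HashMap<>();

  // Looks up (or lazily creates) the node for a key such as "/c1/s1".
  // Live nodes keep their parents alive via the parent field, while unused
  // nodes can be GC'ed because the map holds them only weakly.
  public synchronized LockNode get(String key) {
    if (key.isEmpty()) return root;
    WeakReference<LockNode> ref = nodes.get(key);
    LockNode node = (ref == null) ? null : ref.get();
    if (node == null) {
      LockNode parent = get(key.substring(0, key.lastIndexOf('/')));
      node = new LockNode(parent);
      nodes.put(key, new WeakReference<>(node));
    }
    return node;
  }

  public static final class LockNode {
    private final LockNode parent;
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock(true); // fair

    LockNode(LockNode parent) { this.parent = parent; }

    // Read-locks every ancestor from the root down, then write-locks this node.
    public void lock() {
      readLockAncestors();
      rw.writeLock().lock();
    }

    public void unlock() {
      rw.writeLock().unlock();
      readUnlockAncestors();
    }

    private void readLockAncestors() {
      if (parent != null) {
        parent.readLockAncestors();   // recurse to the root first...
        parent.rw.readLock().lock();  // ...so locking proceeds top-down
      }
    }

    private void readUnlockAncestors() {
      if (parent != null) {
        parent.rw.readLock().unlock(); // release bottom-up
        parent.readUnlockAncestors();
      }
    }
  }
}
{code}

With this shape, locking {{/c1}} write-locks the C1 node while holding a read 
lock on the root, so a cluster-level operation (which needs the root's write 
lock) blocks until C1 is released, and vice versa.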


[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-22 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295648#comment-15295648
 ] 

Noble Paul commented on SOLR-8744:
--

Patch with tests.  More testing is required, but please review the 
{{LockTree}} class to see the logic implemented.




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-11 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281187#comment-15281187
 ] 

Scott Blum commented on SOLR-8744:
--

Good plan!




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-11 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281175#comment-15281175
 ] 

Noble Paul commented on SOLR-8744:
--

Good question. Starvation is indeed a problem, and we must make a best 
effort to run tasks on a first-come, first-served basis.

We can address it by making the scheduling itself smarter, as follows:

As the scheduler thread tries to acquire locks, it could build a local tree 
of its own in which nodes are marked 'busy' for previously unschedulable 
tasks.

If the task {{T1}} needs a lock on Collection {{C1}} and is unable to 
acquire it, the local tree would look as follows:
{code}
Cluster
  /
 /
C1
{code}
When task {{T2}} is taken up, which requires a lock on 
{{Cluster -> C1 -> S1}}, it would not even try to acquire a lock from the 
lock tree, because the path {{Cluster -> C1}} is already marked 'busy' in 
the local tree. So {{T2}} would remain in the work queue till {{T1}} is 
completed.
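
For concreteness, one possible encoding of that local 'busy' bookkeeping is 
sketched below. The path encoding and class name are illustrative 
assumptions, not taken from the patch:

{code}
// Sketch of the scheduler's local "busy" bookkeeping for a single scan of
// the work queue. Paths are slash-delimited with a trailing slash, e.g. "/"
// for the cluster, "/C1/" for a collection, "/C1/S1/" for a shard, so that
// plain prefix checks express ancestor/descendant relationships.
import java.util.ArrayList;
import java.util.List;

public class BusyPathsSketch {

  private final List<String> busyPaths = new ArrayList<>();

  /** Record the path of a task that could not acquire its lock. */
  public void markBusy(String path) {
    busyPaths.add(path);
  }

  /**
   * A later task stays in the queue if its path overlaps any busy path:
   * a busy ancestor blocks it (T1 waiting on "/C1/" blocks T2 on "/C1/S1/"),
   * and a busy descendant blocks it too (a waiting "/C1/" task keeps a
   * cluster-level "/" task from being considered out of order).
   */
  public boolean isBlocked(String path) {
    for (String busy : busyPaths) {
      if (path.startsWith(busy) || busy.startsWith(path)) return true;
    }
    return false;
  }
}
{code}

During each scan the scheduler would check {{isBlocked}} before trying the 
lock tree, and call {{markBusy}} whenever a lock attempt fails; cleared at 
the start of the next scan, this keeps small tasks from repeatedly jumping 
ahead of a large waiting one.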




[jira] [Commented] (SOLR-8744) Overseer operations need more fine grained mutual exclusion

2016-05-11 Thread Scott Blum (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280534#comment-15280534
 ] 

Scott Blum commented on SOLR-8744:
--

Design looks good to me.

One thing to talk through: this may not be a problem per se, but I could 
imagine starvation becoming an issue. If a task is trying to lock the whole 
cluster, it may never get to run if smaller tasks keep acquiring pieces of 
the lock tree. This may not be an issue in practice.
