[jira] [Commented] (PHOENIX-3806) IndexUpdateManager spending a lot of time sorting mutations on Index rebuild

2017-05-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001482#comment-16001482
 ] 

Lars Hofhansl commented on PHOENIX-3806:


Going to commit to 4.x.* unless I hear objections today.

> IndexUpdateManager spending a lot of time sorting mutations on Index rebuild
> 
>
> Key: PHOENIX-3806
> URL: https://issues.apache.org/jira/browse/PHOENIX-3806
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Vincent Poon
> Attachments: PHOENIX-3806.v1.patch
>
>
> Here's the stack trace. The Array contains 50001 Delete Mutations in this 
> case.
> It seems the code is sorting this over and over again.
> {code}
> Thread 170 (B.DefaultRpcServer.handler=67,queue=7,port=60020):
>   State: RUNNABLE
>   Blocked count: 220598
>   Waited count: 377933
>   Stack:
> java.util.TimSort.binarySort(TimSort.java:296)
> java.util.TimSort.sort(TimSort.java:239)
> java.util.Arrays.sort(Arrays.java:1438)
> 
> org.apache.phoenix.hbase.index.covered.update.SortedCollection.iterator(SortedCollection.java:78)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.fixUpCurrentUpdates(IndexUpdateManager.java:128)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.addIndexUpdate(IndexUpdateManager.java:115)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addCurrentStateMutationsForBatch(NonTxIndexBuilder.java:333)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addUpdateForGivenTimestamp(NonTxIndexBuilder.java:258)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addMutationsForBatch(NonTxIndexBuilder.java:231)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.batchMutationAndAddUpdates(NonTxIndexBuilder.java:109)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.getIndexUpdate(NonTxIndexBuilder.java:71)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:137)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submit(BaseTaskRunner.java:58)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submitUninterruptible(BaseTaskRunner.java:99)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:144)
> 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:324)
> Thread 169 (B.DefaultRpcServer.handler=66,queue=6,port=60020):
> {code}
> [~jamestaylor]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3806) IndexUpdateManager spending a lot of time sorting mutations on Index rebuild

2017-05-05 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998793#comment-15998793
 ] 

James Taylor commented on PHOENIX-3806:
---

+1. Nice work, [~vincentpoon]. I can't think of a reason why a TreeSet wasn't 
used in the first place. I was thinking perhaps because the underlying 
collection needed to be thread safe, but PriorityQueue isn't thread safe 
either. Maybe [~jesse_yates] remembers?

> IndexUpdateManager spending a lot of time sorting mutations on Index rebuild
> 
>
> Key: PHOENIX-3806
> URL: https://issues.apache.org/jira/browse/PHOENIX-3806
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
> Attachments: PHOENIX-3806.v1.patch
>
>
> Here's the stack trace. The Array contains 50001 Delete Mutations in this 
> case.
> It seems the code is sorting this over and over again.
> {code}
> Thread 170 (B.DefaultRpcServer.handler=67,queue=7,port=60020):
>   State: RUNNABLE
>   Blocked count: 220598
>   Waited count: 377933
>   Stack:
> java.util.TimSort.binarySort(TimSort.java:296)
> java.util.TimSort.sort(TimSort.java:239)
> java.util.Arrays.sort(Arrays.java:1438)
> 
> org.apache.phoenix.hbase.index.covered.update.SortedCollection.iterator(SortedCollection.java:78)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.fixUpCurrentUpdates(IndexUpdateManager.java:128)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.addIndexUpdate(IndexUpdateManager.java:115)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addCurrentStateMutationsForBatch(NonTxIndexBuilder.java:333)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addUpdateForGivenTimestamp(NonTxIndexBuilder.java:258)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addMutationsForBatch(NonTxIndexBuilder.java:231)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.batchMutationAndAddUpdates(NonTxIndexBuilder.java:109)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.getIndexUpdate(NonTxIndexBuilder.java:71)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:137)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submit(BaseTaskRunner.java:58)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submitUninterruptible(BaseTaskRunner.java:99)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:144)
> 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:324)
> Thread 169 (B.DefaultRpcServer.handler=66,queue=6,port=60020):
> {code}
> [~jamestaylor]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3806) IndexUpdateManager spending a lot of time sorting mutations on Index rebuild

2017-05-04 Thread Vincent Poon (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997680#comment-15997680
 ] 

Vincent Poon commented on PHOENIX-3806:
---

put up a patch that simply changes the SortedCollection to a TreeSet, so that 
we don't have to do a sort every time we get an iterator on a collection.

For a mutation with 200 versions, this brings the runtime down from 90 seconds 
to around 1 second.

The test for this patch will be part of PHOENIX-3824.  That also fixes the 
arithmetic series summation problem described earlier.

> IndexUpdateManager spending a lot of time sorting mutations on Index rebuild
> 
>
> Key: PHOENIX-3806
> URL: https://issues.apache.org/jira/browse/PHOENIX-3806
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's the stack trace. The Array contains 50001 Delete Mutations in this 
> case.
> It seems the code is sorting this over and over again.
> {code}
> Thread 170 (B.DefaultRpcServer.handler=67,queue=7,port=60020):
>   State: RUNNABLE
>   Blocked count: 220598
>   Waited count: 377933
>   Stack:
> java.util.TimSort.binarySort(TimSort.java:296)
> java.util.TimSort.sort(TimSort.java:239)
> java.util.Arrays.sort(Arrays.java:1438)
> 
> org.apache.phoenix.hbase.index.covered.update.SortedCollection.iterator(SortedCollection.java:78)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.fixUpCurrentUpdates(IndexUpdateManager.java:128)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.addIndexUpdate(IndexUpdateManager.java:115)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addCurrentStateMutationsForBatch(NonTxIndexBuilder.java:333)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addUpdateForGivenTimestamp(NonTxIndexBuilder.java:258)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addMutationsForBatch(NonTxIndexBuilder.java:231)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.batchMutationAndAddUpdates(NonTxIndexBuilder.java:109)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.getIndexUpdate(NonTxIndexBuilder.java:71)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:137)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submit(BaseTaskRunner.java:58)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submitUninterruptible(BaseTaskRunner.java:99)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:144)
> 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:324)
> Thread 169 (B.DefaultRpcServer.handler=66,queue=6,port=60020):
> {code}
> [~jamestaylor]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3806) IndexUpdateManager spending a lot of time sorting mutations on Index rebuild

2017-05-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997676#comment-15997676
 ] 

ASF GitHub Bot commented on PHOENIX-3806:
-

GitHub user vincentpoon opened a pull request:

https://github.com/apache/phoenix/pull/243

PHOENIX-3806 IndexUpdateManager spending a lot of time sorting mutati…

…ons on Index rebuild

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vincentpoon/phoenix PHOENIX-3806

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/phoenix/pull/243.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #243


commit e6e849763ccb74a687c7fd4d180ad7113206735d
Author: Vincent 
Date:   2017-05-04T23:17:28Z

PHOENIX-3806 IndexUpdateManager spending a lot of time sorting mutations on 
Index rebuild




> IndexUpdateManager spending a lot of time sorting mutations on Index rebuild
> 
>
> Key: PHOENIX-3806
> URL: https://issues.apache.org/jira/browse/PHOENIX-3806
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's the stack trace. The Array contains 50001 Delete Mutations in this 
> case.
> It seems the code is sorting this over and over again.
> {code}
> Thread 170 (B.DefaultRpcServer.handler=67,queue=7,port=60020):
>   State: RUNNABLE
>   Blocked count: 220598
>   Waited count: 377933
>   Stack:
> java.util.TimSort.binarySort(TimSort.java:296)
> java.util.TimSort.sort(TimSort.java:239)
> java.util.Arrays.sort(Arrays.java:1438)
> 
> org.apache.phoenix.hbase.index.covered.update.SortedCollection.iterator(SortedCollection.java:78)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.fixUpCurrentUpdates(IndexUpdateManager.java:128)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.addIndexUpdate(IndexUpdateManager.java:115)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addCurrentStateMutationsForBatch(NonTxIndexBuilder.java:333)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addUpdateForGivenTimestamp(NonTxIndexBuilder.java:258)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addMutationsForBatch(NonTxIndexBuilder.java:231)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.batchMutationAndAddUpdates(NonTxIndexBuilder.java:109)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.getIndexUpdate(NonTxIndexBuilder.java:71)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:137)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submit(BaseTaskRunner.java:58)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submitUninterruptible(BaseTaskRunner.java:99)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:144)
> 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:324)
> Thread 169 (B.DefaultRpcServer.handler=66,queue=6,port=60020):
> {code}
> [~jamestaylor]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3806) IndexUpdateManager spending a lot of time sorting mutations on Index rebuild

2017-04-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15985520#comment-15985520
 ] 

Lars Hofhansl commented on PHOENIX-3806:


Yeah we need to look at all the versions.

I think most importantly we should fix summation issue that [~vincentpoon] 
identified. Imagine we had 1000 versions... We'd be making 500500 index updates 
per row and col.

> IndexUpdateManager spending a lot of time sorting mutations on Index rebuild
> 
>
> Key: PHOENIX-3806
> URL: https://issues.apache.org/jira/browse/PHOENIX-3806
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's the stack trace. The Array contains 50001 Delete Mutations in this 
> case.
> It seems the code is sorting this over and over again.
> {code}
> Thread 170 (B.DefaultRpcServer.handler=67,queue=7,port=60020):
>   State: RUNNABLE
>   Blocked count: 220598
>   Waited count: 377933
>   Stack:
> java.util.TimSort.binarySort(TimSort.java:296)
> java.util.TimSort.sort(TimSort.java:239)
> java.util.Arrays.sort(Arrays.java:1438)
> 
> org.apache.phoenix.hbase.index.covered.update.SortedCollection.iterator(SortedCollection.java:78)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.fixUpCurrentUpdates(IndexUpdateManager.java:128)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.addIndexUpdate(IndexUpdateManager.java:115)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addCurrentStateMutationsForBatch(NonTxIndexBuilder.java:333)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addUpdateForGivenTimestamp(NonTxIndexBuilder.java:258)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addMutationsForBatch(NonTxIndexBuilder.java:231)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.batchMutationAndAddUpdates(NonTxIndexBuilder.java:109)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.getIndexUpdate(NonTxIndexBuilder.java:71)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:137)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submit(BaseTaskRunner.java:58)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submitUninterruptible(BaseTaskRunner.java:99)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:144)
> 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:324)
> Thread 169 (B.DefaultRpcServer.handler=66,queue=6,port=60020):
> {code}
> [~jamestaylor]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3806) IndexUpdateManager spending a lot of time sorting mutations on Index rebuild

2017-04-26 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15985223#comment-15985223
 ] 

Samarth Jain commented on PHOENIX-3806:
---

After discussing this yesterday I realized that we cannot rely on the max 
versions attribute that is set on HTableDescriptor. We *need* to look at all 
the versions of all the rows that have been changed since the index was 
disabled. This is because for index maintenance we need to issue delete markers 
for those index rows where the index column was updated in the corresponding 
data table row. This actually exposes a hole in our mutable index 
implementation. If the max versions on a table is set to a lower number (like 
1) and if the major compaction runs before index is rebuild, we will end up 
losing these mutations on the data table rows. This will make the index 
inconsistent.

> IndexUpdateManager spending a lot of time sorting mutations on Index rebuild
> 
>
> Key: PHOENIX-3806
> URL: https://issues.apache.org/jira/browse/PHOENIX-3806
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's the stack trace. The Array contains 50001 Delete Mutations in this 
> case.
> It seems the code is sorting this over and over again.
> {code}
> Thread 170 (B.DefaultRpcServer.handler=67,queue=7,port=60020):
>   State: RUNNABLE
>   Blocked count: 220598
>   Waited count: 377933
>   Stack:
> java.util.TimSort.binarySort(TimSort.java:296)
> java.util.TimSort.sort(TimSort.java:239)
> java.util.Arrays.sort(Arrays.java:1438)
> 
> org.apache.phoenix.hbase.index.covered.update.SortedCollection.iterator(SortedCollection.java:78)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.fixUpCurrentUpdates(IndexUpdateManager.java:128)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.addIndexUpdate(IndexUpdateManager.java:115)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addCurrentStateMutationsForBatch(NonTxIndexBuilder.java:333)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addUpdateForGivenTimestamp(NonTxIndexBuilder.java:258)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addMutationsForBatch(NonTxIndexBuilder.java:231)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.batchMutationAndAddUpdates(NonTxIndexBuilder.java:109)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.getIndexUpdate(NonTxIndexBuilder.java:71)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:137)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submit(BaseTaskRunner.java:58)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submitUninterruptible(BaseTaskRunner.java:99)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:144)
> 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:324)
> Thread 169 (B.DefaultRpcServer.handler=66,queue=6,port=60020):
> {code}
> [~jamestaylor]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3806) IndexUpdateManager spending a lot of time sorting mutations on Index rebuild

2017-04-25 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983858#comment-15983858
 ] 

Samarth Jain commented on PHOENIX-3806:
---

One thing we could do is to look in the HTableDescriptor of the data table and 
see what the "versions" attribute is set to and then use the same number of 
version when doing raw scan (instead of doing scan.setMaxVersions())

> IndexUpdateManager spending a lot of time sorting mutations on Index rebuild
> 
>
> Key: PHOENIX-3806
> URL: https://issues.apache.org/jira/browse/PHOENIX-3806
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's the stack trace. The Array contains 50001 Delete Mutations in this 
> case.
> It seems the code is sorting this over and over again.
> {code}
> Thread 170 (B.DefaultRpcServer.handler=67,queue=7,port=60020):
>   State: RUNNABLE
>   Blocked count: 220598
>   Waited count: 377933
>   Stack:
> java.util.TimSort.binarySort(TimSort.java:296)
> java.util.TimSort.sort(TimSort.java:239)
> java.util.Arrays.sort(Arrays.java:1438)
> 
> org.apache.phoenix.hbase.index.covered.update.SortedCollection.iterator(SortedCollection.java:78)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.fixUpCurrentUpdates(IndexUpdateManager.java:128)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.addIndexUpdate(IndexUpdateManager.java:115)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addCurrentStateMutationsForBatch(NonTxIndexBuilder.java:333)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addUpdateForGivenTimestamp(NonTxIndexBuilder.java:258)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addMutationsForBatch(NonTxIndexBuilder.java:231)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.batchMutationAndAddUpdates(NonTxIndexBuilder.java:109)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.getIndexUpdate(NonTxIndexBuilder.java:71)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:137)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submit(BaseTaskRunner.java:58)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submitUninterruptible(BaseTaskRunner.java:99)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:144)
> 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:324)
> Thread 169 (B.DefaultRpcServer.handler=66,queue=6,port=60020):
> {code}
> [~jamestaylor]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3806) IndexUpdateManager spending a lot of time sorting mutations on Index rebuild

2017-04-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982344#comment-15982344
 ] 

Lars Hofhansl commented on PHOENIX-3806:


[~vincentpoon], in your case we should only be doing 100 index updates?

> IndexUpdateManager spending a lot of time sorting mutations on Index rebuild
> 
>
> Key: PHOENIX-3806
> URL: https://issues.apache.org/jira/browse/PHOENIX-3806
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's the stack trace. The Array contains 50001 Delete Mutations in this 
> case.
> It seems the code is sorting this over and over again.
> {code}
> Thread 170 (B.DefaultRpcServer.handler=67,queue=7,port=60020):
>   State: RUNNABLE
>   Blocked count: 220598
>   Waited count: 377933
>   Stack:
> java.util.TimSort.binarySort(TimSort.java:296)
> java.util.TimSort.sort(TimSort.java:239)
> java.util.Arrays.sort(Arrays.java:1438)
> 
> org.apache.phoenix.hbase.index.covered.update.SortedCollection.iterator(SortedCollection.java:78)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.fixUpCurrentUpdates(IndexUpdateManager.java:128)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.addIndexUpdate(IndexUpdateManager.java:115)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addCurrentStateMutationsForBatch(NonTxIndexBuilder.java:333)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addUpdateForGivenTimestamp(NonTxIndexBuilder.java:258)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addMutationsForBatch(NonTxIndexBuilder.java:231)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.batchMutationAndAddUpdates(NonTxIndexBuilder.java:109)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.getIndexUpdate(NonTxIndexBuilder.java:71)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:137)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submit(BaseTaskRunner.java:58)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submitUninterruptible(BaseTaskRunner.java:99)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:144)
> 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:324)
> Thread 169 (B.DefaultRpcServer.handler=66,queue=6,port=60020):
> {code}
> [~jamestaylor]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3806) IndexUpdateManager spending a lot of time sorting mutations on Index rebuild

2017-04-24 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982243#comment-15982243
 ] 

James Taylor commented on PHOENIX-3806:
---

How about we limit the number of versions we consider to only the one right 
before our mutation timestamp? I don't think versions earlier than that are 
relevant.

> IndexUpdateManager spending a lot of time sorting mutations on Index rebuild
> 
>
> Key: PHOENIX-3806
> URL: https://issues.apache.org/jira/browse/PHOENIX-3806
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's the stack trace. The Array contains 50001 Delete Mutations in this 
> case.
> It seems the code is sorting this over and over again.
> {code}
> Thread 170 (B.DefaultRpcServer.handler=67,queue=7,port=60020):
>   State: RUNNABLE
>   Blocked count: 220598
>   Waited count: 377933
>   Stack:
> java.util.TimSort.binarySort(TimSort.java:296)
> java.util.TimSort.sort(TimSort.java:239)
> java.util.Arrays.sort(Arrays.java:1438)
> 
> org.apache.phoenix.hbase.index.covered.update.SortedCollection.iterator(SortedCollection.java:78)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.fixUpCurrentUpdates(IndexUpdateManager.java:128)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.addIndexUpdate(IndexUpdateManager.java:115)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addCurrentStateMutationsForBatch(NonTxIndexBuilder.java:333)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addUpdateForGivenTimestamp(NonTxIndexBuilder.java:258)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addMutationsForBatch(NonTxIndexBuilder.java:231)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.batchMutationAndAddUpdates(NonTxIndexBuilder.java:109)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.getIndexUpdate(NonTxIndexBuilder.java:71)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:137)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submit(BaseTaskRunner.java:58)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submitUninterruptible(BaseTaskRunner.java:99)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:144)
> 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:324)
> Thread 169 (B.DefaultRpcServer.handler=66,queue=6,port=60020):
> {code}
> [~jamestaylor]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3806) IndexUpdateManager spending a lot of time sorting mutations on Index rebuild

2017-04-24 Thread Vincent Poon (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982230#comment-15982230
 ] 

Vincent Poon commented on PHOENIX-3806:
---

What I find is that if I create 100 versions of a row, 
NonTxIndexBuilder#createTimestampBatchesFromMutation creates 100 batches based 
on timestamp.

Then for each batch, we call addMutationsForBatch, which in step 3 does a loop 
and adds index updates for all timestamps up to the current batch's timestamp.  
All the index updates are DELETE updates.

What this means is that overall, the # of updates you have is the summation of 
the series 1...100.  So you have something like 5050 index updates, and because 
of the issues described above, you're sorting 5050 times.

And of course as you create more versions, the numbers quickly become 
unfeasible.

> IndexUpdateManager spending a lot of time sorting mutations on Index rebuild
> 
>
> Key: PHOENIX-3806
> URL: https://issues.apache.org/jira/browse/PHOENIX-3806
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's the stack trace. The Array contains 50001 Delete Mutations in this 
> case.
> It seems the code is sorting this over and over again.
> {code}
> Thread 170 (B.DefaultRpcServer.handler=67,queue=7,port=60020):
>   State: RUNNABLE
>   Blocked count: 220598
>   Waited count: 377933
>   Stack:
> java.util.TimSort.binarySort(TimSort.java:296)
> java.util.TimSort.sort(TimSort.java:239)
> java.util.Arrays.sort(Arrays.java:1438)
> 
> org.apache.phoenix.hbase.index.covered.update.SortedCollection.iterator(SortedCollection.java:78)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.fixUpCurrentUpdates(IndexUpdateManager.java:128)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.addIndexUpdate(IndexUpdateManager.java:115)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addCurrentStateMutationsForBatch(NonTxIndexBuilder.java:333)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addUpdateForGivenTimestamp(NonTxIndexBuilder.java:258)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addMutationsForBatch(NonTxIndexBuilder.java:231)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.batchMutationAndAddUpdates(NonTxIndexBuilder.java:109)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.getIndexUpdate(NonTxIndexBuilder.java:71)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:137)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submit(BaseTaskRunner.java:58)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submitUninterruptible(BaseTaskRunner.java:99)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:144)
> 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:324)
> Thread 169 (B.DefaultRpcServer.handler=66,queue=6,port=60020):
> {code}
> [~jamestaylor]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3806) IndexUpdateManager spending a lot of time sorting mutations on Index rebuild

2017-04-24 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982197#comment-15982197
 ] 

James Taylor commented on PHOENIX-3806:
---

IMHO, we'd want to change the code such that we add all versions of a row to 
the data structure first, and then sort once. We also may be able to get away 
with only adding the last version before the version of which we have. We'd 
know this since we're getting back the versions in timestamp order. I don't 
think adding versions earlier than this impact the "answer", where the 
"question" is "What is the prior value of column x?"

> IndexUpdateManager spending a lot of time sorting mutations on Index rebuild
> 
>
> Key: PHOENIX-3806
> URL: https://issues.apache.org/jira/browse/PHOENIX-3806
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's the stack trace. The Array contains 50001 Delete Mutations in this 
> case.
> It seems the code is sorting this over and over again.
> {code}
> Thread 170 (B.DefaultRpcServer.handler=67,queue=7,port=60020):
>   State: RUNNABLE
>   Blocked count: 220598
>   Waited count: 377933
>   Stack:
> java.util.TimSort.binarySort(TimSort.java:296)
> java.util.TimSort.sort(TimSort.java:239)
> java.util.Arrays.sort(Arrays.java:1438)
> 
> org.apache.phoenix.hbase.index.covered.update.SortedCollection.iterator(SortedCollection.java:78)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.fixUpCurrentUpdates(IndexUpdateManager.java:128)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.addIndexUpdate(IndexUpdateManager.java:115)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addCurrentStateMutationsForBatch(NonTxIndexBuilder.java:333)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addUpdateForGivenTimestamp(NonTxIndexBuilder.java:258)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addMutationsForBatch(NonTxIndexBuilder.java:231)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.batchMutationAndAddUpdates(NonTxIndexBuilder.java:109)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.getIndexUpdate(NonTxIndexBuilder.java:71)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:137)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submit(BaseTaskRunner.java:58)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submitUninterruptible(BaseTaskRunner.java:99)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:144)
> 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:324)
> Thread 169 (B.DefaultRpcServer.handler=66,queue=6,port=60020):
> {code}
> [~jamestaylor]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3806) IndexUpdateManager spending a lot of time sorting mutations on Index rebuild

2017-04-24 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981864#comment-15981864
 ] 

Samarth Jain commented on PHOENIX-3806:
---

The above suggestion won't work since we support point in time queries. And we 
would want to apply all mutations (all versions) in that case.

> IndexUpdateManager spending a lot of time sorting mutations on Index rebuild
> 
>
> Key: PHOENIX-3806
> URL: https://issues.apache.org/jira/browse/PHOENIX-3806
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's the stack trace. The Array contains 50001 Delete Mutations in this 
> case.
> It seems the code is sorting this over and over again.
> {code}
> Thread 170 (B.DefaultRpcServer.handler=67,queue=7,port=60020):
>   State: RUNNABLE
>   Blocked count: 220598
>   Waited count: 377933
>   Stack:
> java.util.TimSort.binarySort(TimSort.java:296)
> java.util.TimSort.sort(TimSort.java:239)
> java.util.Arrays.sort(Arrays.java:1438)
> 
> org.apache.phoenix.hbase.index.covered.update.SortedCollection.iterator(SortedCollection.java:78)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.fixUpCurrentUpdates(IndexUpdateManager.java:128)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.addIndexUpdate(IndexUpdateManager.java:115)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addCurrentStateMutationsForBatch(NonTxIndexBuilder.java:333)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addUpdateForGivenTimestamp(NonTxIndexBuilder.java:258)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addMutationsForBatch(NonTxIndexBuilder.java:231)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.batchMutationAndAddUpdates(NonTxIndexBuilder.java:109)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.getIndexUpdate(NonTxIndexBuilder.java:71)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:137)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submit(BaseTaskRunner.java:58)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submitUninterruptible(BaseTaskRunner.java:99)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:144)
> 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:324)
> Thread 169 (B.DefaultRpcServer.handler=66,queue=6,port=60020):
> {code}
> [~jamestaylor]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3806) IndexUpdateManager spending a lot of time sorting mutations on Index rebuild

2017-04-24 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981699#comment-15981699
 ] 

Samarth Jain commented on PHOENIX-3806:
---

If there is a util to convert from a data table row to a corresponding index 
row, then we can just get away by scanning only one version of the row. This 
way we won't have to do a raw scan and fetch all the versions. We would need to 
run the scan on the data table where the lower bound of time range would be 
index_disable_timestamp and upper bound would be the time at which index was 
switched from disable to inactive.

> IndexUpdateManager spending a lot of time sorting mutations on Index rebuild
> 
>
> Key: PHOENIX-3806
> URL: https://issues.apache.org/jira/browse/PHOENIX-3806
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's the stack trace. The Array contains 50001 Delete Mutations in this 
> case.
> It seems the code is sorting this over and over again.
> {code}
> Thread 170 (B.DefaultRpcServer.handler=67,queue=7,port=60020):
>   State: RUNNABLE
>   Blocked count: 220598
>   Waited count: 377933
>   Stack:
> java.util.TimSort.binarySort(TimSort.java:296)
> java.util.TimSort.sort(TimSort.java:239)
> java.util.Arrays.sort(Arrays.java:1438)
> 
> org.apache.phoenix.hbase.index.covered.update.SortedCollection.iterator(SortedCollection.java:78)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.fixUpCurrentUpdates(IndexUpdateManager.java:128)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.addIndexUpdate(IndexUpdateManager.java:115)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addCurrentStateMutationsForBatch(NonTxIndexBuilder.java:333)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addUpdateForGivenTimestamp(NonTxIndexBuilder.java:258)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addMutationsForBatch(NonTxIndexBuilder.java:231)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.batchMutationAndAddUpdates(NonTxIndexBuilder.java:109)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.getIndexUpdate(NonTxIndexBuilder.java:71)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:137)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submit(BaseTaskRunner.java:58)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submitUninterruptible(BaseTaskRunner.java:99)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:144)
> 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:324)
> Thread 169 (B.DefaultRpcServer.handler=66,queue=6,port=60020):
> {code}
> [~jamestaylor]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3806) IndexUpdateManager spending a lot of time sorting mutations on Index rebuild

2017-04-24 Thread Vincent Poon (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981677#comment-15981677
 ] 

Vincent Poon commented on PHOENIX-3806:
---

Some more detail on this issue:

What we found was that the sorting being doing in 
IndexUpdateManager#fixUpCurrentUpdates can potentially be called a large number 
of times, and it seems to only happen when triggered from the 
BuildIndexScheduleTask code path.In our case the collection being sorted 
had 50001 Deletes because even though the indexing rebuilding code batches the 
mutations it is replaying, there is a raw scan that happens which can surface 
all the versions for the mutations.  From a quick glance, it seems the raw scan 
is determined by whether BaseScannerRegionObserver.IGNORE_NEWER_MUTATIONS is 
set.

So there are number of avenues to explore here - can the batching be done 
better?  Do we need all versions?  Certainly it seems the sorting in 
fixUpCurrentUpdates can be done better, by sorting once rather than on every 
loop iteration.  I'll look into that further.

Steps to reproduce (perhaps can be converted to a unit test):
1)  Produce a large number of versions for a single row
2)  Disable the index on the table with "alter index  on  disable"
3)  Make sure the index_disable_timestamp is set (filed PHOENIX-3810)
4) The rebuilder should trigger the sort for a large number of deletes


> IndexUpdateManager spending a lot of time sorting mutations on Index rebuild
> 
>
> Key: PHOENIX-3806
> URL: https://issues.apache.org/jira/browse/PHOENIX-3806
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Here's the stack trace. The Array contains 50001 Delete Mutations in this 
> case.
> It seems the code is sorting this over and over again.
> {code}
> Thread 170 (B.DefaultRpcServer.handler=67,queue=7,port=60020):
>   State: RUNNABLE
>   Blocked count: 220598
>   Waited count: 377933
>   Stack:
> java.util.TimSort.binarySort(TimSort.java:296)
> java.util.TimSort.sort(TimSort.java:239)
> java.util.Arrays.sort(Arrays.java:1438)
> 
> org.apache.phoenix.hbase.index.covered.update.SortedCollection.iterator(SortedCollection.java:78)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.fixUpCurrentUpdates(IndexUpdateManager.java:128)
> 
> org.apache.phoenix.hbase.index.covered.update.IndexUpdateManager.addIndexUpdate(IndexUpdateManager.java:115)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addCurrentStateMutationsForBatch(NonTxIndexBuilder.java:333)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addUpdateForGivenTimestamp(NonTxIndexBuilder.java:258)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addMutationsForBatch(NonTxIndexBuilder.java:231)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.batchMutationAndAddUpdates(NonTxIndexBuilder.java:109)
> 
> org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.getIndexUpdate(NonTxIndexBuilder.java:71)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:137)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:133)
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
> 
> com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:61)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submit(BaseTaskRunner.java:58)
> 
> org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submitUninterruptible(BaseTaskRunner.java:99)
> 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:144)
> 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:324)
> Thread 169 (B.DefaultRpcServer.handler=66,queue=6,port=60020):
> {code}
> [~jamestaylor]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)