[jira] [Commented] (PHOENIX-4704) Presplit index tables when building asynchronously

2018-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487917#comment-16487917
 ] 

Hudson commented on PHOENIX-4704:
-

FAILURE: Integrated in Jenkins build PreCommit-PHOENIX-Build #1885 (See 
[https://builds.apache.org/job/PreCommit-PHOENIX-Build/1885/])
PHOENIX-4704 Presplit index tables when building asynchronously (vincentpoon: 
rev 6ab9b372f16f37b11e657b6803c6a60007815824)
* (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/IndexToolIT.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexTool.java


> Presplit index tables when building asynchronously
> --
>
> Key: PHOENIX-4704
> URL: https://issues.apache.org/jira/browse/PHOENIX-4704
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Vincent Poon
> Assignee: Vincent Poon
> Priority: Major
> Fix For: 4.14.0, 5.0.0
>
> Attachments: PHOENIX-4704.master.v1.patch, 
> PHOENIX-4704.master.v2.patch
>
>
> For large data tables with many regions, if we build the index asynchronously 
> using the IndexTool, the index table will initially face a hotspot as all data 
> region mappers attempt to write to the sole new index region.  This can 
> potentially lead to the index getting disabled if writes to the index table 
> time out during this hotspotting.
> We can add an optional step (or perhaps activate it based on the count of 
> regions in the data table) to the IndexTool to first run an MR job to gather 
> stats on the indexed column values, and then attempt to presplit the index 
> table before we do the actual index build MR job.
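
The presplit step described above can be illustrated with a minimal sketch. It assumes the sampled indexed-column values are already in memory as strings and picks evenly spaced quantiles as split points. The class and method names (SplitPointSketch, chooseSplitPoints) are hypothetical and are not taken from the IndexTool patch; a real implementation would work on byte[] row keys rather than strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not the code from PHOENIX-4704.
public class SplitPointSketch {

    // Returns up to (numRegions - 1) split points chosen as equi-depth
    // quantiles of the sampled values, so that each resulting region would
    // receive roughly the same share of the sample.
    public static List<String> chooseSplitPoints(List<String> sampledValues, int numRegions) {
        List<String> sorted = new ArrayList<>(sampledValues);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        if (numRegions < 2 || sorted.isEmpty()) {
            return splits; // nothing to split
        }
        for (int i = 1; i < numRegions; i++) {
            int position = (int) ((long) i * sorted.size() / numRegions);
            String candidate = sorted.get(position);
            // Skip duplicates: two regions cannot share the same boundary.
            if (splits.isEmpty() || !splits.get(splits.size() - 1).equals(candidate)) {
                splits.add(candidate);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "kiwi", "apple", "mango", "cherry", "banana", "peach", "fig", "grape");
        // Prints [cherry, grape, mango]
        System.out.println(chooseSplitPoints(sample, 4));
    }
}
```

With four target regions, the three boundaries divide the sorted sample into four equal groups, which is what lets the mappers spread their writes across regions from the start.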



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4704) Presplit index tables when building asynchronously

2018-05-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483432#comment-16483432
 ] 

Hudson commented on PHOENIX-4704:
-

SUCCESS: Integrated in Jenkins build Phoenix-4.x-HBase-0.98 #1902 (See 
[https://builds.apache.org/job/Phoenix-4.x-HBase-0.98/1902/])
PHOENIX-4704 Presplit index tables when building asynchronously (vincentpoon: 
rev fce9a6712faf8df3117372a6cdf244d420e829d1)
* (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/IndexToolIT.java




[jira] [Commented] (PHOENIX-4704) Presplit index tables when building asynchronously

2018-05-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481461#comment-16481461
 ] 

Hudson commented on PHOENIX-4704:
-

FAILURE: Integrated in Jenkins build Phoenix-4.x-HBase-0.98 #1899 (See 
[https://builds.apache.org/job/Phoenix-4.x-HBase-0.98/1899/])
PHOENIX-4704 Presplit index tables when building asynchronously (vincentpoon: 
rev 2f35fe3069bdb48a0603bda2dc59bea1f3145f0d)
* (edit) phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexTool.java
* (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/IndexToolIT.java




[jira] [Commented] (PHOENIX-4704) Presplit index tables when building asynchronously

2018-05-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481419#comment-16481419
 ] 

Hudson commented on PHOENIX-4704:
-

FAILURE: Integrated in Jenkins build Phoenix-4.x-HBase-1.3 #138 (See 
[https://builds.apache.org/job/Phoenix-4.x-HBase-1.3/138/])
PHOENIX-4704 Presplit index tables when building asynchronously (vincentpoon: 
rev 52304092f5876ab9c1086e954f2c5b0ba875a03e)
* (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/IndexToolIT.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexTool.java




[jira] [Commented] (PHOENIX-4704) Presplit index tables when building asynchronously

2018-05-18 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481170#comment-16481170
 ] 

James Taylor commented on PHOENIX-4704:
---

How about a follow-up JIRA to use the same technique when the index is built 
synchronously, [~vincentpoon]?



[jira] [Commented] (PHOENIX-4704) Presplit index tables when building asynchronously

2018-05-17 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480176#comment-16480176
 ] 

James Taylor commented on PHOENIX-4704:
---

+1. Great work, [~vincentpoon]! Let's get this nice feature into 4.14.

> Presplit index tables when building asynchronously
> --
>
> Key: PHOENIX-4704
> URL: https://issues.apache.org/jira/browse/PHOENIX-4704
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Vincent Poon
> Assignee: Vincent Poon
> Priority: Major
> Fix For: 4.14.0, 5.0.0
>
> Attachments: PHOENIX-4704.master.v1.patch
>


[jira] [Commented] (PHOENIX-4704) Presplit index tables when building asynchronously

2018-05-17 Thread Vincent Poon (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479977#comment-16479977
 ] 

Vincent Poon commented on PHOENIX-4704:
---

and thanks [~aertoria] for the TABLESAMPLE feature that makes this possible!

> Presplit index tables when building asynchronously
> --
>
> Key: PHOENIX-4704
> URL: https://issues.apache.org/jira/browse/PHOENIX-4704
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Vincent Poon
> Assignee: Vincent Poon
> Priority: Major
> Attachments: PHOENIX-4704.master.v1.patch
>


[jira] [Commented] (PHOENIX-4704) Presplit index tables when building asynchronously

2018-05-17 Thread Vincent Poon (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479971#comment-16479971
 ] 

Vincent Poon commented on PHOENIX-4704:
---

[~jamestaylor] mind doing a review?  This enhances IndexTool with an option to 
use TABLESAMPLE to sample the data table and presplit the index table.  There's 
also an option to only split if the data table has > N regions.
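
As a rough illustration of the two options mentioned in this comment, the sketch below gates the presplit on the data table's region count and builds a sampling query. All names (PresplitDecisionSketch, shouldPresplit, buildSampleQuery, and the PERSON / LAST_NAME example) are hypothetical and are not the patch's actual option names or code. It also assumes that TABLESAMPLE takes a sampling percentage between 0 and 100.

```java
// Hypothetical illustration only; not the code from PHOENIX-4704.
public class PresplitDecisionSketch {

    // Presplit only when the data table has more than the given number of
    // regions ("only split if the data table has > N regions").
    public static boolean shouldPresplit(int dataTableRegionCount, int minRegionThreshold) {
        return dataTableRegionCount > minRegionThreshold;
    }

    // Builds a query that samples the indexed column of the data table.
    // Assumption: TABLESAMPLE accepts a percentage in the range (0, 100].
    public static String buildSampleQuery(String indexedColumn, String dataTable,
                                          double samplePercent) {
        if (samplePercent <= 0 || samplePercent > 100) {
            throw new IllegalArgumentException("samplePercent must be in (0, 100]");
        }
        return "SELECT " + indexedColumn + " FROM " + dataTable
                + " TABLESAMPLE(" + samplePercent + ")";
    }

    public static void main(String[] args) {
        if (shouldPresplit(120, 20)) {
            // Prints SELECT LAST_NAME FROM PERSON TABLESAMPLE(10.0)
            System.out.println(buildSampleQuery("LAST_NAME", "PERSON", 10.0));
        }
    }
}
```

The threshold keeps small tables on the cheap path, where a single index region is unlikely to become a hotspot and the extra sampling pass would not pay for itself.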

 

> Presplit index tables when building asynchronously
> --
>
> Key: PHOENIX-4704
> URL: https://issues.apache.org/jira/browse/PHOENIX-4704
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Vincent Poon
> Assignee: Vincent Poon
> Priority: Major
> Attachments: PHOENIX-4704.master.v1.patch
>


[jira] [Commented] (PHOENIX-4704) Presplit index tables when building asynchronously

2018-04-27 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457151#comment-16457151
 ] 

Andrew Purtell commented on PHOENIX-4704:
-

Even a uniform split into a few regions would be an improvement? (And 
subsequent organic splitting would cause region boundaries to move toward the 
ideal.)
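
A uniform split of this kind can be sketched without any sampling. The example below is only an illustration under a strong assumption: that index row keys are spread roughly evenly over the first byte, which real data usually is not (hence the reliance on later organic splits). The class and method names are hypothetical.

```java
// Hypothetical illustration only; not the code from PHOENIX-4704.
public class UniformSplitSketch {

    // Returns (numRegions - 1) single-byte split keys that divide the
    // unsigned first-byte range 0x00..0xFF into numRegions equal parts.
    public static byte[][] uniformSingleByteSplits(int numRegions) {
        if (numRegions < 1 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in [1, 256]");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Row keys sort as unsigned bytes, so compute on ints and narrow.
            splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] split : uniformSingleByteSplits(4)) {
            // Prints 0x40, 0x80, 0xC0 on separate lines
            System.out.printf("0x%02X%n", split[0] & 0xFF);
        }
    }
}
```

Even these coarse boundaries give the initial index build several regions to write to instead of one.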

> Presplit index tables when building asynchronously
> --
>
> Key: PHOENIX-4704
> URL: https://issues.apache.org/jira/browse/PHOENIX-4704
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Vincent Poon
> Priority: Major
>