[jira] [Updated] (PHOENIX-4925) Use a Variant Segment tree to organize Guide Post Info
[ https://issues.apache.org/jira/browse/PHOENIX-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-4925:
-----------------------------
    Attachment: PHOENIX-4925.phoenix-stats.003.patch

> Use a Variant Segment tree to organize Guide Post Info
> -------------------------------------------------------
>
> Key: PHOENIX-4925
> URL: https://issues.apache.org/jira/browse/PHOENIX-4925
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Bin Shi
> Assignee: Bin Shi
> Priority: Major
> Attachments: PHOENIX-4925.phoenix-stats.003.patch, PHOENIX-4925.phoenix-stats.0502.patch, PHOENIX-4925.phoenix-stats.0510.patch
>
> Time Spent: 13h 40m
> Remaining Estimate: 0h
>
> As reported, query compilation (for the sample queries shown below), especially deriving estimations and generating parallel scans from guide posts, becomes much slower after we introduced Phoenix Stats.
> a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
> b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = 'x' ORDER BY Pk1__c
> c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = 'x' ORDER BY pk1__c,pk2__c
> d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = 'x' AND nonpk1__c ORDER BY pk1__c,pk2__c
> e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only primary key column.
>
> Because guide post info is prefix encoded, we have to decode and traverse guide posts sequentially, which makes the time complexity of BaseResultIterators.getParallelScans(...) O(n), where n is the total count of guide posts.
> According to PHOENIX-2417, to reduce the footprint in the client cache and over the wire, prefix encoding is used as the in-memory and transmission encoding for guide post info.
> We can use a Segment Tree to address both the memory and the performance concerns. The guide posts are partitioned into k chunks (k=1024?), each chunk is prefix encoded, and the encoded data becomes a leaf node of the tree. Each inner node contains summary info (the count of rows, the data size) of the subtree rooted at it.
> With this tree-like data structure, the size increase over the current data structure (mainly from the n/k-1 inner nodes) is negligible. The time complexity for queries a, b and c can be reduced to O(m), where m is the total count of regions; the time complexity of "EXPLAIN" for queries a, b and c can be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even be reduced to O(1). For queries d and e, the time complexity to find the start of the target scan ranges can be reduced to O(log(n/k)).
> The tree can also integrate AVL and B+ tree characteristics to support partial load/unload when interacting with the stats client cache.
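To make the chunked tree concrete, here is a minimal sketch of the variant segment tree described above, assuming guide posts are grouped into prefix-encoded chunks of up to k entries; all class and field names are illustrative and not taken from the attached patches.

{code:java}
// Illustrative sketch only; names are hypothetical, not from the patches.
public class GuidePostTreeNode {
    // Summary info for the subtree rooted at this node. Inner nodes answer
    // estimation queries (row count, data size) without decoding any chunk.
    long rowCount;
    long byteCount;

    // Key range covered by this subtree, used to route range lookups.
    byte[] lowerKey;
    byte[] upperKey;

    GuidePostTreeNode left;
    GuidePostTreeNode right;

    // Non-null only on leaves: one prefix-encoded chunk of up to k guide
    // posts (k = 1024 in the proposal). Only the chunks that straddle a scan
    // boundary ever need decoding, which is where O(log(n/k)) comes from.
    byte[] encodedChunk;

    boolean isLeaf() {
        return encodedChunk != null;
    }
}
{code}

Under this sketch, an "EXPLAIN (ESTIMATE ONLY)" that needs only totals reads the root's summaries (the O(1) case), while a range scan descends from the root and decodes at most the two leaf chunks containing the range endpoints.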
[jira] [Updated] (PHOENIX-4927) Disentangle the granularity of guidepost data from the granularity stored in the client cache
[ https://issues.apache.org/jira/browse/PHOENIX-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-4927:
-----------------------------
    Summary: Disentangle the granularity of guidepost data from the granularity stored in the client cache  (was: Disentangle the granularity of guidepost data from that of client cached guide post data)

> Disentangle the granularity of guidepost data from the granularity stored in the client cache
> ----------------------------------------------------------------------------------------------
>
> Key: PHOENIX-4927
> URL: https://issues.apache.org/jira/browse/PHOENIX-4927
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Bin Shi
> Assignee: Bin Shi
> Priority: Major
>
> The expected behaviors:
> # It should be possible to have 10MB guideposts for precision without forcing clients to cache that many guide posts.
> # It should combine as many guideposts into chunks as needed for the particular query.
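A rough sketch of what behavior #2 could look like on the client, assuming a simple GuidePost value object; the names and the fold-into-the-survivor strategy are assumptions, not the committed design.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class GuidePostCoarsener {
    // Hypothetical value object: one guide post key plus its accumulated stats.
    static class GuidePost {
        final byte[] key;
        final long rowCount;
        final long byteCount;
        GuidePost(byte[] key, long rowCount, long byteCount) {
            this.key = key;
            this.rowCount = rowCount;
            this.byteCount = byteCount;
        }
    }

    // Combine every `factor` fine-grained guide posts into one coarse chunk,
    // folding the skipped posts' row/byte counts into the surviving post, so
    // the client cache can hold far fewer entries than the server collected
    // while keeping the row/size estimates additive and exact.
    static List<GuidePost> coarsen(List<GuidePost> fine, int factor) {
        List<GuidePost> coarse = new ArrayList<>();
        long rows = 0, bytes = 0;
        for (int i = 0; i < fine.size(); i++) {
            rows += fine.get(i).rowCount;
            bytes += fine.get(i).byteCount;
            if ((i + 1) % factor == 0 || i == fine.size() - 1) {
                coarse.add(new GuidePost(fine.get(i).key, rows, bytes));
                rows = 0;
                bytes = 0;
            }
        }
        return coarse;
    }
}
{code}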
[jira] [Updated] (PHOENIX-4925) Use a Variant Segment tree to organize Guide Post Info
[ https://issues.apache.org/jira/browse/PHOENIX-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-4925:
-----------------------------
    Attachment: PHOENIX-4925.phoenix-stats.0510.patch

> Use a Variant Segment tree to organize Guide Post Info
> -------------------------------------------------------
>
> Key: PHOENIX-4925
> URL: https://issues.apache.org/jira/browse/PHOENIX-4925
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Bin Shi
> Assignee: Bin Shi
> Priority: Major
> Attachments: PHOENIX-4925.phoenix-stats.0502.patch, PHOENIX-4925.phoenix-stats.0510.patch
>
> Time Spent: 6h
> Remaining Estimate: 0h
>
> As reported, query compilation (for the sample queries shown below), especially deriving estimations and generating parallel scans from guide posts, becomes much slower after we introduced Phoenix Stats.
> a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
> b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = 'x' ORDER BY Pk1__c
> c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = 'x' ORDER BY pk1__c,pk2__c
> d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = 'x' AND nonpk1__c ORDER BY pk1__c,pk2__c
> e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only primary key column.
>
> Because guide post info is prefix encoded, we have to decode and traverse guide posts sequentially, which makes the time complexity of BaseResultIterators.getParallelScans(...) O(n), where n is the total count of guide posts.
> According to PHOENIX-2417, to reduce the footprint in the client cache and over the wire, prefix encoding is used as the in-memory and transmission encoding for guide post info.
> We can use a Segment Tree to address both the memory and the performance concerns. The guide posts are partitioned into k chunks (k=1024?), each chunk is prefix encoded, and the encoded data becomes a leaf node of the tree. Each inner node contains summary info (the count of rows, the data size) of the subtree rooted at it.
> With this tree-like data structure, the size increase over the current data structure (mainly from the n/k-1 inner nodes) is negligible. The time complexity for queries a, b and c can be reduced to O(m), where m is the total count of regions; the time complexity of "EXPLAIN" for queries a, b and c can be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even be reduced to O(1). For queries d and e, the time complexity to find the start of the target scan ranges can be reduced to O(log(n/k)).
> The tree can also integrate AVL and B+ tree characteristics to support partial load/unload when interacting with the stats client cache.
[jira] [Updated] (PHOENIX-4925) Use a Variant Segment tree to organize Guide Post Info
[ https://issues.apache.org/jira/browse/PHOENIX-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-4925:
-----------------------------
    Summary: Use a Variant Segment tree to organize Guide Post Info  (was: Use a Variant Segment tree to organize Guide Post Info.)

> Use a Variant Segment tree to organize Guide Post Info
> -------------------------------------------------------
>
> Key: PHOENIX-4925
> URL: https://issues.apache.org/jira/browse/PHOENIX-4925
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Bin Shi
> Assignee: Bin Shi
> Priority: Major
>
> Time Spent: 6h
> Remaining Estimate: 0h
>
> As reported, query compilation (for the sample queries shown below), especially deriving estimations and generating parallel scans from guide posts, becomes much slower after we introduced Phoenix Stats.
> a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
> b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = 'x' ORDER BY Pk1__c
> c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = 'x' ORDER BY pk1__c,pk2__c
> d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = 'x' AND nonpk1__c ORDER BY pk1__c,pk2__c
> e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only primary key column.
>
> Because guide post info is prefix encoded, we have to decode and traverse guide posts sequentially, which makes the time complexity of BaseResultIterators.getParallelScans(...) O(n), where n is the total count of guide posts.
> According to PHOENIX-2417, to reduce the footprint in the client cache and over the wire, prefix encoding is used as the in-memory and transmission encoding for guide post info.
> We can use a Segment Tree to address both the memory and the performance concerns. The guide posts are partitioned into k chunks (k=1024?), each chunk is prefix encoded, and the encoded data becomes a leaf node of the tree. Each inner node contains summary info (the count of rows, the data size) of the subtree rooted at it.
> With this tree-like data structure, the size increase over the current data structure (mainly from the n/k-1 inner nodes) is negligible. The time complexity for queries a, b and c can be reduced to O(m), where m is the total count of regions; the time complexity of "EXPLAIN" for queries a, b and c can be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even be reduced to O(1). For queries d and e, the time complexity to find the start of the target scan ranges can be reduced to O(log(n/k)).
> The tree can also integrate AVL and B+ tree characteristics to support partial load/unload when interacting with the stats client cache.
[jira] [Updated] (PHOENIX-4925) Use a Variant Segment tree to organize Guide Post Info.
[ https://issues.apache.org/jira/browse/PHOENIX-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-4925:
-----------------------------
    Attachment:     (was: PHOENIX-4925.phoenix-stats.0502.patch)

> Use a Variant Segment tree to organize Guide Post Info.
> --------------------------------------------------------
>
> Key: PHOENIX-4925
> URL: https://issues.apache.org/jira/browse/PHOENIX-4925
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Bin Shi
> Assignee: Bin Shi
> Priority: Major
>
> Time Spent: 6h
> Remaining Estimate: 0h
>
> As reported, query compilation (for the sample queries shown below), especially deriving estimations and generating parallel scans from guide posts, becomes much slower after we introduced Phoenix Stats.
> a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
> b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = 'x' ORDER BY Pk1__c
> c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = 'x' ORDER BY pk1__c,pk2__c
> d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = 'x' AND nonpk1__c ORDER BY pk1__c,pk2__c
> e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only primary key column.
>
> Because guide post info is prefix encoded, we have to decode and traverse guide posts sequentially, which makes the time complexity of BaseResultIterators.getParallelScans(...) O(n), where n is the total count of guide posts.
> According to PHOENIX-2417, to reduce the footprint in the client cache and over the wire, prefix encoding is used as the in-memory and transmission encoding for guide post info.
> We can use a Segment Tree to address both the memory and the performance concerns. The guide posts are partitioned into k chunks (k=1024?), each chunk is prefix encoded, and the encoded data becomes a leaf node of the tree. Each inner node contains summary info (the count of rows, the data size) of the subtree rooted at it.
> With this tree-like data structure, the size increase over the current data structure (mainly from the n/k-1 inner nodes) is negligible. The time complexity for queries a, b and c can be reduced to O(m), where m is the total count of regions; the time complexity of "EXPLAIN" for queries a, b and c can be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even be reduced to O(1). For queries d and e, the time complexity to find the start of the target scan ranges can be reduced to O(log(n/k)).
> The tree can also integrate AVL and B+ tree characteristics to support partial load/unload when interacting with the stats client cache.
[jira] [Updated] (PHOENIX-4925) Use a Variant Segment tree to organize Guide Post Info.
[ https://issues.apache.org/jira/browse/PHOENIX-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-4925:
-----------------------------
    Summary: Use a Variant Segment tree to organize Guide Post Info.  (was: Use a Variant Segment tree to organize Guide Post Info)

> Use a Variant Segment tree to organize Guide Post Info.
> --------------------------------------------------------
>
> Key: PHOENIX-4925
> URL: https://issues.apache.org/jira/browse/PHOENIX-4925
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Bin Shi
> Assignee: Bin Shi
> Priority: Major
>
> Time Spent: 6h
> Remaining Estimate: 0h
>
> As reported, query compilation (for the sample queries shown below), especially deriving estimations and generating parallel scans from guide posts, becomes much slower after we introduced Phoenix Stats.
> a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
> b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = 'x' ORDER BY Pk1__c
> c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = 'x' ORDER BY pk1__c,pk2__c
> d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = 'x' AND nonpk1__c ORDER BY pk1__c,pk2__c
> e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only primary key column.
>
> Because guide post info is prefix encoded, we have to decode and traverse guide posts sequentially, which makes the time complexity of BaseResultIterators.getParallelScans(...) O(n), where n is the total count of guide posts.
> According to PHOENIX-2417, to reduce the footprint in the client cache and over the wire, prefix encoding is used as the in-memory and transmission encoding for guide post info.
> We can use a Segment Tree to address both the memory and the performance concerns. The guide posts are partitioned into k chunks (k=1024?), each chunk is prefix encoded, and the encoded data becomes a leaf node of the tree. Each inner node contains summary info (the count of rows, the data size) of the subtree rooted at it.
> With this tree-like data structure, the size increase over the current data structure (mainly from the n/k-1 inner nodes) is negligible. The time complexity for queries a, b and c can be reduced to O(m), where m is the total count of regions; the time complexity of "EXPLAIN" for queries a, b and c can be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even be reduced to O(1). For queries d and e, the time complexity to find the start of the target scan ranges can be reduced to O(log(n/k)).
> The tree can also integrate AVL and B+ tree characteristics to support partial load/unload when interacting with the stats client cache.
[jira] [Updated] (PHOENIX-4925) Use a Variant Segment tree to organize Guide Post Info
[ https://issues.apache.org/jira/browse/PHOENIX-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-4925:
-----------------------------
    Summary: Use a Variant Segment tree to organize Guide Post Info  (was: Use Segment tree to organize Guide Post Info)

> Use a Variant Segment tree to organize Guide Post Info
> -------------------------------------------------------
>
> Key: PHOENIX-4925
> URL: https://issues.apache.org/jira/browse/PHOENIX-4925
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Bin Shi
> Assignee: Bin Shi
> Priority: Major
>
> Time Spent: 6h
> Remaining Estimate: 0h
>
> As reported, query compilation (for the sample queries shown below), especially deriving estimations and generating parallel scans from guide posts, becomes much slower after we introduced Phoenix Stats.
> a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
> b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = 'x' ORDER BY Pk1__c
> c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = 'x' ORDER BY pk1__c,pk2__c
> d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = 'x' AND nonpk1__c ORDER BY pk1__c,pk2__c
> e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only primary key column.
>
> Because guide post info is prefix encoded, we have to decode and traverse guide posts sequentially, which makes the time complexity of BaseResultIterators.getParallelScans(...) O(n), where n is the total count of guide posts.
> According to PHOENIX-2417, to reduce the footprint in the client cache and over the wire, prefix encoding is used as the in-memory and transmission encoding for guide post info.
> We can use a Segment Tree to address both the memory and the performance concerns. The guide posts are partitioned into k chunks (k=1024?), each chunk is prefix encoded, and the encoded data becomes a leaf node of the tree. Each inner node contains summary info (the count of rows, the data size) of the subtree rooted at it.
> With this tree-like data structure, the size increase over the current data structure (mainly from the n/k-1 inner nodes) is negligible. The time complexity for queries a, b and c can be reduced to O(m), where m is the total count of regions; the time complexity of "EXPLAIN" for queries a, b and c can be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even be reduced to O(1). For queries d and e, the time complexity to find the start of the target scan ranges can be reduced to O(log(n/k)).
> The tree can also integrate AVL and B+ tree characteristics to support partial load/unload when interacting with the stats client cache.
[jira] [Updated] (PHOENIX-5265) Phoenix Test should use gold files for result comparison instead of using hard-coded comparison.
[ https://issues.apache.org/jira/browse/PHOENIX-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-5265:
-----------------------------
    Description: 
Currently, in Phoenix tests, the comparison of the returned result with the expected result is always hard-coded; this pattern is especially widespread in the E2E and integration tests that compare query results, including the result of the EXPLAIN query plan. This has significantly impaired the productivity and efficiency of working on Phoenix.

The right approach is for each test case to write the generated result to a file under a new directory, then compare this file with a gold file under the "gold" directory. After we make some code changes and confirm the change is correct by manually verifying several specific test cases, we can safely replace the whole "gold" directory with the directory containing all the test files newly generated during the test run. In this way, we can easily rebase all the tests.

For example, in BaseStatsCollectorIT.java, we verify the estimated row size in the returned EXPLAIN query plan; the row size is decided by many factors, such as whether the column is encoded, whether the table is mutable, whether a transaction provider is used, and whether TEPHRA or OMID is the transaction provider. The code snippet "testWithMultiCF" below shows a typical test case. The comparisons of the returned result with the expected result are hard-coded in those asserts. Now imagine we change the way we collect stats, the column encoding scheme, or the cell storage format for TEPHRA or OMID, which is very likely to happen; then we need to manually change all those hard-coded comparisons everywhere, and it isn't trivial to re-calculate all the expected row sizes under the different conditions. Today you might need one or two weeks to rebase all the tests, which is a huge waste. We should use "gold" files here, so that we can rebase the tests very easily.

BTW, the newly generated test result files and the gold files should be in XML or JSON, and the result of an "EXPLAIN" query should be in XML or JSON too, because that file format matches the structure of a query plan.

{code:java}
@Test
public void testWithMultiCF() throws Exception {
    int nRows = 20;
    Connection conn = getConnection(0);
    PreparedStatement stmt;
    conn.createStatement().execute(
            "CREATE TABLE " + fullTableName
            + "(k VARCHAR PRIMARY KEY, a.v INTEGER, b.v INTEGER, c.v INTEGER NULL, d.v INTEGER NULL) "
            + tableDDLOptions);
    stmt = conn.prepareStatement("UPSERT INTO " + fullTableName + " VALUES(?,?, ?, ?, ?)");
    byte[] val = new byte[250];
    for (int i = 0; i < nRows; i++) {
        stmt.setString(1, Character.toString((char)('a' + i)) + Bytes.toString(val));
        stmt.setInt(2, i);
        stmt.setInt(3, i);
        stmt.setInt(4, i);
        stmt.setInt(5, i);
        stmt.executeUpdate();
    }
    conn.commit();
    stmt = conn.prepareStatement("UPSERT INTO " + fullTableName + "(k, c.v, d.v) VALUES(?,?,?)");
    for (int i = 0; i < 5; i++) {
        stmt.setString(1, Character.toString((char)('a' + 'z' + i)) + Bytes.toString(val));
        stmt.setInt(2, i);
        stmt.setInt(3, i);
        stmt.executeUpdate();
    }
    conn.commit();
    ResultSet rs;
    String actualExplainPlan;
    collectStatistics(conn, fullTableName);
    List<KeyRange> keyRanges = getAllSplits(conn, fullTableName);
    assertEquals(26, keyRanges.size());
    rs = conn.createStatement().executeQuery("EXPLAIN SELECT * FROM " + fullTableName);
    actualExplainPlan = QueryUtil.getExplainPlan(rs);
    assertEquals(
            "CLIENT 26-CHUNK 25 ROWS "
            + (columnEncoded ? (mutable ? "12530" : "14190")
                    : (TransactionFactory.Provider.OMID.name().equals(transactionProvider)) ? "25320" : "12420")
            + " BYTES PARALLEL 1-WAY FULL SCAN OVER " + physicalTableName,
            actualExplainPlan);
    ConnectionQueryServices services = conn.unwrap(PhoenixConnection.class).getQueryServices();
    List<HRegionLocation> regions = services.getAllTableRegions(Bytes.toBytes(physicalTableName));
    assertEquals(1, regions.size());
    collectStatistics(conn, fullTableName, Long.toString(1000));
    keyRanges = getAllSplits(conn, fullTableName);
    boolean oneCellPerColFamilyStorageScheme = !mutable && columnEncoded;
    boolean hasShadowCells = TransactionFactory.Provider.OMID.name().equals(transactionProvider);
    assertEquals(oneCellPerColFamilyStorageScheme ? 14 : hasShadowCells ? 24 : 13, keyRanges.size());
    rs = conn.createStatement().executeQuery(
            "SELECT COLUMN_FAMILY,SUM(GUIDE_POSTS_ROW_COUNT),SUM(GUIDE_POSTS_WIDTH),COUNT(*) from \"SYSTEM\".STATS where PHYSICAL_NAME = '"
            + physicalTableName + "' GROUP BY COLUMN_FAMILY ORDER BY COLUMN_FAMILY");
    assertTrue(rs.next());
[jira] [Updated] (PHOENIX-5265) Phoenix Test should use gold files for result comparison instead of using hard-coded comparison.
[ https://issues.apache.org/jira/browse/PHOENIX-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-5265:
-----------------------------
    Description: 
Currently, in Phoenix tests, the comparison of the returned result with the expected result is always hard-coded; this pattern is especially widespread in the E2E and integration tests that compare query results, including the result of the EXPLAIN query plan. This has significantly impaired the productivity and efficiency of working on Phoenix.

The right approach is for each test case to write the generated result to a file under a new directory, then compare this file with a gold file under the "gold" directory. After we make some code changes and confirm the change is correct by manually verifying several specific test cases, we can safely replace the whole "gold" directory with the directory containing all the test files newly generated during the test run. In this way, we can easily rebase all the tests.

For example, in BaseStatsCollectorIT.java, we verify the estimated row size in the returned EXPLAIN query plan; the row size is decided by many factors, such as whether the column is encoded, whether the table is mutable, whether a transaction provider is used, and whether TEPHRA or OMID is the transaction provider. The code snippet "testWithMultiCF" below shows a typical test case. The comparisons of the returned result with the expected result are hard-coded in those asserts. Now imagine we change the way we collect stats, the column encoding scheme, or the cell storage format for TEPHRA or OMID, which is very likely to happen; then we need to manually change all those hard-coded comparisons everywhere, and it isn't trivial to re-calculate all the expected row sizes under the different conditions. Today you might need one or two weeks to rebase all the tests, which is a huge waste. We should use "gold" files here, so that we can rebase the tests very easily.

{code:java}
@Test
public void testWithMultiCF() throws Exception {
    int nRows = 20;
    Connection conn = getConnection(0);
    PreparedStatement stmt;
    conn.createStatement().execute(
            "CREATE TABLE " + fullTableName
            + "(k VARCHAR PRIMARY KEY, a.v INTEGER, b.v INTEGER, c.v INTEGER NULL, d.v INTEGER NULL) "
            + tableDDLOptions);
    stmt = conn.prepareStatement("UPSERT INTO " + fullTableName + " VALUES(?,?, ?, ?, ?)");
    byte[] val = new byte[250];
    for (int i = 0; i < nRows; i++) {
        stmt.setString(1, Character.toString((char)('a' + i)) + Bytes.toString(val));
        stmt.setInt(2, i);
        stmt.setInt(3, i);
        stmt.setInt(4, i);
        stmt.setInt(5, i);
        stmt.executeUpdate();
    }
    conn.commit();
    stmt = conn.prepareStatement("UPSERT INTO " + fullTableName + "(k, c.v, d.v) VALUES(?,?,?)");
    for (int i = 0; i < 5; i++) {
        stmt.setString(1, Character.toString((char)('a' + 'z' + i)) + Bytes.toString(val));
        stmt.setInt(2, i);
        stmt.setInt(3, i);
        stmt.executeUpdate();
    }
    conn.commit();
    ResultSet rs;
    String actualExplainPlan;
    collectStatistics(conn, fullTableName);
    List<KeyRange> keyRanges = getAllSplits(conn, fullTableName);
    assertEquals(26, keyRanges.size());
    rs = conn.createStatement().executeQuery("EXPLAIN SELECT * FROM " + fullTableName);
    actualExplainPlan = QueryUtil.getExplainPlan(rs);
    assertEquals(
            "CLIENT 26-CHUNK 25 ROWS "
            + (columnEncoded ? (mutable ? "12530" : "14190")
                    : (TransactionFactory.Provider.OMID.name().equals(transactionProvider)) ? "25320" : "12420")
            + " BYTES PARALLEL 1-WAY FULL SCAN OVER " + physicalTableName,
            actualExplainPlan);
    ConnectionQueryServices services = conn.unwrap(PhoenixConnection.class).getQueryServices();
    List<HRegionLocation> regions = services.getAllTableRegions(Bytes.toBytes(physicalTableName));
    assertEquals(1, regions.size());
    collectStatistics(conn, fullTableName, Long.toString(1000));
    keyRanges = getAllSplits(conn, fullTableName);
    boolean oneCellPerColFamilyStorageScheme = !mutable && columnEncoded;
    boolean hasShadowCells = TransactionFactory.Provider.OMID.name().equals(transactionProvider);
    assertEquals(oneCellPerColFamilyStorageScheme ? 14 : hasShadowCells ? 24 : 13, keyRanges.size());
    rs = conn.createStatement().executeQuery(
            "SELECT COLUMN_FAMILY,SUM(GUIDE_POSTS_ROW_COUNT),SUM(GUIDE_POSTS_WIDTH),COUNT(*) from \"SYSTEM\".STATS where PHYSICAL_NAME = '"
            + physicalTableName + "' GROUP BY COLUMN_FAMILY ORDER BY COLUMN_FAMILY");
    assertTrue(rs.next());
    assertEquals("A", rs.getString(1));
    assertEquals(25, rs.getInt(2));
    assertEquals(columnEncoded ? (mutable ? 12530 : 14190) : hasShadowCells ? 25320 : 12420, rs.getInt(3));
[jira] [Created] (PHOENIX-5265) Phoenix Test should use gold files for result comparison instead of using hard-coded comparison.
Bin Shi created PHOENIX-5265:
-----------------------------
    Summary: Phoenix Test should use gold files for result comparison instead of using hard-coded comparison.
    Key: PHOENIX-5265
    URL: https://issues.apache.org/jira/browse/PHOENIX-5265
    Project: Phoenix
    Issue Type: Improvement
    Environment: 
{code:java}
@Test
public void testWithMultiCF() throws Exception {
    int nRows = 20;
    Connection conn = getConnection(0);
    PreparedStatement stmt;
    conn.createStatement().execute(
            "CREATE TABLE " + fullTableName
            + "(k VARCHAR PRIMARY KEY, a.v INTEGER, b.v INTEGER, c.v INTEGER NULL, d.v INTEGER NULL) "
            + tableDDLOptions);
    stmt = conn.prepareStatement("UPSERT INTO " + fullTableName + " VALUES(?,?, ?, ?, ?)");
    byte[] val = new byte[250];
    for (int i = 0; i < nRows; i++) {
        stmt.setString(1, Character.toString((char)('a' + i)) + Bytes.toString(val));
        stmt.setInt(2, i);
        stmt.setInt(3, i);
        stmt.setInt(4, i);
        stmt.setInt(5, i);
        stmt.executeUpdate();
    }
    conn.commit();
    stmt = conn.prepareStatement("UPSERT INTO " + fullTableName + "(k, c.v, d.v) VALUES(?,?,?)");
    for (int i = 0; i < 5; i++) {
        stmt.setString(1, Character.toString((char)('a' + 'z' + i)) + Bytes.toString(val));
        stmt.setInt(2, i);
        stmt.setInt(3, i);
        stmt.executeUpdate();
    }
    conn.commit();
    ResultSet rs;
    String actualExplainPlan;
    collectStatistics(conn, fullTableName);
    List<KeyRange> keyRanges = getAllSplits(conn, fullTableName);
    assertEquals(26, keyRanges.size());
    rs = conn.createStatement().executeQuery("EXPLAIN SELECT * FROM " + fullTableName);
    actualExplainPlan = QueryUtil.getExplainPlan(rs);
    assertEquals(
            "CLIENT 26-CHUNK 25 ROWS "
            + (columnEncoded ? (mutable ? "12530" : "14190")
                    : (TransactionFactory.Provider.OMID.name().equals(transactionProvider)) ? "25320" : "12420")
            + " BYTES PARALLEL 1-WAY FULL SCAN OVER " + physicalTableName,
            actualExplainPlan);
    ConnectionQueryServices services = conn.unwrap(PhoenixConnection.class).getQueryServices();
    List<HRegionLocation> regions = services.getAllTableRegions(Bytes.toBytes(physicalTableName));
    assertEquals(1, regions.size());
    collectStatistics(conn, fullTableName, Long.toString(1000));
    keyRanges = getAllSplits(conn, fullTableName);
    boolean oneCellPerColFamilyStorageScheme = !mutable && columnEncoded;
    boolean hasShadowCells = TransactionFactory.Provider.OMID.name().equals(transactionProvider);
    assertEquals(oneCellPerColFamilyStorageScheme ? 14 : hasShadowCells ? 24 : 13, keyRanges.size());
    rs = conn.createStatement().executeQuery(
            "SELECT COLUMN_FAMILY,SUM(GUIDE_POSTS_ROW_COUNT),SUM(GUIDE_POSTS_WIDTH),COUNT(*) from \"SYSTEM\".STATS where PHYSICAL_NAME = '"
            + physicalTableName + "' GROUP BY COLUMN_FAMILY ORDER BY COLUMN_FAMILY");
    assertTrue(rs.next());
    assertEquals("A", rs.getString(1));
    assertEquals(25, rs.getInt(2));
    assertEquals(columnEncoded ? (mutable ? 12530 : 14190) : hasShadowCells ? 25320 : 12420, rs.getInt(3));
    assertEquals(oneCellPerColFamilyStorageScheme ? 13 : hasShadowCells ? 23 : 12, rs.getInt(4));
    assertTrue(rs.next());
    assertEquals("B", rs.getString(1));
    assertEquals(oneCellPerColFamilyStorageScheme ? 25 : 20, rs.getInt(2));
    assertEquals(columnEncoded ? (mutable ? 5600 : 7260) : hasShadowCells ? 11260 : 5540, rs.getInt(3));
    assertEquals(oneCellPerColFamilyStorageScheme ? 7 : hasShadowCells ? 10 : 5, rs.getInt(4));
    assertTrue(rs.next());
    assertEquals("C", rs.getString(1));
    assertEquals(25, rs.getInt(2));
    assertEquals(columnEncoded ? (mutable ? 7005 : 7280) : hasShadowCells ? 14085 : 6930, rs.getInt(3));
    assertEquals(hasShadowCells ? 13 : 7, rs.getInt(4));
    assertTrue(rs.next());
    assertEquals("D", rs.getString(1));
    assertEquals(25, rs.getInt(2));
    assertEquals(columnEncoded ? (mutable ? 7005 : 7280) : hasShadowCells ? 14085 : 6930, rs.getInt(3));
    assertEquals(hasShadowCells ? 13 : 7, rs.getInt(4));
    assertFalse(rs.next());
    // Disable stats
    conn.createStatement().execute("ALTER TABLE " + fullTableName + " SET "
            + PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH + "=0");
    collectStatistics(conn, fullTableName);
    // Assert that there are no more guideposts
    rs = conn.createStatement().executeQuery("SELECT count(1) FROM "
            + PhoenixDatabaseMetaData.SYSTEM_STATS_NAME + " WHERE "
            + PhoenixDatabaseMetaData.PHYSICAL_NAME + "='" + physicalTableName + "' AND "
            + PhoenixDatabaseMetaData.COLUMN_FAMILY + " IS NOT NULL");
    assertTrue(rs.next());
    assertEquals(0, rs.getLong(1));
    assertFalse(rs.next());
    rs =
[jira] [Updated] (PHOENIX-5262) Wrong Result on Salted table with Varbinary PK
[ https://issues.apache.org/jira/browse/PHOENIX-5262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-5262:
-----------------------------
    Comment: was deleted

(was: [~dbwong], the description for this bug isn't accurate. This is a general bug for salted tables regardless of the PK. If the PK is "k INTEGER PRIMARY KEY", we actually have the same issue.)

> Wrong Result on Salted table with Varbinary PK
> -----------------------------------------------
>
> Key: PHOENIX-5262
> URL: https://issues.apache.org/jira/browse/PHOENIX-5262
> Project: Phoenix
> Issue Type: Bug
> Reporter: Daniel Wong
> Assignee: Daniel Wong
> Priority: Major
> Attachments: PHOENIX-5262.patch
>
> Time Spent: 1h 40m
> Remaining Estimate: 0h
[jira] [Updated] (PHOENIX-5176) KeyRange.compareUpperRange(KeyRange 1, KeyRange 2) returns the wrong result when two key ranges have the same upper bound values but one is inclusive and the other is exclusive
[ https://issues.apache.org/jira/browse/PHOENIX-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-5176:
-----------------------------
    Fix Version/s:     (was: 4.14.2)
                       (was: 5.1.0)

> KeyRange.compareUpperRange(KeyRange 1, KeyRange 2) returns the wrong result when two key ranges have the same upper bound values but one is inclusive and the other is exclusive
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-5176
> URL: https://issues.apache.org/jira/browse/PHOENIX-5176
> Project: Phoenix
> Issue Type: Bug
> Reporter: Bin Shi
> Assignee: Bin Shi
> Priority: Blocker
> Fix For: 4.15.0
>
> In KeyRange.java,
>
> public static int compareUpperRange(KeyRange rowKeyRange1, KeyRange rowKeyRange2) {
>     int result = Boolean.compare(rowKeyRange1.upperUnbound(), rowKeyRange2.upperUnbound());
>     if (result != 0) {
>         return result;
>     }
>     result = Bytes.BYTES_COMPARATOR.compare(rowKeyRange1.getUpperRange(), rowKeyRange2.getUpperRange());
>     if (result != 0) {
>         return result;
>     }
>     return Boolean.compare(*rowKeyRange2*.isUpperInclusive(), *rowKeyRange1*.isUpperInclusive());
> }
>
> The last line should be "return Boolean.compare(*rowKeyRange1*.isUpperInclusive(), *rowKeyRange2*.isUpperInclusive());". Given rowKeyRange1 [3, 5) and rowKeyRange2 [3, 5], the function should return -1, but it currently returns 1 due to this bug.
>
> KeyRange.compareUpperRange is only used in KeyRange.intersect(List<KeyRange> rowKeyRanges1, List<KeyRange> rowKeyRanges2). Given rowKeyRanges1 {[3, 5), [5, 6)} and rowKeyRanges2 {[3, 5], [6, 7]}, the function should return {[3, 5), [5, 5]}, i.e., {[3, 5]}, but it seems that it currently returns {[3, 5)} due to the bug.
[jira] [Created] (PHOENIX-5176) KeyRange.compareUpperRange(KeyRange 1, KeyRange 2) returns the wrong result when two key ranges have the same upper bound values but one is inclusive and the other is exclusive
Bin Shi created PHOENIX-5176:
-----------------------------
    Summary: KeyRange.compareUpperRange(KeyRange 1, KeyRange 2) returns the wrong result when two key ranges have the same upper bound values but one is inclusive and the other is exclusive
    Key: PHOENIX-5176
    URL: https://issues.apache.org/jira/browse/PHOENIX-5176
    Project: Phoenix
    Issue Type: Bug
    Reporter: Bin Shi

In KeyRange.java,

public static int compareUpperRange(KeyRange rowKeyRange1, KeyRange rowKeyRange2) {
    int result = Boolean.compare(rowKeyRange1.upperUnbound(), rowKeyRange2.upperUnbound());
    if (result != 0) {
        return result;
    }
    result = Bytes.BYTES_COMPARATOR.compare(rowKeyRange1.getUpperRange(), rowKeyRange2.getUpperRange());
    if (result != 0) {
        return result;
    }
    return Boolean.compare(*rowKeyRange2*.isUpperInclusive(), *rowKeyRange1*.isUpperInclusive());
}

The last line should be "return Boolean.compare(*rowKeyRange1*.isUpperInclusive(), *rowKeyRange2*.isUpperInclusive());". Given rowKeyRange1 [3, 5) and rowKeyRange2 [3, 5], the function should return -1, but it currently returns 1 due to this bug.

KeyRange.compareUpperRange is only used in KeyRange.intersect(List<KeyRange> rowKeyRanges1, List<KeyRange> rowKeyRanges2). Given rowKeyRanges1 {[3, 5), [5, 6)} and rowKeyRanges2 {[3, 5], [6, 7]}, the function should return {[3, 5), [5, 5]}, i.e., {[3, 5]}, but it seems that it currently returns {[3, 5)} due to the bug.
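For clarity, the corrected method, with only the argument order of the final comparison swapped as the description prescribes, would read:

{code:java}
public static int compareUpperRange(KeyRange rowKeyRange1, KeyRange rowKeyRange2) {
    int result = Boolean.compare(rowKeyRange1.upperUnbound(), rowKeyRange2.upperUnbound());
    if (result != 0) {
        return result;
    }
    result = Bytes.BYTES_COMPARATOR.compare(rowKeyRange1.getUpperRange(), rowKeyRange2.getUpperRange());
    if (result != 0) {
        return result;
    }
    // An exclusive upper bound sorts before an inclusive one with the same
    // value, so [3, 5) now correctly compares as less than [3, 5].
    return Boolean.compare(rowKeyRange1.isUpperInclusive(), rowKeyRange2.isUpperInclusive());
}
{code}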
[jira] [Assigned] (PHOENIX-5176) KeyRange.compareUpperRange(KeyRange 1, KeyRange 2) returns the wrong result when two key ranges have the same upper bound values but one is inclusive and the other is exclusive
[ https://issues.apache.org/jira/browse/PHOENIX-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi reassigned PHOENIX-5176:
--------------------------------
    Assignee: Bin Shi

> KeyRange.compareUpperRange(KeyRange 1, KeyRange 2) returns the wrong result when two key ranges have the same upper bound values but one is inclusive and the other is exclusive
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: PHOENIX-5176
> URL: https://issues.apache.org/jira/browse/PHOENIX-5176
> Project: Phoenix
> Issue Type: Bug
> Reporter: Bin Shi
> Assignee: Bin Shi
> Priority: Major
>
> In KeyRange.java,
>
> public static int compareUpperRange(KeyRange rowKeyRange1, KeyRange rowKeyRange2) {
>     int result = Boolean.compare(rowKeyRange1.upperUnbound(), rowKeyRange2.upperUnbound());
>     if (result != 0) {
>         return result;
>     }
>     result = Bytes.BYTES_COMPARATOR.compare(rowKeyRange1.getUpperRange(), rowKeyRange2.getUpperRange());
>     if (result != 0) {
>         return result;
>     }
>     return Boolean.compare(*rowKeyRange2*.isUpperInclusive(), *rowKeyRange1*.isUpperInclusive());
> }
>
> The last line should be "return Boolean.compare(*rowKeyRange1*.isUpperInclusive(), *rowKeyRange2*.isUpperInclusive());". Given rowKeyRange1 [3, 5) and rowKeyRange2 [3, 5], the function should return -1, but it currently returns 1 due to this bug.
>
> KeyRange.compareUpperRange is only used in KeyRange.intersect(List<KeyRange> rowKeyRanges1, List<KeyRange> rowKeyRanges2). Given rowKeyRanges1 {[3, 5), [5, 6)} and rowKeyRanges2 {[3, 5], [6, 7]}, the function should return {[3, 5), [5, 5]}, i.e., {[3, 5]}, but it seems that it currently returns {[3, 5)} due to the bug.
[jira] [Updated] (PHOENIX-4925) Use Segment tree to organize Guide Post Info
[ https://issues.apache.org/jira/browse/PHOENIX-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-4925:
-----------------------------
    Description: 
As reported, query compilation (for the sample queries shown below), especially deriving estimations and generating parallel scans from guide posts, becomes much slower after we introduced Phoenix Stats.

a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = 'x' ORDER BY Pk1__c
c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = 'x' ORDER BY pk1__c,pk2__c
d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = 'x' AND nonpk1__c ORDER BY pk1__c,pk2__c
e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only primary key column.

Because guide post info is prefix encoded, we have to decode and traverse guide posts sequentially, which makes the time complexity of BaseResultIterators.getParallelScans(...) O(n), where n is the total count of guide posts.

According to PHOENIX-2417, to reduce the footprint in the client cache and over the wire, prefix encoding is used as the in-memory and transmission encoding for guide post info.

We can use a Segment Tree to address both the memory and the performance concerns. The guide posts are partitioned into k chunks (k=1024?), each chunk is prefix encoded, and the encoded data becomes a leaf node of the tree. Each inner node contains summary info (the count of rows, the data size) of the subtree rooted at it.

With this tree-like data structure, the size increase over the current data structure (mainly from the n/k-1 inner nodes) is negligible. The time complexity for queries a, b and c can be reduced to O(m), where m is the total count of regions; the time complexity of "EXPLAIN" for queries a, b and c can be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even be reduced to O(1). For queries d and e, the time complexity to find the start of the target scan ranges can be reduced to O(log(n/k)).

The tree can also integrate AVL and B+ tree characteristics to support partial load/unload when interacting with the stats client cache.

  was:
As reported, query compilation (for the sample queries shown below), especially deriving estimations and generating parallel scans from guide posts, becomes much slower after we introduced Phoenix Stats.

a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = 'x' ORDER BY Pk1__c
c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = 'x' ORDER BY pk1__c,pk2__c
d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = 'x' AND nonpk1__c ORDER BY pk1__c,pk2__c
e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only primary key column.

Because guide post info is prefix encoded, we have to decode and traverse guide posts sequentially, which makes the time complexity of BaseResultIterators.getParallelScans(...) O(n), where n is the total count of guide posts.

According to PHOENIX-2417, to reduce the footprint in the client cache and over the wire, prefix encoding is used as the in-memory and transmission encoding for guide post info.

We can use a Segment Tree to address both the memory and the performance concerns. The guide posts are partitioned into k chunks (k=1024?), each chunk is prefix encoded, and the encoded data becomes a leaf node of the tree. Each inner node contains summary info (the count of rows, the data size) of the subtree rooted at it.

With this tree-like data structure, the size increase over the current data structure (mainly from the n/k-1 inner nodes) is negligible. The time complexity for queries a, b and c can be reduced to O(m), where m is the total count of regions; the time complexity of "EXPLAIN" for queries a, b and c can be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even be reduced to O(1). For queries d and e, the time complexity to find the start of the target scan ranges can be reduced to O(log(n/k)).

The tree can also integrate AVL and B+ tree characteristics to support partial load/unload when interacting with the stats client cache.

> Use Segment tree to organize Guide Post Info
> ---------------------------------------------
>
> Key: PHOENIX-4925
> URL: https://issues.apache.org/jira/browse/PHOENIX-4925
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Bin Shi
> Assignee: Bin Shi
> Priority: Major
>
> As reported, query compilation (for the sample queries shown below), especially deriving estimations and generating parallel scans from guide posts,
[jira] [Updated] (PHOENIX-4925) Use Segment tree to organize Guide Post Info
[ https://issues.apache.org/jira/browse/PHOENIX-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-4925:
-----------------------------
    Summary: Use Segment tree to organize Guide Post Info  (was: Use Segment/SUM tree to organize Guide Post Info)

> Use Segment tree to organize Guide Post Info
> ---------------------------------------------
>
> Key: PHOENIX-4925
> URL: https://issues.apache.org/jira/browse/PHOENIX-4925
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Bin Shi
> Assignee: Bin Shi
> Priority: Major
>
> As reported, query compilation (for the sample queries shown below), especially deriving estimations and generating parallel scans from guide posts, becomes much slower after we introduced Phoenix Stats.
> a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
> b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = 'x' ORDER BY Pk1__c
> c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = 'x' ORDER BY pk1__c,pk2__c
> d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = 'x' AND nonpk1__c ORDER BY pk1__c,pk2__c
> e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only primary key column.
>
> Because guide post info is prefix encoded, we have to decode and traverse guide posts sequentially, which makes the time complexity of BaseResultIterators.getParallelScans(...) O(n), where n is the total count of guide posts.
> According to PHOENIX-2417, to reduce the footprint in the client cache and over the wire, prefix encoding is used as the in-memory and transmission encoding for guide post info.
> We can use something like a Sum Tree (even a Binary Indexed Tree) to address both the memory and the performance concerns. The guide posts are partitioned into k chunks (k=1024?), each chunk is prefix encoded, and the encoded data becomes a leaf node of the tree. Each inner node contains summary info (the count of rows, the data size) of the subtree rooted at it.
> With this tree-like data structure, the size increase over the current data structure (mainly from the n/k-1 inner nodes) is negligible. The time complexity for queries a, b and c can be reduced to O(m), where m is the total count of regions; the time complexity of "EXPLAIN" for queries a, b and c can be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even be reduced to O(1). For queries d and e, the time complexity to find the start of the target scan ranges can be reduced to O(log(n/k)).
> The tree can also integrate AVL and B+ tree characteristics to support partial load/unload when interacting with the stats client cache.
[jira] [Updated] (PHOENIX-4925) Use Segment tree to organize Guide Post Info
[ https://issues.apache.org/jira/browse/PHOENIX-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-4925:
-----------------------------
    Description: 
As reported, query compilation (for the sample queries shown below), especially deriving estimations and generating parallel scans from guide posts, becomes much slower after we introduced Phoenix Stats.

a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = 'x' ORDER BY Pk1__c
c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = 'x' ORDER BY pk1__c,pk2__c
d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = 'x' AND nonpk1__c ORDER BY pk1__c,pk2__c
e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only primary key column.

Because guide post info is prefix encoded, we have to decode and traverse guide posts sequentially, which makes the time complexity of BaseResultIterators.getParallelScans(...) O(n), where n is the total count of guide posts.

According to PHOENIX-2417, to reduce the footprint in the client cache and over the wire, prefix encoding is used as the in-memory and transmission encoding for guide post info.

We can use a Segment Tree to address both the memory and the performance concerns. The guide posts are partitioned into k chunks (k=1024?), each chunk is prefix encoded, and the encoded data becomes a leaf node of the tree. Each inner node contains summary info (the count of rows, the data size) of the subtree rooted at it.

With this tree-like data structure, the size increase over the current data structure (mainly from the n/k-1 inner nodes) is negligible. The time complexity for queries a, b and c can be reduced to O(m), where m is the total count of regions; the time complexity of "EXPLAIN" for queries a, b and c can be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even be reduced to O(1). For queries d and e, the time complexity to find the start of the target scan ranges can be reduced to O(log(n/k)).

The tree can also integrate AVL and B+ tree characteristics to support partial load/unload when interacting with the stats client cache.

  was:
As reported, query compilation (for the sample queries shown below), especially deriving estimations and generating parallel scans from guide posts, becomes much slower after we introduced Phoenix Stats.

a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c
b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = 'x' ORDER BY Pk1__c
c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = 'x' ORDER BY pk1__c,pk2__c
d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = 'x' AND nonpk1__c ORDER BY pk1__c,pk2__c
e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only primary key column.

Because guide post info is prefix encoded, we have to decode and traverse guide posts sequentially, which makes the time complexity of BaseResultIterators.getParallelScans(...) O(n), where n is the total count of guide posts.

According to PHOENIX-2417, to reduce the footprint in the client cache and over the wire, prefix encoding is used as the in-memory and transmission encoding for guide post info.

We can use something like a Sum Tree (even a Binary Indexed Tree) to address both the memory and the performance concerns. The guide posts are partitioned into k chunks (k=1024?), each chunk is prefix encoded, and the encoded data becomes a leaf node of the tree. Each inner node contains summary info (the count of rows, the data size) of the subtree rooted at it.

With this tree-like data structure, the size increase over the current data structure (mainly from the n/k-1 inner nodes) is negligible. The time complexity for queries a, b and c can be reduced to O(m), where m is the total count of regions; the time complexity of "EXPLAIN" for queries a, b and c can be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even be reduced to O(1). For queries d and e, the time complexity to find the start of the target scan ranges can be reduced to O(log(n/k)).

The tree can also integrate AVL and B+ tree characteristics to support partial load/unload when interacting with the stats client cache.

> Use Segment tree to organize Guide Post Info
> ---------------------------------------------
>
> Key: PHOENIX-4925
> URL: https://issues.apache.org/jira/browse/PHOENIX-4925
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Bin Shi
> Assignee: Bin Shi
> Priority: Major
>
> As reported, query compilation (for the sample queries shown below), especially deriving estimations and generating
[jira] [Updated] (PHOENIX-5069) Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache
[ https://issues.apache.org/jira/browse/PHOENIX-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-5069:
-----------------------------
    Attachment: PHOENIX-5069.4.x-HBase-1.3.001.patch

> Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache
> -----------------------------------------------------------------------------
>
> Key: PHOENIX-5069
> URL: https://issues.apache.org/jira/browse/PHOENIX-5069
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Bin Shi
> Assignee: Bin Shi
> Priority: Major
> Attachments: PHOENIX-5069-4.14.1-hbase-1.3-phoenix-stats.001.patch, PHOENIX-5069-4.14.1-hbase-1.3-phoenix-stats.002.patch, PHOENIX-5069.4.x-HBase-1.3.001.patch, PHOENIX-5069.4.x-HBase-1.4.001.patch, PHOENIX-5069.master.001.patch, PHOENIX-5069.master.002.patch, PHOENIX-5069.master.003.patch, PHOENIX-5069.master.004.patch, PHOENIX-5069.patch
>
> Time Spent: 6h 40m
> Remaining Estimate: 0h
>
> The current Phoenix Stats Cache uses a TTL-based eviction policy: a cached entry expires after a given amount of time (900s by default) has passed since the entry was created, which leads to a cache miss the next time the Compiler/Optimizer fetches stats from the cache. As you can see from the graph above, fetching stats from the cache is a blocking operation — on a cache miss, it takes a round trip over the wire to scan the SYSTEM.STATS table, get the latest stats info, rebuild the cache, and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call causes a significant performance penalty, and we see periodic spikes.
> *This Jira proposes an asynchronous refresh mechanism to provide a non-blocking cache. For details, please see the linked design document below.*
> [~karanmehta93] [~twdsi...@gmail.com] [~dbwong] [~elserj] [~an...@apache.org] [~sergey soldatov]
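The comments on this issue indicate the cache in question is Guava's LoadingCache, so the non-blocking behavior can be sketched with refreshAfterWrite plus an asynchronous reload; the key/value types, pool size, and the SYSTEM.STATS scan below are placeholders, not the actual patch.

{code:java}
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListenableFutureTask;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NonBlockingStatsCache {
    private final ExecutorService refreshExecutor = Executors.newFixedThreadPool(4);

    final LoadingCache<String, Long> statsCache = CacheBuilder.newBuilder()
            // After 900s a read triggers a refresh, but the stale value is
            // returned immediately instead of blocking on a SYSTEM.STATS scan.
            .refreshAfterWrite(900, TimeUnit.SECONDS)
            .build(new CacheLoader<String, Long>() {
                @Override
                public Long load(String tableName) {
                    return scanSystemStats(tableName); // blocking, first load only
                }

                @Override
                public ListenableFuture<Long> reload(String tableName, Long oldValue) {
                    // Guava lets only one thread reload a given key; concurrent
                    // readers keep getting oldValue while the task runs.
                    ListenableFutureTask<Long> task =
                            ListenableFutureTask.create(() -> scanSystemStats(tableName));
                    refreshExecutor.execute(task);
                    return task;
                }
            });

    // Placeholder for the round trip that scans SYSTEM.STATS.
    private Long scanSystemStats(String tableName) {
        return 0L;
    }
}
{code}

With this pattern a reader never blocks after the first load: expired entries keep serving stale stats while a single background task refreshes them, which removes the periodic latency spikes described above.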
[jira] [Updated] (PHOENIX-4594) Query Compilation where there are many guideposts causes overhead
[ https://issues.apache.org/jira/browse/PHOENIX-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-4594:
-----------------------------
    Summary: Query Compilation where there are many guideposts causes overhead  (was: Perform binary search on guideposts during query compilation)

> Query Compilation where there are many guideposts causes overhead
> -------------------------------------------------------------------
>
> Key: PHOENIX-4594
> URL: https://issues.apache.org/jira/browse/PHOENIX-4594
> Project: Phoenix
> Issue Type: Improvement
> Reporter: James Taylor
> Assignee: Bin Shi
> Priority: Major
> Attachments: PHOENIX-4594-0913.patch, PHOENIX-4594_0917.patch, PHOENIX-4594_0918.patch
>
> If there are many guideposts, performance will suffer during query compilation because we do a linear search of the guideposts to find the intersection with the scan ranges. Instead, in BaseResultIterators.getParallelScans() we should populate an array of guideposts and perform a binary search.
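A minimal sketch of the binary search described in the original summary, assuming the decoded guide posts are materialized as a sorted array of row keys; the class and method names are illustrative, not the patch itself.

{code:java}
import java.util.Arrays;
import org.apache.hadoop.hbase.util.Bytes;

public class GuidePostSearch {
    // Index of the first guide post >= startKey, so parallel scan generation
    // can jump to the intersection with the scan range in O(log n) instead of
    // walking all n guide posts linearly.
    public static int firstGuidePostAtOrAfter(byte[][] guidePosts, byte[] startKey) {
        int idx = Arrays.binarySearch(guidePosts, startKey, Bytes.BYTES_COMPARATOR);
        return idx >= 0 ? idx : -(idx + 1);
    }
}
{code}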
[jira] [Updated] (PHOENIX-4594) Query Compilation where there are many guideposts causes overhead
[ https://issues.apache.org/jira/browse/PHOENIX-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-4594:
-----------------------------
    Description: If there are many guideposts, performance will suffer during query compilation.  (was: If there are many guideposts, performance will suffer during query compilation because we do a linear search of the guideposts to find the intersection with the scan ranges. Instead, in BaseResultIterators.getParallelScans() we should populate an array of guideposts and perform a binary search.)

> Query Compilation where there are many guideposts causes overhead
> -------------------------------------------------------------------
>
> Key: PHOENIX-4594
> URL: https://issues.apache.org/jira/browse/PHOENIX-4594
> Project: Phoenix
> Issue Type: Improvement
> Reporter: James Taylor
> Assignee: Bin Shi
> Priority: Major
> Attachments: PHOENIX-4594-0913.patch, PHOENIX-4594_0917.patch, PHOENIX-4594_0918.patch
>
> If there are many guideposts, performance will suffer during query compilation.
[jira] [Updated] (PHOENIX-5069) Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache
[ https://issues.apache.org/jira/browse/PHOENIX-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bin Shi updated PHOENIX-5069:
-----------------------------
    Attachment: PHOENIX-5069.master.004.patch

> Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache
> -----------------------------------------------------------------------------
>
> Key: PHOENIX-5069
> URL: https://issues.apache.org/jira/browse/PHOENIX-5069
> Project: Phoenix
> Issue Type: Improvement
> Reporter: Bin Shi
> Assignee: Bin Shi
> Priority: Major
> Attachments: PHOENIX-5069-4.14.1-hbase-1.3-phoenix-stats.001.patch, PHOENIX-5069-4.14.1-hbase-1.3-phoenix-stats.002.patch, PHOENIX-5069.4.x-HBase-1.4.001.patch, PHOENIX-5069.master.001.patch, PHOENIX-5069.master.002.patch, PHOENIX-5069.master.003.patch, PHOENIX-5069.master.004.patch, PHOENIX-5069.patch
>
> Time Spent: 6h 40m
> Remaining Estimate: 0h
>
> The current Phoenix Stats Cache uses a TTL-based eviction policy: a cached entry expires after a given amount of time (900s by default) has passed since the entry was created, which leads to a cache miss the next time the Compiler/Optimizer fetches stats from the cache. As you can see from the graph above, fetching stats from the cache is a blocking operation — on a cache miss, it takes a round trip over the wire to scan the SYSTEM.STATS table, get the latest stats info, rebuild the cache, and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call causes a significant performance penalty, and we see periodic spikes.
> *This Jira proposes an asynchronous refresh mechanism to provide a non-blocking cache. For details, please see the linked design document below.*
> [~karanmehta93] [~twdsi...@gmail.com] [~dbwong] [~elserj] [~an...@apache.org] [~sergey soldatov]
[jira] [Updated] (PHOENIX-5069) Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache
[ https://issues.apache.org/jira/browse/PHOENIX-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-5069: - Attachment: PHOENIX-5069.4.x-HBase-1.4.001.patch > Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache > --- > > Key: PHOENIX-5069 > URL: https://issues.apache.org/jira/browse/PHOENIX-5069 > Project: Phoenix > Issue Type: Improvement >Reporter: Bin Shi >Assignee: Bin Shi >Priority: Major > Attachments: PHOENIX-5069-4.14.1-hbase-1.3-phoenix-stats.001.patch, PHOENIX-5069-4.14.1-hbase-1.3-phoenix-stats.002.patch, PHOENIX-5069.4.x-HBase-1.4.001.patch, PHOENIX-5069.master.001.patch, PHOENIX-5069.master.002.patch, PHOENIX-5069.master.003.patch, PHOENIX-5069.patch > > Time Spent: 6h 40m > Remaining Estimate: 0h > > The current Phoenix Stats Cache uses a TTL-based eviction policy: a cached entry expires after a given amount of time (900s by default) has passed since the entry was created, which leads to a cache miss the next time the Compiler/Optimizer fetches stats from the cache. > Fetching stats from the cache is a blocking operation — when there is a cache miss, it takes a round trip over the wire to scan the SYSTEM.STATS table, get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call incurs a significant performance penalty and shows up as periodic spikes. > *This Jira proposes an asynchronous refresh mechanism to provide a non-blocking cache. For details, please see the linked design document below.* > [~karanmehta93] [~twdsi...@gmail.com] [~dbwong] [~elserj] [~an...@apache.org] [~sergey soldatov] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PHOENIX-4924) Provide strong guarantee for Stats Write to prevent inconsistent and incomplete data issue
[ https://issues.apache.org/jira/browse/PHOENIX-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi resolved PHOENIX-4924. -- Resolution: Fixed > Provide strong guarantee for Stats Write to prevent inconsistent and > incomplete data issue > -- > > Key: PHOENIX-4924 > URL: https://issues.apache.org/jira/browse/PHOENIX-4924 > Project: Phoenix > Issue Type: Improvement >Reporter: Bin Shi >Assignee: Bin Shi >Priority: Major > > The Stats could be inconsistent and incomplete due to region servers going down, RPC failures when committing guidepost data to the Stats table, and the lack of retries and atomic operations. The expected behavior from the Platform is: “We need a strong guarantee that on stats write there are sufficient retries and monitoring in place so that stats writes are resilient, such that we will not end up with inconsistent, partial or empty stats until the next run of UPDATE STATISTICS”. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-4924) Provide strong guarantee for Stats Write to prevent inconsistent and incomplete data issue
[ https://issues.apache.org/jira/browse/PHOENIX-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-4924: - Comment: was deleted (was: h2. Conclusion: # The test results clearly show that the change doesn't introduce any new performance regressions or issues. # The test results clearly show that the pattern of “periodic query spikes” observed with the blocking stats cache (see the design document [Use Asynchronous Refresh to Provide Non-blocking Phoenix Stats Cache|https://salesforce.quip.com/rxokAkVatiQO] for details) is gone. # When analyzing the noise in the Flighting test results, I suspected it was caused by multiple async stats-loading tasks being triggered for the same cached entry during the same period. By exploring the Google Guava Cache code base, I found this negative case won't happen, because Guava Cache won't try to reload a cache entry while another thread is already performing the refresh. h2. Test Result: h3. Baseline 1st Run (screenshot) h3. Flighting 1st Run (screenshot) h3. Baseline 4th Run (screenshot) h3. Flighting 4th Run (screenshot) ) > Provide strong guarantee for Stats Write to prevent inconsistent and > incomplete data issue > -- > > Key: PHOENIX-4924 > URL: https://issues.apache.org/jira/browse/PHOENIX-4924 > Project: Phoenix > Issue Type: Improvement >Reporter: Bin Shi >Assignee: Bin Shi >Priority: Major > > The Stats could be inconsistent and incomplete due to region servers going down, RPC failures when committing guidepost data to the Stats table, and the lack of retries and atomic operations. The expected behavior from the Platform is: “We need a strong guarantee that on stats write there are sufficient retries and monitoring in place so that stats writes are resilient, such that we will not end up with inconsistent, partial or empty stats until the next run of UPDATE STATISTICS”. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (PHOENIX-4924) Provide strong guarantee for Stats Write to prevent inconsistent and incomplete data issue
[ https://issues.apache.org/jira/browse/PHOENIX-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi reopened PHOENIX-4924: -- > Provide strong guarantee for Stats Write to prevent inconsistent and > incomplete data issue > -- > > Key: PHOENIX-4924 > URL: https://issues.apache.org/jira/browse/PHOENIX-4924 > Project: Phoenix > Issue Type: Improvement >Reporter: Bin Shi >Assignee: Bin Shi >Priority: Major > > The Stats could be inconsistent and incomplete due to region servers going down, RPC failures when committing guidepost data to the Stats table, and the lack of retries and atomic operations. The expected behavior from the Platform is: “We need a strong guarantee that on stats write there are sufficient retries and monitoring in place so that stats writes are resilient, such that we will not end up with inconsistent, partial or empty stats until the next run of UPDATE STATISTICS”. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-5069) Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache
[ https://issues.apache.org/jira/browse/PHOENIX-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-5069: - Attachment: PHOENIX-5069-4.14.1-hbase-1.3-phoenix-stats.001.patch > Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache > --- > > Key: PHOENIX-5069 > URL: https://issues.apache.org/jira/browse/PHOENIX-5069 > Project: Phoenix > Issue Type: Improvement >Reporter: Bin Shi >Assignee: Bin Shi >Priority: Major > Attachments: PHOENIX-5069-4.14.1-hbase-1.3-phoenix-stats.001.patch, PHOENIX-5069.master.001.patch, PHOENIX-5069.master.002.patch, PHOENIX-5069.master.003.patch, PHOENIX-5069.patch > > Time Spent: 6h 40m > Remaining Estimate: 0h > > The current Phoenix Stats Cache uses a TTL-based eviction policy: a cached entry expires after a given amount of time (900s by default) has passed since the entry was created, which leads to a cache miss the next time the Compiler/Optimizer fetches stats from the cache. > Fetching stats from the cache is a blocking operation — when there is a cache miss, it takes a round trip over the wire to scan the SYSTEM.STATS table, get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call incurs a significant performance penalty and shows up as periodic spikes. > *This Jira proposes an asynchronous refresh mechanism to provide a non-blocking cache. For details, please see the linked design document below.* > [~karanmehta93] [~twdsi...@gmail.com] [~dbwong] [~elserj] [~an...@apache.org] [~sergey soldatov] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-5069) Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache
[ https://issues.apache.org/jira/browse/PHOENIX-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-5069: - Attachment: PHOENIX-5069.master.003.patch > Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache > --- > > Key: PHOENIX-5069 > URL: https://issues.apache.org/jira/browse/PHOENIX-5069 > Project: Phoenix > Issue Type: Improvement >Reporter: Bin Shi >Assignee: Bin Shi >Priority: Major > Attachments: PHOENIX-5069.master.001.patch, PHOENIX-5069.master.002.patch, PHOENIX-5069.master.003.patch, PHOENIX-5069.patch > > Time Spent: 6h 40m > Remaining Estimate: 0h > > The current Phoenix Stats Cache uses a TTL-based eviction policy: a cached entry expires after a given amount of time (900s by default) has passed since the entry was created, which leads to a cache miss the next time the Compiler/Optimizer fetches stats from the cache. > Fetching stats from the cache is a blocking operation — when there is a cache miss, it takes a round trip over the wire to scan the SYSTEM.STATS table, get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call incurs a significant performance penalty and shows up as periodic spikes. > *This Jira proposes an asynchronous refresh mechanism to provide a non-blocking cache. For details, please see the linked design document below.* > [~karanmehta93] [~twdsi...@gmail.com] [~dbwong] [~elserj] [~an...@apache.org] [~sergey soldatov] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-5069) Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache
[ https://issues.apache.org/jira/browse/PHOENIX-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-5069: - Attachment: PHOENIX-5069.master.002.patch > Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache > --- > > Key: PHOENIX-5069 > URL: https://issues.apache.org/jira/browse/PHOENIX-5069 > Project: Phoenix > Issue Type: Improvement >Reporter: Bin Shi >Assignee: Bin Shi >Priority: Major > Attachments: PHOENIX-5069.master.001.patch, PHOENIX-5069.master.002.patch, PHOENIX-5069.patch > > Time Spent: 6h > Remaining Estimate: 0h > > The current Phoenix Stats Cache uses a TTL-based eviction policy: a cached entry expires after a given amount of time (900s by default) has passed since the entry was created, which leads to a cache miss the next time the Compiler/Optimizer fetches stats from the cache. > Fetching stats from the cache is a blocking operation — when there is a cache miss, it takes a round trip over the wire to scan the SYSTEM.STATS table, get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call incurs a significant performance penalty and shows up as periodic spikes. > *This Jira proposes an asynchronous refresh mechanism to provide a non-blocking cache. For details, please see the linked design document below.* > [~karanmehta93] [~twdsi...@gmail.com] [~dbwong] [~elserj] [~an...@apache.org] [~sergey soldatov] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-5069) Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache
[ https://issues.apache.org/jira/browse/PHOENIX-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-5069: - Attachment: PHOENIX-5069.master.001.patch > Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache > --- > > Key: PHOENIX-5069 > URL: https://issues.apache.org/jira/browse/PHOENIX-5069 > Project: Phoenix > Issue Type: Improvement >Reporter: Bin Shi >Assignee: Bin Shi >Priority: Major > Attachments: PHOENIX-5069.master.001.patch, PHOENIX-5069.patch > > Time Spent: 3h > Remaining Estimate: 0h > > The current Phoenix Stats Cache uses a TTL-based eviction policy: a cached entry expires after a given amount of time (900s by default) has passed since the entry was created, which leads to a cache miss the next time the Compiler/Optimizer fetches stats from the cache. > Fetching stats from the cache is a blocking operation — when there is a cache miss, it takes a round trip over the wire to scan the SYSTEM.STATS table, get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call incurs a significant performance penalty and shows up as periodic spikes. > *This Jira proposes an asynchronous refresh mechanism to provide a non-blocking cache. For details, please see the linked design document below.* > [~karanmehta93] [~twdsi...@gmail.com] [~dbwong] [~elserj] [~an...@apache.org] [~sergey soldatov] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (PHOENIX-5069) Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache
[ https://issues.apache.org/jira/browse/PHOENIX-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi reassigned PHOENIX-5069: Assignee: Bin Shi > Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache > --- > > Key: PHOENIX-5069 > URL: https://issues.apache.org/jira/browse/PHOENIX-5069 > Project: Phoenix > Issue Type: Improvement >Reporter: Bin Shi >Assignee: Bin Shi >Priority: Major > > The current Phoenix Stats Cache uses a TTL-based eviction policy: a cached entry expires after a given amount of time (900s by default) has passed since the entry was created, which leads to a cache miss the next time the Compiler/Optimizer fetches stats from the cache. > Fetching stats from the cache is a blocking operation — when there is a cache miss, it takes a round trip over the wire to scan the SYSTEM.STATS table, get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call incurs a significant performance penalty and shows up as periodic spikes. > *This Jira proposes an asynchronous refresh mechanism to provide a non-blocking cache. For details, please see the linked design document below.* > [~karanmehta93] [~twdsi...@gmail.com] [~dbwong] [~elserj] [~an...@apache.org] [~sergey soldatov] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-5069) Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache
[ https://issues.apache.org/jira/browse/PHOENIX-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-5069: - Description: The current Phoenix Stats Cache uses a TTL-based eviction policy: a cached entry expires after a given amount of time (900s by default) has passed since the entry was created, which leads to a cache miss the next time the Compiler/Optimizer fetches stats from the cache. Fetching stats from the cache is a blocking operation — when there is a cache miss, it takes a round trip over the wire to scan the SYSTEM.STATS table, get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call incurs a significant performance penalty and shows up as periodic spikes. *This Jira proposes an asynchronous refresh mechanism to provide a non-blocking cache. For details, please see the linked design document below.* [~karanmehta93] [~twdsi...@gmail.com] [~dbwong] [~elserj] [~an...@apache.org] [~sergey soldatov] was: The current Phoenix Stats Cache uses TTL based eviction policy. A cached entry will expire after a given amount of time (900s by default) passed since the entry's been created. This will lead to cache miss when Compiler/Optimizer fetches stats from cache at the next time. As you can see from the above graph, fetching stats from the cache is a blocking operation — when there is cache miss, it has a round trip over the wire to scan the SYSTEM.STATS Table and to get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call causes significant performance penalty and see periodic spikes. This Jira suggests to use asynchronous refresh mechanism to provide a non-blocking cache. > Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache > --- > > Key: PHOENIX-5069 > URL: https://issues.apache.org/jira/browse/PHOENIX-5069 > Project: Phoenix > Issue Type: Improvement >Reporter: Bin Shi >Priority: Major > > The current Phoenix Stats Cache uses a TTL-based eviction policy: a cached entry expires after a given amount of time (900s by default) has passed since the entry was created, which leads to a cache miss the next time the Compiler/Optimizer fetches stats from the cache. > Fetching stats from the cache is a blocking operation — when there is a cache miss, it takes a round trip over the wire to scan the SYSTEM.STATS table, get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call incurs a significant performance penalty and shows up as periodic spikes. > *This Jira proposes an asynchronous refresh mechanism to provide a non-blocking cache. For details, please see the linked design document below.* > [~karanmehta93] [~twdsi...@gmail.com] [~dbwong] [~elserj] [~an...@apache.org] [~sergey soldatov] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-5069) Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache
[ https://issues.apache.org/jira/browse/PHOENIX-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-5069: - Description: The current Phoenix Stats Cache uses a TTL-based eviction policy: a cached entry expires after a given amount of time (900s by default) has passed since the entry was created, which leads to a cache miss the next time the Compiler/Optimizer fetches stats from the cache. Fetching stats from the cache is a blocking operation — when there is a cache miss, it takes a round trip over the wire to scan the SYSTEM.STATS table, get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call incurs a significant performance penalty and shows up as periodic spikes. This Jira proposes an asynchronous refresh mechanism to provide a non-blocking cache. was: Below is the high level picture of Phoenix Stats Cache which is based on Google Guava cache. !image-2018-12-13-11-16-16-884.png! The current Phoenix Stats Cache uses TTL based eviction policy. A cached entry will expire after a given amount of time (900s by default) passed since the entry's been created. This will lead to cache miss when Compiler/Optimizer fetches stats from cache at the next time. As you can see from the above graph, fetching stats from the cache is a blocking operation — when there is cache miss, it has a round trip over the wire to scan the SYSTEM.STATS Table and to get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call causes significant performance penalty and see periodic spikes. This Jira suggests to use asynchronous refresh mechanism to fix this and provide a non-blocking cache. > Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache > --- > > Key: PHOENIX-5069 > URL: https://issues.apache.org/jira/browse/PHOENIX-5069 > Project: Phoenix > Issue Type: Improvement >Reporter: Bin Shi >Priority: Major > > The current Phoenix Stats Cache uses a TTL-based eviction policy: a cached entry expires after a given amount of time (900s by default) has passed since the entry was created, which leads to a cache miss the next time the Compiler/Optimizer fetches stats from the cache. > Fetching stats from the cache is a blocking operation — when there is a cache miss, it takes a round trip over the wire to scan the SYSTEM.STATS table, get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call incurs a significant performance penalty and shows up as periodic spikes. > This Jira proposes an asynchronous refresh mechanism to provide a non-blocking cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-5069) Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache
[ https://issues.apache.org/jira/browse/PHOENIX-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-5069: - Attachment: (was: image-2018-12-13-11-16-16-884.png) > Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache > --- > > Key: PHOENIX-5069 > URL: https://issues.apache.org/jira/browse/PHOENIX-5069 > Project: Phoenix > Issue Type: Improvement >Reporter: Bin Shi >Priority: Major > > Below is the high-level picture of the Phoenix Stats Cache, which is based on Google Guava cache. > !image-2018-12-13-11-16-16-884.png! The current Phoenix Stats Cache uses a TTL-based eviction policy: a cached entry expires after a given amount of time (900s by default) has passed since the entry was created, which leads to a cache miss the next time the Compiler/Optimizer fetches stats from the cache. > As you can see from the above graph, fetching stats from the cache is a blocking operation — when there is a cache miss, it takes a round trip over the wire to scan the SYSTEM.STATS table, get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call incurs a significant performance penalty and shows up as periodic spikes. > This Jira proposes an asynchronous refresh mechanism to fix this and provide a non-blocking cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-5069) Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache
[ https://issues.apache.org/jira/browse/PHOENIX-5069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-5069: - Description: Below is the high-level picture of the Phoenix Stats Cache, which is based on Google Guava cache. !image-2018-12-13-11-16-16-884.png! The current Phoenix Stats Cache uses a TTL-based eviction policy: a cached entry expires after a given amount of time (900s by default) has passed since the entry was created, which leads to a cache miss the next time the Compiler/Optimizer fetches stats from the cache. As you can see from the above graph, fetching stats from the cache is a blocking operation — when there is a cache miss, it takes a round trip over the wire to scan the SYSTEM.STATS table, get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call incurs a significant performance penalty and shows up as periodic spikes. This Jira proposes an asynchronous refresh mechanism to fix this and provide a non-blocking cache. was: Below is the high level picture of Phoenix Stats Cache which is based on Google Guava cache. !image-2018-12-13-11-16-16-884.png! The current Phoenix Stats Cache uses TTL based eviction policy. A cached entry will expire after a given amount of time (900s by default) passed since the entry's been created. This will lead to cache miss when Compiler/Optimizer fetches stats from cache at the next time. As you can see from the above graph, fetching stats from the cache is a blocking operation — when there is cache miss, it has a round trip over the wire to scan the SYSTEM.STATS Table and to get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call causes significant performance penalty and see periodic spikes. This Jira suggests to use asynchronous refresh mechanism[link title|http://example.com][link title|http://example.com] to fix this and provide a non-blocking cache. > Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache > --- > > Key: PHOENIX-5069 > URL: https://issues.apache.org/jira/browse/PHOENIX-5069 > Project: Phoenix > Issue Type: Improvement >Reporter: Bin Shi >Priority: Major > > Below is the high-level picture of the Phoenix Stats Cache, which is based on Google Guava cache. > !image-2018-12-13-11-16-16-884.png! The current Phoenix Stats Cache uses a TTL-based eviction policy: a cached entry expires after a given amount of time (900s by default) has passed since the entry was created, which leads to a cache miss the next time the Compiler/Optimizer fetches stats from the cache. > As you can see from the above graph, fetching stats from the cache is a blocking operation — when there is a cache miss, it takes a round trip over the wire to scan the SYSTEM.STATS table, get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call incurs a significant performance penalty and shows up as periodic spikes. > This Jira proposes an asynchronous refresh mechanism to fix this and provide a non-blocking cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PHOENIX-5069) Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache
Bin Shi created PHOENIX-5069: Summary: Use asynchronous refresh to provide non-blocking Phoenix Stats Client Cache Key: PHOENIX-5069 URL: https://issues.apache.org/jira/browse/PHOENIX-5069 Project: Phoenix Issue Type: Improvement Reporter: Bin Shi Attachments: image-2018-12-13-11-16-16-884.png Below is the high-level picture of the Phoenix Stats Cache, which is based on Google Guava cache. !image-2018-12-13-11-16-16-884.png! The current Phoenix Stats Cache uses a TTL-based eviction policy: a cached entry expires after a given amount of time (900s by default) has passed since the entry was created, which leads to a cache miss the next time the Compiler/Optimizer fetches stats from the cache. As you can see from the above graph, fetching stats from the cache is a blocking operation — when there is a cache miss, it takes a round trip over the wire to scan the SYSTEM.STATS table, get the latest stats info, rebuild the cache and finally return the stats to the Compiler/Optimizer. Whenever there is a cache miss, this blocking call incurs a significant performance penalty and shows up as periodic spikes. This Jira proposes an asynchronous refresh mechanism to fix this and provide a non-blocking cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (PHOENIX-5058) Improvements to client side cache guideposts cache
[ https://issues.apache.org/jira/browse/PHOENIX-5058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi reassigned PHOENIX-5058: Assignee: Bin Shi > Improvements to client side cache guideposts cache > -- > > Key: PHOENIX-5058 > URL: https://issues.apache.org/jira/browse/PHOENIX-5058 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.15.0, 5.1.0 >Reporter: Karan Mehta >Assignee: Bin Shi >Priority: Major > > The current implementation of the Phoenix guideposts cache has certain limitations and is not well suited to production use cases. I am attaching a design document to this Jira which highlights these aspects of the cache and potential solutions to mitigate them. These improvements should be taken up as child Jiras under this umbrella Jira. > [~Bin Shi] [~dbwong] [~elserj] [~an...@apache.org] [~sergey soldatov] > Thoughts? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-4925) Use Segment/SUM tree to organize Guide Post Info
[ https://issues.apache.org/jira/browse/PHOENIX-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-4925: - Summary: Use Segment/SUM tree to organize Guide Post Info (was: Use SUM tree to organize Guide Post Info) > Use Segment/SUM tree to organize Guide Post Info > > > Key: PHOENIX-4925 > URL: https://issues.apache.org/jira/browse/PHOENIX-4925 > Project: Phoenix > Issue Type: Improvement >Reporter: Bin Shi >Assignee: Bin Shi >Priority: Major > > As reported, query compilation (for the sample queries shown below), especially deriving estimates and generating parallel scans from guide posts, becomes much slower after we introduced Phoenix Stats. > a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c > b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = 'x' ORDER BY Pk1__c > c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = 'x' ORDER BY pk1__c,pk2__c > d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = 'x' AND nonpk1__c ORDER BY pk1__c,pk2__c > e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only column making up the primary key. > > Because guide post info is prefix-encoded, we have to decode and traverse guide posts sequentially, which causes the time complexity of BaseResultIterators.getParallelScan(...) to be O(n), where n is the total count of guide posts. > According to PHOENIX-2417, prefix encoding is used as the in-memory and over-the-wire encoding for guide post info, to reduce the footprint in the client cache and over transmission. > We can use something like a Sum Tree (or even a Binary Indexed Tree) to address both the memory and performance concerns. The guide posts are partitioned into k chunks (k=1024?); each chunk is prefix-encoded, and the encoded data forms a leaf node of the tree. Each inner node contains summary info (the count of rows, the data size) of the subtree rooted at that node. > Compared to the current data structure, the size increase of this tree-like structure (mainly coming from the n/k-1 inner nodes) is negligible. The time complexity for queries a, b, c can be reduced to O(m), where m is the total count of regions; the time complexity for "EXPLAIN" of queries a, b, c can be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even be reduced to O(1). For queries d and e, the time complexity to find the start of the target scan ranges can be reduced to O(log(n/k)). > The tree can also integrate AVL and B+ characteristics to support partial load/unload when interacting with the stats client cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
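A rough sketch of the node layout such a tree could use (all names are hypothetical; this illustrates the idea rather than any attached patch):
{code:java}
import org.apache.hadoop.hbase.util.Bytes;

class GuidePostNode {
    // Leaves hold a prefix-encoded chunk of ~k guide posts; inner nodes
    // carry the aggregate row count and data size of their subtree, so
    // estimates can be summed without decoding leaf chunks.
    long rowCount;           // sum over the subtree
    long byteSize;           // sum over the subtree
    byte[] maxKey;           // largest guide post key in the subtree
    GuidePostNode left, right;
    byte[] encodedChunk;     // non-null only on leaf nodes

    // Descend to the leaf chunk that may contain startKey in O(log(n/k));
    // only that one chunk then needs sequential prefix-decoding.
    static GuidePostNode findChunk(GuidePostNode node, byte[] startKey) {
        while (node.encodedChunk == null) {
            node = Bytes.compareTo(startKey, node.left.maxKey) <= 0
                    ? node.left : node.right;
        }
        return node;
    }
}
{code}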
[jira] [Assigned] (PHOENIX-4998) The estimated instance size of GuidePostsInfo doesn't count the last update timestamp array
[ https://issues.apache.org/jira/browse/PHOENIX-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi reassigned PHOENIX-4998: Assignee: Bin Shi > The estimated instance size of GuidePostsInfo doesn't count the last update > timestamp array > --- > > Key: PHOENIX-4998 > URL: https://issues.apache.org/jira/browse/PHOENIX-4998 > Project: Phoenix > Issue Type: Bug >Reporter: Bin Shi >Assignee: Bin Shi >Priority: Major > > GuidePostsCache (implemented with Guava Cache) uses the estimated instance size of GuidePostsInfo as the weight of a cache entry. The bug results in underestimating the weight of the cache entry. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
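A sketch of the accounting fix this implies, with hypothetical field names (the real estimate lives in GuidePostsInfo):
{code:java}
class GuidePostsWeightSketch {
    // The Guava cache weighs entries by this estimate; omitting the long[]
    // of last-update timestamps makes every entry look lighter than it is,
    // so the cache retains more bytes than its configured maximum.
    static int estimatedSize(int baseObjectSize, byte[] encodedGuidePosts,
                             long[] rowCounts, long[] updateTimestamps) {
        return baseObjectSize
                + encodedGuidePosts.length              // prefix-encoded keys
                + rowCounts.length * Long.BYTES         // per-guidepost row counts
                + updateTimestamps.length * Long.BYTES; // the array that was missed
    }
}
{code}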
[jira] [Created] (PHOENIX-4998) The estimated instance size of GuidePostsInfo doesn't count the last update timestamp array
Bin Shi created PHOENIX-4998: Summary: The estimated instance size of GuidePostsInfo doesn't count the last update timestamp array Key: PHOENIX-4998 URL: https://issues.apache.org/jira/browse/PHOENIX-4998 Project: Phoenix Issue Type: Bug Reporter: Bin Shi GuidePostsCache (implemented with Guava Cache) uses the estimated instance size of GuidePostsInfo as the weight of a cache entry. The bug results in underestimating the weight of the cache entry. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (PHOENIX-4927) Disentangle the granularity of guidepost data from that of client cached guide post data
[ https://issues.apache.org/jira/browse/PHOENIX-4927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi reassigned PHOENIX-4927: Assignee: Bin Shi > Disentangle the granularity of guidepost data from that of client cached > guide post data > > > Key: PHOENIX-4927 > URL: https://issues.apache.org/jira/browse/PHOENIX-4927 > Project: Phoenix > Issue Type: Improvement >Reporter: Bin Shi >Assignee: Bin Shi >Priority: Major > > The expected behaviors: > # It should be possible to have 10MB guideposts for precision without forcing clients to cache that many guide posts. > # It should combine as many guideposts into chunks as needed for the particular query. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PHOENIX-4927) Disentangle the granularity of guidepost data from that of client cached guide post data
Bin Shi created PHOENIX-4927: Summary: Disentangle the granularity of guidepost data from that of client cached guide post data Key: PHOENIX-4927 URL: https://issues.apache.org/jira/browse/PHOENIX-4927 Project: Phoenix Issue Type: Improvement Reporter: Bin Shi The expected behaviors: # It should be possible to have 10MB guideposts for precision without forcing clients to cache that many guide posts. # It should combine as many guideposts into chunks as needed for the particular query. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
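Expected behavior #2 could look something like the sketch below, where GuidePost and the byte-size bookkeeping are hypothetical stand-ins: consecutive fine-grained guideposts are merged into coarser chunks sized for one particular query, without changing what is stored or cached.
{code:java}
import java.util.ArrayList;
import java.util.List;

class GuidePostCombinerSketch {
    static final class GuidePost {
        final byte[] key;     // end key of the span this guidepost covers
        final long byteSize;  // bytes covered since the previous guidepost
        GuidePost(byte[] key, long byteSize) { this.key = key; this.byteSize = byteSize; }
    }

    static List<GuidePost> combine(List<GuidePost> fine, long targetChunkBytes) {
        List<GuidePost> chunks = new ArrayList<>();
        long acc = 0;
        GuidePost last = null;
        for (GuidePost gp : fine) {
            acc += gp.byteSize;
            last = gp;
            if (acc >= targetChunkBytes) { // close a chunk at this guidepost
                chunks.add(new GuidePost(gp.key, acc));
                acc = 0;
            }
        }
        if (acc > 0 && last != null) {     // tail chunk for the remainder
            chunks.add(new GuidePost(last.key, acc));
        }
        return chunks;
    }
}
{code}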
[jira] [Created] (PHOENIX-4926) Disentangle the guidepost data from the actually materialized scans
Bin Shi created PHOENIX-4926: Summary: Disentangle the guidepost data from the actually materialized scans Key: PHOENIX-4926 URL: https://issues.apache.org/jira/browse/PHOENIX-4926 Project: Phoenix Issue Type: Improvement Reporter: Bin Shi Assignee: Bin Shi Currently, when we enable stats for intra-region read parallelization, the granularity of guide posts is the same as the granularity of parallel scans, which blocks us from using smaller guide posts in query complexity estimation and query optimization for better accuracy. We can define the target intra-region parallelism so that a scan is an integral multiple of the defined GUIDE_POSTS_WIDTH value and its value is between [1, the number of threads in the thread pool or RPC handlers on the region server]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
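As an illustration of such a sizing policy (the names and the rounding choice are assumptions, not this issue's design):
{code:java}
class ScanSizingSketch {
    // Choose how many whole guideposts each parallel scan should span so
    // that one region yields at most targetParallelism scans, keeping every
    // scan boundary aligned to an integral multiple of GUIDE_POSTS_WIDTH.
    static long guidePostsPerScan(long regionBytes, long guidePostWidthBytes,
                                  int targetParallelism) {
        long guidePostsInRegion = Math.max(1, regionBytes / guidePostWidthBytes);
        long perScan = (guidePostsInRegion + targetParallelism - 1) / targetParallelism;
        return Math.max(1, perScan); // scan size = perScan * guidePostWidthBytes
    }
}
{code}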
[jira] [Updated] (PHOENIX-4925) Use SUM tree to organize Guide Post Info
[ https://issues.apache.org/jira/browse/PHOENIX-4925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-4925: - Description: As reported, query compilation (for the sample queries shown below), especially deriving estimates and generating parallel scans from guide posts, becomes much slower after we introduced Phoenix Stats. a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = 'x' ORDER BY Pk1__c c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = 'x' ORDER BY pk1__c,pk2__c d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = 'x' AND nonpk1__c ORDER BY pk1__c,pk2__c e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only column making up the primary key. Because guide post info is prefix-encoded, we have to decode and traverse guide posts sequentially, which causes the time complexity of BaseResultIterators.getParallelScan(...) to be O(n), where n is the total count of guide posts. According to PHOENIX-2417, prefix encoding is used as the in-memory and over-the-wire encoding for guide post info, to reduce the footprint in the client cache and over transmission. We can use something like a Sum Tree (or even a Binary Indexed Tree) to address both the memory and performance concerns. The guide posts are partitioned into k chunks (k=1024?); each chunk is prefix-encoded, and the encoded data forms a leaf node of the tree. Each inner node contains summary info (the count of rows, the data size) of the subtree rooted at that node. Compared to the current data structure, the size increase of this tree-like structure (mainly coming from the n/k-1 inner nodes) is negligible. The time complexity for queries a, b, c can be reduced to O(m), where m is the total count of regions; the time complexity for "EXPLAIN" of queries a, b, c can be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even be reduced to O(1). For queries d and e, the time complexity to find the start of the target scan ranges can be reduced to O(log(n/k)). The tree can also integrate AVL and B+ characteristics to support partial load/unload when interacting with the stats client cache. was: As reported, Query compilation (for the sample queries showed below), especially deriving estimation and generating parallel scans from guide posts, becomes much slower after we introduced Phoenix Stats. a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = ‘x’ ORDER BY Pk1__c c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = ‘x’ ORDER BY pk1__c,pk2__c d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = ‘x’ AND nonpk1__c ORDER BY pk1__c,pk2__c e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only column to make the primary key. By using prefix encoding for guide post info, we have to decode and traverse guide posts sequentially, which causes time complexity in BaseResultIterators.getParallelScan(...) to be O(n) , where n is the total count of guide posts. According to PHOENIX-2417, to reduce footprint in client cache and over transmition, the prefix encoding is used as in-memory and over-the-wire encoding for guide post info. We can use something like Sum Tree (even Binary Indexed Tree) to address both memory and performance concerns. The guide posts are partitioned to k chunks (k=1024?), each chunk is encoded by prefix encoding and the encoded data is a leaf node of the tree. The inner node contains summary info (the count of rows, the data size) of the sub tree rooted at the inner node. With this tree like data structure, compared to the current data structure, the increased size (mainly coming from the n/k-1 inner nodes) is ignorable. The time complexity for queries a, b, c can be reduced to O(m) where m is the total count of regions; the time complexity for "EXPLAN" queries a, b, c can be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even be reduced to O(1). For queries d and e, the time complexity to find the start of target scan ranges can be reduced to O(log(n/k)). > Use SUM tree to organize Guide Post Info > > > Key: PHOENIX-4925 > URL: https://issues.apache.org/jira/browse/PHOENIX-4925 > Project: Phoenix > Issue Type: Improvement >Reporter: Bin Shi >Assignee: Bin Shi >Priority: Major > > As reported, query compilation (for the sample queries shown below), especially deriving estimates and generating parallel scans from guide posts, becomes much slower after we introduced Phoenix Stats. > a. SELECT
[jira] [Created] (PHOENIX-4925) Use SUM tree to organize Guide Post Info
Bin Shi created PHOENIX-4925: Summary: Use SUM tree to organize Guide Post Info Key: PHOENIX-4925 URL: https://issues.apache.org/jira/browse/PHOENIX-4925 Project: Phoenix Issue Type: Improvement Reporter: Bin Shi Assignee: Bin Shi As reported, query compilation (for the sample queries shown below), especially deriving estimates and generating parallel scans from guide posts, becomes much slower after we introduced Phoenix Stats. a. SELECT f1__c FROM MyCustomBigObject__b ORDER BY Pk1__c b. SELECT f1__c FROM MyCustomBigObject__b WHERE nonpk1__c = 'x' ORDER BY Pk1__c c. SELECT f1__c FROM MyCustomBigObject__b WHERE pk2__c = 'x' ORDER BY pk1__c,pk2__c d. SELECT f1__c FROM MyCustomBigObject__b WHERE pk1__c = 'x' AND nonpk1__c ORDER BY pk1__c,pk2__c e. SELECT f1__c FROM MyCustomBigObject__b WHERE pk__c >= 'd' AND pk__c <= 'm' OR pk__c >= 'o' AND pk__c <= 'x' ORDER BY pk__c // pk__c is the only column making up the primary key. Because guide post info is prefix-encoded, we have to decode and traverse guide posts sequentially, which causes the time complexity of BaseResultIterators.getParallelScan(...) to be O(n), where n is the total count of guide posts. According to PHOENIX-2417, prefix encoding is used as the in-memory and over-the-wire encoding for guide post info, to reduce the footprint in the client cache and over transmission. We can use something like a Sum Tree (or even a Binary Indexed Tree) to address both the memory and performance concerns. The guide posts are partitioned into k chunks (k=1024?); each chunk is prefix-encoded, and the encoded data forms a leaf node of the tree. Each inner node contains summary info (the count of rows, the data size) of the subtree rooted at that node. Compared to the current data structure, the size increase of this tree-like structure (mainly coming from the n/k-1 inner nodes) is negligible. The time complexity for queries a, b, c can be reduced to O(m), where m is the total count of regions; the time complexity for "EXPLAIN" of queries a, b, c can be reduced to O(m) too, and if we support "EXPLAIN (ESTIMATE ONLY)", it can even be reduced to O(1). For queries d and e, the time complexity to find the start of the target scan ranges can be reduced to O(log(n/k)). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PHOENIX-4924) Provide strong guarantee for Stats Write to prevent inconsistent and incomplete data issue
Bin Shi created PHOENIX-4924: Summary: Provide strong guarantee for Stats Write to prevent inconsistent and incomplete data issue Key: PHOENIX-4924 URL: https://issues.apache.org/jira/browse/PHOENIX-4924 Project: Phoenix Issue Type: Improvement Reporter: Bin Shi Assignee: Bin Shi The Stats could be inconsistent and incomplete due to region servers going down, RPC failures when committing guidepost data to the Stats table, and the lack of retries and atomic operations. The expected behavior from the Platform is: “We need a strong guarantee that on stats write there are sufficient retries and monitoring in place so that stats writes are resilient, such that we will not end up with inconsistent, partial or empty stats until the next run of UPDATE STATISTICS”. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PHOENIX-4923) Create more integration test cases for Phoenix Stats
Bin Shi created PHOENIX-4923: Summary: Create more integration test cases for Phoenix Stats Key: PHOENIX-4923 URL: https://issues.apache.org/jira/browse/PHOENIX-4923 Project: Phoenix Issue Type: Test Reporter: Bin Shi Assignee: Bin Shi For Phoenix Stats, we need to create more test cases for point lookups, serial plans and parallel plans. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-4008) UPDATE STATISTIC should collect all versions of cells
[ https://issues.apache.org/jira/browse/PHOENIX-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-4008: - Attachment: PHONEIX-4008.4.X-HBase-1.2.001.patch PHONEIX-4008.4.X-HBase-1.3.001.patch PHONEIX-4008.4.X-HBase-1.4.001.patch > UPDATE STATISTIC should collect all versions of cells > - > > Key: PHOENIX-4008 > URL: https://issues.apache.org/jira/browse/PHOENIX-4008 > Project: Phoenix > Issue Type: Bug >Reporter: Samarth Jain >Assignee: Bin Shi >Priority: Major > Fix For: 4.15.0, 5.1.0 > > Attachments: PHOENIX-4008_0918.patch, PHOENIX-4008_0920.patch, PHONEIX-4008.4.X-HBase-1.2.001.patch, PHONEIX-4008.4.X-HBase-1.3.001.patch, PHONEIX-4008.4.X-HBase-1.4.001.patch > > > In order to truly measure the size of data when calculating guide posts, UPDATE STATISTIC should take into account all versions of cells. We should also be setting the max versions on the scan. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
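In HBase 1.x client terms, the scan configuration presumably amounts to something like this sketch:
{code:java}
import org.apache.hadoop.hbase.client.Scan;

class AllVersionsScanSketch {
    static Scan statsCollectionScan() {
        Scan scan = new Scan();
        scan.setMaxVersions(); // read every cell version, not just the latest,
                               // so guidepost sizes reflect the true stored data
        return scan;
    }
}
{code}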
[jira] [Updated] (PHOENIX-4916) When collecting statistics, the estimated size of a guide post may only count part of cells of the last row
[ https://issues.apache.org/jira/browse/PHOENIX-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-4916: - Description: In DefaultStatisticsCollector.collectStatistics(...), it iterates over all cells of the current row; once the accumulated estimated size plus the size of the current cell >= guide post width, it skips all the remaining cells. The result is that the estimated size of a guide post may only count part of the cells of the last row. This problem can be ignored in clusters with real data, where the guide post width is much bigger than the row size, but it does have an impact on unit and integration tests, because we use a very small guide post width in tests, which makes the estimated size of the query inaccurate. was: In DefaultStatisticsCollector.collectStatistics(...), it iterate all cells of the current row, once the accumulated estimated size plus the size of the current cell >= guide post width, it skipped all the remaining cells. The result is that he estimated size of a guide post may only count part of cells of the last row. This problem can be ignored in clusters with real data where the guide post width is much bigger than the row size, but it does have impact on unit test and iteration test, because we use very small guide post width in the test which results in inaccuracy of the estimated size of the query. > When collecting statistics, the estimated size of a guide post may only count > part of cells of the last row > --- > > Key: PHOENIX-4916 > URL: https://issues.apache.org/jira/browse/PHOENIX-4916 > Project: Phoenix > Issue Type: Bug >Reporter: Bin Shi >Assignee: Bin Shi >Priority: Major > > In DefaultStatisticsCollector.collectStatistics(...), it iterates over all cells of the current row; once the accumulated estimated size plus the size of the current cell >= guide post width, it skips all the remaining cells. The result is that the estimated size of a guide post may only count part of the cells of the last row. > This problem can be ignored in clusters with real data, where the guide post width is much bigger than the row size, but it does have an impact on unit and integration tests, because we use a very small guide post width in tests, which makes the estimated size of the query inaccurate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
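A sketch of the described fix: accumulate the whole row before testing the guidepost boundary, so the last row is never half-counted (the accumulator shape and closeGuidePost are hypothetical):
{code:java}
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.KeyValueUtil;

class GuidePostAccumulatorSketch {
    long accumulatedSize;

    void addRow(List<Cell> rowCells, long guidePostWidth) {
        long rowSize = 0;
        for (Cell cell : rowCells) {
            rowSize += KeyValueUtil.length(cell); // count every cell of the row
        }
        accumulatedSize += rowSize;               // whole rows only
        if (accumulatedSize >= guidePostWidth) {
            closeGuidePost(accumulatedSize);      // emit the guidepost
            accumulatedSize = 0;
        }
    }

    void closeGuidePost(long size) { /* hypothetical: record the guidepost */ }
}
{code}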
[jira] [Updated] (PHOENIX-4912) Make Table Sampling algorithm to accommodate to the imbalance row distribution across guide posts
[ https://issues.apache.org/jira/browse/PHOENIX-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-4912: - Description: The current implementation of table sampling is based on the assumption "Every two consecutive guide posts contain an equal number of rows", which isn't accurate in practice, and once we collect multiple versions of cells and deleted rows, it will get worse. In detail, the current implementation of table sampling (see BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end of the function) is as described below: # Iterate over all generated parallel scans; # For each scan, if getHashCode(start row key of the scan) MOD 100 < tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; otherwise discard it. The problem can be formalized as: we have a group of scans, each defined as <Ki, Ci> (the scan's start row key and its row count). We want to randomly pick a subset of scans so that the sum of the row counts of the picked scans is close to Y, where Y = T * R / 100.00, T is the total row count of all scans, and R is the table sampling rate (0 <= R <= 100). To resolve the above problem, one algorithm we can consider is described below. The core idea is to adjust T, R and Y after each pick, so the remaining problem is a child problem of the original one.
{code:java}
// T: total row count over all scans; R: sampling rate in [0, 100];
// Ki/Ci: the start row key / row count of the current scan.
ArrayList<Scan> tableSampling(ArrayList<Scan> scans, long T, double R) {
    ArrayList<Scan> pickedScans = new ArrayList<>();
    double Y = T * R / 100.00;
    for (Scan scan : scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) % 100 < R) {
            // pick this scan, and adjust T, R, Y accordingly
            pickedScans.add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) {
                R = 100.00 * Y / T;
            }
        }
    }
    return pickedScans;
}
{code}
was: The current implementation of table sampling is based on the assumption "Every two consecutive guide posts contains the equal number of rows" which isn't accurate in practice, and once we collect multiple versions of cells and the deleted rows, the thing will become worse. In details, the current implementation of table sampling is (see BaseResultIterators.getParallelScan() which calls sampleScans(...) at the end of function) as described below: # Iterate all parallel scans generated; # For each scan, if getHashHode(start row key of the scan) MOD 100 < tableSamplingRate (See TableSamplerPredicate.java) then pick this scan; otherwise discard this scan. The problem can be formalized as: We have a group of scans and each scan is defined as <Ki, Ci>. Now we want to randomly pick X groups so that the sum of count of rows in the selected groups is close to Y, where Y = the total count of rows of all scans denoted as T * table sampling rate denoted as R (0 <= R <= 100). To resolve the above problem, one of algorithms that we can consider are described below. The core idea is to adjust T, R, Y after each pick, so the new problem is a child problem of the original problem.
{code:java}
ArrayList TableSampling(ArrayList scans, T, R) {
    ArrayList pickedScans = new ArrayList();
    Y = T * R / 100.00;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // then pick this scan, and adjust T, R, Y accordingly
            pickedScans.Add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) {
                R = 100.00 * Y / T;
            }
        }
    }
    return pickedScans;
}
{code}
> Make Table Sampling algorithm to accommodate to the imbalance row > distribution across guide posts > - > > Key: PHOENIX-4912 > URL: https://issues.apache.org/jira/browse/PHOENIX-4912 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 5.0.0, 4.15.0 >Reporter: Bin Shi >Assignee: Bin Shi >Priority: Major > > The current implementation of table sampling is based on the assumption "Every two consecutive guide posts contain an equal number of rows", which isn't accurate in practice, and once we collect multiple versions of cells and deleted rows, it will get worse. > In detail, the current implementation of table sampling (see BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end of the function) is as described below: > # Iterate over all generated parallel scans; > # For each scan, if getHashCode(start row key of the scan) MOD 100 < tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; otherwise discard it. > The problem can be formalized as: we have a group of scans, each defined as <Ki, Ci>. Now we want to randomly pick a subset of scans so that the sum of the row counts
[jira] [Created] (PHOENIX-4916) When collecting statistics, the estimated size of a guide post may only count part of cells of the last row
Bin Shi created PHOENIX-4916: Summary: When collecting statistics, the estimated size of a guide post may only count part of cells of the last row Key: PHOENIX-4916 URL: https://issues.apache.org/jira/browse/PHOENIX-4916 Project: Phoenix Issue Type: Bug Reporter: Bin Shi Assignee: Bin Shi In DefaultStatisticsCollector.collectStatistics(...), it iterate all cells of the current row, once the accumulated estimated size plus the size of the current cell >= guide post width, it skipped all the remaining cells. The result is that he estimated size of a guide post may only count part of cells of the last row. This problem can be ignored in clusters with real data where the guide post width is much bigger than the row size, but it does have impact on unit test and iteration test, because we use very small guide post width in the test which results in inaccuracy of the estimated size of the query. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-4912) Make Table Sampling algorithm to accommodate to the imbalance row distribution across guide posts
[ https://issues.apache.org/jira/browse/PHOENIX-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-4912: - Description: The current implementation of table sampling is based on the assumption "Every two consecutive guide posts contain an equal number of rows", which isn't accurate in practice, and once we collect multiple versions of cells and deleted rows, it will get worse. In detail, the current implementation of table sampling (see BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end of the function) is as described below: # Iterate over all generated parallel scans; # For each scan, if getHashCode(start row key of the scan) MOD 100 < tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; otherwise discard it. The problem can be formalized as: we have a group of scans, each defined as <Ki, Ci> (the scan's start row key and its row count). We want to randomly pick a subset of scans so that the sum of the row counts of the picked scans is close to Y, where Y = the total count of rows of all scans (denoted T) * the table sampling rate (denoted R, 0 <= R <= 100). To resolve the above problem, one algorithm we can consider is described below. The core idea is to adjust T, R and Y after each pick, so the remaining problem is a child problem of the original one.
{code:java}
// T: total row count over all scans; R: sampling rate in [0, 100];
// Ki/Ci: the start row key / row count of the current scan.
ArrayList<Scan> tableSampling(ArrayList<Scan> scans, long T, double R) {
    ArrayList<Scan> pickedScans = new ArrayList<>();
    double Y = T * R / 100.00;
    for (Scan scan : scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) % 100 < R) {
            // pick this scan, and adjust T, R, Y accordingly
            pickedScans.add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) {
                R = 100.00 * Y / T;
            }
        }
    }
    return pickedScans;
}
{code}
was: The current implementation of table sampling is based on the assumption "Every two consecutive guide posts contains the equal number of rows" which isn't accurate in practice, and once we collect multiple versions of cells and the deleted rows, the thing will become worse. In details, the current implementation of table sampling is (see BaseResultIterators.getParallelScan() which calls sampleScans(...) at the end of function) as described below: # Iterate all parallel scans generated; # For each scan, if getHashHode(start row key of the scan) MOD 100 < tableSamplingRate (See TableSamplerPredicate.java) then pick this scan; otherwise discard this scan. The problem can be formalized as: We have a group of scans and each scan is defined as <Ki, Ci>. Now we want to randomly pick X groups so that the sum of count of rows in the selected groups is close to Y, where Y = the total count of rows of all scans T * table sampling rate R. To resolve the above problem, one of algorithms that we can consider are described below:
{code:java}
ArrayList TableSampling(ArrayList scans, T, R) {
    ArrayList pickedScans = new ArrayList();
    Y = T * R;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // then pick this scan, and adjust T, R, Y accordingly
            pickedScans.Add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) {
                R = Y / T;
            }
        }
    }
    return pickedScans;
}
{code}
> Make Table Sampling algorithm to accommodate to the imbalance row > distribution across guide posts > - > > Key: PHOENIX-4912 > URL: https://issues.apache.org/jira/browse/PHOENIX-4912 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 5.0.0, 4.15.0 >Reporter: Bin Shi >Assignee: Bin Shi >Priority: Major > > The current implementation of table sampling is based on the assumption "Every two consecutive guide posts contain an equal number of rows", which isn't accurate in practice, and once we collect multiple versions of cells and deleted rows, it will get worse. > In detail, the current implementation of table sampling (see BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end of the function) is as described below: > # Iterate over all generated parallel scans; > # For each scan, if getHashCode(start row key of the scan) MOD 100 < tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; otherwise discard it. > The problem can be formalized as: we have a group of scans, each defined as <Ki, Ci>. Now we want to randomly pick a subset of scans so that the sum of the row counts in the picked scans is close to Y, where Y = the total count of rows of all scans (denoted T) * the table sampling rate (denoted R, 0 <= R <= 100). > To resolve the above
[jira] [Created] (PHOENIX-4914) There is a corner case in which BaseResultIterators.getParallelScans() returns the wrong last guide post info update timestamp.
Bin Shi created PHOENIX-4914: Summary: There is a corner case in which BaseResultIterators.getParallelScans() returns the wrong last guide post info update timestamp. Key: PHOENIX-4914 URL: https://issues.apache.org/jira/browse/PHOENIX-4914 Project: Phoenix Issue Type: Bug Reporter: Bin Shi Assignee: Bin Shi When I add the following test case to testSelectQueriesWithFilters(...) in [ExplainPlanWithStatsEnabledIT.java|https://github.com/apache/phoenix/pull/347/files#diff-21d3742c352623e12ec4889b0ac4f5d2] in my clean local repository (without any local changes), the commented-out assertion below would be hit, which indicates a bug in BaseResultIterators.getParallelScans() in the current code base.
// Query with multiple scan ranges, where each range's start key and end key
// both fall between data
sql = "SELECT a FROM " + tableName + " WHERE K <= 103 AND K >= 101 OR K <= 108 AND K >= 106";
rs = conn.createStatement().executeQuery(sql);
i = 0;
numRows = 6;
int[] result = new int[] { 101, 102, 103, 106, 107, 108 };
while (rs.next()) {
    assertEquals(result[i++], rs.getInt(1));
}
assertEquals(numRows, i);
info = getByteRowEstimates(conn, sql, binds);
// TODO: the original code before this change will hit the following assertion.
// Need to investigate it.
// assertTrue(info.getEstimateInfoTs() > 0);
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-4913) UPDATE STATISTICS should run raw scan to collect the deleted rows
[ https://issues.apache.org/jira/browse/PHOENIX-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-4913: - Description: In order to truly measure the size of data when calculating guide posts, UPDATE STATISTICS should run a raw scan to take the deleted rows into account. Deleted rows will contribute to the estimated size of a guide post but not to its count of rows. was: In order to truly measure the size of data when calculating guide posts, UPDATE STATISTICS should run a raw scan to take all versions of cells into account. Deleted rows will contribute to the estimated size of a guide post but not to its count of rows. > UPDATE STATISTICS should run raw scan to collect the deleted rows > - > > Key: PHOENIX-4913 > URL: https://issues.apache.org/jira/browse/PHOENIX-4913 > Project: Phoenix > Issue Type: Bug > Affects Versions: 5.0.0 > Reporter: Bin Shi > Assignee: Bin Shi > Priority: Major > > In order to truly measure the size of data when calculating guide posts, > UPDATE STATISTICS should run a raw scan to take the deleted rows into account. > Deleted rows will contribute to the estimated size of a guide post > but not to its count of rows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
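As a concrete illustration of the proposal, a raw scan can be configured through the standard HBase client API roughly as below. This is a minimal sketch assuming the HBase 2.x Scan API; buildStatsScan is a hypothetical helper, and the actual StatisticsCollector wiring is omitted.
{code:java}
import org.apache.hadoop.hbase.client.Scan;

public class RawStatsScanSketch {
    // Configure a scan so that delete markers and deleted cells are returned,
    // letting them count toward guide post size estimates.
    static Scan buildStatsScan(byte[] startRow, byte[] stopRow) {
        Scan scan = new Scan().withStartRow(startRow).withStopRow(stopRow);
        scan.setRaw(true);       // include delete markers and deleted cells
        scan.readAllVersions();  // HBase 2.x; scan.setMaxVersions() on 1.x
        return scan;
    }
}
{code}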
[jira] [Created] (PHOENIX-4913) UPDATE STATISTICS should run raw scan to collect the deleted rows
Bin Shi created PHOENIX-4913: Summary: UPDATE STATISTICS should run raw scan to collect the deleted rows Key: PHOENIX-4913 URL: https://issues.apache.org/jira/browse/PHOENIX-4913 Project: Phoenix Issue Type: Bug Affects Versions: 5.0.0 Reporter: Bin Shi Assignee: Bin Shi In order to truly measure the size of data when calculating guide posts, UPDATE STATISTICS should run a raw scan to take all versions of cells into account. Deleted rows will contribute to the estimated size of a guide post but not to its count of rows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-4912) Make the table sampling algorithm accommodate imbalanced row distribution across guide posts
[ https://issues.apache.org/jira/browse/PHOENIX-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-4912: - Description: The current implementation of table sampling is based on the assumption "every two consecutive guide posts contain an equal number of rows", which isn't accurate in practice, and once we collect multiple versions of cells and deleted rows, things will get worse. In detail, the current implementation of table sampling (see BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end of the function) works as described below: # Iterate over all parallel scans generated; # For each scan, if getHashCode(start row key of the scan) MOD 100 < tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; otherwise discard it. The problem can be formalized as: we have a group of scans, and each scan is defined as <Ki, Ci>, where Ki is the scan's start row key and Ci is its count of rows. Now we want to randomly pick X scans so that the sum of the counts of rows of the selected scans is close to Y = T * R, where T is the total count of rows of all scans and R is the table sampling rate. To resolve the above problem, one algorithm we can consider is described below:
ArrayList TableSampling(ArrayList scans, T, R) {
    ArrayList pickedScans = new ArrayList();
    Y = T * R;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // then pick this scan, and adjust T, R, Y accordingly
            pickedScans.Add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) {
                R = Y / T;
            }
        }
    }
    return pickedScans;
}
was: The current implementation of table sampling is based on the assumption "every two consecutive guide posts contain an equal number of rows", which isn't accurate in practice, and once we collect multiple versions of cells and deleted rows, things will get worse. In detail, the current implementation of table sampling (see BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end of the function) works as described below: # Iterate over all parallel scans generated; # For each scan, if getHashCode(start row key of the scan) MOD 100 < tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; otherwise discard it. The problem can be formalized as: we have a group of scans, and each scan is defined as <Ki, Ci>, where Ki is the scan's start row key and Ci is its count of rows. Now we want to randomly pick X scans so that the sum of the counts of rows of the selected scans is close to Y = T * R, where T is the total count of rows of all scans and R is the table sampling rate. To resolve the above problem, one algorithm we can consider is described below:
ArrayList TableSampling(ArrayList scans, T, R) {
    ArrayList pickedScans = new ArrayList();
    Y = T * R;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // then pick this scan, and adjust T, R, Y accordingly
            pickedScans.Add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) R = Y / T;
        }
    }
    return pickedScans;
}
> Make the table sampling algorithm accommodate imbalanced row > distribution across guide posts > - > > Key: PHOENIX-4912 > URL: https://issues.apache.org/jira/browse/PHOENIX-4912 > Project: Phoenix > Issue Type: Improvement > Affects Versions: 5.0.0 > Reporter: Bin Shi > Assignee: Bin Shi > Priority: Major > > The current implementation of table sampling is based on the assumption > "every two consecutive guide posts contain an equal number of rows", which > isn't accurate in practice, and once we collect multiple versions of cells > and deleted rows, things will get worse.
> In detail, the current implementation of table sampling (see > BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end > of the function) works as described below: > # Iterate over all parallel scans generated; > # For each scan, if getHashCode(start row key of the scan) MOD 100 < > tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; > otherwise discard it. > The problem can be formalized as: we have a group of scans, and each scan is > defined as <Ki, Ci>, where Ki is the scan's start row key and Ci is its count of rows. > Now we want to randomly pick X scans so that the sum of the counts of rows > of the selected scans is close to Y = T * R, where T is the total count of rows of > all scans and R is the table sampling rate. > To resolve the above problem, one algorithm we can consider is > described below: > ArrayList TableSampling(ArrayList scans, T, R) > { > ArrayList pickedScans = new ArrayList(); > Y = T * R; > for (scan in scans) { > if (Y <= 0) break;
[jira] [Updated] (PHOENIX-4912) Make the table sampling algorithm accommodate imbalanced row distribution across guide posts
[ https://issues.apache.org/jira/browse/PHOENIX-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-4912: - Description: The current implementation of table sampling is based on the assumption "every two consecutive guide posts contain an equal number of rows", which isn't accurate in practice, and once we collect multiple versions of cells and deleted rows, things will get worse. In detail, the current implementation of table sampling (see BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end of the function) works as described below: # Iterate over all parallel scans generated; # For each scan, if getHashCode(start row key of the scan) MOD 100 < tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; otherwise discard it. The problem can be formalized as: we have a group of scans, and each scan is defined as <Ki, Ci>, where Ki is the scan's start row key and Ci is its count of rows. Now we want to randomly pick X scans so that the sum of the counts of rows of the selected scans is close to Y = T * R, where T is the total count of rows of all scans and R is the table sampling rate. To resolve the above problem, one algorithm we can consider is described below:
ArrayList TableSampling(ArrayList scans, T, R) {
    ArrayList pickedScans = new ArrayList();
    Y = T * R;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // then pick this scan, and adjust T, R, Y accordingly
            pickedScans.Add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) R = Y / T;
        }
    }
    return pickedScans;
}
was: The current implementation of table sampling is based on the assumption "every two consecutive guide posts contain an equal number of rows", which isn't accurate in practice, and once we collect multiple versions of cells and deleted rows, things will get worse. In detail, the current implementation of table sampling (see BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end of the function) works as described below: # Iterate over all parallel scans generated; # For each scan, if getHashCode(start row key of the scan) MOD 100 < tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; otherwise discard it. The problem can be formalized as: we have a group of scans, and each scan is defined as <Ki, Ci>, where Ki is the scan's start row key and Ci is its count of rows. Now we want to randomly pick X scans so that the sum of the counts of rows of the selected scans is close to Y = T * R, where T is the total count of rows of all scans and R is the table sampling rate. To resolve the above problem, one algorithm we can consider is described below:
ArrayList TableSampling(ArrayList scans, T, R) {
    ArrayList pickedScans = new ArrayList();
    Y = T * R;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // then pick this scan, and adjust T, R, Y accordingly
            pickedScans.Add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) {
                R = Y / T;
            }
        }
    }
    return pickedScans;
}
> Make the table sampling algorithm accommodate imbalanced row > distribution across guide posts > - > > Key: PHOENIX-4912 > URL: https://issues.apache.org/jira/browse/PHOENIX-4912 > Project: Phoenix > Issue Type: Improvement > Affects Versions: 5.0.0 > Reporter: Bin Shi > Assignee: Bin Shi > Priority: Major > > The current implementation of table sampling is based on the assumption > "every two consecutive guide posts contain an equal number of rows", which > isn't accurate in practice, and once we collect multiple versions of cells > and deleted rows, things will get worse.
> In detail, the current implementation of table sampling (see > BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end > of the function) works as described below: > # Iterate over all parallel scans generated; > # For each scan, if getHashCode(start row key of the scan) MOD 100 < > tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; > otherwise discard it. > The problem can be formalized as: we have a group of scans, and each scan is > defined as <Ki, Ci>, where Ki is the scan's start row key and Ci is its count of rows. > Now we want to randomly pick X scans so that the sum of the counts of rows > of the selected scans is close to Y = T * R, where T is the total count of rows of > all scans and R is the table sampling rate. > To resolve the above problem, one algorithm we can consider is > described below: > ArrayList TableSampling(ArrayList scans, T, R) > { > ArrayList pickedScans = new ArrayList(); > Y = T * R; > for (scan in scans) { > if (Y <= 0) break;
[jira] [Updated] (PHOENIX-4912) Make the table sampling algorithm accommodate imbalanced row distribution across guide posts
[ https://issues.apache.org/jira/browse/PHOENIX-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-4912: - Description: The current implementation of table sampling is based on the assumption "every two consecutive guide posts contain an equal number of rows", which isn't accurate in practice, and once we collect multiple versions of cells and deleted rows, things will get worse. In detail, the current implementation of table sampling (see BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end of the function) works as described below: # Iterate over all parallel scans generated; # For each scan, if getHashCode(start row key of the scan) MOD 100 < tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; otherwise discard it. The problem can be formalized as: we have a group of scans, and each scan is defined as <Ki, Ci>, where Ki is the scan's start row key and Ci is its count of rows. Now we want to randomly pick X scans so that the sum of the counts of rows of the selected scans is close to Y = T * R, where T is the total count of rows of all scans and R is the table sampling rate. To resolve the above problem, one algorithm we can consider is described below:
ArrayList TableSampling(ArrayList scans, T, R) {
    ArrayList pickedScans = new ArrayList();
    Y = T * R;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // then pick this scan, and adjust T, R, Y accordingly
            pickedScans.Add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) {
                R = Y / T;
            }
        }
    }
    return pickedScans;
}
was: The current implementation of table sampling is based on the assumption "every two consecutive guide posts contain an equal number of rows", which isn't accurate in practice, and once we collect multiple versions of cells and deleted rows, things will get worse. In detail, the current implementation of table sampling (see BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end of the function) works as described below: # Iterate over all parallel scans generated; # For each scan, if getHashCode(start row key of the scan) MOD 100 < tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; otherwise discard it. The problem can be formalized as: we have a group of scans, and each scan is defined as <Ki, Ci>, where Ki is the scan's start row key and Ci is its count of rows. Now we want to randomly pick X scans so that the sum of the counts of rows of the selected scans is close to Y = T * R, where T is the total count of rows of all scans and R is the table sampling rate. To resolve the above problem, one algorithm we can consider is described below:
ArrayList TableSampling(ArrayList scans, T, R) {
    ArrayList pickedScans = new ArrayList();
    Y = T * R;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // then pick this scan, and adjust T, R, Y accordingly
            pickedScans.Add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) {
                R = Y / T;
            }
        }
    }
    return pickedScans;
}
> Make the table sampling algorithm accommodate imbalanced row > distribution across guide posts > - > > Key: PHOENIX-4912 > URL: https://issues.apache.org/jira/browse/PHOENIX-4912 > Project: Phoenix > Issue Type: Improvement > Affects Versions: 5.0.0 > Reporter: Bin Shi > Assignee: Bin Shi > Priority: Major > > The current implementation of table sampling is based on the assumption > "every two consecutive guide posts contain an equal number of rows", which > isn't accurate in practice, and once we collect multiple versions of cells > and deleted rows, things will get worse.
> In detail, the current implementation of table sampling (see > BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end > of the function) works as described below: > # Iterate over all parallel scans generated; > # For each scan, if getHashCode(start row key of the scan) MOD 100 < > tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; > otherwise discard it. > The problem can be formalized as: we have a group of scans, and each scan is > defined as <Ki, Ci>, where Ki is the scan's start row key and Ci is its count of rows. > Now we want to randomly pick X scans so that the sum of the counts of rows > of the selected scans is close to Y = T * R, where T is the total count of rows of > all scans and R is the table sampling rate. > To resolve the above problem, one algorithm we can consider is > described below: > ArrayList TableSampling(ArrayList scans, T, R) > { > ArrayList pickedScans = new ArrayList(); > Y = T * R; > for (scan in scans) {
[jira] [Created] (PHOENIX-4912) Make the table sampling algorithm accommodate imbalanced row distribution across guide posts
Bin Shi created PHOENIX-4912: Summary: Make the table sampling algorithm accommodate imbalanced row distribution across guide posts Key: PHOENIX-4912 URL: https://issues.apache.org/jira/browse/PHOENIX-4912 Project: Phoenix Issue Type: Improvement Affects Versions: 5.0.0 Reporter: Bin Shi Assignee: Bin Shi The current implementation of table sampling is based on the assumption "every two consecutive guide posts contain an equal number of rows", which isn't accurate in practice, and once we collect multiple versions of cells and deleted rows, things will get worse. In detail, the current implementation of table sampling (see BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end of the function) works as described below: # Iterate over all parallel scans generated; # For each scan, if getHashCode(start row key of the scan) MOD 100 < tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; otherwise discard it. The problem can be formalized as: we have a group of scans, and each scan is defined as <Ki, Ci>, where Ki is the scan's start row key and Ci is its count of rows. Now we want to randomly pick X scans so that the sum of the counts of rows of the selected scans is close to Y = T * R, where T is the total count of rows of all scans and R is the table sampling rate. To resolve the above problem, one algorithm we can consider is described below:
ArrayList TableSampling(ArrayList scans, T, R) {
    ArrayList pickedScans = new ArrayList();
    Y = T * R;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // then pick this scan, and adjust T, R, Y accordingly
            pickedScans.Add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) {
                R = Y / T;
            }
        }
    }
    return pickedScans;
}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-4008) UPDATE STATISTICS should collect all versions of cells
[ https://issues.apache.org/jira/browse/PHOENIX-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-4008: - Attachment: PHOENIX-4008_0920.patch > UPDATE STATISTICS should collect all versions of cells > - > > Key: PHOENIX-4008 > URL: https://issues.apache.org/jira/browse/PHOENIX-4008 > Project: Phoenix > Issue Type: Bug > Reporter: Samarth Jain > Assignee: Bin Shi > Priority: Major > Attachments: PHOENIX-4008_0918.patch, PHOENIX-4008_0920.patch > > > In order to truly measure the size of data when calculating guide posts, > UPDATE STATISTICS should take into account all versions of cells. We should > also be setting the max versions on the scan. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-4008) UPDATE STATISTICS should collect all versions of cells
[ https://issues.apache.org/jira/browse/PHOENIX-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-4008: - Description: In order to truly measure the size of data when calculating guide posts, UPDATE STATISTICS should take into account all versions of cells. We should also be setting the max versions on the scan. (was: In order to truly measure the size of data when calculating guide posts, UPDATE STATISTICS should run a raw scan to take into account all versions of cells. We should also be setting the max versions on the scan.) Summary: UPDATE STATISTICS should collect all versions of cells (was: UPDATE STATISTICS should run raw scan with all versions of cells) > UPDATE STATISTICS should collect all versions of cells > - > > Key: PHOENIX-4008 > URL: https://issues.apache.org/jira/browse/PHOENIX-4008 > Project: Phoenix > Issue Type: Bug > Reporter: Samarth Jain > Assignee: Bin Shi > Priority: Major > > In order to truly measure the size of data when calculating guide posts, > UPDATE STATISTICS should take into account all versions of cells. We should > also be setting the max versions on the scan. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
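For illustration, once the scan returns every version, each row's contribution to the guide post byte size could be accumulated roughly as below. This is a hedged sketch, not the actual StatisticsCollector logic: sizeOfAllVersions is a hypothetical helper, and the fixed 8-byte allowance for the timestamp is only a rough approximation of the per-cell overhead.
{code:java}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.client.Result;

public class CellSizeSketch {
    // Sum the byte size of every cell version in a row, so that size
    // estimates reflect all versions rather than only the latest one.
    static long sizeOfAllVersions(Result row) {
        long size = 0;
        for (Cell cell : row.rawCells()) {
            size += cell.getRowLength() + cell.getFamilyLength()
                  + cell.getQualifierLength() + cell.getValueLength()
                  + 8; // rough allowance for the 8-byte timestamp
        }
        return size;
    }
}
{code}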
[jira] [Updated] (PHOENIX-4594) Perform binary search on guideposts during query compilation
[ https://issues.apache.org/jira/browse/PHOENIX-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi updated PHOENIX-4594: - Attachment: PHOENIX-4594_0917.patch > Perform binary search on guideposts during query compilation > > > Key: PHOENIX-4594 > URL: https://issues.apache.org/jira/browse/PHOENIX-4594 > Project: Phoenix > Issue Type: Improvement >Reporter: James Taylor >Assignee: Bin Shi >Priority: Major > Attachments: PHOENIX-4594-0913.patch, PHOENIX-4594_0917.patch > > > If there are many guideposts, performance will suffer during query > compilation because we do a linear search of the guideposts to find the > intersection with the scan ranges. Instead, in > BaseResultIterators.getParallelScans() we should populate an array of > guideposts and perform a binary search. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
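A minimal sketch of the idea (not the attached patch): once the guide posts have been decoded into a sorted array of row keys, the first guide post at or after a scan range's start key can be located with a binary search instead of a linear walk. firstGuidePostAtOrAfter is a hypothetical helper name.
{code:java}
import java.util.Arrays;
import org.apache.hadoop.hbase.util.Bytes;

public class GuidePostSearchSketch {
    // Index of the first guide post whose key is >= startKey.
    static int firstGuidePostAtOrAfter(byte[][] sortedGuidePosts, byte[] startKey) {
        int idx = Arrays.binarySearch(sortedGuidePosts, startKey, Bytes.BYTES_COMPARATOR);
        // When the key is absent, binarySearch returns (-(insertionPoint) - 1).
        return idx >= 0 ? idx : -(idx + 1);
    }
}
{code}
This turns the per-range lookup from O(n) in the number of guide posts into O(log n); emitting the parallel scans that fall inside the range remains linear in the number of guide posts the range actually intersects.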
[jira] [Assigned] (PHOENIX-4594) Perform binary search on guideposts during query compilation
[ https://issues.apache.org/jira/browse/PHOENIX-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bin Shi reassigned PHOENIX-4594: Assignee: Bin Shi (was: Abhishek Singh Chouhan) Attachment: PHOENIX-4594-0913.patch Please review. Thanks! > Perform binary search on guideposts during query compilation > > > Key: PHOENIX-4594 > URL: https://issues.apache.org/jira/browse/PHOENIX-4594 > Project: Phoenix > Issue Type: Improvement >Reporter: James Taylor >Assignee: Bin Shi >Priority: Major > Attachments: PHOENIX-4594-0913.patch > > > If there are many guideposts, performance will suffer during query > compilation because we do a linear search of the guideposts to find the > intersection with the scan ranges. Instead, in > BaseResultIterators.getParallelScans() we should populate an array of > guideposts and perform a binary search. -- This message was sent by Atlassian JIRA (v7.6.3#76005)