[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108408#comment-16108408 ] Ethan Wang commented on PHOENIX-153: Makes sense. Thanks [~jamestaylor] > Implement TABLESAMPLE clause > > > Key: PHOENIX-153 > URL: https://issues.apache.org/jira/browse/PHOENIX-153 > Project: Phoenix > Issue Type: Task > Reporter: James Taylor > Assignee: Ethan Wang > Labels: enhancement > Attachments: Sampling_Accuracy_Performance.jpg > > > Support the standard SQL TABLESAMPLE clause by implementing a filter that > uses a skip-next hint based on the region boundaries of the table to only > return n rows per region. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
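The per-row accept decision behind table sampling can be sketched as a deterministic check on a hash of the row key, so every client sees the same sample for the same data. This is a minimal illustration only; the class and hashing scheme below are hypothetical, not Phoenix's actual filter implementation:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of frequency-based row sampling: accept a row iff a
// hash of its key falls below the requested sampling frequency. Deterministic,
// so repeated scans of the same table return the same sample.
public class SampleFilter {
    private final double frequency; // e.g. 0.48 for sampling 48 percent of rows

    public SampleFilter(double frequency) {
        this.frequency = frequency;
    }

    public boolean accept(byte[] rowKey) {
        int h = 0;
        for (byte b : rowKey) {
            h = 31 * h + (b & 0xff); // simple polynomial hash of the key bytes
        }
        // Map the hash to [0, 1) and compare against the sampling frequency.
        double bucket = (h & 0x7fffffff) / (double) Integer.MAX_VALUE;
        return bucket < frequency;
    }

    public static void main(String[] args) {
        SampleFilter f = new SampleFilter(0.5);
        int kept = 0;
        for (int i = 0; i < 10000; i++) {
            if (f.accept(("row" + i).getBytes(StandardCharsets.UTF_8))) {
                kept++;
            }
        }
        System.out.println("kept " + kept + " of 10000 rows");
    }
}
```

A real implementation would additionally use the region boundaries with a skip-next hint, as the issue description says, to avoid scanning rows it will discard.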
[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause
[ https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108390#comment-16108390 ] James Taylor commented on PHOENIX-153: -- Seems like review comments aren't appearing here in JIRA (maybe because your commit message doesn't include the JIRA number in the expected format), so I'll repeat it here: Let's move the explain for the sampling into the first line, before we recurse down for the other steps. You can put it on the same line, after the "-WAY " like this: CLIENT PARALLEL 1-WAY 0.48-SAMPLED ... Otherwise, users will interpret the sampling as happening after the scan/filtering, which isn't the case.
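The requested formatting, with the sampling rate on the first line right after "-WAY", amounts to simple string assembly. A sketch with hypothetical names (this is not Phoenix's actual ExplainPlan code):

```java
// Illustrative only: build the first explain-plan line so the sampling rate
// reads as part of the scan itself rather than a later step.
public class ExplainLine {
    static String firstLine(int ways, Double samplingRate) {
        StringBuilder sb = new StringBuilder("CLIENT PARALLEL ").append(ways).append("-WAY ");
        if (samplingRate != null) {
            // e.g. "0.48-SAMPLED" for a 48 percent sample
            sb.append(samplingRate).append("-SAMPLED ");
        }
        return sb.append("FULL SCAN").toString();
    }

    public static void main(String[] args) {
        System.out.println(firstLine(1, 0.48));
    }
}
```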
[GitHub] phoenix issue #262: PHOENIX 153 implement TABLESAMPLE clause
Github user aertoria commented on the issue: https://github.com/apache/phoenix/pull/262 The last changes (if we are referring to the explain plan change etc.) have already been done and went in with the earlier commits. I've now rebased onto the latest Phoenix master and squashed all commits into one. Thanks! @JamesRTaylor --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] phoenix issue #262: PHOENIX 153 implement TABLESAMPLE clause
Github user JamesRTaylor commented on the issue: https://github.com/apache/phoenix/pull/262 Ping @aertoria - would you have a few spare cycles to make that last change? Also, please squash all commits into one and amend your commit message to be prefixed with PHOENIX-153 (i.e. include the dash). Otherwise, the pull request isn't tied to the JIRA.
[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT
[ https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108353#comment-16108353 ] James Taylor commented on PHOENIX-418: -- Let's just keep it simple and support APPROX_COUNT_DISTINCT. That way we don't need any grammar changes. > Support approximate COUNT DISTINCT > -- > > Key: PHOENIX-418 > URL: https://issues.apache.org/jira/browse/PHOENIX-418 > Project: Phoenix > Issue Type: Task > Reporter: James Taylor > Assignee: Ethan Wang > Labels: gsoc2016 > > Support an "approximation" of count distinct to prevent having to hold on to > all distinct values (since this will not scale well when the number of > distinct values is huge). The Apache Drill folks have had some interesting > discussions on this > [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E). > They recommend using [Welford's > method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm). > I'm open to having a config option that uses exact versus approximate. I > don't have experience implementing an approximate implementation, so I'm not > sure how much state is required to keep on the server and return to the > client (other than realizing it'd be much less than returning all distinct > values and their counts).
[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT
[ https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108332#comment-16108332 ] Ethan Wang commented on PHOENIX-418: +1 on {quote}select count(distinct name) from person APPROXIMATE () select count(distinct name) from person APPROXIMATE (ALGORITHM 'hll') select count(distinct name) from person APPROXIMATE (ALGORITHM 'ABC' WITHIN 10 PERCENT) select count(distinct name) APPROXIMATE () from person select count(distinct name) APPROXIMATE (ALGORITHM 'hll') from person select count(distinct name) APPROXIMATE (ALGORITHM 'ABC' WITHIN 10 PERCENT) from person {quote} The current patch in progress basically aligns with this suggestion, supporting both APPROX_COUNT_DISTINCT(xxx) and select count(name) from person APPROXIMATE ('hll'). Also, for the HyperLogLog algorithm, I suggest Phoenix be similar to Druid (CALCITE-1588), in that: 1. no accuracy option is exposed for the user to specify; 2. the two parameters used during HLL initialization are hard-coded (i.e., the precision value for the normal set and the precision value for the sparse set).
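To make the "hard-coded precision" point concrete, here is a minimal HyperLogLog sketch. It is purely illustrative: the real patch reportedly builds on an existing HLL implementation, and all names below (including the splitmix64-based hash) are stand-ins, not Phoenix code.

```java
// Minimal HyperLogLog: the precision p (number of register-index bits) is the
// parameter the comment proposes hard-coding; accuracy is ~1.04/sqrt(2^p).
public class TinyHll {
    private final int p;
    private final byte[] registers;

    public TinyHll(int p) {          // p would be the hard-coded parameter
        this.p = p;
        this.registers = new byte[1 << p];
    }

    // splitmix64 finalizer: a cheap stand-in for a real 64-bit hash function
    static long mix(long z) {
        z += 0x9e3779b97f4a7c15L;
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    public void add(long value) {
        long hash = mix(value);
        int idx = (int) (hash >>> (64 - p));                 // top p bits pick a register
        int rank = Long.numberOfLeadingZeros(hash << p) + 1; // leading zeros of the rest
        if (rank > 64 - p) {
            rank = 64 - p;                                   // cap when the rest is all zeros
        }
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    public double estimate() {
        int m = registers.length;
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1 + 1.079 / m);             // bias-correction constant
        double e = alpha * m * m / sum;
        if (e <= 2.5 * m && zeros > 0) {
            e = m * Math.log((double) m / zeros);            // small-range correction
        }
        return e;
    }
}
```

The appeal for APPROX_COUNT_DISTINCT is that each region server only returns 2^p bytes of register state, which the client merges, instead of streaming back every distinct value.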
[jira] [Comment Edited] (PHOENIX-4052) Create the correct index row mutations for out-of-order data mutations
[ https://issues.apache.org/jira/browse/PHOENIX-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107973#comment-16107973 ] James Taylor edited comment on PHOENIX-4052 at 7/31/17 9:25 PM: The following test, slightly different from the description above, reproduces an issue where an extra row with an invalid row key ends up in the index table:
{code}
@Test
public void testOutOfOrderDelete() throws Exception {
    String tableName = generateUniqueName();
    String indexName = generateUniqueName();
    Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);

    long ts = 1000;
    props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
    Connection conn = DriverManager.getConnection(getUrl(), props);
    conn.createStatement().execute("CREATE TABLE " + tableName
        + "(k CHAR(1) PRIMARY KEY, v VARCHAR) COLUMN_ENCODED_BYTES = 0");
    conn.close();

    ts = 1010;
    props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
    conn = DriverManager.getConnection(getUrl(), props);
    conn.createStatement().execute("CREATE INDEX " + indexName + " ON " + tableName + "(v)");
    conn.close();

    ts = 1020;
    props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
    conn = DriverManager.getConnection(getUrl(), props);
    conn.createStatement().execute("UPSERT INTO " + tableName + " VALUES('a','a')");
    conn.commit();
    conn.close();

    ts = 1040;
    props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
    conn = DriverManager.getConnection(getUrl(), props);
    conn.createStatement().execute("DELETE FROM " + tableName + " WHERE k='a'");
    conn.commit();
    conn.close();

    ts = 1030;
    props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
    conn = DriverManager.getConnection(getUrl(), props);
    conn.createStatement().execute("UPSERT INTO " + tableName + " VALUES('a','bbb')");
    conn.commit();
    conn.close();

    TestUtil.dumpTable(conn.unwrap(PhoenixConnection.class).getQueryServices().getTable(Bytes.toBytes(tableName)));
    TestUtil.dumpTable(conn.unwrap(PhoenixConnection.class).getQueryServices().getTable(Bytes.toBytes(indexName)));

    ts = 1050;
    props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
    conn = DriverManager.getConnection(getUrl(), props);
    long count1 = getRowCount(conn, tableName);
    long count2 = getRowCount(conn, indexName);
    assertTrue("Expected table row count ( " + count1 + ") to match index row count (" + count2 + ")",
        count1 == count2);
    conn.close();

    /**
     * dumping T01;hconnection-0x8f57e4c **
     * a/0:/1040/DeleteFamily/vlen=0/seqid=0
     * a/0:V/1030/Put/vlen=3/seqid=0
     * a/0:V/1020/Put/vlen=1/seqid=0
     * a/0:_0/1030/Put/vlen=1/seqid=0
     * a/0:_0/1020/Put/vlen=1/seqid=0
     * ---
     * dumping T02;hconnection-0x8f57e4c **
     * \x00a/0:_0/1040/Put/vlen=2/seqid=0
     * a\x00a/0:/1040/DeleteFamily/vlen=0/seqid=0
     * a\x00a/0:/1030/DeleteFamily/vlen=0/seqid=0
     * a\x00a/0:_0/1020/Put/vlen=2/seqid=0
     * ---
     */
}

private static long getRowCount(Connection conn, String tableName) throws SQLException {
    ResultSet rs = conn.createStatement().executeQuery("SELECT /*+ NO_INDEX */ count(*) FROM " + tableName);
    assertTrue(rs.next());
    return rs.getLong(1);
}
{code}
The assertion that occurs is:
{code}
java.lang.AssertionError: Expected table row count ( 0) to match index row count (1)
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.apache.phoenix.end2end.index.OutOfOrderMutationsIT.testOutOfOrderDelete(OutOfOrderMutationsIT.java:135)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at
{code}
[jira] [Resolved] (PHOENIX-4022) Add PhoenixMetricsLog interface that can be used to log metrics for queries and mutations.
[ https://issues.apache.org/jira/browse/PHOENIX-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas D'Silva resolved PHOENIX-4022. - Resolution: Fixed Committed v3 patch. > Add PhoenixMetricsLog interface that can be used to log metrics for queries > and mutations. > --- > > Key: PHOENIX-4022 > URL: https://issues.apache.org/jira/browse/PHOENIX-4022 > Project: Phoenix > Issue Type: New Feature > Reporter: Thomas D'Silva > Assignee: Thomas D'Silva > Fix For: 4.12.0 > > Attachments: PHOENIX-4022.patch, PHOENIX-4022-v2.patch, > PHOENIX-4022-v3.patch, PHOENIX-4022-v4.patch > > > Create a wrapper for PhoenixConnection, PhoenixStatement, > PhoenixPreparedStatement and PhoenixResultSet that automatically calls the > PhoenixMetricsLog logging methods so users don't have to instrument this > themselves.
[jira] [Commented] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing
[ https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107848#comment-16107848 ] Hudson commented on PHOENIX-4053: - FAILURE: Integrated in Jenkins build Phoenix-master #1723 (See [https://builds.apache.org/job/Phoenix-master/1723/]) PHOENIX-4053 Lock row exclusively when necessary for mutable secondary (jamestaylor: rev 54d9e1c36c46e7c50c29def08cf866599c7a4e45) * (add) phoenix-core/src/it/java/org/apache/phoenix/end2end/ConcurrentMutationsIT.java * (edit) phoenix-core/src/test/java/org/apache/phoenix/util/TestUtil.java * (add) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/LockManager.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/Indexer.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/builder/IndexBuildManager.java > Lock row exclusively when necessary for mutable secondary indexing > -- > > Key: PHOENIX-4053 > URL: https://issues.apache.org/jira/browse/PHOENIX-4053 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor > Fix For: 4.12.0, 4.11.1 > > Attachments: PHOENIX-4053_4.x-HBase-0.98_v2.patch, > PHOENIX-4053_4.x-HBase-0.98_v3.patch, PHOENIX-4053-4.x-HBase-0.98_v4.patch, > PHOENIX-4053-4.x-HBase-0.98_v6.patch, PHOENIX-4053_v2.patch, > PHOENIX-4053_v3.patch, PHOENIX-4053_v4.patch, PHOENIX-4053_v5.patch, > PHOENIX-4053_v6.patch, PHOENIX-4053_v7.patch, PHOENIX-4053_wip.patch > > > From HBase 1.2 on, rows are not exclusively locked when the preBatchMutate > call is made (see HBASE-18474). The mutable secondary index (global and > local) depend on this to get a consistent snapshot of a row between the point > when the current row value is looked up, and when the new row is written, > until the mvcc is advanced. Otherwise, a subsequent update to a row may not > see the current row state. Even with pre HBase 1.2 releases, the lock isn't > held long enough for us. 
We need to hold the locks from the start of the > preBatchMutate (when we read the data table to get the prior row values) > until the mvcc is advanced (beginning of postBatchMutateIndispensably). > Given the above, it's best if Phoenix manages the row locking itself > (mimicking the current HBase mechanism).
[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT
[ https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107832#comment-16107832 ] Julian Hyde commented on PHOENIX-418: - In CALCITE-1588 [~gian] points out that Oracle, BigQuery and MemSQL support {{APPROX_COUNT_DISTINCT}}. I also see it in VoltDB. A quick survey of other databases: * Vertica has {{APPROXIMATE_COUNT_DISTINCT}} * Redshift has {{[ APPROXIMATE ] COUNT ( [ DISTINCT | ALL ] * | expression )}} * In PostgreSQL you can bolt on your own HyperLogLog function, but there doesn't seem to be a unified approach * I don't see anything in DB2 or MySQL. I think that is a sufficient de facto standard to support {{APPROX_COUNT_DISTINCT}} in Calcite. I also think Calcite should support an APPROXIMATE clause, allowed both as a clause in the SELECT statement and after the aggregate function (but before the OVER clause, if present). The algorithm and any parameters go inside parentheses. Examples: {code} select count(distinct name) from person APPROXIMATE () select count(distinct name) from person APPROXIMATE (ALGORITHM 'hll') select count(distinct name) from person APPROXIMATE (ALGORITHM 'ABC' WITHIN 10 PERCENT) select count(distinct name) APPROXIMATE () from person select count(distinct name) APPROXIMATE (ALGORITHM 'hll') from person select count(distinct name) APPROXIMATE (ALGORITHM 'ABC' WITHIN 10 PERCENT) from person {code} For now, I think Phoenix should support APPROX_COUNT_DISTINCT. We could add support for the APPROXIMATE clause later.
[jira] [Updated] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing
[ https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-4053: -- Attachment: PHOENIX-4053_v7.patch Attaching final patch. Thanks for the reviews, [~apurtell] & [~samarthjain].
[jira] [Created] (PHOENIX-4055) Consider IndexFailurePolicy before throwing when lock cannot be gotten
James Taylor created PHOENIX-4055: - Summary: Consider IndexFailurePolicy before throwing when lock cannot be gotten Key: PHOENIX-4055 URL: https://issues.apache.org/jira/browse/PHOENIX-4055 Project: Phoenix Issue Type: Bug Reporter: James Taylor We shouldn't necessarily be throwing in Indexer.preBatchMutateWithExceptions when we cannot acquire a lock, since we're essentially failing the data write because we can't do the locking necessary for performing consistent index maintenance. We'd ideally want to go through the IndexFailurePolicy to determine whether or not we swallow that exception. We currently cannot ignore this lock failure as we lack the ability to keep state between batch mutation coprocessor calls (HBASE-18127).
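The shape of the change this issue proposes, consulting a pluggable policy before rethrowing, can be sketched as follows. The IndexFailurePolicy interface below is a simplified stand-in for illustration, not Phoenix's actual class:

```java
// Sketch: route a lock-acquisition failure through a failure policy instead
// of failing the data write unconditionally. Names are hypothetical.
public class FailurePolicySketch {
    interface IndexFailurePolicy {
        // true means the index-maintenance failure may be swallowed
        boolean shouldSwallow(Exception cause);
    }

    // Returns true if the failure was swallowed; otherwise rethrows,
    // which corresponds to the current behavior of failing the data write.
    static boolean handleLockFailure(Exception cause, IndexFailurePolicy policy) throws Exception {
        if (policy.shouldSwallow(cause)) {
            return true; // skip index maintenance for this batch, keep the data write
        }
        throw cause;
    }
}
```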
[jira] [Commented] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing
[ https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107676#comment-16107676 ] Samarth Jain commented on PHOENIX-4053: --- Thanks for the explanation, James. +1 to the patch after the comment change. Looks great!
[jira] [Commented] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing
[ https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107656#comment-16107656 ] Andrew Purtell commented on PHOENIX-4053: - Looked at the v6 patch. I like that you're passing 'rowLockWaitDuration' in lock acquisition. That will make updating for HBASE-17210 easy when running on HBase 1.4+. +1
[jira] [Commented] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing
[ https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107614#comment-16107614 ] James Taylor commented on PHOENIX-4053: --- The lockManager.lockRow() call throws an unchecked exception if it times out or is interrupted. We need to "remember" that failure here so we don't attempt to unlock the row in the postBatchMutateIndispensably call (which gets called in a finally block). I'll add that to the comment to make it more clear.
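The "remember the failure" pattern described here boils down to recording a lock only after it is successfully acquired, so the unlock path run from a finally block never releases a lock that was never obtained. A simplified stand-in for Phoenix's LockManager (illustrative names, not its actual code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;

// Sketch: track exactly which row locks were acquired so the unlock path
// (the equivalent of postBatchMutateIndispensably's finally block) cannot
// attempt to release a lock whose acquisition threw.
public class LockBookkeeping {
    private final List<Lock> acquired = new ArrayList<>();

    public void lockRow(Lock rowLock) {
        rowLock.lock();        // the real call can throw on timeout or interrupt
        acquired.add(rowLock); // recorded only after acquisition succeeds
    }

    public void unlockAll() {
        for (Lock l : acquired) {
            l.unlock();        // releases only locks we actually hold
        }
        acquired.clear();
    }
}
```

If lockRow throws partway through a batch, unlockAll still releases exactly the rows locked so far and nothing more.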
[jira] [Commented] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing
[ https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107602#comment-16107602 ] Samarth Jain commented on PHOENIX-4053: --- [~jamestaylor], Regarding your comment here: {code} + if (!success) { + // We're throwing here, so we won't be locking any more rows. By setting the + // status to FAILURE, we prevent the attempt to unlock rows we've never + // locked when postBatchMutateIndispensably is executed. We're very + // limited about the state that can be shared between the batch mutate + // coprocessor calls (see HBASE-18482). + // Note that we shouldn't necessarily be throwing here, since we're + // essentially failing the data write because we can't do the locking + // necessary for performing consistent index maintenance. We'd ideally + // want to go through the index failure policy to determine what action + // to perform. We currently cannot ignore this lock failure + for (int j = i; j < miniBatchOp.size(); j++) { + miniBatchOp.setOperationStatus(j,FAILURE); + } + } {code} I don't see a throw statement here. My guess is there is some code in HBase which is going through the operation status array and taking appropriate action? I think it would make sense for the comment to state that. FWIW, I see callers of region.batchMutate within our Phoenix co-processors (UngroupedAggregateRegionObserver#commitBatch() and UngroupedAggregateRegionObserver#rebuildIndices) that don't seem to be looking at the return status. Maybe they should? 
[jira] [Commented] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing
[ https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107564#comment-16107564 ] Hadoop QA commented on PHOENIX-4053:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12879662/PHOENIX-4053-4.x-HBase-0.98_v6.patch
against 4.x-HBase-0.98 branch at commit 9c458fa3d3ecdeb17de5b717c26cfdea1608c358.
ATTACHMENT ID: 12879662
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests.
{color:red}-1 patch{color}. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/1240//console
This message is automatically generated.
[jira] [Updated] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing
[ https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-4053:
---
Attachment: PHOENIX-4053-4.x-HBase-0.98_v6.patch
Attaching the 0.98 version of the v6 patch.
[jira] [Updated] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing
[ https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-4053:
---
Attachment: PHOENIX-4053_v6.patch
Attaching a slightly tweaked patch that passes in the row lock wait duration so that LockManager can be agnostic about RPC duration (HBASE-17210). Would like to get this committed soon so we can get a perf run on it - can you give this a quick look, [~samarthjain] or [~apurtell]?
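The "pass in the wait duration" tweak amounts to constructor injection: the caller resolves the timeout (e.g. from configuration) and the lock manager itself never consults RPC or config state. A minimal sketch under that assumption (SimpleLockManager and its field names are illustrative, not the patch's actual code):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: the real Phoenix LockManager manages per-row locks.
public class SimpleLockManager {
    // Injected by the caller, so the manager stays agnostic of how long an
    // RPC is allowed to take (the concern raised in HBASE-17210).
    private final long rowLockWaitMs;
    private final ReentrantLock lock = new ReentrantLock();

    public SimpleLockManager(long rowLockWaitMs) {
        this.rowLockWaitMs = rowLockWaitMs;
    }

    // Returns false instead of blocking forever when the row stays locked.
    public boolean lockRow() throws InterruptedException {
        return lock.tryLock(rowLockWaitMs, TimeUnit.MILLISECONDS);
    }

    public void unlockRow() {
        lock.unlock();
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleLockManager mgr = new SimpleLockManager(100);
        boolean locked = mgr.lockRow();
        System.out.println(locked);
        mgr.unlockRow();
    }
}
```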
[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT
[ https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106864#comment-16106864 ] Ethan Wang commented on PHOENIX-418:
Regarding the syntax of approximate distinct count: carrying on from the discussion in PHOENIX-3390, proposing the syntax to be:
Original cardinality count function:
select count(distinct name) from person
With approximation:
select count(distinct name) from person APPROXIMATE
select count(distinct name) from person APPROXIMATE 'hll'
select count(distinct name) from person APPROXIMATE 'algorithm ABC' (WITHIN 10 PERCENT)
> Support approximate COUNT DISTINCT
> ----------------------------------
>
> Key: PHOENIX-418
> URL: https://issues.apache.org/jira/browse/PHOENIX-418
> Project: Phoenix
> Issue Type: Task
> Reporter: James Taylor
> Assignee: Ethan Wang
> Labels: gsoc2016
>
> Support an "approximation" of count distinct to prevent having to hold on to all distinct values (since this will not scale well when the number of distinct values is huge). The Apache Drill folks have had some interesting discussions on this [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E). They recommend using [Welford's method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm). I'm open to having a config option that uses exact versus approximate. I don't have experience implementing an approximate implementation, so I'm not sure how much state is required to keep on the server and return to the client (other than realizing it'd be much less than returning all distinct values and their counts).
[jira] [Commented] (PHOENIX-3390) Custom UDAF for HyperLogLogPlus
[ https://issues.apache.org/jira/browse/PHOENIX-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106844#comment-16106844 ] Ethan Wang commented on PHOENIX-3390:
---
Marking this ticket as a duplicate of PHOENIX-418. For "Support approximate COUNT DISTINCT", let us continue the implementation updates (and discussions) over at PHOENIX-418. The other feature, exposing the raw HyperLogLog hash binary directly to the user, will continue on this ticket.
> Custom UDAF for HyperLogLogPlus
> -------------------------------
>
> Key: PHOENIX-3390
> URL: https://issues.apache.org/jira/browse/PHOENIX-3390
> Project: Phoenix
> Issue Type: New Feature
> Reporter: Swapna Kasula
> Assignee: Ethan Wang
> Priority: Minor
>
> With ref # PHOENIX-2069
> Custom UDAF to aggregate/union HyperLogLogs of a column, returning a HyperLogLog.
> select hllUnion(col1) from table; // returns a HyperLogLog, which is the union of all HyperLogLogs from all rows for column 'col1'
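The union semantics an hllUnion UDAF relies on (two sketches merge into a sketch of the union, which then estimates the union's cardinality) can be illustrated with a toy linear-counting sketch in plain Java. This is only a stand-in for HyperLogLogPlus: the hashing, bucket count, and estimator here are chosen for brevity, not the real implementation.

```java
import java.util.BitSet;

// Toy linear-counting sketch: like HyperLogLog, two sketches OR together
// into a valid sketch of the union of their inputs.
public class ToySketch {
    static final int M = 1 << 14;         // number of buckets
    final BitSet bits = new BitSet(M);

    void offer(String value) {
        // Spread the hash a little before bucketing (illustrative mixing).
        int h = value.hashCode() * 0x9E3779B1;
        bits.set(Math.floorMod(h, M));
    }

    // The aggregation step of the UDAF: merge another row's sketch into ours.
    void union(ToySketch other) {
        bits.or(other.bits);
    }

    // Linear-counting estimate: -m * ln(emptyBuckets / m).
    long cardinality() {
        int empty = M - bits.cardinality();
        return Math.round(-M * Math.log((double) empty / M));
    }

    public static void main(String[] args) {
        ToySketch a = new ToySketch();
        ToySketch b = new ToySketch();
        for (int i = 0; i < 500; i++) a.offer("user-" + i);
        for (int i = 250; i < 750; i++) b.offer("user-" + i);
        a.union(b); // sketch of the union: 750 distinct values, counted once
        long est = a.cardinality();
        // The estimate should land near 750 for this bucket count.
        System.out.println(est > 650 && est < 850);
    }
}
```

The point of the sketch is that the merge is lossless with respect to the union, so overlapping rows (250-499 above, offered to both sketches) are not double counted, which is exactly what distinguishes an hllUnion aggregate from summing per-row counts.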