[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause

2017-07-31 Thread Ethan Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108408#comment-16108408
 ] 

Ethan Wang commented on PHOENIX-153:


Makes sense. Thanks, [~jamestaylor]

> Implement TABLESAMPLE clause
> 
>
> Key: PHOENIX-153
> URL: https://issues.apache.org/jira/browse/PHOENIX-153
> Project: Phoenix
>  Issue Type: Task
>Reporter: James Taylor
>Assignee: Ethan Wang
>  Labels: enhancement
> Attachments: Sampling_Accuracy_Performance.jpg
>
>
> Support the standard SQL TABLESAMPLE clause by implementing a filter that 
> uses a skip next hint based on the region boundaries of the table to only 
> return n rows per region.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause

2017-07-31 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108390#comment-16108390
 ] 

James Taylor commented on PHOENIX-153:
--

Seems like review comments aren't appearing here in JIRA (maybe because your 
commit message doesn't include the JIRA number in the expected format), so I'll 
repeat it here:

Let's move the explain output for the sampling into the first line, before we recurse 
down for the other steps. You can put it on the same line, after the "-WAY ", 
like this:

CLIENT PARALLEL 1-WAY 0.48-SAMPLED ...

Otherwise, users will interpret the sampling as happening after the 
scan/filtering, which isn't the case.
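
To make the intent concrete, here is a minimal JDBC sketch of inspecting the plan; the 
table name, sample rate, TABLESAMPLE spelling, and plan wording below are illustrative 
assumptions, not the committed syntax or output:
{code}
// Hypothetical sketch only: run EXPLAIN on a sampled query and look at the first plan row.
// The TABLESAMPLE spelling, table name, and plan text are illustrative assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SampledExplainCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "EXPLAIN SELECT * FROM PERSON TABLESAMPLE(0.48)")) {
            if (rs.next()) {
                // With the sampling rate up front, the first line would read something like:
                //   CLIENT PARALLEL 1-WAY 0.48-SAMPLED FULL SCAN OVER PERSON
                System.out.println(rs.getString(1));
            }
        }
    }
}
{code}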

> Implement TABLESAMPLE clause
> 
>
> Key: PHOENIX-153
> URL: https://issues.apache.org/jira/browse/PHOENIX-153
> Project: Phoenix
>  Issue Type: Task
>Reporter: James Taylor
>Assignee: Ethan Wang
>  Labels: enhancement
> Attachments: Sampling_Accuracy_Performance.jpg
>
>
> Support the standard SQL TABLESAMPLE clause by implementing a filter that 
> uses a skip next hint based on the region boundaries of the table to only 
> return n rows per region.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] phoenix issue #262: PHOENIX 153 implement TABLESAMPLE clause

2017-07-31 Thread aertoria
Github user aertoria commented on the issue:

https://github.com/apache/phoenix/pull/262
  
The last changes (if we are referring to the explain plan change, etc.) have 
already been made and went in with the last commits.

I'm now rebasing onto the latest Phoenix master and squashing all commits into 
one.

Thanks! @JamesRTaylor 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] phoenix issue #262: PHOENIX 153 implement TABLESAMPLE clause

2017-07-31 Thread JamesRTaylor
Github user JamesRTaylor commented on the issue:

https://github.com/apache/phoenix/pull/262
  
Ping @aertoria - would you have a few spare cycles to make that last 
change? Also, please squash all commits into one and amend your commit message 
to be prefixed with PHOENIX-153 (i.e. include the dash). Otherwise, the pull 
request isn't tied to the JIRA.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (PHOENIX-153) Implement TABLESAMPLE clause

2017-07-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108354#comment-16108354
 ] 

ASF GitHub Bot commented on PHOENIX-153:


Github user JamesRTaylor commented on the issue:

https://github.com/apache/phoenix/pull/262
  
Ping @aertoria - would you have a few spare cycles to make that last 
change? Also, please squash all commits into one and amend your commit message 
to be prefixed with PHOENIX-153 (i.e. include the dash). Otherwise, the pull 
request isn't tied to the JIRA.


> Implement TABLESAMPLE clause
> 
>
> Key: PHOENIX-153
> URL: https://issues.apache.org/jira/browse/PHOENIX-153
> Project: Phoenix
>  Issue Type: Task
>Reporter: James Taylor
>Assignee: Ethan Wang
>  Labels: enhancement
> Attachments: Sampling_Accuracy_Performance.jpg
>
>
> Support the standard SQL TABLESAMPLE clause by implementing a filter that 
> uses a skip next hint based on the region boundaries of the table to only 
> return n rows per region.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT

2017-07-31 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108353#comment-16108353
 ] 

James Taylor commented on PHOENIX-418:
--

Let's just keep it simple and support APPROX_COUNT_DISTINCT. That way we don't 
need any grammar changes.
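
For example, usage would stay an ordinary aggregate call; a minimal JDBC sketch 
(table and column names here are hypothetical):
{code}
// Hypothetical usage sketch of APPROX_COUNT_DISTINCT as a plain built-in aggregate;
// no grammar change is needed. Table and column names are illustrative.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ApproxCountDistinctExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT APPROX_COUNT_DISTINCT(name) FROM person")) {
            if (rs.next()) {
                // Approximate distinct count; accuracy depends on the HLL precision used server-side.
                System.out.println("approx distinct names: " + rs.getLong(1));
            }
        }
    }
}
{code}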

> Support approximate COUNT DISTINCT
> --
>
> Key: PHOENIX-418
> URL: https://issues.apache.org/jira/browse/PHOENIX-418
> Project: Phoenix
>  Issue Type: Task
>Reporter: James Taylor
>Assignee: Ethan Wang
>  Labels: gsoc2016
>
> Support an "approximation" of count distinct to prevent having to hold on to 
> all distinct values (since this will not scale well when the number of 
> distinct values is huge). The Apache Drill folks have had some interesting 
> discussions on this 
> [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E).
>  They recommend using  [Welford's 
> method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm).
>  I'm open to having a config option that uses exact versus approximate. I 
> don't have experience implementing an approximate implementation, so I'm not 
> sure how much state is required to keep on the server and return to the 
> client (other than realizing it'd be much less than returning all distinct 
> values and their counts).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT

2017-07-31 Thread Ethan Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108332#comment-16108332
 ] 

Ethan Wang commented on PHOENIX-418:


+1 on 
{quote}select count(distinct name)
from person
APPROXIMATE ()

select count(distinct name)
from person
APPROXIMATE (ALGORITHM 'hll')

select count(distinct name)
from person
APPROXIMATE (ALGORITHM 'ABC' WITHIN 10 PERCENT)

select count(distinct name) APPROXIMATE ()
from person

select count(distinct name) APPROXIMATE (ALGORITHM 'hll')
from person

select count(distinct name)
  APPROXIMATE (ALGORITHM 'ABC' WITHIN 10 PERCENT)
from person {quote}


The current patch in progress basically aligns with this suggestion, 
supporting both 
APPROX_COUNT_DISTINCT(xxx) 
and 
select count(name) from person APPROXIMATE ('hll')

Also, for the HyperLogLog algorithm, I suggest Phoenix follow Druid 
(CALCITE-1588), in that it:
1. does not expose an accuracy option for the user to specify.
2. hard-codes the two parameters during HLL initialization (i.e., the 
precision value for the normal set and the precision value for the sparse set), 
as sketched below.
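
A minimal sketch of what hard-coding those two precisions could look like, assuming the 
stream-lib HyperLogLogPlus implementation; the constant values below are placeholders, 
not necessarily what the patch uses:
{code}
// Sketch only: fixed normal-set and sparse-set precisions at HLL initialization,
// assuming com.clearspring (stream-lib) HyperLogLogPlus. Precision values are placeholders.
import com.clearspring.analytics.stream.cardinality.HyperLogLogPlus;

public class HllPrecisionSketch {
    // Not user-configurable: precision for the normal set and for the sparse set.
    private static final int NORMAL_SET_PRECISION = 16;
    private static final int SPARSE_SET_PRECISION = 25;

    public static long approxDistinct(Iterable<String> values) {
        HyperLogLogPlus hll = new HyperLogLogPlus(NORMAL_SET_PRECISION, SPARSE_SET_PRECISION);
        for (String value : values) {
            hll.offer(value);
        }
        return hll.cardinality();
    }
}
{code}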


> Support approximate COUNT DISTINCT
> --
>
> Key: PHOENIX-418
> URL: https://issues.apache.org/jira/browse/PHOENIX-418
> Project: Phoenix
>  Issue Type: Task
>Reporter: James Taylor
>Assignee: Ethan Wang
>  Labels: gsoc2016
>
> Support an "approximation" of count distinct to prevent having to hold on to 
> all distinct values (since this will not scale well when the number of 
> distinct values is huge). The Apache Drill folks have had some interesting 
> discussions on this 
> [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E).
>  They recommend using  [Welford's 
> method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm).
>  I'm open to having a config option that uses exact versus approximate. I 
> don't have experience implementing an approximate implementation, so I'm not 
> sure how much state is required to keep on the server and return to the 
> client (other than realizing it'd be much less than returning all distinct 
> values and their counts).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (PHOENIX-4052) Create the correct index row mutations for out-of-order data mutations

2017-07-31 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107973#comment-16107973
 ] 

James Taylor edited comment on PHOENIX-4052 at 7/31/17 9:25 PM:


The following test, slightly different from the description above, reproduces 
an issue where an extra row with an invalid row key ends up in the index table:
{code}
@Test
public void testOutOfOrderDelete() throws Exception {
    String tableName = generateUniqueName();
    String indexName = generateUniqueName();
    Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
    long ts = 1000;
    props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
    Connection conn = DriverManager.getConnection(getUrl(), props);
    conn.createStatement().execute("CREATE TABLE " + tableName + "(k CHAR(1) PRIMARY KEY, v VARCHAR) COLUMN_ENCODED_BYTES = 0");
    conn.close();

    ts = 1010;
    props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
    conn = DriverManager.getConnection(getUrl(), props);
    conn.createStatement().execute("CREATE INDEX " + indexName + " ON " + tableName + "(v)");
    conn.close();

    ts = 1020;
    props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
    conn = DriverManager.getConnection(getUrl(), props);
    conn.createStatement().execute("UPSERT INTO " + tableName + " VALUES('a','a')");
    conn.commit();
    conn.close();

    ts = 1040;
    props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
    conn = DriverManager.getConnection(getUrl(), props);
    conn.createStatement().execute("DELETE FROM " + tableName + " WHERE k='a'");
    conn.commit();
    conn.close();

    ts = 1030;
    props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
    conn = DriverManager.getConnection(getUrl(), props);
    conn.createStatement().execute("UPSERT INTO " + tableName + " VALUES('a','bbb')");
    conn.commit();
    conn.close();

    TestUtil.dumpTable(conn.unwrap(PhoenixConnection.class).getQueryServices().getTable(Bytes.toBytes(tableName)));
    TestUtil.dumpTable(conn.unwrap(PhoenixConnection.class).getQueryServices().getTable(Bytes.toBytes(indexName)));

    ts = 1050;
    props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
    conn = DriverManager.getConnection(getUrl(), props);
    long count1 = getRowCount(conn, tableName);
    long count2 = getRowCount(conn, indexName);
    assertTrue("Expected table row count ( " + count1 + ") to match index row count (" + count2 + ")", count1 == count2);
    conn.close();

    /**
     *
     dumping T01;hconnection-0x8f57e4c 
     **
     a/0:/1040/DeleteFamily/vlen=0/seqid=0
     a/0:V/1030/Put/vlen=3/seqid=0
     a/0:V/1020/Put/vlen=1/seqid=0
     a/0:_0/1030/Put/vlen=1/seqid=0
     a/0:_0/1020/Put/vlen=1/seqid=0
     ---
     dumping T02;hconnection-0x8f57e4c 
     **
     \x00a/0:_0/1040/Put/vlen=2/seqid=0
     a\x00a/0:/1040/DeleteFamily/vlen=0/seqid=0
     a\x00a/0:/1030/DeleteFamily/vlen=0/seqid=0
     a\x00a/0:_0/1020/Put/vlen=2/seqid=0
     ---
     */
}

private static long getRowCount(Connection conn, String tableName) throws SQLException {
    ResultSet rs = conn.createStatement().executeQuery("SELECT /*+ NO_INDEX */ count(*) FROM " + tableName);
    assertTrue(rs.next());
    return rs.getLong(1);
}
{code}
The assertion that occurs is:
{code}
java.lang.AssertionError: Expected table row count ( 0) to match index row count (1)
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.apache.phoenix.end2end.index.OutOfOrderMutationsIT.testOutOfOrderDelete(OutOfOrderMutationsIT.java:135)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 

[jira] [Resolved] (PHOENIX-4022) Add PhoenixMetricsLog interface that can be used to log metrics for queries and mutations.

2017-07-31 Thread Thomas D'Silva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva resolved PHOENIX-4022.
-
Resolution: Fixed

Committed v3 patch.

> Add PhoenixMetricsLog interface that can be used to log metrics for queries 
> and mutations. 
> ---
>
> Key: PHOENIX-4022
> URL: https://issues.apache.org/jira/browse/PHOENIX-4022
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Fix For: 4.12.0
>
> Attachments: PHOENIX-4022.patch, PHOENIX-4022-v2.patch, 
> PHOENIX-4022-v3.patch, PHOENIX-4022-v4.patch
>
>
> Create a wrapper for PhoenixConnection, PhoenixStatement, 
> PhoenixPreparedStatement and PhoenixResultSet that automatically calls the 
> PhoenixMetricsLog logging methods so users don't have to instrument this 
> themselves.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing

2017-07-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107848#comment-16107848
 ] 

Hudson commented on PHOENIX-4053:
-

FAILURE: Integrated in Jenkins build Phoenix-master #1723 (See 
[https://builds.apache.org/job/Phoenix-master/1723/])
PHOENIX-4053 Lock row exclusively when necessary for mutable secondary 
(jamestaylor: rev 54d9e1c36c46e7c50c29def08cf866599c7a4e45)
* (add) 
phoenix-core/src/it/java/org/apache/phoenix/end2end/ConcurrentMutationsIT.java
* (edit) phoenix-core/src/test/java/org/apache/phoenix/util/TestUtil.java
* (add) 
phoenix-core/src/main/java/org/apache/phoenix/hbase/index/LockManager.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/hbase/index/Indexer.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/hbase/index/builder/IndexBuildManager.java


> Lock row exclusively when necessary for mutable secondary indexing
> --
>
> Key: PHOENIX-4053
> URL: https://issues.apache.org/jira/browse/PHOENIX-4053
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
> Fix For: 4.12.0, 4.11.1
>
> Attachments: PHOENIX-4053_4.x-HBase-0.98_v2.patch, 
> PHOENIX-4053_4.x-HBase-0.98_v3.patch, PHOENIX-4053-4.x-HBase-0.98_v4.patch, 
> PHOENIX-4053-4.x-HBase-0.98_v6.patch, PHOENIX-4053_v2.patch, 
> PHOENIX-4053_v3.patch, PHOENIX-4053_v4.patch, PHOENIX-4053_v5.patch, 
> PHOENIX-4053_v6.patch, PHOENIX-4053_v7.patch, PHOENIX-4053_wip.patch
>
>
> From HBase 1.2 on, rows are not exclusively locked when the preBatchMutate 
> call is made (see HBASE-18474). The mutable secondary indexes (global and 
> local) depend on this to get a consistent snapshot of a row between the point 
> when the current row value is looked up, and when the new row is written, 
> until the mvcc is advanced. Otherwise, a subsequent update to a row may not 
> see the current row state. Even with pre HBase 1.2 releases, the lock isn't 
> held long enough for us. We need to hold the locks from the start of the 
> preBatchMutate (when we read the data table to get the prior row values) 
> until the mvcc is advanced (beginning of postBatchMutateIndispensably).
> Given the above, it's best if Phoenix manages the row locking itself 
> (mimicking the current HBase mechanism).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT

2017-07-31 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107832#comment-16107832
 ] 

Julian Hyde commented on PHOENIX-418:
-

In CALCITE-1588 [~gian] points out that Oracle, BigQuery and MemSQL support 
{{APPROX_COUNT_DISTINCT}}. I also see it in VoltDB.

A quick survey of other databases:
* Vertica has {{APPROXIMATE_COUNT_DISTINCT}}
* Redshift has {{[ APPROXIMATE ] COUNT ( [ DISTINCT | ALL ] * | expression )}}.
* In PostgreSQL you can bolt on your own HyperLogLog function, but there 
doesn't seem to be a unified approach.
* I don't see anything in DB2 or MySQL.

I think that is a sufficient de facto standard to support 
{{APPROX_COUNT_DISTINCT}} in Calcite.

Also, I think Calcite should support an APPROXIMATE clause, allowed both as a 
clause in the SELECT statement and after the aggregate function (but 
before the OVER clause, if present). The algorithm and any parameters go inside 
parentheses. Examples:

{code}
select count(distinct name)
from person
APPROXIMATE ()

select count(distinct name)
from person
APPROXIMATE (ALGORITHM 'hll')

select count(distinct name)
from person
APPROXIMATE (ALGORITHM 'ABC' WITHIN 10 PERCENT)

select count(distinct name) APPROXIMATE ()
from person

select count(distinct name) APPROXIMATE (ALGORITHM 'hll')
from person

select count(distinct name)
  APPROXIMATE (ALGORITHM 'ABC' WITHIN 10 PERCENT)
from person 
{code}

For now, I think Phoenix should support APPROX_COUNT_DISTINCT. We could add 
support for the APPROXIMATE clause later.

> Support approximate COUNT DISTINCT
> --
>
> Key: PHOENIX-418
> URL: https://issues.apache.org/jira/browse/PHOENIX-418
> Project: Phoenix
>  Issue Type: Task
>Reporter: James Taylor
>Assignee: Ethan Wang
>  Labels: gsoc2016
>
> Support an "approximation" of count distinct to prevent having to hold on to 
> all distinct values (since this will not scale well when the number of 
> distinct values is huge). The Apache Drill folks have had some interesting 
> discussions on this 
> [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E).
>  They recommend using  [Welford's 
> method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm).
>  I'm open to having a config option that uses exact versus approximate. I 
> don't have experience implementing an approximate implementation, so I'm not 
> sure how much state is required to keep on the server and return to the 
> client (other than realizing it'd be much less than returning all distinct 
> values and their counts).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing

2017-07-31 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-4053:
--
Attachment: PHOENIX-4053_v7.patch

Attaching final patch. Thanks for the reviews, [~apurtell] & [~samarthjain].

> Lock row exclusively when necessary for mutable secondary indexing
> --
>
> Key: PHOENIX-4053
> URL: https://issues.apache.org/jira/browse/PHOENIX-4053
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
> Attachments: PHOENIX-4053_4.x-HBase-0.98_v2.patch, 
> PHOENIX-4053_4.x-HBase-0.98_v3.patch, PHOENIX-4053-4.x-HBase-0.98_v4.patch, 
> PHOENIX-4053-4.x-HBase-0.98_v6.patch, PHOENIX-4053_v2.patch, 
> PHOENIX-4053_v3.patch, PHOENIX-4053_v4.patch, PHOENIX-4053_v5.patch, 
> PHOENIX-4053_v6.patch, PHOENIX-4053_v7.patch, PHOENIX-4053_wip.patch
>
>
> From HBase 1.2 on, rows are not exclusively locked when the preBatchMutate 
> call is made (see HBASE-18474). The mutable secondary indexes (global and 
> local) depend on this to get a consistent snapshot of a row between the point 
> when the current row value is looked up, and when the new row is written, 
> until the mvcc is advanced. Otherwise, a subsequent update to a row may not 
> see the current row state. Even with pre HBase 1.2 releases, the lock isn't 
> held long enough for us. We need to hold the locks from the start of the 
> preBatchMutate (when we read the data table to get the prior row values) 
> until the mvcc is advanced (beginning of postBatchMutateIndispensably).
> Given the above, it's best if Phoenix manages the row locking itself 
> (mimicking the current HBase mechanism).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (PHOENIX-4055) Consider IndexFailurePolicy before throwing when lock cannot be gotten

2017-07-31 Thread James Taylor (JIRA)
James Taylor created PHOENIX-4055:
-

 Summary: Consider IndexFailurePolicy before throwing when lock 
cannot be gotten
 Key: PHOENIX-4055
 URL: https://issues.apache.org/jira/browse/PHOENIX-4055
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor


We shouldn't necessarily be throwing in Indexer.preBatchMutateWithExceptions 
when we cannot acquire a lock, since we're essentially failing the data write 
because we can't do the locking necessary for performing consistent index 
maintenance. We'd ideally want to go through the IndexFailurePolicy to 
determine whether or not we swallow that exception. We currently cannot ignore 
this lock failure as we lack the ability to keep state between batch mutation 
coprocessor calls (HBASE-18127).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing

2017-07-31 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107676#comment-16107676
 ] 

Samarth Jain commented on PHOENIX-4053:
---

Thanks for the explanation, James.

+1 to the patch after the comment change. Looks great!

> Lock row exclusively when necessary for mutable secondary indexing
> --
>
> Key: PHOENIX-4053
> URL: https://issues.apache.org/jira/browse/PHOENIX-4053
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
> Attachments: PHOENIX-4053_4.x-HBase-0.98_v2.patch, 
> PHOENIX-4053_4.x-HBase-0.98_v3.patch, PHOENIX-4053-4.x-HBase-0.98_v4.patch, 
> PHOENIX-4053-4.x-HBase-0.98_v6.patch, PHOENIX-4053_v2.patch, 
> PHOENIX-4053_v3.patch, PHOENIX-4053_v4.patch, PHOENIX-4053_v5.patch, 
> PHOENIX-4053_v6.patch, PHOENIX-4053_wip.patch
>
>
> From HBase 1.2 on, rows are not exclusively locked when the preBatchMutate 
> call is made (see HBASE-18474). The mutable secondary indexes (global and 
> local) depend on this to get a consistent snapshot of a row between the point 
> when the current row value is looked up, and when the new row is written, 
> until the mvcc is advanced. Otherwise, a subsequent update to a row may not 
> see the current row state. Even with pre HBase 1.2 releases, the lock isn't 
> held long enough for us. We need to hold the locks from the start of the 
> preBatchMutate (when we read the data table to get the prior row values) 
> until the mvcc is advanced (beginning of postBatchMutateIndispensably).
> Given the above, it's best if Phoenix manages the row locking itself 
> (mimicking the current HBase mechanism).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing

2017-07-31 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107656#comment-16107656
 ] 

Andrew Purtell commented on PHOENIX-4053:
-

Looked at the v6 patch. I like that you're passing 'rowLockWaitDuration' in 
lock acquisition. That will make updating for HBASE-17210 when running on HBase 
1.4+ easy.
+1

> Lock row exclusively when necessary for mutable secondary indexing
> --
>
> Key: PHOENIX-4053
> URL: https://issues.apache.org/jira/browse/PHOENIX-4053
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
> Attachments: PHOENIX-4053_4.x-HBase-0.98_v2.patch, 
> PHOENIX-4053_4.x-HBase-0.98_v3.patch, PHOENIX-4053-4.x-HBase-0.98_v4.patch, 
> PHOENIX-4053-4.x-HBase-0.98_v6.patch, PHOENIX-4053_v2.patch, 
> PHOENIX-4053_v3.patch, PHOENIX-4053_v4.patch, PHOENIX-4053_v5.patch, 
> PHOENIX-4053_v6.patch, PHOENIX-4053_wip.patch
>
>
> From HBase 1.2 on, rows are not exclusively locked when the preBatchMutate 
> call is made (see HBASE-18474). The mutable secondary indexes (global and 
> local) depend on this to get a consistent snapshot of a row between the point 
> when the current row value is looked up, and when the new row is written, 
> until the mvcc is advanced. Otherwise, a subsequent update to a row may not 
> see the current row state. Even with pre HBase 1.2 releases, the lock isn't 
> held long enough for us. We need to hold the locks from the start of the 
> preBatchMutate (when we read the data table to get the prior row values) 
> until the mvcc is advanced (beginning of postBatchMutateIndispensably).
> Given the above, it's best if Phoenix manages the row locking itself 
> (mimicking the current HBase mechanism).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing

2017-07-31 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107614#comment-16107614
 ] 

James Taylor commented on PHOENIX-4053:
---

The lockManager.lockRow() call throws an unchecked exception if it times out or 
is interrupted. We need to "remember" that failure here so we don't attempt to 
unlock the row in the postBatchMutateIndispensably call (which gets called in a 
finally block). Will add that to the comment to make it clearer.

> Lock row exclusively when necessary for mutable secondary indexing
> --
>
> Key: PHOENIX-4053
> URL: https://issues.apache.org/jira/browse/PHOENIX-4053
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
> Attachments: PHOENIX-4053_4.x-HBase-0.98_v2.patch, 
> PHOENIX-4053_4.x-HBase-0.98_v3.patch, PHOENIX-4053-4.x-HBase-0.98_v4.patch, 
> PHOENIX-4053-4.x-HBase-0.98_v6.patch, PHOENIX-4053_v2.patch, 
> PHOENIX-4053_v3.patch, PHOENIX-4053_v4.patch, PHOENIX-4053_v5.patch, 
> PHOENIX-4053_v6.patch, PHOENIX-4053_wip.patch
>
>
> From HBase 1.2 on, rows are not exclusively locked when the preBatchMutate 
> call is made (see HBASE-18474). The mutable secondary indexes (global and 
> local) depend on this to get a consistent snapshot of a row between the point 
> when the current row value is looked up, and when the new row is written, 
> until the mvcc is advanced. Otherwise, a subsequent update to a row may not 
> see the current row state. Even with pre HBase 1.2 releases, the lock isn't 
> held long enough for us. We need to hold the locks from the start of the 
> preBatchMutate (when we read the data table to get the prior row values) 
> until the mvcc is advanced (beginning of postBatchMutateIndispensably).
> Given the above, it's best if Phoenix manages the row locking itself 
> (mimicking the current HBase mechanism).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing

2017-07-31 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107602#comment-16107602
 ] 

Samarth Jain commented on PHOENIX-4053:
---

[~jamestaylor], 

Regarding your comment here:
{code}
+            if (!success) {
+                // We're throwing here, so we won't be locking any more rows. By setting the
+                // status to FAILURE, we prevent the attempt to unlock rows we've never
+                // locked when postBatchMutateIndispensably is executed. We're very
+                // limited about the state that can be shared between the batch mutate
+                // coprocessor calls (see HBASE-18482).
+                // Note that we shouldn't necessarily be throwing here, since we're
+                // essentially failing the data write because we can't do the locking
+                // necessary for performing consistent index maintenance. We'd ideally
+                // want to go through the index failure policy to determine what action
+                // to perform. We currently cannot ignore this lock failure
+                for (int j = i; j < miniBatchOp.size(); j++) {
+                    miniBatchOp.setOperationStatus(j, FAILURE);
+                }
+            }
{code}
I don't see a throw statement here. My guess is there is some code in HBase 
that goes through the operation status array and takes the appropriate action? 
I think it would make sense for the comment to state that. FWIW, I see 
callers of region.batchMutate within our Phoenix co-processors 
(UngroupedAggregateRegionObserver#commitBatch() and 
UngroupedAggregateRegionObserver#rebuildIndices) that don't seem to be looking 
at the return status. Maybe they should?
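
For reference, a minimal sketch of what checking the returned statuses might look like 
(assuming the HBase 1.x Region API; the surrounding method and error handling are hypothetical):
{code}
// Hypothetical sketch: inspect the OperationStatus array returned by Region.batchMutate()
// instead of ignoring it. Assumes the HBase 1.x Region API.
import java.io.IOException;
import org.apache.hadoop.hbase.HConstants.OperationStatusCode;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.regionserver.OperationStatus;
import org.apache.hadoop.hbase.regionserver.Region;

public class BatchMutateStatusCheck {
    static void commitBatch(Region region, Mutation[] mutations) throws IOException {
        OperationStatus[] statuses = region.batchMutate(mutations);
        for (OperationStatus status : statuses) {
            if (status.getOperationStatusCode() != OperationStatusCode.SUCCESS) {
                // Surface the failure rather than silently dropping it.
                throw new IOException("Batch mutation failed: " + status.getExceptionMsg());
            }
        }
    }
}
{code}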



> Lock row exclusively when necessary for mutable secondary indexing
> --
>
> Key: PHOENIX-4053
> URL: https://issues.apache.org/jira/browse/PHOENIX-4053
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
> Attachments: PHOENIX-4053_4.x-HBase-0.98_v2.patch, 
> PHOENIX-4053_4.x-HBase-0.98_v3.patch, PHOENIX-4053-4.x-HBase-0.98_v4.patch, 
> PHOENIX-4053-4.x-HBase-0.98_v6.patch, PHOENIX-4053_v2.patch, 
> PHOENIX-4053_v3.patch, PHOENIX-4053_v4.patch, PHOENIX-4053_v5.patch, 
> PHOENIX-4053_v6.patch, PHOENIX-4053_wip.patch
>
>
> From HBase 1.2 on, rows are not exclusively locked when the preBatchMutate 
> call is made (see HBASE-18474). The mutable secondary indexes (global and 
> local) depend on this to get a consistent snapshot of a row between the point 
> when the current row value is looked up, and when the new row is written, 
> until the mvcc is advanced. Otherwise, a subsequent update to a row may not 
> see the current row state. Even with pre HBase 1.2 releases, the lock isn't 
> held long enough for us. We need to hold the locks from the start of the 
> preBatchMutate (when we read the data table to get the prior row values) 
> until the mvcc is advanced (beginning of postBatchMutateIndispensably).
> Given the above, it's best if Phoenix manages the row locking itself 
> (mimicking the current HBase mechanism).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing

2017-07-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16107564#comment-16107564
 ] 

Hadoop QA commented on PHOENIX-4053:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12879662/PHOENIX-4053-4.x-HBase-0.98_v6.patch
  against 4.x-HBase-0.98 branch at commit 
9c458fa3d3ecdeb17de5b717c26cfdea1608c358.
  ATTACHMENT ID: 12879662

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/1240//console

This message is automatically generated.

> Lock row exclusively when necessary for mutable secondary indexing
> --
>
> Key: PHOENIX-4053
> URL: https://issues.apache.org/jira/browse/PHOENIX-4053
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
> Attachments: PHOENIX-4053_4.x-HBase-0.98_v2.patch, 
> PHOENIX-4053_4.x-HBase-0.98_v3.patch, PHOENIX-4053-4.x-HBase-0.98_v4.patch, 
> PHOENIX-4053-4.x-HBase-0.98_v6.patch, PHOENIX-4053_v2.patch, 
> PHOENIX-4053_v3.patch, PHOENIX-4053_v4.patch, PHOENIX-4053_v5.patch, 
> PHOENIX-4053_v6.patch, PHOENIX-4053_wip.patch
>
>
> From HBase 1.2 on, rows are not exclusively locked when the preBatchMutate 
> call is made (see HBASE-18474). The mutable secondary indexes (global and 
> local) depend on this to get a consistent snapshot of a row between the point 
> when the current row value is looked up, and when the new row is written, 
> until the mvcc is advanced. Otherwise, a subsequent update to a row may not 
> see the current row state. Even with pre HBase 1.2 releases, the lock isn't 
> held long enough for us. We need to hold the locks from the start of the 
> preBatchMutate (when we read the data table to get the prior row values) 
> until the mvcc is advanced (beginning of postBatchMutateIndispensably).
> Given the above, it's best if Phoenix manages the row locking itself 
> (mimicking the current HBase mechanism).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing

2017-07-31 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-4053:
--
Attachment: PHOENIX-4053-4.x-HBase-0.98_v6.patch

Attaching 0.98 version of v6 patch.

> Lock row exclusively when necessary for mutable secondary indexing
> --
>
> Key: PHOENIX-4053
> URL: https://issues.apache.org/jira/browse/PHOENIX-4053
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
> Attachments: PHOENIX-4053_4.x-HBase-0.98_v2.patch, 
> PHOENIX-4053_4.x-HBase-0.98_v3.patch, PHOENIX-4053-4.x-HBase-0.98_v4.patch, 
> PHOENIX-4053-4.x-HBase-0.98_v6.patch, PHOENIX-4053_v2.patch, 
> PHOENIX-4053_v3.patch, PHOENIX-4053_v4.patch, PHOENIX-4053_v5.patch, 
> PHOENIX-4053_v6.patch, PHOENIX-4053_wip.patch
>
>
> From HBase 1.2 on, rows are not exclusively locked when the preBatchMutate 
> call is made (see HBASE-18474). The mutable secondary indexes (global and 
> local) depend on this to get a consistent snapshot of a row between the point 
> when the current row value is looked up, and when the new row is written, 
> until the mvcc is advanced. Otherwise, a subsequent update to a row may not 
> see the current row state. Even with pre HBase 1.2 releases, the lock isn't 
> held long enough for us. We need to hold the locks from the start of the 
> preBatchMutate (when we read the data table to get the prior row values) 
> until the mvcc is advanced (beginning of postBatchMutateIndispensably).
> Given the above, it's best if Phoenix manages the row locking itself 
> (mimicking the current HBase mechanism).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-4053) Lock row exclusively when necessary for mutable secondary indexing

2017-07-31 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-4053:
--
Attachment: PHOENIX-4053_v6.patch

Attaching slightly tweaked patch that passes in row lock wait duration so that 
LockManager can be agnostic about RPC duration (HBASE-17210). Would like to get 
this committed soon so we can get a perf run on it - can you give this a quick 
look, [~samarthjain] or [~apurtell]?

> Lock row exclusively when necessary for mutable secondary indexing
> --
>
> Key: PHOENIX-4053
> URL: https://issues.apache.org/jira/browse/PHOENIX-4053
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
> Attachments: PHOENIX-4053_4.x-HBase-0.98_v2.patch, 
> PHOENIX-4053_4.x-HBase-0.98_v3.patch, PHOENIX-4053-4.x-HBase-0.98_v4.patch, 
> PHOENIX-4053_v2.patch, PHOENIX-4053_v3.patch, PHOENIX-4053_v4.patch, 
> PHOENIX-4053_v5.patch, PHOENIX-4053_v6.patch, PHOENIX-4053_wip.patch
>
>
> From HBase 1.2 on, rows are not exclusively locked when the preBatchMutate 
> call is made (see HBASE-18474). The mutable secondary indexes (global and 
> local) depend on this to get a consistent snapshot of a row between the point 
> when the current row value is looked up, and when the new row is written, 
> until the mvcc is advanced. Otherwise, a subsequent update to a row may not 
> see the current row state. Even with pre HBase 1.2 releases, the lock isn't 
> held long enough for us. We need to hold the locks from the start of the 
> preBatchMutate (when we read the data table to get the prior row values) 
> until the mvcc is advanced (beginning of postBatchMutateIndispensably).
> Given the above, it's best if Phoenix manages the row locking itself 
> (mimicking the current HBase mechanism).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT

2017-07-31 Thread Ethan Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106864#comment-16106864
 ] 

Ethan Wang commented on PHOENIX-418:


Regarding the syntax of approximate count distinct: carrying on from the 
discussion in PHOENIX-3390, I propose the syntax to be

Original cardinality count function:
select count(distinct name) from person

With approximate:
select count(distinct name) from person APPROXIMATE 
select count(distinct name) from person APPROXIMATE 'hll' 
select count(distinct name) from person APPROXIMATE 'algorithm ABC' (WITHIN 10 
PERCENT)


> Support approximate COUNT DISTINCT
> --
>
> Key: PHOENIX-418
> URL: https://issues.apache.org/jira/browse/PHOENIX-418
> Project: Phoenix
>  Issue Type: Task
>Reporter: James Taylor
>Assignee: Ethan Wang
>  Labels: gsoc2016
>
> Support an "approximation" of count distinct to prevent having to hold on to 
> all distinct values (since this will not scale well when the number of 
> distinct values is huge). The Apache Drill folks have had some interesting 
> discussions on this 
> [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E).
>  They recommend using  [Welford's 
> method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm).
>  I'm open to having a config option that uses exact versus approximate. I 
> don't have experience implementing an approximate implementation, so I'm not 
> sure how much state is required to keep on the server and return to the 
> client (other than realizing it'd be much less than returning all distinct 
> values and their counts).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-3390) Custom UDAF for HyperLogLogPlus

2017-07-31 Thread Ethan Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106844#comment-16106844
 ] 

Ethan Wang commented on PHOENIX-3390:
-

Marking this ticket as a duplicate of PHOENIX-418.

For "Supporting approximate COUNT DISTINCT", let us continue the implementation 
updates (and discussions) over at PHOENIX-418.

The other feature, exposing the raw HyperLogLog hash binary directly to the 
user, will continue on this ticket.

> Custom UDAF for HyperLogLogPlus
> ---
>
> Key: PHOENIX-3390
> URL: https://issues.apache.org/jira/browse/PHOENIX-3390
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: Swapna Kasula
>Assignee: Ethan Wang
>Priority: Minor
>
> With ref # PHOENIX-2069
> Custom UDAF to aggregate/union the HyperLogLogs of a column and return a 
> HyperLogLog.
> select hllUnion(col1) from table;  // returns a HyperLogLog, which is the 
> union of all HyperLogLogs from all rows for column 'col1'
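
For intuition, a minimal sketch of that union semantics, assuming the stream-lib 
HyperLogLogPlus implementation (precision values below are placeholders):
{code}
// Sketch only: union per-row HLL sketches into one, assuming stream-lib's HyperLogLogPlus.
// Precision values are placeholders; all sketches must use the same precisions to merge.
import com.clearspring.analytics.stream.cardinality.CardinalityMergeException;
import com.clearspring.analytics.stream.cardinality.HyperLogLogPlus;

public class HllUnionSketch {
    public static HyperLogLogPlus union(Iterable<HyperLogLogPlus> perRowSketches)
            throws CardinalityMergeException {
        HyperLogLogPlus result = new HyperLogLogPlus(16, 25);
        for (HyperLogLogPlus sketch : perRowSketches) {
            // Merge the other sketch's registers into the accumulator.
            result.addAll(sketch);
        }
        return result;
    }
}
{code}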



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)