[jira] [Updated] (PHOENIX-3655) Metrics for PQS

2017-02-16 Thread Rahul Shrivastava (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Shrivastava updated PHOENIX-3655:
---
Attachment: MetricsforPhoenixQueryServerPQS.pdf

I have attached a document with my approach for Metrics for PQS.
Please provide feedback. 
[~samarthjain]
[~apurtell]
[~akshita.malhotra]
[~jamestaylor]

> Metrics for PQS
> ---
>
> Key: PHOENIX-3655
> URL: https://issues.apache.org/jira/browse/PHOENIX-3655
> Project: Phoenix
>  Issue Type: New Feature
>Affects Versions: 4.8.0
> Environment: Linux 3.13.0-107-generic kernel, v4.9.0-HBase-0.98
>Reporter: Rahul Shrivastava
> Fix For: 4.9.0
>
> Attachments: MetricsforPhoenixQueryServerPQS.pdf
>
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Phoenix Query Server runs as a separate process from its thin client. 
> Metrics collection is currently done by PhoenixRuntime.java, i.e. at the 
> Phoenix driver level. We need the following:
> 1. For every JDBC statement/prepared statement run by PQS, the capability 
> to collect metrics at the PQS level and push the data to an external sink, 
> i.e. file, JMX, or other external custom sources.
> 2. Besides this, global metrics could be periodically collected and pushed 
> to the sink.
> 3. PQS can be configured to turn on metrics collection and the type of 
> collection (runtime or global) via hbase-site.xml.
> 4. The sink could be configured via an interface in hbase-site.xml.
> All metrics definitions: https://phoenix.apache.org/metrics.html
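A configuration along the lines proposed above might look like the fragment below. This is purely illustrative: the property names and sink class are placeholders invented for this sketch, not existing Phoenix settings.

```xml
<!-- Hypothetical hbase-site.xml fragment; all property names are placeholders -->
<property>
  <name>phoenix.queryserver.metrics.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- per-statement ("runtime") vs periodically collected ("global") metrics -->
  <name>phoenix.queryserver.metrics.type</name>
  <value>runtime</value>
</property>
<property>
  <!-- pluggable sink implementation (file, JMX, custom) -->
  <name>phoenix.queryserver.metrics.sink.class</name>
  <value>com.example.FileMetricsSink</value>
</property>
```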



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3585) MutableIndexIT testSplitDuringIndexScan and testIndexHalfStoreFileReader fail for transactional tables and local indexes

2017-02-16 Thread Poorna Chandra (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871192#comment-15871192
 ] 

Poorna Chandra commented on PHOENIX-3585:
-

Yes - if there is no way to combine the scanner returned by 
TransactionProcessor with the scanner from IndexHalfStoreFileReaderGenerator, 
then we'll have to do that.

You'll need to disable TransactionProcessor's {{preCompactScannerOpen}} and 
{{postCompact}} hooks during splits and merges. We don't record the prune 
upper bound during flushes, so overriding {{preFlushScannerOpen}} is not 
necessary.


> MutableIndexIT testSplitDuringIndexScan and testIndexHalfStoreFileReader fail 
> for transactional tables and local indexes
> 
>
> Key: PHOENIX-3585
> URL: https://issues.apache.org/jira/browse/PHOENIX-3585
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Attachments: diff.patch
>
>
> the tests fail if we use HDFSTransactionStateStorage instead of  
> InMemoryTransactionStateStorage when we create the TransactionManager in 
> BaseTest





[jira] [Commented] (PHOENIX-3681) Store local indexes in a column family per index

2017-02-16 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871013#comment-15871013
 ] 

Sergey Soldatov commented on PHOENIX-3681:
--

Sounds like a great idea. Maintenance is definitely quite painful at the 
moment (dropping one of the local indexes on a large table takes ages).

> Store local indexes in a column family per index
> 
>
> Key: PHOENIX-3681
> URL: https://issues.apache.org/jira/browse/PHOENIX-3681
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> Currently all local indexes are stored in a single column family. That makes 
> maintenance (such as dropping an index) more expensive than necessary.
> Let's have each local index in its own column family (or be able to declare 
> which column family an index should go into).
> As [~jamestaylor] points out, this won't work for indexes on views as there 
> might be 1000's of them.
> Another issue is covered local indexes, but I'd argue that local indexes 
> would benefit little from being covered. (That also needs to be 
> experimentally verified.)
> Local indexes in individual column families would be great to isolate any 
> maintenance and even usage from each other.
> [~rajeshbabu], [~mujtabachohan]





[jira] [Commented] (PHOENIX-3684) ConnectionQueryServices connection leak on principal with "_HOST"

2017-02-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870817#comment-15870817
 ] 

Andrew Purtell commented on PHOENIX-3684:
-

lgtm [~elserj]

> ConnectionQueryServices connection leak on principal with "_HOST"
> -
>
> Key: PHOENIX-3684
> URL: https://issues.apache.org/jira/browse/PHOENIX-3684
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Arpit Gupta
>Assignee: Josh Elser
>Priority: Blocker
> Fix For: 4.9.1, 4.10.0
>
> Attachments: PHOENIX-3684.001.patch
>
>
> Through some internal testing, we found that Ambari's use of Phoenix to host 
> metrics data was leaking ConnectionQueryServices (and thus HConnections and 
> ZK connections), ultimately running into ZK's rate limiting maxClientCnxns.
> After a bit of digging around (and revisiting the old issues around this 
> topic PHOENIX-3607, PHOENIX-3611, etc), I finally realized that the logic in 
> ConnectionInfo was simply not correctly handling the {{_HOST}} special string 
> in the principal (that UGI will replace with the FQDN for the current host).
> This resulted in Phoenix repeatedly re-logging in the user when they created 
> a new Connection instead of using the UGI current user, leaking another set 
> of connections.





[jira] [Commented] (PHOENIX-3667) Optimize BooleanExpressionFilter for tables with encoded columns

2017-02-16 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870775#comment-15870775
 ] 

Samarth Jain commented on PHOENIX-3667:
---

This sounds like a good optimization to do. There are a few more tweaks we can 
do here:

- For encoded mutable tables, column qualifiers are unique across all column 
families, so we don't need to compare column families.
- For encoded mutable tables, stop scanning once we encounter a column 
qualifier that is >= the max qualifier we expect.
- Count the number of unique column qualifiers in the where clause expression 
using the KeyValueColumnExpressionVisitor. This count determines the size of 
the array used to store the foundColumns. For non-encoded tables, the same 
count can be used to properly size the map of foundColumns.
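To illustrate the array-based bookkeeping these tweaks enable, here is a simplified Python sketch. It is not Phoenix code: the class name EncodedColumnTracker and its methods are made up for this example; only the idea (qualifiers are small sorted integers, so found columns index into a fixed-size array with no hashing, and scanning a row can stop past the max expected qualifier) comes from the discussion above.

```python
# Toy model of filter evaluation over encoded columns (names are hypothetical).
class EncodedColumnTracker:
    def __init__(self, min_qualifier, max_qualifier):
        self.min_q = min_qualifier
        self.max_q = max_qualifier
        # One slot per qualifier referenced by the where clause; the slot count
        # would come from counting unique qualifiers in the expression.
        self.found = [None] * (max_qualifier - min_qualifier + 1)

    def offer(self, qualifier, value):
        """Record a cell; returns False once scanning the row can stop."""
        if qualifier > self.max_q:
            return False  # cells arrive in qualifier order: nothing left to find
        if qualifier >= self.min_q:
            self.found[qualifier - self.min_q] = value  # O(1), no map lookup
        return True

tracker = EncodedColumnTracker(11, 13)
for q, v in [(11, b'a'), (12, b'b'), (14, b'ignored')]:
    if not tracker.offer(q, v):
        break  # early exit past the max expected qualifier
assert tracker.found == [b'a', b'b', None]
```

For non-encoded tables the same unique-qualifier count would instead pre-size a hash map, avoiding rehashing during evaluation.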


> Optimize BooleanExpressionFilter for tables with encoded columns
> 
>
> Key: PHOENIX-3667
> URL: https://issues.apache.org/jira/browse/PHOENIX-3667
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: James Taylor
>Assignee: Samarth Jain
>
> The client side of Phoenix determines the subclass of BooleanExpressionFilter 
> we use based on how many column families and column qualifiers are being 
> referenced. The idea is to minimize the lookup cost during filter evaluation. 
> For encoded columns, instead of using a Map or Set, we can create a few new 
> subclasses of BooleanExpressionFilter that use an array instead. No need for 
> any lookups or equality checks - just fill in the position based on the 
> column qualifier value instead. Since filters are applied on every row 
> between the start/stop key, this will improve performance quite a bit.





[jira] [Commented] (PHOENIX-3683) Backward compatibility fails for joins

2017-02-16 Thread Mujtaba Chohan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870766#comment-15870766
 ] 

Mujtaba Chohan commented on PHOENIX-3683:
-

Verified.

> Backward compatibility fails for joins
> --
>
> Key: PHOENIX-3683
> URL: https://issues.apache.org/jira/browse/PHOENIX-3683
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3683.patch
>
>
> Query with joins returns null when client is v4.8.0 and server is 4.x head 
> with https://github.com/apache/phoenix/pull/232  and 
> https://issues.apache.org/jira/browse/PHOENIX-3678 patch applied.
> {noformat}
> CREATE TABLE Employee (
> Region VARCHAR NOT NULL,
> LocalID VARCHAR NOT NULL,
> Name VARCHAR,
> CONSTRAINT pk PRIMARY KEY (Region, LocalID));
> CREATE TABLE Patent (
> PatentID VARCHAR NOT NULL,
> Region VARCHAR,
> LocalID VARCHAR,
> Title VARCHAR,
> Category VARCHAR,
> CONSTRAINT pk PRIMARY KEY (PatentID));
> upsert into employee values ('region1','local1','foo');
> upsert into patent values ('patent1', 'region1','local1','title1','cat1');
> SELECT E.Name, E.Region, P.PCount
> FROM Employee AS E
> JOIN
> (SELECT Region, LocalID, count(*) AS PCount
>  FROM Patent
>  GROUP BY Region, LocalID) AS P
> ON E.Region = P.Region AND E.LocalID = P.LocalID;
> {noformat}
> Resultset returns
> {noformat}
> +-+---+---+
> | E.NAME  | E.REGION  | P.PCOUNT  |
> +-+---+---+
> | | region1   | null  |
> +-+---+---+
> {noformat}





[jira] [Updated] (PHOENIX-3684) ConnectionQueryServices connection leak on principal with "_HOST"

2017-02-16 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated PHOENIX-3684:

Attachment: PHOENIX-3684.001.patch

.001 When a principal (and keytab) is specified in the JDBC url and there is a 
current user (as specified by {{UserGroupInformation.getCurrentUser()}}), 
compare the two names taking the special replacement string {{_HOST}} into 
consideration on the user-provided name.
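The comparison described above can be sketched in a few lines of Python. This is illustrative only, not the actual patch: Phoenix's fix lives in ConnectionInfo and uses Hadoop's utilities, and the function names below are invented for this sketch. The point is that "phoenix/_HOST@EXAMPLE.COM" must be expanded to the local FQDN before comparing it to UGI's current user, so a matching login is reused instead of triggering a fresh one (and another set of connections).

```python
# Hypothetical helper names; Hadoop's SecurityUtil performs the real expansion.
def expand_host(principal, fqdn):
    """Replace the _HOST placeholder in a Kerberos principal with the FQDN."""
    service, sep, realm = principal.partition('@')
    parts = service.split('/')
    if len(parts) == 2 and parts[1] == '_HOST':
        service = parts[0] + '/' + fqdn
    return service + sep + realm

def same_user(url_principal, current_user_name, fqdn):
    """True if the JDBC url's principal names the already-logged-in user."""
    return expand_host(url_principal, fqdn) == current_user_name

# The un-expanded and expanded forms name the same user:
assert same_user('phoenix/_HOST@EXAMPLE.COM',
                 'phoenix/host1.example.com@EXAMPLE.COM',
                 'host1.example.com')
```

Without the expansion step, the string comparison always fails for _HOST principals, which is exactly the repeated re-login the issue describes.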

> ConnectionQueryServices connection leak on principal with "_HOST"
> -
>
> Key: PHOENIX-3684
> URL: https://issues.apache.org/jira/browse/PHOENIX-3684
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Arpit Gupta
>Assignee: Josh Elser
>Priority: Blocker
> Fix For: 4.9.1, 4.10.0
>
> Attachments: PHOENIX-3684.001.patch
>
>
> Through some internal testing, we found that Ambari's use of Phoenix to host 
> metrics data was leaking ConnectionQueryServices (and thus HConnections and 
> ZK connections), ultimately running into ZK's rate limiting maxClientCnxns.
> After a bit of digging around (and revisiting the old issues around this 
> topic PHOENIX-3607, PHOENIX-3611, etc), I finally realized that the logic in 
> ConnectionInfo was simply not correctly handling the {{_HOST}} special string 
> in the principal (that UGI will replace with the FQDN for the current host).
> This resulted in Phoenix repeatedly re-logging in the user when they created 
> a new Connection instead of using the UGI current user, leaking another set 
> of connections.





[jira] [Created] (PHOENIX-3684) ConnectionQueryServices connection leak on principal with "_HOST"

2017-02-16 Thread Josh Elser (JIRA)
Josh Elser created PHOENIX-3684:
---

 Summary: ConnectionQueryServices connection leak on principal with 
"_HOST"
 Key: PHOENIX-3684
 URL: https://issues.apache.org/jira/browse/PHOENIX-3684
 Project: Phoenix
  Issue Type: Bug
Reporter: Arpit Gupta
Assignee: Josh Elser
Priority: Blocker
 Fix For: 4.9.1, 4.10.0


Through some internal testing, we found that Ambari's use of Phoenix to host 
metrics data was leaking ConnectionQueryServices (and thus HConnections and ZK 
connections), ultimately running into ZK's rate limiting maxClientCnxns.

After a bit of digging around (and revisiting the old issues around this topic 
PHOENIX-3607, PHOENIX-3611, etc), I finally realized that the logic in 
ConnectionInfo was simply not correctly handling the {{_HOST}} special string 
in the principal (that UGI will replace with the FQDN for the current host).

This resulted in Phoenix repeatedly re-logging in the user when they created a 
new Connection instead of using the UGI current user, leaking another set of 
connections.





[jira] [Commented] (PHOENIX-3684) ConnectionQueryServices connection leak on principal with "_HOST"

2017-02-16 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870721#comment-15870721
 ] 

Josh Elser commented on PHOENIX-3684:
-

FYI, [~gjacoby]. Probably of interest given your digging on the related issues 
;)

Will get a patch up here in a second, if you have a moment to take a look at 
it.

> ConnectionQueryServices connection leak on principal with "_HOST"
> -
>
> Key: PHOENIX-3684
> URL: https://issues.apache.org/jira/browse/PHOENIX-3684
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Arpit Gupta
>Assignee: Josh Elser
>Priority: Blocker
> Fix For: 4.9.1, 4.10.0
>
>
> Through some internal testing, we found that Ambari's use of Phoenix to host 
> metrics data was leaking ConnectionQueryServices (and thus HConnections and 
> ZK connections), ultimately running into ZK's rate limiting maxClientCnxns.
> After a bit of digging around (and revisiting the old issues around this 
> topic PHOENIX-3607, PHOENIX-3611, etc), I finally realized that the logic in 
> ConnectionInfo was simply not correctly handling the {{_HOST}} special string 
> in the principal (that UGI will replace with the FQDN for the current host).
> This resulted in Phoenix repeatedly re-logging in the user when they created 
> a new Connection instead of using the UGI current user, leaking another set 
> of connections.





[jira] [Commented] (PHOENIX-3683) Backward compatibility fails for joins

2017-02-16 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870662#comment-15870662
 ] 

James Taylor commented on PHOENIX-3683:
---

+1 to patch (pending verification)

> Backward compatibility fails for joins
> --
>
> Key: PHOENIX-3683
> URL: https://issues.apache.org/jira/browse/PHOENIX-3683
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3683.patch
>
>
> Query with joins returns null when client is v4.8.0 and server is 4.x head 
> with https://github.com/apache/phoenix/pull/232  and 
> https://issues.apache.org/jira/browse/PHOENIX-3678 patch applied.
> {noformat}
> CREATE TABLE Employee (
> Region VARCHAR NOT NULL,
> LocalID VARCHAR NOT NULL,
> Name VARCHAR,
> CONSTRAINT pk PRIMARY KEY (Region, LocalID));
> CREATE TABLE Patent (
> PatentID VARCHAR NOT NULL,
> Region VARCHAR,
> LocalID VARCHAR,
> Title VARCHAR,
> Category VARCHAR,
> CONSTRAINT pk PRIMARY KEY (PatentID));
> upsert into employee values ('region1','local1','foo');
> upsert into patent values ('patent1', 'region1','local1','title1','cat1');
> SELECT E.Name, E.Region, P.PCount
> FROM Employee AS E
> JOIN
> (SELECT Region, LocalID, count(*) AS PCount
>  FROM Patent
>  GROUP BY Region, LocalID) AS P
> ON E.Region = P.Region AND E.LocalID = P.LocalID;
> {noformat}
> Resultset returns
> {noformat}
> +-+---+---+
> | E.NAME  | E.REGION  | P.PCOUNT  |
> +-+---+---+
> | | region1   | null  |
> +-+---+---+
> {noformat}





[jira] [Updated] (PHOENIX-3683) Backward compatibility fails for joins

2017-02-16 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-3683:
--
Attachment: PHOENIX-3683.patch

Thanks for finding this and providing the exact repro steps, [~mujtabachohan]. 
Attached patch should fix the issue.

> Backward compatibility fails for joins
> --
>
> Key: PHOENIX-3683
> URL: https://issues.apache.org/jira/browse/PHOENIX-3683
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3683.patch
>
>
> Query with joins returns null when client is v4.8.0 and server is 4.x head 
> with https://github.com/apache/phoenix/pull/232  and 
> https://issues.apache.org/jira/browse/PHOENIX-3678 patch applied.
> {noformat}
> CREATE TABLE Employee (
> Region VARCHAR NOT NULL,
> LocalID VARCHAR NOT NULL,
> Name VARCHAR,
> CONSTRAINT pk PRIMARY KEY (Region, LocalID));
> CREATE TABLE Patent (
> PatentID VARCHAR NOT NULL,
> Region VARCHAR,
> LocalID VARCHAR,
> Title VARCHAR,
> Category VARCHAR,
> CONSTRAINT pk PRIMARY KEY (PatentID));
> upsert into employee values ('region1','local1','foo');
> upsert into patent values ('patent1', 'region1','local1','title1','cat1');
> SELECT E.Name, E.Region, P.PCount
> FROM Employee AS E
> JOIN
> (SELECT Region, LocalID, count(*) AS PCount
>  FROM Patent
>  GROUP BY Region, LocalID) AS P
> ON E.Region = P.Region AND E.LocalID = P.LocalID;
> {noformat}
> Resultset returns
> {noformat}
> +-+---+---+
> | E.NAME  | E.REGION  | P.PCOUNT  |
> +-+---+---+
> | | region1   | null  |
> +-+---+---+
> {noformat}





[jira] [Assigned] (PHOENIX-3683) Backward compatibility fails for joins

2017-02-16 Thread Mujtaba Chohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mujtaba Chohan reassigned PHOENIX-3683:
---

Assignee: Samarth Jain

> Backward compatibility fails for joins
> --
>
> Key: PHOENIX-3683
> URL: https://issues.apache.org/jira/browse/PHOENIX-3683
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
> Fix For: 4.10.0
>
>
> Query with joins returns null when client is v4.8.0 and server is 4.x head 
> with https://github.com/apache/phoenix/pull/232  and 
> https://issues.apache.org/jira/browse/PHOENIX-3678 patch applied.
> {noformat}
> CREATE TABLE Employee (
> Region VARCHAR NOT NULL,
> LocalID VARCHAR NOT NULL,
> Name VARCHAR,
> CONSTRAINT pk PRIMARY KEY (Region, LocalID));
> CREATE TABLE Patent (
> PatentID VARCHAR NOT NULL,
> Region VARCHAR,
> LocalID VARCHAR,
> Title VARCHAR,
> Category VARCHAR,
> CONSTRAINT pk PRIMARY KEY (PatentID));
> upsert into employee values ('region1','local1','foo');
> upsert into patent values ('patent1', 'region1','local1','title1','cat1');
> SELECT E.Name, E.Region, P.PCount
> FROM Employee AS E
> JOIN
> (SELECT Region, LocalID, count(*) AS PCount
>  FROM Patent
>  GROUP BY Region, LocalID) AS P
> ON E.Region = P.Region AND E.LocalID = P.LocalID;
> {noformat}
> Resultset returns
> {noformat}
> +-+---+---+
> | E.NAME  | E.REGION  | P.PCOUNT  |
> +-+---+---+
> | | region1   | null  |
> +-+---+---+
> {noformat}





[jira] [Created] (PHOENIX-3683) Backward compatibility fails for joins

2017-02-16 Thread Mujtaba Chohan (JIRA)
Mujtaba Chohan created PHOENIX-3683:
---

 Summary: Backward compatibility fails for joins
 Key: PHOENIX-3683
 URL: https://issues.apache.org/jira/browse/PHOENIX-3683
 Project: Phoenix
  Issue Type: Sub-task
Reporter: Mujtaba Chohan


Query with joins returns null when client is v4.8.0 and server is 4.x head with 
https://github.com/apache/phoenix/pull/232  and 
https://issues.apache.org/jira/browse/PHOENIX-3678 patch applied.

{noformat}
CREATE TABLE Employee (
Region VARCHAR NOT NULL,
LocalID VARCHAR NOT NULL,
Name VARCHAR,
CONSTRAINT pk PRIMARY KEY (Region, LocalID));

CREATE TABLE Patent (
PatentID VARCHAR NOT NULL,
Region VARCHAR,
LocalID VARCHAR,
Title VARCHAR,
Category VARCHAR,
CONSTRAINT pk PRIMARY KEY (PatentID));

upsert into employee values ('region1','local1','foo');
upsert into patent values ('patent1', 'region1','local1','title1','cat1');

SELECT E.Name, E.Region, P.PCount
FROM Employee AS E
JOIN
(SELECT Region, LocalID, count(*) AS PCount
 FROM Patent
 GROUP BY Region, LocalID) AS P
ON E.Region = P.Region AND E.LocalID = P.LocalID;
{noformat}

Resultset returns
{noformat}
+-+---+---+
| E.NAME  | E.REGION  | P.PCOUNT  |
+-+---+---+
| | region1   | null  |
+-+---+---+
{noformat}





[jira] [Resolved] (PHOENIX-3472) Mutable index adds DeleteFamily markers which impacts query performance

2017-02-16 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor resolved PHOENIX-3472.
---
Resolution: Invalid

Issuing delete markers is necessary to maintain the secondary index. If this is 
a problem, then perhaps something can be done at the HBase level to improve 
performance.

> Mutable index adds DeleteFamily markers which impacts query performance
> ---
>
> Key: PHOENIX-3472
> URL: https://issues.apache.org/jira/browse/PHOENIX-3472
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.9.0
>Reporter: Mujtaba Chohan
>
> {noformat}
> create table M (k varchar not null primary key, v varchar);
> create index MI on M(v);
> upsert into M values ('a','b');
> {noformat}
> Raw scan shows the following two kv pair with the first one being a delete 
> marker
> {noformat}
> ROW      COLUMN+CELL
>  \x00a    column=0:, timestamp=1478801841429, type=DeleteFamily
>  b\x00a   column=0:_0, timestamp=1478801841429, value=_0
> {noformat}
> This severely impacts read query performance: as the data size grows, the 
> number of DeleteFamily markers in the table also grows linearly.
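The maintenance pattern behind those markers can be modeled in a few lines. This is a toy Python sketch, not the actual IndexMaintainer logic; the only grounded assumption is the index row-key layout visible in the raw scan above (indexed value, a \x00 separator, then the data row key), and the helper names are invented.

```python
# Toy model: a mutable index keeps one row per (indexed value, data row key).
SEP = b'\x00'

def index_row_key(value, pk):
    # Composite key matching the raw scan: value + separator + data row key.
    # A null value yields an empty prefix, hence the "\x00a" row above.
    return (value or b'') + SEP + pk

def index_mutations(pk, old_value, new_value):
    """Mutations the index maintainer must issue for one upsert."""
    muts = []
    if old_value != new_value:
        # Remove the previous index row: HBase expresses this as a
        # DeleteFamily marker, which is what accumulates over time.
        muts.append(('DeleteFamily', index_row_key(old_value, pk)))
    muts.append(('Put', index_row_key(new_value, pk)))
    return muts

# upsert into M values ('a','b'), where row 'a' previously had v = null:
assert index_mutations(b'a', None, b'b') == [
    ('DeleteFamily', b'\x00a'),   # matches the delete marker in the raw scan
    ('Put', b'b\x00a'),           # matches the live index cell
]
```

Every value change adds another marker, which is why the marker count tracks write volume rather than row count.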





[jira] [Resolved] (PHOENIX-3678) Backward compatibility fails for immutable tables after column encoding patch

2017-02-16 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain resolved PHOENIX-3678.
---
Resolution: Fixed

> Backward compatibility fails for immutable tables after column encoding patch
> -
>
> Key: PHOENIX-3678
> URL: https://issues.apache.org/jira/browse/PHOENIX-3678
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3678.patch
>
>
> Configuration: Client 4.8 - Server 4.x head with 
> https://github.com/apache/phoenix/pull/232 patch applied.
> Steps:
> Table created with 4.8 client/server. 
> Queries return null for all non-row key columns for immutable table when only 
> server side is upgraded.





[jira] [Commented] (PHOENIX-3678) Backward compatibility fails for immutable tables after column encoding patch

2017-02-16 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870503#comment-15870503
 ] 

James Taylor commented on PHOENIX-3678:
---

+1

> Backward compatibility fails for immutable tables after column encoding patch
> -
>
> Key: PHOENIX-3678
> URL: https://issues.apache.org/jira/browse/PHOENIX-3678
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3678.patch
>
>
> Configuration: Client 4.8 - Server 4.x head with 
> https://github.com/apache/phoenix/pull/232 patch applied.
> Steps:
> Table created with 4.8 client/server. 
> Queries return null for all non-row key columns for immutable table when only 
> server side is upgraded.





[jira] [Commented] (PHOENIX-3678) Backward compatibility fails for immutable tables after column encoding patch

2017-02-16 Thread Mujtaba Chohan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870454#comment-15870454
 ] 

Mujtaba Chohan commented on PHOENIX-3678:
-

Verified.

> Backward compatibility fails for immutable tables after column encoding patch
> -
>
> Key: PHOENIX-3678
> URL: https://issues.apache.org/jira/browse/PHOENIX-3678
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3678.patch
>
>
> Configuration: Client 4.8 - Server 4.x head with 
> https://github.com/apache/phoenix/pull/232 patch applied.
> Steps:
> Table created with 4.8 client/server. 
> Queries return null for all non-row key columns for immutable table when only 
> server side is upgraded.





[jira] [Commented] (PHOENIX-3585) MutableIndexIT testSplitDuringIndexScan and testIndexHalfStoreFileReader fail for transactional tables and local indexes

2017-02-16 Thread Thomas D'Silva (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870298#comment-15870298
 ] 

Thomas D'Silva commented on PHOENIX-3585:
-

Yes, IndexHalfStoreFileReaderGenerator can work on transactional data. Should 
I just override preFlushScannerOpen in PhoenixTransactionalProcessor so that 
it does not call TransactionProcessor?

> MutableIndexIT testSplitDuringIndexScan and testIndexHalfStoreFileReader fail 
> for transactional tables and local indexes
> 
>
> Key: PHOENIX-3585
> URL: https://issues.apache.org/jira/browse/PHOENIX-3585
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Attachments: diff.patch
>
>
> the tests fail if we use HDFSTransactionStateStorage instead of  
> InMemoryTransactionStateStorage when we create the TransactionManager in 
> BaseTest





[jira] [Resolved] (PHOENIX-3664) Pyspark: pushing filter by date against apache phoenix

2017-02-16 Thread Josh Mahonin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Mahonin resolved PHOENIX-3664.
---
   Resolution: Duplicate
Fix Version/s: 4.10.0

> Pyspark: pushing filter by date against apache phoenix
> --
>
> Key: PHOENIX-3664
> URL: https://issues.apache.org/jira/browse/PHOENIX-3664
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Azure HDInsight (HDI 3.5) - pyspark using phoenix 
> client. (Spark 1.6.3 - HBase 1.1.2 under HDP 2.5)
>Reporter: Pablo Castilla
> Fix For: 4.10.0
>
>
> I am trying to filter by date in Apache Phoenix from pyspark. The column in 
> Phoenix is created as a Date and the filter is a datetime. When I use explain 
> I see Spark doesn't push the filter down to Phoenix. I have tried a lot of 
> combinations without luck.
> Any way to do it?
> df = sqlContext.read \
>   .format("org.apache.phoenix.spark") \
>   .option("table", "TABLENAME") \
>   .option("zkUrl", zookepperServer + ":2181:/hbase-unsecure") \
>   .load()
> print(df.printSchema())
> startValidation = datetime.datetime.now()
> print(df.filter(df['FH'] >startValidation).explain(True))
> Results:
> root
>  |-- METER_ID: string (nullable = true)
>  |-- FH: date (nullable = true)
> None
>== Parsed Logical Plan ==
> 'Filter (FH#53 > 1486726683446150)
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Analyzed Logical Plan ==
> METER_ID: string, FH: date, SUMMERTIME: string, MAGNITUDE: int, SOURCE: int, 
> ENTRY_DATETIME: date, BC: string, T_VAL_AE: int, T_VAL_AI: int, T_VAL_R1: 
> int, T_VAL_R2: int, T_VAL_R3: int, T_VAL_R4: int
> Filter (cast(FH#53 as string) > cast(1486726683446150 as string))
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Optimized Logical Plan ==
> Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Physical Plan ==
> Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
> +- Scan 
> PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
> None
> if I set the FH column as timestamp it pushes the filter but throws an 
> exception:
> Caused by: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 
> (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "12" at line 
> 1, column 219.
> at 
> org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33)
> at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1280)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:1363)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1373)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.optimizeQuery(PhoenixStatement.java:1368)
> at 
> org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:122)
> ... 102 more
> Caused by: MismatchedTokenException(106!=129)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.recoverFromMismatchedToken(PhoenixSQLParser.java:360)
> at 
> org.apache.phoenix.shaded.org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.not_expression(PhoenixSQLParser.java:6862)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.and_expression(PhoenixSQLParser.java:6677)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.or_expression(PhoenixSQLParser.java:6614)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.expression(PhoenixSQLParser.java:6579)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.single_select(PhoenixSQLParser.java:4615)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.unioned_selects(PhoenixSQLParser.java:4697)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.select_node(PhoenixSQLParser.java:4763)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.oneStatement(PhoenixSQLParser.java:789)
> at 
> 

[jira] [Comment Edited] (PHOENIX-3664) Pyspark: pushing filter by date against apache phoenix

2017-02-16 Thread Pablo Castilla (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869655#comment-15869655
 ] 

Pablo Castilla edited comment on PHOENIX-3664 at 2/16/17 9:57 AM:
--

In the end we switched from Python to Scala and everything seems to work at 
very good speed. We couldn't find phoenixTableAsRDD in Python.

We would prefer Python, as we use it in our machine learning implementations, 
but the two are very similar, so moving to Scala is not a big deal.

Thanks for helping! :)



> Pyspark: pushing filter by date against apache phoenix
> --
>
> Key: PHOENIX-3664
> URL: https://issues.apache.org/jira/browse/PHOENIX-3664
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Azure HDInsight (HDI 3.5) - PySpark using the Phoenix 
> client. (Spark 1.6.3 - HBase 1.1.2 under HDP 2.5)
>Reporter: Pablo Castilla
>
> I am trying to filter by date in Apache Phoenix from PySpark. The column in 
> Phoenix is created as DATE and the filter value is a Python datetime. When I 
> use explain, I see that Spark doesn't push the filter down to Phoenix. I have 
> tried a lot of combinations without luck.
> Is there any way to do it?
> df = sqlContext.read \
>     .format("org.apache.phoenix.spark") \
>     .option("table", "TABLENAME") \
>     .option("zkUrl", zookepperServer + ":2181:/hbase-unsecure") \
>     .load()
> print(df.printSchema())
> startValidation = datetime.datetime.now()
> print(df.filter(df['FH'] >startValidation).explain(True))
> Results:
> root
>  |-- METER_ID: string (nullable = true)
>  |-- FH: date (nullable = true)
> None
> == Parsed Logical Plan ==
> 'Filter (FH#53 > 1486726683446150)
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Analyzed Logical Plan ==
> METER_ID: string, FH: date, SUMMERTIME: string, MAGNITUDE: int, SOURCE: int, 
> ENTRY_DATETIME: date, BC: string, T_VAL_AE: int, T_VAL_AI: int, T_VAL_R1: 
> int, T_VAL_R2: int, T_VAL_R3: int, T_VAL_R4: int
> Filter (cast(FH#53 as string) > cast(1486726683446150 as string))
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Optimized Logical Plan ==
> Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Physical Plan ==
> Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
> +- Scan 
> PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
> None
> If I set the FH column as TIMESTAMP, it pushes the filter down, but it throws 
> an exception:
> Caused by: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 
> (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "12" at line 
> 1, column 219.
> at 
> org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33)
> at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1280)
> at org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:1363)
> at org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1373)
> at org.apache.phoenix.jdbc.PhoenixStatement.optimizeQuery(PhoenixStatement.java:1368)
> at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:122)
> ... 102 more
> Caused by: MismatchedTokenException(106!=129)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.recoverFromMismatchedToken(PhoenixSQLParser.java:360)
> at 
> org.apache.phoenix.shaded.org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
> at org.apache.phoenix.parse.PhoenixSQLParser.not_expression(PhoenixSQLParser.java:6862)
> at org.apache.phoenix.parse.PhoenixSQLParser.and_expression(PhoenixSQLParser.java:6677)
> 
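The parser error above ("Expecting RPAREN, got 12") suggests the pushed-down timestamp literal reaches Phoenix in a form its SQL grammar rejects. One hypothetical workaround, illustrative only and not taken from this issue, is to build the Phoenix-side predicate string yourself using Phoenix's TO_DATE function, which accepts a quoted date string, and pass it to an API that takes a raw predicate (the Scala phoenixTableAsDataFrame, for example, has a predicate parameter); the column name and operator below are placeholders:

```python
import datetime

def phoenix_date_predicate(column, op, dt):
    # Render a Phoenix predicate with TO_DATE so the date literal is a
    # quoted string the Phoenix SQL parser accepts (column/op are caller
    # supplied; this helper is a sketch, not part of phoenix-spark).
    return "%s %s TO_DATE('%s')" % (column, op, dt.strftime("%Y-%m-%d %H:%M:%S"))

pred = phoenix_date_predicate("FH", ">", datetime.datetime(2017, 2, 10, 11, 38, 3))
# pred == "FH > TO_DATE('2017-02-10 11:38:03')"
```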

[jira] [Commented] (PHOENIX-3664) Pyspark: pushing filter by date against apache phoenix

2017-02-16 Thread Pablo Castilla (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869655#comment-15869655
 ] 

Pablo Castilla commented on PHOENIX-3664:
-

In the end we switched from Python to Scala, and everything seems to work at very 
good speed. We haven't found a phoenixTableAsRDD equivalent in Python.

We would prefer Python, as we use it in our machine learning implementations, but 
the two APIs are very similar, so moving to Scala is not a big deal.

> Pyspark: pushing filter by date against apache phoenix
> --
>
> Key: PHOENIX-3664
> URL: https://issues.apache.org/jira/browse/PHOENIX-3664
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Azure HDInsight (HDI 3.5) - PySpark using the Phoenix 
> client. (Spark 1.6.3 - HBase 1.1.2 under HDP 2.5)
>Reporter: Pablo Castilla
>
> I am trying to filter by date in Apache Phoenix from PySpark. The column in 
> Phoenix is created as DATE and the filter value is a Python datetime. When I 
> use explain, I see that Spark doesn't push the filter down to Phoenix. I have 
> tried a lot of combinations without luck.
> Is there any way to do it?
> df = sqlContext.read \
>     .format("org.apache.phoenix.spark") \
>     .option("table", "TABLENAME") \
>     .option("zkUrl", zookepperServer + ":2181:/hbase-unsecure") \
>     .load()
> print(df.printSchema())
> startValidation = datetime.datetime.now()
> print(df.filter(df['FH'] >startValidation).explain(True))
> Results:
> root
>  |-- METER_ID: string (nullable = true)
>  |-- FH: date (nullable = true)
> None
> == Parsed Logical Plan ==
> 'Filter (FH#53 > 1486726683446150)
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Analyzed Logical Plan ==
> METER_ID: string, FH: date, SUMMERTIME: string, MAGNITUDE: int, SOURCE: int, 
> ENTRY_DATETIME: date, BC: string, T_VAL_AE: int, T_VAL_AI: int, T_VAL_R1: 
> int, T_VAL_R2: int, T_VAL_R3: int, T_VAL_R4: int
> Filter (cast(FH#53 as string) > cast(1486726683446150 as string))
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Optimized Logical Plan ==
> Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Physical Plan ==
> Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
> +- Scan 
> PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
> None
> If I set the FH column as TIMESTAMP, it pushes the filter down, but it throws 
> an exception:
> Caused by: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 
> (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "12" at line 
> 1, column 219.
> at 
> org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33)
> at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1280)
> at org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:1363)
> at org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1373)
> at org.apache.phoenix.jdbc.PhoenixStatement.optimizeQuery(PhoenixStatement.java:1368)
> at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:122)
> ... 102 more
> Caused by: MismatchedTokenException(106!=129)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.recoverFromMismatchedToken(PhoenixSQLParser.java:360)
> at 
> org.apache.phoenix.shaded.org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
> at org.apache.phoenix.parse.PhoenixSQLParser.not_expression(PhoenixSQLParser.java:6862)
> at org.apache.phoenix.parse.PhoenixSQLParser.and_expression(PhoenixSQLParser.java:6677)
> at org.apache.phoenix.parse.PhoenixSQLParser.or_expression(PhoenixSQLParser.java:6614)
> at org.apache.phoenix.parse.PhoenixSQLParser.expression(PhoenixSQLParser.java:6579)
> at org.apache.phoenix.parse.PhoenixSQLParser.single_select(PhoenixSQLParser.java:4615)
> at org.apache.phoenix.parse.PhoenixSQLParser.unioned_selects(PhoenixSQLParser.java:4697)
> at 
>