[jira] [Commented] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes

2014-08-31 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116853#comment-14116853
 ] 

Anoop Sam John commented on PHOENIX-1203:
-

Got the issue.  Patch coming in shortly.

 Uable to work for count (distinct col) queries via phoenix table with 
 secondary indexes
 ---

 Key: PHOENIX-1203
 URL: https://issues.apache.org/jira/browse/PHOENIX-1203
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.0.0, 4.1
 Environment: hadoop-2.2.0   hbase: Version 0.98.3-hadoop2
Reporter: Sun Fulin
Assignee: Anoop Sam John
  Labels: distinct, secondaryIndex, test

 I build the latest 4.1 rc0 from here:  
 https://github.com/apache/phoenix/releases
 And examine the count (distinct col) query within the new environment. 
 However, the problem still exists with index queries as the following while 
 the correct distinct query result is expected to be 536 for my project:
 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t 
 where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 
 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ;
 +-+
 | COUNT(IMSI) |
 +-+
 | 2322|
 +-+
 1 row selected (70.572 seconds)
 As James suggests, I conduct the query adding group by t.imsi with /without 
 secondary indexes. And the result seems to be fine as they both got the 
 correct 536 distinct groups. 
 Here are some considerations:
 1. count (distinct col) query over index table did not work as expectation.
 2. only distinct query over index table works fine.
 3. If the phoenix version got some wrong configuration, correct me.
 Thanks and Best Regards,
 Sun
 ---
 Hi Sun,
 Thanks for the detailed description. Yes, your syntax is correct, and
 it's definitely true that the count distinct query should return the
 same result with and without the index. Would you mind trying this on
 our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to
 file a JIRA?
  
 One thing that make make it easier for your testing: do you know about
 our NO_INDEX hint which forces the query *not* to use an index, like
 this:
  
 select /*+ NO_INDEX */ ...
  
 Another question too. What about this query with and with/out the index:
  
 select count(*) from ranapsignal t
 where t.pkttime=140496480 and t.pkttime=140496569
 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0
 and t.ranapsubmsgtype=0
 group by t.imsi;
  
 Thanks,
 James
  
 On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn
 su...@certusnet.com.cn wrote:
 
 
  Hi James,
 
Recently I got trouble while trying to conduct some query performance test
  in my phoenix tables with secondary indexes.
I created a table called RANAPSIGNAL for my projects in phoenix via
  sqlline client and load data into the table. Then I create
an index on the specific column PKTTIME for the table RANAPSIGNAL while
  including other more columns for adjusting my
index query, like the following DDL:
   create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include
  (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE );
 The index creation worked successfully without any errors. So, when I am
  trying to conduct such query as:
 select count (distinct t.imsi) from ranapsignal t where
  t.pkttime=140496480 and t.pkttime=140496569
 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
  t.ranapsubmsgtype=0 ;
 Without secondary indexes, the final result got 536 distinct imsi, wihch
  is the right distinct count results. However, after I  create the above
secondary index PKT_IDX and reconducting the above count (distinct imsi)
  query, I got 2322 imsi rows which obviously are not the expected
distinct counts results. I used the explain grammar to observe the scan of
  the above select query and found that it definitely scaned over
the index table PKT_IDX. I then tried to conduct the following query with
  no count function:
  select distinct t.imsi from ranapsignal t where
  t.pkttime=140496480 and t.pkttime=140496569
 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
  t.ranapsubmsgtype=0 ;
 And the result is right 536 distinct imsi over scanning the index table.
  By the way, imsi is one of the primary key when creating the table
  RANAPSIGNAL.
 Here are several considerations for my trouble and practice:
 1. Did you guys ever practice such count (distinct) queries over phoenix
  table via secondary index?
 2. I am not sure whether this problem was due to the index table, but my
  practice may assume that conclusion.
 3. Corrects 

[jira] [Updated] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes

2014-08-31 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated PHOENIX-1203:

Attachment: PHOENIX-1203.patch

The function name for the DistinctCount function was still COUNT !
So when rewrite the query plan using the index table, the 
DistinctCountAggregateFunction is longer getting used but just 
CountAggregateFunction.

 Uable to work for count (distinct col) queries via phoenix table with 
 secondary indexes
 ---

 Key: PHOENIX-1203
 URL: https://issues.apache.org/jira/browse/PHOENIX-1203
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.0.0, 4.1
 Environment: hadoop-2.2.0   hbase: Version 0.98.3-hadoop2
Reporter: Sun Fulin
Assignee: Anoop Sam John
  Labels: distinct, secondaryIndex, test
 Attachments: PHOENIX-1203.patch


 I build the latest 4.1 rc0 from here:  
 https://github.com/apache/phoenix/releases
 And examine the count (distinct col) query within the new environment. 
 However, the problem still exists with index queries as the following while 
 the correct distinct query result is expected to be 536 for my project:
 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t 
 where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 
 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ;
 +-+
 | COUNT(IMSI) |
 +-+
 | 2322|
 +-+
 1 row selected (70.572 seconds)
 As James suggests, I conduct the query adding group by t.imsi with /without 
 secondary indexes. And the result seems to be fine as they both got the 
 correct 536 distinct groups. 
 Here are some considerations:
 1. count (distinct col) query over index table did not work as expectation.
 2. only distinct query over index table works fine.
 3. If the phoenix version got some wrong configuration, correct me.
 Thanks and Best Regards,
 Sun
 ---
 Hi Sun,
 Thanks for the detailed description. Yes, your syntax is correct, and
 it's definitely true that the count distinct query should return the
 same result with and without the index. Would you mind trying this on
 our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to
 file a JIRA?
  
 One thing that make make it easier for your testing: do you know about
 our NO_INDEX hint which forces the query *not* to use an index, like
 this:
  
 select /*+ NO_INDEX */ ...
  
 Another question too. What about this query with and with/out the index:
  
 select count(*) from ranapsignal t
 where t.pkttime=140496480 and t.pkttime=140496569
 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0
 and t.ranapsubmsgtype=0
 group by t.imsi;
  
 Thanks,
 James
  
 On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn
 su...@certusnet.com.cn wrote:
 
 
  Hi James,
 
Recently I got trouble while trying to conduct some query performance test
  in my phoenix tables with secondary indexes.
I created a table called RANAPSIGNAL for my projects in phoenix via
  sqlline client and load data into the table. Then I create
an index on the specific column PKTTIME for the table RANAPSIGNAL while
  including other more columns for adjusting my
index query, like the following DDL:
   create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include
  (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE );
 The index creation worked successfully without any errors. So, when I am
  trying to conduct such query as:
 select count (distinct t.imsi) from ranapsignal t where
  t.pkttime=140496480 and t.pkttime=140496569
 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
  t.ranapsubmsgtype=0 ;
 Without secondary indexes, the final result got 536 distinct imsi, wihch
  is the right distinct count results. However, after I  create the above
secondary index PKT_IDX and reconducting the above count (distinct imsi)
  query, I got 2322 imsi rows which obviously are not the expected
distinct counts results. I used the explain grammar to observe the scan of
  the above select query and found that it definitely scaned over
the index table PKT_IDX. I then tried to conduct the following query with
  no count function:
  select distinct t.imsi from ranapsignal t where
  t.pkttime=140496480 and t.pkttime=140496569
 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
  t.ranapsubmsgtype=0 ;
 And the result is right 536 distinct imsi over scanning the index table.
  By the way, imsi is one of the primary key when creating the table
  RANAPSIGNAL.
 Here are several considerations for my trouble and practice:
 1. Did you guys ever practice such count 

[jira] [Updated] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes

2014-08-31 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated PHOENIX-1203:

Affects Version/s: (was: 4.1)
   (was: 4.0.0)
   3.0.0

 Uable to work for count (distinct col) queries via phoenix table with 
 secondary indexes
 ---

 Key: PHOENIX-1203
 URL: https://issues.apache.org/jira/browse/PHOENIX-1203
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: hadoop-2.2.0   hbase: Version 0.98.3-hadoop2
Reporter: Sun Fulin
Assignee: Anoop Sam John
  Labels: distinct, secondaryIndex, test
 Attachments: PHOENIX-1203.patch


 I build the latest 4.1 rc0 from here:  
 https://github.com/apache/phoenix/releases
 And examine the count (distinct col) query within the new environment. 
 However, the problem still exists with index queries as the following while 
 the correct distinct query result is expected to be 536 for my project:
 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t 
 where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 
 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ;
 +-+
 | COUNT(IMSI) |
 +-+
 | 2322|
 +-+
 1 row selected (70.572 seconds)
 As James suggests, I conduct the query adding group by t.imsi with /without 
 secondary indexes. And the result seems to be fine as they both got the 
 correct 536 distinct groups. 
 Here are some considerations:
 1. count (distinct col) query over index table did not work as expectation.
 2. only distinct query over index table works fine.
 3. If the phoenix version got some wrong configuration, correct me.
 Thanks and Best Regards,
 Sun
 ---
 Hi Sun,
 Thanks for the detailed description. Yes, your syntax is correct, and
 it's definitely true that the count distinct query should return the
 same result with and without the index. Would you mind trying this on
 our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to
 file a JIRA?
  
 One thing that make make it easier for your testing: do you know about
 our NO_INDEX hint which forces the query *not* to use an index, like
 this:
  
 select /*+ NO_INDEX */ ...
  
 Another question too. What about this query with and with/out the index:
  
 select count(*) from ranapsignal t
 where t.pkttime=140496480 and t.pkttime=140496569
 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0
 and t.ranapsubmsgtype=0
 group by t.imsi;
  
 Thanks,
 James
  
 On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn
 su...@certusnet.com.cn wrote:
 
 
  Hi James,
 
Recently I got trouble while trying to conduct some query performance test
  in my phoenix tables with secondary indexes.
I created a table called RANAPSIGNAL for my projects in phoenix via
  sqlline client and load data into the table. Then I create
an index on the specific column PKTTIME for the table RANAPSIGNAL while
  including other more columns for adjusting my
index query, like the following DDL:
   create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include
  (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE );
 The index creation worked successfully without any errors. So, when I am
  trying to conduct such query as:
 select count (distinct t.imsi) from ranapsignal t where
  t.pkttime=140496480 and t.pkttime=140496569
 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
  t.ranapsubmsgtype=0 ;
 Without secondary indexes, the final result got 536 distinct imsi, wihch
  is the right distinct count results. However, after I  create the above
secondary index PKT_IDX and reconducting the above count (distinct imsi)
  query, I got 2322 imsi rows which obviously are not the expected
distinct counts results. I used the explain grammar to observe the scan of
  the above select query and found that it definitely scaned over
the index table PKT_IDX. I then tried to conduct the following query with
  no count function:
  select distinct t.imsi from ranapsignal t where
  t.pkttime=140496480 and t.pkttime=140496569
 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
  t.ranapsubmsgtype=0 ;
 And the result is right 536 distinct imsi over scanning the index table.
  By the way, imsi is one of the primary key when creating the table
  RANAPSIGNAL.
 Here are several considerations for my trouble and practice:
 1. Did you guys ever practice such count (distinct) queries over phoenix
  table via secondary index?
 2. I am not sure whether this problem was due to the index table, but my
  

[jira] [Commented] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes

2014-08-31 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116908#comment-14116908
 ] 

James Taylor commented on PHOENIX-1203:
---

+1. Nice find, [~anoop.hbase]. Please check in to 3.0, 4.0, and master branch.

 Uable to work for count (distinct col) queries via phoenix table with 
 secondary indexes
 ---

 Key: PHOENIX-1203
 URL: https://issues.apache.org/jira/browse/PHOENIX-1203
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: hadoop-2.2.0   hbase: Version 0.98.3-hadoop2
Reporter: Sun Fulin
Assignee: Anoop Sam John
  Labels: distinct, secondaryIndex, test
 Attachments: PHOENIX-1203.patch


 I build the latest 4.1 rc0 from here:  
 https://github.com/apache/phoenix/releases
 And examine the count (distinct col) query within the new environment. 
 However, the problem still exists with index queries as the following while 
 the correct distinct query result is expected to be 536 for my project:
 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t 
 where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 
 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ;
 +-+
 | COUNT(IMSI) |
 +-+
 | 2322|
 +-+
 1 row selected (70.572 seconds)
 As James suggests, I conduct the query adding group by t.imsi with /without 
 secondary indexes. And the result seems to be fine as they both got the 
 correct 536 distinct groups. 
 Here are some considerations:
 1. count (distinct col) query over index table did not work as expectation.
 2. only distinct query over index table works fine.
 3. If the phoenix version got some wrong configuration, correct me.
 Thanks and Best Regards,
 Sun
 ---
 Hi Sun,
 Thanks for the detailed description. Yes, your syntax is correct, and
 it's definitely true that the count distinct query should return the
 same result with and without the index. Would you mind trying this on
 our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to
 file a JIRA?
  
 One thing that make make it easier for your testing: do you know about
 our NO_INDEX hint which forces the query *not* to use an index, like
 this:
  
 select /*+ NO_INDEX */ ...
  
 Another question too. What about this query with and with/out the index:
  
 select count(*) from ranapsignal t
 where t.pkttime=140496480 and t.pkttime=140496569
 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0
 and t.ranapsubmsgtype=0
 group by t.imsi;
  
 Thanks,
 James
  
 On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn
 su...@certusnet.com.cn wrote:
 
 
  Hi James,
 
Recently I got trouble while trying to conduct some query performance test
  in my phoenix tables with secondary indexes.
I created a table called RANAPSIGNAL for my projects in phoenix via
  sqlline client and load data into the table. Then I create
an index on the specific column PKTTIME for the table RANAPSIGNAL while
  including other more columns for adjusting my
index query, like the following DDL:
   create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include
  (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE );
 The index creation worked successfully without any errors. So, when I am
  trying to conduct such query as:
 select count (distinct t.imsi) from ranapsignal t where
  t.pkttime=140496480 and t.pkttime=140496569
 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
  t.ranapsubmsgtype=0 ;
 Without secondary indexes, the final result got 536 distinct imsi, wihch
  is the right distinct count results. However, after I  create the above
secondary index PKT_IDX and reconducting the above count (distinct imsi)
  query, I got 2322 imsi rows which obviously are not the expected
distinct counts results. I used the explain grammar to observe the scan of
  the above select query and found that it definitely scaned over
the index table PKT_IDX. I then tried to conduct the following query with
  no count function:
  select distinct t.imsi from ranapsignal t where
  t.pkttime=140496480 and t.pkttime=140496569
 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
  t.ranapsubmsgtype=0 ;
 And the result is right 536 distinct imsi over scanning the index table.
  By the way, imsi is one of the primary key when creating the table
  RANAPSIGNAL.
 Here are several considerations for my trouble and practice:
 1. Did you guys ever practice such count (distinct) queries over phoenix
  table via secondary index?
 2. I am not sure whether this problem was due to the 

[GitHub] phoenix pull request: Phoenix 180

2014-08-31 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/8#discussion_r16937281
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java
 ---
@@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
 //split each region in s splits such that:
 //s = max(x) where s * x  t
 //
-// The idea is to align splits with region boundaries. If rows are 
not evenly
-// distributed across regions, using this scheme compensates for 
regions that
-// have more rows than others, by applying tighter splits and 
therefore spawning
-// off more scans over the overloaded regions.
-int splitsPerRegion = getSplitsPerRegion(regions.size());
 // Create a multi-map of ServerName to ListKeyRange which we'll 
use to round robin from to ensure
 // that we keep each region server busy for each query.
-ListMultimapHRegionLocation,KeyRange keyRangesPerRegion = 
ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);;
+int splitsPerRegion = getSplitsPerRegion(regions.size());
+ListMultimapHRegionLocation,KeyRange keyRangesPerRegion = 
ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);
 if (splitsPerRegion == 1) {
 for (HRegionLocation region : regions) {
--- End diff --

One minor issue is the case where we have no column family. It's possible 
to not declare any key value column in a CREATE TABLE statement. In that case, 
the PTable will no column families. In that case, I'd suppose we could just 
store the guideposts on the PTable instead of the PColumnFamily.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (PHOENIX-1098) Support CASCADE option on DROP TABLE that drops all VIEWs

2014-08-31 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116922#comment-14116922
 ] 

James Taylor commented on PHOENIX-1098:
---

Looks fantastic, [~jfernando_sfdc]. Couple of minor items:
- For 3.0 patch, don't remove the old MetaDataProtocol method because then a 
3.1 client won't work with a 3.2 server. Instead, just leave it and have it 
call your new method with isCascade=false:
{code}
diff --git 
a/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataProtocol.java
 
b/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataProtocol.java
index a303b95..7bd2705 100644
--- 
a/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataProtocol.java
+++ 
b/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataProtocol.java
@@ -232,7 +232,7 @@ public void write(DataOutput output) throws IOException {
  * @return MetaDataMutationResult
  * @throws IOException
  */
-MetaDataMutationResult dropTable(ListMutation tableMetadata, String 
tableType) throws IOException;
+MetaDataMutationResult dropTable(ListMutation tableMetadata, String 
tableType, boolean isCascade) throws IOException;
{code}
- Make sure to re-base your local repo, as I checked in the fix for 
SuffixFilter and your patches are going to conflict with that change. 


 Support CASCADE option on DROP TABLE that drops all VIEWs
 -

 Key: PHOENIX-1098
 URL: https://issues.apache.org/jira/browse/PHOENIX-1098
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.1
Reporter: James Taylor
Assignee: Jan Fernando
 Fix For: 4.1

 Attachments: PHOENIX-1098-3.1.patch, PHOENIX-1098-4.1.patch, 
 PHOENIX-1098-master.patch


 It's inconvenient to have to manually drop all of the views of a multi-tenant 
 table before being able to drop the table. We should support a CASCADE option 
 on DROP TABLE which automatically does this, like this:
 DROP TABLE foo CASCADE



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-852) Optimize child/parent foreign key joins

2014-08-31 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116930#comment-14116930
 ] 

James Taylor commented on PHOENIX-852:
--

Looks good, [~maryannxue]. See my comment on PHOENIX-1220 - it'd be good to 
adjust your getSampleValue call to take that into account prior to checking 
this in. +1 after that.

 Optimize child/parent foreign key joins
 ---

 Key: PHOENIX-852
 URL: https://issues.apache.org/jira/browse/PHOENIX-852
 Project: Phoenix
  Issue Type: Improvement
Reporter: James Taylor
Assignee: Maryann Xue
 Attachments: 852-2.patch, 852-3.patch, 852.patch, PHOENIX-852.patch


 Often times a join will occur from a child to a parent. Our current algorithm 
 would do a full scan of one side or the other. We can do much better than 
 that if the HashCache contains the PK (or even part of the PK) from the table 
 being joined to. In these cases, we should drive the second scan through a 
 skip scan on the server side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1224) Dead loop in hbase scan when hint SKIP_SCAN is set and there is partial key match in RowValueConstructor

2014-08-31 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116932#comment-14116932
 ] 

James Taylor commented on PHOENIX-1224:
---

Would be good if this was fixed in conjunction with supporting a skip scan over 
a partial rvc match.

 Dead loop in hbase scan when hint SKIP_SCAN is set and there is partial key 
 match in RowValueConstructor
 

 Key: PHOENIX-1224
 URL: https://issues.apache.org/jira/browse/PHOENIX-1224
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.0.0, 4.0.0, 5.0.0
Reporter: Maryann Xue
Assignee: Kyle Buzsaki
   Original Estimate: 48h
  Remaining Estimate: 48h

 The below test will end up in dead loop in hbase scan.
 {code}
 @Test
 public void testForceSkipScan() throws Exception {
 String tempTableWithCompositePK = TEMP_TABLE_COMPOSITE_PK;
 Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
 Connection conn = DriverManager.getConnection(getUrl(), props);
 try {
 conn.createStatement().execute(CREATE TABLE  + 
 tempTableWithCompositePK 
 +(col0 INTEGER NOT NULL,  
 + col1 INTEGER NOT NULL,  
 + col2 INTEGER NOT NULL, 
 + col3 INTEGER 
 +CONSTRAINT pk PRIMARY KEY (col0, col1, col2))  
 +SALT_BUCKETS=4);
 
 PreparedStatement upsertStmt = conn.prepareStatement(
 upsert into  + tempTableWithCompositePK + (col0, col1, 
 col2, col3)  + values (?, ?, ?, ?));
 for (int i = 0; i  3; i++) {
 upsertStmt.setInt(1, i + 1);
 upsertStmt.setInt(2, i + 2);
 upsertStmt.setInt(3, i + 3);
 upsertStmt.setInt(4, i + 5);
 upsertStmt.execute();
 }
 conn.commit();
 
 String query = SELECT /*+ SKIP_SCAN*/ * FROM  + 
 tempTableWithCompositePK +  WHERE (col0, col1) in ((2, 3), (3, 4), (4, 5));
 PreparedStatement statement = conn.prepareStatement(query);
 ResultSet rs = statement.executeQuery();
 assertTrue(rs.next());
 assertEquals(rs.getInt(1), 2);
 assertEquals(rs.getInt(2), 3);
 assertEquals(rs.getInt(3), 4);
 assertEquals(rs.getInt(4), 6);
 assertTrue(rs.next());
 assertEquals(rs.getInt(1), 3);
 assertEquals(rs.getInt(2), 4);
 assertEquals(rs.getInt(3), 5);
 assertEquals(rs.getInt(4), 7);
 assertFalse(rs.next());
 } finally {
 conn.close();
 }
 }
 {code}
 The dead-loop thread:
 {panel}
 defaultRpcServer.handler=4,queue=0,port=58945 daemon prio=10 
 tid=0x7fe4d408c000 nid=0x7bba runnable [0x7fe4c10cf000]
java.lang.Thread.State: RUNNABLE
 at java.util.ArrayList.size(ArrayList.java:177)
 at java.util.AbstractList$Itr.hasNext(AbstractList.java:339)
 at 
 org.apache.hadoop.hbase.filter.FilterList.filterAllRemaining(FilterList.java:199)
 at 
 org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:263)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:469)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3937)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4017)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3885)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3876)
 at 
 org.apache.phoenix.coprocessor.ScanRegionObserver$2.nextRaw(ScanRegionObserver.java:366)
 at 
 org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:76)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3157)
 - locked 0x000778d5dbd8 (a 
 org.apache.phoenix.coprocessor.BaseScannerRegionObserver$1)
 at 
 org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
 at 
 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
 at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
 at 

[jira] [Created] (PHOENIX-1226) Exception in Tracing

2014-08-31 Thread Dan Di Spaltro (JIRA)
Dan Di Spaltro created PHOENIX-1226:
---

 Summary: Exception in Tracing
 Key: PHOENIX-1226
 URL: https://issues.apache.org/jira/browse/PHOENIX-1226
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.1
 Environment: 0.98.5 hbase, 4.1.0 phoenix
Reporter: Dan Di Spaltro


I was exposed to an exception in the tracing code, during my test setup of 
Phoenix in the following code:

{code}
58062 [defaultRpcServer.handler=2,queue=0,port=53950] WARN  
org.apache.hadoop.ipc.RpcServer  - 
defaultRpcServer.handler=2,queue=0,port=53950: caught: 
java.lang.IllegalArgumentException: offset (0) + length (4) exceed the capacity 
of the array: 3
at 
org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:600)
at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:749)
at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:725)
at 
org.apache.phoenix.trace.TracingCompat.readAnnotation(TracingCompat.java:56)
at 
org.apache.phoenix.trace.TraceMetricSource.receiveSpan(TraceMetricSource.java:121)
at org.cloudera.htrace.Tracer.deliver(Tracer.java:81)
at org.cloudera.htrace.impl.MilliSpan.stop(MilliSpan.java:70)
at org.cloudera.htrace.TraceScope.close(TraceScope.java:70)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
at java.lang.Thread.run(Thread.java:744)
{code}

It is related to the following line of code where we interpret all KV 
annotation values as byte-wise integers here:

https://github.com/apache/phoenix/blob/v4.1.0/phoenix-hadoop-compat/src/main/java/org/apache/phoenix/trace/TracingCompat.java#L56

Here is where HBase is adding a non-integer KV annotation:

https://github.com/apache/hbase/blob/0.98.5/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RequestContext.java#L105

The fix should be simple, but I am not aware of all the related issues in 
changing this.

cc [~jesse_yates], [~samarth.j...@gmail.com], [~giacomotaylor]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes

2014-08-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117013#comment-14117013
 ] 

Hudson commented on PHOENIX-1203:
-

SUCCESS: Integrated in Phoenix | 3.0 | Hadoop1 #201 (See 
[https://builds.apache.org/job/Phoenix-3.0-hadoop1/201/])
PHOENIX-1203 Uable to work for count (distinct col) queries via phoenix table 
with secondary indexes. (anoopsamjohn: rev 
b5cbb79b6cceecfe0ac012c8ddd19cef6916734e)
* phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeFactory.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/DistinctCountIT.java


 Uable to work for count (distinct col) queries via phoenix table with 
 secondary indexes
 ---

 Key: PHOENIX-1203
 URL: https://issues.apache.org/jira/browse/PHOENIX-1203
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.0.0
 Environment: hadoop-2.2.0   hbase: Version 0.98.3-hadoop2
Reporter: Sun Fulin
Assignee: Anoop Sam John
  Labels: distinct, secondaryIndex, test
 Fix For: 5.0.0, 4.2, 3.2

 Attachments: PHOENIX-1203.patch


 I build the latest 4.1 rc0 from here:  
 https://github.com/apache/phoenix/releases
 And examine the count (distinct col) query within the new environment. 
 However, the problem still exists with index queries as the following while 
 the correct distinct query result is expected to be 536 for my project:
 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t 
 where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 
 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ;
 +-+
 | COUNT(IMSI) |
 +-+
 | 2322|
 +-+
 1 row selected (70.572 seconds)
 As James suggests, I conduct the query adding group by t.imsi with /without 
 secondary indexes. And the result seems to be fine as they both got the 
 correct 536 distinct groups. 
 Here are some considerations:
 1. count (distinct col) query over index table did not work as expectation.
 2. only distinct query over index table works fine.
 3. If the phoenix version got some wrong configuration, correct me.
 Thanks and Best Regards,
 Sun
 ---
 Hi Sun,
 Thanks for the detailed description. Yes, your syntax is correct, and
 it's definitely true that the count distinct query should return the
 same result with and without the index. Would you mind trying this on
 our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to
 file a JIRA?
  
 One thing that make make it easier for your testing: do you know about
 our NO_INDEX hint which forces the query *not* to use an index, like
 this:
  
 select /*+ NO_INDEX */ ...
  
 Another question too. What about this query with and with/out the index:
  
 select count(*) from ranapsignal t
 where t.pkttime=140496480 and t.pkttime=140496569
 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0
 and t.ranapsubmsgtype=0
 group by t.imsi;
  
 Thanks,
 James
  
 On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn
 su...@certusnet.com.cn wrote:
 
 
  Hi James,
 
Recently I got trouble while trying to conduct some query performance test
  in my phoenix tables with secondary indexes.
I created a table called RANAPSIGNAL for my projects in phoenix via
  sqlline client and load data into the table. Then I create
an index on the specific column PKTTIME for the table RANAPSIGNAL while
  including other more columns for adjusting my
index query, like the following DDL:
   create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include
  (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE );
 The index creation worked successfully without any errors. So, when I am
  trying to conduct such query as:
 select count (distinct t.imsi) from ranapsignal t where
  t.pkttime=140496480 and t.pkttime=140496569
 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
  t.ranapsubmsgtype=0 ;
 Without secondary indexes, the final result got 536 distinct imsi, wihch
  is the right distinct count results. However, after I  create the above
secondary index PKT_IDX and reconducting the above count (distinct imsi)
  query, I got 2322 imsi rows which obviously are not the expected
distinct counts results. I used the explain grammar to observe the scan of
  the above select query and found that it definitely scaned over
the index table PKT_IDX. I then tried to conduct the following query with
  no count function:
  select distinct t.imsi from ranapsignal t where
  t.pkttime=140496480 and t.pkttime=140496569
 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
  t.ranapsubmsgtype=0 ;
 And the 

[GitHub] phoenix pull request: Phoenix 180

2014-08-31 Thread ramkrish86
Github user ramkrish86 commented on the pull request:

https://github.com/apache/phoenix/pull/8#issuecomment-54014739
  
@JamesRTaylor - Thanks for the review.  I will do my level best to complete 
all the comments.  Some of them have now become design changes because now we 
will be collecting based on CF rather than region name. So the guide posts will 
be a map with key as CF and byte[] (representing the guideposts). It was a long 
weekend here and so could not take this up.  Will post an updated patch ASAP 
may be in a day or two.  After that will do performance testing. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] phoenix pull request: Phoenix 180

2014-08-31 Thread JamesRTaylor
Github user JamesRTaylor commented on the pull request:

https://github.com/apache/phoenix/pull/8#issuecomment-54014952
  
Sounds good, Ram. Thanks for all your effort on this one. Looking forward 
to see the perf numbers.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---