[jira] [Commented] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes
[ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116853#comment-14116853 ] Anoop Sam John commented on PHOENIX-1203: - Got the issue. Patch coming in shortly. Uable to work for count (distinct col) queries via phoenix table with secondary indexes --- Key: PHOENIX-1203 URL: https://issues.apache.org/jira/browse/PHOENIX-1203 Project: Phoenix Issue Type: Bug Affects Versions: 4.0.0, 4.1 Environment: hadoop-2.2.0 hbase: Version 0.98.3-hadoop2 Reporter: Sun Fulin Assignee: Anoop Sam John Labels: distinct, secondaryIndex, test I build the latest 4.1 rc0 from here: https://github.com/apache/phoenix/releases And examine the count (distinct col) query within the new environment. However, the problem still exists with index queries as the following while the correct distinct query result is expected to be 536 for my project: 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; +-+ | COUNT(IMSI) | +-+ | 2322| +-+ 1 row selected (70.572 seconds) As James suggests, I conduct the query adding group by t.imsi with /without secondary indexes. And the result seems to be fine as they both got the correct 536 distinct groups. Here are some considerations: 1. count (distinct col) query over index table did not work as expectation. 2. only distinct query over index table works fine. 3. If the phoenix version got some wrong configuration, correct me. Thanks and Best Regards, Sun --- Hi Sun, Thanks for the detailed description. Yes, your syntax is correct, and it's definitely true that the count distinct query should return the same result with and without the index. Would you mind trying this on our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to file a JIRA? One thing that make make it easier for your testing: do you know about our NO_INDEX hint which forces the query *not* to use an index, like this: select /*+ NO_INDEX */ ... Another question too. What about this query with and with/out the index: select count(*) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 group by t.imsi; Thanks, James On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn su...@certusnet.com.cn wrote: Hi James, Recently I got trouble while trying to conduct some query performance test in my phoenix tables with secondary indexes. I created a table called RANAPSIGNAL for my projects in phoenix via sqlline client and load data into the table. Then I create an index on the specific column PKTTIME for the table RANAPSIGNAL while including other more columns for adjusting my index query, like the following DDL: create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE ); The index creation worked successfully without any errors. So, when I am trying to conduct such query as: select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; Without secondary indexes, the final result got 536 distinct imsi, wihch is the right distinct count results. However, after I create the above secondary index PKT_IDX and reconducting the above count (distinct imsi) query, I got 2322 imsi rows which obviously are not the expected distinct counts results. I used the explain grammar to observe the scan of the above select query and found that it definitely scaned over the index table PKT_IDX. I then tried to conduct the following query with no count function: select distinct t.imsi from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; And the result is right 536 distinct imsi over scanning the index table. By the way, imsi is one of the primary key when creating the table RANAPSIGNAL. Here are several considerations for my trouble and practice: 1. Did you guys ever practice such count (distinct) queries over phoenix table via secondary index? 2. I am not sure whether this problem was due to the index table, but my practice may assume that conclusion. 3. Corrects
[jira] [Updated] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes
[ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated PHOENIX-1203: Attachment: PHOENIX-1203.patch The function name for the DistinctCount function was still COUNT ! So when rewrite the query plan using the index table, the DistinctCountAggregateFunction is longer getting used but just CountAggregateFunction. Uable to work for count (distinct col) queries via phoenix table with secondary indexes --- Key: PHOENIX-1203 URL: https://issues.apache.org/jira/browse/PHOENIX-1203 Project: Phoenix Issue Type: Bug Affects Versions: 4.0.0, 4.1 Environment: hadoop-2.2.0 hbase: Version 0.98.3-hadoop2 Reporter: Sun Fulin Assignee: Anoop Sam John Labels: distinct, secondaryIndex, test Attachments: PHOENIX-1203.patch I build the latest 4.1 rc0 from here: https://github.com/apache/phoenix/releases And examine the count (distinct col) query within the new environment. However, the problem still exists with index queries as the following while the correct distinct query result is expected to be 536 for my project: 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; +-+ | COUNT(IMSI) | +-+ | 2322| +-+ 1 row selected (70.572 seconds) As James suggests, I conduct the query adding group by t.imsi with /without secondary indexes. And the result seems to be fine as they both got the correct 536 distinct groups. Here are some considerations: 1. count (distinct col) query over index table did not work as expectation. 2. only distinct query over index table works fine. 3. If the phoenix version got some wrong configuration, correct me. Thanks and Best Regards, Sun --- Hi Sun, Thanks for the detailed description. Yes, your syntax is correct, and it's definitely true that the count distinct query should return the same result with and without the index. Would you mind trying this on our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to file a JIRA? One thing that make make it easier for your testing: do you know about our NO_INDEX hint which forces the query *not* to use an index, like this: select /*+ NO_INDEX */ ... Another question too. What about this query with and with/out the index: select count(*) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 group by t.imsi; Thanks, James On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn su...@certusnet.com.cn wrote: Hi James, Recently I got trouble while trying to conduct some query performance test in my phoenix tables with secondary indexes. I created a table called RANAPSIGNAL for my projects in phoenix via sqlline client and load data into the table. Then I create an index on the specific column PKTTIME for the table RANAPSIGNAL while including other more columns for adjusting my index query, like the following DDL: create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE ); The index creation worked successfully without any errors. So, when I am trying to conduct such query as: select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; Without secondary indexes, the final result got 536 distinct imsi, wihch is the right distinct count results. However, after I create the above secondary index PKT_IDX and reconducting the above count (distinct imsi) query, I got 2322 imsi rows which obviously are not the expected distinct counts results. I used the explain grammar to observe the scan of the above select query and found that it definitely scaned over the index table PKT_IDX. I then tried to conduct the following query with no count function: select distinct t.imsi from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; And the result is right 536 distinct imsi over scanning the index table. By the way, imsi is one of the primary key when creating the table RANAPSIGNAL. Here are several considerations for my trouble and practice: 1. Did you guys ever practice such count
[jira] [Updated] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes
[ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated PHOENIX-1203: Affects Version/s: (was: 4.1) (was: 4.0.0) 3.0.0 Uable to work for count (distinct col) queries via phoenix table with secondary indexes --- Key: PHOENIX-1203 URL: https://issues.apache.org/jira/browse/PHOENIX-1203 Project: Phoenix Issue Type: Bug Affects Versions: 3.0.0 Environment: hadoop-2.2.0 hbase: Version 0.98.3-hadoop2 Reporter: Sun Fulin Assignee: Anoop Sam John Labels: distinct, secondaryIndex, test Attachments: PHOENIX-1203.patch I build the latest 4.1 rc0 from here: https://github.com/apache/phoenix/releases And examine the count (distinct col) query within the new environment. However, the problem still exists with index queries as the following while the correct distinct query result is expected to be 536 for my project: 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; +-+ | COUNT(IMSI) | +-+ | 2322| +-+ 1 row selected (70.572 seconds) As James suggests, I conduct the query adding group by t.imsi with /without secondary indexes. And the result seems to be fine as they both got the correct 536 distinct groups. Here are some considerations: 1. count (distinct col) query over index table did not work as expectation. 2. only distinct query over index table works fine. 3. If the phoenix version got some wrong configuration, correct me. Thanks and Best Regards, Sun --- Hi Sun, Thanks for the detailed description. Yes, your syntax is correct, and it's definitely true that the count distinct query should return the same result with and without the index. Would you mind trying this on our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to file a JIRA? One thing that make make it easier for your testing: do you know about our NO_INDEX hint which forces the query *not* to use an index, like this: select /*+ NO_INDEX */ ... Another question too. What about this query with and with/out the index: select count(*) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 group by t.imsi; Thanks, James On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn su...@certusnet.com.cn wrote: Hi James, Recently I got trouble while trying to conduct some query performance test in my phoenix tables with secondary indexes. I created a table called RANAPSIGNAL for my projects in phoenix via sqlline client and load data into the table. Then I create an index on the specific column PKTTIME for the table RANAPSIGNAL while including other more columns for adjusting my index query, like the following DDL: create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE ); The index creation worked successfully without any errors. So, when I am trying to conduct such query as: select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; Without secondary indexes, the final result got 536 distinct imsi, wihch is the right distinct count results. However, after I create the above secondary index PKT_IDX and reconducting the above count (distinct imsi) query, I got 2322 imsi rows which obviously are not the expected distinct counts results. I used the explain grammar to observe the scan of the above select query and found that it definitely scaned over the index table PKT_IDX. I then tried to conduct the following query with no count function: select distinct t.imsi from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; And the result is right 536 distinct imsi over scanning the index table. By the way, imsi is one of the primary key when creating the table RANAPSIGNAL. Here are several considerations for my trouble and practice: 1. Did you guys ever practice such count (distinct) queries over phoenix table via secondary index? 2. I am not sure whether this problem was due to the index table, but my
[jira] [Commented] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes
[ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116908#comment-14116908 ] James Taylor commented on PHOENIX-1203: --- +1. Nice find, [~anoop.hbase]. Please check in to 3.0, 4.0, and master branch. Uable to work for count (distinct col) queries via phoenix table with secondary indexes --- Key: PHOENIX-1203 URL: https://issues.apache.org/jira/browse/PHOENIX-1203 Project: Phoenix Issue Type: Bug Affects Versions: 3.0.0 Environment: hadoop-2.2.0 hbase: Version 0.98.3-hadoop2 Reporter: Sun Fulin Assignee: Anoop Sam John Labels: distinct, secondaryIndex, test Attachments: PHOENIX-1203.patch I build the latest 4.1 rc0 from here: https://github.com/apache/phoenix/releases And examine the count (distinct col) query within the new environment. However, the problem still exists with index queries as the following while the correct distinct query result is expected to be 536 for my project: 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; +-+ | COUNT(IMSI) | +-+ | 2322| +-+ 1 row selected (70.572 seconds) As James suggests, I conduct the query adding group by t.imsi with /without secondary indexes. And the result seems to be fine as they both got the correct 536 distinct groups. Here are some considerations: 1. count (distinct col) query over index table did not work as expectation. 2. only distinct query over index table works fine. 3. If the phoenix version got some wrong configuration, correct me. Thanks and Best Regards, Sun --- Hi Sun, Thanks for the detailed description. Yes, your syntax is correct, and it's definitely true that the count distinct query should return the same result with and without the index. Would you mind trying this on our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to file a JIRA? One thing that make make it easier for your testing: do you know about our NO_INDEX hint which forces the query *not* to use an index, like this: select /*+ NO_INDEX */ ... Another question too. What about this query with and with/out the index: select count(*) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 group by t.imsi; Thanks, James On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn su...@certusnet.com.cn wrote: Hi James, Recently I got trouble while trying to conduct some query performance test in my phoenix tables with secondary indexes. I created a table called RANAPSIGNAL for my projects in phoenix via sqlline client and load data into the table. Then I create an index on the specific column PKTTIME for the table RANAPSIGNAL while including other more columns for adjusting my index query, like the following DDL: create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE ); The index creation worked successfully without any errors. So, when I am trying to conduct such query as: select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; Without secondary indexes, the final result got 536 distinct imsi, wihch is the right distinct count results. However, after I create the above secondary index PKT_IDX and reconducting the above count (distinct imsi) query, I got 2322 imsi rows which obviously are not the expected distinct counts results. I used the explain grammar to observe the scan of the above select query and found that it definitely scaned over the index table PKT_IDX. I then tried to conduct the following query with no count function: select distinct t.imsi from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; And the result is right 536 distinct imsi over scanning the index table. By the way, imsi is one of the primary key when creating the table RANAPSIGNAL. Here are several considerations for my trouble and practice: 1. Did you guys ever practice such count (distinct) queries over phoenix table via secondary index? 2. I am not sure whether this problem was due to the
[GitHub] phoenix pull request: Phoenix 180
Github user JamesRTaylor commented on a diff in the pull request: https://github.com/apache/phoenix/pull/8#discussion_r16937281 --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java --- @@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) { //split each region in s splits such that: //s = max(x) where s * x t // -// The idea is to align splits with region boundaries. If rows are not evenly -// distributed across regions, using this scheme compensates for regions that -// have more rows than others, by applying tighter splits and therefore spawning -// off more scans over the overloaded regions. -int splitsPerRegion = getSplitsPerRegion(regions.size()); // Create a multi-map of ServerName to ListKeyRange which we'll use to round robin from to ensure // that we keep each region server busy for each query. -ListMultimapHRegionLocation,KeyRange keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion);; +int splitsPerRegion = getSplitsPerRegion(regions.size()); +ListMultimapHRegionLocation,KeyRange keyRangesPerRegion = ArrayListMultimap.create(regions.size(),regions.size() * splitsPerRegion); if (splitsPerRegion == 1) { for (HRegionLocation region : regions) { --- End diff -- One minor issue is the case where we have no column family. It's possible to not declare any key value column in a CREATE TABLE statement. In that case, the PTable will no column families. In that case, I'd suppose we could just store the guideposts on the PTable instead of the PColumnFamily. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (PHOENIX-1098) Support CASCADE option on DROP TABLE that drops all VIEWs
[ https://issues.apache.org/jira/browse/PHOENIX-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116922#comment-14116922 ] James Taylor commented on PHOENIX-1098: --- Looks fantastic, [~jfernando_sfdc]. Couple of minor items: - For 3.0 patch, don't remove the old MetaDataProtocol method because then a 3.1 client won't work with a 3.2 server. Instead, just leave it and have it call your new method with isCascade=false: {code} diff --git a/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataProtocol.java b/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataProtocol.java index a303b95..7bd2705 100644 --- a/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataProtocol.java +++ b/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataProtocol.java @@ -232,7 +232,7 @@ public void write(DataOutput output) throws IOException { * @return MetaDataMutationResult * @throws IOException */ -MetaDataMutationResult dropTable(ListMutation tableMetadata, String tableType) throws IOException; +MetaDataMutationResult dropTable(ListMutation tableMetadata, String tableType, boolean isCascade) throws IOException; {code} - Make sure to re-base your local repo, as I checked in the fix for SuffixFilter and your patches are going to conflict with that change. Support CASCADE option on DROP TABLE that drops all VIEWs - Key: PHOENIX-1098 URL: https://issues.apache.org/jira/browse/PHOENIX-1098 Project: Phoenix Issue Type: Bug Affects Versions: 4.1 Reporter: James Taylor Assignee: Jan Fernando Fix For: 4.1 Attachments: PHOENIX-1098-3.1.patch, PHOENIX-1098-4.1.patch, PHOENIX-1098-master.patch It's inconvenient to have to manually drop all of the views of a multi-tenant table before being able to drop the table. We should support a CASCADE option on DROP TABLE which automatically does this, like this: DROP TABLE foo CASCADE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-852) Optimize child/parent foreign key joins
[ https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116930#comment-14116930 ] James Taylor commented on PHOENIX-852: -- Looks good, [~maryannxue]. See my comment on PHOENIX-1220 - it'd be good to adjust your getSampleValue call to take that into account prior to checking this in. +1 after that. Optimize child/parent foreign key joins --- Key: PHOENIX-852 URL: https://issues.apache.org/jira/browse/PHOENIX-852 Project: Phoenix Issue Type: Improvement Reporter: James Taylor Assignee: Maryann Xue Attachments: 852-2.patch, 852-3.patch, 852.patch, PHOENIX-852.patch Often times a join will occur from a child to a parent. Our current algorithm would do a full scan of one side or the other. We can do much better than that if the HashCache contains the PK (or even part of the PK) from the table being joined to. In these cases, we should drive the second scan through a skip scan on the server side. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-1224) Dead loop in hbase scan when hint SKIP_SCAN is set and there is partial key match in RowValueConstructor
[ https://issues.apache.org/jira/browse/PHOENIX-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14116932#comment-14116932 ] James Taylor commented on PHOENIX-1224: --- Would be good if this was fixed in conjunction with supporting a skip scan over a partial rvc match. Dead loop in hbase scan when hint SKIP_SCAN is set and there is partial key match in RowValueConstructor Key: PHOENIX-1224 URL: https://issues.apache.org/jira/browse/PHOENIX-1224 Project: Phoenix Issue Type: Bug Affects Versions: 3.0.0, 4.0.0, 5.0.0 Reporter: Maryann Xue Assignee: Kyle Buzsaki Original Estimate: 48h Remaining Estimate: 48h The below test will end up in dead loop in hbase scan. {code} @Test public void testForceSkipScan() throws Exception { String tempTableWithCompositePK = TEMP_TABLE_COMPOSITE_PK; Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES); Connection conn = DriverManager.getConnection(getUrl(), props); try { conn.createStatement().execute(CREATE TABLE + tempTableWithCompositePK +(col0 INTEGER NOT NULL, + col1 INTEGER NOT NULL, + col2 INTEGER NOT NULL, + col3 INTEGER +CONSTRAINT pk PRIMARY KEY (col0, col1, col2)) +SALT_BUCKETS=4); PreparedStatement upsertStmt = conn.prepareStatement( upsert into + tempTableWithCompositePK + (col0, col1, col2, col3) + values (?, ?, ?, ?)); for (int i = 0; i 3; i++) { upsertStmt.setInt(1, i + 1); upsertStmt.setInt(2, i + 2); upsertStmt.setInt(3, i + 3); upsertStmt.setInt(4, i + 5); upsertStmt.execute(); } conn.commit(); String query = SELECT /*+ SKIP_SCAN*/ * FROM + tempTableWithCompositePK + WHERE (col0, col1) in ((2, 3), (3, 4), (4, 5)); PreparedStatement statement = conn.prepareStatement(query); ResultSet rs = statement.executeQuery(); assertTrue(rs.next()); assertEquals(rs.getInt(1), 2); assertEquals(rs.getInt(2), 3); assertEquals(rs.getInt(3), 4); assertEquals(rs.getInt(4), 6); assertTrue(rs.next()); assertEquals(rs.getInt(1), 3); assertEquals(rs.getInt(2), 4); assertEquals(rs.getInt(3), 5); assertEquals(rs.getInt(4), 7); assertFalse(rs.next()); } finally { conn.close(); } } {code} The dead-loop thread: {panel} defaultRpcServer.handler=4,queue=0,port=58945 daemon prio=10 tid=0x7fe4d408c000 nid=0x7bba runnable [0x7fe4c10cf000] java.lang.Thread.State: RUNNABLE at java.util.ArrayList.size(ArrayList.java:177) at java.util.AbstractList$Itr.hasNext(AbstractList.java:339) at org.apache.hadoop.hbase.filter.FilterList.filterAllRemaining(FilterList.java:199) at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:263) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:469) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3937) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:4017) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3885) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3876) at org.apache.phoenix.coprocessor.ScanRegionObserver$2.nextRaw(ScanRegionObserver.java:366) at org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:76) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3157) - locked 0x000778d5dbd8 (a org.apache.phoenix.coprocessor.BaseScannerRegionObserver$1) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98) at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114) at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94) at
[jira] [Created] (PHOENIX-1226) Exception in Tracing
Dan Di Spaltro created PHOENIX-1226: --- Summary: Exception in Tracing Key: PHOENIX-1226 URL: https://issues.apache.org/jira/browse/PHOENIX-1226 Project: Phoenix Issue Type: Bug Affects Versions: 4.1 Environment: 0.98.5 hbase, 4.1.0 phoenix Reporter: Dan Di Spaltro I was exposed to an exception in the tracing code, during my test setup of Phoenix in the following code: {code} 58062 [defaultRpcServer.handler=2,queue=0,port=53950] WARN org.apache.hadoop.ipc.RpcServer - defaultRpcServer.handler=2,queue=0,port=53950: caught: java.lang.IllegalArgumentException: offset (0) + length (4) exceed the capacity of the array: 3 at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:600) at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:749) at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:725) at org.apache.phoenix.trace.TracingCompat.readAnnotation(TracingCompat.java:56) at org.apache.phoenix.trace.TraceMetricSource.receiveSpan(TraceMetricSource.java:121) at org.cloudera.htrace.Tracer.deliver(Tracer.java:81) at org.cloudera.htrace.impl.MilliSpan.stop(MilliSpan.java:70) at org.cloudera.htrace.TraceScope.close(TraceScope.java:70) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:106) at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114) at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94) at java.lang.Thread.run(Thread.java:744) {code} It is related to the following line of code where we interpret all KV annotation values as byte-wise integers here: https://github.com/apache/phoenix/blob/v4.1.0/phoenix-hadoop-compat/src/main/java/org/apache/phoenix/trace/TracingCompat.java#L56 Here is where HBase is adding a non-integer KV annotation: https://github.com/apache/hbase/blob/0.98.5/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/RequestContext.java#L105 The fix should be simple, but I am not aware of all the related issues in changing this. cc [~jesse_yates], [~samarth.j...@gmail.com], [~giacomotaylor] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes
[ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14117013#comment-14117013 ] Hudson commented on PHOENIX-1203: - SUCCESS: Integrated in Phoenix | 3.0 | Hadoop1 #201 (See [https://builds.apache.org/job/Phoenix-3.0-hadoop1/201/]) PHOENIX-1203 Uable to work for count (distinct col) queries via phoenix table with secondary indexes. (anoopsamjohn: rev b5cbb79b6cceecfe0ac012c8ddd19cef6916734e) * phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeFactory.java * phoenix-core/src/it/java/org/apache/phoenix/end2end/DistinctCountIT.java Uable to work for count (distinct col) queries via phoenix table with secondary indexes --- Key: PHOENIX-1203 URL: https://issues.apache.org/jira/browse/PHOENIX-1203 Project: Phoenix Issue Type: Bug Affects Versions: 3.0.0 Environment: hadoop-2.2.0 hbase: Version 0.98.3-hadoop2 Reporter: Sun Fulin Assignee: Anoop Sam John Labels: distinct, secondaryIndex, test Fix For: 5.0.0, 4.2, 3.2 Attachments: PHOENIX-1203.patch I build the latest 4.1 rc0 from here: https://github.com/apache/phoenix/releases And examine the count (distinct col) query within the new environment. However, the problem still exists with index queries as the following while the correct distinct query result is expected to be 536 for my project: 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; +-+ | COUNT(IMSI) | +-+ | 2322| +-+ 1 row selected (70.572 seconds) As James suggests, I conduct the query adding group by t.imsi with /without secondary indexes. And the result seems to be fine as they both got the correct 536 distinct groups. Here are some considerations: 1. count (distinct col) query over index table did not work as expectation. 2. only distinct query over index table works fine. 3. If the phoenix version got some wrong configuration, correct me. Thanks and Best Regards, Sun --- Hi Sun, Thanks for the detailed description. Yes, your syntax is correct, and it's definitely true that the count distinct query should return the same result with and without the index. Would you mind trying this on our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to file a JIRA? One thing that make make it easier for your testing: do you know about our NO_INDEX hint which forces the query *not* to use an index, like this: select /*+ NO_INDEX */ ... Another question too. What about this query with and with/out the index: select count(*) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 group by t.imsi; Thanks, James On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn su...@certusnet.com.cn wrote: Hi James, Recently I got trouble while trying to conduct some query performance test in my phoenix tables with secondary indexes. I created a table called RANAPSIGNAL for my projects in phoenix via sqlline client and load data into the table. Then I create an index on the specific column PKTTIME for the table RANAPSIGNAL while including other more columns for adjusting my index query, like the following DDL: create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE ); The index creation worked successfully without any errors. So, when I am trying to conduct such query as: select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; Without secondary indexes, the final result got 536 distinct imsi, wihch is the right distinct count results. However, after I create the above secondary index PKT_IDX and reconducting the above count (distinct imsi) query, I got 2322 imsi rows which obviously are not the expected distinct counts results. I used the explain grammar to observe the scan of the above select query and found that it definitely scaned over the index table PKT_IDX. I then tried to conduct the following query with no count function: select distinct t.imsi from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; And the
[GitHub] phoenix pull request: Phoenix 180
Github user ramkrish86 commented on the pull request: https://github.com/apache/phoenix/pull/8#issuecomment-54014739 @JamesRTaylor - Thanks for the review. I will do my level best to complete all the comments. Some of them have now become design changes because now we will be collecting based on CF rather than region name. So the guide posts will be a map with key as CF and byte[] (representing the guideposts). It was a long weekend here and so could not take this up. Will post an updated patch ASAP may be in a day or two. After that will do performance testing. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] phoenix pull request: Phoenix 180
Github user JamesRTaylor commented on the pull request: https://github.com/apache/phoenix/pull/8#issuecomment-54014952 Sounds good, Ram. Thanks for all your effort on this one. Looking forward to see the perf numbers. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---