[jira] [Commented] (PHOENIX-1550) Freeze and thaw Phoenix tables
[ https://issues.apache.org/jira/browse/PHOENIX-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379397#comment-14379397 ] Sun Fulin commented on PHOENIX-1550: [~maghamraviki...@gmail.com] Any progress ? Freeze and thaw Phoenix tables -- Key: PHOENIX-1550 URL: https://issues.apache.org/jira/browse/PHOENIX-1550 Project: Phoenix Issue Type: New Feature Reporter: Andrew Purtell HBase snapshots provide a way to freeze the state of a table and then later restore that state from the snapshot or clone it into a new table. This can be very convenient for a wide range of use cases. However, we can't apply this technique directly to Phoenix tables because Phoenix generates and uses more state in more locations than a basic HBase table - a Phoenix table may have one or more secondary indexes, global or local, schema in the system catalog, sequence state, statistics. This issue is about considering a freeze and thaw feature for Phoenix tables, perhaps based on HBase snapshots. A freeze operation would create a consistent point in time snapshot of a Phoenix table and all related index data and metadata. A thaw operation would restore frozen state either by overwriting the table or by cloning the state into a new table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (PHOENIX-1550) Freeze and thaw Phoenix tables
[ https://issues.apache.org/jira/browse/PHOENIX-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362752#comment-14362752 ] Sun Fulin edited comment on PHOENIX-1550 at 3/16/15 6:15 AM: - [~jamestaylor] - The priority lies that snapshot scan can improve aggregation work performance greatly as provided through HBASE-8369. We had integrated spark with hbase and had tested a lot for hbase scan VS snapshot, which had got results for that snapshot upgrade performance a lot in use cases of some hourly, daily, weekly and monthly aggregation work. We would prefer more like [~maghamraviki...@gmail.com] designed for mapreduce support, e.g. issuing more like {{PhoenixSnapshotInputFormat}}. Thus we believe can achieve our goal of resolving some kind of phoenix full table scan performance issues. Sorry for some misunderstanding about my comments. was (Author: sunfl): [~jamestaylor] - The priority lies that snapshot scan can improve aggregation work performance greatly as provided through HBASE-8369. We had integrated spark with hbase and had tested a lot for hbase scan VS snapshot, which had got results for that snapshot upgrade performance a lot in use cases of some hourly, daily, weekly and monthly aggregation work. We would prefer more like [~maghamraviki...@gmail.com] designed for mapreduce support, e.g. issuing more like PhoenixSnapshotInputFormat. Thus we believe can achieve our goal of resolving some kind of phoenix full table scan performance issues. Sorry for some misunderstanding about my comments. Freeze and thaw Phoenix tables -- Key: PHOENIX-1550 URL: https://issues.apache.org/jira/browse/PHOENIX-1550 Project: Phoenix Issue Type: New Feature Reporter: Andrew Purtell HBase snapshots provide a way to freeze the state of a table and then later restore that state from the snapshot or clone it into a new table. This can be very convenient for a wide range of use cases. However, we can't apply this technique directly to Phoenix tables because Phoenix generates and uses more state in more locations than a basic HBase table - a Phoenix table may have one or more secondary indexes, global or local, schema in the system catalog, sequence state, statistics. This issue is about considering a freeze and thaw feature for Phoenix tables, perhaps based on HBase snapshots. A freeze operation would create a consistent point in time snapshot of a Phoenix table and all related index data and metadata. A thaw operation would restore frozen state either by overwriting the table or by cloning the state into a new table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-1550) Freeze and thaw Phoenix tables
[ https://issues.apache.org/jira/browse/PHOENIX-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362752#comment-14362752 ] Sun Fulin commented on PHOENIX-1550: [~jamestaylor] - The priority lies that snapshot scan can improve aggregation work performance greatly as provided through HBASE-8369. We had integrated spark with hbase and had tested a lot for hbase scan VS snapshot, which had got results for that snapshot upgrade performance a lot in use cases of some hourly, daily, weekly and monthly aggregation work. We would prefer more like [~maghamraviki...@gmail.com] designed for mapreduce support, e.g. issuing more like PhoenixSnapshotInputFormat. Thus we believe can achieve our goal of resolving some kind of phoenix full table scan performance issues. Sorry for some misunderstanding about my comments. Freeze and thaw Phoenix tables -- Key: PHOENIX-1550 URL: https://issues.apache.org/jira/browse/PHOENIX-1550 Project: Phoenix Issue Type: New Feature Reporter: Andrew Purtell HBase snapshots provide a way to freeze the state of a table and then later restore that state from the snapshot or clone it into a new table. This can be very convenient for a wide range of use cases. However, we can't apply this technique directly to Phoenix tables because Phoenix generates and uses more state in more locations than a basic HBase table - a Phoenix table may have one or more secondary indexes, global or local, schema in the system catalog, sequence state, statistics. This issue is about considering a freeze and thaw feature for Phoenix tables, perhaps based on HBase snapshots. A freeze operation would create a consistent point in time snapshot of a Phoenix table and all related index data and metadata. A thaw operation would restore frozen state either by overwriting the table or by cloning the state into a new table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-1248) CsvBulkLoadTool is failing with IAE when local index specified for --index-table parameter
[ https://issues.apache.org/jira/browse/PHOENIX-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14341173#comment-14341173 ] Sun Fulin commented on PHOENIX-1248: [~apurtell] Would these changes affect a lot for Phoenix 4.3 release to install upon 0.98.4-0.98.6 hbase cluster? Cause I got 3 major obstacles when trying to deploy and utilize the latest 4.3 release. 1. Phoenix 4.3 release shall not be compiled if we specify Dhbase.version=0.98.6-cdh5.3.0 2. CsvBulkloadTool would not function well under the current hbase 0.98.6-cdh5.3.0 version, the full stack trace can be referenced for above commemt of [~nagav] 3. Local index not work with current hbase 0.98.6 version. [~jamestaylor] Can the above incompatibility issues be smoothly resolved without upgrading our current hbase version? Also, I deployed the latest hbase 1.0.0 version and install phoenix with that, problems occure when starting the hbase cluster. Exception goes to : The interface of EnvironmentEdgeManager changed between 0.98.9 and 1.0.0 hbase version. Do you think to significantly specify these incompatibility issues offically or there can be other solution to resolve these? 2015-02-27 16:52:27,096 FATAL [hmaster:16020.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown. java.lang.NoSuchMethodError: org.apache.hadoop.hbase.util.EnvironmentEdgeManager.currentTimeMillis()J at org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer.clinit(IndexLoadBalancer.java:84) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929) at org.apache.hadoop.hbase.master.balancer.LoadBalancerFactory.getLoadBalancer(LoadBalancerFactory.java:40) at org.apache.hadoop.hbase.master.HMaster.initializeZKBasedSystemTrackers(HMaster.java:519) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:613) at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:165) at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1425) at java.lang.Thread.run(Thread.java:744) CsvBulkLoadTool is failing with IAE when local index specified for --index-table parameter -- Key: PHOENIX-1248 URL: https://issues.apache.org/jira/browse/PHOENIX-1248 Project: Phoenix Issue Type: Bug Affects Versions: 4.1 Reporter: rajeshbabu Assignee: Gabriel Reid Fix For: 5.0.0, 4.3 Attachments: PHOENIX-1248.patch, PHOENIX-1248b.patch, PHOENIX-1248c.patch, p1248_0.log {code} 14/09/11 22:38:11 INFO mapreduce.HFileOutputFormat2: Writing partition information to /tmp/partitions_81e2ee23-836c-4eae-b413-ac6bd6a96623 14/09/11 22:38:11 ERROR mapreduce.CsvBulkLoadTool: Import job on table=TEST_IDX failed due to exception:java.lang.IllegalArgumentException: No regions passed 14/09/11 22:38:11 INFO client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x148123145090287 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-1248) CsvBulkLoadTool is failing with IAE when local index specified for --index-table parameter
[ https://issues.apache.org/jira/browse/PHOENIX-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14341228#comment-14341228 ] Sun Fulin commented on PHOENIX-1248: [~apurtell] Thanks for the guidance. bq. What's the compile issue? MVN Command: {{mvn clean package -DskipTests -Dhadoop.version=2.5.0-cdh5.3.0 -Dhbase.version=0.98.6-cdh5.3.0}} Error message: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.0:compile (default-compile) on project phoenix-core: Compilation failure [ERROR] /home/soft/phoenix-4.3.0-src/phoenix-core/src/main/java/org/apache/hadoop/hbase/regionserver/LocalIndexSplitter.java:[89,23] cannot find symbol [ERROR] symbol: variable useZKForAssignment [ERROR] location: variable st of type org.apache.hadoop.hbase.regionserver.SplitTransaction bq. You're going to want to upgrade to 0.98.10.1 if using local mutable indexes, see PHOENIX-1634 Got that, shall make a try. CsvBulkLoadTool is failing with IAE when local index specified for --index-table parameter -- Key: PHOENIX-1248 URL: https://issues.apache.org/jira/browse/PHOENIX-1248 Project: Phoenix Issue Type: Bug Affects Versions: 4.1 Reporter: rajeshbabu Assignee: Gabriel Reid Fix For: 5.0.0, 4.3 Attachments: PHOENIX-1248.patch, PHOENIX-1248b.patch, PHOENIX-1248c.patch, p1248_0.log {code} 14/09/11 22:38:11 INFO mapreduce.HFileOutputFormat2: Writing partition information to /tmp/partitions_81e2ee23-836c-4eae-b413-ac6bd6a96623 14/09/11 22:38:11 ERROR mapreduce.CsvBulkLoadTool: Import job on table=TEST_IDX failed due to exception:java.lang.IllegalArgumentException: No regions passed 14/09/11 22:38:11 INFO client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x148123145090287 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-1550) Freeze and thaw Phoenix tables
[ https://issues.apache.org/jira/browse/PHOENIX-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315876#comment-14315876 ] Sun Fulin commented on PHOENIX-1550: Any progress on this issue ? [~apurtell] Freeze and thaw Phoenix tables -- Key: PHOENIX-1550 URL: https://issues.apache.org/jira/browse/PHOENIX-1550 Project: Phoenix Issue Type: New Feature Reporter: Andrew Purtell HBase snapshots provide a way to freeze the state of a table and then later restore that state from the snapshot or clone it into a new table. This can be very convenient for a wide range of use cases. However, we can't apply this technique directly to Phoenix tables because Phoenix generates and uses more state in more locations than a basic HBase table - a Phoenix table may have one or more secondary indexes, global or local, schema in the system catalog, sequence state, statistics. This issue is about considering a freeze and thaw feature for Phoenix tables, perhaps based on HBase snapshots. A freeze operation would create a consistent point in time snapshot of a Phoenix table and all related index data and metadata. A thaw operation would restore frozen state either by overwriting the table or by cloning the state into a new table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PHOENIX-1484) Index creation failed due to specifying DEFAULT_COLUMN_FAMILY option
Sun Fulin created PHOENIX-1484: -- Summary: Index creation failed due to specifying DEFAULT_COLUMN_FAMILY option Key: PHOENIX-1484 URL: https://issues.apache.org/jira/browse/PHOENIX-1484 Project: Phoenix Issue Type: Bug Affects Versions: 4.2 Environment: HBase 0.98.6-cdh5.2.0 Reporter: Sun Fulin I tryied again with phoenix 4.1 version and still got the exception. Following is my statements: 1 create table example (id integer not null,fn varchar,ln varchar constraint pk primary key(id)) DEFAULT_COLUMN_FAMILY='F'; Notice that I would like to create default mutable index for the table. 2 create local index my_idx on example (fn) DEFAULT_COLUMN_FAMILY='F'; Here no matter with or without specifying DEFAULT_COLUMN_FAMILY options we would always got the following error: Error: ERROR 1009 (42L02): Properties may not be defined for a view. (state=42L02,code=1009) 3 I also tried to create local index with immutable rows setting true, however, I still got the above error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-1249) Support local immutable index
[ https://issues.apache.org/jira/browse/PHOENIX-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163150#comment-14163150 ] Sun Fulin commented on PHOENIX-1249: [~jamestaylor] Yes. Sorry for not giving response in time. The patch indeed improves our performance issues. Thanks for great work, [~rajesh23]. Support local immutable index -- Key: PHOENIX-1249 URL: https://issues.apache.org/jira/browse/PHOENIX-1249 Project: Phoenix Issue Type: Sub-task Affects Versions: 4.1 Environment: Hbase 0.98.4-Hadoop2 Phoenix 4.1 Reporter: Sun Fulin Assignee: rajeshbabu Fix For: 5.0.0, 4.2 Attachments: PHOENIX-1249.patch, PHOENIX-1249_v2.patch, PHOENIX-1249_v3.patch Currently local indexing are forced created as default mutable index which requires index maintenance and server side processing, while immutable indexes are appropriate for write-once/append-only use case that may fit for many use cases. Hope some work for capability to create local index as immutable index. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-1248) CsvBulkLoadTool is failing with IAE when local index specified for --index-table parameter
[ https://issues.apache.org/jira/browse/PHOENIX-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131148#comment-14131148 ] Sun Fulin commented on PHOENIX-1248: [~rajesh23] Yes. Rajeshbabu said that CSVBulkloader via mapreduce has a little problem when dealing with local index related data bulkloading. CsvBulkLoadTool is failing with IAE when local index specified for --index-table parameter -- Key: PHOENIX-1248 URL: https://issues.apache.org/jira/browse/PHOENIX-1248 Project: Phoenix Issue Type: Bug Affects Versions: 4.1 Reporter: rajeshbabu Assignee: rajeshbabu Fix For: 5.0.0, 4.2 {code} 14/09/11 22:38:11 INFO mapreduce.HFileOutputFormat2: Writing partition information to /tmp/partitions_81e2ee23-836c-4eae-b413-ac6bd6a96623 14/09/11 22:38:11 ERROR mapreduce.CsvBulkLoadTool: Import job on table=TEST_IDX failed due to exception:java.lang.IllegalArgumentException: No regions passed 14/09/11 22:38:11 INFO client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x148123145090287 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes
[ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14114890#comment-14114890 ] Sun Fulin commented on PHOENIX-1203: [~anoop.hbase] Any progress about the test? Uable to work for count (distinct col) queries via phoenix table with secondary indexes --- Key: PHOENIX-1203 URL: https://issues.apache.org/jira/browse/PHOENIX-1203 Project: Phoenix Issue Type: Bug Affects Versions: 4.0.0, 4.1 Environment: hadoop-2.2.0 hbase: Version 0.98.3-hadoop2 Reporter: Sun Fulin Assignee: Anoop Sam John Labels: distinct, secondaryIndex, test I build the latest 4.1 rc0 from here: https://github.com/apache/phoenix/releases And examine the count (distinct col) query within the new environment. However, the problem still exists with index queries as the following while the correct distinct query result is expected to be 536 for my project: 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; +-+ | COUNT(IMSI) | +-+ | 2322| +-+ 1 row selected (70.572 seconds) As James suggests, I conduct the query adding group by t.imsi with /without secondary indexes. And the result seems to be fine as they both got the correct 536 distinct groups. Here are some considerations: 1. count (distinct col) query over index table did not work as expectation. 2. only distinct query over index table works fine. 3. If the phoenix version got some wrong configuration, correct me. Thanks and Best Regards, Sun --- Hi Sun, Thanks for the detailed description. Yes, your syntax is correct, and it's definitely true that the count distinct query should return the same result with and without the index. Would you mind trying this on our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to file a JIRA? One thing that make make it easier for your testing: do you know about our NO_INDEX hint which forces the query *not* to use an index, like this: select /*+ NO_INDEX */ ... Another question too. What about this query with and with/out the index: select count(*) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 group by t.imsi; Thanks, James On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn su...@certusnet.com.cn wrote: Hi James, Recently I got trouble while trying to conduct some query performance test in my phoenix tables with secondary indexes. I created a table called RANAPSIGNAL for my projects in phoenix via sqlline client and load data into the table. Then I create an index on the specific column PKTTIME for the table RANAPSIGNAL while including other more columns for adjusting my index query, like the following DDL: create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE ); The index creation worked successfully without any errors. So, when I am trying to conduct such query as: select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; Without secondary indexes, the final result got 536 distinct imsi, wihch is the right distinct count results. However, after I create the above secondary index PKT_IDX and reconducting the above count (distinct imsi) query, I got 2322 imsi rows which obviously are not the expected distinct counts results. I used the explain grammar to observe the scan of the above select query and found that it definitely scaned over the index table PKT_IDX. I then tried to conduct the following query with no count function: select distinct t.imsi from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; And the result is right 536 distinct imsi over scanning the index table. By the way, imsi is one of the primary key when creating the table RANAPSIGNAL. Here are several considerations for my trouble and practice: 1. Did you guys ever practice such count (distinct) queries over phoenix table via secondary index? 2. I am not sure whether this problem was due to the index table, but my practice may assume that conclusion. 3. Corrects my if
[jira] [Commented] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes
[ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14110506#comment-14110506 ] Sun Fulin commented on PHOENIX-1203: James Taylor - I rebuilt the latest phoenix version 4.1 RC from here: https://dist.apache.org/repos/dist/dev/phoenix/phoenix-4.1.0-rc1/src/ And confirm queries with another unit test: DDL: create table example (my_pk integer not null, first_name varchar(20), last_name varchar(20) constraint pk PRIMARY KEY (my_pk)); create index my_idx on example (first_name); DML: Upsert into example values (9 records with several same first_name groups) The following queries: select /+NO_INDEX/ count (distinct first_name) from example; got finally 5 rows. The following queries: select /+INDEX(example my_idx)/ count (distinct first_name) from example; got all 9 records. So James, the count (distinct col) query over index table still not get expecting results even in the latest release. Hoping Anoop Sam John can explain this and get some available report. From: James Taylor (JIRA) Date: 2014-08-26 02:52 To: sunfl Subject: [jira] [Commented] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes [ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109498#comment-14109498 ] James Taylor commented on PHOENIX-1203: --- [~sunfl] - can you confirm that when you tried with with the latest 4.1 RC that you replaced *both* the client and the *server* jar? If the problem is what I suspect it is, this should have been fixed. -- This message was sent by Atlassian JIRA (v6.2#6252) Uable to work for count (distinct col) queries via phoenix table with secondary indexes --- Key: PHOENIX-1203 URL: https://issues.apache.org/jira/browse/PHOENIX-1203 Project: Phoenix Issue Type: Bug Affects Versions: 4.0.0, 4.1 Environment: hadoop-2.2.0 hbase: Version 0.98.3-hadoop2 Reporter: Sun Fulin Assignee: Anoop Sam John Labels: distinct, secondaryIndex, test I build the latest 4.1 rc0 from here: https://github.com/apache/phoenix/releases And examine the count (distinct col) query within the new environment. However, the problem still exists with index queries as the following while the correct distinct query result is expected to be 536 for my project: 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; +-+ | COUNT(IMSI) | +-+ | 2322| +-+ 1 row selected (70.572 seconds) As James suggests, I conduct the query adding group by t.imsi with /without secondary indexes. And the result seems to be fine as they both got the correct 536 distinct groups. Here are some considerations: 1. count (distinct col) query over index table did not work as expectation. 2. only distinct query over index table works fine. 3. If the phoenix version got some wrong configuration, correct me. Thanks and Best Regards, Sun --- Hi Sun, Thanks for the detailed description. Yes, your syntax is correct, and it's definitely true that the count distinct query should return the same result with and without the index. Would you mind trying this on our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to file a JIRA? One thing that make make it easier for your testing: do you know about our NO_INDEX hint which forces the query *not* to use an index, like this: select /*+ NO_INDEX */ ... Another question too. What about this query with and with/out the index: select count(*) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 group by t.imsi; Thanks, James On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn su...@certusnet.com.cn wrote: Hi James, Recently I got trouble while trying to conduct some query performance test in my phoenix tables with secondary indexes. I created a table called RANAPSIGNAL for my projects in phoenix via sqlline client and load data into the table. Then I create an index on the specific column PKTTIME for the table RANAPSIGNAL while including other more columns for adjusting my index query, like the following DDL: create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE ); The index
[jira] [Issue Comment Deleted] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes
[ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sun Fulin updated PHOENIX-1203: --- Comment: was deleted (was: James Taylor - I rebuilt the latest phoenix version 4.1 RC from here: https://dist.apache.org/repos/dist/dev/phoenix/phoenix-4.1.0-rc1/src/ And confirm queries with another unit test: DDL: create table example (my_pk integer not null, first_name varchar(20), last_name varchar(20) constraint pk PRIMARY KEY (my_pk)); create index my_idx on example (first_name); DML: Upsert into example values (9 records with several same first_name groups) The following queries: select /+NO_INDEX/ count (distinct first_name) from example; got finally 5 rows. The following queries: select /+INDEX(example my_idx)/ count (distinct first_name) from example; got all 9 records. So James, the count (distinct col) query over index table still not get expecting results even in the latest release. Hoping Anoop Sam John can explain this and get some available report. From: James Taylor (JIRA) Date: 2014-08-26 02:52 To: sunfl Subject: [jira] [Commented] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes [ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109498#comment-14109498 ] James Taylor commented on PHOENIX-1203: --- [~sunfl] - can you confirm that when you tried with with the latest 4.1 RC that you replaced *both* the client and the *server* jar? If the problem is what I suspect it is, this should have been fixed. -- This message was sent by Atlassian JIRA (v6.2#6252) ) Uable to work for count (distinct col) queries via phoenix table with secondary indexes --- Key: PHOENIX-1203 URL: https://issues.apache.org/jira/browse/PHOENIX-1203 Project: Phoenix Issue Type: Bug Affects Versions: 4.0.0, 4.1 Environment: hadoop-2.2.0 hbase: Version 0.98.3-hadoop2 Reporter: Sun Fulin Assignee: Anoop Sam John Labels: distinct, secondaryIndex, test I build the latest 4.1 rc0 from here: https://github.com/apache/phoenix/releases And examine the count (distinct col) query within the new environment. However, the problem still exists with index queries as the following while the correct distinct query result is expected to be 536 for my project: 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; +-+ | COUNT(IMSI) | +-+ | 2322| +-+ 1 row selected (70.572 seconds) As James suggests, I conduct the query adding group by t.imsi with /without secondary indexes. And the result seems to be fine as they both got the correct 536 distinct groups. Here are some considerations: 1. count (distinct col) query over index table did not work as expectation. 2. only distinct query over index table works fine. 3. If the phoenix version got some wrong configuration, correct me. Thanks and Best Regards, Sun --- Hi Sun, Thanks for the detailed description. Yes, your syntax is correct, and it's definitely true that the count distinct query should return the same result with and without the index. Would you mind trying this on our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to file a JIRA? One thing that make make it easier for your testing: do you know about our NO_INDEX hint which forces the query *not* to use an index, like this: select /*+ NO_INDEX */ ... Another question too. What about this query with and with/out the index: select count(*) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 group by t.imsi; Thanks, James On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn su...@certusnet.com.cn wrote: Hi James, Recently I got trouble while trying to conduct some query performance test in my phoenix tables with secondary indexes. I created a table called RANAPSIGNAL for my projects in phoenix via sqlline client and load data into the table. Then I create an index on the specific column PKTTIME for the table RANAPSIGNAL while including other more columns for adjusting my index query, like the following DDL: create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE ); The index creation worked successfully
[jira] [Commented] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes
[ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14111750#comment-14111750 ] Sun Fulin commented on PHOENIX-1203: [~jamestaylor] Fine. Following provides the records in table example after upsert statements. If I run the query: SELECT /*+NO_INDEX*/ COUNT (DISTINCT FIRST_NAME) FROM EXAMPLE; I can correctly got the final 5 rows as 'ssd', 'fsd', 'fdfd', 'dsad' and 'dffds'. However, If I run the query: SELECT /*+INDEX(example my_idx)*/ COUNT (DISTINCT FIRST_NAME) FROM EXAMPLE; I still got 9 counts and the distinct never works. 0: jdbc:phoenix:zookeeper1 select * from example; ++++ | MY_PK| FIRST_NAME | LAST_NAME | ++++ | 1 | ssd| ad | | 2 | fsd| re | | 3 | dffds | gfgf | | 4 | dsad | gfrg | | 5 | fdfd | fdsf | | 6 | ssd| fdfgg | | 7 | fsd| re | | 8 | dsad | ere| | 9 | ssd| ere| ++++ Uable to work for count (distinct col) queries via phoenix table with secondary indexes --- Key: PHOENIX-1203 URL: https://issues.apache.org/jira/browse/PHOENIX-1203 Project: Phoenix Issue Type: Bug Affects Versions: 4.0.0, 4.1 Environment: hadoop-2.2.0 hbase: Version 0.98.3-hadoop2 Reporter: Sun Fulin Assignee: Anoop Sam John Labels: distinct, secondaryIndex, test I build the latest 4.1 rc0 from here: https://github.com/apache/phoenix/releases And examine the count (distinct col) query within the new environment. However, the problem still exists with index queries as the following while the correct distinct query result is expected to be 536 for my project: 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; +-+ | COUNT(IMSI) | +-+ | 2322| +-+ 1 row selected (70.572 seconds) As James suggests, I conduct the query adding group by t.imsi with /without secondary indexes. And the result seems to be fine as they both got the correct 536 distinct groups. Here are some considerations: 1. count (distinct col) query over index table did not work as expectation. 2. only distinct query over index table works fine. 3. If the phoenix version got some wrong configuration, correct me. Thanks and Best Regards, Sun --- Hi Sun, Thanks for the detailed description. Yes, your syntax is correct, and it's definitely true that the count distinct query should return the same result with and without the index. Would you mind trying this on our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to file a JIRA? One thing that make make it easier for your testing: do you know about our NO_INDEX hint which forces the query *not* to use an index, like this: select /*+ NO_INDEX */ ... Another question too. What about this query with and with/out the index: select count(*) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 group by t.imsi; Thanks, James On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn su...@certusnet.com.cn wrote: Hi James, Recently I got trouble while trying to conduct some query performance test in my phoenix tables with secondary indexes. I created a table called RANAPSIGNAL for my projects in phoenix via sqlline client and load data into the table. Then I create an index on the specific column PKTTIME for the table RANAPSIGNAL while including other more columns for adjusting my index query, like the following DDL: create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE ); The index creation worked successfully without any errors. So, when I am trying to conduct such query as: select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; Without secondary indexes, the final result got 536 distinct imsi, wihch is the right distinct count results. However, after I create the above secondary index PKT_IDX and reconducting the above count (distinct imsi) query, I got 2322 imsi rows which
[jira] [Commented] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes
[ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14111863#comment-14111863 ] Sun Fulin commented on PHOENIX-1203: [~anoop.hbase] Do you have available phoenix cluster and sqlline client now? Cause I did not run queries from java client. Mind if you just test the above data records or even just one table with two distinct records? I suspect the count distinct query would not function correctly as long as there was a secondary index. Uable to work for count (distinct col) queries via phoenix table with secondary indexes --- Key: PHOENIX-1203 URL: https://issues.apache.org/jira/browse/PHOENIX-1203 Project: Phoenix Issue Type: Bug Affects Versions: 4.0.0, 4.1 Environment: hadoop-2.2.0 hbase: Version 0.98.3-hadoop2 Reporter: Sun Fulin Assignee: Anoop Sam John Labels: distinct, secondaryIndex, test I build the latest 4.1 rc0 from here: https://github.com/apache/phoenix/releases And examine the count (distinct col) query within the new environment. However, the problem still exists with index queries as the following while the correct distinct query result is expected to be 536 for my project: 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; +-+ | COUNT(IMSI) | +-+ | 2322| +-+ 1 row selected (70.572 seconds) As James suggests, I conduct the query adding group by t.imsi with /without secondary indexes. And the result seems to be fine as they both got the correct 536 distinct groups. Here are some considerations: 1. count (distinct col) query over index table did not work as expectation. 2. only distinct query over index table works fine. 3. If the phoenix version got some wrong configuration, correct me. Thanks and Best Regards, Sun --- Hi Sun, Thanks for the detailed description. Yes, your syntax is correct, and it's definitely true that the count distinct query should return the same result with and without the index. Would you mind trying this on our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to file a JIRA? One thing that make make it easier for your testing: do you know about our NO_INDEX hint which forces the query *not* to use an index, like this: select /*+ NO_INDEX */ ... Another question too. What about this query with and with/out the index: select count(*) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 group by t.imsi; Thanks, James On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn su...@certusnet.com.cn wrote: Hi James, Recently I got trouble while trying to conduct some query performance test in my phoenix tables with secondary indexes. I created a table called RANAPSIGNAL for my projects in phoenix via sqlline client and load data into the table. Then I create an index on the specific column PKTTIME for the table RANAPSIGNAL while including other more columns for adjusting my index query, like the following DDL: create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE ); The index creation worked successfully without any errors. So, when I am trying to conduct such query as: select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; Without secondary indexes, the final result got 536 distinct imsi, wihch is the right distinct count results. However, after I create the above secondary index PKT_IDX and reconducting the above count (distinct imsi) query, I got 2322 imsi rows which obviously are not the expected distinct counts results. I used the explain grammar to observe the scan of the above select query and found that it definitely scaned over the index table PKT_IDX. I then tried to conduct the following query with no count function: select distinct t.imsi from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; And the result is right 536 distinct imsi over scanning the index table. By the way, imsi is one of the primary key when creating the table RANAPSIGNAL. Here are several
[jira] [Commented] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes
[ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108861#comment-14108861 ] Sun Fulin commented on PHOENIX-1203: [~anoop.hbase] Yes. I conduct several queries about count (distinct col) over index table, which show - COUNT(IMSI) - instead of - DISTINCT_COUNT(IMSI) - That means the query did not function well. Uable to work for count (distinct col) queries via phoenix table with secondary indexes --- Key: PHOENIX-1203 URL: https://issues.apache.org/jira/browse/PHOENIX-1203 Project: Phoenix Issue Type: Bug Affects Versions: 4.0.0, 4.1 Environment: hadoop-2.2.0 hbase: Version 0.98.3-hadoop2 Reporter: Sun Fulin Assignee: Anoop Sam John Labels: distinct, secondaryIndex, test I build the latest 4.1 rc0 from here: https://github.com/apache/phoenix/releases And examine the count (distinct col) query within the new environment. However, the problem still exists with index queries as the following while the correct distinct query result is expected to be 536 for my project: 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; +-+ | COUNT(IMSI) | +-+ | 2322| +-+ 1 row selected (70.572 seconds) As James suggests, I conduct the query adding group by t.imsi with /without secondary indexes. And the result seems to be fine as they both got the correct 536 distinct groups. Here are some considerations: 1. count (distinct col) query over index table did not work as expectation. 2. only distinct query over index table works fine. 3. If the phoenix version got some wrong configuration, correct me. Thanks and Best Regards, Sun --- Hi Sun, Thanks for the detailed description. Yes, your syntax is correct, and it's definitely true that the count distinct query should return the same result with and without the index. Would you mind trying this on our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to file a JIRA? One thing that make make it easier for your testing: do you know about our NO_INDEX hint which forces the query *not* to use an index, like this: select /*+ NO_INDEX */ ... Another question too. What about this query with and with/out the index: select count(*) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 group by t.imsi; Thanks, James On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn su...@certusnet.com.cn wrote: Hi James, Recently I got trouble while trying to conduct some query performance test in my phoenix tables with secondary indexes. I created a table called RANAPSIGNAL for my projects in phoenix via sqlline client and load data into the table. Then I create an index on the specific column PKTTIME for the table RANAPSIGNAL while including other more columns for adjusting my index query, like the following DDL: create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE ); The index creation worked successfully without any errors. So, when I am trying to conduct such query as: select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; Without secondary indexes, the final result got 536 distinct imsi, wihch is the right distinct count results. However, after I create the above secondary index PKT_IDX and reconducting the above count (distinct imsi) query, I got 2322 imsi rows which obviously are not the expected distinct counts results. I used the explain grammar to observe the scan of the above select query and found that it definitely scaned over the index table PKT_IDX. I then tried to conduct the following query with no count function: select distinct t.imsi from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; And the result is right 536 distinct imsi over scanning the index table. By the way, imsi is one of the primary key when creating the table RANAPSIGNAL. Here are several considerations for my trouble and practice: 1. Did you guys ever practice such
[jira] [Created] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes
Sun Fulin created PHOENIX-1203: -- Summary: Uable to work for count (distinct col) queries via phoenix table with secondary indexes Key: PHOENIX-1203 URL: https://issues.apache.org/jira/browse/PHOENIX-1203 Project: Phoenix Issue Type: Bug Affects Versions: 4.0.0, 4.1 Environment: hadoop-2.2.0 hbase: Version 0.98.3-hadoop2 Reporter: Sun Fulin I build the latest 4.1 rc0 from here: https://github.com/apache/phoenix/releases And examine the count (distinct col) query within the new environment. However, the problem still exists with index queries as the following while the correct distinct query result is expected to be 536 for my project: 0: jdbc:phoenix:zookeeper1 select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; +-+ | COUNT(IMSI) | +-+ | 2322| +-+ 1 row selected (70.572 seconds) As James suggests, I conduct the query adding group by t.imsi with /without secondary indexes. And the result seems to be fine as they both got the correct 536 distinct groups. Here are some considerations: 1. count (distinct col) query over index table did not work as expectation. 2. only distinct query over index table works fine. 3. If the phoenix version got some wrong configuration, correct me. Thanks and Best Regards, Sun --- Hi Sun, Thanks for the detailed description. Yes, your syntax is correct, and it's definitely true that the count distinct query should return the same result with and without the index. Would you mind trying this on our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to file a JIRA? One thing that make make it easier for your testing: do you know about our NO_INDEX hint which forces the query *not* to use an index, like this: select /*+ NO_INDEX */ ... Another question too. What about this query with and with/out the index: select count(*) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 group by t.imsi; Thanks, James On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn su...@certusnet.com.cn wrote: Hi James, Recently I got trouble while trying to conduct some query performance test in my phoenix tables with secondary indexes. I created a table called RANAPSIGNAL for my projects in phoenix via sqlline client and load data into the table. Then I create an index on the specific column PKTTIME for the table RANAPSIGNAL while including other more columns for adjusting my index query, like the following DDL: create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE ); The index creation worked successfully without any errors. So, when I am trying to conduct such query as: select count (distinct t.imsi) from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; Without secondary indexes, the final result got 536 distinct imsi, wihch is the right distinct count results. However, after I create the above secondary index PKT_IDX and reconducting the above count (distinct imsi) query, I got 2322 imsi rows which obviously are not the expected distinct counts results. I used the explain grammar to observe the scan of the above select query and found that it definitely scaned over the index table PKT_IDX. I then tried to conduct the following query with no count function: select distinct t.imsi from ranapsignal t where t.pkttime=140496480 and t.pkttime=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ; And the result is right 536 distinct imsi over scanning the index table. By the way, imsi is one of the primary key when creating the table RANAPSIGNAL. Here are several considerations for my trouble and practice: 1. Did you guys ever practice such count (distinct) queries over phoenix table via secondary index? 2. I am not sure whether this problem was due to the index table, but my practice may assume that conclusion. 3. Corrects my if I am wrong with my previous operations (index creation, DDL grammar, etc..) 4. Any available hints or reply are best appreciated. Thanks and Best Regards Sun -- This message was sent by Atlassian JIRA (v6.2#6252)