Insert using Phoenix, get through HBase
Hi, I created a table using Phoenix. It has two fields: an integer and a string, with the integer field as the primary key column. I then inserted two rows into this table. I am able to see the data by querying through the sqlline.py shell (using SELECT *). I am also able to see the table and its data in the HBase shell (using scan). Now, when I try to do an HBase Get from Java for a specific row, I am not able to fetch the data. In the code below, 2 is the value in the primary key column. Can anyone suggest how I should proceed? Get g = new Get(Bytes.toBytes(2));
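The likely cause is that Phoenix does not serialize an INTEGER primary key the way `Bytes.toBytes(int)` does: Phoenix flips the sign bit of the 4-byte big-endian encoding so that negative values sort before positive ones. A minimal sketch of that encoding in plain Java (no HBase dependency; the class and method names here are my own, not from Phoenix):

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class PhoenixIntKey {
    // Phoenix stores an INTEGER as 4 big-endian bytes with the sign bit
    // flipped, so that byte-wise comparison matches numeric order.
    static byte[] encodeInt(int value) {
        byte[] b = ByteBuffer.allocate(4).putInt(value).array();
        b[0] = (byte) (b[0] ^ 0x80); // flip the sign bit of the first byte
        return b;
    }

    public static void main(String[] args) {
        // Bytes.toBytes(2) yields {0, 0, 0, 2}; Phoenix stores {0x80, 0, 0, 2}.
        System.out.println(Arrays.toString(encodeInt(2)));
    }
}
```

A Get built with this key should find the row that `Get(Bytes.toBytes(2))` misses; alternatively, if this is correct for your Phoenix version, its own type class (`PInteger.INSTANCE.toBytes(2)` in 4.x) produces the same bytes.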
Re: create a view on existing production table ?
Hi James, From your reply I understand that it is NOT possible to create such a view, because each family can have a different number of columns: it could be just one column per family for one PK, and hundreds of thousands for another PK. How can I possibly accommodate that in a view specification if I need to explicitly define each column by name? Or am I misunderstanding something? Thank you, Sergey

From: James Taylor jamestay...@apache.org
Reply-To: user@phoenix.apache.org
Date: Friday, June 26, 2015 at 12:04
To: user@phoenix.apache.org
Subject: Re: create a view on existing production table ?

Hi Sergey, Yes, you can create a Phoenix view over this HBase table, but you have to explicitly list columns by name (i.e. column qualifier) either at view creation time or at read time (using dynamic columns). Also, the row key must conform to what Phoenix expects if there are multiple columns in your PK. Thanks, James

On Wednesday, June 24, 2015, Sergey Malov sma...@collective.com wrote: My apologies if this info can be found somewhere, I don't see it. I have a production table "profile" with the following schema:

{NAME => 'edge', BLOOMFILTER => 'ROW', VERSIONS => '2', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'false', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '1'}
{NAME => 'export', BLOOMFILTER => 'ROW', VERSIONS => '2', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'false', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '1'}
{NAME => 'visitor', BLOOMFILTER => 'ROW', VERSIONS => '2', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'false', DATA_BLOCK_ENCODING => 'FAST_DIFF', COMPRESSION => 'SNAPPY', TTL => 'FOREVER', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '1'}

Each of these families can have a variable number of columns for a given key, with some having hundreds of thousands of columns. Is it possible to create a Phoenix view for such a table? Thank you, Sergey Malov
Re: create a view on existing production table ?
Hi Sergey, Yes, you can create a Phoenix view over this HBase table, but you have to explicitly list columns by name (i.e. column qualifier) either at view creation time or at read time (using dynamic columns). Also, the row key must conform to what Phoenix expects if there are multiple columns in your PK. Thanks, James On Wednesday, June 24, 2015, Sergey Malov sma...@collective.com wrote: [snip]
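James's two options can be sketched in Phoenix SQL roughly as follows (a hypothetical fragment: the row-key column, type, and qualifier names are illustrative, not from the thread; only the "edge" family name comes from the original question):

```sql
-- Option 1: declare known columns by qualifier at view creation time.
CREATE VIEW "profile" (
    pk VARCHAR PRIMARY KEY,
    "edge"."known_col" VARCHAR
);

-- Option 2: supply columns not declared in the view at read time,
-- as dynamic columns, so nothing has to be predefined.
SELECT pk, "edge"."some_col"
FROM "profile" ("edge"."some_col" VARCHAR)
WHERE pk = 'abc';
```

With hundreds of thousands of distinct qualifiers per row key, option 2 is the only practical one, since each dynamic column still has to be named explicitly in the query that reads it.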
How to count table rows from Java?
I wrote a Java program that runs nightly and collects metrics about our Hive tables. I would like to include HBase tables in this as well. Since SELECT COUNT(*) is slow and not recommended on Phoenix, what are my alternatives from Java? Is there a way to call org.apache.hadoop.hbase.mapreduce.RowCounter from Java and get results in some kind of result set? Thanks for any info!
Re: How to count table rows from Java?
RowCounter is a MapReduce program. After the program completes execution of the job, it returns information about that job, including job counters. RowCounter includes its counts in the job counters, so they're easily accessed programmatically from the returned object. It's not a ResultSet, but it should work nonetheless. On Friday, June 26, 2015, Riesland, Zack zack.riesl...@sensus.com wrote: [snip]
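The pattern Nick describes looks roughly like this. This is a sketch only: it needs a live HBase cluster to run, and `createSubmittableJob` plus the counter group/name strings below are from my recollection of the HBase 1.x API, so check them against your version:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.RowCounter;
import org.apache.hadoop.mapreduce.Job;

public class CountRows {
    public static long countRows(String tableName) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // RowCounter exposes a factory method for building its MR job.
        Job job = RowCounter.createSubmittableJob(conf, new String[] { tableName });
        if (!job.waitForCompletion(true)) {
            throw new RuntimeException("RowCounter job failed for " + tableName);
        }
        // The row count lives in RowCounter's job counters; the enum is not
        // public in all versions, so look the counter up by group and name.
        return job.getCounters()
                  .findCounter(
                      "org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters",
                      "ROWS")
                  .getValue();
    }
}
```

The returned long can then be written into the nightly metrics alongside the Hive numbers.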
Re: create a view on existing production table ?
Sergey, It is possible, but maybe in your case it's not feasible. Thanks, James On Friday, June 26, 2015, Sergey Malov sma...@collective.com wrote: [snip]
Re: Will Phoenix support the checkAndPut method in the future?
Hi James, I was under the impression that UPSERT SELECT with WHERE would do the trick? If I got that wrong, what would be the correct way of thinking about UPSERT SELECT with WHERE? regards, S On Fri, Jun 26, 2015 at 11:53 AM, James Taylor jamestay...@apache.org wrote: Once transaction support goes into the main code line, we'll add support for INSERT, which would use checkAndPut underneath (obviously with extra read overhead compared with the Put that an UPSERT does currently). Thanks, James On Friday, June 26, 2015, dataware dataware_t...@163.com wrote: Or where can I find the documentation if it is already supported? Thanks
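The distinction at issue: UPSERT SELECT with a WHERE clause filters which rows get written, but each write is still a plain HBase Put, not an atomic compare-and-set like checkAndPut. A hypothetical fragment (table and column names are mine):

```sql
-- Copies matching rows from src into dst. The WHERE clause only selects
-- which rows to write; each resulting write is an unconditional Put.
-- If another client updates a dst row concurrently, nothing fails the
-- way a checkAndPut would.
UPSERT INTO dst (id, val)
SELECT id, val FROM src WHERE val > 100;
```

So UPSERT SELECT with WHERE is conditional reading, not conditional writing, which is why checkAndPut semantics are tied to future INSERT/transaction support.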
Re: How to count table rows from Java?
Zack, I wouldn't at all say that doing a count(*) is not recommended. It's important to know that 1) it requires a full table scan and 2) it is done by Phoenix synchronously, so you'll need to set the timeouts high enough for it to complete. Phoenix will be much faster than running an MR job, but the MR job runs asynchronously. To ensure your stats are up to date, run a major compaction on your table, as that updates the stats. Also, if you have wide rows, consider using multiple column families so that the count(*) doesn't have to traverse all of your data. Thanks, James On Friday, June 26, 2015, Nick Dimiduk ndimi...@gmail.com wrote: [snip]
RE: Problem in finding the largest value of an indexed column
Yufan, Have you tried using the EXPLAIN command to see what plan is being used to access the data? Michael McAllister Staff Data Warehouse Engineer | Decision Systems mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha | webex: https://h.a/mikewebex This electronic communication (including any attachment) is confidential. If you are not an intended recipient of this communication, please be advised that any disclosure, dissemination, distribution, copying or other use of this communication or any attachment is strictly prohibited. If you have received this communication in error, please notify the sender immediately by reply e-mail and promptly destroy all electronic and printed copies of this communication and any attachment. From: Yufan Liu [mailto:yli...@kent.edu] Sent: Friday, June 26, 2015 6:31 PM To: user@phoenix.apache.org Subject: Problem in finding the largest value of an indexed column Hi, We have created a table (e.g., t1) and a global index on one numeric column of t1 (e.g., timestamp). Now we want to find the largest value of timestamp. We have tried two approaches: 1. select max(timestamp) from t1; This query takes forever to finish, so I think it may be doing a full table scan/comparison. 2. select timestamp from t1 order by timestamp desc limit 1; This query finishes fast, but the result it returns is far from the largest value. It seems to just return the largest value for a certain range of data. Did anyone else encounter this issue/have any suggestions? -- Thanks, Yufan
Re: Problem in finding the largest value of an indexed column
Hi Michael, Thanks for the advice. For the first query, the plan is:

CLIENT 67-CHUNK PARALLEL 1-WAY FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER AGGREGATE INTO SINGLE ROW

which is as expected. For the second one, it's:

CLIENT 67-CHUNK SERIAL 1-WAY REVERSE FULL SCAN OVER TIMESTAMP_INDEX; SERVER FILTER BY FIRST KEY ONLY; SERVER 1 ROW LIMIT

which looks correct, but still returns the unexpected result. 2015-06-26 16:59 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: [snip] -- best, Yufan
RE: Problem in finding the largest value of an indexed column
OK, I’m a Phoenix newbie, so that was the extent of the advice I could give you. There are people here far more experienced than I am who should be able to give you deeper advice. Have a great weekend! Mike From: Yufan Liu [mailto:yli...@kent.edu] Sent: Friday, June 26, 2015 7:19 PM To: user@phoenix.apache.org Subject: Re: Problem in finding the largest value of an indexed column [snip]
Re: Problem in finding the largest value of an indexed column
Thank you anyway, Michael! 2015-06-26 17:21 GMT-07:00 Michael McAllister mmcallis...@homeaway.com: [snip] -- best, Yufan
Re: create a view on existing production table ?
Hi Sergey, Since you have hundreds of thousands of columns, you can query your data using the dynamic columns feature of Phoenix. That way, you won't need to predefine hundreds of thousands of columns. Thanks, Anil Gupta On Fri, Jun 26, 2015 at 11:34 AM, James Taylor jamestay...@apache.org wrote: [snip] -- Thanks Regards, Anil Gupta
Problem in finding the largest value of an indexed column
Hi, We have created a table (e.g., t1) and a global index on one numeric column of t1 (e.g., timestamp). Now we want to find the largest value of timestamp. We have tried two approaches: 1. select max(timestamp) from t1; This query takes forever to finish, so I think it may be doing a full table scan/comparison. 2. select timestamp from t1 order by timestamp desc limit 1; This query finishes fast, but the result it returns is far from the largest value. It seems to just return the largest value for a certain range of data. Did anyone else encounter this issue/have any suggestions? -- Thanks, Yufan
Will Phoenix support the checkAndPut method in the future?
Or where can I find the documentation if it is already supported? Thanks