Re: Bulk Load data into Phoenix Table with Dynamic columns

2019-11-17 Thread Thomas D'Silva
Bulk loading with dynamic columns is not supported. You could try modeling
the data using multiple views on the same physical table.
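A minimal sketch of that approach, with hypothetical table, view, and column
names (EVENTS, KIND, REFERRER, DURATION_MS are illustrative only):

CREATE TABLE EVENTS (ID VARCHAR PRIMARY KEY, KIND VARCHAR);

-- Each view layers its own columns on top of the shared physical table.
CREATE VIEW CLICK_EVENTS (REFERRER VARCHAR) AS
    SELECT * FROM EVENTS WHERE KIND = 'click';
CREATE VIEW PAGE_EVENTS (DURATION_MS BIGINT) AS
    SELECT * FROM EVENTS WHERE KIND = 'view';

-- A view's columns are statically defined, so they can be written with
-- regular UPSERTs instead of dynamic column syntax.
UPSERT INTO CLICK_EVENTS (ID, REFERRER) VALUES ('evt1', 'example.com');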

On Tue, Nov 12, 2019 at 3:26 PM Mohammed Shoaib Quraishi <
shoaib.qurai...@outlook.com> wrote:

> Hi,
>
> I have a Phoenix table created with a fixed set of columns. I have
> inserted and projected dynamic column data using UPSERT and SELECT
> statements. I would like to bulk insert dynamic column data into this
> table. I do not see any documentation on how/whether this can be done.
> Scripting dynamic SQL with multiple UPSERT statements loads data very slowly;
> it takes forever to load 500,000 records using multiple UPSERT statements.
> Also, one UPSERT statement with multiple VALUES doesn't work.
>
> Can you help me with the best method to bulk insert huge amounts of data into
> a Phoenix table with dynamic columns?
>
> Thank you,
>
> Shoaib
>


Re: Apache phoenix problem with order by and offset giving duplicate results in paging

2019-11-12 Thread Thomas D'Silva
Try including the title in the ORDER BY clause as well (e.g. ORDER BY TITLE,
TO_NUMBER(COUNT) DESC) so that the sort order is deterministic.
Using OFFSET and LIMIT for paging is not efficient when the table has a lot
of rows.
You could also try using row value constructors and ordering by the primary key
columns of the table (see https://phoenix.apache.org/paged.html).
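A rough sketch of the row value constructor pattern, assuming the view's
primary key columns are PK1 and PK2 (placeholders for the actual key columns):

-- first page
SELECT TITLE, COUNT FROM Hbase_view ORDER BY PK1, PK2 LIMIT 5;

-- next page: bind the primary key of the last row from the previous page
SELECT TITLE, COUNT FROM Hbase_view
WHERE (PK1, PK2) > (?, ?)
ORDER BY PK1, PK2
LIMIT 5;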

On Sun, Nov 10, 2019 at 11:04 PM Lalit Jadhav 
wrote:

> Hello,
>
> I'm using order by with offset for paging in apache phoenix. I am getting
> a duplicate result on the next page.
>
> I have a view of the HBase table in Apache phoenix. I am using TO_NUMBER()
> to convert my stringified column into an integer. I want to sort this
> converted column in descending order.
>
> Example: I have an Hbase view like :
>
> ++--+
> | TITLE  |   COUNT  |
> ++--+
> | t8 | 10   |
> | t9 | 2|
> | t4 | 1|
> | t6 | 1|
> | t10| 1|
> | t7 | 1|
> | t43| 0|
> | t14| 0|
> | t11| 0|
> | t42| 0|
> ++--+
>
> My Query:
>
> Select TITLE,COUNT from Hbase_view order by TO_NUMBER(COUNT) desc limit 5
> offset 0;
>
> Result :
>
> ++--+
> | TITLE  |   COUNT  |
> ++--+
> | t8 | 10   |
> | t9 | 2|
> | t4 | 1|
> | t7 | 1|
> | t10| 1|
>
> Next Query:
>
> Select TITLE,COUNT from Hbase_view order by TO_NUMBER(COUNT) desc limit 5
> offset 5;
>
> result :
>
> ++--+
> | TITLE  |COUNT |
> ++--+
> | t7 | 1|
> | t43| 0|
> | t14| 0|
> | t11| 0|
> | t42| 0|
> ++--+
>
> Here t7 is repeated in both results.
>
> Expected result :
>
> 1st Query:-
> ++--+
> | TITLE  |COUNT |
> ++--+
> | t8 | 10   |
> | t9 | 2|
> | t4 | 1|
> | t6 | 1|
> | t10| 1|
>
> 2nd Query
>
>> ++--+
>> | TITLE  |COUNT |
>> ++--+
>> | t7 | 1|
>> | t43| 0|
>> | t14| 0|
>> | t11| 0|
>> | t42| 0|
>> ++--+
>
>
> Please help me understand what the exact issue is. Also, can a
> CURSOR be used for the same purpose?
>
>
> ---
> Lalit Jadhav
>


Re: Sequence number

2019-10-22 Thread Thomas D'Silva
Are you sure SYSTEM.SEQUENCE was restored properly? What is the current
value of the sequence in the restored table?
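For reference, one way to inspect the restored sequence state (a sketch
assuming the standard SYSTEM."SEQUENCE" columns in 4.14):

SELECT SEQUENCE_SCHEMA, SEQUENCE_NAME, CURRENT_VALUE, INCREMENT_BY, CACHE_SIZE
FROM SYSTEM."SEQUENCE";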

On Fri, Oct 4, 2019 at 1:52 PM jesse  wrote:

> Let's say there is a running cluster A, with a table books and a
> system.sequence current value of 5000, cache size 100, increment 1, and the
> latest book with sequence id 4800.
>
> Now the cluster A snapshot is backed up & restored into cluster B, and
> system.sequence and the books table are properly restored. When we add a new
> book, the book gets sequence id 12. Why is it not 4801 or 5001?
>
> Our Phoenix version : 4.14.2
>
> Thanks
>
>
>
>


[ANNOUNCE] Apache Phoenix 4.14.2 released

2019-05-28 Thread Thomas D'Silva
The Apache Phoenix team is pleased to announce the immediate availability
of the 4.14.2 patch release. Apache Phoenix enables SQL-based OLTP and
operational analytics for Apache Hadoop using Apache HBase as its backing
store and providing integration with other projects in the Apache ecosystem
such as Spark, Hive, Pig, Flume, and MapReduce.

This patch release has feature parity with supported HBase versions and
includes critical bug fixes for secondary indexes.

Download source and binaries here [1].

Thanks,
Thomas (on behalf of the Apache Phoenix team)

[1] http://phoenix.apache.org/download.html


Re: Local Index data not replicating for older HBase versions

2019-05-23 Thread Thomas D'Silva
The CDH branches have been maintained by Pedro; I don't know what the
release plan is for those branches.

On Tue, May 21, 2019 at 11:13 PM William Shen 
wrote:

> Hieu,
>
> You're welcome to file a JIRA and submit a fix for the CDH branches
> (4.14-cdh5.11, 4.x-cdh5.15, etc). I think while 4.x-HBase-1.2 is EOL, the
> CDH branches are not. CDH 5.x is based on HBase-1.2, but it also contains a
> concoction of 1.2.x, 1.3, 1.4 patches back-ported by Cloudera. (Thomas,
> feel free to correct me if I misunderstood... but when we EOL'd HBase-1.2,
> I was under the impression that the CDH branches will live on).
>
>
> On Tue, May 21, 2019 at 9:24 PM Thomas D'Silva 
> wrote:
>
>> Your approach seems like the correct thing to do. HBase has stopped
>> supporting the 1.2 branch, so we also EOL'ed it; there will not be any more
>> releases targeting HBase 1.2. I would suggest that you upgrade to a later
>> version.
>>
>> On Tue, Apr 30, 2019 at 8:55 PM Hieu Nguyen  wrote:
>>
>>> Hi,
>>>
>>> We are on Phoenix 4.14-cdh5.11.  We are experiencing an issue where
>>> local index data is not being replicated through HBase replication.  As
>>> suggested in a previous email thread (
>>> https://lists.apache.org/thread.html/984fba3c8abd944846deefb3ea285195e0436b9181b9779feac39b59@%3Cuser.phoenix.apache.org%3E),
>>> we have enabled replication for the local indexes (the "L#0" column family
>>> on the same table).  We wrote an integration test to demonstrate this issue
>>> on top of 4.14-cdh5.11 branch (
>>> https://github.com/hnguyen08/phoenix/commit/3589cb45d941c6909fb3deb5f5abb0f8dfa78dd7
>>> ).
>>>
>>> After some investigation and debugging, we discovered the following:
>>> 1. Commit a2f4d7eebec621b58204a9eb78d552f18dcbcf24 (PHOENIX-3827) fixed
>>> the issue, but only in Phoenix for HBase1.3+.  It uses the
>>> miniBatchOp.addOperationsFromCP() API introduced in HBase1.3.
>>> Unfortunately, for the time being, we are stuck on cdh5.11 (based on
>>> HBase1.2).
>>> 2. IndexUtil.writeLocalUpdates() is called in both implementations of
>>> IndexCommitter, both taking skipWAL=true.  It seems like we'd actually want
>>> to not skip WAL to ensure that local-index updates are replicated correctly
>>> (since, as mentioned in the above email thread, "HBase-level replication of
>>> the data table will not trigger index updates").  After changing the
>>> skipWAL flag to false, the above integration test passes.
>>>
>>> Would it make sense to fix local-index replication for Phoenix versions
>>> on <= HBase1.2 by setting skipWAL to false for writeLocalUpdates?  As far
>>> as I can tell, it is not a perfect solution compared to using
>>> addOperationsFromCP() because the local index update still wouldn't be
>>> atomic.  But at least local-index updates can be replicated.  Let me know
>>> if I'm missing something.  Happy to file a JIRA or submit a fix.
>>>
>>> Thanks,
>>> -Hieu
>>>
>>>


Re: Local Index data not replicating for older HBase versions

2019-05-21 Thread Thomas D'Silva
Your approach seems like the correct thing to do. HBase has stopped
supporting the 1.2 branch, so we also EOL'ed it; there will not be any more
releases targeting HBase 1.2. I would suggest that you upgrade to a later
version.

On Tue, Apr 30, 2019 at 8:55 PM Hieu Nguyen  wrote:

> Hi,
>
> We are on Phoenix 4.14-cdh5.11.  We are experiencing an issue where local
> index data is not being replicated through HBase replication.  As suggested
> in a previous email thread (
> https://lists.apache.org/thread.html/984fba3c8abd944846deefb3ea285195e0436b9181b9779feac39b59@%3Cuser.phoenix.apache.org%3E),
> we have enabled replication for the local indexes (the "L#0" column family
> on the same table).  We wrote an integration test to demonstrate this issue
> on top of 4.14-cdh5.11 branch (
> https://github.com/hnguyen08/phoenix/commit/3589cb45d941c6909fb3deb5f5abb0f8dfa78dd7
> ).
>
> After some investigation and debugging, we discovered the following:
> 1. Commit a2f4d7eebec621b58204a9eb78d552f18dcbcf24 (PHOENIX-3827) fixed
> the issue, but only in Phoenix for HBase1.3+.  It uses the
> miniBatchOp.addOperationsFromCP() API introduced in HBase1.3.
> Unfortunately, for the time being, we are stuck on cdh5.11 (based on
> HBase1.2).
> 2. IndexUtil.writeLocalUpdates() is called in both implementations of
> IndexCommitter, both taking skipWAL=true.  It seems like we'd actually want
> to not skip WAL to ensure that local-index updates are replicated correctly
> (since, as mentioned in the above email thread, "HBase-level replication of
> the data table will not trigger index updates").  After changing the
> skipWAL flag to false, the above integration test passes.
>
> Would it make sense to fix local-index replication for Phoenix versions on
> <= HBase1.2 by setting skipWAL to false for writeLocalUpdates?  As far as I
> can tell, it is not a perfect solution compared to using
> addOperationsFromCP() because the local index update still wouldn't be
> atomic.  But at least local-index updates can be replicated.  Let me know
> if I'm missing something.  Happy to file a JIRA or submit a fix.
>
> Thanks,
> -Hieu
>
>


Re: Date for next release ?

2019-04-17 Thread Thomas D'Silva
Josh had started a discussion thread on the dev list about having a 5.0.1
release.
https://lists.apache.org/thread.html/99fcc737d7a8f82ddffb1b34a64f7099f7909900b8bea36dd6afca16@%3Cdev.phoenix.apache.org%3E

We would appreciate any help in making this release happen.

On Mon, Apr 15, 2019 at 4:11 AM Jean-Charles Jabouille <
jean-charles.jaboui...@kelkoogroup.com> wrote:

> Hi,
>
> First: thanks for this awesome project. We are trying to use Phoenix (5.0.0)
> for our big project. We are facing some issues with the HBase version. We
> would like to upgrade it, but Phoenix 5.0.0 is only HBase 2.0.0 compliant.
>
> Do you have an idea about the next release of Phoenix supporting HBase >
> 2.0.0, please?
>
> Thanks again for your work.
>
> Regards,
>
> jean charles
>


Re: query cell timestamp and tag

2019-04-11 Thread Thomas D'Silva
We have an open JIRA, PHOENIX-4552, which has some details
on the work involved. Currently, cell tags cannot be queried using
Phoenix.

On Thu, Apr 11, 2019 at 11:41 AM Jimmy Xiang  wrote:

> I have an existing HBase table. The cell timestamp is not in the row key.
> I'd like to make the timestamp a Phoenix table column and query it via
> Phoenix.
>
> I noticed that the documentation says row_timestamp column must be one of
> the primary key columns. Is it possible to patch Phoenix to lift this
> restriction? If so, how hard is it?
>
> How about tags? Any way to query HBase cell tags via Phoenix?
>
> Thanks,
> Jimmy
>


Re: Using Hint with PhoenixRDD

2019-04-10 Thread Thomas D'Silva
Can you please file a JIRA for this?

On Wed, Apr 10, 2019 at 5:53 PM William Shen 
wrote:

> Thanks for chiming in Thomas. We were trying to pass in NO_CACHE to
> prevent large one-time scans from affecting the block cache.
>
> On Wed, Apr 10, 2019 at 5:14 PM Thomas D'Silva 
> wrote:
>
>> I don't think there is a way to pass in a hint while using PhoenixRDD. Which
>> hint are you trying to pass in?
>>
>> On Wed, Apr 10, 2019 at 10:42 AM William Shen 
>> wrote:
>>
>>> Anyone still using PhoenixRDD with Spark, or anyone had used it in the
>>> past that might be able to answer this?
>>>
>>> Thanks!
>>>
>>> On Thu, Apr 4, 2019 at 12:16 PM William Shen 
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Do we have any way of passing in hints when querying Phoenix using
>>>> PhoenixRDD in Spark? I reviewed the implementation of PhoenixRDD
>>>> and PhoenixRecordWritable, but was not able to find an obvious way to do
>>>> so. Is it supported?
>>>>
>>>> Thanks in advance!
>>>>
>>>> - Will
>>>>
>>>


Re: Using Hint with PhoenixRDD

2019-04-10 Thread Thomas D'Silva
I don't think there is a way to pass in a hint while using PhoenixRDD. Which hint
are you trying to pass in?

On Wed, Apr 10, 2019 at 10:42 AM William Shen 
wrote:

> Anyone still using PhoenixRDD with Spark, or anyone had used it in the
> past that might be able to answer this?
>
> Thanks!
>
> On Thu, Apr 4, 2019 at 12:16 PM William Shen 
> wrote:
>
>> Hi all,
>>
>> Do we have any way of passing in hints when querying Phoenix using
>> PhoenixRDD in Spark? I reviewed the implementation of PhoenixRDD
>> and PhoenixRecordWritable, but was not able to find an obvious way to do
>> so. Is it supported?
>>
>> Thanks in advance!
>>
>> - Will
>>
>


Re: Strange query results

2019-02-12 Thread Thomas D'Silva
This looks like a bug; can you please file a JIRA with the repro steps?

On Tue, Feb 12, 2019 at 4:56 PM Victor Brakauskas 
wrote:

> Hello all,
> Running into some strange query results on a cluster I have that's running
> phoenix 4.14.0 and hbase 1.3.1.
> I've got a table that's defined as such.
>
> create table if not exists TEST_TABLE (
>     col_1 integer not null,
>     col_2 integer not null,
>     col_3 bigint,
>     col_4 float,
>     col_5 date,
>     col_6 date,
>     col_7 varchar,
>     col_8 varchar[],
>     col_9 date,
>     col_10 boolean,
>     constraint ind_attr_pk primary key (col_1, col_2)
> ) column_encoded_bytes = 0, data_block_encoding = 'FAST_DIFF', versions = 1,
>   compression = 'GZ', salt_buckets = 120, filesize = 10737418240;
>
>
> and insert a row like this
>
> upsert into TEST_TABLE VALUES(55, , 42, 42.42, current_date(),
> current_time(),'hi', ARRAY['hi','hello','sup'], current_date(), true);
>
> Then, when querying the table, there are some weird query results
>  0: jdbc:phoenix:localhost> select count(*) from test_table where col_1 =
> 55 and col_2 = ;
>
> +---+
> | COUNT(1)  |
> +---+
> | 1 |
> +---+
>
> 1 row selected (0.012 seconds)
>
> 0: jdbc:phoenix:localhost> select count(*) from test_table where col_1 =
> 55;
>
> +---+
> | COUNT(1)  |
> +---+
> | 0 |
> +---+
>
> 1 row selected (0.025 seconds)
>
> Do you guys have any idea what might be causing these results?
>
>
>


Re: Phoenix JDBC Connection Warmup

2019-02-04 Thread Thomas D'Silva
As James suggested, if you set the UPDATE_CACHE_FREQUENCY table property,
the server will not be pinged for the latest metadata more often than the
update frequency.
Check out the ALTER section of the grammar (https://phoenix.apache.org/).
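For example (MY_TABLE is a placeholder; the value is in milliseconds):

ALTER TABLE MY_TABLE SET UPDATE_CACHE_FREQUENCY = 900000;

-- or declare it up front when the table is created
CREATE TABLE MY_TABLE (ID VARCHAR PRIMARY KEY, VAL VARCHAR)
    UPDATE_CACHE_FREQUENCY = 900000;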

On Sun, Feb 3, 2019 at 5:57 PM William Shen 
wrote:

> Thanks for the suggestions!
>
> Jaanai - do you mean to enable trace logging on the client side or on the
> server side?
>
> James - no I have not tried setting it. I have not heard about the
> configuration. Do you have a ball park suggestion on how to approach
> setting this value?
> On Sat, Feb 2, 2019 at 8:53 AM James Taylor 
> wrote:
>
>> Have you tried setting UPDATE_CACHE_FREQUENCY on your tables?
>>
>> On Fri, Feb 1, 2019 at 6:28 PM Jaanai Zhang 
>> wrote:
>>
>>>  we experimented with issuing the same query repeatedly, and we observed
 a slow down not only on the first query
>>>
>>> I am not sure what the reasons are; perhaps you can enable TRACE logging to
>>> find what leads to the slowdown. I guess that some metadata is reloaded
>>> under a heavy write workload.
>>>
>>> 
>>>Jaanai Zhang
>>>Best regards!
>>>
>>>
>>>
>>> William Shen  于2019年2月1日周五 上午2:09写道:
>>>
 Thanks Jaanai. Do you know if that is expected only on the first query
 against a table? For us, we experimented with issuing the same query
 repeatedly, and we observed a slow down not only on the first query. Does
 it make sense to preemptively load table metadata on start up to warm up
 the system to reduce latency during the actual query time (if it is
 possible to do so)?

 On Wed, Jan 30, 2019 at 10:54 PM Jaanai Zhang 
 wrote:

> It is expected when tables are first queried after establishing the
> connection. Phoenix loads some metadata into a local cache, which takes some
> time, mainly in two aspects: 1. it accesses the SYSTEM.CATALOG table to get
> the table's schema information, and 2. it accesses the HBase meta table to
> get the table's region information.
>
> 
>Jaanai Zhang
>Best regards!
>
>
>
> William Shen  于2019年1月31日周四 下午1:37写道:
>
>> Hi there,
>>
>> I have a component that makes Phoenix queries via the Phoenix JDBC
>> Connection. I noticed that consistently, the Phoenix Client takes longer 
>> to
>> execute a PreparedStatement and it takes longer to read through the
>> ResultSet for a period of time (~15m) after a restart of the component. 
>> It
>> seems like there is a warmup period for the JDBC connection. Is this to 
>> be
>> expected?
>>
>> Thanks!
>>
>


Re: split count for mapreduce jobs with PhoenixInputFormat

2019-01-30 Thread Thomas D'Silva
If stats are enabled, PhoenixInputFormat will generate a split per
guidepost.
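One knob to try (a sketch — verify the property against your Phoenix version;
MY_TABLE is a placeholder) is lowering the per-table guidepost width and then
re-collecting stats, which produces more guideposts and therefore more splits:

ALTER TABLE MY_TABLE SET GUIDE_POSTS_WIDTH = 100000000;  -- bytes per guidepost
UPDATE STATISTICS MY_TABLE;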

On Wed, Jan 30, 2019 at 7:31 AM Josh Elser  wrote:

> You can extend/customize the PhoenixInputFormat with your own code to
> increase the number of InputSplits and Mappers.
>
> On 1/30/19 6:43 AM, Edwin Litterst wrote:
> > Hi,
> > I am using PhoenixInputFormat as input source for mapreduce jobs.
> > The split count (which determines how many mappers are used for the job)
> > is always equal to the number of regions of the table from where I
> > select the input.
> > Is there a way to increase the number of splits? My job is running too
> > slow with only one mapper for every region.
> > (Increasing the number of regions is no option.)
> > regards,
> > Eddie
>


Re: FromCompiler - Re-resolved stale table logging

2019-01-28 Thread Thomas D'Silva
There is an open bug that causes a table to be resolved multiple times
during a single query compilation (
https://issues.apache.org/jira/browse/PHOENIX-4962).
This only affects query performance and is not a correctness issue.

On Fri, Jan 25, 2019 at 3:24 PM William Shen 
wrote:

> Hi all,
>
> I've tried looking around for documentation and in source code, but did
> not have much luck trying to understand the following logging from Phoenix
> JDBC that gets logged in DEBUG mode. Does anyone know what it means, and is
> it a problem to see a lot of these in the log? Thanks!
>
> DEBUG org.apache.phoenix.compile.FromCompiler - Re-resolved stale table
> <table name> with seqNum 29 at timestamp 1523877109281 with 52 columns: [_SALT,
> <... list of columns ...>]
>


Re: unexpected behavior...MIN vs ORDER BY and LIMIT 1

2019-01-17 Thread Thomas D'Silva
The first query scans over all the rows in the index, while the second
query reads one row (SERVER 1 ROW LIMIT).

On Tue, Jan 15, 2019 at 6:55 PM M. Aaron Bossert 
wrote:

> I have a table (~ 724M rows) with a secondary index on the "TIME" column.
> When I run a MIN function on the table, the query takes ~290 sec to
> complete and by selecting on TIME and ORDERing by TIME, the query runs in
> about 0.04 sec.
>
> Here is the explain output for both queries... I totally understand that
> reading the entire table takes much longer than reading one row, but I don't
> quite get why there is such a large performance delta. Am I missing
> something?
>
> 0: jdbc:phoenix:thin:url=http://localhost:876> EXPLAIN SELECT MIN(TIME)
> FROM CHASE.LANL_FLOW LIMIT 1;
>
> PLAN (every step reports EST_BYTES_READ = 54735673888, EST_ROWS_READ = 723963021, EST_INFO_TS = 1547591769731):
>   CLIENT 176-CHUNK 723963021 ROWS 54735673888 BYTES PARALLEL 1-WAY FULL SCAN OVER CHASE:ASYNC_INDEX
>   SERVER FILTER BY FIRST KEY ONLY
>   SERVER AGGREGATE INTO SINGLE ROW
>   CLIENT 1 ROW LIMIT
> 4 rows selected (0.036 seconds)
> 0: jdbc:phoenix:thin:url=http://localhost:876> EXPLAIN SELECT TIME FROM
> CHASE.LANL_FLOW ORDER BY TIME ASC LIMIT 1;
>
> PLAN (every step reports EST_BYTES_READ = 77, EST_ROWS_READ = 1, EST_INFO_TS = 0):
>   CLIENT 2-CHUNK 1 ROWS 77 BYTES SERIAL 1-WAY FULL SCAN OVER CHASE:ASYNC_INDEX
>   SERVER FILTER BY FIRST KEY ONLY
>   SERVER 1 ROW LIMIT
>   CLIENT 1 ROW LIMIT
> 4 rows selected (0.02 seconds)
>


Re: Phoenix Performance Improvement

2019-01-17 Thread Thomas D'Silva
Can you provide the schema of the table and the queries you are running?

On Tue, Jan 15, 2019 at 12:35 PM Azharuddin Shaikh 
wrote:

> Thanks Pedro for your response.
>
> Actually, we have a status column that holds various statuses, and we are
> executing SELECT queries that fetch data from the HBase tables based on a
> status clause; the results are displayed on the application dashboard.
>
> Average daily ingestion is about 10k rows into the existing table, which is
> currently at 5M+ rows.
>
> We had tried a secondary index to avoid full scans on the table, but when
> queries are executed in parallel, timeouts occur after 30ms.
>
> It works fine if we have only one instance of the application running, but
> if there are multiple application instances then we experience timeouts.
>
>
> On Tue, 15 Jan, 2019, 9:47 PM Pedro Boado,  wrote:
>
>> What type of queries are being thrown to the cluster? What's the average
>> row size? 5M rows seems a tiny table size.  30ms is OK for scans over a few
>> thousand records, but maybe not for full table scans.
>>
>


Re: Hbase vs Phoenix column names

2019-01-07 Thread Thomas D'Silva
There isn't an existing utility that does that. You would have to look up
the COLUMN_QUALIFIER for the columns you are interested in from
SYSTEM.CATALOG
and then use them to create a Scan.
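For example, something along these lines (a sketch assuming the 4.10+
SYSTEM.CATALOG columns) returns the encoded qualifier bytes to add to the Scan:

SELECT COLUMN_FAMILY, COLUMN_NAME, COLUMN_QUALIFIER
FROM SYSTEM.CATALOG
WHERE TABLE_NAME = 'TST_TEMP' AND COLUMN_QUALIFIER IS NOT NULL;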

On Mon, Jan 7, 2019 at 9:22 PM Anil  wrote:

> Hi Team,
>
> Is there any utility to read, using the HBase APIs, HBase data that was
> created by Phoenix with column name encoding?
>
> The idea is to get all the performance and disk usage improvements of the
> Phoenix column name encoding feature while still using our existing HBase
> jobs for our data analysis.
>
> Thanks,
> Anil
>
> On Tue, 11 Dec 2018 at 14:02, Anil  wrote:
>
>> Thanks.
>>
>> On Tue, 11 Dec 2018 at 11:51, Jaanai Zhang 
>> wrote:
>>
>>> The difference is due to encoded column names, which are supported since
>>> version 4.10 (also see PHOENIX-1598).
>>> You can set the COLUMN_ENCODED_BYTES property in the CREATE TABLE SQL to
>>> keep the original column names, for example:
>>>
>>> create table test(
>>>     id varchar primary key,
>>>     col varchar
>>> ) COLUMN_ENCODED_BYTES = 0;
>>>
>>>
>>>
>>> 
>>>Jaanai Zhang
>>>Best regards!
>>>
>>>
>>>
>>> Anil  于2018年12月11日周二 下午1:24写道:
>>>
 HI,

 We have upgraded phoenix to Phoenix-4.11.0-cdh5.11.2 from phoenix 4.7.

 Problem - When a table is created in Phoenix, the underlying HBase column
 names and the Phoenix column names are different. Tables created in version
 4.7 look fine.

 CREATE TABLE TST_TEMP (TID VARCHAR PRIMARY KEY ,PRI VARCHAR,SFLG
 VARCHAR,PFLG VARCHAR,SOLTO VARCHAR,BILTO VARCHAR) COMPRESSION = 'SNAPPY';

 0: jdbc:phoenix:dq-13.labs.> select TID,PRI,SFLG from TST_TEMP limit 2;
 +-++---+
 |   TID   |PRI |SFLG   |
 +-++---+
 | 0060189122  | 0.00   |   |
 | 0060298478  | 13390.26   |   |
 +-++---+


 hbase(main):011:0> scan 'TST_TEMP', {LIMIT => 2}
 ROW  COLUMN+CELL
  0060189122  column=0:\x00\x00\x00\x00,
 timestamp=1544296959236, value=x
  0060189122  column=0:\x80\x0B,
 timestamp=1544296959236, value=0.00
  0060298478  column=0:\x00\x00\x00\x00,
 timestamp=1544296959236, value=x
  0060298478  column=0:\x80\x0B,
 timestamp=1544296959236, value=13390.26


 The HBase column names are completely different from the Phoenix column
 names. This change is observed only after the upgrade; all existing tables
 created in earlier versions look fine, and ALTER statements on existing
 tables also look fine.

 Is there any workaround to avoid this difference? We could not run HBase
 MapReduce jobs on the HBase tables created by Phoenix. Thanks.

 Thanks









Re: column mapping schema decoding

2019-01-02 Thread Thomas D'Silva
The encoded column qualifiers do not start at one (see
QueryConstants.ENCODED_CQ_COUNTER_INITIAL_VALUE). It's best to use
QualifierEncodingScheme as was suggested.

On Wed, Jan 2, 2019 at 3:53 PM Shawn Li  wrote:

> Hi Jaanai and Pedro,
>
> Any input for my example?
>
> Thanks,
> Shawn
>
> On Thu, Dec 27, 2018, 12:34 Shawn Li 
>> Hi Jaanai,
>>
>> Thanks for the input. So the encoding scheme is not a simple first-come,
>> first-assigned mapping (such as in my example: A.population -> 1, A.type -> 2;
>> B.zipcode -> 1, B.quality -> 2)? In order to decode it, we will have to
>> use the QualifierEncodingScheme class? The reason we want to use column
>> mapping is the query performance improvement mentioned on the Phoenix
>> website. We have tables with between 100 and 200 columns.
>>
>> Thanks,
>> Shawn
>>
>> On Thu, Dec 27, 2018 at 2:22 AM Jaanai Zhang 
>> wrote:
>>
>>> The actual column name and encoded qualifier number are stored in the
>>> SYSTEM.CATALOG table; the field names are COLUMN_NAME (string) and
>>> COLUMN_QUALIFIER (binary) respectively. QualifierEncodingScheme can be used
>>> to decode/encode COLUMN_QUALIFIER, but this is a somewhat complicated process.
>>>
>>> For your scenario, maybe using the original column names is better.
>>>
>>>
>>>
>>>
>>>
>>> 
>>>Jaanai Zhang
>>>Best regards!
>>>
>>>
>>>
>>> Shawn Li  于2018年12月27日周四 上午7:17写道:
>>>
 Hi Pedro,

 Thanks for the reply. Can you explain a little bit more? For example, if we
 use COLUMN_ENCODED_BYTES = 1, how is the following table DDL converted
 to numbered column qualifiers in HBase? (For instance, which number does
 A.population map to, and which number does B.zipcode map to?)

 CREATE TABLE IF NOT EXISTS us_population (
   state CHAR(2) NOT NULL,
   city VARCHAR NOT NULL,
   A.population BIGINT,
   A.type CHAR,
   B.zipcode CHAR(5),
   B.quantity INT CONSTRAINT my_pk PRIMARY KEY (state, city));


 Thanks,
 Shawn

 On Wed, Dec 26, 2018 at 6:00 PM Pedro Boado  wrote:

> Hi,
>
> Column mapping is stored in the SYSTEM.CATALOG table. There is only one
> column mapping strategy, with between 1 and 4 bytes used to represent the
> column number. Regardless of the encoded column size, the column name lookup
> strategy remains the same.
>
> Hope it helps,
>
> Pedro.
>
>
>
> On Wed, 26 Dec 2018, 23:00 Shawn Li 
>> Hi,
>>
>> Phoenix 4.10 introduced column mapping feature. There are four types
>> of mapping schema (https://phoenix.apache.org/columnencoding.html).
>> Is there any documentation that shows how to encode/map string column 
>> name
>> in Phoenix to number column qualifier in Hbase?
>>
>> We are using Lily Hbase indexer to do the batch indexing. So if the
>> column qualifier is number. We need find a way to decode it back to the
>> original String column name.
>>
>> Thanks,
>> Shawn
>>
>


Re: Inner Join Cursor Query fails with NullPointerException - JoinCompiler.java:187

2019-01-02 Thread Thomas D'Silva
This looks like a bug; please file a JIRA with your test case.

On Sat, Dec 29, 2018 at 7:42 AM Jack Steenkamp 
wrote:

> Hi All,
>
> Using Phoenix 4.14.1, I have come across an inner join query in my
> application that fails with a NullPointerException if executed as part of
> a cursor, but executes fine without it.
>
> To reproduce this issue, you can run the attached program (assuming you
> update the JDBC_URL to point to an instance you have running) or you can
> follow the steps below:
>
> Create the Table:
>
> CREATE TABLE IF NOT EXISTS MY_STATS
> (
>ID  VARCHARNOT NULL,
>ENTRY_NAME VARCHAR,
>ENTRY_VALUE   DOUBLE ,
>TRANSACTION_TIME   TIMESTAMP  ,
>CONSTRAINT pk PRIMARY KEY(ID)
> )
> IMMUTABLE_STORAGE_SCHEME=ONE_CELL_PER_COLUMN,
> UPDATE_CACHE_FREQUENCY=90,
> COLUMN_ENCODED_BYTES=NONE,
> IMMUTABLE_ROWS=true
>
> Execute a normal query (this works fine):
>
> SELECT * FROM MY_STATS
>INNER JOIN
>(
> SELECT ENTRY_NAME, MAX(TRANSACTION_TIME) AS TRANSACTION_TIME
> FROM MY_STATS
>  GROUP BY ENTRY_NAME
>) sub
>ON MY_STATS.ENTRY_NAME = sub.ENTRY_NAME AND MY_STATS.TRANSACTION_TIME =
> sub.TRANSACTION_TIME
> ORDER BY MY_STATS.TRANSACTION_TIME DESC
>
> Now if you execute the same query, but with the cursor declaration at the
> top -
>
> DECLARE MyCursor CURSOR FOR
>
> It produces the following exception:
>
> Exception in thread "main" java.lang.NullPointerException
> at
> org.apache.phoenix.compile.JoinCompiler$JoinTableConstructor.resolveTable(JoinCompiler.java:187)
> at
> org.apache.phoenix.compile.JoinCompiler$JoinTableConstructor.visit(JoinCompiler.java:224)
> at
> org.apache.phoenix.compile.JoinCompiler$JoinTableConstructor.visit(JoinCompiler.java:181)
> at
> org.apache.phoenix.parse.DerivedTableNode.accept(DerivedTableNode.java:49)
> at
> org.apache.phoenix.compile.JoinCompiler$JoinTableConstructor.visit(JoinCompiler.java:201)
> at
> org.apache.phoenix.compile.JoinCompiler$JoinTableConstructor.visit(JoinCompiler.java:181)
> at org.apache.phoenix.parse.JoinTableNode.accept(JoinTableNode.java:81)
> at org.apache.phoenix.compile.JoinCompiler.compile(JoinCompiler.java:138)
> at
> org.apache.phoenix.compile.QueryCompiler.compileSelect(QueryCompiler.java:190)
> at org.apache.phoenix.compile.QueryCompiler.compile(QueryCompiler.java:153)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:490)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableDeclareCursorStatement.compilePlan(PhoenixStatement.java:950)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableDeclareCursorStatement.compilePlan(PhoenixStatement.java:941)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:401)
> at
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:391)
> at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
> at
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:390)
> at
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:378)
> at
> org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1825)
> at
> com.jsteenkamp.phoenix.PhoenixInnerJoinCursorTest.testCursorQuery(PhoenixInnerJoinCursorTest.java:68)
> at
> com.jsteenkamp.phoenix.PhoenixInnerJoinCursorTest.main(PhoenixInnerJoinCursorTest.java:20)
>
> I think this may be a bug, though perhaps this is something you might
> have come across before?
>
> Thanks,
>
>
>
>
>
>


Re: Query All Dynamic Columns

2018-12-26 Thread Thomas D'Silva
With the splittable system catalog you should be able to create views without
seeing performance issues.
Chinmay is working on enabling a select query to return dynamic column values
without specifying the dynamic column names and types ahead of time
(see https://issues.apache.org/jira/browse/PHOENIX-374). Please take a look
at the JIRA and see if it meets your use case.
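For reference, today a dynamic column still has to be named and typed in the
query itself, e.g. (MY_TABLE and EXTRA_COL are placeholder names):

SELECT ID, EXTRA_COL
FROM MY_TABLE (EXTRA_COL VARCHAR)
WHERE ID = 'some-id';

PHOENIX-374 would remove the need for the (EXTRA_COL VARCHAR) declaration.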


On Wed, Dec 26, 2018 at 1:55 PM James Taylor  wrote:

> Persisting dynamic column names+types in Phoenix is exactly what views are
> for.
>
>
> On Wed, Dec 26, 2018 at 12:05 PM Vincent Poon 
> wrote:
>
>> A lot of work is currently going into handling large numbers of views -
>> splittable syscat, view management, etc... but agree that it's not ideal.
>>
>> There's currently no built-in way to do what you want AFAIK, but you can
>> manage the columns yourself in a separate table:
>> - store them all in a single column value, and read that value before
>> doing your query.  HBase checkAndMutate for locking.
>> or
>> - store each column as separate rows.  Then you can do things like filter
>> by column name efficiently.
>> You could 'soft delete' by removing the entries.
>>
>> Would be a nice improvement to have an option to persist dynamic column
>> names+types in Phoenix.
>>
>> On Fri, Dec 21, 2018 at 12:18 PM Clay Baenziger (BLOOMBERG/ 731 LEX) <
>> cbaenzi...@bloomberg.net> wrote:
>>
>>> Hello,
>>>
>>> A user of mine brought up a question around dynamic columns in Phoenix
>>> today. The quantity of columns should become asymptotic to a few tens of
>>> thousands of columns as their data fills in.
>>>
>>> The user wants to query all columns in a table, and today they are
>>> thinking of using views to do this -- but it is ugly management. They have
>>> an unbounded number of views -- which will pollute the global catalog and
>>> fail relatively quickly.
>>>
>>> Has anyone thought about the potentially wasteful[1] approach of
>>> scanning all rows in a query to determine columns and then re-running the
>>> query for the rows once we know what columns the SQL result will contain.
>>> Maybe something cleaner like persisting the set of columns in the
>>> statistics table and a SELECT * may return columns with nothing but nulls.
>>> Or, even better is there an overall better way to model such a wide schema
>>> in Phoenix?
>>>
>>> -Clay
>>>
>>> [1]: Perhaps some heuristics could allow for not needing to do 2n reads
>>> in all cases?
>>>
>>


Re: Cursor Query Loops Eternally with Local Index, Returns Fine Without It

2018-12-17 Thread Thomas D'Silva
Jack,

Can you please file a JIRA that includes your repro steps?

On Fri, Dec 14, 2018 at 2:33 AM Jack Steenkamp 
wrote:

> Hi All,
>
> I have come across a curious case with Phoenix (4.14.1) cursors where a
> particular query would carry on looping forever if executed when a local
> index is present. If however, I execute the same query without a local
> index on the table, then it works as expected.
>
> Please find attached a standalone test case that you should be able to run
> to reproduce this problem (though you may need to modify the JDBC_URL
> constant). You can run it either with CREATE_INDEX = true or CREATE_INDEX =
> false. Below are the outputs
>
> With : CREATE_INDEX = true;
>
> Connecting To : jdbc:phoenix:localhost:63214
> CREATE TABLE IF NOT EXISTS SOME_NUMBERS
> (
>ID VARCHARNOT NULL,
>NAME   VARCHAR,
>ANOTHER_VALUE  VARCHAR,
>TRANSACTION_TIME   TIMESTAMP  ,
>CONSTRAINT pk PRIMARY KEY(ID)
> ) IMMUTABLE_STORAGE_SCHEME=ONE_CELL_PER_COLUMN,
> UPDATE_CACHE_FREQUENCY=90,
> COLUMN_ENCODED_BYTES=NONE,
> IMMUTABLE_ROWS=true
>
> Creating Index: CREATE LOCAL INDEX index_01 ON SOME_NUMBERS(NAME,
> TRANSACTION_TIME DESC) INCLUDE(ANOTHER_VALUE)
>
> Inserting Some Items
>
> Doing Count SQL : SELECT COUNT(1) AS TOTAL_ITEMS FROM SOME_NUMBERS
> TOTAL COUNT : 10
>
> Cursor SQL : DECLARE MyCursor CURSOR FOR SELECT NAME,ANOTHER_VALUE FROM
> SOME_NUMBERS where NAME like 'ReferenceData.Country/%' ORDER BY
> TRANSACTION_TIME DESC
> ITEMS returned by count : 10 | Items Returned by Cursor : 40
> Aborting the Cursor, as it is more than the count!
> Exception in thread "main" java.lang.RuntimeException: The cursor returned
> a different number of rows from the count !!
>
> With : CREATE_INDEX = false;
>
> Connecting To : jdbc:phoenix:localhost:63214
> CREATE TABLE IF NOT EXISTS SOME_NUMBERS
> (
>ID VARCHARNOT NULL,
>NAME   VARCHAR,
>ANOTHER_VALUE  VARCHAR,
>TRANSACTION_TIME   TIMESTAMP  ,
>CONSTRAINT pk PRIMARY KEY(ID)
> ) IMMUTABLE_STORAGE_SCHEME=ONE_CELL_PER_COLUMN,
> UPDATE_CACHE_FREQUENCY=90,
> COLUMN_ENCODED_BYTES=NONE,
> IMMUTABLE_ROWS=true
>
> Not Creating the Index
>
> Inserting Some Items
>
> Doing Count SQL : SELECT COUNT(1) AS TOTAL_ITEMS FROM SOME_NUMBERS
> TOTAL COUNT : 10
>
> Cursor SQL : DECLARE MyCursor CURSOR FOR SELECT NAME,ANOTHER_VALUE FROM
> SOME_NUMBERS where NAME like 'ReferenceData.Country/%' ORDER BY
> TRANSACTION_TIME DESC
> CLOSING THE CURSOR
> Result : 0
> ITEMS returned by count : 10 | Items Returned by Cursor : 10
> ALL GOOD - No Exception
>
> Any idea what might be going on here?
>
> Regards,
>


Re: Rolling hourly data

2018-11-26 Thread Thomas D'Silva
The 1-day aggregate query you are running should work; you might have to
increase the client thread pool and queue size.
See
http://mail-archives.apache.org/mod_mbox/phoenix-user/201607.mbox/%3c577d338d.2080...@gmail.com%3E
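As an aside, Phoenix's UPSERT ... SELECT form does not take a VALUES wrapper
around the subquery. A sketch against the tables from your message (the column
list and the 'yyyy-MM-dd' format string are assumed) would look more like:

UPSERT INTO DAILY_PUBLISHER_V4 (segment_id, cross_segment_id, day, reach)
SELECT segment_id, cross_segment_id,
       TO_DATE('2018-07-01', 'yyyy-MM-dd', 'GMT') AS day,
       CAST(SUM(reach) AS BIGINT) AS reach
FROM PUBLISHER_V4
WHERE day BETWEEN TO_DATE('2018-07-01', 'yyyy-MM-dd', 'GMT')
              AND TO_DATE('2018-07-02', 'yyyy-MM-dd', 'GMT')
GROUP BY segment_id, cross_segment_id;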

On Tue, Nov 20, 2018 at 9:26 PM Monil Gandhi  wrote:

> Thanks for getting back.
> The query we run is always an aggregation query. The date range is anytime
> current to 2 years back.
> I was looking into UPSERT SELECT, but there does not seem to be an easy
> way to run it. For instance I thought running something as below would be
> optimal
>
> UPSERT INTO DAILY_PUBLISHER_V4(segment_id, cross_segment_id, day, reach)
> VALUES (SELECT segment_id, cross_segment_id, TO_DATE('2017-07-01',
> '-MM-dd', 'GMT') as day, CAST(SUM(reach) AS BIGINT) as reach FROM
> PUBLISHER_V4 WHERE day BETWEEN TO_DATE('2018-07-01', '-MM-dd', 'GMT')
> AND TO_DATE('2018-07-02', '-MM-dd', 'GMT') GROUP BY segment_id,
> cross_segment_id)
>
> Hence I tried running the select statement to see if it works. I started
> to see below error
> Error: Task
> org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask@a66e580
> rejected from org.apache.phoenix.job.JobManager$1@20de05e5[Running, pool
> size = 128, active threads = 128, queued tasks = 5000, completed tasks =
> 5153] (state=08000,code=101)
>
> When I changed the SELECT query to include a particular sid, the upsert
> select worked.
> Hence I think the only way would be for me to run UPSERTs to generate
> daily data for a range of sids or segment_ids.
>
> Did I miss something?
>
> On Tue, Nov 20, 2018 at 9:59 AM Thomas D'Silva 
> wrote:
>
>> Since your PK already leads with (sid, day) I don't think adding a
>> secondary index will help. Do you generally always run the aggregation
>> query for the recently inserted data? The row timestamp feature might help
>> in this case
>> https://phoenix.apache.org/rowtimestamp.html
>> If you run the same aggregate queries multiple times then another
>> approach is to store the coarser daily aggregated data in a separate table
>> that you can populate using an UPSERT SELECT.
>> I'm not sure why the explain plan you attached has a CLIENT MERGE SORT,
>> since you don't have an order by.
>>
>> On Fri, Nov 16, 2018 at 9:44 AM Monil Gandhi  wrote:
>>
>>> Here it is
>>> CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER PUBLISHER_V4
>>> [-3790006267705642038,'2018-07-01 00:00:00.000'] - [-37900062677056420
>>> SERVER AGGREGATE INTO DISTINCT ROWS BY [SEGMENT_ID, CROSS_SEGMENT_ID]
>>> CLIENT MERGE SORT
>>>
>>> Note: we have a dedicated phoenix query server
>>>
>>> On Thu, Nov 15, 2018 at 5:23 PM Geoffrey Jacoby 
>>> wrote:
>>>
>>>> Monil,
>>>>
>>>> Could you please post the results of an EXPLAIN plan of your query? For
>>>> directions how to do this please see
>>>> http://phoenix.apache.org/explainplan.html
>>>>
>>>> Geoffrey Jacoby
>>>>
>>>> On Thu, Nov 15, 2018 at 5:02 PM Monil Gandhi 
>>>> wrote:
>>>>
>>>>> Hello,
>>>>> Currently we have hourly data in our phoenix table. However, the
>>>>> schema was designed to perform well for daily data. Increasing the number
>>>>> of rows by 24X has lead to degradation of our service over time.
>>>>> Our current schema is as follows
>>>>>
>>>>> CREATE TABLE IF NOT EXISTS T1 (sid BIGINT NOT NULL, day DATE NOT NULL,
>>>>> cid BIGINT NOT NULL, s.count INTEGER, CONSTRAINT PK PRIMARY KEY (sid, day,
>>>>> cid) ) COMPRESSION='SNAPPY'
>>>>>
>>>>> The query we run is something along the lines of
>>>>> SELECT sid, cid, CAST(SUM(count) AS BIGINT) AS count FROM PUBLISHER_V4
>>>>> WHERE sid IN (34535) AND day BETWEEN TO_DATE('2018-07-01', '-MM-dd',
>>>>> 'GMT') AND TO_DATE('2018-07-02', '-MM-dd', 'GMT') GROUP BY sid, cid
>>>>>
>>>>> Based on our investigation we have concluded that the main reason is
>>>>> purely the number of rows that are being read. I am open to other
>>>>> suggestions
>>>>>
>>>>> If number of rows is the case
>>>>> I am wondering if there is a way to either
>>>>> 1. to roll hourly data to daily using views, secondary index or map
>>>>> reduce. I know map reduce is possible.
>>>>> 2. migrate to a newer schema where cid is not part of pk and is
>>>>> actually a column family. I was unable to find any kind of documentation 
>>>>> on
>>>>> this.
>>>>>
>>>>> Thanks
>>>>> Monil
>>>>>
>>>>


Re: Phoenix 5.0 documentation

2018-11-20 Thread Thomas D'Silva
Phoenix 5.0 only works with HBase 2.0; see
https://www.mail-archive.com/dev@phoenix.apache.org/msg49675.html

On Tue, Nov 20, 2018 at 5:20 AM Alexandre Berthaud <
alexandre.berth...@clever-cloud.com> wrote:

> Hello everyone,
>
> Is there a version of the documentation for Phoenix 5.0 somewhere I
> haven't found? I have been fighting with the 5.0 version on top of a HBase
> 2.1.1 cluster which I then downgraded to 2.0.2 to no avail.
>
> After a couple of days, I gave up. I just installed an HBase 1.4 cluster
> and tried Phoenix 4.14 and it worked right out of the box so I'm assuming I
> was missing something with the 5.0 version and it's not clear to me what it
> was.
>
> Any pointers?
>
> (the error which I kept running into was
> "java.lang.IncompatibleClassChangeError: Inconsistent constant pool data in
> classfile for class org/apache/hadoop/hbase/client/Row. Method
> lambda$static$28(Lorg/apache/hadoop/hbase/client/Row;Lorg/apache/hadoop/hbase/client/Row;)I
> at index 57 is CONSTANT_MethodRef and should be
> CONSTANT_InterfaceMethodRef")
>
> Thanks,
>


Re: Rolling hourly data

2018-11-20 Thread Thomas D'Silva
Since your PK already leads with (sid, day) I don't think adding a
secondary index will help. Do you generally always run the aggregation
query for the recently inserted data? The row timestamp feature might help
in this case
https://phoenix.apache.org/rowtimestamp.html
If you run the same aggregate queries multiple times then another approach
is to store the coarser daily aggregated data in a separate table that you
can populate using an UPSERT SELECT.
I'm not sure why the explain plan you attached has a CLIENT MERGE SORT,
since you don't have an order by.
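If the aggregation usually targets recently written rows, a hedged sketch of a
table declaring the date column as a ROW_TIMESTAMP might look like the
following. Note it follows the pattern from the Phoenix docs and leads the
primary key with the ROW_TIMESTAMP column, which differs from the original
(sid, day, cid) key order:

CREATE TABLE IF NOT EXISTS T1_BY_TIME (
    day DATE NOT NULL,
    sid BIGINT NOT NULL,
    cid BIGINT NOT NULL,
    s.count INTEGER,
    CONSTRAINT PK PRIMARY KEY (day ROW_TIMESTAMP, sid, cid)
) COMPRESSION='SNAPPY';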

On Fri, Nov 16, 2018 at 9:44 AM Monil Gandhi  wrote:

> Here it is
> CLIENT 1-CHUNK PARALLEL 1-WAY RANGE SCAN OVER PUBLISHER_V4
> [-3790006267705642038,'2018-07-01 00:00:00.000'] - [-37900062677056420
> SERVER AGGREGATE INTO DISTINCT ROWS BY [SEGMENT_ID, CROSS_SEGMENT_ID]
> CLIENT MERGE SORT
>
> Note: we have a dedicated phoenix query server
>
> On Thu, Nov 15, 2018 at 5:23 PM Geoffrey Jacoby 
> wrote:
>
>> Monil,
>>
>> Could you please post the results of an EXPLAIN plan of your query? For
>> directions how to do this please see
>> http://phoenix.apache.org/explainplan.html
>>
>> Geoffrey Jacoby
>>
>> On Thu, Nov 15, 2018 at 5:02 PM Monil Gandhi  wrote:
>>
>>> Hello,
>>> Currently we have hourly data in our phoenix table. However, the schema
>>> was designed to perform well for daily data. Increasing the number of rows
>>> by 24X has lead to degradation of our service over time.
>>> Our current schema is as follows
>>>
>>> CREATE TABLE IF NOT EXISTS T1 (sid BIGINT NOT NULL, day DATE NOT NULL,
>>> cid BIGINT NOT NULL, s.count INTEGER, CONSTRAINT PK PRIMARY KEY (sid, day,
>>> cid) ) COMPRESSION='SNAPPY'
>>>
>>> The query we run is something along the lines of
>>> SELECT sid, cid, CAST(SUM(count) AS BIGINT) AS count FROM PUBLISHER_V4
>>> WHERE sid IN (34535) AND day BETWEEN TO_DATE('2018-07-01', '-MM-dd',
>>> 'GMT') AND TO_DATE('2018-07-02', '-MM-dd', 'GMT') GROUP BY sid, cid
>>>
>>> Based on our investigation we have concluded that the main reason is
>>> purely the number of rows that are being read. I am open to other
>>> suggestions
>>>
>>> If number of rows is the case
>>> I am wondering if there is a way to either
>>> 1. to roll hourly data to daily using views, secondary index or map
>>> reduce. I know map reduce is possible.
>>> 2. migrate to a newer schema where cid is not part of pk and is actually
>>> a column family. I was unable to find any kind of documentation on this.
>>>
>>> Thanks
>>> Monil
>>>
>>


Re: Phoenix Query Taking Long Time to Execute

2018-11-14 Thread Thomas D'Silva
Can you describe your cluster setup, table definitions, the types of queries
you are running, etc.?


On Wed, Nov 14, 2018 at 12:40 AM, Azharuddin Shaikh <
azharuddins...@gmail.com> wrote:

> Hi All,
>
> We have HBase tables consisting of 4.4 million records on which we are
> running queries using Phoenix. Initially we were getting results within 10
> seconds, but the same query is now taking more than 30 seconds to fetch the
> results, due to which our applications are getting timeout errors.
>
> Please let us know if we need to tune any HBase/Phoenix parameters to reduce
> the time taken to fetch the results, and how we can improve the performance
> of HBase/Phoenix.
>
> We have Hbase 1.2.3 version and Phoenix 4.12 version.
>
> Your help is greatly appreciated.
>
> Thanks,
>
> Azhar
>


Re: Phoenix 4.14 - VIEW creation

2018-11-14 Thread Thomas D'Silva
You cannot create a view over multiple tables.

On Wed, Nov 14, 2018 at 3:49 AM, lkyaes  wrote:

> Hello,
>
> I wonder if there is already some way to CREATE VIEW over multiple
> tables (with aggregation)?
>
> Br,
> Liubov
>
>
>
>


Re: How are Phoenix Arrays Stored in HBase?

2018-10-19 Thread Thomas D'Silva
Take a look at PArrayDataTypeEncoder appendValue() and encode(). For
variable length data types we store the individual elements' serialized
bytes with a separator, and the last part of the array contains the
offsets.
For fixed length data types we just store the individual elements.

On Fri, Oct 19, 2018 at 2:40 PM, William Shen 
wrote:

> Hi,
>
> Sorry if this is too basic of a question. I tried to look through the
> documentation but could not find the information. How are Phoenix Arrays
> stored in HBase, and in particular, how are varchar array stored?
>
> I tried to upsert data in phoenix, and compare the HBase value:
>
> '0' *0*\x00\x00\x00\x80\x01\x00\x00\x00\x04\x00\x00\x00\x01\x01
> '0', '0' 0\x000\x00\x00\x00\x80\x01\x80\x03\x00\x00\x00\x06\x00\x00\x00
> \x02\x01
> '1', '1' 1\x001\x00\x00\x00\x80\x01\x80\x03\x00\x00\x00\x06\x00\x00\x00
> \x02\x01
> '1', '23' 1\x0023\x00\x00\x00\x80\x01\x80\x03\x00\x00\x00\x07\x00\x00\x00
> \x02\x01
> [''] \x00\x00\x80\x01\x00\x00\x00\x02\x00\x00\x00\x01\x01
> ['', ''] \x00\x00\x80\x01\x80\x01\x00\x00\x00\x02\x00\x00\x00\x02\x01
>
> I was able to deduce that the varchar is stored with a zero byte
> terminator, and I think there is another part tracking how many items are
> in an array, but I could not quite figure out the array information in hex
> that follows... Can someone help me understand how the array is stored in
> HBase?
>
> Thank you!
>


Re: Salting based on partial rowkeys

2018-09-13 Thread Thomas D'Silva
For the usage example that you provided, when you write data how do the
values of id_1, id_2 and other_key vary?
I assume id_1 and id_2 remain the same while other_key is monotonically
increasing, and that's why the table is salted.
If you compute the salt bucket only on id_2, then wouldn't you run into
region server hotspotting during writes?

On Thu, Sep 13, 2018 at 8:02 PM, Jaanai Zhang 
wrote:

> Sorry, I don't understand your purpose. According to your proposal, it
> seems it can't be achieved. You need a hash partition. However, some things
> need to be clarified: HBase is a range-partition engine and salt buckets
> are used to avoid hotspotting; in other words, HBase as a storage engine
> can't support hash partitioning.
>
> 
>Jaanai Zhang
>Best regards!
>
>
>
> Gerald Sangudi  于2018年9月13日周四 下午11:32写道:
>
>> Hi folks,
>>
>> Any thoughts or feedback on this?
>>
>> Thanks,
>> Gerald
>>
>> On Mon, Sep 10, 2018 at 1:56 PM, Gerald Sangudi 
>> wrote:
>>
>>> Hello folks,
>>>
>>> We have a requirement for salting based on partial, rather than full,
>>> rowkeys. My colleague Mike Polcari has identified the requirement and
>>> proposed an approach.
>>>
>>> I found an already-open JIRA ticket for the same issue:
>>> https://issues.apache.org/jira/browse/PHOENIX-4757. I can provide more
>>> details from the proposal.
>>>
>>> The JIRA proposes a syntax of SALT_BUCKETS(col, ...) = N, whereas Mike
>>> proposes SALT_COLUMN=col or SALT_COLUMNS=col, ... .
>>>
>>> The benefit at issue is that users gain more control over partitioning,
>>> and this can be used to push some additional aggregations and hash joins
>>> down to region servers.
>>>
>>> I would appreciate any go-ahead / thoughts / guidance / objections /
>>> feedback. I'd like to be sure that the concept at least is not
>>> objectionable. We would like to work on this and submit a patch down the
>>> road. I'll also add a note to the JIRA ticket.
>>>
>>> Thanks,
>>> Gerald
>>>
>>>
>>


Re: Missing content in phoenix after writing from Spark

2018-09-12 Thread Thomas D'Silva
Is there a reason you didn't use the spark-connector to serialize your data?

On Wed, Sep 12, 2018 at 2:28 PM, Saif Addin  wrote:

> Thank you Josh! That was helpful. Indeed, there was a salt bucket on the
> table, and the key-column now shows correctly.
>
> However, the problem still persists in that the rest of the columns show
> as completely empty on Phoenix (appear correctly on Hbase). We'll be
> looking into this but if you have any further advice, appreciated.
>
> Saif
>
> On Wed, Sep 12, 2018 at 5:50 PM Josh Elser  wrote:
>
>> Reminder: Using Phoenix internals forces you to understand exactly how
>> the version of Phoenix that you're using serializes data. Is there a
>> reason you're not using SQL to interact with Phoenix?
>>
>> Sounds to me that Phoenix is expecting more data at the head of your
>> rowkey. Maybe a salt bucket that you've defined on the table but not
>> created?
>>
>> On 9/12/18 4:32 PM, Saif Addin wrote:
>> > Hi all,
>> >
>> > We're trying to write tables with all string columns from spark.
>> > We are not using the Spark Connector, instead we are directly writing
>> > byte arrays from RDDs.
>> >
>> > The process works fine, and Hbase receives the data correctly, and
>> > content is consistent.
>> >
>> > However, reading the table from Phoenix, we notice the first character of
>> > strings is missing. This sounds like it's a byte encoding issue, but
>> > we're at a loss. We're using PVarchar to generate bytes.
>> >
>> > Here's the snippet of code creating the RDD:
>> >
>> > val tdd = pdd.flatMap(x => {
>> >val rowKey = PVarchar.INSTANCE.toBytes(x._1)
>> >for(i <- 0 until cols.length) yield {
>> >  other stuff for other columns ...
>> >  ...
>> >  (rowKey, (column1, column2, column3))
>> >}
>> > })
>> >
>> > ...
>> >
>> > We then create the following output to be written down in Hbase
>> >
>> > val output = tdd.map(x => {
>> >  val rowKeyByte: Array[Byte] = x._1
>> >  val immutableRowKey = new ImmutableBytesWritable(rowKeyByte)
>> >
>> >  val kv = new KeyValue(rowKeyByte,
>> >  PVarchar.INSTANCE.toBytes(column1),
>> >  PVarchar.INSTANCE.toBytes(column2),
>> >PVarchar.INSTANCE.toBytes(column3)
>> >  )
>> >  (immutableRowKey, kv)
>> > })
>> >
>> > By the way, we are using *KryoSerializer* in order to be able to
>> > serialize all classes necessary for Hbase (KeyValue, BytesWritable,
>> etc).
>> >
>> > The key of this table is the one missing data when queried from
>> Phoenix.
>> > So we guess something is wrong with the byte ser.
>> >
>> > Any ideas? Appreciated!
>> > Saif
>>
>


Re: Issue in upgrading phoenix : java.lang.ArrayIndexOutOfBoundsException: SYSTEM:CATALOG 63

2018-09-12 Thread Thomas D'Silva
Can you attach the schema of your table and the explain plan for select *
from mytable?

On Tue, Sep 11, 2018 at 10:24 PM, Tanvi Bhandari 
wrote:

> " mapped hbase tables to phoenix and created them explicitly from phoenix
> sqlline client. I first created schema corresponding to namespace and then
> tables." By this statement, I meant the same. I re-created my tables
> since I had the DDLs with me.
>
> After that I tried getting the count of records in my table, which gave me
> 8 records (the expected result): select count(*) from "myTable";
> But when I run select * from "myTable"; it does not return
> any results.
>
> On Wed, Sep 12, 2018 at 1:55 AM Thomas D'Silva 
> wrote:
>
>> Since you dropped all the system tables, all the phoenix metadata was
>> lost. If you have the ddl statements used to create your tables, you can
>> try rerunning them.
>>
>> On Tue, Sep 11, 2018 at 9:32 AM, Tanvi Bhandari > > wrote:
>>
>>> Hi,
>>>
>>>
>>>
>>> I am trying to upgrade the phoenix binaries in my setup from phoenix-4.6
>>> (had optional concept of schema) to phoenix-4.14 (schema is a must in
>>> here).
>>>
>>> Earlier, I had the phoenix-4.6-hbase-1.1 binaries. When I try to run the
>>> phoenix-4.14-hbase-1.3 on the same data. Hbase comes up fine But when I try
>>> to connect to phoenix using sqline client,  I get the following error on
>>> *console*:
>>>
>>>
>>>
>>> 18/09/07 04:22:48 WARN ipc.CoprocessorRpcChannel: Call failed on
>>> IOException
>>>
>>> org.apache.hadoop.hbase.DoNotRetryIOException: 
>>> org.apache.hadoop.hbase.DoNotRetryIOException:
>>> SYSTEM:CATALOG: 63
>>>
>>> at org.apache.phoenix.util.ServerUtil.createIOException(
>>> ServerUtil.java:120)
>>>
>>> at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.
>>> getVersion(MetaDataEndpointImpl.java:3572)
>>>
>>> at org.apache.phoenix.coprocessor.generated.MetaDataProtos$
>>> MetaDataService.callMethod(MetaDataProtos.java:16422)
>>>
>>> at org.apache.hadoop.hbase.regionserver.HRegion.
>>> execService(HRegion.java:7435)
>>>
>>> at org.apache.hadoop.hbase.regionserver.RSRpcServices.
>>> execServiceOnRegion(RSRpcServices.java:1875)
>>>
>>> at org.apache.hadoop.hbase.regionserver.RSRpcServices.
>>> execService(RSRpcServices.java:1857)
>>>
>>> at org.apache.hadoop.hbase.protobuf.generated.
>>> ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
>>>
>>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:
>>> 2114)
>>>
>>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.
>>> java:101)
>>>
>>> at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
>>> RpcExecutor.java:130)
>>>
>>> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.
>>> java:107)
>>>
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>> Caused by: java.lang.ArrayIndexOutOfBoundsException: 63
>>>
>>> at org.apache.phoenix.schema.PTableImpl.init(PTableImpl.
>>> java:517)
>>>
>>> at org.apache.phoenix.schema.PTableImpl.(PTableImpl.
>>> java:421)
>>>
>>> at org.apache.phoenix.schema.PTableImpl.makePTable(
>>> PTableImpl.java:406)
>>>
>>> at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(
>>> MetaDataEndpointImpl.java:1046)
>>>
>>> at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.
>>> buildTable(MetaDataEndpointImpl.java:587)
>>>
>>>at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.loadTable(
>>> MetaDataEndpointImpl.java:1305)
>>>
>>> at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.
>>> getVersion(MetaDataEndpointImpl.java:3568)
>>>
>>> ... 10 more
>>>
>>>
>>>
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>>> Method)
>>>
>>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(
>>> NativeConstructorAccessorImpl.java:62)
>>>
>>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
>>> DelegatingConstructorAccessorImpl.java:45)
>>>
>>>

Re: Issue in upgrading phoenix : java.lang.ArrayIndexOutOfBoundsException: SYSTEM:CATALOG 63

2018-09-11 Thread Thomas D'Silva
Since you dropped all the system tables, all the phoenix metadata was lost.
If you have the ddl statements used to create your tables, you can try
rerunning them.

On Tue, Sep 11, 2018 at 9:32 AM, Tanvi Bhandari 
wrote:

> Hi,
>
>
>
> I am trying to upgrade the phoenix binaries in my setup from phoenix-4.6
> (had optional concept of schema) to phoenix-4.14 (schema is a must in
> here).
>
> Earlier, I had the phoenix-4.6-hbase-1.1 binaries. When I try to run the
> phoenix-4.14-hbase-1.3 on the same data. Hbase comes up fine But when I try
> to connect to phoenix using sqline client,  I get the following error on
> *console*:
>
>
>
> 18/09/07 04:22:48 WARN ipc.CoprocessorRpcChannel: Call failed on
> IOException
>
> org.apache.hadoop.hbase.DoNotRetryIOException: 
> org.apache.hadoop.hbase.DoNotRetryIOException:
> SYSTEM:CATALOG: 63
>
> at org.apache.phoenix.util.ServerUtil.createIOException(
> ServerUtil.java:120)
>
> at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(
> MetaDataEndpointImpl.java:3572)
>
> at org.apache.phoenix.coprocessor.generated.MetaDataProtos$
> MetaDataService.callMethod(MetaDataProtos.java:16422)
>
> at org.apache.hadoop.hbase.regionserver.HRegion.
> execService(HRegion.java:7435)
>
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> execServiceOnRegion(RSRpcServices.java:1875)
>
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.
> execService(RSRpcServices.java:1857)
>
> at org.apache.hadoop.hbase.protobuf.generated.
> ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
>
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>
> at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(
> RpcExecutor.java:130)
>
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.
> java:107)
>
> at java.lang.Thread.run(Thread.java:745)
>
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 63
>
> at org.apache.phoenix.schema.PTableImpl.init(PTableImpl.java:517)
>
> at org.apache.phoenix.schema.PTableImpl.<init>(PTableImpl.
> java:421)
>
> at org.apache.phoenix.schema.PTableImpl.makePTable(
> PTableImpl.java:406)
>
> at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(
> MetaDataEndpointImpl.java:1046)
>
> at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.buildTable(
> MetaDataEndpointImpl.java:587)
>
>at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.loadTable(
> MetaDataEndpointImpl.java:1305)
>
> at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(
> MetaDataEndpointImpl.java:3568)
>
> ... 10 more
>
>
>
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
>
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(
> NativeConstructorAccessorImpl.java:62)
>
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
> DelegatingConstructorAccessorImpl.java:45)
>
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>
> at org.apache.hadoop.ipc.RemoteException.instantiateException(
> RemoteException.java:106)
>
> at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(
> RemoteException.java:95)
>
> at org.apache.hadoop.hbase.protobuf.ProtobufUtil.
> getRemoteException(ProtobufUtil.java:326)
>
> at org.apache.hadoop.hbase.protobuf.ProtobufUtil.
> execService(ProtobufUtil.java:1629)
>
> at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(
> RegionCoprocessorRpcChannel.java:104)
>
> at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(
> RegionCoprocessorRpcChannel.java:94)
>
> at org.apache.hadoop.hbase.client.RpcRetryingCaller.
> callWithRetries(RpcRetryingCaller.java:136)
>
> at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel.
> callExecService(RegionCoprocessorRpcChannel.java:107)
>
> at org.apache.hadoop.hbase.ipc.CoprocessorRpcChannel.callMethod(
> CoprocessorRpcChannel.java:56)
>
> at org.apache.phoenix.coprocessor.generated.MetaDataProtos$
> MetaDataService$Stub.getVersion(MetaDataProtos.java:16739)
>
> at org.apache.phoenix.query.ConnectionQueryServicesImpl$5.call(
> ConnectionQueryServicesImpl.java:1271)
>
> at org.apache.phoenix.query.ConnectionQueryServicesImpl$5.call(
> ConnectionQueryServicesImpl.java:1263)
>
> at org.apache.hadoop.hbase.client.HTable$15.call(HTable.java:1736)
>
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1142)
>
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:617)
>
> at java.lang.Thread.run(Thread.java:745)
>
>
>
>
>
> *Region-server logs are as follows: 

Re: TTL on a single column family in table

2018-09-04 Thread Thomas D'Silva
If you set different TTLs for different column families, you can run into issues
with SELECT count(*) queries not returning correct results (depending on which
column family is used to store the EMPTY_COLUMN_VALUE).
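
For reference, the supported route is a single table-wide TTL set through the
Phoenix DDL; a minimal JDBC sketch (the table, its column families and the
one-day TTL are just examples):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class TableWideTtlExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {
            // The TTL property is applied to every column family (A and B here),
            // including the empty column family Phoenix relies on for count(*).
            stmt.execute("CREATE TABLE IF NOT EXISTS EVENTS ("
                + " ID VARCHAR PRIMARY KEY, A.V1 VARCHAR, B.V2 VARCHAR) TTL=86400");
        }
    }
}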

On Tue, Sep 4, 2018 at 10:56 AM, Sergey Soldatov 
wrote:

> What is the use case to set TTL only for a single column family? I would
> say that making TTL table wide is a mostly technical decision because in
> relational databases we operate with rows and supporting TTL for only some
> columns sounds a bit strange.
>
> Thanks,
> Sergey
>
> On Fri, Aug 31, 2018 at 7:43 AM Domen Kren  wrote:
>
>> Hello,
>>
>> we have situation where we would like to set TTL on a single column
>> family in a table. After getting errors while trying to do that trough a
>> phoenix command i found this issue, https://issues.apache.org/
>> jira/browse/PHOENIX-1409, where it said "TTL - James Taylor and I
>> discussed offline and we decided that for now we will only be supporting
>> for all column families to have the same TTL as the empty column family.
>> This means we error out if a column family is specified while setting TTL
>> property - both at CREATE TABLE and ALTER TABLE time. Also changes were
>> made to make sure that any new column family added gets the same TTL as the
>> empty CF."
>>
>> If i understand correctly, this was a design decision and not a technical
>> one. So my question is, if i change this configuration trough HBase API or
>> console, could there be potential problems that arise in phoenix?
>>
>> Thanks you and best regards,
>> Domen Kren
>>
>>
>>


Re: Unable to find cached index metadata

2018-09-02 Thread Thomas D'Silva
Is your cluster under heavy write load when you see these exceptions? How
long does it take to write a batch of mutations?
If it's longer than the configured value of maxServerCacheTimeToLiveMs, you will
see the exception because the index metadata expired from the cache.


On Sun, Sep 2, 2018 at 4:02 PM, Batyrshin Alexander <0x62...@gmail.com>
wrote:

>   Hello all,
> We use mutable table with many indexes on it. On upserts we getting this
> error:
>
> o.a.phoenix.execute.MutationState - Swallowing exception and retrying
> after clearing meta cache on connection. java.sql.SQLException: ERROR 2008
> (INT10): Unable to find cached index metadata. ERROR 2008 (INT10): ERROR
> 2008 (INT10): Unable to find cached index metadata. key=8283602185356160420
> region=HISTORY,D\xEF\xBF\xBD\xEF\xBF\xBDNt\x1B\xEF\xBF\xBD\
> xEF\xBF\xBD\xEF\xBF\xBD5\x1E\x01W\x02\xEF\xBF\xBD$,1531781097243.
> 95d19923178a7d80fa55428b97816e3f.host=cloud016,60020,1535926087741 Index
> update failed
>
>
> Current config:
> phoenix-4.14.0-HBase-1.4
> phoenix.coprocessor.maxServerCacheTimeToLiveMs = 6
> ALTER TABLE HISTORY SET UPDATE_CACHE_FREQUENCY=6


Re: Is read-only user of Phoeix table possible?

2018-08-27 Thread Thomas D'Silva
I misspoke in my earlier email; the behavior I described applies from
Phoenix 4.14 onward.
You are probably running into PHOENIX-2717, which was fixed in 4.11.




On Mon, Aug 27, 2018 at 9:06 AM, Josh Elser  wrote:

> Note, that the functionality that Thomas describes is how we intend
> Phoenix to work, and may not be how the 4.9 release of Phoenix works (due
> to changes that have been made).
>
> On 8/23/18 12:42 PM, Thomas D'Silva wrote:
>
>> On a new cluster, the first time a client connects is when the SYSTEM
>> tables are created. You need to connect with a user that has RWX on the
>> SYSTEM schema the very first time.
>> After that user1 should be able to connect. Also from the doc: Every user
>> requires 'RX' permissions on all Phoenix SYSTEM tables in order to work
>> correctly. Users also require 'RWX' permissions on the SYSTEM.SEQUENCE table
>> for using SEQUENCES.
>>
>> On Wed, Aug 22, 2018 at 10:38 PM, Sumanta Gh <sumanta...@tcs.com> wrote:
>>
>> Hi,
>>
>> I have a Kerberos enabled Hbase 1.2 cluster with Phoenix 4.9.
>> In hbase shell, I have granted an hbase user with permission R.
>>
>> hbase shell > grant 'user1', 'R'
>>
>> Now while connecting through SqlLine, I am getting the below error -
>>
>> Insufficient permissions (user=us...@example.com, scope=default:SYSTEM.CATALOG,
>> params=[table=default:SYSTEM.CATALOG],action=CREATE
>>
>> Is the Phoenix client trying to CREATE a table everytime? For this
>> issue, I am not able to create an absolute read-only user of Hbase.
>> NB : All Phoenix tables are already created
>>
>> Kindly help to resolve this issue.
>>
>>
>> Regards
>> Sumanta
>>
>>
>>
>>


Re: Empty row when using OFFSET + LIMIT

2018-08-25 Thread Thomas D'Silva
Juan,

Can you please create a JIRA that allows us to repro this in a test?

Thanks,
Thomas

On Fri, Aug 24, 2018 at 9:12 PM, Juan Pablo Gardella <
gardellajuanpa...@gmail.com> wrote:

> Hi all,
>
> Today I faced a bug -I think-. I'm using Phoenix shipped
> at
> HDP 2.6.3
> . I
> have a table defined as:
>
> create table test (
>
>id VARCHAR not null primary key,
>
>json VARCHAR,
>
>nifi_capture_ts TIMESTAMP
>
> )
>
>
> It has 2559774 rows. If I execute the following query, it returns a row
> with a null value.
>
>
> select * from
>
> (
>
>SELECT ID
>
>FROM test
>
>LIMIT 10 OFFSET 10
>
> )
>
> where ID is null
>
>
> I was reviewing the git logs and I didn't see any commit related to
> that[1]. Notice the query for OFFSET and LIMIT lowers than 1 does not
> fail. I've attached a capture of the query results.
>
> [image: image.png]
> Notice if I execute SELECT ID FROM test WHERE ID IS NULL returns an empty
> result as expected.
>
> [image: image.png]
> Let me know if I should file a ticket at JIRA. For now I'm unable to
> verify with a newer server, but AFAIK there is no commit related to that
> from today until version 4.7.0.
>
> Thanks in advance,
> Juan
>
> [1] Similar but not equal is PHOENIX-3422
> . The results is no
> data instead of null row.
> ​
>


Re: Disable NO_CACHE hint on query for LIMIT OFFSET paging queries

2018-08-23 Thread Thomas D'Silva
If the offset is very large, you will end up scanning and filtering many
rows, most of which won't be accessed again.
You need to test the performance of the NO_CACHE hint in your particular
use case to see if it helps.

On Wed, Aug 22, 2018 at 4:39 AM, Abhishek Gupta  wrote:

> The thing is that RVC will not work for case where the SQL query does
> aggregation on truncated primary key columns
> eg. SELECT pk1,
> TO_CHAR(CONVERT_TZ(TO_DATE(time_col,'MMddHHmm'),
> 'UTC','Asia/Kolkata'),'MMdd') as truncated_time_sample,
> SUM(col1) as agg_value,
>  FROM ... GROUP BY pk1, truncated_time_sample..LIMIT 1000 OFFSET 1000
>
> This is the reason to pick LIMIT OFFSET for pagination.
>
> From your answer it seems for OFFSET queries, all the previous pages rows
> are scanned aggregated to rows and rejected till offset. So in such a
> scenario wouldn't those rows in block cache help during the pagination
> duration.
>
> Thanks,
> Abhishek
>
> On Wed, Aug 22, 2018 at 12:07 AM Thomas D'Silva 
> wrote:
>
>> When you do an OFFSET Phoenix will scan rows and filter them out until it
>> reaches the offset count which can end up being very costly for large
>> offsets.
>> If you can use a RVC where the order matches the PK of the data table or
>> index the start key of the scan will be set based on the RVC, which is much
>> more efficient
>> (see http://phoenix.apache.org/paged.html).
>>
>> On Tue, Aug 21, 2018 at 8:06 AM, Abhishek Gupta 
>> wrote:
>>
>>> Hi,
>>>
>>> Could you help me understand how LIMIT OFFSET queries work under the
>>> hood in Phoenix, is the filtering out of rows done in heap or is there some
>>> sort of optimisation where it can skip at disk level.
>>> My idea about posting this question was to understand if the rows from
>>> paste pages of the query in block cache can optimize the subsequent page
>>> call that would use the cache hits for the filter out rows and not seek at
>>> disk.
>>>
>>> Thanks,
>>> Abhishek
>>>
>>> On Sat, Aug 18, 2018 at 4:16 AM Thomas D'Silva 
>>> wrote:
>>>
>>>> Shouldn't you pass the NO_CACHE hint for the LIMIT-OFFSET queries,
>>>> since you will be reading and filtering out lots of rows on the server?
>>>> I guess using the block cache for RVC queries might help depending on
>>>> how many rows you read per query, you should be able to easily test this
>>>> out.
>>>>
>>>> On Fri, Aug 17, 2018 at 4:25 AM, Abhishek Gupta 
>>>> wrote:
>>>>
>>>>> Hi Team,
>>>>>
>>>>> I am working on a use case where SQL aggregated queries are made such
>>>>> that RVC cannot be used (aggregation on truncated primary key columns)
>>>>> instead LIMIT-OFFSET has to be used. RVC is used for some user user cases
>>>>>
>>>>> Currently I have disabled BLOCKCACHE for the table. I wanted to check
>>>>> if it would be more performant to instead enable BLOCKCACHE on the
>>>>> table and pass NO_CACHE hint for RVC queries because it uses
>>>>> non-LIMIT-OFFSET scans and not pass NO_CACHE for the LIMIT-OFFSET
>>>>> queries so that for the subsequent page calls can leverage prior page data
>>>>> in block cache.
>>>>>
>>>>> Thanks,
>>>>> Abhishek
>>>>>
>>>>
>>>>
>>


Re: Is read-only user of Phoeix table possible?

2018-08-23 Thread Thomas D'Silva
On a new cluster, the first time a client connects is when the SYSTEM
tables are created. You need to connect with a user that has RWX on the
SYSTEM schema the very first time.
After that user1 should be able to connect. Also from the doc: Every user
requires 'RX' permissions on all Phoenix SYSTEM tables in order to work
correctly. Users also require 'RWX' permissions on SYSTEM.SEQUENCE table
for using SEQUENCES.

On Wed, Aug 22, 2018 at 10:38 PM, Sumanta Gh  wrote:

> Hi,
>
> I have a Kerberos enabled Hbase 1.2 cluster with Phoenix 4.9.
> In hbase shell, I have granted an hbase user with permission R.
>
>  hbase shell > grant 'user1', 'R'
>
> Now while connecting through SqlLine, I am getting the below error -
>
> Insufficient permissions (user=us...@example.com,
> scope=default:SYSTEM.CATALOG, params=[table=default:SYSTEM.
> CATALOG],action=CREATE
>
> Is the Phoenix client trying to CREATE a table everytime? For this issue,
> I am not able to create an absolute read-only user of Hbase.
> NB : All Phoenix tables are already created
>
> Kindly help to resolve this issue.
>
>
> Regards
> Sumanta
>
>
>


Re: Disable NO_CACHE hint on query for LIMIT OFFSET paging queries

2018-08-21 Thread Thomas D'Silva
When you use an OFFSET, Phoenix will scan rows and filter them out until it
reaches the offset count, which can end up being very costly for large
offsets.
If you can use an RVC whose order matches the PK of the data table or
index, the start key of the scan will be set from the RVC, which is much
more efficient (see http://phoenix.apache.org/paged.html).
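
For illustration, here is a rough keyset-paging sketch over JDBC using an RVC
on the primary key. The EVENTS table, its (ORG, SEQ) key, and the 1000-row
page size are invented for the example; the caller passes in the PK of the
last row of the previous page:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class RvcPagingExample {
    // Assumed table: CREATE TABLE EVENTS (ORG VARCHAR NOT NULL, SEQ BIGINT NOT NULL,
    //     VAL DOUBLE CONSTRAINT PK PRIMARY KEY (ORG, SEQ))
    static void printNextPage(Connection conn, String lastOrg, long lastSeq) throws Exception {
        // The RVC matches the PK order, so the scan starts right after the last row
        // of the previous page instead of scanning and discarding earlier rows.
        String sql = "SELECT ORG, SEQ, VAL FROM EVENTS"
                   + " WHERE (ORG, SEQ) > (?, ?) ORDER BY ORG, SEQ LIMIT 1000";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, lastOrg);
            ps.setLong(2, lastSeq);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " " + rs.getLong(2) + " " + rs.getDouble(3));
                }
            }
        }
    }
}

The last (ORG, SEQ) returned becomes the input for the next call, so each page
starts scanning where the previous one stopped.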

On Tue, Aug 21, 2018 at 8:06 AM, Abhishek Gupta  wrote:

> Hi,
>
> Could you help me understand how LIMIT OFFSET queries work under the hood
> in Phoenix, is the filtering out of rows done in heap or is there some sort
> of optimisation where it can skip at disk level.
> My idea about posting this question was to understand if the rows from
> paste pages of the query in block cache can optimize the subsequent page
> call that would use the cache hits for the filter out rows and not seek at
> disk.
>
> Thanks,
> Abhishek
>
> On Sat, Aug 18, 2018 at 4:16 AM Thomas D'Silva 
> wrote:
>
>> Shouldn't you pass the NO_CACHE hint for the LIMIT-OFFSET queries, since
>> you will be reading and filtering out lots of rows on the server?
>> I guess using the block cache for RVC queries might help depending on how
>> many rows you read per query, you should be able to easily test this out.
>>
>> On Fri, Aug 17, 2018 at 4:25 AM, Abhishek Gupta 
>> wrote:
>>
>>> Hi Team,
>>>
>>> I am working on a use case where SQL aggregated queries are made such
>>> that RVC cannot be used (aggregation on truncated primary key columns)
>>> instead LIMIT-OFFSET has to be used. RVC is used for some user user cases
>>>
>>> Currently I have disabled BLOCKCACHE for the table. I wanted to check if
>>> it would be more performant to instead enable BLOCKCACHE on the table
>>> and pass NO_CACHE hint for RVC queries because it uses non-LIMIT-OFFSET
>>> scans and not pass NO_CACHE for the LIMIT-OFFSET queries so that for
>>> the subsequent page calls can leverage prior page data in block cache.
>>>
>>> Thanks,
>>> Abhishek
>>>
>>
>>


Re: Disable NO_CACHE hint on query for LIMIT OFFSET paging queries

2018-08-17 Thread Thomas D'Silva
Shouldn't you pass the NO_CACHE hint for the LIMIT-OFFSET queries, since
you will be reading and filtering out lots of rows on the server?
I guess using the block cache for RVC queries might help depending on how
many rows you read per query; you should be able to test this out easily.
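
For what it's worth, the hint just goes inline in the SELECT. A small sketch
against a made-up EVENTS table, fetching one LIMIT/OFFSET page while keeping
the filtered-out rows from being cached:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class NoCacheHintExample {
    static void printPage(Connection conn, long offset) throws Exception {
        // NO_CACHE keeps the rows this scan reads (most of which are skipped just
        // to honour the OFFSET) out of the HBase block cache.
        String sql = "SELECT /*+ NO_CACHE */ ORG, SEQ, VAL FROM EVENTS"
                   + " ORDER BY ORG, SEQ LIMIT 1000 OFFSET " + offset;
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " " + rs.getLong(2) + " " + rs.getDouble(3));
            }
        }
    }
}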

On Fri, Aug 17, 2018 at 4:25 AM, Abhishek Gupta  wrote:

> Hi Team,
>
> I am working on a use case where SQL aggregated queries are made such that
> RVC cannot be used (aggregation on truncated primary key columns) instead
> LIMIT-OFFSET has to be used. RVC is used for some user user cases
>
> Currently I have disabled BLOCKCACHE for the table. I wanted to check if
> it would be more performant to instead enable BLOCKCACHE on the table and
> pass NO_CACHE hint for RVC queries because it uses non-LIMIT-OFFSET scans
> and not pass NO_CACHE for the LIMIT-OFFSET queries so that for the
> subsequent page calls can leverage prior page data in block cache.
>
> Thanks,
> Abhishek
>


Re: SALT_BUCKETS and PRIMARY KEY DESC

2017-02-27 Thread Thomas D'Silva
I was unable to reproduce this behavior with Phoenix 4.9; can you try
using a later version of Phoenix?

On Mon, Feb 20, 2017 at 9:17 AM, Afshin Moazami 
wrote:

> Hi folks,
>
> I am not sure if I it is by design, or it is a [known] bug
> in phoenix-4.7.0-HBase-1.1.
> It looks like when I create a table with salt_buckets and primary key
> desc, the where clauses are not behaving as expected.
> The behaviour is normal in case that I drop the salt buckets or "desc" in
> primary key.
>
> Would you please assist?
>
>
> The followings are the queries that I am using:
>
> I am creating a simple table like this:
>
> Create table TEST.TEST (
>  id VARCHAR NOT NULL PRIMARY KEY DESC
>)
>
> SALT_BUCKETS = 60;
>
>
> And insert two records like these:
>
> UPSERT INTO TEST.TEST VALUES ('1487108338648-SAMPLE1');
> UPSERT INTO TEST.TEST VALUES ('1487108338648-SAMPLE2');
>
> When I am trying to select using ID, very strange behaviour will happen:
> 1. Select * behave as expected:
>
> 0: jdbc:phoenix:localhost>  SELECT * FROM TEST.TEST;
> ++
> |   ID   |
> ++
> | 1487108338648-SAMPLE1  |
> | 1487108338648-SAMPLE2  |
> ++
> 2 rows selected (0.071 seconds)
>
> 2. Select with = in where clause does not return any records:
>
> 0: jdbc:phoenix:localhost> SELECT * FROM TEST.TEST where id =
> '1487108338648-SAMPLE1';
> +-+
> | ID  |
> +-+
> +-+
> No rows selected (0.013 seconds)
>
> 3. Select with "in" for a single value doesn't return any records:
>
> 0: jdbc:phoenix:localhost> SELECT * FROM TEST.TEST where id in
> ('1487108338648-SAMPLE1');
> +-+
> | ID  |
> +-+
> +-+
> No rows selected (0.015 seconds)
>
> 4. Select with "in" for a multiple values return as expected
> 0: jdbc:phoenix:localhost> SELECT * FROM TEST.TEST where id in
> ('1487108338648-SAMPLE1', '1487108338648-SAMPLE2');
> ++
> |   ID   |
> ++
> | 1487108338648-SAMPLE1  |
> | 1487108338648-SAMPLE2  |
> ++
> 2 rows selected (0.05 seconds)
>
> 5. Select with "like" returns as expected:
> 0: jdbc:phoenix:localhost> SELECT * FROM TEST.TEST where id like
> '%1487108338648-SAMPLE1';
> ++
> |   ID   |
> ++
> | 1487108338648-SAMPLE1  |
> ++
> 1 row selected (0.068 seconds)
> 0: jdbc:phoenix:localhost> SELECT * FROM TEST.TEST where id like
> '1487108338648-SAMPLE1%';
> ++
> |   ID   |
> ++
> | 1487108338648-SAMPLE1  |
> ++
> 1 row selected (0.058 seconds)
>
> Best,
> Afshin
>
>


Re: Which statements are supported when using transactions?

2016-10-06 Thread Thomas D'Silva
Francis,

Can you please file a JIRA for this?

Thanks,
Thomas

On Thu, Oct 6, 2016 at 12:58 AM, F21  wrote:

> I just ran into the following scenario with Phoenix 4.8.1 and HBase 1.2.3.
>
> 1. Create a transactional table: CREATE TABLE schemas(version varchar not
> null primary key) TRANSACTIONAL=true
>
> 2. Confirm it exists/is created: SELECT * FROM schemas
>
> 3. Begin transaction.
>
> 4. Insert into schemas: UPSERT INTO schemas (version) VALUES
> ('some-release')
>
> 5. Also create a table: CREATE TABLE test_table (id integer not null
> primary key)
>
> 6. Commit the transaction.
>
> Once the transaction has been committed, I can see that test_table is
> created. However, the schemas table is missing the 'some-release' entry.
>
>
> Which statements do not support transactions? In MySQL, things like CREATE
> TABLE, etc are non-transactional and would implicitly commit. What is the
> case with Phoenix? It seems a bit odd that the table is created, but the
> UPSERT has not effect, event after committing the transaction.
>
> Cheers,
>
> Francis
>
>


Re: Tephra errors when trying to create a transactional table in Phoenix 4.8.0

2016-08-31 Thread Thomas D'Silva
Can you check the Transaction Manager logs and see if there are any errors?
Also, can you run jps and confirm that the Transaction Manager is running?

On Wed, Aug 31, 2016 at 2:12 AM, F21  wrote:

> Just another update. Even though the logs says that the transaction
> manager is not running, it is actually running.
>
> I confirmed this by checking the output of ps and connecting to the
> transaction manager:
>
> bash-4.3# ps
> PID   USER TIME   COMMAND
> 1 root   0:01 bash /run-hbase-phoenix.sh
>   137 hadoop 0:19 /usr/lib/jvm/java-1.8-openjdk/bin/java
> -Dproc_master -XX:OnOutOfMemoryError=kill -9 %p -XX:+UseConcMarkSweepGC
> -XX:PermSize=128m -XX:Ma
>   189 hadoop 0:08 /usr/lib/jvm/java-1.8-openjdk/bin/java
> -XX:+UseConcMarkSweepGC -cp /opt/hbase/bin/../lib/*:/opt/h
> base/bin/../conf/:/opt/hbase/phoenix-c
>   542 root   0:00 /bin/bash
>  9035 root   0:00 sleep 1
>  9036 root   0:00 ps
>
> bash-4.3# wget localhost:15165
> Connecting to localhost:15165 (127.0.0.1:15165)
> wget: error getting response: Connection reset by peer
>
>
> On 31/08/2016 3:25 PM, F21 wrote:
>
>> This only seems to be a problem when I have HBase running in fully
>> distributed mode (1 master, 1 regionserver and 1 zookeeper node in
>> different docker images).
>>
>> If I have HBase running in standalone mode with HBase and Phoenix and the
>> Query server in 1 docker image, it works correctly.
>>
>> On 31/08/2016 11:21 AM, F21 wrote:
>>
>>> I have HBase 1.2.2 and Phoenix 4.8.0 running on my HBase master running
>>> on alpine linux with OpenJDK JRE 8.
>>>
>>> This is my hbase-site.xml:
>>>
>>> 
>>> 
>>> 
>>>   
>>> hbase.rootdir
>>> hdfs://mycluster/hbase
>>>   
>>>   
>>> zookeeper.znode.parent
>>> /hbase
>>>   
>>>   
>>> hbase.cluster.distributed
>>> true
>>>   
>>>   
>>> hbase.zookeeper.quorum
>>> m9edd51-zookeeper.m9edd51
>>>   
>>>   
>>> data.tx.snapshot.dir
>>> /tmp/tephra/snapshots
>>>   
>>>   
>>> data.tx.timeout
>>> 60
>>>   
>>>   
>>> phoenix.transactions.enabled
>>> true
>>>   
>>> 
>>>
>>> I am able to start the master correctly. I am also able to create
>>> non-transactional table.
>>>
>>> However, if I create a transactional table, I get this error: ERROR
>>> [TTransactionServer-rpc-0] thrift.ProcessFunction: Internal error
>>> processing startShort
>>>
>>> This is what I see in the logs:
>>>
>>> 2016-08-31 01:08:33,560 WARN 
>>> [main-SendThread(m9edd51-zookeeper.m9edd51:2181)]
>>> zookeeper.ClientCnxn: Session 0x156de22abec0004 for server null, unexpected
>>> error, closing socket connection and attempting reconnect
>>> java.net.ConnectException: Connection refused
>>> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl
>>> .java:717)
>>> at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientC
>>> nxnSocketNIO.java:361)
>>> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.
>>> java:1081)
>>> 2016-08-31 01:08:33,616 INFO  [DefaultMetricsCollector STOPPING]
>>> metrics.DefaultMetricsCollector: Stopped metrics reporter
>>> 2016-08-31 01:08:33,623 INFO  [ThriftRPCServer]
>>> tephra.TransactionManager: Took 170.7 ms to stop
>>> 2016-08-31 01:08:33,623 INFO  [ThriftRPCServer] rpc.ThriftRPCServer: RPC
>>> server for TTransactionServer stopped.
>>> 2016-08-31 01:08:34,776 INFO 
>>> [main-SendThread(m9edd51-zookeeper.m9edd51:2181)]
>>> zookeeper.ClientCnxn: Opening socket connection to server
>>> m9edd51-zookeeper.m9edd51/172.18.0.2:2181. Will not attempt to
>>> authenticate using SASL (unknown error)
>>> 2016-08-31 01:08:34,777 INFO 
>>> [main-SendThread(m9edd51-zookeeper.m9edd51:2181)]
>>> zookeeper.ClientCnxn: Socket connection established to
>>> m9edd51-zookeeper.m9edd51/172.18.0.2:2181, initiating session
>>> 2016-08-31 01:08:34,778 INFO 
>>> [main-SendThread(m9edd51-zookeeper.m9edd51:2181)]
>>> zookeeper.ClientCnxn: Session establishment complete on server
>>> m9edd51-zookeeper.m9edd51/172.18.0.2:2181, sessionid =
>>> 0x156de22abec0004, negotiated timeout = 4
>>> 2016-08-31 01:08:34,783 INFO [leader-election-tx.service-leader]
>>> zookeeper.LeaderElection: Connected to ZK, running election:
>>> m9edd51-zookeeper.m9edd51 for /tx.service/leader
>>> 2016-08-31 01:08:34,815 INFO  [ThriftRPCServer] rpc.ThriftRPCServer:
>>> Starting RPC server for TTransactionServer
>>> 2016-08-31 01:08:34,815 INFO  [ThriftRPCServer] rpc.ThriftRPCServer:
>>> Running RPC server for TTransactionServer
>>> 2016-08-31 01:08:34,816 INFO  [ThriftRPCServer]
>>> server.TThreadedSelectorServerWithFix: Starting
>>> TThreadedSelectorServerWithFix
>>> 2016-08-31 01:08:34,822 INFO [leader-election-tx.service-leader]
>>> distributed.TransactionService: Transaction Thrift Service started
>>> successfully on m9edd51-hmaster1.m9edd51/172.18.0.12:15165
>>> 2016-08-31 01:10:42,830 ERROR [TTransactionServer-rpc-0]
>>> 

Re: Emulating a true INSERT or UPDATE

2016-07-29 Thread Thomas D'Silva
Yes, you don't have to start a transaction explicitly. You don't need to
commit after you have done only reads. Transactions time out based on
the data.tx.timeout attribute (see
http://phoenix.apache.org/transactions.html).

On Thu, Jul 28, 2016 at 11:06 PM, Heather, James (ELS) <
james.heat...@elsevier.com> wrote:

> Thanks.
>
> What do you mean when you say the transaction starts automatically at (2)?
> Do you mean that you don't need to start a transaction explicitly?
>
> If that's right, does that mean that you need to commit even after you've
> only done read operations, so that Phoenix knows to close the transaction
> it's created for you?
>
> James
>
> On 28 July 2016 11:45:17 p.m. James Taylor <jamestay...@apache.org> wrote:
>
>> James,
>> Your logic looks correct, assuming that you have the complete row keys in
>> your SELECT statement. FYI, the transaction will start automatically at
>> (2). You can optimize this slightly by just doing a COUNT(*) instead of
>> returning the rows back to the client. For the UPDATE case, you'd throw if
>> the count doesn't match the number of rows you have. You'll also have the
>> added benefit that another client attempting to INSERT or UPDATE the same
>> rows at the same time would fail (that's the conflict detection piece that
>> Thomas mentioned).
>> Thanks,
>> James
>>
>> On Thu, Jul 28, 2016 at 2:46 PM, Thomas D'Silva <tdsi...@salesforce.com>
>> wrote:
>>
>>> If the table is transactional, you are guaranteed that if there are
>>> overlapping transactions that try to commit the same row one will succeed
>>> and the others will fail with an exception. There is also an additional
>>> cost to doing conflict detection at commit time.
>>>
>>>
>>> On Thu, Jul 28, 2016 at 8:18 AM, Heather, James (ELS) <
>>> james.heat...@elsevier.com> wrote:
>>>
>>>> What would I need to do in order to emulate an INSERT or UPDATE in
>>>> Phoenix, as opposed to an UPSERT?
>>>>
>>>>
>>>> Suppose I had a TRANSACTIONAL table. To do an INSERT, I then:
>>>>
>>>>1. Start a transaction
>>>>2. SELECT the relevant rows, and throw an error if the SELECT is
>>>>non-empty
>>>>3. UPSERT
>>>>4. Commit the transaction
>>>>
>>>> To do an UPDATE, I do the same, except that in step 2 I throw an error
>>>> if the SELECT is empty.
>>>>
>>>>
>>>> If all of the possible writes to those rows are enclosed in
>>>> transactions, will this avoid the race conditions and give me a true INSERT
>>>> and UPDATE (at a cost of having to make multiple queries, of course)?
>>>>
>>>>
>>>> The case I have in mind is where we might have DELETE and PATCH queries
>>>> coming into our API. With a back end that supports UPDATE, it's not a
>>>> problem if a DELETE and a PATCH come in at the same time: either the DELETE
>>>> succeeds and then the PATCH fails, or the PATCH succeeds and then the
>>>> DELETE succeeds. Either way, you end up with the row's being deleted. But
>>>> if we use an UPSERT statement for the PATCH, we have a problem: the DELETE
>>>> can succeed, and then the PATCH will still succeed, but it'll insert the
>>>> row back in again.
>>>>
>>>>
>>>> I'm unclear as to how to use transactions to guarantee the right
>>>> behaviour here.
>>>>
>>>>
>>>> James
>>>>
>>>>
>>>
>>>
>>
>


Re: Emulating a true INSERT or UPDATE

2016-07-28 Thread Thomas D'Silva
If the table is transactional, you are guaranteed that if there are
overlapping transactions that try to commit the same row, one will succeed
and the others will fail with an exception. There is also an additional
cost to doing conflict detection at commit time.
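
To make that concrete, here is a rough sketch of the INSERT-style flow from
the question, relying on commit-time conflict detection. The DOCS table and
its columns are invented, and the conflict is caught broadly as a SQLException
rather than a specific subclass:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class EmulatedInsertExample {
    // Emulate INSERT on a TRANSACTIONAL table: fail if the row already exists, and
    // let commit-time conflict detection catch a concurrent writer of the same row.
    static void insertRow(Connection conn, String id, String val) throws SQLException {
        conn.setAutoCommit(false); // the transaction starts with the first statement
        try (PreparedStatement check = conn.prepareStatement(
                 "SELECT COUNT(*) FROM DOCS WHERE ID = ?");
             PreparedStatement upsert = conn.prepareStatement(
                 "UPSERT INTO DOCS (ID, VAL) VALUES (?, ?)")) {
            check.setString(1, id);
            try (ResultSet rs = check.executeQuery()) {
                rs.next();
                if (rs.getLong(1) > 0) {
                    conn.rollback();
                    throw new SQLException("row already exists: " + id);
                }
            }
            upsert.setString(1, id);
            upsert.setString(2, val);
            upsert.executeUpdate();
            conn.commit(); // a racing transaction on the same row fails here
        } catch (SQLException e) {
            conn.rollback(); // conflict or other failure: abandon this attempt
            throw e;
        }
    }
}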


On Thu, Jul 28, 2016 at 8:18 AM, Heather, James (ELS) <
james.heat...@elsevier.com> wrote:

> What would I need to do in order to emulate an INSERT or UPDATE in
> Phoenix, as opposed to an UPSERT?
>
>
> Suppose I had a TRANSACTIONAL table. To do an INSERT, I then:
>
>1. Start a transaction
>2. SELECT the relevant rows, and throw an error if the SELECT is
>non-empty
>3. UPSERT
>4. Commit the transaction
>
> To do an UPDATE, I do the same, except that in step 2 I throw an error if
> the SELECT is empty.
>
>
> If all of the possible writes to those rows are enclosed in transactions,
> will this avoid the race conditions and give me a true INSERT and UPDATE
> (at a cost of having to make multiple queries, of course)?
>
>
> The case I have in mind is where we might have DELETE and PATCH queries
> coming into our API. With a back end that supports UPDATE, it's not a
> problem if a DELETE and a PATCH come in at the same time: either the DELETE
> succeeds and then the PATCH fails, or the PATCH succeeds and then the
> DELETE succeeds. Either way, you end up with the row's being deleted. But
> if we use an UPSERT statement for the PATCH, we have a problem: the DELETE
> can succeed, and then the PATCH will still succeed, but it'll insert the
> row back in again.
>
>
> I'm unclear as to how to use transactions to guarantee the right behaviour
> here.
>
>
> James
>
>


Re: Using transaction in custom coprocessor

2016-04-20 Thread Thomas D'Silva
It is possible to use a transaction started by a client in a coprocessor.
The transaction is serialized as the TxConstants.TX_OPERATION_ATTRIBUTE_KEY
attribute on the operation.
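
A rough outline of what the propagation can look like (not a verified recipe:
the codec class and package names are from memory of the CDAP-era Tephra,
co.cask.tephra, and the coprocessor's extra cells generally also need to be
written at the transaction's write pointer):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import co.cask.tephra.Transaction;
import co.cask.tephra.TransactionCodec;
import co.cask.tephra.TxConstants;

// Helper a coprocessor can use to tie its own Put to the client's transaction.
public final class TxPropagation {
    private static final TransactionCodec CODEC = new TransactionCodec();

    // Decode the client's transaction from the triggering operation, or return
    // null if the write was not transactional.
    public static Transaction clientTransaction(Mutation clientOp) throws IOException {
        byte[] txBytes = clientOp.getAttribute(TxConstants.TX_OPERATION_ATTRIBUTE_KEY);
        return txBytes == null ? null : CODEC.decode(txBytes);
    }

    // Copy the serialized transaction onto the coprocessor's Put so it commits or
    // rolls back together with the client's writes.
    public static void joinTransaction(Mutation clientOp, Put coprocessorPut) {
        byte[] txBytes = clientOp.getAttribute(TxConstants.TX_OPERATION_ATTRIBUTE_KEY);
        if (txBytes != null) {
            coprocessorPut.setAttribute(TxConstants.TX_OPERATION_ATTRIBUTE_KEY, txBytes);
        }
    }
}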

On Wed, Apr 13, 2016 at 7:42 AM, Mohammad Adnan Raza 
wrote:

> Hello everyone,
>
> I have requirement to use transaction in my project. My requirement is - a
> client starts transaction and do some upsert operation. I'll have a
> coprocessor that will do few more puts. Now what I want is all these
> operation should go in single transaction. Meaning if client rollbacks her
> transaction my puts in coprocessor should also get rolledback.
>
> My question is . is it possible to use transaction started in client in
> coprocessors?
>
>
> --
>
> *With Best Regards,*
>
> *   Mohammad Adnan*
>


Re: Failed to dynamically load library 'libMJCloudConnector.so'.

2016-02-26 Thread Thomas D'Silva
What Phoenix server jar version are you using? The 1.2.0 client jar is
probably too old to use with your server jar.
It's best if your client jar version matches the server jar version.

On Thu, Feb 25, 2016 at 3:55 PM, Murugesan, Rani  wrote:
> Hi,
>
>
>
> I used the open source client - phoenix-4.5.2-HBase-1.0-client.jar , and
> able to connect, but get time out issue which we faced with
> phoenix-4.3.1-client.jar as well (but it is related to MicroStrategy
> application, which we may have a temporary work around.)
>
>
>
> But I would like to follow up on the issue with using
> phoenix-1.2.0-client.jar.
>
>
>
> Thanks
>
> Rani
>
>
>
> From: Murugesan, Rani [mailto:ranmu...@visa.com]
> Sent: Thursday, February 25, 2016 2:38 PM
> To: user@phoenix.apache.org
> Subject: Failed to dynamically load library 'libMJCloudConnector.so'.
>
>
>
> Hi,
>
>
>
> I am able to connect from MicroStrategy (on Linux -
> 3.10.0-327.3.1.el7.x86_64) to HBase(CDH-5.5.1) using Phoenix
> (phoenix-4.3.1-client.jar)
>
>
>
> I am unable to connect MicroStrategy (on Linux - 3.10.0-327.3.1.el7.x86_64)
> to HBase(CDH-5.5.1) using Phoenix (phoenix-1.2.0-client.jar)
>
>
>
> From java log:
>
> WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
> platform... using builtin-java classes where applicable.
>
>
>
> From MicroStrategy logs:
>
> Failed to dynamically load library 'libMJCloudConnector.so'. (Server message
> 305419896)  libMJCloudConnector.so: cannot open shared object file: No such
> file or directory (UNIX Dynamic Linking message -1).)
>
>
>
> Is there any way to get 'libMJCloudConnector.so'?
>
>
>
> Thanks
>
> Rani
>
>


Re: Multiple versions for single row key

2016-02-22 Thread Thomas D'Silva
PHOENIX-590 is currently unassigned, so I'm not sure when it will be
implemented. We are always looking for contributions.

On Mon, Feb 22, 2016 at 1:23 PM,  <kannan.ramanat...@barclays.com> wrote:
> Thanks Thomas. Do we have rough estimate of when PHOENIX-590 will be done?
>
> -Original Message-
> From: Thomas D'Silva [mailto:tdsi...@salesforce.com]
> Sent: Monday, February 22, 2016 14:59
> To: user@phoenix.apache.org
> Subject: Re: Multiple versions for single row key
>
> You need to set the versions attribute of the scan :
>  scan ‘t1’, {RAW => true, VERSIONS => 10}
>
> As James said in a previous post, getting all versions of a row using a 
> phoenix query is not implemented yet
> (https://issues.apache.org/jira/browse/PHOENIX-590)
>
> Thanks,
> Thomas
>
> On Mon, Feb 22, 2016 at 9:06 AM,  <kannan.ramanat...@barclays.com> wrote:
>> Hi James,
>>
>>
>>
>> I don’t quite get it to work or I didn’t understand how it’s supposed
>> to work.
>>
>>
>>
>> I have created a simple table to test this.
>>
>>
>>
>> Create table statement with support for 5 versions:
>>
>> create table MULTI_ROW_KEYS (ID VARCHAR PRIMARY KEY, PRICE DOUBLE)
>> VERSIONS=5
>>
>>
>>
>> Sample java code snippet to get connection using calendar:
>>
>> private static void executeQuery(Calendar cal, String sql) throws
>> SQLException {
>>
>> cal.add(Calendar.HOUR, 1);
>>
>> Properties props = new Properties();
>>
>> props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB,
>> Long.toString(cal.getTimeInMillis()));
>>
>> connection = DriverManager.getConnection("jdbc:phoenix:" +
>> zooKeeperHostPort, props); // zooKeeperHostPort – host/port
>>
>> connection.createStatement().execute(sql);
>>
>> connection.commit();
>>
>> }
>>
>> Calendar cal = Calendar.getInstance();
>>
>> cal.set(2106, 1, 22, 10, 0, 0);
>>
>> executeQuery(cal, "UPSERT INTO MULTI_ROW_KEYS VALUES ('a', 100)");
>>
>> executeQuery(cal, "UPSERT INTO MULTI_ROW_KEYS VALUES ('a', 200)");
>>
>>
>>
>> After running first executeQuery, I ran scan command in HBase shell:
>>
>> hbase(main):017:0> scan 'MULTI_ROW_KEYS'
>>
>> ROW  COLUMN+CELL
>>
>> a   column=0:PRICE, timestamp=4296294000943,
>> value=\xC0Y\x00\x00\x00\x00\x00\x01
>>
>> a   column=0:_0, timestamp=4296294000943,
>> value=
>>
>> 1 row(s) in 0.0260 seconds
>>
>>
>>
>> This is an expected output (timestamp=4296294000943 => 2/22/2106,
>> 10:00:00 AM GMT-5:00)
>>
>>
>>
>> hbase(main):018:0> scan 'MULTI_ROW_KEYS'
>>
>> ROW  COLUMN+CELL
>>
>> a   column=0:PRICE, timestamp=4296297600943,
>> value=\xC0i\x00\x00\x00\x00\x00\x01
>>
>> a   column=0:_0, timestamp=4296297600943,
>> value=
>>
>> 1 row(s) in 0.0160 seconds
>>
>>
>>
>> I am expecting two rows but getting only one with updated timestamp
>> (timestamp=4296297600943 => 2/22/2106, 11:00:00 AM GMT-5:00).
>>
>>
>>
>> Why? Do I need to send additional parameters in scan command? Also
>> running select query on this table from SQuirreL doesn’t return any results.
>>
>>
>>
>> What should be the connection setting to select all the two rows using JDBC?
>>
>>
>>
>> Running query (select  * from MULTI_ROW_KEYS) with this connection
>> property (cal.set(2106, 1, 22, 11, 0, 0)), returns only first row (ID: a, 
>> PRICE:
>> 100.0) and
>>
>> Running query (select  * from MULTI_ROW_KEYS) with this connection
>> property (cal.set(2106, 1, 22, 12, 0, 0)), returns only first row (ID: a, 
>> PRICE:
>> 200.0)
>>
>>
>>
>> I tried setting the timestamp in the connection property to much later
>> date but returns only the last row.
>>
>>
>>
>>
>>
>> Thanks in advance
>>
>>
>>
>> Kannan.
>>
>>
>>
>>
>>
>>
>>
>> From: Ramanathan, Kannan: IT (NYK)
>> Sent: Thursday, February 11, 2016 13:20
>> To: user@phoenix.apache.org
>> Subject: RE: Multiple versions for single row key
>>
>>
>>
>> Thanks a lot James, I’ll try this.
>>
>>
>>
>> From: James Taylor [m

Re: Multiple versions for single row key

2016-02-22 Thread Thomas D'Silva
You need to set the versions attribute of the scan:
 scan 't1', {RAW => true, VERSIONS => 10}

As James said in a previous post, getting all versions of a row using
a phoenix query is not implemented yet
(https://issues.apache.org/jira/browse/PHOENIX-590)
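
If you need the same thing from the Java client rather than the HBase shell,
here is a rough sketch using the HBase 1.x client API (the table name matches
the example above and the 10-version limit is arbitrary):

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class RawVersionsScan {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("MULTI_ROW_KEYS"))) {
            Scan scan = new Scan();
            scan.setRaw(true);       // include delete markers and all retained cells
            scan.setMaxVersions(10); // return up to 10 versions per cell
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    for (Cell cell : r.rawCells()) {
                        System.out.println(Bytes.toString(CellUtil.cloneRow(cell))
                            + " ts=" + cell.getTimestamp()
                            + " " + Bytes.toStringBinary(CellUtil.cloneValue(cell)));
                    }
                }
            }
        }
    }
}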

Thanks,
Thomas

On Mon, Feb 22, 2016 at 9:06 AM,   wrote:
> Hi James,
>
>
>
> I don’t quite get it to work or I didn’t understand how it’s supposed to
> work.
>
>
>
> I have created a simple table to test this.
>
>
>
> Create table statement with support for 5 versions:
>
> create table MULTI_ROW_KEYS (ID VARCHAR PRIMARY KEY, PRICE DOUBLE)
> VERSIONS=5
>
>
>
> Sample java code snippet to get connection using calendar:
>
> private static void executeQuery(Calendar cal, String sql) throws
> SQLException {
>
> cal.add(Calendar.HOUR, 1);
>
> Properties props = new Properties();
>
> props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB,
> Long.toString(cal.getTimeInMillis()));
>
> connection = DriverManager.getConnection("jdbc:phoenix:" +
> zooKeeperHostPort, props); // zooKeeperHostPort – host/port
>
> connection.createStatement().execute(sql);
>
> connection.commit();
>
> }
>
> Calendar cal = Calendar.getInstance();
>
> cal.set(2106, 1, 22, 10, 0, 0);
>
> executeQuery(cal, "UPSERT INTO MULTI_ROW_KEYS VALUES ('a', 100)");
>
> executeQuery(cal, "UPSERT INTO MULTI_ROW_KEYS VALUES ('a', 200)");
>
>
>
> After running first executeQuery, I ran scan command in HBase shell:
>
> hbase(main):017:0> scan 'MULTI_ROW_KEYS'
>
> ROW  COLUMN+CELL
>
> a   column=0:PRICE, timestamp=4296294000943,
> value=\xC0Y\x00\x00\x00\x00\x00\x01
>
> a   column=0:_0, timestamp=4296294000943,
> value=
>
> 1 row(s) in 0.0260 seconds
>
>
>
> This is an expected output (timestamp=4296294000943 => 2/22/2106, 10:00:00
> AM GMT-5:00)
>
>
>
> hbase(main):018:0> scan 'MULTI_ROW_KEYS'
>
> ROW  COLUMN+CELL
>
> a   column=0:PRICE, timestamp=4296297600943,
> value=\xC0i\x00\x00\x00\x00\x00\x01
>
> a   column=0:_0, timestamp=4296297600943,
> value=
>
> 1 row(s) in 0.0160 seconds
>
>
>
> I am expecting two rows but getting only one with updated timestamp
> (timestamp=4296297600943 => 2/22/2106, 11:00:00 AM GMT-5:00).
>
>
>
> Why? Do I need to send additional parameters in scan command? Also running
> select query on this table from SQuirreL doesn’t return any results.
>
>
>
> What should be the connection setting to select all the two rows using JDBC?
>
>
>
> Running query (select  * from MULTI_ROW_KEYS) with this connection property
> (cal.set(2106, 1, 22, 11, 0, 0)), returns only first row (ID: a, PRICE:
> 100.0) and
>
> Running query (select  * from MULTI_ROW_KEYS) with this connection property
> (cal.set(2106, 1, 22, 12, 0, 0)), returns only first row (ID: a, PRICE:
> 200.0)
>
>
>
> I tried setting the timestamp in the connection property to much later date
> but returns only the last row.
>
>
>
>
>
> Thanks in advance
>
>
>
> Kannan.
>
>
>
>
>
>
>
> From: Ramanathan, Kannan: IT (NYK)
> Sent: Thursday, February 11, 2016 13:20
> To: user@phoenix.apache.org
> Subject: RE: Multiple versions for single row key
>
>
>
> Thanks a lot James, I’ll try this.
>
>
>
> From: James Taylor [mailto:jamestay...@apache.org]
> Sent: Thursday, February 11, 2016 12:43
> To: user@phoenix.apache.org
> Subject: Re: Multiple versions for single row key
>
>
>
> Hi Kannan,
>
>
>
> Yes, you can keep 3 versions of a cell in Phoenix (just add VERSIONS=3 to
> your DDL statement), however you'll only see one version when you query (by
> default, the latest - see [1] for how to see an earlier version).
> PHOENIX-590 (not implemented) is about seeing all versions..
>
>
>
> HTH,
>
> James
>
>
>
> [1]
> https://phoenix.apache.org/faq.html#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API
>
> On Thursday, February 11, 2016,  wrote:
>
> Any suggestions?
>
>
>
> From: Ramanathan, Kannan: IT (NYK)
> Sent: Wednesday, February 10, 2016 12:03
> To: user@phoenix.apache.org
> Subject: Multiple versions for single row key
>
>
>
> Hello,
>
>
>
> HBase tables support multiple versions (default is 3) for single row key. I
> am trying to see how efficiently this can be achieved in Phoenix (don’t want
> to create view on existing HBase table, just want to go with new Phoenix
> table).
>
>
>
> Is it better to create a separate secondary key column in Phoenix table
> which is of course not unique and setup secondary index on this column for
> faster querying?
>
>
>
> Also, is this JIRA related what I am asking?
>
>
>
> https://issues.apache.org/jira/browse/PHOENIX-590
>
>
>
> Sorry if this has been answered before and thanks in advance.
>
>
>
> Regards
>
> Kannan.
>
> ___
>
> This message is for information purposes only, it is 

Re: Problem with String Concatenation with Fields

2016-02-17 Thread Thomas D'Silva
Steve,

That is a bug; can you please file a JIRA?

Thanks,
Thomas

On Wed, Feb 17, 2016 at 3:34 PM, Steve Terrell  wrote:
> Can someone please tell me if this is a bug in Phoenix 4.6.0 ?
>
> This works as expected:
> 0: jdbc:phoenix:localhost> select * from BUGGY where
> ('tortilla'||F2)='tortillachip';
> PK1  0
> F1   tortilla
> F2   chip
>
> But this does not:
> 0: jdbc:phoenix:localhost> select * from BUGGY where
> (F1||F2)='tortillachip';
> No rows selected (0.01 seconds)
>
> If it is a bug, is there a work-around?
>
> Here's how I built my table in case it's part of the problem:
> create table BUGGY (PK1 integer, F1 varchar, F2 varchar, constraint PK
> primary key (PK1));
> upsert into BUGGY values(0, 'tortilla', 'chip');
>
> Thank you,
> Steve


Re: to_date not working as expected

2016-02-02 Thread Thomas D'Silva
I have a patch out for PHOENIX-1769
(https://issues.apache.org/jira/browse/PHOENIX-1769). The issue is that
the jruby-complete-1.6.8.jar present in the HBase lib directory contains
Joda-Time 1.6 classes, which get picked up instead of the 2.7 classes
included in the Phoenix server jar. You could try removing this jar from
the lib directory (on all the servers in your cluster); I think it's only
required for the HBase shell.

On Tue, Feb 2, 2016 at 11:41 AM, Binu Mathew <bmat...@groupon.com> wrote:

> Hi Thomas,
>
> Any update on this?
>
> If and when a patch becomes available, can you please update this thread.
>
> This is a blocker:
> 1. We have existing HBase tables that users do not necessarily want to
> port over to Phoenix tables
> 2. Building Phoenix views to the existing HBase tables is a better
> solution
> 3. The inability to cast the date data types in the WHERE clause and
> compare against another date prevents users from performance range scans.
>
> Unfortunately, we are unable to upgrade our Hortonworks Hadoop
> distribution, HDP 2.3, to use Phoenix 4.6.
>
> HDP 2.3 came default with Phoenix 4.4 and we can't seem to upgrade a
> single package, 4.4, to 4.6 on the HDP 2.3 distribution.
>
> Can you provide us with a patch to resolve this issue?
>
> Thanks,
>
> On Sat, Jan 30, 2016 at 11:45 AM, Thomas D'Silva <tdsi...@salesforce.com>
> wrote:
>
>> Binu,
>>
>> I am able to repro the issue by manually running the test from the patch
>> from https://issues.apache.org/jira/browse/PHOENIX-1769 .
>> I will investigate further.
>>
>> Thanks,
>> Thomas
>>
>>
>> On Fri, Jan 29, 2016 at 4:26 PM, Binu Mathew <bmat...@groupon.com> wrote:
>>
>>> That doesn't seem to work.
>>>
>>> Phoenix is not recognizing that created_at in the WERE clause is a
>>> derived column from to_date("created_at_date").
>>>
>>> Some relational databases support this type of functionality.
>>>
>>> On Fri, Jan 29, 2016 at 4:16 PM, Alok Singh <a...@cloudability.com>
>>> wrote:
>>>
>>>> Does this work:
>>>>
>>>> select 1, to_date("created_at_date") as created_at from
>>>> "gp_subscriptions" where created_at > to_date('2010-10-10') limit 3;
>>>>
>>>> Alok
>>>>
>>>> Alok
>>>>
>>>> a...@cloudability.com
>>>>
>>>> On Fri, Jan 29, 2016 at 3:54 PM, Binu Mathew <bmat...@groupon.com>
>>>> wrote:
>>>>
>>>>> Thank you for the reply.
>>>>>
>>>>> I mistakenly wrote that we are using Phoenix with HBase .96. This was
>>>>> a typo. We are using HBase .98 with Phoenix 4.4
>>>>>
>>>>> I tried the UNSIGNED types and still encountering the same issue.
>>>>>
>>>>> My field has the following data:
>>>>>
>>>>> select "created_at_date" from "gp_subscriptions" limit 3;
>>>>>
>>>>> +--+
>>>>> | created_at_date  |
>>>>> +--+
>>>>> | 2012-11-22   |
>>>>> | 2012-11-22   |
>>>>> | 2012-11-26   |
>>>>> +--+
>>>>>
>>>>> to_date function works when I convert the VARCHAR field to a DATE:
>>>>>
>>>>> select to_date("created_at_date") from "gp_subscriptions" limit 3;
>>>>>
>>>>> +---+
>>>>> | TO_DATE(subscriber."created_at_date", null, null) |
>>>>> +---+
>>>>> | 2012-11-22 00:00:00.000   |
>>>>> | 2012-11-22 00:00:00.000   |
>>>>> | 2012-11-26 00:00:00.000   |
>>>>> +---+
>>>>>
>>>>>
>>>>> However, I can't use the to_date function in the WHERE clause:
>>>>>
>>>>> select 1 from "gp_subscriptions" where to_date("created_at_date") >
>>>>> to_date('2010-10-10') limit 3;
>>>>>
>>>>> java.lang.RuntimeException:
>>&

Re: to_date not working as expected

2016-01-30 Thread Thomas D'Silva
Binu,

I am able to reproduce the issue by manually running the test from the patch
on https://issues.apache.org/jira/browse/PHOENIX-1769.
I will investigate further.

Thanks,
Thomas

On Fri, Jan 29, 2016 at 4:26 PM, Binu Mathew  wrote:

> That doesn't seem to work.
>
> Phoenix is not recognizing that created_at in the WERE clause is a derived
> column from to_date("created_at_date").
>
> Some relational databases support this type of functionality.
>
> On Fri, Jan 29, 2016 at 4:16 PM, Alok Singh  wrote:
>
>> Does this work:
>>
>> select 1, to_date("created_at_date") as created_at from
>> "gp_subscriptions" where created_at > to_date('2010-10-10') limit 3;
>>
>> Alok
>>
>> Alok
>>
>> a...@cloudability.com
>>
>> On Fri, Jan 29, 2016 at 3:54 PM, Binu Mathew  wrote:
>>
>>> Thank you for the reply.
>>>
>>> I mistakenly wrote that we are using Phoenix with HBase .96. This was a
>>> typo. We are using HBase .98 with Phoenix 4.4
>>>
>>> I tried the UNSIGNED types and still encountering the same issue.
>>>
>>> My field has the following data:
>>>
>>> select "created_at_date" from "gp_subscriptions" limit 3;
>>>
>>> +--+
>>> | created_at_date  |
>>> +--+
>>> | 2012-11-22   |
>>> | 2012-11-22   |
>>> | 2012-11-26   |
>>> +--+
>>>
>>> to_date function works when I convert the VARCHAR field to a DATE:
>>>
>>> select to_date("created_at_date") from "gp_subscriptions" limit 3;
>>>
>>> +---+
>>> | TO_DATE(subscriber."created_at_date", null, null) |
>>> +---+
>>> | 2012-11-22 00:00:00.000   |
>>> | 2012-11-22 00:00:00.000   |
>>> | 2012-11-26 00:00:00.000   |
>>> +---+
>>>
>>>
>>> However, I can't use the to_date function in the WHERE clause:
>>>
>>> select 1 from "gp_subscriptions" where to_date("created_at_date") >
>>> to_date('2010-10-10') limit 3;
>>>
>>> java.lang.RuntimeException:
>>> org.apache.phoenix.exception.PhoenixIOException:
>>> org.apache.phoenix.exception.PhoenixIOException:
>>> org.apache.hadoop.hbase.DoNotRetryIOException:
>>> java.lang.reflect.InvocationTargetException
>>>
>>> Another issue is how Phoenix evaluates dates. In the queries below, I'm
>>> evaluating string literals:
>>>
>>> 2009-05-05 is greater than (in the future) than 1970-05-05
>>>
>>> The following query should return 2 rows, however, it does not return
>>> any rows:
>>>
>>> select '1' from "gp_subscriptions" where to_date('2009-05-05') >
>>> to_date('1970-05-05') limit 2;
>>> No rows selected (0.024 seconds)
>>>
>>>
>>> The following query should return no rows, however, it returns 2 rows:
>>>
>>> select '1' from "gp_subscriptions" where to_date('2009-05-05') <
>>> to_date('1970-05-05') limit 2;
>>> 2 rows selected (0.033 seconds)
>>>
>>> Thanks,
>>>
>>> On Thu, Jan 28, 2016 at 8:07 PM, James Taylor 
>>> wrote:
>>>
 Hi Binu,
 Phoenix has never supported HBase 0.96, so I'm not sure where you got
 the release from.

 I recommend upgrading to a later, supported version of HBase and a
 later version of Phoenix. Give the 4.7.0 RC a try.

 One other tip in particular for views you create over existing HBase
 tables. Use the UNSIGNED types documented here[1] as these use the same
 serialization as the Bytes methods provided by HBase. If you tell Phoenix
 the wrong type, it won't know so would produce erroneous data and queries.

 Thanks,
 James

 [1] https://phoenix.apache.org/language/datatypes.html
 

 On Thu, Jan 28, 2016 at 5:57 PM, Binu Mathew 
 wrote:

> Phoenix version 4.4.0
>
> Issues with Phoenix when used with HBase 0.96.0.2.0
>
> 2 Issues:
>
> *ISSUE:* to_date Function is not converting string data types in
> valid date formats to a DATE data type when used in the WHERE clause for
> date comparison.
>
> Below is a query I ran against a Phoenix view in which I use the
> ‘to_date’ function to convert 2 VARCHAR columns to date.
> 1. column ‘created_at_ts’ stored as VARCHAR in format such as
> 2009-05-05 15:40:10.000
> 2. column ‘created_at_date’ stored as VARCHAR in format such as
> 2009-05-05
>
> Observe that the ‘to_date’ function coverts the 2 VARCHAR columns to
> dates:

Re: Questions: history of deleted records, controlling timestamps

2015-12-18 Thread Thomas D'Silva
John,

You can use a connection with an SCN (the CurrentSCN property) to ensure all
changes are written at the specified timestamp; see
https://phoenix.apache.org/faq.html#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API
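
A small sketch of that (the CurrentSCN value, table names and rows below are
only examples):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

public class ScnWriteExample {
    public static void main(String[] args) throws Exception {
        long changeSetTs = 1450000000000L; // the timestamp chosen for this "changeset"
        Properties props = new Properties();
        props.setProperty("CurrentSCN", Long.toString(changeSetTs));
        // Every mutation on this connection is written at changeSetTs, so changes
        // across tables appear to have happened at the same time.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost", props);
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate("UPSERT INTO DOCS (ID, VAL) VALUES ('a', 'v1')");
            stmt.executeUpdate("UPSERT INTO NOTES (ID, TXT) VALUES ('a', 'n1')");
            conn.commit();
        }
    }
}

Opening another connection with CurrentSCN set to an earlier or later value
reads the data as of that timestamp.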

We are also working on transaction support using Tephra for the
upcoming 4.7 release. All changes within a transaction will either
complete successfully or fail. However we do not support adding
additional metadata to a transaction. Transactions also cannot be used
with the SCN feature.

Thanks,
Thomas

On Fri, Dec 18, 2015 at 6:56 AM, John Lilley <john.lil...@redpoint.net> wrote:
> Thanks Thomas!
>
> What I'm trying to accomplish with "a set of changes at the same timestamp" 
> is several things.  Basically I'm trying to implement a "versioned database" 
> in which a set of changes to tables are grouped into a "changeset" that we 
> can tag with additional information:
> -- Associate changes across multiple tables (I know this could be done by 
> adding an additional index to the tables, but using timestamp for this 
> purpose would kill two birds with one stone)
> -- Have a clear "transactionlike" event on which we can hang additional 
> meta-data
> -- Make the set of changes all appear to have happened "at the same time"
> -- Hopefully, be able to undo all of the changes of a changeset.
>
> Thanks
> John
>
> -Original Message-
> From: Thomas D'Silva [mailto:tdsi...@salesforce.com]
> Sent: Thursday, December 17, 2015 7:56 PM
> To: user@phoenix.apache.org
> Subject: Re: Questions: history of deleted records, controlling timestamps
>
> John,
>
> If you enable KEEP_DELETED_CELLS on the underlying HBase table you will be 
> able to see deleted data (See 
> http://hbase.apache.org/0.94/book/cf.keep.deleted.html ) Could you describe 
> what you mean by making a set of changes at the same timestamp?
>
> Thanks,
> Thomas
>
> On Thu, Dec 17, 2015 at 4:50 PM, John Lilley <john.lil...@redpoint.net> wrote:
>> Greetings,
>>
>>
>>
>> I’ve been reading about Phoenix with an eye toward implementing a
>> “versioned database” on Hadoop.  It looks pretty slick, especially the
>> ability to query at past timestamp.  But I can’t figure out what
>> happens with deleted records.  Are all versions deleted, or can I
>> still go back in time and see the versions before the delete?
>>
>>
>>
>> Also I would like to be able to make a set of changes “at the same
>> timestamp” to a get a changeset-like ability similar to a VCS.  It
>> looks like the APIs allow for setting of the effective timestamp for
>> all change operations; is that true?
>>
>>
>>
>> Thanks
>>
>> John Lilley
>>
>>


Re: How do How do the operation Tephra

2015-12-03 Thread Thomas D'Silva
Phoenix is working on supporting transactions using Tephra (see
PHOENIX-1674). If you want to use HBase directly with Tephra, the
Tephra website has a getting-started guide with an example (see
https://github.com/caskdata/tephra).

-Thomas

On Thu, Dec 3, 2015 at 1:02 AM, Hardika Catur Sapta
 wrote:
> Hay guys,
> We plan to integrate with Apache HBase apache tephra, which we know it uses
> apache phoenix.
>
> How do the operation Tephra? whether using apache phoenix ?? and how to run
> it.
>
> Thank you,
> -Hardika C. S.


Re: Drop in throughput

2015-10-13 Thread Thomas D'Silva
Sumanta,

Phoenix resolves the table for every SELECT. For an UPSERT it resolves
the table once, at commit time.
We have a JIRA in the txn branch where, if you specify an SCN, the table
is cached and looked up from the cache
(https://issues.apache.org/jira/browse/PHOENIX-1812).
This will be available once we merge the txn branch back.

Thanks,
Thomas

On Tue, Oct 13, 2015 at 12:00 AM, Sumanta Gh  wrote:
> Hi,
> I was experimenting the reasons why there is significant drop in through-put
> when I give mixed workload of 70% Upsert + 30% Selects over 100% Upserts or
> 100% Selects.
>
> While doing that I find that the following line is getting printed every
> time in my log :-
>
> "Re-resolved stale table MY_TABLE with seqNum 0 at timestamp 1443618964162
> with 22 columns: [...]"
>
> Can it be avoided for each SQL query.
>
> regards,
> Sumanta
>


Re: [ANNOUNCE] New Apache Phoenix committer - Jan Fernando

2015-09-29 Thread Thomas D'Silva
Congrats Jan!

On Tue, Sep 29, 2015 at 11:23 AM, Eli Levine  wrote:
> On behalf of the Apache Phoenix project I am happy to welcome Jan Fernando
> as a committer. Jan has been an active user and contributor to Phoenix in
> the last couple of years. Some of his major contributions are:
> 1) Worked deeply in the sequence code including implementing Bulk Sequence
> Allocation: PHOENIX-1954 and debugging and fixing several tricky Sequence
> Bugs:
> PHOENIX-2149, PHOENIX-1096.
> 2) Implemented DROP TABLE...CASCADE to support tenant-specific views being
> dropped: PHOENIX-1098.
> 3) Worked closely with Cody and Mujtaba in the design of the interfaces for
> Pherf and contributed patches to increase support for tenant-specific use
> cases: PHOENIX-1791, PHOENIX-2227. Pioneered creating Pherf scenarios at
> Salesforce.
> 4) Worked closely with Samarth on requirements and API design and
> validation for Phoenix global- and query-level metrics:
> PHOENIX-1452, PHOENIX-1819 to get better visibility into Phoenix internals.
>
> Look forward to continuing working with Jan on Apache Phoenix!
>
> Thanks,
>
> Eli Levine
> elilev...@apache.org


Re: AbstractMethodError

2015-09-21 Thread Thomas D'Silva
Sumit,

I tested out HBase 0.98.6 and the Phoenix 4.5.1 server jar and it
worked for me. The PhoenixRpcSchedulerFactory.create(Configuration
conf, PriorityFunction priorityFunction, Abortable abortable)
signature is used for HBase 1.0 and 1.1 versions. The
create(Configuration conf, RegionServerServices services) signature is
for HBase 0.98.

Since the 3 argument method is being called, can you double check the
HBase version you are using?
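
For reference, the two factory method shapes involved are below. This is only an
illustration of the version mismatch (the interface name is made up; the types are
the HBase ones named in your stack trace), not something to deploy:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Abortable;
import org.apache.hadoop.hbase.ipc.PriorityFunction;
import org.apache.hadoop.hbase.ipc.RpcScheduler;
import org.apache.hadoop.hbase.regionserver.RegionServerServices;

// Illustrative only: an HBase 1.0/1.1 region server calls the 3-argument
// create(...), so a factory jar built against the 0.98 signature throws
// AbstractMethodError at startup.
public interface RpcSchedulerFactorySignatures {
    // Signature invoked by HBase 1.0 and 1.1 (the one in your stack trace):
    RpcScheduler create(Configuration conf, PriorityFunction priorityFunction, Abortable abortable);

    // Signature invoked by HBase 0.98 (what phoenix-4.5.1-HBase-0.98-server.jar provides):
    RpcScheduler create(Configuration conf, RegionServerServices services);
}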

Thanks,
Thomas

On Mon, Sep 21, 2015 at 7:39 AM, Sumit Nigam  wrote:
> Hi,
>
> I am using HBase 0.98.6 and Phoenix 4.5.1. As per the Phoenix JIRA and some
> blogs (https://phoenix.apache.org/secondary_indexing.html), this combination
> should also be affected by the deadlock occurring during index maintenance
> for global indexes (https://issues.apache.org/jira/browse/PHOENIX-938).
>
> We have phoenix-4.5.1-HBase-0.98-server.jar and
> phoenix-core-4.5.1-HBase-0.98.jar on our classpath. To resolve it, we have the
> following properties configured in the hbase-site.xml of the masters/region
> servers:
>
>
> <property>
>   <name>hbase.region.server.rpc.scheduler.factory.class</name>
>   <value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
> </property>
> <property>
>   <name>hbase.rpc.controllerfactory.class</name>
>   <value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
> </property>
>
> At startup, we get the following error:
>
> 2015-09-21 16:47:48,686 ERROR [main] master.HMasterCommandLine: Master
> exiting
> java.lang.RuntimeException: Failed construction of Master: class
> org.apache.hadoop.hbase.master.HMaster
> at
> org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2290)
>
> Caused by: java.lang.AbstractMethodError:
> org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory.create(Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/hbase/ipc/PriorityFunction;Lorg/apache/hadoop/hbase/Abortable;)Lorg/apache/hadoop/hbase/ipc/RpcScheduler;
> at
> org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:863)
> at
> org.apache.hadoop.hbase.master.MasterRpcServices.<init>(MasterRpcServices.java:210)
> at
> org.apache.hadoop.hbase.master.HMaster.createRpcServices(HMaster.java:532)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:531)
> at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:364)
>
> It seems the create method with 3 arguments is not present in those versions
> of the phoenix jars?
>
> Thanks,
> Sumit
>


Re: Issue while joining data using Phoenix

2015-08-13 Thread Thomas D'Silva
There isn't a way to apply this config using sqlline. It is a server-side
config and should be added to the hbase-site.xml on each region
server.

On Tue, Aug 11, 2015 at 10:21 PM, Nipur Patodi npat...@brocade.com wrote:
 Hey Thomas,

 I made those changes in hbase-site.xml on each region server. I have
 cross-checked, and it looks like this file is on the classpath of sqlline.py. But
 it still looks like the updated config is not being picked up. Is there any way to apply
 these configs (via the command line if possible) in Phoenix sqlline?

 Thanks,

 _Nipur

 -Original Message-
 From: Thomas D'Silva [mailto:tdsi...@salesforce.com]
 Sent: Wednesday, August 12, 2015 1:20 AM
 To: user@phoenix.apache.org
 Subject: Re: Issue while joining data using pheonix

 Nipur,

 Are you sure the config change is getting picked up? The exception says the 
 maximum allowed size is 104857600 bytes (~0.1 GB), not 1 GB.

 Thanks,
 Thomas

 On Tue, Aug 11, 2015 at 12:43 AM, Nipur Patodi er.nipur.pat...@gmail.com 
 wrote:
 Hi All,

 I am trying to join data in Phoenix tables on HBase. However, I am
 getting this exception.

 Error: Encountered exception in sub plan [0] execution.
 (state=,code=0)
 java.sql.SQLException: Encountered exception in sub plan [0] execution.
 at
 org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:157
 )
 at
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:
 251)
 at
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:
 241) at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
 at
 org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement
 .java:240)
 at
 org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java
 :1250) at sqlline.Commands.execute(Commands.java:822)
 at sqlline.Commands.sql(Commands.java:732)
 at sqlline.SqlLine.dispatch(SqlLine.java:808)
 at sqlline.SqlLine.begin(SqlLine.java:681)
 at sqlline.SqlLine.start(SqlLine.java:398)
 at sqlline.SqlLine.main(SqlLine.java:292)
 Caused by:
 org.apache.phoenix.join.MaxServerCacheSizeExceededException: Size of
 hash cache (104857664 bytes) exceeds the maximum allowed size
 (104857600
 bytes)
 at
 org.apache.phoenix.join.HashCacheClient.serialize(HashCacheClient.java
 :109)
 at
 org.apache.phoenix.join.HashCacheClient.addHashCache(HashCacheClient.j
 ava:82)
 at
 org.apache.phoenix.execute.HashJoinPlan$HashSubPlan.execute(HashJoinPl
 an.java:339) at
 org.apache.phoenix.execute.HashJoinPlan$1.call(HashJoinPlan.java:136)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at
 org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobMan
 ager.java:172)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j
 ava:1142)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
 java:617) at java.lang.Thread.run(Thread.java:745)

 I also tried changing phoenix.query.maxServerCacheBytes to 1GB and
 phoenix.query.maxGlobalMemoryPercentage to 40 in hbase-site.xml (as I
 am using Ambari), but it didn't work.

 Any suggestion would be extremely helpful.

 Thanks,
 _Nipur


Re: Issue while joining data using Phoenix

2015-08-11 Thread Thomas D'Silva
Nipur,

Are you sure the config change is getting picked up? The exception
says the maximum allowed size is 104857600 bytes (~0.1 GB), not 1 GB.

Thanks,
Thomas

On Tue, Aug 11, 2015 at 12:43 AM, Nipur Patodi
er.nipur.pat...@gmail.com wrote:
 Hi All,

 I am trying to join data in Phoenix tables on HBase. However, I am getting this
 exception.

 Error: Encountered exception in sub plan [0] execution. (state=,code=0)
 java.sql.SQLException: Encountered exception in sub plan [0] execution.
 at org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:157)
 at
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:251)
 at
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:241)
 at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
 at
 org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:240)
 at
 org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1250)
 at sqlline.Commands.execute(Commands.java:822)
 at sqlline.Commands.sql(Commands.java:732)
 at sqlline.SqlLine.dispatch(SqlLine.java:808)
 at sqlline.SqlLine.begin(SqlLine.java:681)
 at sqlline.SqlLine.start(SqlLine.java:398)
 at sqlline.SqlLine.main(SqlLine.java:292)
 Caused by: org.apache.phoenix.join.MaxServerCacheSizeExceededException: Size
 of hash cache (104857664 bytes) exceeds the maximum allowed size (104857600
 bytes)
 at
 org.apache.phoenix.join.HashCacheClient.serialize(HashCacheClient.java:109)
 at
 org.apache.phoenix.join.HashCacheClient.addHashCache(HashCacheClient.java:82)
 at
 org.apache.phoenix.execute.HashJoinPlan$HashSubPlan.execute(HashJoinPlan.java:339)
 at org.apache.phoenix.execute.HashJoinPlan$1.call(HashJoinPlan.java:136)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at
 org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:172)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at java.lang.Thread.run(Thread.java:745)

 I also tried changing phoenix.query.maxServerCacheBytes to 1GB and
 phoenix.query.maxGlobalMemoryPercentage to 40 in hbase-site.xml (as I am
 using Ambari), but it didn't work.

 Any suggestion would be extremely helpful.

 Thanks,
 _Nipur


Re: More help with secondary indexes

2015-07-22 Thread Thomas D'Silva
Zack,

Can you try increasing the value of hbase.regionserver.lease.period?
Also set the following to a high value:

phoenix.query.timeoutMs
phoenix.query.keepAliveMs
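
If it helps, here is a minimal sketch of bumping those two phoenix.* settings from the
client side. It assumes your Phoenix version honors settings passed as JDBC connection
properties; if not, put the same values in the hbase-site.xml on the client's classpath.
The URL and values are placeholders, and hbase.regionserver.lease.period is server-side
so it still belongs in the region servers' hbase-site.xml:

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class LongRunningClient {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Client-side Phoenix settings (example values: 30 minutes).
        props.setProperty("phoenix.query.timeoutMs", "1800000");
        props.setProperty("phoenix.query.keepAliveMs", "1800000");
        // Placeholder ZooKeeper quorum; replace with your own.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost", props)) {
            // Run the long CREATE INDEX on this connection.
            conn.createStatement().execute(
                "create index fma_er_keyed_gz_endpoint_id_include_sample_point on "
                + "fma.er_keyed_gz_meterkey_split_custid (endpoint_id) include (sample_point) "
                + "SALT_BUCKETS = 256");
        }
    }
}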

On Wed, Jul 22, 2015 at 5:38 AM, Riesland, Zack
zack.riesl...@sensus.com wrote:
 I have a table like this:



 CREATE TABLE fma.er_keyed_gz_meterkey_split_custid (

 meter_key varchar not null,

 …

 sample_point  integer not null,

 …

endpoint_id integer,

 …

 CONSTRAINT pk_rma_er_keyed_filtered PRIMARY KEY (meter_key,
 sample_point)

 )

 COMPRESSION='GZ'

 SPLIT ON (…..)



 I need a secondary index that allows me to query based on endpoint_id and
 sample_point. A row with a given meter_key will ALMOST always have the same
 endpoint_id, but there are exceptions.



 So I did this:



 ALTER TABLE fma.er_keyed_gz_meterkey_split_custid SET IMMUTABLE_ROWS=true;



 And this:



 create index fma_er_keyed_gz_endpoint_id_include_sample_point on
 fma.er_keyed_gz_meterkey_split_custid (endpoint_id) include (sample_point)
 SALT_BUCKETS = 256;



 After a very long time, the ‘create index’ command comes back with: “Error:
 (state=08000,code=101)”



 However, if I query ‘!indexes fma.ER_KEYED_GZ_METERKEY_SPLIT_CUSTID;’ it
 shows me my new index.



 But when I try to query against it: select endpoint_id, sample_point from
 fma.er_keyed_gz_meterkey_split_custid where endpoint_id = 49799898;



 I end up getting exceptions like the one below.



 I’m guessing that this has something to do with the many billions of rows
 that are already in my table.



 Can anyone help? Am I doing something wrong? Is there a better way to make
 this table easy to query by meter_key and/or endpoint_id + sample_point?





 15/07/22 08:28:42 WARN client.ScannerCallable: Ignore, probably already
 closed

 org.apache.hadoop.hbase.regionserver.LeaseException:
 org.apache.hadoop.hbase.regionserver.LeaseException: lease '16982' does not
 exist

 at
 org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:221)

 at
 org.apache.hadoop.hbase.regionserver.Leases.cancelLease(Leases.java:206)

 at
 org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3305)

 at
 org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29994)

 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078)

 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)

 at
 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)

 at
 org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)

 at java.lang.Thread.run(Thread.java:744)



 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method)

 at
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)

 at
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

 at java.lang.reflect.Constructor.newInstance(Constructor.java:526)

 at
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)

 at
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)

 at
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:306)

 at
 org.apache.hadoop.hbase.client.ScannerCallable.close(ScannerCallable.java:323)

 at
 org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:189)

 at
 org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:119)

 at
 org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:55)

 at
 org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:201)

 at
 org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:288)

 at
 org.apache.hadoop.hbase.client.ClientScanner.close(ClientScanner.java:476)

 at
 org.apache.phoenix.iterate.ScanningResultIterator.close(ScanningResultIterator.java:41)

 at
 org.apache.phoenix.iterate.TableResultIterator.close(TableResultIterator.java:64)

 at
 org.apache.phoenix.iterate.ChunkedResultIterator$SingleChunkResultIterator.close(ChunkedResultIterator.java:173)

 at
 org.apache.phoenix.iterate.SpoolingResultIterator.init(SpoolingResultIterator.java:131)

 at
 org.apache.phoenix.iterate.SpoolingResultIterator.init(SpoolingResultIterator.java:74)

 at
 org.apache.phoenix.iterate.SpoolingResultIterator$SpoolingResultIteratorFactory.newIterator(SpoolingResultIterator.java:68)

 at
 org.apache.phoenix.iterate.ChunkedResultIterator.init(ChunkedResultIterator.java:90)

 at
 

Re: How fast is upsert select?

2015-07-22 Thread Thomas D'Silva
Zack,

It depends on how wide the rows are in your table. On an 8-node
cluster, creating an index with 3 columns (char(15), varchar and
date) on a 1 billion row table takes about 1 hour 15 minutes.
How many rows does your table have and how wide are they?
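
If you do try the UPSERT SELECT route, a minimal JDBC sketch is below (SOURCE_TABLE and
TARGET_TABLE are placeholders, as is the URL). Turning auto-commit on should let Phoenix
commit the copied rows in batches as the scan streams back instead of buffering one huge
commit, though for billions of rows the MapReduce route mentioned below is likely faster:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CopyTable {
    public static void main(String[] args) throws Exception {
        // Placeholder connection URL; point it at your ZooKeeper quorum.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            // With auto-commit on, the upserted rows are committed in batches
            // during the scan rather than held client-side for one commit.
            conn.setAutoCommit(true);
            try (Statement stmt = conn.createStatement()) {
                int rows = stmt.executeUpdate(
                    "UPSERT INTO TARGET_TABLE SELECT * FROM SOURCE_TABLE");
                System.out.println("Copied " + rows + " rows");
            }
        }
    }
}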

On Wed, Jul 22, 2015 at 8:29 AM, Riesland, Zack
zack.riesl...@sensus.com wrote:
 Thanks Ravi,



 I think I may not have IndexTool in my version of Phoenix.



 I’m calling:
 HADOOP_CLASSPATH=/usr/hdp/current/hbase-master/conf/:/usr/hdp/current/hbase-master/lib/hbase-protocol.jar
 hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar
 org.apache.phoenix.mapreduce.index.IndexTool



 And getting a java.lang.ClassNotFoundException:
 org.apache.phoenix.mapreduce.index.IndexTool







 From: Ravi Kiran [mailto:maghamraviki...@gmail.com]
 Sent: Wednesday, July 22, 2015 10:36 AM
 To: user@phoenix.apache.org
 Subject: Re: How fast is upsert select?



 Hi ,



Since you are saying billions of rows, why don't you try out the
 MapReduce route to speed up the process.  You can take a look at how
 IndexTool.java(https://github.com/apache/phoenix/blob/359c255ba6c67d01a810d203825264907f580735/phoenix-core/src/main/java/org/apache/phoenix/mapreduce/index/IndexTool.java)
 was written as it does a similar task of reading from a Phoenix table and
 writes the data into the target table using bulk load.





 Regards

 Ravi



 On Wed, Jul 22, 2015 at 6:23 AM, Riesland, Zack zack.riesl...@sensus.com
 wrote:

 I want to play with some options for splitting a table to  test performance.



 If I were to create a new table and perform an upsert select * to the table,
 with billions of rows in the source table, is that like an overnight
 operation or should it be pretty quick?



 For reference, we have 6 (beefy) region servers in our cluster.



 Thanks!






Re: REG: REGEXP in Phoenix Queries

2015-07-21 Thread Thomas D'Silva
The default regex functions just use java.util.regex.Pattern, so standard Java regex syntax applies.
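
That means you can sanity-check a pattern in plain Java before putting it into
REGEXP_SUBSTR. A small sketch using the pattern from your query (the sample values
are made up):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCheck {
    public static void main(String[] args) {
        // Same java.util.regex syntax that Phoenix's built-in regex functions use.
        Pattern p = Pattern.compile("^[Z][A-Za-z0-9 \\.]*");
        for (String elName : new String[] {"Zone A1", "ZX.100", "Alpha"}) {
            Matcher m = p.matcher(elName);
            // REGEXP_SUBSTR(el_name, '^[Z][A-Za-z0-9 \.]*') returns the matched
            // prefix for the first two values and null for the last one.
            System.out.println(elName + " -> " + (m.find() ? m.group() : "no match"));
        }
    }
}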


On Tue, Jul 21, 2015 at 2:28 AM, Ns G nsgns...@gmail.com wrote:
 Hi All,

 I have a requirement for using regular expressions. I am trying the query below,
 but it doesn't seem to work.

 SELECT el_id from element  where el_name like
 REGEXP_SUBSTR(el_name,'^[Z][A-Za-z0-9 \.]*')

 I don't know if this is the correct usage. If you can point me to any source
 where I can learn more, it would be helpful.

 Thanks,



Re: Avoid deleting Hbase table when droping table with Phoenix

2015-06-29 Thread Thomas D'Silva
Jose,

hbase-site.xml needs to be on the classpath in order for the config to
get picked up.
Regarding the empty key value see :
https://groups.google.com/forum/#!msg/phoenix-hbase-user/UWdBghSfePo/BmCxOUOPHn8J

-Thomas

On Mon, Jun 29, 2015 at 4:38 PM, Jose M soloning...@hotmail.com wrote:
 Hi,
 I'm new to Phoenix and trying it right now. I installed it as a parcel in
 Cloudera 5.4.

 I'm planning to use it with an already existing HBase table (other systems
 use it and I can't recreate it). I see that I can create it without a problem
 with the CREATE TABLE command.

 But in case I need to DROP the table afterwards (only from Phoenix), is
 there any way to avoid removing the table from HBase as well? Because I tried the
 DROP TABLE command and the table is removed despite setting the
 phoenix.schema.dropMetaData property in the Phoenix hbase-site.xml to true or
 false.
 I'm using Phoenix from command line
 (/opt/cloudera/parcels/CLABS_PHOENIX-4.3.0-1.clabs_phoenix1.0.0.p0.78/lib/phoenix/bin/sqlline.py)
 and I set the property in
 /opt/cloudera/parcels/CLABS_PHOENIX-4.3.0-1.clabs_phoenix1.0.0.p0.78/lib/phoenix/bin/hbase-site.xml.

 I also see that Phoenix adds an extra column (named _0) with an empty value
 in my column family. Is there any way to avoid that?

 Thanks in advance,




Re: How to upsert data into dynamic columns in Phoenix.

2015-06-23 Thread Thomas D'Silva
Since lastGCTime is a dynamic column you need to specify the dynamic
column explicitly along with the table name.

Select lastGCTime FROM EventLog(lastGCTime TIME)

Select * will return only the regular columns (and not dynamic columns).
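
A minimal JDBC sketch tying the two statements together is below. The connection URL
and the value being upserted are placeholders, and it assumes lastGCTime is used as
TIME; whatever type you pick, declare the same type in every statement that references
the dynamic column:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DynamicColumnExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {
            // The dynamic column is declared, with its type, in the UPSERT itself.
            stmt.executeUpdate(
                "UPSERT INTO EventLog (eventId, eventTime, lastGCTime TIME) "
                + "VALUES (1, CURRENT_TIME(), CURRENT_TIME())");
            conn.commit();
            // ...and again in the SELECT, since it is not part of the table metadata.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT eventId, lastGCTime FROM EventLog (lastGCTime TIME)")) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1) + " -> " + rs.getTime(2));
                }
            }
        }
    }
}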


On Tue, Jun 23, 2015 at 12:09 AM, guxiaobo1982 guxiaobo1...@qq.com wrote:
 Hi Thomas,
 I tried your suggestion; the result is:

 the upsert statement with the new column defined works,
 but subsequent queries can't find the new column, that is:

 select * from eventlog will not show the newly defined column lastGCTime

 and

 select lastGCTime from eventlog fails with an undefined column error.

 is there a bug?



 -- Original --
 From:  Thomas D'Silva;tdsi...@salesforce.com;
 Send time: Tuesday, Jun 23, 2015 5:44 AM
 To: useruser@phoenix.apache.org;
 Subject:  Re: How to upsert data into dynamic columns in Phoenix.

 You can upsert rows by specifying the column name and data type along
 with the table name in the UPSERT statement. For the example in
 http://phoenix.apache.org/dynamic_columns.html

 UPSERT INTO EventLog (eventId, eventTime, lastGCTime INTEGER) VALUES(1,
 CURRENT_TIME(), 1234);

 On Sun, Jun 21, 2015 at 6:51 PM, guxiaobo1982 guxiaobo1...@qq.com wrote:
 Hi,

 The official doc only gives an example for querying dynamic columns which
 are not defined at table creation time, but how can we create the dynamic
 columns in the schema and upsert data through the Phoenix API?

 Thanks


Re: How to upsert data into dynamic columns in Phoenix.

2015-06-22 Thread Thomas D'Silva
You can upsert rows by specifying the column name and data type along
with the table name in the UPSERT statement. For the example in
http://phoenix.apache.org/dynamic_columns.html

UPSERT INTO EventLog (eventId, eventTime, lastGCTime INTEGER) VALUES(1,
CURRENT_TIME(), 1234);

On Sun, Jun 21, 2015 at 6:51 PM, guxiaobo1982 guxiaobo1...@qq.com wrote:
 Hi,

 The official doc only gives an example for querying dynamic columns which
 are not defined at table creation time, but how can we create the dynamic
 columns in the schema and upsert data through the Phoenix API?

 Thanks


Re: PhoenixIOException :Setting the query timeout

2015-06-16 Thread Thomas D'Silva
Bahubali,

hbase-site.xml needs to be on the client's CLASSPATH in order to get
picked up or else it will use the default timeout.
When using sqlline, it sets the CLASSPATH to the HBASE_CONF_PATH
environment variable, which defaults to the current directory.
Try running sqlline directly from the bin directory.

-Thomas

On Tue, Jun 16, 2015 at 5:00 AM, Bahubali Jain bahub...@gmail.com wrote:

 Hi,
 I am running into the exception below when I run a select query on a table with
 10 million rows (using sqlline.py).
 I added the parameter phoenix.query.timeoutMs to the hbase-site.xml present
 in the bin directory where sqlline.py is located, but for some reason it
 doesn't seem to be taking effect.
   <property>
     <name>phoenix.query.timeoutMs</name>
     <value>6000000</value>
   </property>


 java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException:
 71793ms passed since the last invocation, timeout is currently set to 60000
 at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
 at sqlline.TableOutputFormat.print(TableOutputFormat.java:33)
 at sqlline.SqlLine.print(SqlLine.java:1653)
 at sqlline.Commands.execute(Commands.java:833)
 at sqlline.Commands.sql(Commands.java:732)
 at sqlline.SqlLine.dispatch(SqlLine.java:808)
 at sqlline.SqlLine.runCommands(SqlLine.java:1711)
 at sqlline.Commands.run(Commands.java:1285)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:36)
 at sqlline.SqlLine.dispatch(SqlLine.java:804)
 at sqlline.SqlLine.initArgs(SqlLine.java:613)
 at sqlline.SqlLine.begin(SqlLine.java:656)
 at sqlline.SqlLine.start(SqlLine.java:398)
 at sqlline.SqlLine.main(SqlLine.java:292)
 Aborting command set because force is false and command failed: select
 articleid from poc1;

 --
 Twitter:http://twitter.com/Baahu



Re: Socket timeout while counting number of rows of a table

2015-04-09 Thread Thomas D'Silva
The phoenix.query.timeoutMs property should be set on the
hbase-site.xml of the client (in the phoenix/bin directory), not the
server hbase-site.xml. See
https://github.com/forcedotcom/phoenix/wiki/Tuning .  Did you try just
setting it on the client side config before starting sqlline and
running the query?

Thanks,
Thomas

On Thu, Apr 9, 2015 at 9:29 AM, PERNOLLET Martin
martin.pernollet-...@sgcib.com wrote:
 I have to mention I also tried changing these properties on HBase side :



 hbase.regionserver.lease.period : 120000

 hbase.rpc.timeout : 1200000



 I am running on Hortonworks 2.2.0

 Phoenix 4.2.0

 HBase 0.98.4



 From: PERNOLLET Martin (EXT) ItecCttDir
 Sent: Thursday 9 April 2015 17:52
 To: 'user@phoenix.apache.org'
 Subject: Socket timeout while counting number of rows of a table



 Hi,



 When asking Phoenix to count the rows of an HBase table (select
 count(UUID) from bulk_1month), it fails after one minute:



 java.lang.RuntimeException: org.apache.phoenix.exception.PhoenixIOException:
 org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36,
 exceptions:

 Thu Apr 09 16:49:33 CEST 2015, null, java.net.SocketTimeoutException:
 callTimeout=60000, callDuration=62366: row '' on table 'bulk_1month' at
 region=bulk_1month,,1428582098717.2b2c2f1b5eab43e15b5789c2aa0dfc80.,
 hostname=reid,60020,1428590222546, seqNum=37



 at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2440)

 at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2074)

 at sqlline.SqlLine.print(SqlLine.java:1735)

 at sqlline.SqlLine$Commands.execute(SqlLine.java:3683)

 at sqlline.SqlLine$Commands.sql(SqlLine.java:3584)

 at sqlline.SqlLine.dispatch(SqlLine.java:821)

 at sqlline.SqlLine.begin(SqlLine.java:699)

 at sqlline.SqlLine.mainWithInputRedirection(SqlLine.java:441)

 at sqlline.SqlLine.main(SqlLine.java:424)



 A post (https://github.com/forcedotcom/phoenix/issues/730) suggested editing
 timeout values, so I added the following properties to the HBase configuration
 via Ambari.



 <property>
   <name>phoenix.query.keepAliveMs</name>
   <!-- changed to timeout from 1 min to 10 min -->
   <value>600000</value>
 </property>

 <property>
   <name>phoenix.query.timeoutMs</name>
   <!-- changed to timeout from 60 sec to 2h -->
   <value>7200000</value>
 </property>



 And once HBase restarted, I copied the updated HBase conf to the Phoenix bin/
 directory:



 cp /etc/hbase/conf/hbase-site.xml  /usr/hdp/2.2.0.0-2041/phoenix/bin



 It did not change anything to the actual timeout.



 Did I miss a property or am I wrong while copying the hbase settings?



 Thanks for your help!







Re: understanding phoenix code flow

2015-04-07 Thread Thomas D'Silva
Ashish,

If you want to step through server-side code you can enable remote
debugging in hbase-env.sh. I have used this with standalone mode.

# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"

There are a few existing JIRAs related to transactions (PHOENIX-400 and
PHOENIX-1674); maybe you could describe your ideas there?

Thanks,
Thomas


On Fri, Apr 3, 2015 at 3:08 PM, ashish tapdiya ashishtapd...@gmail.com wrote:
 Hi Jesse,

 I plan on implementing multi-row transaction support and currently trying to
 understand code flow.

 I currently have a 6-node cluster setup. I initially linked the source in the
 client and stepped through using a simple SQL workload; however, I realized
 most of the processing is done on the server side.

 To be able to step through server side code, I intend to setup hbase in
 standalone or pseudo-distributed mode.

 Thanks,
 ~Ashish

 On Fri, Apr 3, 2015 at 4:49 PM, Jesse Yates jesse.k.ya...@gmail.com wrote:

 You could understand it just by reading the code, or running the tests, or
 running a HBase minicluster in the JVM or in standalone mode or in
 pseudo-distributed mode or in a fully distributed setup.

 What are you trying to achieve?

 In the area you are interested in, have you:
  - read the docs
  - read the code
  - read though unit tests
  - debugged the unit tests and stepped through areas of uncertainty
  - written a unit test to figure out if your conjectures are correct

 If you are still unclear, then we can gladly help.

 On Fri, Apr 3, 2015 at 2:46 PM ashish tapdiya ashishtapd...@gmail.com
 wrote:

 To understand the Phoenix code flow, should HBase be set up in standalone or
 pseudo-distributed mode?

 Thanks,
 ~Ashish




Re: Timestamp for mutations

2015-04-07 Thread Thomas D'Silva
I think it stands for System Change Number
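
A minimal sketch of the CurrentSCN approach discussed below, in case it helps. The
table name, filter and connection URL are made up; the point is that every mutation
committed on this connection (including the delete) is written at the supplied
timestamp:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

import org.apache.phoenix.util.PhoenixRuntime;

public class ScnDeleteExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // PhoenixRuntime.CURRENT_SCN_ATTRIB ("CurrentSCN") pins the timestamp
        // used for reads and writes on this connection.
        long oneHourAgo = System.currentTimeMillis() - 3600000L;
        props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(oneHourAgo));
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost", props);
             Statement stmt = conn.createStatement()) {
            // The delete markers are written at the SCN set above.
            stmt.executeUpdate("DELETE FROM MY_TABLE WHERE ID = 42");
            conn.commit();
        }
    }
}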

On Tue, Apr 7, 2015 at 11:06 AM, Abhilash L L
abhil...@capillarytech.com wrote:
 Thanks a lot Thomas, will try it out.

 Just saw this line in the PhoenixConnection class:

 this.scn = JDBCUtil.getCurrentSCN(url, this.info);

 What does scn stand for?


 Regards,
 Abhilash L L
 Capillary Technologies
 M:919886208262
 abhil...@capillarytech.com | www.capillarytech.com

 On Tue, Apr 7, 2015 at 11:27 PM, Thomas D'Silva tdsi...@salesforce.com
 wrote:

 Abhilash,

 You can set the timestamp by setting the
 PhoenixRuntime.CURRENT_SCN_ATTRIB property on the phoenix connection.
 See Can phoenix work on tables with arbitrary timestamp on
 http://phoenix.apache.org/faq.html

 -Thomas



 On Mon, Apr 6, 2015 at 11:20 PM, Abhilash L L
 abhil...@capillarytech.com wrote:
  Hello,
 
  When we fire a delete, how do we set the timestamp for the mutation?
 
 





Re: Persisting Objects thru Phoenix

2015-03-18 Thread Thomas D'Silva
Anirudha

At Salesforce, one of the use cases Phoenix and HBase are used for is
storing immutable event data such as login information. We periodically run
aggregate queries to generate metrics, e.g. the number of logins per user. We
select the columns of the primary key based on the filters used while
querying the data. Our objects don't have multi-level parent-child
relationships.
Do you have any specific information you are looking for?
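
As a rough illustration of that pattern (the table and column names below are made
up, not an actual Salesforce schema): the leading primary key columns match the common
filters, the rows are flagged immutable, and the metric is a simple aggregate:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LoginEventExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection URL; replace with your ZooKeeper quorum.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {
            // Write-once event data: the PK leads with the columns used to filter.
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS LOGIN_EVENT ("
                + " USER_ID CHAR(15) NOT NULL,"
                + " LOGIN_TIME DATE NOT NULL,"
                + " CLIENT_IP VARCHAR,"
                + " CONSTRAINT PK PRIMARY KEY (USER_ID, LOGIN_TIME)"
                + ") IMMUTABLE_ROWS=true");
            // Periodic aggregate: number of logins per user.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT USER_ID, COUNT(*) FROM LOGIN_EVENT GROUP BY USER_ID")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
                }
            }
        }
    }
}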

-Thomas

On Wed, Mar 18, 2015 at 11:28 AM, Anirudha Khanna akha...@marinsoftware.com
 wrote:

 Hi,

 We are evaluating using Phoenix over HBase for persisting relational data.
 Has anyone tried doing something similar? Any experience reports would be
 really helpful.
 Quick note: some of our objects have up to 3-4 levels of parent-child
 relations.

 Cheers,
 Anirudha



Re: Exception when starting sqlline.py

2015-02-11 Thread Thomas D'Silva
Anirudha,

Did you add the location of phoenix-4.2.2-server.jar to the
HBASE_CLASSPATH (in conf/hbase-env.sh)?

Thanks,
Thomas

On Wed, Feb 11, 2015 at 6:35 AM, Anirudha Khanna
akha...@marinsoftware.com wrote:
 Hi All,

 On our dev HBase cluster, after installing the Phoenix server jars on
 ALL the nodes (master and region servers), when I try connecting using
 sqlline.py I get the following exception:

 Error: ERROR 2006 (INT08): Incompatible jars detected between client and
 server. Ensure that phoenix.jar is put on the classpath of HBase in every
 region server: org.apache.hadoop.hbase.exceptions.UnknownProtocolException:
 No registered coprocessor service found for name MetaDataService in region
 SYSTEM.CATALOG,,1423663375874.6b8de72059f100fce659bd46cc188fe7

 Environment,

 Phoenix - 4.2.2
 HBase - 0.98.6-cdh5.3.0, rUnknown

 The Co-Processors loaded for that table are,

 hbase(main):003:0 describe 'SYSTEM.CATALOG'
 DESCRIPTION
 ENABLED
  'SYSTEM.CATALOG', {TABLE_ATTRIBUTES = {coprocessor$1 =
 '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocesso
 true
  r$2 =
 '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|',
 coprocessor$3 = '|org.apache.phoenix.coproces
  sor.GroupedAggregateRegionObserver|805306366|', coprocessor$4 =
 '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|80530636
  6|', coprocessor$5 =
 '|org.apache.phoenix.coprocessor.MetaDataEndpointImpl|805306366|',
 coprocessor$6 = '|org.apache.phoenix.copro
  cessor.MetaDataRegionObserver|805306367|'}, {NAME = '0',
 DATA_BLOCK_ENCODING = 'FAST_DIFF', BLOOMFILTER = 'ROW', REPLICATION_SCOP
  E = '0', VERSIONS = '1000', COMPRESSION = 'NONE', MIN_VERSIONS = '0',
 TTL = 'FOREVER', KEEP_DELETED_CELLS = 'true', BLOCKSIZE
  = '65536', IN_MEMORY = 'false', BLOCKCACHE = 'true'}
 1 row(s) in 1.7260 seconds

 Help is much appreciated.

 Thanks,
 Anirudha