Re: Problem Updating Stats

2016-03-18 Thread Ankit Singhal
It seems from the attached logs that you have upgraded Phoenix to version 4.7
and are now using an old client to connect to it.
The "UPDATE STATISTICS" command and guideposts will not work with an old client
after the upgrade to 4.7; you need to use the new client for such operations.
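For example, with a 4.7 client it would just be a matter of re-running the
command from the new sqlline shell; the install path and ZooKeeper quorum below
are only illustrative:

    /path/to/phoenix-4.7.0/bin/sqlline.py zk-host:2181
    UPDATE STATISTICS "ops_csv" ALL;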

On Wed, Mar 16, 2016 at 10:55 PM, Benjamin Kim  wrote:

> +------------+---------------+-------------+--------------------------+------+
> | TABLE_CAT  | TABLE_SCHEM   | TABLE_NAME  | COLUMN_NAME              |      |
> +------------+---------------+-------------+--------------------------+------+
> |            | SYSTEM        | STATS       | PHYSICAL_NAME            | 12   |
> |            | SYSTEM        | STATS       | COLUMN_FAMILY            | 12   |
> |            | SYSTEM        | STATS       | GUIDE_POST_KEY           | -3   |
> |            | SYSTEM        | STATS       | GUIDE_POSTS_WIDTH        | -5   |
> |            | SYSTEM        | STATS       | LAST_STATS_UPDATE_TIME   | 91   |
> |            | SYSTEM        | STATS       | GUIDE_POSTS_ROW_COUNT    | -5   |
> +------------+---------------+-------------+--------------------------+------+
>
> I have attached the SYSTEM.CATALOG contents.
>
> Thanks,
> Ben
>
>
>
> On Mar 16, 2016, at 9:34 AM, Ankit Singhal 
> wrote:
>
> Sorry Ben, I may not have been clear in my first comment, but I need you to
> describe SYSTEM.STATS in a SQL client so that I can see the columns present.
> Also, please run scan 'SYSTEM.CATALOG', {RAW => true} in the hbase shell and
> attach the output here.
>
> On Wed, Mar 16, 2016 at 8:55 PM, Benjamin Kim  wrote:
>
>> Ankit,
>>
>> I did not see any problems when connecting with the Phoenix sqlline
>> client. So, below is what you asked for. I hope that you can give us
>> insight into fixing this.
>>
>> hbase(main):005:0> describe 'SYSTEM.STATS'
>> Table SYSTEM.STATS is ENABLED
>>
>>
>> SYSTEM.STATS, {TABLE_ATTRIBUTES => {coprocessor$1 =>
>> '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',
>> coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|',
>> coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|',
>> coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|',
>> coprocessor$5 => '|org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint|805306366|',
>> coprocessor$6 => '|org.apache.hadoop.hbase.regionserver.LocalIndexSplitter|805306366|',
>> METADATA => {'SPLIT_POLICY' => 'org.apache.phoenix.schema.MetaDataSplitPolicy'}}
>>
>> COLUMN FAMILIES DESCRIPTION
>>
>> {NAME => '0', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'ROW',
>> REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
>> MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'true',
>> BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
>>
>> 1 row(s) in 0.0280 seconds
>>
>> Thanks,
>> Ben
>>
>>
>> On Mar 15, 2016, at 11:59 PM, Ankit Singhal 
>> wrote:
>>
>> Yes, it seems so.
>> Did you get any error related to SYSTEM.STATS when the client connected for
>> the first time?
>>
>> Can you please describe your SYSTEM.STATS table and paste the output here?
>>
>> On Wed, Mar 16, 2016 at 3:24 AM, Benjamin Kim  wrote:
>>
>>> When trying to run update stats on an existing table in HBase, I get an
>>> error:
>>>
>>> Update stats:
>>>
>>> UPDATE STATISTICS "ops_csv" ALL
>>>
>>> error:
>>>
>>> ERROR 504 (42703): Undefined column. columnName=REGION_NAME
>>>
>>> It looks like the metadata is messed up, i.e., there is no
>>> column named REGION_NAME in this table.
>>>
>>> I see similar errors for other tables that we currently have in hbase.
>>>
>>> We are using CDH 5.5.2, HBase 1.0.0, and Phoenix 4.5.2.
>>>
>>> Thanks,
>>> Ben
>>>
>>
>>
>>
>
>
>


Re: Problem Updating Stats

2016-03-18 Thread Benjamin Kim
Ankit,

We tried a 4.7 client upgrade to use the phoenix-spark client as an experiment and
then rolled back to the sanctioned CDH 5.5 version of 4.5. I had no idea that
someone ran an "update stats" during that period, and I didn't know that there
would be a fundamental change like this. Do you know of a way to roll back this
change too?

Thanks,
Ben 


> On Mar 16, 2016, at 10:47 AM, Ankit Singhal  wrote:
> 
> It seems from the attached logs that you have upgraded Phoenix to version 4.7
> and are now using an old client to connect to it.
> The "UPDATE STATISTICS" command and guideposts will not work with an old client
> after the upgrade to 4.7; you need to use the new client for such operations.
> 
> On Wed, Mar 16, 2016 at 10:55 PM, Benjamin Kim wrote:
> +------------+---------------+-------------+--------------------------+------+
> | TABLE_CAT  | TABLE_SCHEM   | TABLE_NAME  | COLUMN_NAME              |      |
> +------------+---------------+-------------+--------------------------+------+
> |            | SYSTEM        | STATS       | PHYSICAL_NAME            | 12   |
> |            | SYSTEM        | STATS       | COLUMN_FAMILY            | 12   |
> |            | SYSTEM        | STATS       | GUIDE_POST_KEY           | -3   |
> |            | SYSTEM        | STATS       | GUIDE_POSTS_WIDTH        | -5   |
> |            | SYSTEM        | STATS       | LAST_STATS_UPDATE_TIME   | 91   |
> |            | SYSTEM        | STATS       | GUIDE_POSTS_ROW_COUNT    | -5   |
> +------------+---------------+-------------+--------------------------+------+
> 
> I have attached the SYSTEM.CATALOG contents.
> 
> Thanks,
> Ben
> 
> 
> 
>> On Mar 16, 2016, at 9:34 AM, Ankit Singhal wrote:
>> 
>> Sorry Ben, I may not have been clear in my first comment, but I need you to
>> describe SYSTEM.STATS in a SQL client so that I can see the columns present.
>> Also, please run scan 'SYSTEM.CATALOG', {RAW => true} in the hbase shell and
>> attach the output here.
>> 
>> On Wed, Mar 16, 2016 at 8:55 PM, Benjamin Kim wrote:
>> Ankit,
>> 
>> I did not see any problems when connecting with the Phoenix sqlline client.
>> So, below is what you asked for. I hope that you can give us insight
>> into fixing this.
>> 
>> hbase(main):005:0> describe 'SYSTEM.STATS'
>> Table SYSTEM.STATS is ENABLED
>>
>> SYSTEM.STATS, {TABLE_ATTRIBUTES => {coprocessor$1 =>
>> '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',
>> coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|',
>> coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|',
>> coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|',
>> coprocessor$5 => '|org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint|805306366|',
>> coprocessor$6 => '|org.apache.hadoop.hbase.regionserver.LocalIndexSplitter|805306366|',
>> METADATA => {'SPLIT_POLICY' => 'org.apache.phoenix.schema.MetaDataSplitPolicy'}}
>>
>> COLUMN FAMILIES DESCRIPTION
>>
>> {NAME => '0', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'ROW',
>> REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
>> MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'true',
>> BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
>>
>> 1 row(s) in 0.0280 seconds
>> 
>> Thanks,
>> Ben
>> 
>> 
>>> On Mar 15, 2016, at 11:59 PM, Ankit Singhal wrote:
>>> 
>>> Yes, it seems so.
>>> Did you get any error related to SYSTEM.STATS when the client connected for
>>> the first time?
>>>
>>> Can you please describe your SYSTEM.STATS table and paste the output here?
>>> 
>>> On Wed, Mar 16, 2016 at 3:24 AM, Benjamin Kim wrote:
>>> When trying to run update stats on an existing table in HBase, I get an error:
>>> Update stats:
>>> UPDATE STATISTICS "ops_csv" ALL
>>> error:
>>> ERROR 504 (42703): Undefined column. columnName=REGION_NAME
>>> It looks like the metadata is messed up, i.e., there is no column
>>> named REGION_NAME in this table.
>>> I see similar errors for other tables that we currently have in HBase.

Re: Implement Custom Aggregate Functions in Phoenix

2016-03-18 Thread James Taylor
Hi Swapna,
We don't support custom aggregate functions, only scalar functions
(see PHOENIX-2069). For a custom aggregate function, you'd need to add it
to phoenix-core and rebuild the jar. We're open to adding such functions to the
code base if they're general enough. That's how FIRST_VALUE, LAST_VALUE, and
NTH_VALUE made it in.
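For reference, a scalar UDF (unlike an aggregate) can be registered from a
user-provided jar without rebuilding phoenix-core; the class name and jar path
below are only illustrative:

    CREATE FUNCTION my_reverse(varchar) RETURNS varchar
        AS 'com.mycompany.udf.MyReverseFunction'
        USING JAR 'hdfs://namenode:8020/hbase/lib/my-udfs.jar';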
Thanks,
James

On Thu, Mar 17, 2016 at 12:11 PM, Swapna Swapna 
wrote:

> Hi,
>
> I found this in the Phoenix UDF documentation:
>
>    - After compiling your code to a jar, you need to deploy the jar to
>    HDFS. It would be better to add the jar to the HDFS folder configured for
>    hbase.dynamic.jars.dir.
>
>
> My question is: can that be any user-specific UDF jar that just needs to be
> copied to HDFS, or do I need to register the function and add the custom
> UDF classes inside phoenix-core.jar and rebuild 'phoenix-core.jar'?
>
> Regards
> Swapna
>
>
>
>
> On Fri, Jan 29, 2016 at 6:31 PM, James Taylor 
> wrote:
>
>> Hi Swapna,
>> We currently don't support custom aggregate UDFs, and it looks like you
>> found the JIRA for it: PHOENIX-2069. It would be a natural extension of UDFs.
>> It would be great to capture your use case and requirements on the JIRA to
>> make sure the functionality will meet your needs.
>> Thanks,
>> James
>>
>> On Fri, Jan 29, 2016 at 1:47 PM, Swapna Swapna 
>> wrote:
>>
>>> Hi,
>>>
>>> I would like to know how to implement and register custom
>>> aggregate functions in Phoenix, along the lines of the built-in aggregate
>>> functions such as SUM, COUNT, etc.
>>>
>>> Please help.
>>>
>>> Thanks
>>> Swapna
>>>
>>
>>
>


Re: Problem Updating Stats

2016-03-18 Thread Ankit Singhal
Sorry Ben, I may not have been clear in my first comment, but I need you to
describe SYSTEM.STATS in a SQL client so that I can see the columns present.
Also, please run scan 'SYSTEM.CATALOG', {RAW => true} in the hbase shell and
attach the output here.
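For reference, the two commands would look roughly like this (the first from
sqlline.py, the second from the hbase shell):

    !describe SYSTEM.STATS
    scan 'SYSTEM.CATALOG', {RAW => true}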

On Wed, Mar 16, 2016 at 8:55 PM, Benjamin Kim  wrote:

> Ankit,
>
> I did not see any problems when connecting with the Phoenix sqlline
> client. So, below is what you asked for. I hope that you can give us
> insight into fixing this.
>
> hbase(main):005:0> describe 'SYSTEM.STATS'
> Table SYSTEM.STATS is ENABLED
>
>
> SYSTEM.STATS, {TABLE_ATTRIBUTES => {coprocessor$1 =>
> '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',
> coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|',
> coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|',
> coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|',
> coprocessor$5 => '|org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint|805306366|',
> coprocessor$6 => '|org.apache.hadoop.hbase.regionserver.LocalIndexSplitter|805306366|',
> METADATA => {'SPLIT_POLICY' => 'org.apache.phoenix.schema.MetaDataSplitPolicy'}}
>
> COLUMN FAMILIES DESCRIPTION
>
> {NAME => '0', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'ROW',
> REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
> MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'true',
> BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
>
> 1 row(s) in 0.0280 seconds
>
> Thanks,
> Ben
>
>
> On Mar 15, 2016, at 11:59 PM, Ankit Singhal 
> wrote:
>
> Yes, it seems so.
> Did you get any error related to SYSTEM.STATS when the client connected for
> the first time?
>
> Can you please describe your SYSTEM.STATS table and paste the output here?
>
> On Wed, Mar 16, 2016 at 3:24 AM, Benjamin Kim  wrote:
>
>> When trying to run update stats on an existing table in HBase, I get an
>> error:
>>
>> Update stats:
>>
>> UPDATE STATISTICS "ops_csv" ALL
>>
>> error:
>>
>> ERROR 504 (42703): Undefined column. columnName=REGION_NAME
>>
>> It looks like the metadata is messed up, i.e., there is no column
>> named REGION_NAME in this table.
>>
>> I see similar errors for other tables that we currently have in hbase.
>>
>> We are using CDH 5.5.2, HBase 1.0.0, and Phoenix 4.5.2.
>>
>> Thanks,
>> Ben
>>
>
>
>


Re: Does phoenix CsvBulkLoadTool write to WAL/Memstore

2016-03-18 Thread Vamsi Krishna
Thanks Pari.

The frequency of the job is weekly.
No. of rows is around 10 billion.
The cluster has 13 nodes.
From what you have mentioned, I see that CsvBulkLoadTool is the best option for
my scenario.
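For reference, the bulk load step I have in mind would be along these lines; the
jar name, table, input path, and ZooKeeper quorum are only placeholders:

    hadoop jar phoenix-4.5.2-client.jar \
        org.apache.phoenix.mapreduce.CsvBulkLoadTool \
        --table OPS_CSV \
        --input /user/spark/output/transformed.csv \
        --zookeeper zk-host:2181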

I see you mentioned increasing the batch size to accommodate
more rows.
Are you talking about the 'phoenix.mutate.batchSize' configuration
parameter?

Vamsi Attluri

On Wed, Mar 16, 2016 at 9:01 AM Pariksheet Barapatre 
wrote:

> Hi Vamsi,
>
> How many rows are you expecting out of your transformation, and what
> is the frequency of the job?
>
> If there are relatively few rows (< ~100K, though this also depends on cluster
> size), you can go ahead with the phoenix-spark plug-in and increase the batch
> size to accommodate more rows; otherwise, use CsvBulkLoadTool.
>
> Thanks
> Pari
>
> On 16 March 2016 at 20:03, Vamsi Krishna  wrote:
>
>> Thanks Gabriel & Ravi.
>>
>> I have a data processing job written in Spark (Scala).
>> I do a join on data from 2 CSV files and transform
>> the resulting data. Finally, I load the transformed data
>> into a Phoenix table using the Phoenix-Spark plugin.
>> Since the Phoenix-Spark plugin goes through the regular HBase write path
>> (writes to the WAL), I'm thinking of option 2 to reduce the job execution time.
>>
>> *Option 2:* Do the data transformation in Spark, write the transformed
>> data to a CSV file, and use the Phoenix CsvBulkLoadTool to load the data into
>> the Phoenix table.
>>
>> Has anyone tried this kind of exercise? Any thoughts?
>>
>> Thanks,
>> Vamsi Attluri
>>
>> On Tue, Mar 15, 2016 at 9:40 PM Ravi Kiran 
>> wrote:
>>
>>> Hi Vamsi,
>>> The upserts through the Phoenix-Spark plugin definitely go through the WAL.
>>>
>>>
>>> On Tue, Mar 15, 2016 at 5:56 AM, Gabriel Reid 
>>> wrote:
>>>
 Hi Vamsi,

 I can't answer your question about the Phoenix-Spark plugin (although
 I'm sure that someone else here can).

 However, I can tell you that the CsvBulkLoadTool does not write to the
 WAL or to the Memstore. It simply writes HFiles and then hands those
 HFiles over to HBase, so the memstore and WAL are never
 touched/affected by this.

 - Gabriel


 On Tue, Mar 15, 2016 at 1:41 PM, Vamsi Krishna 
 wrote:
 > Team,
 >
 > Does phoenix CsvBulkLoadTool write to HBase WAL/Memstore?
 >
 > Phoenix-Spark plugin:
 > Does saveToPhoenix method on RDD[Tuple] write to HBase WAL/Memstore?
 >
 > Thanks,
 > Vamsi Attluri
 > --
 > Vamsi Attluri

>>>
>>> --
>> Vamsi Attluri
>>
>
>
>
> --
> Cheers,
> Pari
>
-- 
Vamsi Attluri