Re: UDF for lateral views

2018-01-17 Thread Krishna
According to this blog (
http://phoenix-hbase.blogspot.in/2013/04/how-to-add-your-own-built-in-function.html),
evaluate(...) is responsible for processing the input state of the row and
filling up the ImmutableBytesWritable pointer with the transformed row.
I did not find any references suggesting support for returning multiple rows for
each input row. Does anyone know if the UDF framework can support that?
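For reference, a minimal scalar function following that contract looks roughly like the sketch below (the MY_LENGTH name is made up, and the @BuiltInFunction annotation and registration steps from the blog post are omitted, so treat it as illustrative rather than exact):

import java.sql.SQLException;
import java.util.List;

import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.phoenix.expression.Expression;
import org.apache.phoenix.expression.function.ScalarFunction;
import org.apache.phoenix.schema.tuple.Tuple;
import org.apache.phoenix.schema.types.PDataType;
import org.apache.phoenix.schema.types.PInteger;

// Hypothetical scalar function: returns the serialized length of its argument.
public class MyLengthFunction extends ScalarFunction {
    public static final String NAME = "MY_LENGTH";

    public MyLengthFunction() {
    }

    public MyLengthFunction(List<Expression> children) throws SQLException {
        super(children);
    }

    @Override
    public String getName() {
        return NAME;
    }

    @Override
    public PDataType getDataType() {
        return PInteger.INSTANCE;
    }

    @Override
    public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
        Expression arg = getChildren().get(0);
        if (!arg.evaluate(tuple, ptr)) {
            return false; // argument not available for this row
        }
        // The function produces exactly one value per input row: it overwrites the
        // single pointer it was handed. There is no way to emit several rows here,
        // which is why a Hive-style lateral view does not map onto this interface.
        ptr.set(PInteger.INSTANCE.toBytes(ptr.getLength()));
        return true;
    }
}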

On Tue, Jan 16, 2018 at 6:07 PM, Krishna <research...@gmail.com> wrote:

> I would like to convert a column of ARRAY data-type such that each element
> of the array is returned as a row. Hive supports it via Lateral Views (
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView)?
>
> Does UDF framework in Phoenix allow for building such functions?
>


UDF for lateral views

2018-01-16 Thread Krishna
I would like to convert a column of ARRAY data-type such that each element
of the array is returned as a row. Hive supports it via Lateral Views (
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
)?

Does UDF framework in Phoenix allow for building such functions?


Re: Question regarding designing row keys

2016-10-03 Thread Krishna
You have two options (both sketched below):
- Modify your primary key to include metric_type & timestamp as leading
columns.
- Create an index on metric_type & timestamp
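A rough JDBC sketch of both options (table and column names follow this thread; the TS rename, the VAL column, and the index name are made up, and TIMESTAMP is avoided as a column name since it is a keyword):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MetricKeyOptions {
    public static void main(String[] args) throws Exception {
        // Assumes a Phoenix JDBC URL pointing at your ZooKeeper quorum.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181");
             Statement stmt = conn.createStatement()) {
            // Option 1: recreate the table with the filter columns leading the primary key.
            stmt.execute("CREATE TABLE METRIC_TABLE_V2 ("
                    + " METRIC_TYPE VARCHAR NOT NULL,"
                    + " TS TIMESTAMP NOT NULL,"
                    + " METRICID VARCHAR NOT NULL,"
                    + " VAL DOUBLE"
                    + " CONSTRAINT PK PRIMARY KEY (METRIC_TYPE, TS, METRICID))");
            // Option 2: keep the existing table and add a secondary index on the filter columns.
            stmt.execute("CREATE INDEX METRIC_TYPE_TS_IDX ON METRIC_TABLE (METRIC_TYPE, TS)");
        }
    }
}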

On Monday, October 3, 2016, Kanagha  wrote:

> Sorry for the confusion.
>
> metric_type,
> timestamp,
> metricId  is defined as the primary key via Phoenix for metric_table.
>
> Thanks
>
> Kanagha
>
> On Mon, Oct 3, 2016 at 3:41 PM, Michael McAllister <
> mmcallis...@homeaway.com
> > wrote:
>
>> >
>>
>> there is no indexing available on this table yet.
>>
>> >
>>
>>
>>
>> So you haven’t defined a primary key constraint? Can you share your table
>> creation DDL?
>>
>>
>>
>> Michael McAllister
>>
>> Staff Data Warehouse Engineer | Decision Systems
>>
>> mmcallis...@homeaway.com | C: 512.423.7447 | skype: michael.mcallister.ha | webex: https://h.a/mikewebex
>>
>> This electronic communication (including any attachment) is
>> confidential.  If you are not an intended recipient of this communication,
>> please be advised that any disclosure, dissemination, distribution, copying
>> or other use of this communication or any attachment is strictly
>> prohibited.  If you have received this communication in error, please
>> notify the sender immediately by reply e-mail and promptly destroy all
>> electronic and printed copies of this communication and any attachment.
>>
>>
>>
>> *From: *Kanagha
>> *Reply-To: *"user@phoenix.apache.org" <user@phoenix.apache.org>
>> *Date: *Monday, October 3, 2016 at 5:32 PM
>> *To: *"u...@hbase.apache.org" <u...@hbase.apache.org>, "user@phoenix.apache.org" <user@phoenix.apache.org>
>> *Subject: *Re: Question regarding designing row keys
>>
>>
>>
>> there is no indexing available on this table yet.
>>
>
>


Decode rowkey

2016-09-16 Thread Krishna
Hi,

Does Phoenix have an API for converting a rowkey (made up of multiple columns,
in ImmutableBytesRow format) into its primary key column values? I am
performing a scan directly from HBase and would like to convert the rowkey
into column values. We used the standard Phoenix JDBC API while writing to the
table.

Thanks


Re: CsvBulkLoadTool not populating Actual Table & Local Index Table when '-it' option specified

2016-07-05 Thread Vamsi Krishna
Thanks Rajeshbabu.

On Tue, Jul 5, 2016 at 5:59 AM rajeshb...@apache.org <
chrajeshbab...@gmail.com> wrote:

> Hi Vamsi,
>
> There is a bug with local indexes in 4.4.0 which is fixed in 4.7.0
> https://issues.apache.org/jira/browse/PHOENIX-2334
>
> Thanks,
> Rajeshbabu.
>
> On Tue, Jul 5, 2016 at 6:21 PM, Vamsi Krishna <vamsi.attl...@gmail.com>
> wrote:
>
>> Team,
>>
>> I'm working on HDP 2.3.2 (Phoenix 4.4.0, HBase 1.1.2).
>> When I use the '-it' option of CsvBulkLoadTool, neither the Actual Table nor the Local
>> Index Table is loaded.
>> *Command:*
>> *HADOOP_CLASSPATH=/usr/hdp/current/hbase-master/lib/hbase-protocol.jar:/etc/hbase/conf
>> yarn jar /usr/hdp/current/phoenix-client/phoenix-client.jar
>> org.apache.phoenix.mapreduce.CsvBulkLoadTool
>> -Dmapreduce.job.queuename=$QUEUE_NAME -s VAMSI -t TABLE_A -c COL1,COL2,COL3
>> -it IDX_TABLE_A_COL2 -i test/test_data.csv -d ',' -z $ZOOKEEPER_QUORUM*
>>
>> When I use the same command without specifying the '-it' option, it
>> populates the Actual Table but not the Local Index Table (which is as expected).
>> *Command:*
>> *HADOOP_CLASSPATH=/usr/hdp/current/hbase-master/lib/hbase-protocol.jar:/etc/hbase/conf
>> yarn jar /usr/hdp/current/phoenix-client/phoenix-client.jar
>> org.apache.phoenix.mapreduce.CsvBulkLoadTool
>> -Dmapreduce.job.queuename=$QUEUE_NAME -s VAMSI -t TABLE_A -c COL1,COL2,COL3
>> -i test/test_data.csv -d ',' -z $ZOOKEEPER_QUORUM*
>>
>> Could someone please help me if you see anything wrong with what I'm
>> doing?
>>
>> Here is how I'm setting up my table:
>> CREATE TABLE IF NOT EXISTS VAMSI.TABLE_A (COL1 VARCHAR(36) , COL2
>> VARCHAR(36) , COL3 VARCHAR(36) CONSTRAINT TABLE_A_PK PRIMARY KEY (COL1))
>> COMPRESSION='SNAPPY', SALT_BUCKETS=5;
>> CREATE LOCAL INDEX IF NOT EXISTS IDX_TABLE_A_COL2 ON VAMSI.TABLE_A (COL2);
>> upsert into vamsi.table_a values ('abc123','abc','123');
>> upsert into vamsi.table_a values ('def456','def','456');
>>
>> test_data.csv contains 2 records:
>> ghi789,ghi,789
>> jkl012,jkl,012
>>
>> Thanks,
>> Vamsi Attluri
>> --
>> Vamsi Attluri
>>
>
> --
Vamsi Attluri


Phoenix-Spark: is DataFrame saving a single threaded operation?

2016-07-05 Thread Vamsi Krishna
Team,

In the Phoenix-Spark plugin, is the DataFrame save operation single-threaded?

df.write \
  .format("org.apache.phoenix.spark") \
  .mode("overwrite") \
  .option("table", "TABLE1") \
  .option("zkUrl", "localhost:2181") \
  .save()


Thanks,
Vamsi Attluri
-- 
Vamsi Attluri


CsvBulkLoadTool not populating Actual Table & Local Index Table when '-it' option specified

2016-07-05 Thread Vamsi Krishna
Team,

I'm working on HDP 2.3.2 (Phoenix 4.4.0, HBase 1.1.2).
When I use the '-it' option of CsvBulkLoadTool, neither the Actual Table nor the Local
Index Table is loaded.
*Command:*
*HADOOP_CLASSPATH=/usr/hdp/current/hbase-master/lib/hbase-protocol.jar:/etc/hbase/conf
yarn jar /usr/hdp/current/phoenix-client/phoenix-client.jar
org.apache.phoenix.mapreduce.CsvBulkLoadTool
-Dmapreduce.job.queuename=$QUEUE_NAME -s VAMSI -t TABLE_A -c COL1,COL2,COL3
-it IDX_TABLE_A_COL2 -i test/test_data.csv -d ',' -z $ZOOKEEPER_QUORUM*

When I use the same command without specifying the '-it' option, it
populates the Actual Table but not the Local Index Table (which is as expected).
*Command:*
*HADOOP_CLASSPATH=/usr/hdp/current/hbase-master/lib/hbase-protocol.jar:/etc/hbase/conf
yarn jar /usr/hdp/current/phoenix-client/phoenix-client.jar
org.apache.phoenix.mapreduce.CsvBulkLoadTool
-Dmapreduce.job.queuename=$QUEUE_NAME -s VAMSI -t TABLE_A -c COL1,COL2,COL3
-i test/test_data.csv -d ',' -z $ZOOKEEPER_QUORUM*

Could someone please help me if you see anything wrong with what I'm doing?

Here is how I'm setting up my table:
CREATE TABLE IF NOT EXISTS VAMSI.TABLE_A (COL1 VARCHAR(36) , COL2
VARCHAR(36) , COL3 VARCHAR(36) CONSTRAINT TABLE_A_PK PRIMARY KEY (COL1))
COMPRESSION='SNAPPY', SALT_BUCKETS=5;
CREATE LOCAL INDEX IF NOT EXISTS IDX_TABLE_A_COL2 ON VAMSI.TABLE_A (COL2);
upsert into vamsi.table_a values ('abc123','abc','123');
upsert into vamsi.table_a values ('def456','def','456');

test_data.csv contains 2 records:
ghi789,ghi,789
jkl012,jkl,012

Thanks,
Vamsi Attluri
-- 
Vamsi Attluri


Exception when trying to build Local Index asynchronously

2016-06-30 Thread Vamsi Krishna
Team,

I'm using HDP 2.3.2 (HBase 1.1.2, Phoenix 4.4.0)
I'm seeing an exception when I run the IndexTool MapReduce job to build
Local Index asynchronously.
Could someone please help me understand what I'm doing wrong?

*Create Table:*
CREATE TABLE IF NOT EXISTS VAMSI.TABLE_A (COL1 VARCHAR(36) , COL2
VARCHAR(36) , COL3 VARCHAR(36) CONSTRAINT TABLE_A_PK PRIMARY KEY (COL1))
COMPRESSION='SNAPPY', SALT_BUCKETS=5;
*Create Index:*
CREATE LOCAL INDEX IF NOT EXISTS IDX_TABLE_A_COL2 ON VAMSI.TABLE_A (COL2)
ASYNC;
After I create the local index with the 'ASYNC' option and list the tables in
Phoenix, I see that the index is created and is in the 'BUILDING' state.
*Build Index data:*
hbase org.apache.phoenix.mapreduce.index.IndexTool --schema VAMSI
--data-table TABLE_A --index-table IDX_TABLE_A_COL2 --output-path
hdfs://xx/user/user123/test/indexdata

*ERROR:*
2016-06-30 03:11:13,875 ERROR [main] index.IndexTool:  An exception occured
while performing the indexing job , error message  VAMSI.IDX_TABLE_A_COL2
is not an index table for VAMSI.TABLE_A

Thanks,
Vamsi Attluri
-- 
Vamsi Attluri


Re: dropping Phoenix local index is not dropping the local index table in HBase

2016-06-30 Thread Vamsi Krishna
Thanks Rajeshbabu.

On Wed, Jun 29, 2016 at 10:15 PM rajeshb...@apache.org <
chrajeshbab...@gmail.com> wrote:

> Since we store all local index data in a single shared table, we do not
> drop that shared table when we drop a local index.
> We could check whether any local indexes remain and drop it then.
>
> Now, as part of PHOENIX-1734, we have reimplemented local indexes and
> are storing local index data in the same data table.
>
> Thanks,
> Rajeshbabu.
>
> On Tue, Jun 28, 2016 at 4:45 PM, Vamsi Krishna <vamsi.attl...@gmail.com>
> wrote:
>
>> Team,
>>
>> I'm using HDP 2.3.2 (HBase : 1.1.2, Phoenix : 4.4.0).
>> *Question: *Dropping Phoenix local index is not dropping the local index
>> table in HBase. Can someone explain why?
>>
>> Phoenix:
>> CREATE TABLE IF NOT EXISTS VAMSI.TABLE_B (COL1 VARCHAR(36) , COL2
>> VARCHAR(36) , COL3 VARCHAR(36) CONSTRAINT TABLE_B_PK PRIMARY KEY (COL1))
>> COMPRESSION='SNAPPY', SALT_BUCKETS=5;
>> CREATE LOCAL INDEX IF NOT EXISTS IDX_TABLE_A_COL2 ON VAMSI.TABLE_A (COL2);
>> DROP INDEX IF EXISTS IDX_TABLE_A_COL2 ON VAMSI.TABLE_A;
>>
>> hbase(main):012:0> list '_LOCAL.*'
>> TABLE
>> _LOCAL_IDX_VAMSI.TABLE_A
>>
>> Thanks,
>> Vamsi Attluri
>> --
>> Vamsi Attluri
>>
>
> --
Vamsi Attluri


Re: For multiple local indexes on Phoenix table only one local index table is being created in HBase

2016-06-30 Thread Vamsi Krishna
Thanks Ankit.

On Wed, Jun 29, 2016 at 12:11 PM Ankit Singhal <ankitsingha...@gmail.com>
wrote:

> Hi Vamsi,
>
> Phoenix uses a single local index table for all the local indexes created on
> a particular data table.
> Rows are differentiated by a local index sequence id and filtered for the
> particular index when requested during a query.
>
> Regards,
> Ankit Singhal
>
>
> On Tue, Jun 28, 2016 at 4:18 AM, Vamsi Krishna <vamsi.attl...@gmail.com>
> wrote:
>
>> Team,
>>
>> I'm using HDP 2.3.2 (HBase : 1.1.2, Phoenix : 4.4.0).
>> *Question:* For multiple local indexes on a Phoenix table, only one local
>> index table is being created in HBase. Is this expected behavior? Can
>> someone explain why?
>>
>> Phoenix:
>> CREATE TABLE IF NOT EXISTS VAMSI.TABLE_B (COL1 VARCHAR(36) , COL2
>> VARCHAR(36) , COL3 VARCHAR(36) CONSTRAINT TABLE_B_PK PRIMARY KEY (COL1))
>> COMPRESSION='SNAPPY', SALT_BUCKETS=5;
>> CREATE LOCAL INDEX IF NOT EXISTS IDX_TABLE_A_COL2 ON VAMSI.TABLE_A (COL2);
>> CREATE LOCAL INDEX IF NOT EXISTS IDX_TABLE_A_COL3 ON VAMSI.TABLE_A (COL3);
>>
>> hbase(main):012:0> list '_LOCAL.*'
>> TABLE
>> _LOCAL_IDX_VAMSI.TABLE_A
>>
>> Thanks,
>> Vamsi Attluri
>> --
>> Vamsi Attluri
>>
>
> --
Vamsi Attluri


phoenix explain plan not showing any difference after adding a local index on the table column that is used in query filter

2016-06-28 Thread Vamsi Krishna
Team,

I'm using HDP 2.3.2 (HBase : 1.1.2, Phoenix : 4.4.0).
*Question:* The Phoenix explain plan shows no difference after adding a
local index on the table column that is used in the query filter. Can someone
please explain why?

*Create table:*
CREATE TABLE IF NOT EXISTS VAMSI.TABLE_A (COL1 VARCHAR(36) , COL2
VARCHAR(36) , COL3 VARCHAR(36) CONSTRAINT TABLE_A_PK PRIMARY KEY (COL1))
COMPRESSION='SNAPPY', SALT_BUCKETS=5;
*Insert data:*
upsert into vamsi.table_a values ('abc123','abc','123');
upsert into vamsi.table_a values ('def456','def','456');

*Explain plan:*
explain select * from vamsi.table_a where col2 = 'abc';
+------------------------------------------------------------+
|                            PLAN                             |
+------------------------------------------------------------+
| CLIENT 5-CHUNK PARALLEL 5-WAY FULL SCAN OVER VAMSI.TABLE_A  |
| SERVER FILTER BY COL2 = 'abc'                               |
+------------------------------------------------------------+

*Create local index:*
CREATE LOCAL INDEX IF NOT EXISTS IDX_TABLE_A_COL2 ON VAMSI.TABLE_A (COL2);

*Explain plan:*
explain select * from vamsi.table_a where col2 = 'abc';
+------------------------------------------------------------+
|                            PLAN                             |
+------------------------------------------------------------+
| CLIENT 5-CHUNK PARALLEL 5-WAY FULL SCAN OVER VAMSI.TABLE_A  |
| SERVER FILTER BY COL2 = 'abc'                               |
+------------------------------------------------------------+

Thanks,
Vamsi Attluri

-- 
Vamsi Attluri


For multiple local indexes on Phoenix table only one local index table is being created in HBase

2016-06-28 Thread Vamsi Krishna
Team,

I'm using HDP 2.3.2 (HBase : 1.1.2, Phoenix : 4.4.0).
*Question:* For multiple local indexes on a Phoenix table, only one local
index table is being created in HBase. Is this expected behavior? Can
someone explain why?

Phoenix:
CREATE TABLE IF NOT EXISTS VAMSI.TABLE_B (COL1 VARCHAR(36) , COL2
VARCHAR(36) , COL3 VARCHAR(36) CONSTRAINT TABLE_B_PK PRIMARY KEY (COL1))
COMPRESSION='SNAPPY', SALT_BUCKETS=5;
CREATE LOCAL INDEX IF NOT EXISTS IDX_TABLE_A_COL2 ON VAMSI.TABLE_A (COL2);
CREATE LOCAL INDEX IF NOT EXISTS IDX_TABLE_A_COL3 ON VAMSI.TABLE_A (COL3);

hbase(main):012:0> list '_LOCAL.*'
TABLE
_LOCAL_IDX_VAMSI.TABLE_A

Thanks,
Vamsi Attluri
-- 
Vamsi Attluri


dropping Phoenix local index is not dropping the local index table in HBase

2016-06-28 Thread Vamsi Krishna
Team,

I'm using HDP 2.3.2 (HBase : 1.1.2, Phoenix : 4.4.0).
*Question: *Dropping Phoenix local index is not dropping the local index
table in HBase. Can someone explain why?

Phoenix:
CREATE TABLE IF NOT EXISTS VAMSI.TABLE_B (COL1 VARCHAR(36) , COL2
VARCHAR(36) , COL3 VARCHAR(36) CONSTRAINT TABLE_B_PK PRIMARY KEY (COL1))
COMPRESSION='SNAPPY', SALT_BUCKETS=5;
CREATE LOCAL INDEX IF NOT EXISTS IDX_TABLE_A_COL2 ON VAMSI.TABLE_A (COL2);
DROP INDEX IF EXISTS IDX_TABLE_A_COL2 ON VAMSI.TABLE_A;

hbase(main):012:0> list '_LOCAL.*'
TABLE
_LOCAL_IDX_VAMSI.TABLE_A

Thanks,
Vamsi Attluri
-- 
Vamsi Attluri


Re: Phoenix Upsert with SELECT behaving strange

2016-05-18 Thread Radha krishna
Hi,

Thanks for your reply. I will check with the latest Phoenix version and
update the result.
One more observation: with Phoenix version 4.4.0, if I specify a
limit (the big table's record count) in the UPSERT with SELECT query,
it works fine.

For example:

Sort Merge Join

UPSERT INTO Target_Table SELECT /*+ USE_SORT_MERGE_JOIN*/
big.col1,big.col2...(102 columns) FROM BIG_TABLE as big JOIN SMALL_TABLE as
small ON big.col1=small.col1 where big.col2=small.col2 limit
;

Hash Join

UPSERT INTO Target_Table SELECT big.col1,big.col2...(102 columns) FROM
BIG_TABLE as big JOIN SMALL_TABLE as small ON big.col1=small.col1 where
big.col2=small.col2 limit ;

I tested with the above statement and it works fine up to a big-table
record count of 100 million rows and a small-table record count of 15 million rows.
If the small table's record count is increased to around 20 million, I get
the below error:


16/05/17 18:02:34 WARN client.HTable: Error calling coprocessor service
org.apache.phoenix.coprocessor.generated.ServerCachingProtos$ServerCachingService
for row \x0D\x00\x00
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java
heap space
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at
org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1763)
at
org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1719)
at
org.apache.phoenix.cache.ServerCacheClient$1.call(ServerCacheClient.java:188)
at
org.apache.phoenix.cache.ServerCacheClient$1.call(ServerCacheClient.java:182)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:172)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space


Thanks & Regards
Radha Krishna





On Wed, May 18, 2016 at 12:04 AM, Maryann Xue <maryann@gmail.com> wrote:

> Hi Radha,
>
> Thanks for reporting this issue! Would you mind trying it with latest
> Phoenix version?
>
> Thanks,
> Maryann
>
> On Tue, May 17, 2016 at 8:19 AM, Radha krishna <grkmc...@gmail.com> wrote:
>
>> Hi, I am performing a join operation in the Phoenix console and storing the
>> result into another table, but the same query sometimes shows the below error
>> messages and sometimes inserts the result into the table.
>>
>> Error Messages:
>>
>> 1)
>>
>> Error: ERROR 201 (22000): Illegal data. ERROR 201 (22000): Illegal data.
>> Expected length of at least 96 bytes, but had 15 (state=22000,code=201)
>> java.sql.SQLException: ERROR 201 (22000): Illegal data. ERROR 201
>> (22000): Illegal data. Expected length of at least 96 bytes, but had 15
>> at
>> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:395)
>> at
>> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
>> at
>> org.apache.phoenix.util.ServerUtil.parseRemoteException(ServerUtil.java:131)
>> at
>> org.apache.phoenix.util.ServerUtil.parseServerExceptionOrNull(ServerUtil.java:115)
>> at
>> org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:104)
>> at
>> org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:538)
>> at
>> org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:510)
>> at
>> org.apache.phoenix.iterate.RoundRobinResultIterator.getIterators(RoundRobinResultIterator.java:176)
>> at
>> org.apache.phoenix.iterate.RoundRobinResultIterator.next(RoundRobinResultIterator.java:91)
>> at
>> org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
>> at
>> org.apache.phoenix.compile.UpsertCompiler$2.execute(UpsertCompiler.java:737)
>> at
>> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:305)
>> at
>> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:297)
>> at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>> at
>> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:295)
>> at
>> org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1255)
>> at sqlline.Commands.execute(Commands.java:822)
>> at sqlline.Comma

Phoenix Upsert with SELECT behaving strange

2016-05-17 Thread Radha krishna
145)
at
org.apache.phoenix.util.ServerUtil.parseRemoteException(ServerUtil.java:131)
at
org.apache.phoenix.util.ServerUtil.parseServerExceptionOrNull(ServerUtil.java:115)
at
org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:104)
at
org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:538)
at
org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:510)
at
org.apache.phoenix.iterate.RoundRobinResultIterator.getIterators(RoundRobinResultIterator.java:176)
at
org.apache.phoenix.iterate.RoundRobinResultIterator.next(RoundRobinResultIterator.java:91)
at
org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
at
org.apache.phoenix.compile.UpsertCompiler$2.execute(UpsertCompiler.java:737)
at
org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:305)
at
org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:297)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at
org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:295)
at
org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1255)
at sqlline.Commands.execute(Commands.java:822)
at sqlline.Commands.sql(Commands.java:732)
at sqlline.SqlLine.dispatch(SqlLine.java:808)
at sqlline.SqlLine.begin(SqlLine.java:681)
at sqlline.SqlLine.start(SqlLine.java:398)
at sqlline.SqlLine.main(SqlLine.java:292)


Note: I am performing an inner join; the joined tables and the table storing the
result have the same structure (same column families, same
compression, same salt buckets).
Why is it behaving inconsistently? Can anyone help with this issue?

Environment:

- Hadoop Distribution: Hortonworks
- Spark Version: 1.6
- HBase Version: 1.1.2
- Phoenix Version: 4.4.0

Join command


Sort Merge Join

UPSERT INTO Target_Table SELECT /*+ USE_SORT_MERGE_JOIN*/
big.col1,big.col2...(102 columns) FROM BIG_TABLE as big JOIN SMALL_TABLE as
small ON big.col1=small.col1 where big.col2=small.col2;

Hash Join

UPSERT INTO Target_Table SELECT big.col1,big.col2...(102 columns) FROM
BIG_TABLE as big JOIN SMALL_TABLE as small ON big.col1=small.col1 where
big.col2=small.col2;



Thanks & Regards
   Radha krishna


Re: PHOENIX SPARK - DataFrame for BulkLoad

2016-05-17 Thread Radha krishna
Hi,

I have the same scenario. Can you share your metrics, such as the column count for
each row, the number of SALT_BUCKETS, the compression technique you used, and
how much time it takes to load the complete data?

My scenario: I have to load 1.9 billion records (approximately 20 files,
each containing 100 million rows with 102 columns per row);
currently it takes 35 to 45 minutes to load one file.



On Tue, May 17, 2016 at 3:51 PM, Mohanraj Ragupathiraj <
mohanaug...@gmail.com> wrote:

> I have 100 million records to be inserted into an HBase table (Phoenix) as a
> result of a Spark job. I would like to know: if I convert it to a DataFrame
> and save it, will it do a bulk load, or is that not an efficient way to write
> data to a Phoenix HBase table?
>
> --
> Thanks and Regards
> Mohan
>



-- 
Thanks & Regards
   Radha krishna


GenericMutableRow cannot be cast to org.apache.spark.sql.Row

2016-05-12 Thread Radha krishna
), p(10), p(11), p(12), p(13), p(14), p(15),
p(16), p(17), p(18), p(19), p(20), p(21), p(22), p(23), p(24), p(25),
p(26), p(27), p(28), p(29), p(30), p(31), p(32), p(33), p(34), p(35),
p(36), p(37), p(38), p(39), p(40), p(41), p(42), p(43), p(44), p(45),
p(46), p(47), p(48), p(49), p(50), p(51), p(52), p(53), p(54), p(55),
p(56), p(57), p(58), p(59), p(60), p(61), p(62), p(63), p(64), p(65),
p(66), p(67), p(68), p(69), p(70), p(71), p(72), p(73), p(74), p(75),
p(76), p(77), p(78), p(79), p(80), p(81), p(82), p(83), p(84), p(85),
p(86), p(87), p(88), p(89), p(90), p(91), p(92), p(93), p(94), p(95),
p(96), p(97), p(98), p(99), p(100), p(101), p(102)))

  // Apply the schema to the RDD.
  val input_incr_rdd_df = sqlContext.createDataFrame(input_incr_rdd,
schema)
  input_incr_rdd_df.registerTempTable("INCR_TABLE")

  val hist_hist_df =
sqlContext.read.format("org.apache.phoenix.spark").options(Map("table" ->
"Phoenix_Table_Name", "zkUrl" -> "g4t7565.houston.hp.com:2181
:/hbase-unsecure")).load()
  hist_hist_df.registerTempTable("HIST_TABLE")


  val matched_rc = input_incr_rdd_df.join(hist_hist_df,
input_incr_rdd_df("Col1") <=> hist_hist_df("col1")
   && input_incr_rdd_df("col2") <=> hist_hist_df("col2"))

  matched_rc.show()




Thanks & Regards
   Radha krishna


Fwd: How to perform Read and Write operations( for TB's of data) on Phoenix tables using spark

2016-05-10 Thread Radha krishna
Hi All,

In one of my projects we are considering using HBase as the back end.

My use case: I have 1 TB of data which will come as multiple files (one
file is around 40 GB with 100 million rows and 102 columns for each row).
I am trying to load these files using Spark + Phoenix and it is taking around 2
hours.

Can you please suggest how to fine-tune the load process, and how to load
the data back using Spark?

Environment details
==
Hadoop Distribution : Hortonworks
Spark Version : 1.6
Hbase Version: 1.1.2
Phoenix Version: 4.4.0
Number of nodes: 19

Please find the attachment for the create and load scripts.


Thanks & Regards
   Radha krishna
Phoenix create table with one column family and 19 salt buckets 
===

CREATE TABLE IF NOT EXISTS MY_Table_Name(
"BASE_PROD_ID" VARCHAR,
"SRL_NR_ID" VARCHAR,
"CLF_1"."PROD_ID" VARCHAR,
. ( 102 columns )
CONSTRAINT my_pk PRIMARY KEY (BASE_PROD_ID, SRL_NR_ID))SALT_BUCKETS=19, 
COMPRESSION='GZ';


Spark Code
==

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object InsertRecords {

  def main(args: Array[String]): Unit = {

try {

  val sparkConf = new SparkConf().setAppName("Phoenix_HbaseTest")
  val sparkContext = new SparkContext(sparkConf)
  val sqlContext = new SQLContext(sparkContext)
   
  val schemaString = "BASE_PROD_ID SRL_NR_ID PROD_ID ..."

  // Generate the schema based on the string of schemaString
  val schema = StructType(schemaString.split(" ").map(fieldName => 
StructField(fieldName, StringType, true)))
  
  // Convert records of the RDD (people) to Rows.
  val input_rdd = 
sparkContext.textFile(args(0)).map(_.split("\u001c")).map(p => 
Row(p(0),p(1).trim().toUpperCase(),p(2).trim().toUpperCase(),p(3),p(4),p(5),p(6),p(7),p(8),p(9),p(10),p(11),p(12),p(13),p(14),p(15),p(16),p(17),p(18),p(19),p(20),p(21),p(22),p(23),p(24),p(25),p(26),p(27),p(28),p(29),p(30),p(31),p(32),p(33),p(34),p(35),p(36),p(37),p(38),p(39),p(40),p(41),p(42),p(43),p(44),p(45),p(46),p(47),p(48),p(49),p(50),p(51),p(52),p(53),p(54),p(55),p(56),p(57),p(58),p(59),p(60),p(61),p(62),p(63),p(64),p(65),p(66),p(67),p(68),p(69),p(70),p(71),p(72),p(73),p(74),p(75),p(76),p(77),p(78),p(79),p(80),p(81),p(82),p(83),p(84),p(85),p(86),p(87),p(88),p(89),p(90),p(91),p(92),p(93),p(94),p(95),p(96),p(97),p(98),p(99),p(100),p(101),p(102)))
  
  // Apply the schema to the RDD.
  val inputDF = sqlContext.createDataFrame(input_rdd, schema)   
  
  
inputDF.write.format("org.apache.phoenix.spark").mode(args(1)).options(Map("table"
 -> args(2), "zkUrl" -> args(3),"-batchSize" -> args(4))).save()

  sparkContext.stop()
} catch {
  case t: Throwable => t.printStackTrace()
  case e: Exception => e.printStackTrace()
}

  }
}

Re: Does phoenix CsvBulkLoadTool write to WAL/Memstore

2016-03-19 Thread Vamsi Krishna
Thanks Gabriel & Ravi.

I have a data processing job written in Spark (Scala).
I do a join on data from 2 data files (CSV files) and do data
transformation on the resulting data. Finally, I load the transformed data
into a Phoenix table using the Phoenix-Spark plugin.
Seeing that the Phoenix-Spark plugin goes through the regular HBase write path
(writes to the WAL), I'm thinking of option 2 to reduce the job execution time.

*Option 2:* Do data transformation in Spark and write the transformed data
to a CSV file and use Phoenix CsvBulkLoadTool to load data into Phoenix
table.
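A rough sketch of the write step for that option, using the Spark 1.6 Java API (the DataFrame variable, the output path, and the assumption that column values contain no commas or newlines are all mine):

import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;

public class WriteCsvForBulkLoad {
    // Writes each row of the already-transformed DataFrame as one comma-separated line,
    // so the output directory can afterwards be fed to CsvBulkLoadTool with -i.
    public static void writeCsv(DataFrame transformed, String outputDir) {
        transformed.javaRDD()
                .map(new Function<Row, String>() {
                    @Override
                    public String call(Row row) {
                        return row.mkString(","); // assumes no commas/newlines inside values
                    }
                })
                .saveAsTextFile(outputDir);
    }
}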

Has anyone tried this kind of exercise? Any thoughts.

Thanks,
Vamsi Attluri

On Tue, Mar 15, 2016 at 9:40 PM Ravi Kiran <maghamraviki...@gmail.com>
wrote:

> Hi Vamsi,
>The upserts through Phoenix-spark plugin definitely go through WAL .
>
>
> On Tue, Mar 15, 2016 at 5:56 AM, Gabriel Reid <gabriel.r...@gmail.com>
> wrote:
>
>> Hi Vamsi,
>>
>> I can't answer your question about the Phoenix-Spark plugin (although
>> I'm sure that someone else here can).
>>
>> However, I can tell you that the CsvBulkLoadTool does not write to the
>> WAL or to the Memstore. It simply writes HFiles and then hands those
>> HFiles over to HBase, so the memstore and WAL are never
>> touched/affected by this.
>>
>> - Gabriel
>>
>>
>> On Tue, Mar 15, 2016 at 1:41 PM, Vamsi Krishna <vamsi.attl...@gmail.com>
>> wrote:
>> > Team,
>> >
>> > Does phoenix CsvBulkLoadTool write to HBase WAL/Memstore?
>> >
>> > Phoenix-Spark plugin:
>> > Does saveToPhoenix method on RDD[Tuple] write to HBase WAL/Memstore?
>> >
>> > Thanks,
>> > Vamsi Attluri
>> > --
>> > Vamsi Attluri
>>
>
> --
Vamsi Attluri


how to tune phoenix CsvBulkLoadTool job

2016-03-19 Thread Vamsi Krishna
Hi,

I'm using CsvBulkLoadTool to load a csv data file into Phoenix/HBase table.

HDP Version : 2.3.2 (Phoenix Version : 4.4.0, HBase Version: 1.1.2)
CSV file size: 97.6 GB
No. of records: 1,439,000,238
Cluster: 13 node
Phoenix table salt-buckets: 13
Phoenix table compression: snappy
HBase table size after loading: 26.6 GB

The job completed in *1hrs, 39mins, 43sec*.
Average Map Time 5mins, 25sec
Average Shuffle Time *47mins, 46sec*
Average Merge Time 12mins, 22sec
Average Reduce Time *32mins, 9sec*

I'm looking for an opportunity to tune this job.
Could someone please help me with some pointers on how to tune this job?
Please let me know if you need to know any cluster configuration parameters
that I'm using.

*This is only a performance test. My PRODUCTION data file is 7x bigger.*

Thanks,
Vamsi Attluri

-- 
Vamsi Attluri


Re: Does phoenix CsvBulkLoadTool write to WAL/Memstore

2016-03-18 Thread Vamsi Krishna
Thanks Pari.

The frequency of the job is weekly.
No. of rows is around 10 billion.
The cluster has 13 nodes.
From what you have mentioned, I see that CsvBulkLoadTool is the best option for
my scenario.

I see you mentioned increasing the batch size to accommodate
more rows.
Are you talking about the 'phoenix.mutate.batchSize' configuration
parameter?

Vamsi Attluri

On Wed, Mar 16, 2016 at 9:01 AM Pariksheet Barapatre <pbarapa...@gmail.com>
wrote:

> Hi Vamsi,
>
> How many rows are you expecting out of your transformation, and what
> is the frequency of the job?
>
> If there is a small number of rows (< ~100K; this depends on cluster size
> as well), you can go ahead with the phoenix-spark plug-in and increase the batch
> size to accommodate more rows; otherwise use CsvBulkLoadTool.
>
> Thanks
> Pari
>
> On 16 March 2016 at 20:03, Vamsi Krishna <vamsi.attl...@gmail.com> wrote:
>
>> Thanks Gabriel & Ravi.
>>
>> I have a data processing job written in Spark (Scala).
>> I do a join on data from 2 data files (CSV files) and do data
>> transformation on the resulting data. Finally, I load the transformed data
>> into a Phoenix table using the Phoenix-Spark plugin.
>> Seeing that the Phoenix-Spark plugin goes through the regular HBase write path
>> (writes to the WAL), I'm thinking of option 2 to reduce the job execution time.
>>
>> *Option 2:* Do data transformation in Spark and write the transformed
>> data to a CSV file and use Phoenix CsvBulkLoadTool to load data into
>> Phoenix table.
>>
>> Has anyone tried this kind of exercise? Any thoughts.
>>
>> Thanks,
>> Vamsi Attluri
>>
>> On Tue, Mar 15, 2016 at 9:40 PM Ravi Kiran <maghamraviki...@gmail.com>
>> wrote:
>>
>>> Hi Vamsi,
>>>The upserts through Phoenix-spark plugin definitely go through WAL .
>>>
>>>
>>> On Tue, Mar 15, 2016 at 5:56 AM, Gabriel Reid <gabriel.r...@gmail.com>
>>> wrote:
>>>
>>>> Hi Vamsi,
>>>>
>>>> I can't answer your question about the Phoenix-Spark plugin (although
>>>> I'm sure that someone else here can).
>>>>
>>>> However, I can tell you that the CsvBulkLoadTool does not write to the
>>>> WAL or to the Memstore. It simply writes HFiles and then hands those
>>>> HFiles over to HBase, so the memstore and WAL are never
>>>> touched/affected by this.
>>>>
>>>> - Gabriel
>>>>
>>>>
>>>> On Tue, Mar 15, 2016 at 1:41 PM, Vamsi Krishna <vamsi.attl...@gmail.com>
>>>> wrote:
>>>> > Team,
>>>> >
>>>> > Does phoenix CsvBulkLoadTool write to HBase WAL/Memstore?
>>>> >
>>>> > Phoenix-Spark plugin:
>>>> > Does saveToPhoenix method on RDD[Tuple] write to HBase WAL/Memstore?
>>>> >
>>>> > Thanks,
>>>> > Vamsi Attluri
>>>> > --
>>>> > Vamsi Attluri
>>>>
>>>
>>> --
>> Vamsi Attluri
>>
>
>
>
> --
> Cheers,
> Pari
>
-- 
Vamsi Attluri


Does phoenix CsvBulkLoadTool write to WAL/Memstore

2016-03-15 Thread Vamsi Krishna
Team,

Does phoenix CsvBulkLoadTool write to HBase WAL/Memstore?

Phoenix-Spark plugin:
Does saveToPhoenix method on RDD[Tuple] write to HBase WAL/Memstore?

Thanks,
Vamsi Attluri
-- 
Vamsi Attluri


Save dataframe to Phoenix

2016-02-16 Thread Krishna
According to the Phoenix-Spark plugin docs, only SaveMode.Overwrite is supported
for saving DataFrames to a Phoenix table.

Are there any plans to support other save modes (append, ignore) anytime
soon? Only having the overwrite option makes it useful for a small number of
use-cases.
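For reference, a sketch of the overwrite path from the Java side (Spark 1.x API; the table name and zkUrl are placeholders):

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;

public class SaveDataFrameToPhoenix {
    // Saves the DataFrame into an existing Phoenix table; per the plugin docs,
    // SaveMode.Overwrite is the only mode the data source currently accepts.
    public static void save(DataFrame df) {
        df.write()
          .format("org.apache.phoenix.spark")
          .mode(SaveMode.Overwrite)
          .option("table", "TABLE1")
          .option("zkUrl", "localhost:2181")
          .save();
    }
}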


Re: Announcing phoenix-for-cloudera 4.6.0

2016-01-17 Thread Krishna
What Phoenix version is in the parcels for CDH5.5.1? Is there a way to
extract jars from those parcels?

On Sun, Jan 17, 2016 at 5:52 AM, Alex Ott <alex...@gmail.com> wrote:

> The parcels provided by Cloudera were updated to run on CDH 5.5.
> I've installed it and haven't run very complex tasks, but basic tasks
> work fine.
>
> Krishna  at "Fri, 15 Jan 2016 18:20:47 -0800" wrote:
>  K> Thanks Andrew. Are binaries available for CDH5.5.x?
>
>  K> On Tue, Nov 3, 2015 at 9:10 AM, Andrew Purtell <apurt...@apache.org>
> wrote:
>
>  K> Today I pushed a new branch '4.6-HBase-1.0-cdh5' and the tag
> 'v4.6.0-cdh5.4.5' (58fcfa6) to https://github.com/chiastic-security/
>  K> phoenix-for-cloudera. This is the Phoenix 4.6.0 release, modified
> to build against CDH 5.4.5 and possibly (but not tested)
>  K> subsequent CDH releases.
>  K>
>  K> If you want release tarballs I built from this, get them here:
>
>  K> Binaries
>  K>
>  K>
> http://apurtell.s3.amazonaws.com/phoenix/phoenix-4.6.0-cdh5.4.5-bin.tar.gz
>  K>
>  K>
> http://apurtell.s3.amazonaws.com/phoenix/phoenix-4.6.0-cdh5.4.5-bin.tar.gz.asc
>  (signature)
>  K>
>  K>
> http://apurtell.s3.amazonaws.com/phoenix/phoenix-4.6.0-cdh5.4.5-bin.tar.gz.md5
> (MD5 sum)
>  K>
>  K>
> http://apurtell.s3.amazonaws.com/phoenix/phoenix-4.6.0-cdh5.4.5-bin.tar.gz.sha
>  (SHA-1 sum)
>
>  K> Source
>  K>
>  K>
> http://apurtell.s3.amazonaws.com/phoenix/phoenix-4.6.0-cdh5.4.5-src.tar.gz
>
>  K>
>  K>
> http://apurtell.s3.amazonaws.com/phoenix/phoenix-4.6.0-cdh5.4.5-src.tar.gz.asc
>  (signature)
>  K>
>  K>
> http://apurtell.s3.amazonaws.com/phoenix/phoenix-4.6.0-cdh5.4.5-src.tar.gz.md5
> (MD5 sum)
>  K>
>  K>
> http://apurtell.s3.amazonaws.com/phoenix/phoenix-4.6.0-cdh5.4.5-src.tar.gz.sha
>  (SHA1-sum)
>
>  K> Signed with my code signing key D5365CCD.
>  K>
>  K> ​The source and these binaries incorporate changes from the
> Cloudera Labs fork of Phoenix (https://github.com/cloudera-labs/
>  K> phoenix), licensed under the ASL v2. Neither the source nor the binary
> artifacts are in any way "official" or supported by the Apache
>  K> Phoenix project. The source and artifacts are provided by me in a
> personal capacity for the convenience of would-be Phoenix users
>  K> that also use CDH. Please don't contact the Apache Phoenix project
> for any issues regarding this source and these binaries.
>  K>
>  K> --
>  K> Best regards,
>  K>
>  K>- Andy
>  K>
>  K> Problems worthy of attack prove their worth by hitting back. -
> Piet Hein (via Tom White)
>
>
>
> --
> With best wishes, Alex Ott
> http://alexott.blogspot.com/
> http://alexott.net/
> http://alexott-ru.blogspot.com/
> Skype: alex.ott
>


Re: Announcing phoenix-for-cloudera 4.6.0

2016-01-15 Thread Krishna
On the branch 4.5-HBase-1.0-cdh5, I set the CDH version to 5.5.1 in the pom, and
building the package produces the following errors.
Repo: https://github.com/chiastic-security/phoenix-for-cloudera

[ERROR]
~/phoenix_related/phoenix-for-cloudera/phoenix-core/src/main/java/org/apache/phoenix/trace/util/Tracing.java:[176,82]
cannot find symbol
[ERROR] symbol:   method getParentId()
[ERROR] location: variable span of type org.apache.htrace.Span
[ERROR]
~/phoenix_related/phoenix-for-cloudera/phoenix-core/src/main/java/org/apache/phoenix/trace/TraceReader.java:[129,31]
cannot find symbol
[ERROR] symbol:   variable ROOT_SPAN_ID
[ERROR] location: interface org.apache.htrace.Span
[ERROR]
~/phoenix_related/phoenix-for-cloudera/phoenix-core/src/main/java/org/apache/phoenix/trace/TraceReader.java:[159,38]
cannot find symbol
[ERROR] symbol:   variable ROOT_SPAN_ID
[ERROR] location: interface org.apache.htrace.Span
[ERROR]
~/phoenix_related/phoenix-for-cloudera/phoenix-core/src/main/java/org/apache/phoenix/trace/TraceReader.java:[162,31]
cannot find symbol
[ERROR] symbol:   variable ROOT_SPAN_ID
[ERROR] location: interface org.apache.htrace.Span
[ERROR]
~/phoenix_related/phoenix-for-cloudera/phoenix-core/src/main/java/org/apache/phoenix/trace/TraceReader.java:[337,38]
cannot find symbol
[ERROR] symbol:   variable ROOT_SPAN_ID
[ERROR] location: interface org.apache.htrace.Span
[ERROR]
~/phoenix_related/phoenix-for-cloudera/phoenix-core/src/main/java/org/apache/phoenix/trace/TraceReader.java:[339,42]
cannot find symbol
[ERROR] symbol:   variable ROOT_SPAN_ID
[ERROR] location: interface org.apache.htrace.Span
[ERROR]
~/phoenix_related/phoenix-for-cloudera/phoenix-core/src/main/java/org/apache/phoenix/trace/TraceReader.java:[359,58]
cannot find symbol
[ERROR] symbol:   variable ROOT_SPAN_ID
[ERROR] location: interface org.apache.htrace.Span
[ERROR]
~/phoenix_related/phoenix-for-cloudera/phoenix-core/src/main/java/org/apache/phoenix/trace/TraceMetricSource.java:[99,74]
cannot find symbol
[ERROR] symbol:   method getParentId()
[ERROR] location: variable span of type org.apache.htrace.Span
[ERROR]
~/phoenix_related/phoenix-for-cloudera/phoenix-core/src/main/java/org/apache/phoenix/trace/TraceMetricSource.java:[110,60]
incompatible types
[ERROR] required: java.util.Map<byte[],byte[]>
[ERROR] found:java.util.Map<java.lang.String,java.lang.String>
[ERROR]
~/phoenix_related/phoenix-for-cloudera/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java:[550,57]
 is not
abstract and does not override abstract method
nextRaw(java.util.List,org.apache.hadoop.hbase.regionserver.ScannerContext)
in org.apache.hadoop.hbase.regionserver.RegionScanner


On Fri, Jan 15, 2016 at 6:20 PM, Krishna <research...@gmail.com> wrote:

> Thanks Andrew. Are binaries available for CDH5.5.x?
>
> On Tue, Nov 3, 2015 at 9:10 AM, Andrew Purtell <apurt...@apache.org>
> wrote:
>
>> Today I pushed a new branch '4.6-HBase-1.0-cdh5' and the tag
>> 'v4.6.0-cdh5.4.5' (58fcfa6) to
>> https://github.com/chiastic-security/phoenix-for-cloudera. This is the
>> Phoenix 4.6.0 release, modified to build against CDH 5.4.5 and possibly
>> (but not tested) subsequent CDH releases.
>>
>> If you want release tarballs I built from this, get them here:
>>
>> Binaries
>>
>> http://apurtell.s3.amazonaws.com/phoenix/phoenix-4.6.0-cdh5.4.5-bin.tar.gz
>>
>> http://apurtell.s3.amazonaws.com/phoenix/phoenix-4.6.0-cdh5.4.5-bin.tar.gz.asc
>>  (signature)
>>
>> http://apurtell.s3.amazonaws.com/phoenix/phoenix-4.6.0-cdh5.4.5-bin.tar.gz.md5
>> (MD5 sum)
>>
>> http://apurtell.s3.amazonaws.com/phoenix/phoenix-4.6.0-cdh5.4.5-bin.tar.gz.sha
>>  (SHA-1 sum)
>>
>>
>> Source
>>
>> http://apurtell.s3.amazonaws.com/phoenix/phoenix-4.6.0-cdh5.4.5-src.tar.gz
>>
>>
>>
>> http://apurtell.s3.amazonaws.com/phoenix/phoenix-4.6.0-cdh5.4.5-src.tar.gz.asc
>>  (signature)
>>
>>
>> http://apurtell.s3.amazonaws.com/phoenix/phoenix-4.6.0-cdh5.4.5-src.tar.gz.md5
>> (MD5 sum)
>>
>>
>> http://apurtell.s3.amazonaws.com/phoenix/phoenix-4.6.0-cdh5.4.5-src.tar.gz.sha
>>  (SHA1-sum)
>>
>>
>> Signed with my code signing key D5365CCD.
>>
>> ​The source and these binaries incorporate changes from the Cloudera Labs
>> fork of Phoenix (https://github.com/cloudera-labs/phoenix), licensed
>> under the ASL v2, Neither the source or binary artifacts are in any way
>> "official" or supported by the Apache Phoenix project. The source and
>> artifacts are provided by me in a personal capacity for the convenience of
>> would-be Phoenix users that also use CDH. Please don't contact the Apache
>> Phoenix project for any issues regarding this source and these binaries.
>>
>> --
>> Best regards,
>>
>>- Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>
>
>


Re: Backup and Recovery for disaster recovery

2015-12-23 Thread Krishna
Did you check my response to your previous mail?

On Wednesday, December 23, 2015, Krish Rajan <yume.kris...@gmail.com> wrote:

> Hi,
>
> We’re using HBase under phoenix. Need to setup DR site and ongoing
> replication.
> Phoenix tables are salted tables. In this scenario what is the best method
> to copy data to remote cluster?
> People give different opinions.  Replication will not work for us as we’re
> using bulk loading.
>
> Can you advise what are our options to copy data to remote cluster and
> keeping it up to date.
> Thanks for your inputs.
>
> -Regards
> Krishna
>
>


spark plugin with java

2015-12-01 Thread Krishna
Hi,

Is there a working example of using the Spark plugin in Java? Specifically,
what's the Java equivalent of creating a DataFrame as shown here in Scala:

val df = sqlContext.phoenixTableAsDataFrame("TABLE1", Array("ID",
"COL1"), conf = configuration)

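One way to get an equivalent DataFrame from Java is to go through the generic data source reader rather than the Scala-only implicit (a sketch; assumes the Spark 1.x Java API, and TABLE1/zkUrl are placeholders):

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class PhoenixSparkJavaExample {
    // Loads the Phoenix table as a DataFrame, then prunes to the two columns
    // selected in the Scala snippet above.
    public static DataFrame load(SQLContext sqlContext) {
        DataFrame df = sqlContext.read()
                .format("org.apache.phoenix.spark")
                .option("table", "TABLE1")
                .option("zkUrl", "localhost:2181") // adjust to your ZooKeeper quorum
                .load();
        return df.select("ID", "COL1");
    }
}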

Any known issues with Phoenix Schema feature

2015-09-21 Thread Vamsi Krishna
Hi,

We tried to use the HBase namespace feature with Phoenix and we see there is an
issue with creating LOCAL indexes when we use an HBase namespace.

We are planning on using the Phoenix schema feature in our application.
If someone has already tried it and has seen any issues with the 'schema' feature,
could you please let us know?

Cluster info: HDP 2.3

Thanks,
Vamsi Attluri

-- 
Vamsi Attluri


Re: setting up community repo of Phoenix for CDH5?

2015-09-12 Thread Krishna
As explained here, there are some code changes too in addition to pom
related changes.

http://stackoverflow.com/a/31934434/165130



On Friday, September 11, 2015, Andrew Purtell <andrew.purt...@gmail.com>
wrote:

> Or once parameterized, add a default off profile that redefines them all
> in one shot after the builder activates the profile on the maven command
> line with -P ...
>
>
>
> On Sep 11, 2015, at 7:05 AM, Andrew Purtell <andrew.purt...@gmail.com
> <javascript:_e(%7B%7D,'cvml','andrew.purt...@gmail.com');>> wrote:
>
> The group IDs and versions can be parameterized in the POM so they can be
> overridden on the maven command line with -D. That would be easy and
> something I think we could get committed without any controversy.
>
>
> On Sep 11, 2015, at 6:53 AM, James Heather <james.heat...@mendeley.com
> <javascript:_e(%7B%7D,'cvml','james.heat...@mendeley.com');>> wrote:
>
> Yes, my plan is to create a fork of the main repo, so that we can still
> merge new Phoenix code into the CDH-compatible version.
>
> Before that, I do wonder whether it's possible to suggest a few changes to
> the main repo that would allow for compiling a CDH-compatible version,
> without needing to maintain a separate repo. The bulk of the changes are to
> dependencies in the pom, which suggests that it could be done to accept a
> switch to mvn build.
>
> James
>
> On 11/09/15 14:50, Andrew Purtell wrote:
>
> The first step I think is a repo with code that compiles. Please
> initialize it by forking github.com/apache/phoenix so we have common
> ancestors. Once we have a clear idea (by diff) what is required we can
> figure out if we can support compatibility in some way.
>
>
> On Sep 9, 2015, at 11:00 PM, Krishna <
> <javascript:_e(%7B%7D,'cvml','research...@gmail.com');>
> research...@gmail.com
> <javascript:_e(%7B%7D,'cvml','research...@gmail.com');>> wrote:
>
> I can volunteer to spend some time on this.
>
> CDH artifacts are available in Maven repo but from reading other threads
> on CDH-Phoenix compatibility, it looks like there are some code changes to
> be made in Phoenix to successfully compile against CDH.
>
> Here are questions to address:
> 1) How to maintain CDH compatible Phoenix code base?
> 2) Is having a CDH compatible branch even an option?
>
> Krishna
>
>
>
> On Friday, August 28, 2015, Andrew Purtell <andrew.purt...@gmail.com
> <javascript:_e(%7B%7D,'cvml','andrew.purt...@gmail.com');>> wrote:
>
>> Yes I am interested. Assuming CDH artifacts are publicly available in a
>> Maven repo somewhere, which I believe is the case, perhaps we (the Phoenix
>> project/community) could set up a Jenkins job that builds against them and
>> makes the resulting build artifacts available. They would never be an
>> official release, just a best effort convenience. Would that work? I think
>> little must be done besides compile against the CDH artifacts for binary
>> compatibility.
>>
>>
>> > On Aug 28, 2015, at 11:19 AM, James Heather <james.heat...@mendeley.com>
>> wrote:
>> >
>> > Is anyone interested in helping with getting an up-to-date
>> CDH5-compatible build of Phoenix up and running?
>> >
>> > Cloudera has a build of Phoenix 4.3 (
>> <https://github.com/cloudera-labs/phoenix>
>> https://github.com/cloudera-labs/phoenix), but this is now two versions
>> behind, and there seems little desire at Cloudera to keep it updated.
>> >
>> > I imagine that by looking at the differences between vanilla 4.3 and
>> cloudera labs 4.3, and with some guidance from this list, we could get a
>> good idea of what would need to be modified in 4.5+ and keep a
>> CDH5-compatible build up to date.
>> >
>> > Yes?
>> >
>> > James
>>
>
>


Re: Using Phoenix Bulk Upload CSV to upload 200GB data

2015-09-11 Thread Krishna
1400 mappers on 9 nodes is about 155 mappers per datanode which sounds high
to me. There are very few specifics in your mail. Are you using YARN? Can
you provide details like table structure, # of rows & columns, etc. Do you
have an error stack?


On Friday, September 11, 2015, Gaurav Kanade 
wrote:

> Hi All
>
> I am new to Apache Phoenix (and relatively new to MR in general) but I am
> trying a bulk insert of a 200GB tab-separated file into an HBase table. This
> seems to start off fine and kicks off about ~1400 mappers and 9 reducers (I
> have 9 data nodes in my setup).
>
> At some point I seem to be running into problems with this process as it
> seems the data nodes run out of capacity (from what I can see my data nodes
> have 400GB local space). It does seem that certain reducers eat up most of
> the capacity on these - thus slowing down the process to a crawl and
> ultimately leading to Node Managers complaining that Node Health is bad
> (log-dirs and local-dirs are bad)
>
> Is there some inherent setting I am missing that I need to set up for the
> particular job ?
>
> Any pointers would be appreciated
>
> Thanks
>
> --
> Gaurav Kanade,
> Software Engineer
> Big Data
> Cloud and Enterprise Division
> Microsoft
>


Re: setting up community repo of Phoenix for CDH5?

2015-09-10 Thread Krishna
Let me know when you have set up the repo; I am aware of the code changes to
make for CDH compatibility. Stack Overflow also has details.

On Wed, Sep 9, 2015 at 11:08 PM, James Heather <james.heat...@mendeley.com>
wrote:

> Thanks! I'll set up a repo today, and we can see how far we get with it.
>
> Another recent thread points to a stack overflow answer with some clues.
> On 10 Sep 2015 7:00 am, "Krishna" <research...@gmail.com> wrote:
>
>> I can volunteer to spend some time on this.
>>
>> CDH artifacts are available in Maven repo but from reading other threads
>> on CDH-Phoenix compatibility, it looks like there are some code changes to
>> be made in Phoenix to successfully compile against CDH.
>>
>> Here are questions to address:
>> 1) How to maintain CDH compatible Phoenix code base?
>> 2) Is having a CDH compatible branch even an option?
>>
>> Krishna
>>
>>
>>
>> On Friday, August 28, 2015, Andrew Purtell <andrew.purt...@gmail.com>
>> wrote:
>>
>>> Yes I am interested. Assuming CDH artifacts are publicly available in a
>>> Maven repo somewhere, which I believe is the case, perhaps we (the Phoenix
>>> project/community) could set up a Jenkins job that builds against them and
>>> makes the resulting build artifacts available. They would never be an
>>> official release, just a best effort convenience. Would that work? I think
>>> little must be done besides compile against the CDH artifacts for binary
>>> compatibility.
>>>
>>>
>>> > On Aug 28, 2015, at 11:19 AM, James Heather <
>>> james.heat...@mendeley.com> wrote:
>>> >
>>> > Is anyone interested in helping with getting an up-to-date
>>> CDH5-compatible build of Phoenix up and running?
>>> >
>>> > Cloudera has a build of Phoenix 4.3 (
>>> https://github.com/cloudera-labs/phoenix), but this is now two versions
>>> behind, and there seems little desire at Cloudera to keep it updated.
>>> >
>>> > I imagine that by looking at the differences between vanilla 4.3 and
>>> cloudera labs 4.3, and with some guidance from this list, we could get a
>>> good idea of what would need to be modified in 4.5+ and keep a
>>> CDH5-compatible build up to date.
>>> >
>>> > Yes?
>>> >
>>> > James
>>>
>>


Re: Phoenix map reduce

2015-09-10 Thread Krishna
Another option is to create HFiles using csv bulk loader on one cluster,
transfer them to the backup cluster and run LoadIncrementalHFiles(...).
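A rough sketch of that last step against the HBase 1.x client API (the HFile path and table name are placeholders, and the files are assumed to already be on the backup cluster's HDFS):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class LoadHFilesOnBackupCluster {
    public static void main(String[] args) throws Exception {
        // Directory of HFiles produced by the CSV bulk loader, already copied over.
        Path hfileDir = new Path("/tmp/phoenix-hfiles/MY_TABLE");
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("MY_TABLE"));
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("MY_TABLE"));
             Admin admin = conn.getAdmin()) {
            // Hands the HFiles over to the region servers on the backup cluster.
            new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, admin, table, locator);
        }
    }
}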

On Tue, Sep 1, 2015 at 11:53 AM, Jean-Marc Spaggiari <
jean-m...@spaggiari.org> wrote:

> Hi Gaurav,
>
> bulk load bypasses the WAL, that's correct. It's true for Phoenix, it's true
> for HBase (outside of Phoenix).
>
> If you have replication activated, you will have to bulkload the data into
> the 2 clusters. Transfer your csv files to the other side too and bulkload
> from there.
>
> JM
>
> 2015-09-01 14:51 GMT-04:00 Gaurav Agarwal :
>
>> Hello
>>
>> We are using the Phoenix MapReduce CSV uploader to load data into HBase. I
>> read the documentation on the Phoenix site: it will only create HFiles, and no WAL logs
>> will be created. Please confirm whether this understanding is correct or wrong.
>>
>> We have to use HBase replication across clusters for a master-master
>> scenario. Will the replication work in that scenario, or do we need to use
>> CopyTable to replicate?
>>
>> thanks
>>
>
>


setting up community repo of Phoenix for CDH5?

2015-09-10 Thread Krishna
I can volunteer to spend some time on this.

CDH artifacts are available in Maven repo but from reading other threads on
CDH-Phoenix compatibility, it looks like there are some code changes to be
made in Phoenix to successfully compile against CDH.

Here are questions to address:
1) How to maintain CDH compatible Phoenix code base?
2) Is having a CDH compatible branch even an option?

Krishna



On Friday, August 28, 2015, Andrew Purtell <andrew.purt...@gmail.com
<javascript:_e(%7B%7D,'cvml','andrew.purt...@gmail.com');>> wrote:

> Yes I am interested. Assuming CDH artifacts are publicly available in a
> Maven repo somewhere, which I believe is the case, perhaps we (the Phoenix
> project/community) could set up a Jenkins job that builds against them and
> makes the resulting build artifacts available. They would never be an
> official release, just a best effort convenience. Would that work? I think
> little must be done besides compile against the CDH artifacts for binary
> compatibility.
>
>
> > On Aug 28, 2015, at 11:19 AM, James Heather <james.heat...@mendeley.com>
> wrote:
> >
> > Is anyone interested in helping with getting an up-to-date
> CDH5-compatible build of Phoenix up and running?
> >
> > Cloudera has a build of Phoenix 4.3 (
> https://github.com/cloudera-labs/phoenix), but this is now two versions
> behind, and there seems little desire at Cloudera to keep it updated.
> >
> > I imagine that by looking at the differences between vanilla 4.3 and
> cloudera labs 4.3, and with some guidance from this list, we could get a
> good idea of what would need to be modified in 4.5+ and keep a
> CDH5-compatible build up to date.
> >
> > Yes?
> >
> > James
>


Re: Importing existing HBase table's rowkey

2015-07-22 Thread Krishna
You can map an HBase composite row key to a Phoenix primary key only if the
serialization used for HBase matches Phoenix's. Ex: a leading 1 byte for the salt
bucket, a 0-byte char for separating columns, etc.

If you used a different mechanism to serialize the rowkey in HBase, you can
still map it to a Phoenix table, but declare the PK as VARBINARY and see if you can
create a UDF to separate the columns.
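A small sketch of that second approach (the view name, column family, and qualifier are hypothetical):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class VarbinaryRowkeyView {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181");
             Statement stmt = conn.createStatement()) {
            // Expose the whole HBase rowkey as a single VARBINARY primary key;
            // a UDF (or client-side code) would then split it into its parts.
            stmt.execute("CREATE VIEW \"existing_table\" "
                    + "(PK VARBINARY PRIMARY KEY, \"cf\".\"val\" VARCHAR)");
        }
    }
}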

On Tuesday, July 21, 2015, Anchal Agrawal anc...@yahoo-inc.com wrote:

 Hi,

 I'm trying to map an existing HBase table to Phoenix. Can the existing
 HBase table's rowkey be imported as the rowkey of the Phoenix table? On
 this page (
 https://phoenix.apache.org/faq.html#How_I_map_Phoenix_table_to_an_existing_HBase_table),
 there's an example:

 CREATE VIEW t1 ( pk VARCHAR PRIMARY KEY, f1.val VARCHAR )

 Here, is pk the HBase table's column that's being used as the primary
 key, or is it a Phoenix keyword/placeholder to refer to the HBase table's
 rowkey? My table's rowkey is made up of several fields that are not stored
 as columns in that table. If I could just import the rowkey into Phoenix,
 that'd be great.

 Thank you!

 Sincerely,
 Anchal



Re: Permissions Question

2015-07-06 Thread Krishna
The owner of the directory containing the HFiles should be the 'hbase' user, and
ownership can be set using the 'chown' command.
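If it has to be done programmatically rather than with the hdfs chown command, something along these lines should work (a sketch; the path is a placeholder and the calling user needs HDFS superuser rights to change ownership):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChownHFileDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            // Hand the generated HFile output directory over to the hbase user.
            fs.setOwner(new Path("/tmp/phoenix-hfiles/MY_TABLE"), "hbase", "hbase");
        }
    }
}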

On Mon, Jul 6, 2015 at 7:12 AM, Riesland, Zack zack.riesl...@sensus.com
wrote:

  I’ve been running CsvBulkLoader as ‘hbase’ and that has worked well.



 But I now need to integrate with some scripts that will be run as another
 user.



 When I run under a different account, the CsvBulkLoader runs and creates
 the HFiles, but then encounters permission issues attempting to write the
 data to HBase.



 Can someone point me in the right direction for solving this?



 How can I give ‘hbase’ write permissions to a different user?



 Thanks!







Re: Phoenix Multitenancy - sqlline tenant-specific connection

2015-03-13 Thread Vamsi Krishna
Thanks Nick & Gabriel.

Vamsi Attluri.

On Fri, Mar 13, 2015 at 12:11 AM, Gabriel Reid gabriel.r...@gmail.com
wrote:

 The correct syntax for a Phoenix JDBC url with a tenant id is as follows:
 localhost:2181;TenantId=foo

 Note that the TenantId parameter is capitalized (it's case-sensitive).

 However (on Linux or Mac at least), it's not currently possible to connect
 with a tenant-specific connection like this, as the parameter handling done
 in sqlline.py doesn't properly quote the full JDBC url. I've created
 PHOENIX-1733 [1] to track this.

 Once PHOENIX-1733 is resolved, you'll be able to connect as follows (note
 the quotes around the connection string):

 $ ./bin/sqlline.py 'localhost:2181:/hbase;TenantId=foo'

 - Gabriel

 1. https://issues.apache.org/jira/browse/PHOENIX-1733


 On Fri, Mar 13, 2015 at 1:06 AM Nick Dimiduk ndimi...@gmail.com wrote:

 This works fine for me:

 $ ./bin/sqlline.py localhost:2181:/hbase;tenantId=foo

 At least, it launches without complaint. I don't have any tables with
 tenants enabled.


 On Thu, Mar 12, 2015 at 4:48 PM, Vamsi Krishna vamsi.attl...@gmail.com
 wrote:

 I got following error when I tried that:

 java -cp
 /etc/hbase/conf:/usr/hdp/2.2.0.0-2041/phoenix/bin/../phoenix-4.2.0.2.2.0.0-2041-client.jar
 -Dlog4j.configuration=file:/usr/hdp/2.2.0.0-2041/phoenix/bin/log4j.properties
 sqlline.SqlLine -d org.apache.phoenix.jdbc.PhoenixDriver -u
 jdbc:phoenix:localhost:2181:/hbase-unsecure;tenantid=abc -n none -p none
 --color=true --fastConnect=false --verbose=true
 --isolation=TRANSACTION_READ_COMMITTED

 15/03/12 23:48:02 WARN impl.MetricsConfig: Cannot locate configuration:
 tried hadoop-metrics2-phoenix.properties,hadoop-metrics2.properties

 Error:  (state=,code=0)

 sqlline version 1.1.2

 Vamsi Attluri.

 On Thu, Mar 12, 2015 at 4:00 PM, Nick Dimiduk ndimi...@gmail.com
 wrote:

 It looks like tenantId is passed on as a jdbc property. So I think
 localhost:2181:/hbase becomes localhost:2181:/hbase;tenantId=abc. At least
 that's what's happening in JDBCUtilTest.

 On Thu, Mar 12, 2015 at 3:24 PM, Vamsi Krishna vamsi.attl...@gmail.com
  wrote:

 Hi,

 Can someone help me understand how to establish a tenant-specific
 connection using Sqlline?

 I see the following documented on Phoenix website, but i'm not sure
 how to do that for Sqlline connection:

 http://phoenix.apache.org/multi-tenancy.html

 For example, a tenant-specific connection is established like this:

 Properties props = new Properties();
 props.setProperty("TenantId", "Acme");
 Connection conn = DriverManager.getConnection(localhost, props);

 Thanks,
 Vamsi Attluri.







Re: Phoenix Multitenancy - sqlline tenant-specific connection

2015-03-12 Thread Vamsi Krishna
I got following error when I tried that:

java -cp
/etc/hbase/conf:/usr/hdp/2.2.0.0-2041/phoenix/bin/../phoenix-4.2.0.2.2.0.0-2041-client.jar
-Dlog4j.configuration=file:/usr/hdp/2.2.0.0-2041/phoenix/bin/log4j.properties
sqlline.SqlLine -d org.apache.phoenix.jdbc.PhoenixDriver -u
jdbc:phoenix:localhost:2181:/hbase-unsecure;tenantid=abc -n none -p none
--color=true --fastConnect=false --verbose=true
--isolation=TRANSACTION_READ_COMMITTED

15/03/12 23:48:02 WARN impl.MetricsConfig: Cannot locate configuration:
tried hadoop-metrics2-phoenix.properties,hadoop-metrics2.properties

Error:  (state=,code=0)

sqlline version 1.1.2

Vamsi Attluri.

On Thu, Mar 12, 2015 at 4:00 PM, Nick Dimiduk ndimi...@gmail.com wrote:

 It looks like tenantId is passed on as a jdbc property. So I think
 localhost:2181:/hbase becomes localhost:2181:/hbase;tenantId=abc. At least
 that's what's happening in JDBCUtilTest.

 On Thu, Mar 12, 2015 at 3:24 PM, Vamsi Krishna vamsi.attl...@gmail.com
 wrote:

 Hi,

 Can someone help me understand how to establish a tenant-specific
 connection using Sqlline?

  I see the following documented on the Phoenix website, but I'm not sure how
  to do that for a Sqlline connection:

 http://phoenix.apache.org/multi-tenancy.html

 For example, a tenant-specific connection is established like this:

  Properties props = new Properties();
  props.setProperty("TenantId", "Acme");
  Connection conn = DriverManager.getConnection("localhost", props);

 Thanks,
 Vamsi Attluri.





Re: how to drop SYSTEM.SEQUENCE table to reduce the no. of salt buckets for this table

2015-03-03 Thread Vamsi Krishna
James,

We tried the following steps:
1) Dropped SYSTEM.SEQUENCE table from HBase
1.1) disable 'SYSTEM.SEQUENCE'
1.2) drop 'SYSTEM.SEQUENCE'
2) Deleted SYSTEM.SEQUENCE meta-data from phoenix system tables
(SYSTEM.CATALOG, SYSTEM.STATS)
2.1) delete from SYSTEM.CATALOG where table_name = 'SEQUENCE' and
table_schem = 'SYSTEM';
2.2) delete from SYSTEM.STATS where physical_name like '%SYSTEM.SEQUENCE%'
or region_name like '%SYSTEM.SEQUENCE%';
3) Set phoenix.sequence.saltBuckets property to 1 (see the config sketch below)
4) Restarted all region servers and master server
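
For step 3, the property would typically be added to the hbase-site.xml on the
Phoenix client's classpath (with Ambari-managed configs the exact file may
differ); a rough sketch:

  <property>
    <name>phoenix.sequence.saltBuckets</name>
    <value>1</value>
  </property>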

Observations:
1. After Master & Region servers are restarted and the client is
reconnected I see SYSTEM.SEQUENCE table is re-created in HBase, but not in
Phoenix. Do I need to manually insert the SYSTEM.SEQUENCE table meta-data
in SYSTEM.CATALOG table?
2. When I list HDFS folder '/apps/hbase/data/data/default/SYSTEM.SEQUENCE',
I see it again created 256 regions.

Am I missing any steps above?

Thanks,
Vamsi Attluri.

On Fri, Feb 27, 2015 at 9:51 AM, James Taylor jamestay...@apache.org
wrote:

 I'd recommend dropping the SYSTEM.SEQUENCE table from the HBase shell
 (instead of deleting the folder in HDFS). Everything else sounded
 fine, but make sure to bounce your cluster and restart your clients
 after doing this.

 Thanks,
 James

 On Thu, Feb 26, 2015 at 12:28 PM, Vamsi Krishna vamsi.attl...@gmail.com
 wrote:
  Hi,
 
  From phoenix archives I see that we can drop SYSTEM.SEQUENCE table and
 set
  'phoenix.sequence.saltBuckets' property to '1' to see the SYSTEM.SEQUENCE
  table recreated with 1 salt bucket on cluster restart.
  Reference:
 
 http://mail-archives.apache.org/mod_mbox/incubator-phoenix-user/201412.mbox/%3ccaaf1jdi4svigfnqy0h45pm2yhiqinpbphgj3ov69qb6dfvw...@mail.gmail.com%3E
 
  But, we are not able to drop the SYSTEM.SEQUENCE table.
  We are seeing the following error when we try to drop the table:
  DROP TABLE SYSTEM.SEQUENCE;
  Error: ERROR 1010 (42M01): Not allowed to mutate table.
  tableName=SYSTEM.SEQUENCE (state=42M01,code=1010)
 
  How to drop SYSTEM.SEQUENCE table?
 
  Can we delete default/SYSTEM.SEQUENCE folder under HBase data
 directory on
  HDFS and delete the SYSTEM.SEQUENCE meta data from SYSTEM.CATALOG,
  SYSTEM.STATS tables using the below delete queries?
 
  delete from SYSTEM.CATALOG where table_name = 'SEQUENCE' and table_schem
 =
  'SYSTEM';
  delete from SYSTEM.STATS where physical_name = 'SYSTEM.SEQUENCE';
 
  Will this create any other issues?
 
  Thanks,
  Vamsi Attluri.



Composite primary keys

2015-03-03 Thread Krishna
Hi,

How does phoenix store composite primary keys in HBase?
For example, if the primary key is a composite of two columns:
col1 short
col2 integer

Does phoenix concatenate 1 byte short with 4 byte integer to create a 5
byte array to make HBase rowkey?

Please point me to the code that I can refer for details.

Thanks


Re: Composite primary keys

2015-03-03 Thread Krishna
Thanks Jeffrey. Is zero byte char separator used between fixed width
variables? From the text on the website, it looks like separator byte is
used only between variable length data types - if I'm understanding it
correctly.

Our composite row keys are formed by simply concatenating the values
 together, with a zero byte character used as a separator after a variable
 length type.



On Tue, Mar 3, 2015 at 10:32 PM, Jeffrey Zhong jzh...@hortonworks.com
wrote:


 Composite row keys are formed by simply concatenating the values together,
 with a zero byte character used as a separator after a variable length
 type.

 You can check code on PTableImpl#newKey

 On 3/3/15, 10:02 PM, Krishna research...@gmail.com wrote:

 Hi,
 
 How does phoenix store composite primary keys in HBase?
 For example, if the primary key is a composite of two columns:
 col1 short
 col2 integer
 
 Does phoenix concatenate 1 byte short with 4 byte integer to create a 5
 byte array to make HBase rowkey?
 
 Please point me to the code that I can refer for details.
 
 Thanks
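
For anyone reading this later, a small sketch of the layout described above:
fixed-width values are laid out back-to-back, and the zero-byte separator only
appears after variable-length values such as VARCHAR. This is illustrative only
(it ignores Phoenix's sign-bit handling for signed numeric types; PTableImpl#newKey
is the authoritative code):

import org.apache.hadoop.hbase.util.Bytes;

public class CompositeKeyLayout {
    public static void main(String[] args) {
        // Hypothetical PK #1: (col1 SMALLINT, col2 INTEGER) - both fixed width,
        // so the values are simply laid out back-to-back. A SMALLINT is 2 bytes
        // in Phoenix, so this row key is 2 + 4 = 6 bytes.
        byte[] fixedOnly = Bytes.add(Bytes.toBytes((short) 7), Bytes.toBytes(42));
        System.out.println(fixedOnly.length);                   // 6

        // Hypothetical PK #2: (name VARCHAR, col2 INTEGER) - the variable-length
        // VARCHAR is terminated by a single zero byte before the next column.
        byte[] withVarchar = Bytes.add(Bytes.toBytes("abc"), new byte[] { 0 },
                                       Bytes.toBytes(42));
        System.out.println(Bytes.toStringBinary(withVarchar));  // abc\x00 + 4-byte int
    }
}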




PhoenixOutputFormat in MR job

2015-03-01 Thread Krishna
Could someone comment on the following questions regarding the usage of
PhoenixOutputFormat in a standalone MR job:

   - Is there a need to compute hash byte in the MR job?
   - Are keys and values stored in BytesWritable before doing a
   context.write(...) in the mapper?


Re: PhoenixOutputFormat in MR job

2015-03-01 Thread Krishna
Ravi, thanks.
If the target table is salted, do I need to compute the leading byte (as I
understand, it's a hash value) in the mapper?
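
As far as I understand, PhoenixOutputFormat writes through a regular Phoenix
connection executing UPSERTs, so the salt byte is computed by Phoenix itself and
does not need to be set in the mapper. For reference, a rough sketch of the
writable pattern from the phoenix_mr.html page linked below - the table and
column names here are made up, and the class should be checked against the
example for the Phoenix version in use:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

// Hypothetical record for an output table such as:
//   CREATE TABLE EVENT_COUNTS (EVENT_TYPE VARCHAR PRIMARY KEY, TOTAL BIGINT);
// write(PreparedStatement) binds the UPSERT parameters in the order of the
// column list registered with the job (via PhoenixMapReduceUtil.setOutput in
// the phoenix_mr.html example).
public class EventCountWritable implements DBWritable, Writable {

    private String eventType;
    private long total;

    public EventCountWritable() { }

    public EventCountWritable(String eventType, long total) {
        this.eventType = eventType;
        this.total = total;
    }

    @Override
    public void write(PreparedStatement statement) throws SQLException {
        statement.setString(1, eventType);
        statement.setLong(2, total);
    }

    @Override
    public void readFields(ResultSet resultSet) throws SQLException {
        // Only used when the same class is read back through PhoenixInputFormat.
        eventType = resultSet.getString("EVENT_TYPE");
        total = resultSet.getLong("TOTAL");
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(eventType);
        out.writeLong(total);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        eventType = in.readUTF();
        total = in.readLong();
    }
}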

On Sunday, March 1, 2015, Ravi Kiran maghamraviki...@gmail.com wrote:

 Hi Krishna,

  I assume you have already taken a look at the example here
 http://phoenix.apache.org/phoenix_mr.html

  Is there a need to compute hash byte in the MR job?
 Can you please elaborate a bit more on what the hash byte is?

  Are keys and values stored in BytesWritable before doing a
 context.write(...) in the mapper?
  The Key-values from a mapper to reducer are the usual
 Writable/WritableComparable instances and you can definitely write
  BytesWritable.

 Regards
 Ravi

  On Sun, Mar 1, 2015 at 10:04 PM, Krishna research...@gmail.com wrote:

  Could someone comment on the following questions regarding the usage of
 PhoenixOutputFormat in a standalone MR job:

- Is there a need to compute hash byte in the MR job?
- Are keys and values stored in BytesWritable before doing a
context.write(...) in the mapper?






how to drop SYSTEM.SEQUENCE table to reduce the no. of salt buckets for this table

2015-02-26 Thread Vamsi Krishna
Hi,

From phoenix archives I see that we can drop SYSTEM.SEQUENCE table and
set 'phoenix.sequence.saltBuckets'
property to '1' to see the SYSTEM.SEQUENCE table recreated with 1 salt
bucket on cluster restart.
Reference:
http://mail-archives.apache.org/mod_mbox/incubator-phoenix-user/201412.mbox/%3ccaaf1jdi4svigfnqy0h45pm2yhiqinpbphgj3ov69qb6dfvw...@mail.gmail.com%3E

But, we are not able to drop the SYSTEM.SEQUENCE table.
We are seeing the following error when we try to drop the table:
DROP TABLE SYSTEM.SEQUENCE;
Error: ERROR 1010 (42M01): Not allowed to mutate table.
tableName=SYSTEM.SEQUENCE (state=42M01,code=1010)

How to drop SYSTEM.SEQUENCE table?

Can we delete default/SYSTEM.SEQUENCE folder under HBase data directory
on HDFS and delete the SYSTEM.SEQUENCE meta data from SYSTEM.CATALOG,
SYSTEM.STATS tables using the below delete queries?

delete from SYSTEM.CATALOG where table_name = 'SEQUENCE' and table_schem =
'SYSTEM';
delete from SYSTEM.STATS where physical_name = 'SYSTEM.SEQUENCE';

Will this create any other issues?

Thanks,
Vamsi Attluri.


Salt buckets optimization

2015-02-25 Thread Krishna
Are there any recommendations for estimating and optimizing salt buckets
during table creation time? What, if any, are the cons of having a high
number (200+) of salt buckets? Is it possible to update salt buckets after
the table is created?

Thanks


Phoenix batch insert support

2014-12-19 Thread Vamsi Krishna
Hi,

I'm trying to do a batch insert using MyBatis & Phoenix and I'm ending up
in an exception (org.springframework.jdbc.BadSqlGrammarException:).

-

Here is an example of what I'm doing:

*I have two entities: *
Authors { authorId, firstName, lastName }
Books { bookId, bookTitle, authorId }

*Data:*
Authors: Record-1: 001, john, henry
Books: Record-1: 001, A database primer, 001
Books: Record-2: 002, Building a datawarehouse, 001

*Model object:*
AuthorsBooksModel { authorId, firstName, lastName, booksList {bookId,
bookTitle} }

*Phoenix table (denormalized):*

authorsbooks { authorid, firstname, lastname, bookid, booktitle }
Create script: create table authorsbooks (authorid varchar, firstname
varchar, lastname varchar, bookid varchar, booktitle varchar, constraint
ab_pk primary key(authorid, bookid));

*Query:*
Using MyBatis batching, I create an upsert statement from my nested model
object (AuthorsBooksModel), which is passed to Phoenix:
upsert into authorsbooks (authorid, firstname, lastname, bookid, booktitle)
values ('001', 'john', 'henry', '001', 'A database primer'), ('001',
'john', 'henry', '002', 'Building a datawarehouse');

The above statement fails in the application with *BadSqlGrammarException*.

*Phoenix command line:*

When I execute the same upsert statement directly at Phoenix console, I see
the following error:

0: jdbc:phoenix:test.abc.def.com,sfdv upsert into authorsbooks (authorid,
firstname, lastname, bookid, booktitle) values ('001', 'john', 'henry',
'001', 'A database primer'), ('001', 'john', 'henry', '002', 'Building a
datawarehouse');

*Error: ERROR 602 (42P00): Syntax error. Missing EOF at line 1, column
136. (state=42P00,code=602)*
-

The same scenario works well with MyBatis - PostgreSQL.
*Create script:* create table authorsbooks (authorid varchar, firstname
varchar, lastname varchar, bookid varchar, booktitle varchar, constraint
ab_pk primary key(authorid, bookid));
*Insert:* insert into authorsbooks (authorid, firstname, lastname, bookid,
booktitle) values ('001', 'john', 'henry', '001', 'A database primer'),
('001', 'john', 'henry', '002', 'Building a datawarehouse');

-

In my real-time application, denormalizing my nested model object will produce
hundreds of thousands of records, and executing one statement at a time from
MyBatis to Phoenix is going to be a network concern.

Can someone please take a look at the above scenario and help me fix it or
suggest an alternative?
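
For reference, Phoenix's UPSERT VALUES takes a single row, which is why the
multi-row VALUES list above is rejected. One common alternative is plain JDBC
batching: prepare a single-row UPSERT, execute it once per record with
auto-commit off, and commit every N rows so mutations are sent to HBase in
batches. A rough sketch using the table above (the URL and batch size are
illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class AuthorsBooksBatchLoad {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            conn.setAutoCommit(false);

            String upsert = "UPSERT INTO authorsbooks "
                + "(authorid, firstname, lastname, bookid, booktitle) "
                + "VALUES (?, ?, ?, ?, ?)";

            String[][] rows = {
                {"001", "john", "henry", "001", "A database primer"},
                {"001", "john", "henry", "002", "Building a datawarehouse"}
            };

            try (PreparedStatement stmt = conn.prepareStatement(upsert)) {
                int pending = 0;
                for (String[] row : rows) {
                    for (int i = 0; i < row.length; i++) {
                        stmt.setString(i + 1, row[i]);
                    }
                    stmt.executeUpdate();      // buffered client-side until commit
                    if (++pending % 1000 == 0) {
                        conn.commit();         // flush a batch of mutations to HBase
                    }
                }
                conn.commit();
            }
        }
    }
}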

Thanks,
Vamsi Attluri.


Re: Reverse scan

2014-12-02 Thread Krishna
That's great; thanks James, Ted.

On Mon, Dec 1, 2014 at 9:13 PM, James Taylor jamestay...@apache.org wrote:

 Yes, as Ted points out, Phoenix will use a reverse scan to optimize an
 ORDER BY.
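
 For illustration, a query whose ORDER BY is the row key in reverse should be
 served this way rather than by a client-side re-sort; a small sketch against a
 hypothetical table (all names are made up):

 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.sql.ResultSet;
 import java.sql.Statement;

 public class ReverseScanExample {
     public static void main(String[] args) throws Exception {
         try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181");
              Statement stmt = conn.createStatement()) {
             // Hypothetical table: CREATE TABLE EVENTS (EVENT_TIME DATE NOT NULL
             // PRIMARY KEY, PAYLOAD VARCHAR). Ordering by the primary key
             // descending lets Phoenix satisfy the ORDER BY with a reverse scan.
             try (ResultSet rs = stmt.executeQuery(
                     "SELECT event_time, payload FROM events "
                     + "ORDER BY event_time DESC LIMIT 10")) {
                 while (rs.next()) {
                     System.out.println(rs.getDate(1) + " " + rs.getString(2));
                 }
             }
         }
     }
 }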

 On Mon, Dec 1, 2014 at 7:52 PM, Ted Yu yuzhih...@gmail.com wrote:
  Please take a look at BaseQueryPlan#iterator():
 
  if (OrderBy.REV_ROW_KEY_ORDER_BY.equals(orderBy)) {
 
  ScanUtil.setReversed(scan);
 
  Cheers
 
  On Mon, Dec 1, 2014 at 7:45 PM, Krishna research...@gmail.com wrote:
 
  Hi,
 
  Does Phoenix support reverse scan as explained in HBASE-4811 (
  https://issues.apache.org/jira/browse/HBASE-4811).
 



Issue creating phoenix local index on a HBase table created in a specific namespace

2014-11-11 Thread Vamsi Krishna
Hi,

I'm working with HDP 2.2.

Hadoop: 2.6.0.2.2.0.0-1084

HBase: 0.98.4.2.2.0.0-1084-hadoop2

Phoenix: 4.2

I created namespace 'TEST' in HBase.
I created a table 'TABLE1' in Phoenix under namespace 'TEST' in HBase.
When I try to create a local index on table 'TABLE1', I'm seeing an error.
Please refer to the sequence of events, commands & errors below. You can also
find the full stack trace at the bottom of the email.

Please help me resolve this issue.

1) HBase shell:
   create_namespace 'TEST'

2) Phoenix command line interface:
   create table TEST:TABLE1 (col1 varchar primary key, colfam1.col2 varchar,
   colfam1.col3 varchar);

3) Phoenix command line interface:
   create local index TEST:TABLE1INDX2 on TEST:TABLE1(colfam1.col3 desc);

   Error: org.apache.hadoop.hbase.NamespaceNotFoundException: _LOCAL_IDX_TEST

Detailed Stacktrace:

Error: org.apache.hadoop.hbase.NamespaceNotFoundException: _LOCAL_IDX_TEST
    at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3332)
    at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1781)
    at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1911)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40470)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
    at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:74)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745) (state=08000,code=101)

Thanks,
Vamsi.


Aggregation queries on a big dataset are failing

2014-10-02 Thread Krishna
Hi,

Aggregate queries seem to be working fine on smaller datasets, but when the
data needs to be aggregated over millions of rows, the query fails with the
following error stack. I'm running Phoenix 3.1 on HBase 0.94.18. Any help?

Query is something like this:

 select a.customer_id, a.product_id, count(*) from customer as a join
 product as b on a.product_id = b.product_id where b.category = 'retail'
 group by a.customer_id, a.product_id


Caused by: org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hbase.DoNotRetryIOException:
CUSTOMER,\x11\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1412194938071.5602bf7e28a72ad6e3db6257b22e38f8.:
com.google.common.hash.BloomFilter.put(Ljava/lang/Object;)Z
at
org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:73)
at
org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:91)
at
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1333)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.internalOpenScanner(HRegionServer.java:2588)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2556)
at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:354)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1434)
Caused by: java.lang.NoSuchMethodError:
com.google.common.hash.BloomFilter.put(Ljava/lang/Object;)Z
at
org.apache.phoenix.cache.aggcache.SpillMap$MappedByteBufferMap.addElement(SpillMap.java:437)
at org.apache.phoenix.cache.aggcache.SpillMap.put(SpillMap.java:294)
at
org.apache.phoenix.cache.aggcache.SpillManager.spill(SpillManager.java:261)
at
org.apache.phoenix.cache.aggcache.SpillableGroupByCache$1.removeEldestEntry(SpillableGroupByCache.java:190)
at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:431)
at java.util.HashMap.put(HashMap.java:505)
at
org.apache.phoenix.cache.aggcache.SpillableGroupByCache.cache(SpillableGroupByCache.java:249)
at
org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.scanUnordered(GroupedAggregateRegionObserver.java:384)
at
org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.doPostScannerOpen(GroupedAggregateRegionObserver.java:130)
at
org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:89)
... 8 more

at
org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1012)
at
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:87)
at com.sun.proxy.$Proxy6.openScanner(Unknown Source)
at
org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:224)
at
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:126)
at
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:42)
at
org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:164)


Re: Aggregation queries on a big dataset are failing

2014-10-02 Thread Krishna
Thanks Sean. I'll try that.
Do you think this is happening only with large datasets because the group by
spills to disk and the guava package is only used in such scenarios?

On Thu, Oct 2, 2014 at 3:31 PM, Sean Huo s...@crunchyroll.com wrote:

 You have to upgrade the guava jar on the regionservers. I am using
 guava-12.0.1.jar.

 On Thu, Oct 2, 2014 at 2:51 PM, Krishna research...@gmail.com wrote:

 Hi,

 Aggregate queries seem to be working fine on smaller datasets but when
 the data needs to be aggregated over millions of rows, query fails with
 following error stack. I'm running Phoenix 3.1 on HBase 0.94.18. Any help?

 Query is something like this:

 select a.customer_id, a.product_id, count(*) from customer as a join
 product as b on a.product_id = b.product_id where b.category = 'retail'
 group by a.customer_id, a.product_id


 Caused by: org.apache.hadoop.ipc.RemoteException:
 org.apache.hadoop.hbase.DoNotRetryIOException:
 CUSTOMER,\x11\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1412194938071.5602bf7e28a72ad6e3db6257b22e38f8.:
 com.google.common.hash.BloomFilter.put(Ljava/lang/Object;)Z
 at
 org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:73)
 at
 org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:91)
 at
 org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1333)
 at
 org.apache.hadoop.hbase.regionserver.HRegionServer.internalOpenScanner(HRegionServer.java:2588)
 at
 org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2556)
 at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:354)
 at
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1434)
 Caused by: java.lang.NoSuchMethodError:
 com.google.common.hash.BloomFilter.put(Ljava/lang/Object;)Z
 at
 org.apache.phoenix.cache.aggcache.SpillMap$MappedByteBufferMap.addElement(SpillMap.java:437)
 at
 org.apache.phoenix.cache.aggcache.SpillMap.put(SpillMap.java:294)
 at
 org.apache.phoenix.cache.aggcache.SpillManager.spill(SpillManager.java:261)
 at
 org.apache.phoenix.cache.aggcache.SpillableGroupByCache$1.removeEldestEntry(SpillableGroupByCache.java:190)
 at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:431)
 at java.util.HashMap.put(HashMap.java:505)
 at
 org.apache.phoenix.cache.aggcache.SpillableGroupByCache.cache(SpillableGroupByCache.java:249)
 at
 org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.scanUnordered(GroupedAggregateRegionObserver.java:384)
 at
 org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.doPostScannerOpen(GroupedAggregateRegionObserver.java:130)
 at
 org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:89)
 ... 8 more

 at
 org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1012)
 at
 org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:87)
 at com.sun.proxy.$Proxy6.openScanner(Unknown Source)
 at
 org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:224)
 at
 org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:126)
 at
 org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:42)
 at
 org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:164)





Re: Recreating SYSTEM.CATALOG metadata

2014-09-30 Thread Krishna
Hi James/Lars,

I did another run of bulk load, backup & restore on a new test cluster but did
not encounter the issue this time. It appears to be an issue specific to that
version of the backup. I will pass on the steps to replicate it with a small
dataset as soon as I can.

Having said that, it would still be beneficial to have some kind of option in
the CREATE TABLE & CREATE INDEX clauses to indicate that the underlying table
is already a Phoenix table.

Thanks,
Krishna



On Monday, September 29, 2014, lars hofhansl la...@apache.org wrote:

 Not offhand.

 A few guesses/questions for Krishna:
 - does the restore tool restore the exact timestamps? In fact how exactly
 do you backup and restore the tables?
 - is it possible that you updated the CATALOG (via some DDL) while the
 backup was running? (although I think that should be OK)

 - does it restore into a clean(ed) cluster? If cleaned, was the ZK state
 removed?
 - after you restore the two tables, can you scan SYSTEM.CATALOG from an
 HBase shell? Or does that part hang?
 - in the HBase UI, are there any regions in transition? (might be some
 outdated coprocessors, etc, although unlikely)
 - exact same version of Phoenix? (between the Phoenix that wrote the data
 and the one that restored it)
 - same for HBase. Exact same version used for the restore?

 As James said, we need some logs from both the client and the region
 server(s).
 Also if possible, do you have some precise steps to reproduce with a small
 table?

 Lots of questions :)

 Thanks.

 -- Lars


 - Original Message -
 From: James Taylor jamestay...@apache.org
 To: user user@phoenix.apache.org; lars hofhansl la...@apache.org
 Cc:
 Sent: Monday, September 29, 2014 4:26 PM
 Subject: Re: Recreating SYSTEM.CATALOG metadata

 @Lars - any idea why Krishna may run into issues using Phoenix after a
 restore from an HBase backup?




 On Sun, Sep 28, 2014 at 9:00 PM, James Taylor jamestay...@apache.org wrote:
  Hi Krishna,
  I think that's what we need to figure out - why is Phoenix having
  trouble when you restore the SYSTEM.CATALOG table. Any client or
  server-side exceptions in the logs when it hangs?
  Thanks,
  James
 
  On Sun, Sep 28, 2014 at 6:18 PM, Krishna research...@gmail.com wrote:
  Hi James,
 
  I did include SYSTEM.CATALOG table in the backup  restore process. I
 dont
  know why sqlline is having trouble using the backup version - it just
 hangs,
  unless catalog table is dropped from hbase shell. I will see if there
 are
  any errors in logs that's causing the behavior.
 
  Regards
  Krishna
 
 
  On Sunday, September 28, 2014, James Taylor jamestay...@apache.org wrote:
 
  Hi Krishna,
  Any reason why the SYSTEM.CATALOG hbase table isn't restored as well
  from backup? Yes, if you try to re-create the SYSTEM.CATALOG by
  re-issuing your DDL statement, Phoenix won't know that the tables were
  Phoenix tables before, so will try to add the empty key value to each
  row. It's possible that an option could be made to avoid this, but
  it'd be good to understand a bit more why this is needed.
  Thanks,
  James
 
  On Sun, Sep 28, 2014 at 1:56 PM, Krishna research...@gmail.com wrote:
   Hi,
  
   When I restore hbase from a backup, sqlline gets stuck unless
   SYSTEM.CATALOG
   table is dropped. It is automatically re-created via sqlline.
 However,
   metadata of previously created phoenix tables is lost.
  
   So, to restore the metadata, when a 'CREATE TABLE' statement is
   re-issued,
   Phoenix takes very, very long time to execute which I think, is
 because
   Phoenix is upserting null value for _0 column qualifier again for
 every
   row
   (billions of rows in the table).
  
   Is there a work-around for this by somehow telling Phoenix that the
   underlying table is already a Phoenix table? Using views is not an
   option
   here. If my understanding is wrong, any pointers as to why it takes
 so
   long
   for executing 'CREATE TABLE' statement is appreciated.
  
   Thanks
  
  




Re: Recreating SYSTEM.CATALOG metadata

2014-09-28 Thread Krishna
Hi James,

I did include the SYSTEM.CATALOG table in the backup & restore process. I don't
know why sqlline is having trouble using the backed-up version - it just
hangs, unless the catalog table is dropped from the hbase shell. I will see if
there are any errors in the logs that could be causing the behavior.

Regards
Krishna

On Sunday, September 28, 2014, James Taylor jamestay...@apache.org wrote:

 Hi Krishna,
 Any reason why the SYSTEM.CATALOG hbase table isn't restored as well
 from backup? Yes, if you try to re-create the SYSTEM.CATALOG by
 re-issuing your DDL statement, Phoenix won't know that the tables were
 Phoenix tables before, so will try to add the empty key value to each
 row. It's possible that an option could be made to avoid this, but
 it'd be good to understand a bit more why this is needed.
 Thanks,
 James

 On Sun, Sep 28, 2014 at 1:56 PM, Krishna research...@gmail.com wrote:
  Hi,
 
  When I restore hbase from a backup, sqlline gets stuck unless
 SYSTEM.CATALOG
  table is dropped. It is automatically re-created via sqlline. However,
  metadata of previously created phoenix tables is lost.
 
  So, to restore the metadata, when a 'CREATE TABLE' statement is
 re-issued,
  Phoenix takes very, very long time to execute which I think, is because
  Phoenix is upserting null value for _0 column qualifier again for every
 row
  (billions of rows in the table).
 
  Is there a work-around for this by somehow telling Phoenix that the
  underlying table is already a Phoenix table? Using views is not an option
  here. If my understanding is wrong, any pointers as to why it takes so
 long
  for executing 'CREATE TABLE' statement is appreciated.
 
  Thanks
 
 



Re: Upper limit on SALT_BUCKETS?

2014-09-24 Thread Krishna
Thanks... any plans of raising the number of bytes for the salt value?


On Wed, Sep 24, 2014 at 10:22 AM, James Taylor jamestay...@apache.org
wrote:

 The salt byte is the first byte in your row key and that's the max
 value for a byte (i.e. it'll be 0-255).
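
 To make the one-byte limit concrete, here is a conceptual sketch of how a
 salted row key is formed - the real logic (including the actual hash) is in
 Phoenix's SaltingUtil, so treat the hash below as a stand-in:

 import java.util.Arrays;

 public class SaltSketch {
     // Conceptual only: Phoenix prepends a single salt byte computed from a
     // hash of the row key modulo SALT_BUCKETS. Because the prefix is one
     // byte, the bucket count is capped at 256.
     static byte[] saltRowKey(byte[] rowKey, int saltBuckets) {
         int bucket = Math.abs(Arrays.hashCode(rowKey) % saltBuckets); // stand-in hash
         byte[] salted = new byte[rowKey.length + 1];
         salted[0] = (byte) bucket;                 // one-byte salt prefix
         System.arraycopy(rowKey, 0, salted, 1, rowKey.length);
         return salted;
     }

     public static void main(String[] args) {
         byte[] salted = saltRowKey("some-row-key".getBytes(), 256);
         System.out.println(salted.length + " bytes, bucket " + (salted[0] & 0xFF));
     }
 }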

 On Wed, Sep 24, 2014 at 10:12 AM, Krishna research...@gmail.com wrote:
  Hi,
 
  According to Phoenix documentation
 
  Phoenix provides a way to transparently salt the row key with a salting
  byte for a particular table. You need to specify this in table creation
 time
  by specifying a table property “SALT_BUCKETS” with a value from 1 to
 256
 
 
  Is 256 the max value that SALT_BUCKETS can take? If yes, could someone
  explain the reason for this upper bound?
 
  Krishna
 



Re: Upper limit on SALT_BUCKETS?

2014-09-24 Thread Krishna
50 Region Servers for 100 TB such that each RS serves 10 regions (500
regions).

At this stage, we haven't evaluated the impact on query latency when
running with fewer regions, for ex., 50 RS and 250 regions.


On Wed, Sep 24, 2014 at 11:50 AM, James Taylor jamestay...@apache.org
wrote:

 Would you be able to talk about your use case a bit and explain why you'd
 need this to be higher?
 Thanks,
 James


 On Wednesday, September 24, 2014, Krishna research...@gmail.com wrote:

 Thanks... any plans of raising number of bytes for salt value?


 On Wed, Sep 24, 2014 at 10:22 AM, James Taylor jamestay...@apache.org
 wrote:

 The salt byte is the first byte in your row key and that's the max
 value for a byte (i.e. it'll be 0-255).

 On Wed, Sep 24, 2014 at 10:12 AM, Krishna research...@gmail.com wrote:
  Hi,
 
  According to Phoenix documentation
 
  Phoenix provides a way to transparently salt the row key with a
 salting
  byte for a particular table. You need to specify this in table
 creation time
  by specifying a table property “SALT_BUCKETS” with a value from 1 to
 256
 
 
  Is 256 the max value that SALT_BUCKETS can take? If yes, could someone
  explain the reason for this upper bound?
 
  Krishna
 





Phoenix Meetups - Bay Area

2014-09-15 Thread Krishna
Hi, Is anyone aware of any Phoenix meetups coming up in the next couple of
months in the Bay Area?

Thanks


Re: Mapreduce job

2014-09-11 Thread Krishna
I assume you are referring to the bulk loader. The -a option allows you to
pass the array delimiter.
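
Roughly, the bulk-loader invocation with an array delimiter looks like this
(option names should be double-checked with --help for the Phoenix version in
use; the jar name, paths, table name, and quorum are illustrative):

hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
    --table EXAMPLE \
    --input /data/example.csv \
    --zookeeper zk1,zk2,zk3 \
    -a ':'    # delimiter between elements of an ARRAY column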

On Thursday, September 11, 2014, Flavio Pompermaier pomperma...@okkam.it
wrote:

 Any help on this? What if I save a field as an array? How could I read it
 from a mapreduce job? Is there a separator char to use for splitting?

 On Tue, Sep 9, 2014 at 10:36 AM, Flavio Pompermaier pomperma...@okkam.it wrote:

 Hi to all,

 I'd like to know the correct way to run a mapreduce job on a table managed by
 Phoenix to put data into another table (also managed by Phoenix).
 Is it sufficient to read the data contained in column 0 (like 0:id, 0:value)
 and create insert statements in the reducer to put things correctly into the
 output table?
 Should I filter rows containing some special value for column 0:_0?

 Best,
 FP




sqlline - Could not set up IO Streams

2014-09-10 Thread Krishna
Hi,

I'm running Phoenix 3.1.0 on AWS using Hadoop 2.2.0 and HBase 0.94.7. When
I run bin/sqlline.py localhost:2181:/hbase, it errors out with
java.io.IOException: Could not set up IO Streams because of a
NoSuchMethodError.

The following phoenix jars are in the hbase lib directory:
phoenix-3.1.0-client-minimal.jar (master node only)
phoenix-core-3.1.0.jar (master + region servers)

Am I missing any other jars?

*Console output:*
14/09/10 23:30:05 INFO client.HConnectionManager$HConnectionImplementation:
getMaster attempt 0 of 14 failed; retrying after sleep of 1000
java.io.IOException: Could not set up IO Streams
...
Caused by: java.lang.NoSuchMethodError:
org.apache.hadoop.net.NetUtils.getInputStream(Ljava/net/Socket;)Ljava/io/InputStream;
at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:439)
...

*Here is the actual command:*
java -cp
.:/home/hadoop/phoenix-3.1.0-bin/hadoop2/bin/../phoenix-3.1.0-client-hadoop2.jar
-Dlog4j.configuration=file:/home/hadoop/phoenix-3.1.0-bin/hadoop2/bin/log4j.properties
sqlline.SqlLine -d org.apache.phoenix.jdbc.PhoenixDriver -u
jdbc:phoenix:localhost:2181 -n none -p none --color=true
--fastConnect=false --verbose=true --isolation=TRANSACTION_READ_COMMITTED