How to verify query timeout is working

2014-12-17 Thread Chabot, Jerry
I am running 4.2.2, which has a fix for 
https://issues.apache.org/jira/browse/PHOENIX-1463. I want to verify the query 
timeout is working as expected before I set it to a lower number than the 
default of 10 minutes. But I can't get a query timeout to occur. Here is what 
I tried.

On client:


1.  Pass the phoenix.queryTimeoutMs property when getting a DataSource. I 
used tomcat-jdbc to manage a pool of Phoenix connections.

2.  Call setQueryTimeout(n) on the JDBC statement. But this throws a 
SQLFeatureNotSupportedException.
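Since setQueryTimeout(n) isn't supported in this version, the timeout is
typically supplied as a client-side connection property instead. A minimal
sketch, assuming the property name phoenix.query.timeoutMs and a value of
60 seconds (verify both against your Phoenix version):

```java
import java.util.Properties;

public class TimeoutProps {
    public static Properties clientProps() {
        Properties props = new Properties();
        // Client-side query timeout in milliseconds (assumed property name;
        // check it against your Phoenix release notes).
        props.setProperty("phoenix.query.timeoutMs", "60000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = clientProps();
        System.out.println(props.getProperty("phoenix.query.timeoutMs"));
        // With a running Phoenix/HBase instance, the props would be passed
        // to the driver, e.g.:
        // java.sql.Connection conn = java.sql.DriverManager.getConnection(
        //         "jdbc:phoenix:localhost", props);
    }
}
```

The same key/value pair could be handed to a tomcat-jdbc DataSource via its
connectionProperties setting.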

So I thought maybe it is only a server-side setting. I added the entry below 
to the hbase-site.xml in the hbase/conf directory. I set it very small and 
kept reducing it (now 10 ms) to see if the timeout would occur. I restarted 
HBase after modifying it.

  <property>
    <name>phoenix.query.timeoutMs</name>
    <value>10</value>
  </property>

I have a query that typically takes 1735 ms against an HBase (0.98.5) 
pseudo-distributed node. But I cannot trigger a timeout exception.

Can someone clarify how the query timeout works and how I can trigger a query 
timeout?

-Jerry




Re: Result set overhead?

2014-12-17 Thread Kristoffer Sjögren
Correct me if I'm wrong here, but it seems that columns not part of the
primary key are stored as column qualifiers. And each KeyValue in HBase
stores both the qualifier name and the value. So both qualifier name and
value are transferred from region server to client, for all column values
and rows.

This is especially bad for some of our tables, which have 2-3 primary key
columns and 20-25 non-key columns. And the problem gets worse when column
names are sometimes 20-30 characters long.
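The arithmetic behind that concern can be sketched roughly, assuming one
HBase KeyValue per non-PK column (the numbers here are illustrative, not
measured):

```java
public class QualifierOverhead {
    // Rough bytes spent on repeated qualifier names alone, assuming one
    // KeyValue per non-PK column per row (illustrative arithmetic only;
    // ignores keys, timestamps, and encoding).
    public static long qualifierBytes(long rows, int columns, int avgNameLen) {
        return rows * columns * avgNameLen;
    }

    public static void main(String[] args) {
        // 25 non-PK columns with ~25-character names, over one million rows:
        System.out.println(qualifierBytes(1_000_000L, 25, 25)); // 625000000
    }
}
```

Under those assumptions that is roughly 625 MB of column names on the wire
before a single value is counted, which is why short qualifiers are the usual
HBase-level mitigation.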

Any suggestions on how to reduce this overhead?

Cheers,
-Kristoffer




On Wed, Dec 17, 2014 at 4:51 PM, Kristoffer Sjögren sto...@gmail.com
wrote:

 Hi

 I have done some tracing and it seems like _each_ 'select' result contains
 (redundant?) column names. This causes a lot of overhead when having
 descriptive column names, especially when the values in these columns are
 very small.

 Is this correct? Is it possible to make result sets less chatty?

 Cheers,
 -Kristoffer




Re: Result set overhead?

2014-12-17 Thread James Taylor
Hi Kristoffer,
Yes, you're correct - for a non-aggregate, non-join query, the
underlying result set is backed by the bloated HBase Result and
KeyValue/Cell objects. See PHOENIX-1489 - maybe we can continue the
discussion there? Your comments here would be valuable over there too.
Thanks,
James

On Wed, Dec 17, 2014 at 2:43 PM, Kristoffer Sjögren sto...@gmail.com wrote:
 Correct me if i'm wrong here but it seems that columns not part of the
 primary key are stored as a column qualifiers. And each key value in HBase
 store both the qualifier name and the value. So both qualifier name and
 value are transferred from region server to client, for all column values
 and rows.

 This is especially bad for some of our tables which have 2-3 column keys and
 20-25 column values. And the problem gets worse when column value names
 sometimes are 20-30 characters long.

 Any suggestions on how to reduce this overhead?

 Cheers,
 -Kristoffer




 On Wed, Dec 17, 2014 at 4:51 PM, Kristoffer Sjögren sto...@gmail.com
 wrote:

 Hi

 I have done some tracing and it seems like _each_ 'select' result contain
 (redundant?) column names? This cause a lot of overhead when having
 descriptive column names. Especially when values in these columns are very
 small.

 Is this correct? Is it possible to make result sets less chatty?

 Cheers,
 -Kristoffer




Re: Delta Index

2014-12-17 Thread Dominik Wagenknecht
Hej,

I may be completely wrong, but IMHO this is not a good approach to syncing
data. A real solution for an Elasticsearch river would be to hook into HBase
Cluster Replication [1] rather than polling the dataset with select *. That
would be a lot more work, but it would be a real push.

*Maybe* the river on GitHub could be made smarter once native HBase
timestamps are somehow exposed in Phoenix; since each cell in HBase is fully
timestamped (and Phoenix doesn't mess with it, so it's always the last
UPSERT), this could be an approach. But in contrast to Cluster Replication
this is still polling and will be slow on big datasets.
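Until such native-timestamp support exists, one polling workaround is an
application-maintained timestamp column that every UPSERT also writes. A
hypothetical sketch (the table and last_updated column are made-up names,
not part of the river linked below):

```java
import java.sql.Timestamp;

public class IncrementalPoll {
    // Builds an incremental-poll query against a table whose rows carry an
    // application-maintained last_updated column (hypothetical schema; this
    // version of Phoenix does not expose the native HBase cell timestamp).
    public static String pollQuery(String table, Timestamp since) {
        return "SELECT * FROM " + table
             + " WHERE last_updated > TIMESTAMP '" + since + "'";
    }

    public static void main(String[] args) {
        System.out.println(pollQuery("test.orders",
                Timestamp.valueOf("2014-12-17 00:00:00")));
    }
}
```

Feeding such a query to the river instead of select * would shrink each poll
to the changed rows, though it remains polling, not push.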

[1] http://hbase.apache.org/book.html#cluster_replication

Regards


On Thu, Dec 18, 2014 at 6:51 AM, Subacini B subac...@gmail.com wrote:

 Hi,

 http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html

 curl -XPUT 'localhost:9200/_river/phoenix_jdbc_river/_meta' -d '{
   "type" : "jdbc",
   "jdbc" : {
     "url" : "jdbc:phoenix:localhost",
     "user" : "",
     "password" : "",
     "sql" : "select * from test.orders"
   }
 }'

 I followed the steps and am able to successfully index the data into
 Elasticsearch from HBase. But if I update records, the changes are not
 reflected in Elasticsearch. Is it possible to automatically sync
 updated/changed data?

 Thanks