How to verify query timeout is working
I am running 4.2.2, which includes the fix for https://issues.apache.org/jira/browse/PHOENIX-1463. I want to verify that the query timeout works as expected before I set it lower than the default of 10 minutes, but I can't get a query timeout to occur. Here is what I tried.

On the client:
1. Pass the phoenix.queryTimeoutMs property when getting a DataSource. I use tomcat-jdbc to manage a set of Phoenix connections.
2. Call setQueryTimeout(n) on the JDBC statement. This throws a SQLFeatureNotSupportedException.

So I thought it might be a server-side-only setting. I added the entry below to hbase-site.xml in the hbase/conf directory. I set it very small and kept reducing it (now 10 ms) to see whether the timeout would occur. I restarted HBase after modifying the file.

<property>
  <name>phoenix.query.timeoutMs</name>
  <value>10</value>
</property>

I have a query that typically takes 1735m against an HBase (0.98.5) pseudo-distributed node, but I cannot trigger a timeout exception.

Can someone clarify how the query timeout works and how I can trigger one?

-Jerry
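One thing that can be checked offline is whether the entry in hbase-site.xml is well-formed and spelled exactly as intended. The message above uses two spellings (phoenix.queryTimeoutMs on the client, phoenix.query.timeoutMs in hbase-site.xml); the Phoenix tuning documentation lists the latter. The following is a minimal Python sketch, not part of Phoenix or HBase: `read_property` is a hypothetical helper and `SAMPLE_HBASE_SITE` is illustrative file content. It only shows that a lookup by exact name finds one spelling and not the other.

```python
import xml.etree.ElementTree as ET

# Illustrative file content (an assumption, not copied from a real cluster).
SAMPLE_HBASE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>phoenix.query.timeoutMs</name>
    <value>10</value>
  </property>
</configuration>
"""


def read_property(xml_text, name):
    """Return the value of the named property, or None if it is absent.

    Hypothetical helper: matches the <name> element exactly, so a
    misspelled property name is reported as missing.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


if __name__ == "__main__":
    for key in ("phoenix.query.timeoutMs", "phoenix.queryTimeoutMs"):
        print(key, "->", read_property(SAMPLE_HBASE_SITE, key))
```

This does not show whether the client or the region server actually honors the setting; it only rules out a malformed or misnamed entry as the cause.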
Re: Result set overhead?
Correct me if I'm wrong here, but it seems that columns that are not part of the primary key are stored as column qualifiers, and each key value in HBase stores both the qualifier name and the value. So both the qualifier name and the value are transferred from the region server to the client, for every column value in every row.

This is especially bad for some of our tables, which have 2-3 key columns and 20-25 value columns. The problem gets worse when the value column names are 20-30 characters long.

Any suggestions on how to reduce this overhead?

Cheers,
-Kristoffer

On Wed, Dec 17, 2014 at 4:51 PM, Kristoffer Sjögren sto...@gmail.com wrote:

Hi

I have done some tracing and it seems like _each_ 'select' result contains (redundant?) column names? This causes a lot of overhead when using descriptive column names, especially when the values in these columns are very small.

Is this correct? Is it possible to make result sets less chatty?

Cheers,
-Kristoffer
Re: Result set overhead?
Hi Kristoffer,

Yes, you're correct: for a non-aggregate, non-join query, the underlying result set is backed by the bloated HBase Result and KeyValue/Cell. See PHOENIX-1489. Maybe we can continue the discussion there? Your comments here would be valuable over there too.

Thanks,
James

On Wed, Dec 17, 2014 at 2:43 PM, Kristoffer Sjögren sto...@gmail.com wrote:

Correct me if I'm wrong here, but it seems that columns that are not part of the primary key are stored as column qualifiers, and each key value in HBase stores both the qualifier name and the value. So both the qualifier name and the value are transferred from the region server to the client, for every column value in every row.

This is especially bad for some of our tables, which have 2-3 key columns and 20-25 value columns. The problem gets worse when the value column names are 20-30 characters long.

Any suggestions on how to reduce this overhead?

Cheers,
-Kristoffer

On Wed, Dec 17, 2014 at 4:51 PM, Kristoffer Sjögren sto...@gmail.com wrote:

Hi

I have done some tracing and it seems like _each_ 'select' result contains (redundant?) column names? This causes a lot of overhead when using descriptive column names, especially when the values in these columns are very small.

Is this correct? Is it possible to make result sets less chatty?

Cheers,
-Kristoffer
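To put rough numbers on the overhead discussed in this thread, here is a small Python sketch. It assumes the classic untagged KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte key type, value). The row key length, family length and value size below are illustrative assumptions; only the column count and qualifier length come from the ranges mentioned above. The actual bytes on the wire also depend on the RPC codec and any compression, which this ignores.

```python
# Fixed bytes per KeyValue in the classic (untagged) layout:
# key length (4) + value length (4) + row length (2)
# + family length (1) + timestamp (8) + key type (1)
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def key_value_size(row_len, family_len, qualifier_len, value_len):
    """Approximate size in bytes of one KeyValue."""
    return FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


def row_breakdown(columns, row_len, family_len, qualifier_len, value_len):
    """Approximate bytes for one logical row stored as `columns` KeyValues."""
    per_cell = key_value_size(row_len, family_len, qualifier_len, value_len)
    return {
        "total": columns * per_cell,
        "qualifiers": columns * qualifier_len,
        "row_keys": columns * row_len,
        "values": columns * value_len,
    }


if __name__ == "__main__":
    # 25 value columns with 25-character names (from the thread);
    # 16-byte row key, 1-byte family name and 4-byte values are assumptions.
    stats = row_breakdown(columns=25, row_len=16, family_len=1,
                          qualifier_len=25, value_len=4)
    for name, size in stats.items():
        print(f"{name}: {size} bytes")
```

Under these assumptions the qualifier names alone take several times the space of the values, and the row key is repeated once per column as well.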
Re: Delta Index
Hej,

I may be completely wrong, but this is IMHO not a good approach to syncing data. A real solution for an Elasticsearch river would be a hook into HBase cluster replication [1] rather than polling the dataset with select *... That would be a lot more work, but it would be a real push.

*Maybe* the river @ github could be made smarter once native HBase timestamps are made available in Phoenix somehow; as each cell in HBase is fully timestamped (and Phoenix doesn't mess with it, so it's always the last UPSERT), this could be an approach. But in contrast to cluster replication, this is still polling and will be slow on big datasets.

[1] http://hbase.apache.org/book.html#cluster_replication

Regards

On Thu, Dec 18, 2014 at 6:51 AM, Subacini B subac...@gmail.com wrote:

Hi,

http://lessc0de.github.io/connecting_hbase_to_elasticsearch.html

curl -XPUT 'localhost:9200/_river/phoenix_jdbc_river/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "url" : "jdbc:phoenix:localhost",
    "user" : "",
    "password" : "",
    "sql" : "select * from test.orders"
  }
}'

I followed the steps and am able to index the data from HBase into Elasticsearch successfully. But if I update records, the changes are not reflected in Elasticsearch.

Is it possible to automatically sync updated/changed data?

Thanks
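The timestamp-based polling idea mentioned in the reply can be sketched as follows. This is only an illustration of the general pattern, not how the JDBC river works: sqlite3 stands in for the Phoenix JDBC source, and the `updated_at` column and the `fetch_changes` helper are hypothetical. Each poll selects only rows modified since the last watermark instead of running select * over the whole table.

```python
import sqlite3


def fetch_changes(conn, last_seen):
    """Return rows modified after `last_seen` and the new watermark.

    Hypothetical helper; assumes every write sets `updated_at`
    to a monotonically increasing value.
    """
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else last_seen
    return rows, new_watermark


def demo():
    # In-memory stand-in for the source table (an assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders "
        "(id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "new", 100), (2, "new", 101)],
    )

    # First poll picks up everything.
    first, mark = fetch_changes(conn, 0)

    # A later update bumps the timestamp of one row.
    conn.execute(
        "UPDATE orders SET status = 'shipped', updated_at = 150 WHERE id = 2"
    )

    # Second poll returns only the changed row.
    second, mark = fetch_changes(conn, mark)
    conn.close()
    return first, second, mark


if __name__ == "__main__":
    first, second, mark = demo()
    print("first poll:", first)
    print("second poll:", second)
    print("watermark:", mark)
```

As noted in the reply, this is still polling: deletes are not detected, and each poll costs a scan unless the timestamp can be used to narrow it.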