Re: HBase Timeout on queries

2018-02-05 Thread Pedro Boado
Flavio, I get the same behaviour: a count(*) over 180M records needs a couple
of minutes to complete for a table with 10 regions served by 4 region servers.

Why are you evaluating robustness in terms of full scans? As Anil said, I
wouldn't expect a NoSQL database to run quick counts over hundreds of
millions or even billions of records.

In terms of usage, we have a production Phoenix cluster with 12 region
servers serving a table with ~100 billion records (6 TB). Queries always
filter on the first column of our primary key, so no more than a few
thousand records are pulled per query, with response times well under a
second.
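
A minimal sketch of that access pattern (the table, column, and host names
are hypothetical, and the Phoenix JDBC driver is assumed to be on the
classpath): because the filter pins the leading primary-key column, Phoenix
turns the query into a narrow range scan instead of a full table scan.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class LeadingKeyScan {
  public static void main(String[] args) throws Exception {
    // EVENTS is assumed to have PRIMARY KEY (CUSTOMER_ID, EVENT_TIME);
    // "zk-host" is a placeholder for the ZooKeeper quorum.
    try (Connection conn =
             DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
         PreparedStatement ps = conn.prepareStatement(
             "SELECT EVENT_TIME, PAYLOAD FROM EVENTS WHERE CUSTOMER_ID = ?")) {
      ps.setString(1, "customer-42");
      try (ResultSet rs = ps.executeQuery()) {
        // Only the rows under this customer's key range are scanned.
        while (rs.next()) {
          System.out.println(rs.getString(1) + " " + rs.getString(2));
        }
      }
    }
  }
}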


On 1 Feb 2018 16:38, "James Taylor"  wrote:

I don’t think the HBase row_counter job is going to be faster than a
count(*) query. Both require a full table scan, so neither will be
particularly fast.

A couple of alternatives if you’re ok with an approximate count: 1) enable
stats collection (though you can leave off using the stats to parallelize
queries) and then do a SUM over the size column for the table by querying
the stats table directly, or 2) do a count(*) using the TABLESAMPLE clause
(again enabling stats as described above) to prevent a full scan.
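
A hedged sketch of both alternatives over JDBC. James refers to the size
column; for a row count, the GUIDE_POSTS_ROW_COUNT column of SYSTEM.STATS is
the direct analogue in recent 4.x releases, though the stats schema can vary
by version, and the table and host names below are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ApproximateCount {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
         Statement st = conn.createStatement()) {
      // 1) Sum the per-guidepost row counts: a read of the stats metadata
      //    table, not a scan of the data table itself.
      try (ResultSet rs = st.executeQuery(
          "SELECT SUM(GUIDE_POSTS_ROW_COUNT) FROM SYSTEM.STATS "
              + "WHERE PHYSICAL_NAME = 'MY_TABLE'")) {
        if (rs.next()) System.out.println("stats-based count: " + rs.getLong(1));
      }
      // 2) TABLESAMPLE(1) restricts the scan to roughly 1% of the table,
      //    so the sampled count must be scaled back up.
      try (ResultSet rs = st.executeQuery(
          "SELECT COUNT(*) FROM MY_TABLE TABLESAMPLE(1)")) {
        if (rs.next()) System.out.println("estimated count: " + rs.getLong(1) * 100);
      }
    }
  }
}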

On Thu, Feb 1, 2018 at 8:11 AM Flavio Pompermaier 
wrote:

> Hi Anil,
> Obviously I'm not using HBase just for the count query... Most of the time
> I do INSERTs and selective queries; I was just trying to figure out whether
> my HBase + Phoenix installation is robust enough to deal with a huge amount
> of data.
>
> On Thu, Feb 1, 2018 at 5:07 PM, anil gupta  wrote:
>
>> Hey Flavio,
>>
>> IMHO, if most of your app is just doing full table scans then I am not
>> really sure HBase (or any other NoSQL store) will be a good fit for your
>> solution (are you building an OLAP system?). If you have point lookups and
>> short range scans then HBase/Phoenix will work well.
>> Also, if you want to do a SELECT COUNT(*), the HBase row_counter job will
>> be much faster than Phoenix queries.
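
A short sketch of the point-lookup pattern Anil describes, reusing the
hypothetical EVENTS schema from above: when the whole primary key is
supplied, Phoenix can satisfy the query with a single-row lookup rather
than a scan.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class PointLookup {
  public static void main(String[] args) throws Exception {
    try (Connection conn =
             DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
         PreparedStatement ps = conn.prepareStatement(
             // Both PK columns (CUSTOMER_ID, EVENT_TIME) are pinned,
             // so this resolves to a single-row point lookup.
             "SELECT PAYLOAD FROM EVENTS "
                 + "WHERE CUSTOMER_ID = ? AND EVENT_TIME = ?")) {
      ps.setString(1, "customer-42");
      ps.setTimestamp(2, Timestamp.valueOf("2018-02-01 12:00:00"));
      try (ResultSet rs = ps.executeQuery()) {
        if (rs.next()) System.out.println(rs.getString(1));
      }
    }
  }
}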
>>
>> Thanks,
>> Anil Gupta
>>
>> On Thu, Feb 1, 2018 at 7:35 AM, Flavio Pompermaier 
>> wrote:
>>
>>> I was able to make it work by changing the following params (both on the
>>> server and the client side, restarting HBase) and now the query answers
>>> in about 6 minutes:
>>>
>>> hbase.rpc.timeout (to 60)
>>> phoenix.query.timeoutMs (to 60)
>>> hbase.client.scanner.timeout.period (from 1m to 10m)
>>> hbase.regionserver.lease.period (from 1m to 10m)
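
For later readers, a hedged sketch of applying the client-side half of these
settings through Phoenix JDBC connection properties (the server-side values
still belong in hbase-site.xml, and hbase.regionserver.lease.period is a
server-side setting). The property names come from this thread; the
10-minute values and table/host names are illustrative.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.util.Properties;

public class LongScanConnection {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    // Client-side timeout overrides, all in milliseconds (10 minutes).
    props.setProperty("phoenix.query.timeoutMs", "600000");
    props.setProperty("hbase.rpc.timeout", "600000");
    props.setProperty("hbase.client.scanner.timeout.period", "600000");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:phoenix:zk-host:2181", props);
         ResultSet rs = conn.createStatement()
             .executeQuery("SELECT COUNT(*) FROM MY_TABLE")) {
      // The full scan now has up to 10 minutes before the client times out.
      if (rs.next()) System.out.println(rs.getLong(1));
    }
  }
}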
>>>
>>> However, I'd like to know if this performance could easily be improved
>>> or not. Any ideas?
>>>
>>> On Thu, Feb 1, 2018 at 4:30 PM, Vaghawan Ojha 
>>> wrote:
>>>
I have the same problem; even after I increased hbase.rpc.timeout the
result is the same. The difference is that I use 4.12.


 On Thu, Feb 1, 2018 at 8:23 PM, Flavio Pompermaier <
 pomperma...@okkam.it> wrote:

> Hi to all,
> I'm trying to use the brand-new Phoenix 4.13.2-cdh5.11.2 over HBase,
> and everything was fine while the data was quite small (a few
> million rows). Once I inserted 170M rows into my table I could no longer
> get the row count (using SELECT COUNT(*)) because of
> org.apache.hadoop.hbase.ipc.CallTimeoutException
> (operationTimeout 6 expired).
> How can I fix this problem? I could increase the hbase.rpc.timeout
> parameter, but I suspect I should improve the HBase performance a little
> bit first... the problem is that I don't know how.
>
> Thanks in advance,
> Flavio
>


>>>
>>
>>
>> --
>> Thanks & Regards,
>> Anil Gupta
>>
>


Re: Apache Phoenix integration

2018-02-05 Thread Flavio Pompermaier
Hi all,
in the hope of helping many other enthusiastic users of Apache Phoenix and
Apache Drill, I've just finished creating a dedicated GitHub repository
[1] with all the instructions on how to modify the current Apache Drill
1.12.0 in order to make it work with Phoenix 4.13.2 on Cloudera CDH 5.11.2.
I've used this version because it is the latest stable one available on
Maven Central containing some important fixes around namespaces [2].
In the README I've also reported some known issues I've encountered that
probably need some further work in the Apache Drill code base.
Since I use Drill just to sample tables, this should be enough for me...

Looking forward to the deeper integration provided by Drillix (and dreaming
of a common effort to bring its benefits into the official Drill version
as well)!

Best,
Flavio

[1] https://github.com/okkam-it/drill-phoenix-integration
[2] https://issues.apache.org/jira/browse/PHOENIX-4523


On Fri, Feb 2, 2018 at 7:21 PM, Kunal Khatua  wrote:

> That's great, Flavio!
>
> You can create a Google doc for review and share it on the user list.
>
> @Bridget handles the documentation on the Apache website, so she can do
> the final touches and help it find a home on the website.
>
> -Original Message-
> From: Flavio Pompermaier [mailto:pomperma...@okkam.it]
> Sent: Friday, February 02, 2018 9:04 AM
> To: u...@drill.apache.org
> Cc: James Taylor 
> Subject: Re: Apache Phoenix integration
>
> Eventually I managed to integrate Phoenix with Drill! I remotely debugged
> drill-embedded via Eclipse and discovered that the problem was that
> some extra jars are needed to make it work!
> Where can I write some documentation about remotely debugging Drill from
> Eclipse and about the Phoenix integration with Drill?
>
> On Fri, Feb 2, 2018 at 5:28 PM, Flavio Pompermaier 
> wrote:
>
> > What is the fastest way to debug the JDBC plugin from Eclipse? I don't
> > see anything in the logs that could help...
> > Is it possible to connect directly to the external embedded Drill
> > running on my machine if I enable JMX?
> > It seems that the JDBC connection is established correctly, but Drill
> > throws an exception (that is not well unwrapped by Jersey):
> >
> > 2018-02-02 16:54:04,520 [qtp159619134-56] INFO  o.a.p.q.ConnectionQueryServicesImpl - HConnection established. Stacktrace for informational purposes: hconnection-0x1b9fe9f8
> > java.lang.Thread.getStackTrace(Thread.java:1552)
> > org.apache.phoenix.util.LogUtil.getCallerStackTrace(LogUtil.java:55)
> > org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:410)
> > org.apache.phoenix.query.ConnectionQueryServicesImpl.access$400(ConnectionQueryServicesImpl.java:256)
> > org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2408)
> > org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2384)
> > org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76)
> > org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2384)
> > org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:255)
> > org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:150)
> > org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:221)
> > org.apache.commons.dbcp.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:38)
> > org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)
> > org.apache.commons.dbcp.BasicDataSource.validateConnectionFactory(BasicDataSource.java:1556)
> > org.apache.commons.dbcp.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:1545)
> > org.apache.commons.dbcp.BasicDataSource.createDataSource(BasicDataSource.java:1388)
> > org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
> > org.apache.calcite.adapter.jdbc.JdbcUtils$DialectPool.get(JdbcUtils.java:73)
> > org.apache.calcite.adapter.jdbc.JdbcSchema.createDialect(JdbcSchema.java:138)
> > org.apache.drill.exec.store.jdbc.JdbcStoragePlugin.<init>(JdbcStoragePlugin.java:103)
> > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> > java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> > org.apache.drill.exec.store.StoragePluginRegistryImpl.create(StoragePluginRegistryImpl.java:346)
> > org.apache.drill.exec.store.StoragePluginRegistryImpl.createOrUpdate(StoragePluginRegistryImpl.java:239)
> > 

reading from a table gets stuck after deleting

2018-02-05 Thread Paolo Tomeo
Hi all,

I'm getting this strange problem with Phoenix 4.7 and HBase 1.0.
Let's say I write a Spark dataframe with a few million rows into an HBase
table through Phoenix. Then I remove many of them, say one half, either from
a Spark job that uses PhoenixConnection or via a DB client (DBeaver). After
that I can still read the remaining rows with the DB client, but my
application cannot do it anymore: it stays pending for hours without a
response.
I have tried several variations; the problem only happens after a Spark job
deletes a lot of the rows, or all of them.
Do you have any idea about this problem?

Thanks a lot,
Paolo
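
One thing worth checking in situations like this (not confirmed anywhere in
the thread): HBase implements deletes as tombstone markers, and until a
major compaction removes them, scans have to read past every marker, which
can make reads over a heavily-deleted table crawl. A minimal sketch,
assuming the HBase 1.0 Java client and a hypothetical table name, of forcing
a major compaction after a mass delete:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CompactAfterMassDelete {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();  // reads hbase-site.xml
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Asynchronously requests a major compaction, which rewrites the
      // store files and drops the delete markers left by the mass delete.
      admin.majorCompact(TableName.valueOf("MY_TABLE"));
    }
  }
}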