Re: Disable NO_CACHE hint on query for LIMIT OFFSET paging queries

2018-08-21 Thread Thomas D'Silva
When you use an OFFSET, Phoenix scans rows and filters them out until it
reaches the offset count, which can end up being very costly for large
offsets.
If you can use an RVC whose order matches the PK of the data table or
index, the start key of the scan will be set based on the RVC, which is much
more efficient
(see http://phoenix.apache.org/paged.html).
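The two approaches can be sketched as query shapes. Below is a minimal sketch in Java; the EVENTS table and its PK columns (ORG_ID, EVENT_TIME) are hypothetical, not from this thread:

```java
// Sketch: OFFSET paging vs. RVC (row value constructor) paging in Phoenix.
// Table EVENTS and columns ORG_ID, EVENT_TIME are illustrative only; the
// point is the shape of the two queries.
public class PagingQueries {

    // OFFSET paging: the server must scan and discard `offset` rows
    // before returning the page, so cost grows with the offset.
    static String offsetPage(int pageSize, int offset) {
        return "SELECT ORG_ID, EVENT_TIME FROM EVENTS"
                + " ORDER BY ORG_ID, EVENT_TIME"
                + " LIMIT " + pageSize + " OFFSET " + offset;
    }

    // RVC paging: pass the PK of the last row of the previous page; the RVC
    // over the leading PK columns becomes the scan's start key, so the
    // server seeks directly past the previous page instead of re-scanning.
    static String rvcPage(int pageSize, String lastOrgId, long lastEventTime) {
        return "SELECT ORG_ID, EVENT_TIME FROM EVENTS"
                + " WHERE (ORG_ID, EVENT_TIME) > ('" + lastOrgId + "', " + lastEventTime + ")"
                + " ORDER BY ORG_ID, EVENT_TIME"
                + " LIMIT " + pageSize;
    }

    public static void main(String[] args) {
        System.out.println(offsetPage(20, 10000));
        System.out.println(rvcPage(20, "org1", 1534834800000L));
    }
}
```

In real code the bind-variable form (`WHERE (ORG_ID, EVENT_TIME) > (?, ?)` with a PreparedStatement) is preferable to string concatenation.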

On Tue, Aug 21, 2018 at 8:06 AM, Abhishek Gupta  wrote:

> Hi,
>
> Could you help me understand how LIMIT OFFSET queries work under the hood
> in Phoenix? Is the filtering out of rows done in heap, or is there some sort
> of optimisation where it can skip at the disk level?
> My reason for posting this question was to understand whether rows from
> past pages of the query sitting in the block cache can speed up the
> subsequent page call, which would get cache hits for the filtered-out rows
> instead of seeking to disk.
>
> Thanks,
> Abhishek
>
> On Sat, Aug 18, 2018 at 4:16 AM Thomas D'Silva 
> wrote:
>
>> Shouldn't you pass the NO_CACHE hint for the LIMIT-OFFSET queries, since
>> you will be reading and filtering out lots of rows on the server?
>> Using the block cache for RVC queries might help, depending on how many
>> rows you read per query; you should be able to test this out easily.
>>
>> On Fri, Aug 17, 2018 at 4:25 AM, Abhishek Gupta 
>> wrote:
>>
>>> Hi Team,
>>>
>>> I am working on a use case where aggregated SQL queries are made such
>>> that RVC cannot be used (aggregation on truncated primary key columns),
>>> so LIMIT-OFFSET has to be used instead. RVC is used for some other use
>>> cases.
>>>
>>> Currently I have disabled BLOCKCACHE for the table. I wanted to check
>>> whether it would be more performant to instead enable BLOCKCACHE on the
>>> table, pass the NO_CACHE hint for RVC queries (since they do not use
>>> LIMIT-OFFSET scans), and not pass NO_CACHE for the LIMIT-OFFSET queries,
>>> so that subsequent page calls can leverage prior page data in the block
>>> cache.
>>>
>>> Thanks,
>>> Abhishek
>>>
>>
>>
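The NO_CACHE hint discussed in this thread is attached inline in the SELECT. A minimal sketch of building such a hinted page query; the table T and columns ID, VAL are hypothetical:

```java
// Sketch: attaching Phoenix's NO_CACHE hint to a LIMIT/OFFSET page query so
// a large scan-and-discard does not churn the HBase block cache. Table T
// and columns ID, VAL are hypothetical.
public class NoCacheHintExample {

    // The /*+ NO_CACHE */ hint asks the server not to populate the block
    // cache with the blocks this scan reads.
    static String pagedWithNoCache(int pageSize, int offset) {
        return "SELECT /*+ NO_CACHE */ ID, VAL FROM T"
                + " ORDER BY ID"
                + " LIMIT " + pageSize + " OFFSET " + offset;
    }

    public static void main(String[] args) {
        System.out.println(pagedWithNoCache(100, 5000));
    }
}
```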


Re: CursorUtil -> mapCursorIDQuery

2018-08-21 Thread Josh Elser

Thanks sir! Will take a look up there and continue.

On 8/21/18 11:48 AM, Jack Steenkamp wrote:

Hi Josh,

I've created https://issues.apache.org/jira/browse/PHOENIX-4860 for this.

Thanks,
On Tue, 21 Aug 2018 at 16:34, Jack Steenkamp  wrote:


Hi Josh,

Glad I could help. The CursorUtil class, it seems, has not changed since
it was first created as part of PHOENIX-3572, and is still the same
in master (I checked a bit earlier).

Sure - will have a go at creating a JIRA for this.

Regards,

On Tue, 21 Aug 2018 at 16:23, Josh Elser  wrote:


Hi Jack,

Given your assessment, it sounds like you've stumbled onto a race
condition! Thanks for bringing it to our attention.

A few questions:

* Have you checked if the same code exists in the latest
branches/releases (4.x-HBase-1.{2,3,4} or master)?
* Want to create a Jira issue to track this? Else, I can do this for ya.

On 8/21/18 9:48 AM, Jack Steenkamp wrote:

Hi All,

In my application I make heavy use of Apache Phoenix's Cursors
(https://phoenix.apache.org/cursors.html) - and for the majority of
cases it works great and makes my life a lot easier (thanks for
implementing them). However, in very rare cases (fiendishly rare
cases), I get NullPointerExceptions such as the following:

java.lang.NullPointerException: null
at org.apache.phoenix.util.CursorUtil.updateCursor(CursorUtil.java:179)
at org.apache.phoenix.iterate.CursorResultIterator.next(CursorResultIterator.java:46)
at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:779)

(This is with 4.13.1, but it seems that
org.apache.phoenix.util.CursorUtil has not changed much since it was
first created.)

Upon closer inspection, it would seem that on line 124 of CursorUtil a
HashMap is used to keep state, which is then exposed via a number of
static methods which, one has to assume, can be accessed by many
different threads. Using a plain old HashMap in cases like these has
(at least in my experience) been to blame for rare issues such as
these.

Would you perhaps consider changing the implementation of
mapCursorIDQuery to a ConcurrentHashMap instead? That should hopefully
tighten up the class and prevent any potential inconsistencies.

Regards,

Jack Steenkamp
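The suggested change is essentially a one-line swap. A standalone sketch of the idea; the registry class below stands in for CursorUtil's static map and is not Phoenix's actual code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Sketch: shared static state reached from many threads should live in a
// ConcurrentHashMap rather than a plain HashMap. This registry is a
// stand-in for CursorUtil's mapCursorIDQuery, not Phoenix's actual code.
public class CursorRegistry {
    private static final Map<String, String> mapCursorIDQuery = new ConcurrentHashMap<>();

    static void declareCursor(String cursorName, String query) {
        mapCursorIDQuery.put(cursorName, query);
    }

    static String getQuery(String cursorName) {
        return mapCursorIDQuery.get(cursorName);
    }

    public static void main(String[] args) throws InterruptedException {
        // Hammer the map from many threads; a plain HashMap can corrupt its
        // internal table under this kind of load, a ConcurrentHashMap cannot.
        int threads = 8, perThread = 1000;
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    declareCursor("cursor-" + id + "-" + i, "SELECT 1");
                }
                done.countDown();
            }).start();
        }
        done.await();
        System.out.println(mapCursorIDQuery.size()); // prints 8000
    }
}
```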



Re: CursorUtil -> mapCursorIDQuery

2018-08-21 Thread Jack Steenkamp
Hi Josh,

I've created https://issues.apache.org/jira/browse/PHOENIX-4860 for this.

Thanks,
On Tue, 21 Aug 2018 at 16:34, Jack Steenkamp  wrote:
>
> Hi Josh,
>
> Glad I could help. The CursorUtil class, it seems, has not changed since
> it was first created as part of PHOENIX-3572, and is still the same
> in master (I checked a bit earlier).
>
> Sure - will have a go at creating a JIRA for this.
>
> Regards,
>
> On Tue, 21 Aug 2018 at 16:23, Josh Elser  wrote:
> >
> > Hi Jack,
> >
> > Given your assessment, it sounds like you've stumbled onto a race
> > condition! Thanks for bringing it to our attention.
> >
> > A few questions:
> >
> > * Have you checked if the same code exists in the latest
> > branches/releases (4.x-HBase-1.{2,3,4} or master)?
> > * Want to create a Jira issue to track this? Else, I can do this for ya.
> >
> > On 8/21/18 9:48 AM, Jack Steenkamp wrote:
> > > Hi All,
> > >
> > > In my application I make heavy use of Apache Phoenix's Cursors
> > > (https://phoenix.apache.org/cursors.html) - and for the majority of
> > > cases it works great and makes my life a lot easier (thanks for
> > > implementing them). However, in very rare cases (fiendishly rare
> > > cases), I get NullPointerExceptions such as the following:
> > >
> > > java.lang.NullPointerException: null
> > > at org.apache.phoenix.util.CursorUtil.updateCursor(CursorUtil.java:179)
> > > at org.apache.phoenix.iterate.CursorResultIterator.next(CursorResultIterator.java:46)
> > > at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:779)
> > >
> > > (This is with 4.13.1, but it seems that
> > > org.apache.phoenix.util.CursorUtil has not changed much since it was
> > > first created.)
> > >
> > > Upon closer inspection, it would seem that on line 124 of CursorUtil a
> > > HashMap is used to keep state, which is then exposed via a number of
> > > static methods which, one has to assume, can be accessed by many
> > > different threads. Using a plain old HashMap in cases like these has
> > > (at least in my experience) been to blame for rare issues such as
> > > these.
> > >
> > > Would you perhaps consider changing the implementation of
> > > mapCursorIDQuery to a ConcurrentHashMap instead? That should hopefully
> > > tighten up the class and prevent any potential inconsistencies.
> > >
> > > Regards,
> > >
> > > Jack Steenkamp
> > >


Re: CursorUtil -> mapCursorIDQuery

2018-08-21 Thread Jack Steenkamp
Hi Josh,

Glad I could help. The CursorUtil class, it seems, has not changed since
it was first created as part of PHOENIX-3572, and is still the same
in master (I checked a bit earlier).

Sure - will have a go at creating a JIRA for this.

Regards,

On Tue, 21 Aug 2018 at 16:23, Josh Elser  wrote:
>
> Hi Jack,
>
> Given your assessment, it sounds like you've stumbled onto a race
> condition! Thanks for bringing it to our attention.
>
> A few questions:
>
> * Have you checked if the same code exists in the latest
> branches/releases (4.x-HBase-1.{2,3,4} or master)?
> * Want to create a Jira issue to track this? Else, I can do this for ya.
>
> On 8/21/18 9:48 AM, Jack Steenkamp wrote:
> > Hi All,
> >
> > In my application I make heavy use of Apache Phoenix's Cursors
> > (https://phoenix.apache.org/cursors.html) - and for the majority of
> > cases it works great and makes my life a lot easier (thanks for
> > implementing them). However, in very rare cases (fiendishly rare
> > cases), I get NullPointerExceptions such as the following:
> >
> > java.lang.NullPointerException: null
> > at org.apache.phoenix.util.CursorUtil.updateCursor(CursorUtil.java:179)
> > at org.apache.phoenix.iterate.CursorResultIterator.next(CursorResultIterator.java:46)
> > at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:779)
> >
> > (This is with 4.13.1, but it seems that
> > org.apache.phoenix.util.CursorUtil has not changed much since it was
> > first created.)
> >
> > Upon closer inspection, it would seem that on line 124 of CursorUtil a
> > HashMap is used to keep state, which is then exposed via a number of
> > static methods which, one has to assume, can be accessed by many
> > different threads. Using a plain old HashMap in cases like these has
> > (at least in my experience) been to blame for rare issues such as
> > these.
> >
> > Would you perhaps consider changing the implementation of
> > mapCursorIDQuery to a ConcurrentHashMap instead? That should hopefully
> > tighten up the class and prevent any potential inconsistencies.
> >
> > Regards,
> >
> > Jack Steenkamp
> >


Re: CursorUtil -> mapCursorIDQuery

2018-08-21 Thread Josh Elser

Hi Jack,

Given your assessment, it sounds like you've stumbled onto a race 
condition! Thanks for bringing it to our attention.


A few questions:

* Have you checked if the same code exists in the latest
branches/releases (4.x-HBase-1.{2,3,4} or master)?

* Want to create a Jira issue to track this? Else, I can do this for ya.

On 8/21/18 9:48 AM, Jack Steenkamp wrote:

Hi All,

In my application I make heavy use of Apache Phoenix's Cursors
(https://phoenix.apache.org/cursors.html) - and for the majority of
cases it works great and makes my life a lot easier (thanks for
implementing them). However, in very rare cases (fiendishly rare
cases), I get NullPointerExceptions such as the following:

java.lang.NullPointerException: null
at org.apache.phoenix.util.CursorUtil.updateCursor(CursorUtil.java:179)
at org.apache.phoenix.iterate.CursorResultIterator.next(CursorResultIterator.java:46)
at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:779)

(This is with 4.13.1, but it seems that
org.apache.phoenix.util.CursorUtil has not changed much since it was
first created.)

Upon closer inspection, it would seem that on line 124 of CursorUtil a
HashMap is used to keep state, which is then exposed via a number of
static methods which, one has to assume, can be accessed by many
different threads. Using a plain old HashMap in cases like these has
(at least in my experience) been to blame for rare issues such as
these.

Would you perhaps consider changing the implementation of
mapCursorIDQuery to a ConcurrentHashMap instead? That should hopefully
tighten up the class and prevent any potential inconsistencies.

Regards,

Jack Steenkamp



Re: Disable NO_CACHE hint on query for LIMIT OFFSET paging queries

2018-08-21 Thread Abhishek Gupta
Hi,

Could you help me understand how LIMIT OFFSET queries work under the hood
in Phoenix? Is the filtering out of rows done in heap, or is there some sort
of optimisation where it can skip at the disk level?
My reason for posting this question was to understand whether rows from
past pages of the query sitting in the block cache can speed up the
subsequent page call, which would get cache hits for the filtered-out rows
instead of seeking to disk.

Thanks,
Abhishek

On Sat, Aug 18, 2018 at 4:16 AM Thomas D'Silva 
wrote:

> Shouldn't you pass the NO_CACHE hint for the LIMIT-OFFSET queries, since
> you will be reading and filtering out lots of rows on the server?
> Using the block cache for RVC queries might help, depending on how many
> rows you read per query; you should be able to test this out easily.
>
> On Fri, Aug 17, 2018 at 4:25 AM, Abhishek Gupta 
> wrote:
>
>> Hi Team,
>>
>> I am working on a use case where aggregated SQL queries are made such
>> that RVC cannot be used (aggregation on truncated primary key columns),
>> so LIMIT-OFFSET has to be used instead. RVC is used for some other use
>> cases.
>>
>> Currently I have disabled BLOCKCACHE for the table. I wanted to check
>> whether it would be more performant to instead enable BLOCKCACHE on the
>> table, pass the NO_CACHE hint for RVC queries (since they do not use
>> LIMIT-OFFSET scans), and not pass NO_CACHE for the LIMIT-OFFSET queries,
>> so that subsequent page calls can leverage prior page data in the block
>> cache.
>>
>> Thanks,
>> Abhishek
>>
>
>


Re: Phoenix CsvBulkLoadTool fails with java.sql.SQLException: ERROR 103 (08004): Unable to establish connection

2018-08-21 Thread Josh Elser

Btw, this is covered in the HBase book:

http://hbase.apache.org/book.html#hadoop

The reality is that HBase 2.x will work with Hadoop 3. The "unsupported"
tag is more an expression that it is not yet ready for "production".


On 8/21/18 2:23 AM, Jaanai Zhang wrote:
Caused by: java.lang.IllegalAccessError: class
org.apache.hadoop.hdfs.web.HftpFileSystem cannot access its superinterface
org.apache.hadoop.hdfs.web.TokenAspect$TokenManagementDelegator


This is the root cause: it seems that HBase 1.2 can't access an interface
of Hadoop 3.1, so you should consider either downgrading Hadoop or
upgrading HBase.




    Yun Zhang
    Best regards!


2018-08-21 11:28 GMT+08:00 Mich Talebzadeh:


Hi,

The Hadoop version is Hadoop 3.1.0, HBase is 1.2.6, and Phoenix is
apache-phoenix-4.8.1-HBase-1.2-bin.

In the past I had issues with HBase 2 working with Hadoop 3.1, so I
had to use HBase 1.2.6. The individual components work fine; in
other words, I can do all operations on HBase with Hadoop 3.1 and
Phoenix.

The issue I am facing is with both the
org.apache.phoenix.mapreduce.CsvBulkLoadTool and
hbase.mapreduce.ImportTsv utilities.

So I presume the issue may be that both of these command-line
tools do not work with Hadoop 3.1?

Thanks

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com



*Disclaimer:* Use it at your own risk. Any and all responsibility for
any loss, damage or destruction of data or any other property which
may arise from relying on this email's technical content is
explicitly disclaimed. The author will in no case be liable for any
monetary damages arising from such loss, damage or destruction.



On Tue, 21 Aug 2018 at 00:48, Sergey Soldatov wrote:

If I read it correctly, you are trying to use Phoenix and HBase builds
that were compiled against Hadoop 2 with Hadoop 3. Was HBase the
only component you have upgraded?

Thanks,
Sergey

On Mon, Aug 20, 2018 at 1:42 PM Mich Talebzadeh wrote:

Here you go

2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper:
Client
environment:java.library.path=/home/hduser/hadoop-3.1.0/lib
2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper:
Client environment:java.io.tmpdir=/tmp
2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper:
Client environment:java.compiler=
2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper:
Client environment:os.name=Linux
2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper:
Client environment:os.arch=amd64
2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper:
Client environment:os.version=3.10.0-862.3.2.el7.x86_64
2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper:
Client environment:user.name=hduser
2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper:
Client environment:user.home=/home/hduser
2018-08-20 18:29:47,248 INFO  [main] zookeeper.ZooKeeper:
Client
environment:user.dir=/data6/hduser/streaming_data/2018-08-20
2018-08-20 18:29:47,249 INFO  [main] zookeeper.ZooKeeper:
Initiating client connection, connectString=rhes75:2181
sessionTimeout=9 watcher=hconnection-0x493d44230x0,
quorum=rhes75:2181, baseZNode=/hbase
2018-08-20 18:29:47,261 INFO  [main-SendThread(rhes75:2181)]
zookeeper.ClientCnxn: Opening socket connection to server
rhes75/50.140.197.220:2181.
Will not attempt to authenticate using SASL (unknown error)
2018-08-20 18:29:47,264 INFO  [main-SendThread(rhes75:2181)]
zookeeper.ClientCnxn: Socket connection established to
rhes75/50.140.197.220:2181,
initiating session
2018-08-20 18:29:47,281 INFO  [main-SendThread(rhes75:2181)]
zookeeper.ClientCnxn: Session establishment complete on
server rhes75/50.140.197.220:2181, sessionid = 0x1002ea99eed0077,
negotiated timeout = 4
Exception in thread "main" java.sql.SQLException: ERROR 103
(08004): Unable to establish connection.
     at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQ

CursorUtil -> mapCursorIDQuery

2018-08-21 Thread Jack Steenkamp
Hi All,

In my application I make heavy use of Apache Phoenix's Cursors
(https://phoenix.apache.org/cursors.html) - and for the majority of
cases it works great and makes my life a lot easier (thanks for
implementing them). However, in very rare cases (fiendishly rare
cases), I get NullPointerExceptions such as the following:

java.lang.NullPointerException: null
at org.apache.phoenix.util.CursorUtil.updateCursor(CursorUtil.java:179)
at org.apache.phoenix.iterate.CursorResultIterator.next(CursorResultIterator.java:46)
at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:779)

(This is with 4.13.1, but it seems that
org.apache.phoenix.util.CursorUtil has not changed much since it was
first created.)

Upon closer inspection, it would seem that on line 124 of CursorUtil a
HashMap is used to keep state, which is then exposed via a number of
static methods which, one has to assume, can be accessed by many
different threads. Using a plain old HashMap in cases like these has
(at least in my experience) been to blame for rare issues such as
these.

Would you perhaps consider changing the implementation of
mapCursorIDQuery to a ConcurrentHashMap instead? That should hopefully
tighten up the class and prevent any potential inconsistencies.

Regards,

Jack Steenkamp