
Hyperthreading and system performance

2017-10-02 Thread Michael Young
Does anyone have any info on whether or not hyperthreading will help (or
hamper) Phoenix Hbase cluster performance?

What are the factors which determine if hyperthreading should be enabled on
a given cluster?  I just want a very high level idea about this since
hardware configurations can vary in so many ways.

Thanks,
-Michael


Re: Best split policy for wide distribution of table sizes

2017-08-03 Thread Michael Young
Just so I'm clear, did you mean setting the 'hbase.master.loadbalance.bytable'
value to true in hbase-site.xml?

On Thu, Aug 3, 2017 at 12:32 PM, Josh Mahonin <jmaho...@gmail.com> wrote:

> Hi Michael,
>
> This is more of an HBase question than Phoenix specific, and you may get
> better feedback from the hbase-users list, but...
>
> At $DAYJOB, we've run into a similar "lumpy" data distribution issues,
> which are particularly noticeable in smaller / under-provisioned
> environments. It's not necessarily recommended, and there are likely more
> elegant solutions [1], but we've found that setting the
> 'hbase.master.loadbalance.bytable' value to hbase-site.xml has been
> effective in some environments. [2]
>
> Good luck!
>
> Josh
>
> [1] http://apache-hbase.679495.n3.nabble.com/balance-
> the-tables-across-region-servers-td4066249.html
> [2] https://community.hortonworks.com/questions/65208/hbase-balancer-at-a-
> table-level.html
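
For reference, the switch Josh describes is a plain hbase-site.xml property on the
HBase master; a minimal sketch of the setting (the value of true is an assumption,
and the master needs a restart or balancer re-run to pick it up):

  <property>
    <name>hbase.master.loadbalance.bytable</name>
    <value>true</value>
  </property>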
>
> On Thu, Aug 3, 2017 at 2:09 PM, Michael Young <yomaiq...@gmail.com> wrote:
>
>> We have several phoenix tables which vary quite a bit in size. Namely, we
>> have around 10-15 tables which contain perhaps 6-10x more data than the
>> other 50 tables.
>>
>> The default split policy is currently used, and the count of regions
>> across the clusters is uniform.  However, we noticed some tables have more
>> regions concentrated on some nodes, presumably to keep the total count of
>> regions constant.  This seems to negatively impact query performance for
>> our largest data tables.
>>
>> We tested using the ConstantSizeSplitPolicy, to have the region data
>> sizes be better balanced, and the queries seem to behave somewhat better.
>>
>> Is this a good approach or does anyone have a more appropriate solution?
>> We don't want to implement a custom split policy but are more than willing
>> to try other available split policies, or other config tuning.
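
As a concrete sketch of that approach: Phoenix passes HBase table-descriptor
properties through its DDL options, so a split policy (and a max region size) can
be set at table creation. Table and column names below are hypothetical and the
values are only illustrative, not verified recommendations:

  CREATE TABLE IF NOT EXISTS big_report_table (
      client_id VARCHAR NOT NULL,
      dt        DATE NOT NULL,
      requests  BIGINT
      CONSTRAINT pk PRIMARY KEY (client_id, dt)
  )
  SPLIT_POLICY='org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy',
  MAX_FILESIZE=10737418240;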
>>
>> Thanks,
>> Michael Young
>>
>>
>


Re: Getting too many open files during table scan

2017-06-23 Thread Michael Young
Sergey, thanks for this tip!

Since our client data volume varies a lot from site to site, would
splitting only on the first letters of the client_id lead to some regions
being much larger than others?
Or does phoenix distribute fairly across the different region servers?

Would this continue to work as new client_ids are added to the cluster,
or does splitting need to be done again manually somehow?

As a follow up question, is there a way to get hbase to balance the regions
based on their size rather than the region count?

Cheers,
Michael

On Fri, Jun 23, 2017 at 12:38 PM, Sergey Soldatov <sergeysolda...@gmail.com>
wrote:

> You may check "Are there any tips for optimizing Phoenix?" section of
> Apache Phoenix FAQ at https://phoenix.apache.org/faq.html. It says how to
> pre-split table. In your case you may split on the first letters of
> client_id.
>
> When we are talking about monotonous data, we usually mean the primary key
> only. For example if we have primary key integer ID and writing something
> with auto increment ID, all the data will go to a single region, creating a
> hot spot there. In this case (and actually only in this case) salting may
> be useful, since it adds an additional random byte in front of primary key,
> giving us a chance to distribute the write load across the cluster. In all
> other cases salting causes more work on the cluster since we will be unable
> to do a single point lookup/range scan by primary key and need to make
> lookup for all salting keys + pk.
>
> Thanks,
> Sergey
>
>
> On Fri, Jun 23, 2017 at 12:00 PM, Michael Young <yomaiq...@gmail.com>
> wrote:
>
>> >>Don't you have any other column which is obligatory in queries during
>> reading but not monotonous with ingestion?
>> We have several columns used in typical query WHERE clauses (like
>> userID='abc', or specific user attributes and data types). However, there are
>> a number of columns which are monotonous with many rows having the same
>> value.
>>
>> We have tried running queries after UPDATE STATISTICS on the tables, but that
>> would be worth investigating again.
>>
>> Can you give me a hint how to pre-split the data?
>>
>> Let's say we have the following PK columns (all varchar except dt=date):
>> client_id,dt (date),rule_id,user_id,attribute_1,attribute_2,rule_name,
>> browser_type,device_type,os_type,page,group_name,period
>>
>> and non-PK columns in the same table
>> requests,connections,queues,queue_time
>>
>> What would be the suggested way to pre-split?  I'm not familiar with this
>> technique beyond very simple use cases.
>>
>> Thanks!
>>
>> On Thu, Jun 22, 2017 at 11:31 PM, Ankit Singhal <ankitsingha...@gmail.com
>> > wrote:
>>
>>> bq. A leading date column is in our schema model:-
>>> Don't you have any other column which is obligatory in queries during
>>> reading but not monotonous with ingestion? As pre-split can help you
>>> avoiding hot-spotting.
>>> For parallelism/performance comparison, have you tried running a query
>>> on a non-salted table after updating the stats and comparing performance
>>> with a salted table?
>>>
>>>
>>> On Fri, Jun 23, 2017 at 9:49 AM, Michael Young <yomaiq...@gmail.com>
>>> wrote:
>>>
>>>> We started with no salt buckets, but the performance was terrible in
>>>> our testing.
>>>>
>>>> A leading date column is in our schema model.  We don't seem to be
>>>> getting hotspotting after salting.  Date range scans are very common as are
>>>> slice and dice on many dimension columns.
>>>>
>>>> We have tested with a range of SALT values from 0 to 120 for bulk
>>>> loading, upserts, selects at different concurrent load levels on a test
>>>> cluster before moving to production (with some tweaking post-production).
>>>> However, we had fewer average regions per RS during the testing.  The
>>>> larger SALT numbers definitely gave overall better performance on our
>>>> predominantly read-heavy environment.
>>>>
>>>> I appreciate any insights to identify bottlenecks.
>>>>
>>>> On Thu, Jun 22, 2017 at 6:26 PM, James Taylor <jamestay...@apache.org>
>>>> wrote:
>>>>
>>>>> My recommendation: don't use salt buckets unless you have a
>>>>> monotonically increasing row key, for example one that leads with the
>>>>> current date/time. Otherwise you'll be putting more load (# of salt 
>>>>> buckets
>>>>> more load wor

Re: Getting too many open files during table scan

2017-06-23 Thread Michael Young
>>Don't you have any other column which is obligatory in queries during
reading but not monotonous with ingestion?
We have several columns used in typical query WHERE clauses (like
userID='abc', or specific user attributes and data types). However, there are
a number of columns which are monotonous with many rows having the same
value.

We have tried running queries after UPDATE STATISTICS on the tables, but that
would be worth investigating again.
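
For what it's worth, refreshing stats is a single statement in Phoenix, and the
guidepost width (phoenix.stats.guidepost.width) then controls how many parallel
chunks the optimizer produces. A minimal sketch, with a hypothetical table name:

  UPDATE STATISTICS MY_METRICS_TABLE ALL;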

Can you give me a hint how to pre-split the data?

Let's say we have the following PK columns (all varchar except dt=date):
client_id,dt
(date),rule_id,user_id,attribute_1,attribute_2,rule_name,browser_type,device_type,os_type,page,group_name,period

and non-PK columns in the same table
requests,connections,queues,queue_time

What would be the suggested way to pre-split?  I'm not familiar with this
technique beyond very simple use cases.
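
A minimal sketch of the pre-split Sergey suggested, using a few of the PK and
non-PK columns listed above. The table name and split points are purely
illustrative (choose split points from the real client_id distribution), and as
far as I know SPLIT ON cannot be combined with SALT_BUCKETS:

  CREATE TABLE IF NOT EXISTS my_metrics (
      client_id  VARCHAR NOT NULL,
      dt         DATE NOT NULL,
      rule_id    VARCHAR NOT NULL,
      user_id    VARCHAR,
      requests   BIGINT,
      queue_time BIGINT
      CONSTRAINT pk PRIMARY KEY (client_id, dt, rule_id, user_id)
  )
  COMPRESSION='SNAPPY'
  SPLIT ON ('c', 'g', 'm', 's');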

Thanks!

On Thu, Jun 22, 2017 at 11:31 PM, Ankit Singhal <ankitsingha...@gmail.com>
wrote:

> bq. A leading date column is in our schema model:-
> Don't you have any other column which is obligatory in queries during
> reading but not monotonous with ingestion? As pre-split can help you
> avoiding hot-spotting.
> For parallelism/performance comparison, have you tried running a query on
> a non-salted table after updating the stats and comparing performance with
> a salted table?
>
>
> On Fri, Jun 23, 2017 at 9:49 AM, Michael Young <yomaiq...@gmail.com>
> wrote:
>
>> We started with no salt buckets, but the performance was terrible in our
>> testing.
>>
>> A leading date column is in our schema model.  We don't seem to be
>> getting hotspotting after salting.  Date range scans are very common as are
>> slice and dice on many dimension columns.
>>
>> We have tested with a range of SALT values from 0 to 120 for bulk
>> loading, upserts, selects at different concurrent load levels on a test
>> cluster before moving to production (with some tweaking post-production).
>> However, we had fewer average regions per RS during the testing.  The
>> larger SALT numbers definitely gave overall better performance on our
>> predominantly read-heavy environment.
>>
>> I appreciate any insights to identify bottlenecks.
>>
>> On Thu, Jun 22, 2017 at 6:26 PM, James Taylor <jamestay...@apache.org>
>> wrote:
>>
>>> My recommendation: don't use salt buckets unless you have a
>>> monotonically increasing row key, for example one that leads with the
>>> current date/time. Otherwise you'll be putting more load (# of salt buckets
>>> more load worst case) for bread-and-butter small-range-scan Phoenix queries.
>>>
>>> Thanks,
>>> James
>>>
>>> On Fri, Jun 23, 2017 at 10:06 AM Michael Young <yomaiq...@gmail.com>
>>> wrote:
>>>
>>>> The ulimit open files was only 1024 for the user executing the query.
>>>> After increasing it, the queries behave better.
>>>>
>>>> How can we tell if we need to reduce/increase the number of salt
>>>> buckets?
>>>>
>>>> Our team set this based on read/write performance using data volume and
>>>> expected queries to be run by users.
>>>>
>>>> However, now it seems the performance has degraded.  We can recreate
>>>> the schemas using fewer/more buckets and reload the data, but I haven't
>>>> seen a hard and fast rule for setting the number of buckets.
>>>>
>>>> We have 12 data nodes, 4 SSDs per node, 128 GB Ram per node, 24 core w/
>>>> hyperthreading (HDP 2.5 running, hbase is primary service).
>>>> and 800+ regions per RS (seems high)
>>>>
>>>> Any orientation on this would be greatly appreciated.
>>>>
>>>>
>>>> On Tue, Jun 20, 2017 at 11:54 AM, Josh Elser <josh.el...@gmail.com>
>>>> wrote:
>>>>
>>>>> I think this is more of an issue of your 78 salt buckets than the
>>>>> width of your table. Each chunk, running in parallel, is spilling
>>>>> incremental counts to disk.
>>>>>
>>>>> I'd check your ulimit settings on the node which you run this query
>>>>> from and try to increase the number of open files allowed before going 
>>>>> into
>>>>> this one in more depth :)
>>>>>
>>>>>
>>>>> On 6/16/17 2:31 PM, Michael Young wrote:
>>>>>
>>>>>>
>>>>>> We are running a 13-node hbase cluster.  One table uses 78 SALT
>>>>>> BUCKETS which seems to work reasonable well f

Re: Getting too many open files during table scan

2017-06-22 Thread Michael Young
We started with no salt buckets, but the performance was terrible in our
testing.

A leading date column is in our schema model.  We don't seem to be getting
hotspotting after salting.  Date range scans are very common as are slice
and dice on many dimension columns.

We have tested with a range of SALT values from 0 to 120 for bulk loading,
upserts, selects at different concurrent load levels on a test cluster
before moving to production (with some tweaking post-production).  However,
we had fewer average regions per RS during the testing.  The larger SALT
numbers definitely gave overall better performance on our predominantly
read-heavy environment.

I appreciate any insights to identify bottlenecks.

On Thu, Jun 22, 2017 at 6:26 PM, James Taylor <jamestay...@apache.org>
wrote:

> My recommendation: don't use salt buckets unless you have a monotonically
> increasing row key, for example one that leads with the current date/time.
> Otherwise you'll be putting more load (# of salt buckets more load worst
> case) for bread-and-butter small-range-scan Phoenix queries.
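
To make that concrete: salting mostly pays off when the leading PK column only
ever increases, such as a current-date/time column; with a high-cardinality
leading key like client_id the writes are already spread out, and every point
lookup on a salted table has to probe all buckets. A sketch with hypothetical
names and an illustrative bucket count:

  -- Leading date/time key: new rows always land at the "end" of the key space,
  -- so salting spreads the write load across region servers.
  CREATE TABLE events_by_time (
      event_time DATE NOT NULL,
      event_id   VARCHAR NOT NULL,
      payload    VARCHAR
      CONSTRAINT pk PRIMARY KEY (event_time ROW_TIMESTAMP, event_id)
  ) SALT_BUCKETS=12;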
>
> Thanks,
> James
>
> On Fri, Jun 23, 2017 at 10:06 AM Michael Young <yomaiq...@gmail.com>
> wrote:
>
>> The ulimit open files was only 1024 for the user executing the query.
>> After increasing it, the queries behave better.
>>
>> How can we tell if we need to reduce/increase the number of salt buckets?
>>
>> Our team set this based on read/write performance using data volume and
>> expected queries to be run by users.
>>
>> However, now it seems the performance has degraded.  We can recreate the
>> schemas using fewer/more buckets and reload the data, but I haven't seen a
>> hard and fast rule for setting the number of buckets.
>>
>> We have 12 data nodes, 4 SSDs per node, 128 GB Ram per node, 24 core w/
>> hyperthreading (HDP 2.5 running, hbase is primary service).
>> and 800+ regions per RS (seems high)
>>
>> Any orientation on this would be greatly appreciated.
>>
>>
>> On Tue, Jun 20, 2017 at 11:54 AM, Josh Elser <josh.el...@gmail.com>
>> wrote:
>>
>>> I think this is more of an issue of your 78 salt buckets than the width
>>> of your table. Each chunk, running in parallel, is spilling incremental
>>> counts to disk.
>>>
>>> I'd check your ulimit settings on the node which you run this query from
>>> and try to increase the number of open files allowed before going into this
>>> one in more depth :)
>>>
>>>
>>> On 6/16/17 2:31 PM, Michael Young wrote:
>>>
>>>>
>>>> We are running a 13-node hbase cluster.  One table uses 78 SALT BUCKETS
>>>> which seems to work reasonably well for both read and write.  This table
>>>> has 130 columns with a PK having 30 columns (fairly wide table).
>>>>
>>>> However, after adding several new tables we are seeing errors about too
>>>> many open files when running a full table scan.
>>>>
>>>>
>>>> Caused by: org.apache.phoenix.exception.PhoenixIOException: Too many
>>>> open files
>>>>  at org.apache.phoenix.util.ServerUtil.parseServerException(
>>>> ServerUtil.java:111)
>>>>  at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(
>>>> SpoolingResultIterator.java:152)
>>>>  at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(
>>>> SpoolingResultIterator.java:84)
>>>>  at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(
>>>> SpoolingResultIterator.java:63)
>>>>  at org.apache.phoenix.iterate.SpoolingResultIterator$
>>>> SpoolingResultIteratorFactory.newIterator(SpoolingResultIterator.java:
>>>> 79)
>>>>  at org.apache.phoenix.iterate.ParallelIterators$1.call(
>>>> ParallelIterators.java:112)
>>>>  at org.apache.phoenix.iterate.ParallelIterators$1.call(
>>>> ParallelIterators.java:103)
>>>>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>>  at org.apache.phoenix.job.JobManager$
>>>> InstrumentedJobFutureTask.run(JobManager.java:183)
>>>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(
>>>> ThreadPoolExecutor.java:1142)
>>>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>>> ThreadPoolExecutor.java:617)
>>>>  at java.lang.Thread.run(Thread.java:745)
>>>> Caused by: java.io.IOException: Too many open files
>>>>  at java.io.UnixFileSystem.createFileExclusively(N

Re: Getting too many open files during table scan

2017-06-22 Thread Michael Young
The ulimit open files was only 1024 for the user executing the query.
After increasing it, the queries behave better.

How can we tell if we need to reduce/increase the number of salt buckets?

Our team set this based on read/write performance using data volume and
expected queries to be run by users.

However, now it seems the performance has degraded.  We can recreate the
schemas using fewer/more buckets and reload the data, but I haven't seen a
hard and fast rule for setting the number of buckets.

We have 12 data nodes, 4 SSDs per node, 128 GB Ram per node, 24 core w/
hyperthreading (HDP 2.5 running, hbase is primary service).
and 800+ regions per RS (seems high)

Any orientation on this would be greatly appreciated.


On Tue, Jun 20, 2017 at 11:54 AM, Josh Elser <josh.el...@gmail.com> wrote:

> I think this is more of an issue of your 78 salt buckets than the width of
> your table. Each chunk, running in parallel, is spilling incremental counts
> to disk.
>
> I'd check your ulimit settings on the node which you run this query from
> and try to increase the number of open files allowed before going into this
> one in more depth :)
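
For anyone hitting the same wall, checking and raising the limit looks roughly
like this on a typical Linux box (the 65536 value and the user names are only
examples; the right number depends on how many parallel chunks your queries
spool):

  # as the user running the query (or the PQS / region server user)
  ulimit -n        # shows the current open-file limit; 1024 is a common default

  # to raise it persistently, add lines like these to /etc/security/limits.conf
  # and start a new login session (or restart the affected service):
  #   hbase   -  nofile  65536
  #   myuser  -  nofile  65536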
>
>
> On 6/16/17 2:31 PM, Michael Young wrote:
>
>>
>> We are running a 13-node hbase cluster.  One table uses 78 SALT BUCKETS
>> which seems to work reasonably well for both read and write.  This table
>> has 130 columns with a PK having 30 columns (fairly wide table).
>>
>> However, after adding several new tables we are seeing errors about too
>> many open files when running a full table scan.
>>
>>
>> Caused by: org.apache.phoenix.exception.PhoenixIOException: Too many
>> open files
>>  at org.apache.phoenix.util.ServerUtil.parseServerException(Serv
>> erUtil.java:111)
>>  at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(Spo
>> olingResultIterator.java:152)
>>  at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(Spo
>> olingResultIterator.java:84)
>>  at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(Spo
>> olingResultIterator.java:63)
>>  at org.apache.phoenix.iterate.SpoolingResultIterator$SpoolingRe
>> sultIteratorFactory.newIterator(SpoolingResultIterator.java:79)
>>  at org.apache.phoenix.iterate.ParallelIterators$1.call(Parallel
>> Iterators.java:112)
>>  at org.apache.phoenix.iterate.ParallelIterators$1.call(Parallel
>> Iterators.java:103)
>>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>  at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.
>> run(JobManager.java:183)
>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1142)
>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:617)
>>  at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.io.IOException: Too many open files
>>  at java.io.UnixFileSystem.createFileExclusively(Native Method)
>>  at java.io.File.createTempFile(File.java:2024)
>>  at org.apache.phoenix.shaded.org.apache.commons.io.output.Defer
>> redFileOutputStream.thresholdReached(DeferredFileOutputStream.java:176)
>>  at org.apache.phoenix.iterate.SpoolingResultIterator$1.threshol
>> dReached(SpoolingResultIterator.java:116)
>>  at org.apache.phoenix.shaded.org.apache.commons.io.output.Thres
>> holdingOutputStream.checkThreshold(ThresholdingOutputStream.java:224)
>>  at org.apache.phoenix.shaded.org.apache.commons.io.output.Thres
>> holdingOutputStream.write(ThresholdingOutputStream.java:92)
>>  at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
>>  at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.
>> java:273)
>>  at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.
>> java:253)
>>  at org.apache.phoenix.util.TupleUtil.write(TupleUtil.java:149)
>>  at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(Spo
>> olingResultIterator.java:127)
>>  ... 10 more
>>
>>
>> When running an explain plan:
>> explain select count(1) from MYBIGTABLE
>>
>> +------------------------------------------------------------------------+
>> |                                  PLAN                                  |
>> +------------------------------------------------------------------------+
>> | CLIENT 8728-CHUNK 674830174 ROWS 2721056772632 BYTES PARALLEL 78-WAY
>> FULL SCAN OVER ATT.PRE_E

Re: Best strategy for UPSERT SELECT in large table

2017-06-19 Thread Michael Young
Regarding your idea to use the snapshot/restore method (with a new name).
Is it possible to add a PK column with that approach?  For example, if I
wanted to change a PK column type from VARCHAR to FLOAT, is this possible?



On Sun, Jun 18, 2017 at 10:50 AM, Jonathan Leech  wrote:

> Also, if you're updating that many values and not doing it in bulk /
> mapreduce / straight to hfiles, you'll want to give the region servers as
> much heap as possible, set store files and blocking store files
> astronomically high, and set the memory size for the table before Hbase
> flushes to disk as large as possible. This is to avoid compactions slowing
> you down and causing timeouts. You can also break up the upsert selects
> into smaller chunks and manually compact in between to mitigate. The above
> strategy also applies for other large updates in the regular Hbase write
> path, such as building or rebuilding indexes.
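
The knobs Jonathan mentions correspond to hbase-site.xml settings; a rough sketch
with illustrative values only (the right numbers depend on heap size and ingest
volume, so treat these as placeholders rather than recommendations):

  <!-- allow more store files to pile up before writes are blocked -->
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>200</value>
  </property>

  <!-- larger memstore flush size: fewer, bigger flushes during heavy upserts -->
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>268435456</value>
  </property>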
>
> > On Jun 18, 2017, at 11:41 AM, Jonathan Leech  wrote:
> >
> > Another thing to consider, but only if your 1:1 mapping keeps the
> primary keys the same, is to snapshot the table and restore it with the new
> name, and a schema that is the union of the old and new schemas. I would
> put the new columns in a new column family. Then use upsert select,
> mapreduce, or Spark to transform the data, then drop the columns from the
> old schema. This strategy could cut the amount of work to be done by half
> and not send data over the network.
> >
> >> On Jun 17, 2017, at 5:06 PM, Randy Hu  wrote:
> >>
> >> If I count the number of trailing zeros correctly, it's 15 billion
> records,
> >> any solution based on HBase PUT interaction (UPSERT SELECT) would
> probably
> >> take way more time than your expectation. It would be better to use the
> >> map/reduce based bulk importer provided by Phoenix:
> >>
> >> https://phoenix.apache.org/bulk_dataload.html
> >>
> >> The importer leverages HBase bulk mode to convert all data into HBase
> >> storage file, then hand it over to HBase in the final stage, thus avoids
> >> all network and disk random access cost when going through HBase region
> >> servers.
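
A minimal sketch of the bulk loader from that page (jar path, table name, input
path, and ZooKeeper quorum are all placeholders):

  hadoop jar /path/to/phoenix-<version>-client.jar \
      org.apache.phoenix.mapreduce.CsvBulkLoadTool \
      --table MY_TABLE \
      --input /data/my_table.csv \
      --zookeeper zkhost:2181:/hbase-unsecure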
> >>
> >> Randy
> >>
> >> On Fri, Jun 16, 2017 at 9:51 AM, Pedro Boado [via Apache Phoenix User
> List]
> >>  wrote:
> >>
> >>> Hi guys,
> >>>
> >>> We are trying to populate a Phoenix table based on a 1:1 projection of
> >>> another table with around 15.000.000.000 records via an UPSERT SELECT
> in
> >>> phoenix client. We've noticed a very poor performance ( I suspect the
> >>> client is using a single-threaded approach ) and lots of issues with
> client
> >>> timeouts.
> >>>
> >>> Is there a better way of approaching this problem?
> >>>
> >>> Cheers!
> >>> Pedro
> >>>
> >>>
>


Re: Phoenix Upgrade compatibility

2017-06-16 Thread Michael Young
Updating my question here:

I found some info from a post on May 19th under (Upgrade Phoenix version on
HDP 2.4).

It seems for those using HDP 2.6, we are stuck with Phoenix 4.7 until the
next Hortonworks release. :(

To any Hortonworks people out there: Is there any way we could do a full
install of Phoenix 4.10 (essentially from scratch, not a rolling upgrade or
just replacing the jars), and get it to work in HDP 2.6?  I would think so,
but we don't want to live too much on the edge, just enough to not wait too
long. :)

Michael


On Fri, Jun 16, 2017 at 5:01 PM, Michael Young <yomaiq...@gmail.com> wrote:

> According to https://phoenix.apache.org/upgrading.html
>
> "Phoenix maintains backward compatibility across at least two minor
> releases to allow for *no downtime* through server-side rolling restarts
> upon upgrading. See below for details"
>
> We would like to upgrade from version 4.7 to a newer version.
>
> Any recommendations on which version we should get (latest?) and if we
> should do an incremental upgrade or move directly to the latest 4.10?
>
> Thanks,
> Michael
>
>


Phoenix Upgrade compatibility

2017-06-16 Thread Michael Young
According to https://phoenix.apache.org/upgrading.html

"Phoenix maintains backward compatibility across at least two minor
releases to allow for *no downtime* through server-side rolling restarts
upon upgrading. See below for details"

We would like to upgrade from version 4.7 to a newer version.

Any recommendations on which version we should get (latest?) and if we
should do an incremental upgrade or move directly to the latest 4.10?

Thanks,
Michael


Getting too many open files during table scan

2017-06-16 Thread Michael Young
We are running a 13-node hbase cluster.  One table uses 78 SALT BUCKETS
which seems to work reasonably well for both read and write.  This table
has 130 columns with a PK having 30 columns (fairly wide table).

However, after adding several new tables we are seeing errors about too
many open files when running a full table scan.


Caused by: org.apache.phoenix.exception.PhoenixIOException: Too many open files
    at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:111)
    at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:152)
    at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:84)
    at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:63)
    at org.apache.phoenix.iterate.SpoolingResultIterator$SpoolingResultIteratorFactory.newIterator(SpoolingResultIterator.java:79)
    at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:112)
    at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:103)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:183)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Too many open files
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createTempFile(File.java:2024)
    at org.apache.phoenix.shaded.org.apache.commons.io.output.DeferredFileOutputStream.thresholdReached(DeferredFileOutputStream.java:176)
    at org.apache.phoenix.iterate.SpoolingResultIterator$1.thresholdReached(SpoolingResultIterator.java:116)
    at org.apache.phoenix.shaded.org.apache.commons.io.output.ThresholdingOutputStream.checkThreshold(ThresholdingOutputStream.java:224)
    at org.apache.phoenix.shaded.org.apache.commons.io.output.ThresholdingOutputStream.write(ThresholdingOutputStream.java:92)
    at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
    at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:273)
    at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:253)
    at org.apache.phoenix.util.TupleUtil.write(TupleUtil.java:149)
    at org.apache.phoenix.iterate.SpoolingResultIterator.<init>(SpoolingResultIterator.java:127)
    ... 10 more


When running an explain plan:
explain select count(1) from MYBIGTABLE

+------------------------------------------------------------------------------------------------------------------+
|                                                       PLAN                                                        |
+------------------------------------------------------------------------------------------------------------------+
| CLIENT 8728-CHUNK 674830174 ROWS 2721056772632 BYTES PARALLEL 78-WAY FULL SCAN OVER ATT.PRE_ENG_CONVERSION_OLAP   |
| ROW TIMESTAMP FILTER [0, 9223372036854775807)                                                                     |
| SERVER FILTER BY FIRST KEY ONLY                                                                                   |
| SERVER AGGREGATE INTO SINGLE ROW                                                                                  |
+------------------------------------------------------------------------------------------------------------------+

It has a lot of chunks.  Normally this query would return at least some
result after running for a few minutes.  With appropriate filters in the
WHERE clause, the queries run fine.

Any suggestions on how to avoid this error and get better performance from
the table scans?  Realizing that we don't need to run full table scans
regularly, just trying to understand better best practices for Phoenix
Hbase.
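
One way to see how close a query is to this limit is to compare plans with and
without a filter on the leading PK column, since the chunk count in the first plan
line is roughly the number of spool files a wide-open scan can create (the
client-side spill threshold is phoenix.query.spoolThresholdBytes). A sketch,
assuming dt is the leading date column of MYBIGTABLE:

  -- full scan: thousands of chunks, each potentially spooling to a temp file
  EXPLAIN SELECT COUNT(1) FROM MYBIGTABLE;

  -- range scan on the leading PK column: far fewer chunks and open files
  EXPLAIN SELECT COUNT(1) FROM MYBIGTABLE
  WHERE dt >= TO_DATE('2017-06-01 00:00:00') AND dt < TO_DATE('2017-06-02 00:00:00');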

Thank you,
Michael


Getting negative timestamp on table joins

2017-06-13 Thread Michael Young
I'm getting the following error when joining a table with its cloned copy.


select oldtable.dt from OLD_QTABLE as oldtable INNER JOIN NEW_QTABLE as
newtable ON oldtable.dt=newtable.dt  group by oldtable.dt;

java.lang.IllegalArgumentException: Timestamp cannot be negative. minStamp:-4239669727594509121, maxStamp:3190689910792567145
    at org.apache.hadoop.hbase.io.TimeRange.<init>(TimeRange.java:76)
    at org.apache.phoenix.compile.ScanRanges.getAscTimeRange(ScanRanges.java:658)
    at org.apache.phoenix.compile.ScanRanges.getRowTimestampColumnRange(ScanRanges.java:629)
    at org.apache.phoenix.compile.ScanRanges.create(ScanRanges.java:79)
    at org.apache.phoenix.compile.WhereOptimizer.pushKeyExpressionsToScan(WhereOptimizer.java:305)
    at org.apache.phoenix.compile.WhereOptimizer.pushKeyExpressionsToScan(WhereOptimizer.java:104)
    at org.apache.phoenix.compile.WhereOptimizer.getKeyExpressionCombination(WhereOptimizer.java:379)
    at org.apache.phoenix.compile.QueryCompiler.getKeyExpressionCombinations(QueryCompiler.java:463)
    at org.apache.phoenix.compile.QueryCompiler.compileJoinQuery(QueryCompiler.java:317)
    at org.apache.phoenix.compile.JoinCompiler.optimize(JoinCompiler.java:1163)
    at org.apache.phoenix.compile.QueryCompiler.compileSelect(QueryCompiler.java:197)
    at org.apache.phoenix.compile.QueryCompiler.compile(QueryCompiler.java:160)



NEW_QTABLE is a copy of QTABLE.  I created a snapshot of QTABLE and cloned
snapshot using:

disable 'QTABLE'
snapshot 'QTABLE', 'NEW_QTABLE'
clone_snapshot 'QTABLE_SNAPSHOT', 'NEW_QTABLE'
delete_snapshot 'QTABLE_SNAPSHOT'
enable 'QTABLE'

I then recreate the NEW_QTABLE in Phoenix using the same ddl with the new
table name.

/usr/hdp/current/phoenix-client/bin/sqlline.py zkhost:2181:/hbase-unsecure

0: jdbc:phoenix:zkhost:2181:/hbase-unsecur> CREATE TABLE IF NOT EXISTS
NEW_QTABLE (... etc...);

Note: I did not use the CurrentSCN parameter on connection.

Any ideas how to fix this?

Cheers,
Michael


Re: Renaming table schema in hbase

2017-06-09 Thread Michael Young
BTW, I'm using version phoenix-4.7.0.2.5.0.0-1245

On Fri, Jun 9, 2017 at 12:58 PM, Michael Young <yomaiq...@gmail.com> wrote:

> Thanks, Ankit!
>
> I have an issue when connecting to sqlline.py using the 
> CurrentSCN=149091840
> (3/31/2017)
>
> I get >> java.sql.SQLException: ERROR 1010 (42M01): Not allowed to mutate
> table.
>
> I use TS=1 day prior to last timestamp in row data (
> timestamp=149100480 (4/1/2017)) and my data has only a single day's
> worth of data.
>
> However, interestingly I got no error at all when just connecting without
> this parameter.
>
> Running 'describe' on my table in Hbase I get the following:
>
> Table MyTable is ENABLED
> MyTable, {TABLE_ATTRIBUTES => {coprocessor$1 => '|org.apache.phoenix.
> coprocessor.ScanRegionObserver|805306366|', copr
> ocessor$2 => '|org.apache.phoenix.coprocessor.
> UngroupedAggregateRegionObserver|805306366|', coprocessor$3 =>
> '|org.apache.phoenix.copro
> cessor.GroupedAggregateRegionObserver|805306366|', coprocessor$4 =>
> '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|80530636
> 6|', coprocessor$5 => '|org.apache.phoenix.hbase.
> index.Indexer|805306366|org.apache.hadoop.hbase.index.
> codec.class=org.apache.phoenix.i
> ndex.PhoenixIndexCodec,index.builder=org.apache.phoenix.
> index.PhoenixIndexBuilder'}
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'M', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_
> DIFF', TTL => 'FOREVER', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0',
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE =>
>  '0'}
>
> Is it OK if I CREATE the table without this CurrentSCN parameter?
>
> The data appears to be there and I can query it, so it seems like it's
> working.
>
> Thanks,
> Michael
>
> On Thu, Jun 8, 2017 at 5:04 PM, Ankit Singhal <ankitsingha...@gmail.com>
> wrote:
>
>> Hi Michael,
>>
>> bq. "exact DDL" Does this mean including all qualifiers like
>> COMPRESSION='SNAPPY', SALT_BUCKETS=64, ... etc...? If we don't use the
>> exact DDL, will Phoenix not behave correctly?
>>  yes, Phoenix will not behave well if properties affecting the write path
>> are not consistent during the read time, SALT_BUCKETS=64 is an important
>> one.
>>
>> bq.if we want to add or remove IMMUTABLE_ROWS=true from the CREATE
>> statement?
>> Yes, you can ignore IMMUTABLE_ROWS=true or alter table to mutable later.
>>
>> {code}
>> ALTER TABLE table SET IMMUTABLE_ROWS=false
>> {code}
>>
>> bq. Also, is it correct to assume that the "1 day less" timestamp is
>> just so we use a timestamp prior to the CurrentSCN?
>> Using CurrentSCN less than the oldest timestamp of data during DDL will
>> just avoid adding empty KV for each row. As you are renaming the
>> existing Phoenix table, these empty KV will already be there and you will
>> just save time by doing so.
>>
>> Regards,
>> Ankit Singhal
>>
>> On Thu, Jun 8, 2017 at 1:34 PM, Michael Young <yomaiq...@gmail.com>
>> wrote:
>>
>>> I have a doubt about step 2 from Ankit Singhal's response in
>>> http://apache-phoenix-user-list.1124778.n5.nabble.com/Phoeni
>>> x-4-4-Rename-table-Supported-td1781.html
>>>
>>> He says:
>>>
>>> 2. Open phoenix connection at timestamp 1 day less than the oldest data
>>> in your tables ( by specifying ts in CurrentSCN ./sqlline.py
>>> "localhost;CurrentSCN=") and create table with the exact DDL used
>>> for old table but with the table name changed to new table
>>>
>>> "exact DDL" Does this mean including all qualifiers like
>>> COMPRESSION='SNAPPY', SALT_BUCKETS=64, ... etc...?
>>>
>>> If we don't use the exact DDL, will Phoenix not behave correctly?  Is it
>>> possible to add or change from that list, for example if we want to add or
>>> remove IMMUTABLE_ROWS=true from the CREATE statement?
>>>
>>> Also, is it correct to assume that the "1 day less" timestamp is just so
>>> we use a timestamp prior to the CurrentSCN?
>>>
>>>
>>> On Wed, Jun 7, 2017 at 5:06 PM, Michael Young <yomaiq...@gmail.com>
>>> wrote:
>>>
>>>> Ah ha... after some googling I found some info from Ankit, copied below
>>>>
>>>> Looks like I was missing step 2.  I don't know how to get the CurrentSCN.
>>>>
>>>> Can someone help me with this?
>>>&g

Re: Renaming table schema in hbase

2017-06-09 Thread Michael Young
Thanks, Ankit!

I have an issue when connecting to sqlline.py using the
CurrentSCN=149091840
(3/31/2017)

I get >> java.sql.SQLException: ERROR 1010 (42M01): Not allowed to mutate
table.

I use TS=1 day prior to last timestamp in row data (
timestamp=149100480 (4/1/2017)) and my data has only a single day's
worth of data.

However, interestingly I got no error at all when just connecting without
this parameter.

Running 'describe' on my table in Hbase I get the following:

Table MyTable is ENABLED
MyTable, {TABLE_ATTRIBUTES => {
  coprocessor$1 => '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',
  coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|',
  coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|',
  coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|',
  coprocessor$5 => '|org.apache.phoenix.hbase.index.Indexer|805306366|org.apache.hadoop.hbase.index.codec.class=org.apache.phoenix.index.PhoenixIndexCodec,index.builder=org.apache.phoenix.index.PhoenixIndexBuilder'}
COLUMN FAMILIES DESCRIPTION
{NAME => 'M', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false',
 KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 'FOREVER',
 COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', BLOCKCACHE => 'true',
 BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}

Is it OK if I CREATE the table without this CurrentSCN parameter?

The data appears to be there and I can query it, so it seems like it's
working.

Thanks,
Michael

On Thu, Jun 8, 2017 at 5:04 PM, Ankit Singhal <ankitsingha...@gmail.com>
wrote:

> Hi Michael,
>
> bq. "exact DDL" Does this mean including all qualifiers like
> COMPRESSION='SNAPPY', SALT_BUCKETS=64, ... etc...? If we don't use the
> exact DDL, will Phoenix not behave correctly?
>  yes, Phoenix will not behave well if properties affecting the write path
> are not consistent during the read time, SALT_BUCKETS=64 is an important
> one.
>
> bq.if we want to add or remove IMMUTABLE_ROWS=true from the CREATE
> statement?
> Yes, you can ignore IMMUTABLE_ROWS=true or alter table to mutable later.
>
> {code}
> ALTER TABLE table SET IMMUTABLE_ROWS=false
> {code}
>
> bq. Also, is it correct to assume that the "1 day less" timestamp is just
> so we use a timestamp prior to the CurrentSCN?
> Using CurrentSCN less than the oldest timestamp of data during DDL will
> just avoid adding empty KV for each row. As you are renaming the existing
> Phoenix table, these empty KV will already be there and you will just save
> time by doing so.
>
> Regards,
> Ankit Singhal
>
> On Thu, Jun 8, 2017 at 1:34 PM, Michael Young <yomaiq...@gmail.com> wrote:
>
>> I have a doubt about step 2 from Ankit Singhal's response in
>> http://apache-phoenix-user-list.1124778.n5.nabble.com/Phoeni
>> x-4-4-Rename-table-Supported-td1781.html
>>
>> He says:
>>
>> 2. Open phoenix connection at timestamp 1 day less than the oldest data
>> in your tables ( by specifying ts in CurrentSCN ./sqlline.py
>> "localhost;CurrentSCN=") and create table with the exact DDL used
>> for old table but with the table name changed to new table
>>
>> "exact DDL" Does this mean including all qualifiers like
>> COMPRESSION='SNAPPY', SALT_BUCKETS=64, ... etc...?
>>
>> If we don't use the exact DDL, will Phoenix not behave correctly?  Is it
>> possible to add or change from that list, for example if we want to add or
>> remove IMMUTABLE_ROWS=true from the CREATE statement?
>>
>> Also, is it correct to assume that the "1 day less" timestamp is just so
>> we use a timestamp prior to the CurrentSCN?
>>
>>
>> On Wed, Jun 7, 2017 at 5:06 PM, Michael Young <yomaiq...@gmail.com>
>> wrote:
>>
>>> Ah ha... after some googling I found some info from Ankit, copied below
>>>
>>> Looks like I was missing step 2.  I don't know how to get the CurrentSCN.
>>>
>>> Can someone help me with this?
>>>
>>> == From 
>>> http://apache-phoenix-user-list.1124778.n5.nabble.com/Phoenix-4-4-Rename-table-Supported-td1781.html
>>>  
>>>
>>> Currently there is no sql construct but you can do it by following below
>>> steps.(It is highly recommended you try these steps in dev environment
>>> before proceeding to production.
>>>
>>> 1. Take snapshot of the original table from hbase shell and restore it
>>> with another table name.
>>>
>>> hbase> disable 'oldtablename'
>>>
>>> hbase> snap

Re: Renaming table schema in hbase

2017-06-08 Thread Michael Young
I have a doubt about step 2 from Ankit Singhal's response in
http://apache-phoenix-user-list.1124778.n5.nabble.com/Phoenix-4-4-Rename-table-Supported-td1781.html

He says:

2. Open phoenix connection at timestamp 1 day less than the oldest data in
your tables ( by specifying ts in CurrentSCN ./sqlline.py
"localhost;CurrentSCN=") and create table with the exact DDL used for
old table but with the table name changed to new table

"exact DDL" Does this mean including all qualifiers like
COMPRESSION='SNAPPY', SALT_BUCKETS=64, ... etc...?

If we don't use the exact DDL, will Phoenix not behave correctly?  Is it
possible to add or change from that list, for example if we want to add or
remove IMMUTABLE_ROWS=true from the CREATE statement?

Also, is it correct to assume that the "1 day less" timestamp is just so we
use a timestamp prior to the CurrentSCN?


On Wed, Jun 7, 2017 at 5:06 PM, Michael Young <yomaiq...@gmail.com> wrote:

> Ah ha... after some googling I found some info from Ankit, copied below
>
> Looks like I was missing step 2.  I don't know how to get the CurrentSCN.
>
> Can someone help me with this?
>
> == From 
> http://apache-phoenix-user-list.1124778.n5.nabble.com/Phoenix-4-4-Rename-table-Supported-td1781.html
>  
>
> Currently there is no sql construct but you can do it by following below
> steps.(It is highly recommended you try these steps in dev environment
> before proceeding to production.
>
> 1. Take snapshot of the original table from hbase shell and restore it
> with another table name.
>
> hbase> disable 'oldtablename'
>
> hbase> snapshot 'oldtablename', 'oldtablename_Snapshot'
>
> hbase> clone_snapshot 'oldtablename_Snapshot', 'newTableName'
>
> 2. Open phoenix connection at timestamp 1 day less than the oldest data in
> your tables ( by specifying ts in CurrentSCN ./sqlline.py
> "localhost;CurrentSCN=") and create table with the exact DDL used for
> old table but with the table name changed to new table.
>
> 3. confirm that your new table is working fine as expected .
> 4. Then drop the old table from phoenix and snapshot from hbase shell.
>
> hbase> delete_snapshot 'oldtablename_Snapshot'
>
>
>
> On Wed, Jun 7, 2017 at 4:58 PM, Michael Young <yomaiq...@gmail.com> wrote:
>
>> It is possible to rename a table in Hbase, but it doesn't appear to be
>> recognized by Phoenix.
>>
>> I use the approach documented for HBase:
>>
>> disable 'MySchema.TABLE1'
>> snapshot 'MySchema.TABLE1', 'MySchema.TABLE1_SNAPSHOT'
>> clone_snapshot 'MySchema.TABLE1_SNAPSHOT', NewSchema.TABLE1'
>> delete_snapshot 'MySchema.TABLE1_SNAPSHOT'
>> #drop 'MySchema.TABLE1' - let's test before dropping
>>
>> However, when testing this Phoenix doesn't seem to see the new cloned
>> table and there is nothing in the SYSTEM.CATALOG for it, even after
>> restarting the phoenix client.
>>
>> Should this mechanism work in Phoenix or is there another way to rename
>> or clone an existing table?
>>
>> Michael
>>
>>
>


Re: Renaming table schema in hbase

2017-06-07 Thread Michael Young
Ah ha... after some googling I found some info from Ankit, copied below

Looks like I was missing step 2.  I don't know how to get the CurrentSCN.

Can someone help me with this?

== From 
http://apache-phoenix-user-list.1124778.n5.nabble.com/Phoenix-4-4-Rename-table-Supported-td1781.html


Currently there is no sql construct but you can do it by following below
steps.(It is highly recommended you try these steps in dev environment
before proceeding to production.

1. Take snapshot of the original table from hbase shell and restore it with
another table name.

hbase> disable 'oldtablename'

hbase> snapshot 'oldtablename', 'oldtablename_Snapshot'

hbase> clone_snapshot 'oldtablename_Snapshot', 'newTableName'

2. Open phoenix connection at timestamp 1 day less than the oldest data in
your tables ( by specifying ts in CurrentSCN ./sqlline.py
"localhost;CurrentSCN=") and create table with the exact DDL used for
old table but with the table name changed to new table.

3. confirm that your new table is working fine as expected .
4. Then drop the old table from phoenix and snapshot from hbase shell.

hbase> delete_snapshot 'oldtablename_Snapshot'
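
Putting step 2 together, the connection and DDL look roughly like this. The SCN
value below is a placeholder for epoch-millisecond time at least a day older than
the oldest cell in the table, and the column list is only a stand-in for whatever
the old table's exact DDL was (the real options, such as COMPRESSION='SNAPPY' and
SALT_BUCKETS, must be repeated verbatim):

  ./sqlline.py "zkhost:2181:/hbase-unsecure;CurrentSCN=1490918400000"

  -- inside that session, re-issue the old table's DDL under the new name, e.g.:
  CREATE TABLE IF NOT EXISTS NEWSCHEMA.TABLE1 (
      client_id VARCHAR NOT NULL,
      dt        DATE NOT NULL,
      requests  BIGINT
      CONSTRAINT pk PRIMARY KEY (client_id, dt)
  ) COMPRESSION='SNAPPY', SALT_BUCKETS=64;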



On Wed, Jun 7, 2017 at 4:58 PM, Michael Young <yomaiq...@gmail.com> wrote:

> It is possible to rename a table in Hbase, but it doesn't appear to be
> recognized by Phoenix.
>
> I use the approach documented for HBase:
>
> disable 'MySchema.TABLE1'
> snapshot 'MySchema.TABLE1', 'MySchema.TABLE1_SNAPSHOT'
> clone_snapshot 'MySchema.TABLE1_SNAPSHOT', NewSchema.TABLE1'
> delete_snapshot 'MySchema.TABLE1_SNAPSHOT'
> #drop 'MySchema.TABLE1' - let's test before dropping
>
> However, when testing this Phoenix doesn't seem to see the new cloned
> table and there is nothing in the SYSTEM.CATALOG for it, even after
> restarting the phoenix client.
>
> Should this mechanism work in Phoenix or is there another way to rename or
> clone an existing table?
>
> Michael
>
>


Renaming table schema in hbase

2017-06-07 Thread Michael Young
It is possible to rename a table in Hbase, but it doesn't appear to be
recognized by Phoenix.

I use the approach documented for HBase:

disable 'MySchema.TABLE1'
snapshot 'MySchema.TABLE1', 'MySchema.TABLE1_SNAPSHOT'
clone_snapshot 'MySchema.TABLE1_SNAPSHOT', NewSchema.TABLE1'
delete_snapshot 'MySchema.TABLE1_SNAPSHOT'
#drop 'MySchema.TABLE1' - let's test before dropping

However, when testing this Phoenix doesn't seem to see the new cloned table
and there is nothing in the SYSTEM.CATALOG for it, even after restarting
the phoenix client.

Should this mechanism work in Phoenix or is there another way to rename or
clone an existing table?

Michael


Re: Row timestamp usage

2017-05-22 Thread Michael Young
Sergey,
at this point I haven't noticed any issues with the CsvBulkLoadTool on my
version (HDP2.5 with phoenix-4.7.0.2.5.0.0-1245).  I don't have any
secondary indexes, if that is what you are referring to.

If you know any potential side effects, I can look for them.  I will check
JIRA as well.

Thanks for this information!

On Mon, May 22, 2017 at 4:41 PM, Sergey Soldatov <sergeysolda...@gmail.com>
wrote:

> AFAIK depending on the version of Phoenix you are using, you may
> experience problems with MR bulk load or indexes. Possibly some other 'side
> effects' - try searching JIRAs for 'ROW TIMESTAMP'.  There is no way to
> alter the column type except drop/create this column.
>
> Thanks,
> Sergey
>
>
> On Mon, May 22, 2017 at 3:45 PM, Michael Young <yomaiq...@gmail.com>
> wrote:
>
>> I am using a DATE type column as one of the leading columns in my PK and
>> I am defining it as "ROW TIMESTAMP" to take advantage of the optimizations
>> mentioned here:https://phoenix.apache.org/rowtimestamp.html
>>
>> Are there any disadvantages to using this feature?  My PK has 20+ columns
>> (queries are done over date ranges so I am interested in any optimizations
>> which help such queries).  The value is set on UPSERT to the daily value,
>> the hour/minutes aren't really needed for my use cases so I just use
>> midnight 00:00.000 (eg. 2017-01-01 00:00:00.000).
>>
>> Once it's set, is there a way to alter the column type to be a regular
>> DATE type?  Or would I need to recreate the table?
>>
>> Just wondering out of curiosity in case there are instances where I
>> should not be using this feature.
>>
>> Cheers,
>> -Michael
>>
>>
>


Row timestamp usage

2017-05-22 Thread Michael Young
I am using a DATE type column as one of the leading columns in my PK and I
am defining it as "ROW TIMESTAMP" to take advantage of the optimizations
mentioned here:https://phoenix.apache.org/rowtimestamp.html
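
For reference, the designation itself is part of the primary key constraint; a
minimal sketch with hypothetical table and column names (only one PK column, of a
date/time/bigint type, can be marked this way):

  CREATE TABLE daily_metrics (
      dt        DATE NOT NULL,
      client_id VARCHAR NOT NULL,
      requests  BIGINT
      CONSTRAINT pk PRIMARY KEY (dt ROW_TIMESTAMP, client_id)
  );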

Are there any disadvantages to using this feature?  My PK has 20+ columns
(queries are done over date ranges so I am interested in any optimizations
which help such queries).  The value is set on UPSERT to the daily value,
the hour/minutes aren't really needed for my use cases so I just use
midnight 00:00.000 (eg. 2017-01-01 00:00:00.000).

Once it's set, is there a way to alter the column type to be a regular DATE
type?  Or would I need to recreate the table?

Just wondering out of curiosity in case there are instances where I should
not be using this feature.

Cheers,
-Michael


Unable to delete from table

2017-04-28 Thread Michael Young
This is probably a newbie question, but has anyone run across a case where
a simple 'delete from tbl' fails to work?

my_tmp_table has 29,728 rows

I run:
delete from my_tmp_table;
29,728 rows affected (0.08 seconds)

select count(*) from my_tmp_table;
'COUNT(1)'
'29728'

No rows were deleted from this table.  My table doesn't have
IMMUTABLE_ROWS=true, but even if it did I would expect delete to work.

We use phoenix-4.7.0.2.5.0.0-1245 in HDP 2.5 release.

Are there any checks I can do to identify the cause (without restarting
hbase, that is)?

-Michael


Re: Phoenix Query Server query logging

2017-04-12 Thread Michael Young
James,

Our users access phoenix via the query server.  Is there a way to use this
metric framework with PQS to log user query data (eg. number of queries,
number of failures, execution time, 90/95/99 percentiles, topN queries
etc...?  We'd like to know what users are running and what might cause
performance issues or errors during usage.

Also, any other way to analyze these logs besides Splunk?

On Tue, Apr 11, 2017 at 4:05 PM, James Taylor <jamestay...@apache.org>
wrote:

> FWIW, we use our metric framework[1] to collect that information,
> outputting one log line per statement or query and then Splunk to look at
> it.
>
> [1] http://phoenix.apache.org/metrics.html
>
> On Tue, Apr 11, 2017 at 3:24 PM Michael Young <yomaiq...@gmail.com> wrote:
>
>> Yes, the tracing webapp is working, at least in our distro HDP 2.5
>> release.
>>
>> However, it seemed to negatively impact our performance and created a
>> large volume of trace data which was somewhat overwhelming.
>>
>> We could not get simple SELECT query logging and  query time info from
>> the trace logs it produces.  So it didn't seem appropriate to address the
>> type of simple query logging we have in our use case.  I suppose it is more
>> for detailed tracing use cases.
>>
>> Cheers,
>> Michael
>>
>> On Mon, Apr 3, 2017 at 2:28 PM, Ryan Templeton <
>> rtemple...@hortonworks.com> wrote:
>>
>> I see there’s a phoenix-tracing-webapp project in the build plus this on
>> the website - https://phoenix.apache.org/tracing.html
>>
>> Is this project still working and usable? The project looks like it’s had
>> updates as of a few months ago…
>>
>>
>> Thanks,
>> Ryan
>>
>> On 3/3/17, 10:33 AM, "Josh Elser" <els...@apache.org> wrote:
>>
>> >https://github.com/apache/calcite/blob/master/avatica/
>> server/src/main/java/org/apache/calcite/avatica/jdbc/JdbcMeta.java
>> >
>> >This is ultimately where the requests from the client using the thin
>> >JDBC driver get executed inside of PQS. The API's aren't 100%, but it
>> >should be obvious what is getting invoked with you're familiar with the
>> >JDBC APIs.
>> >
>> >Michael Young wrote:
>> >> Josh,
>> >> I am interested in looking at adding this to Avatica myself, although
>> >> I'm not familiar with that code base.
>> >>
>> >> Can you point me to where in the avatica code I should look at to add
>> >> this logging?
>> >>
>> >> Cheers
>> >>
>> >>
>> >> On Tue, Feb 28, 2017 at 4:15 AM, Josh Elser <els...@apache.org
>> >> <mailto:els...@apache.org>> wrote:
>> >>
>> >> No, I don't believe there is any log4j logging done in PQS that
>> >> would show queries being executed.
>> >>
>> >> Ideally, we would have a "query log" in Phoenix which would present
>> >> an interface to this data and it wouldn't require anything special
>> >> in PQS. However, I wouldn't be opposed to some trivial additions to
>> >> PQS (Avatica, really) to add a simple logging as a stopgap.
>> >>
>> >>
>> >> On Feb 27, 2017 20:49, "Michael Young" <yomaiq...@gmail.com
>> >> <mailto:yomaiq...@gmail.com>> wrote:
>> >>
>> >> I hadn't seen a reply to my earlier question.
>> >>
>> >> We have business analysts running queries using BI tools (like
>> >> Tableau) which connect via the Phoenix Query Server.
>> >>
>> >> How can we log all SELECT queries (raw query, start time, end
>> >> time, etc...)?
>> >>
>> >> Any way to tweak log4j or other properties to get this?  The
>> >> TRACE logging I tried (mentioned in my post above) was way too
>> >> dense to be useful for reporting usage, and doesn't seem to
>> show
>> >> the full SQL query params and query start/end times.  Also, it
>> >> logs every UPSERT during data load (which overwhelms the log
>> >> files).  We really just need SELECTS logged.
>> >>
>> >>
>> >>
>> >> On Tue, Jan 31, 2017 at 5:10 PM, Michael Young
>> >> <yomaiq...@gmail.com <mailto:yomaiq...@gmail.com>> wrote:
>> >>
>> >> Does the Phoenix Query Server have an option to log the SQL
>>

Re: Phoenix Query Server query logging

2017-04-11 Thread Michael Young
Yes, the tracing webapp is working, at least in our distro HDP 2.5 release.

However, it seemed to negatively impact our performance and created a large
volume of trace data which was somewhat overwhelming.

We could not get simple SELECT query logging and query time info from the
trace logs it produces, so it didn't seem appropriate for the type of simple
query logging we need in our use case.  I suppose it is more for detailed
tracing use cases.

Cheers,
Michael

On Mon, Apr 3, 2017 at 2:28 PM, Ryan Templeton <rtemple...@hortonworks.com>
wrote:

> I see there’s a phoenix-tracing-webapp project in the build plus this on
> the website - https://phoenix.apache.org/tracing.html
>
> Is this project still working and usable? The project looks like it’s had
> updates as of a few months ago…
>
>
> Thanks,
> Ryan
>
> On 3/3/17, 10:33 AM, "Josh Elser" <els...@apache.org> wrote:
>
> >https://github.com/apache/calcite/blob/master/avatica/
> server/src/main/java/org/apache/calcite/avatica/jdbc/JdbcMeta.java
> >
> >This is ultimately where the requests from the client using the thin
> >JDBC driver get executed inside of PQS. The API's aren't 100%, but it
> >should be obvious what is getting invoked with you're familiar with the
> >JDBC APIs.
> >
> >Michael Young wrote:
> >> Josh,
> >> I am interested in looking at adding this to Avatica myself, although
> >> I'm not familiar with that code base.
> >>
> >> Can you point me to where in the avatica code I should look at to add
> >> this logging?
> >>
> >> Cheers
> >>
> >>
> >> On Tue, Feb 28, 2017 at 4:15 AM, Josh Elser <els...@apache.org
> >> <mailto:els...@apache.org>> wrote:
> >>
> >> No, I don't believe there is any log4j logging done in PQS that
> >> would show queries being executed.
> >>
> >> Ideally, we would have a "query log" in Phoenix which would present
> >> an interface to this data and it wouldn't require anything special
> >> in PQS. However, I wouldn't be opposed to some trivial additions to
> >> PQS (Avatica, really) to add a simple logging as a stopgap.
> >>
> >>
> >> On Feb 27, 2017 20:49, "Michael Young" <yomaiq...@gmail.com
> >> <mailto:yomaiq...@gmail.com>> wrote:
> >>
> >> I hadn't seen a reply to my earlier question.
> >>
> >> We have business analysts running queries using BI tools (like
> >> Tableau) which connect via the Phoenix Query Server.
> >>
> >> How can we log all SELECT queries (raw query, start time, end
> >> time, etc...)?
> >>
> >> Any way to tweak log4j or other properties to get this?  The
> >> TRACE logging I tried (mentioned in my post above) was way too
> >> dense to be useful for reporting usage, and doesn't seem to show
> >> the full SQL query params and query start/end times.  Also, it
> >> logs every UPSERT during data load (which overwhelms the log
> >> files).  We really just need SELECTS logged.
> >>
> >>
> >>
> >> On Tue, Jan 31, 2017 at 5:10 PM, Michael Young
> >> <yomaiq...@gmail.com <mailto:yomaiq...@gmail.com>> wrote:
> >>
> >> Does the Phoenix Query Server have an option to log the SQL
> >> statements which are executed?
> >>
> >> We see there are ways to get various PQS trace logs
> >> modifying the log4j settings used by the queryserver.py:
> >>
> >> log4j.org.apache.phoenix.jdbc.PhoenixStatementFactory=TRACE
> >> (or DEBUG)
> >> log4j.org.apache.phoenix.jdbc.PhoenixStatement=TRACE
> >> log4j.logger.org.apache.calcite.avatica=TRACE
> >> log4j.logger.org.apache.phoenix.queryserver.server=TRACE
> >> etc...
> >>
> >> but the data in the trace logs (which show SQL statements)
> >> are not particularly user friendly.  And it does not seem
> >> straightforward to get to end-to-end query execution times.
> >>
> >> Any suggestions how to get simple SQL logs (raw query,
> >> execution time, ...)?  The idea is to monitor user activity
> >> and take action if query times are slow, or timeout.
> >>
> >> Thanks,
> >> Michael
> >>
> >>
> >>
> >>
> >
>


Re: Phoenix Query Server query logging

2017-03-02 Thread Michael Young
Josh,
I am interested in looking at adding this to Avatica myself, although I'm
not familiar with that code base.

Can you point me to where in the avatica code I should look at to add this
logging?

Cheers


On Tue, Feb 28, 2017 at 4:15 AM, Josh Elser <els...@apache.org> wrote:

> No, I don't believe there is any log4j logging done in PQS that would show
> queries being executed.
>
> Ideally, we would have a "query log" in Phoenix which would present an
> interface to this data and it wouldn't require anything special in PQS.
> However, I wouldn't be opposed to some trivial additions to PQS (Avatica,
> really) to add a simple logging as a stopgap.
>
>
> On Feb 27, 2017 20:49, "Michael Young" <yomaiq...@gmail.com> wrote:
>
> I hadn't seen a reply to my earlier question.
>
> We have business analysts running queries using BI tools (like Tableau)
> which connect via the Phoenix Query Server.
>
> How can we log all SELECT queries (raw query, start time, end time,
> etc...)?
>
> Any way to tweak log4j or other properties to get this?  The TRACE logging
> I tried (mentioned in my post above) was way too dense to be useful for
> reporting usage, and doesn't seem to show the full SQL query params and
> query start/end times.  Also, it logs every UPSERT during data load (which
> overwhelms the log files).  We really just need SELECTS logged.
>
>
>
> On Tue, Jan 31, 2017 at 5:10 PM, Michael Young <yomaiq...@gmail.com>
> wrote:
>
>> Does the Phoenix Query Server have an option to log the SQL statements
>> which are executed?
>>
>> We see there are ways to get various PQS trace logs by modifying the log4j
>> settings used by queryserver.py:
>>
>> log4j.org.apache.phoenix.jdbc.PhoenixStatementFactory=TRACE (or DEBUG)
>> log4j.org.apache.phoenix.jdbc.PhoenixStatement=TRACE
>> log4j.logger.org.apache.calcite.avatica=TRACE
>> log4j.logger.org.apache.phoenix.queryserver.server=TRACE
>> etc...
>>
>> but the data in the trace logs (which show SQL statements) are not
>> particularly user friendly.  And it does not seem straightforward to get
>> end-to-end query execution times.
>>
>> Any suggestions how to get simple SQL logs (raw query, execution time,
>> ...)?  The idea is to monitor user activity and take action if query times
>> are slow, or timeout.
>>
>> Thanks,
>> Michael
>>
>
>
>


Re: Phoenix Query Server query logging

2017-02-27 Thread Michael Young
I hadn't seen a reply to my earlier question.

We have business analysts running queries using BI tools (like Tableau)
which connect via the Phoenix Query Server.

How can we log all SELECT queries (raw query, start time, end time, etc...)?

Any way to tweak log4j or other properties to get this?  The TRACE logging
I tried (mentioned in my post above) was way too dense to be useful for
reporting usage, and doesn't seem to show the full SQL query params and
query start/end times.  Also, it logs every UPSERT during data load (which
overwhelms the log files).  We really just need SELECTS logged.



On Tue, Jan 31, 2017 at 5:10 PM, Michael Young <yomaiq...@gmail.com> wrote:

> Does the Phoenix Query Server have an option to log the SQL statements
> which are executed?
>
> We see there are ways to get various PQS trace logs by modifying the log4j
> settings used by queryserver.py:
>
> log4j.org.apache.phoenix.jdbc.PhoenixStatementFactory=TRACE (or DEBUG)
> log4j.org.apache.phoenix.jdbc.PhoenixStatement=TRACE
> log4j.logger.org.apache.calcite.avatica=TRACE
> log4j.logger.org.apache.phoenix.queryserver.server=TRACE
> etc...
>
> but the data in the trace logs (which show SQL statements) are not
> particularly user friendly.  And it does not seem straightforward to get
> end-to-end query execution times.
>
> Any suggestions how to get simple SQL logs (raw query, execution time,
> ...)?  The idea is to monitor user activity and take action if query times
> are slow, or timeout.
>
> Thanks,
> Michael
>

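A partial workaround, sketched under the assumption that PQS is picking up a log4j 1.x properties file (as the logger names above suggest): route the Phoenix statement logger to its own rolling file so the statement trace at least stops flooding the main query server log. Note that this still captures UPSERTs along with SELECTs, so SELECT-only reporting would need post-filtering of that file; the appender name and file path are placeholders.

# Hypothetical appender isolating Phoenix statement logging (log4j 1.x properties syntax)
log4j.logger.org.apache.phoenix.jdbc.PhoenixStatement=DEBUG, QUERYLOG
log4j.additivity.org.apache.phoenix.jdbc.PhoenixStatement=false
log4j.appender.QUERYLOG=org.apache.log4j.RollingFileAppender
log4j.appender.QUERYLOG.File=/var/log/phoenix/queryserver-statements.log
log4j.appender.QUERYLOG.MaxFileSize=100MB
log4j.appender.QUERYLOG.MaxBackupIndex=10
log4j.appender.QUERYLOG.layout=org.apache.log4j.PatternLayout
log4j.appender.QUERYLOG.layout.ConversionPattern=%d{ISO8601} %-5p %c - %m%n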

Re: Phoenix Query Server tenant_id

2017-02-23 Thread Michael Young
The missing quotes was the issue.  That fixed it.  Thanks!


On Wed, Feb 22, 2017 at 8:16 PM, Josh Elser <els...@apache.org> wrote:

> Also, remember that Bash is going to interpret that semi-colon in your URL
> if you don't quote it. It will be treated as two separate commands:
>
> `/usr/hdp/2.5.0.0-1245/phoenix/bin/sqlline-thin.py
> http://pqs1.mydomain.com:8765` and `TenantId=myTenant`
>
> Please try `/usr/hdp/2.5.0.0-1245/phoenix/bin/sqlline-thin.py '
> http://pqs1.mydomain.com:8765;TenantId=myTenant'`, and, if that still
> does not work, feel free to open a JIRA issue to investigate why this is
> happening.
>
> Michael Young wrote:
>
>> In response to your question:
>>  >>So, just that I'm on the same page as you, when you invoke the Java
>> application yourself, it works fine, but when you use sqlline-thin.py,
>> the extra argument is dropped?
>> Yes. The property gets passed when invoking the Java application, but
>> not when adding it to the sqlline-thin.py command line.
>> It is simple to fix in my local dev test environment, but not in our
>> production environment, unfortunately.
>> We are using Phoenix 4.7 (from HDP 2.5 Community release).
>>
>> On Wed, Feb 22, 2017 at 4:07 PM, Josh Elser <els...@apache.org> wrote:
>>
>> So, just that I'm on the same page as you, when you invoke the Java
>> application yourself, it works fine, but when you use
>> sqlline-thin.py, the extra argument is dropped?
>>
>> If so, that just sounds like a bug in sqlline-thin.py that we could
>> fix...
>>
>> If you're skilled in Python (or even just adventurous), I'd guess it
>> would be a relatively simple fix :)
>>
>> Michael Young wrote:
>>
>> Hi Josh,
>>
>> You are right, the TenantId property does get passed through when I run
>> the command by hand (see below).
>>
>> Phoenix Version: phoenix-4.7.0.2.5.0.0-1245
>>
>> java $PHOENIX_OPTS \
>> -cp $CLASSPATH \
>> -Dlog4j.configuration=file:/usr/hdp/2.5.0.0-1245/phoenix/bin/log4j.properties \
>> org.apache.phoenix.queryserver.client.SqllineWrapper \
>> -d org.apache.phoenix.queryserver.client.Driver \
>> -u "jdbc:phoenix:thin:url=http://pqs1.mydomain.com:8765;serialization=PROTOBUF;TenantId=myTenant" \
>> -n none -p none --color=true --fastConnect=false --verbose=true
>> --incremental=false --isolation=TRANSACTION_READ_COMMITTED
>>
>> However, with Phoenix version 4.7.0.2.5.0.0-1245 (which comes with HDP
>> 2.5), the python script doesn't pass any property along.  Here's the
>> obfuscated output when I connect to PQS:
>>
>> [prompt]$ /usr/hdp/2.5.0.0-1245/phoenix/bin/sqlline-thin.py
>> http://pqs1.mydomain.com:8765;TenantId=myTenant
>>
>> Setting property: [incremental, false]
>> Setting property: [isolation, TRANSACTION_READ_COMMITTED]
>> issuing: !connect
>> jdbc:phoenix:thin:url=http://pqs1.mydomain:8765;serialization=PROTOBUF
>> none none org.apache.phoenix.queryserver.client.Driver
>> Connecting to
>> jdbc:phoenix:thin:url=http://pqs1.mydomain:8765;serialization=PROTOBUF
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in
>> [jar:file:/usr/hdp/2.5.0.0-1245/phoenix/phoenix-4.7.0.2.5.0.0-1245-thin-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>> [jar:file:/usr/hdp/2.5.0.0-1245/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>> Connected to: Apache Phoenix (version unknown version)
>> Driver: Phoenix Remote JDBC Driver (version unknown version)
>> Autocommit status: true
>> Transaction isolation: TRANSACTION_READ_COMMITTED
>> Building list of tables and columns for tab-completion (set
>> fastconnect
>> to true to skip)...
>> 7603/7603 (100%) Done
>> 

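When connecting from Java rather than the shell, the quoting issue above does not arise, because the whole URL is a single string. A minimal sketch, with the PQS host and tenant name as placeholders and the driver class and URL syntax taken from the invocation quoted above:

import java.sql.Connection;
import java.sql.DriverManager;

public class TenantConnect {
    public static void main(String[] args) throws Exception {
        // Thin-driver class, as passed to -d in the sqlline wrapper above.
        Class.forName("org.apache.phoenix.queryserver.client.Driver");

        // TenantId rides along as a URL property; no shell escaping is involved.
        String url = "jdbc:phoenix:thin:url=http://pqs1.example.com:8765;"
                + "serialization=PROTOBUF;TenantId=myTenant";

        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("connected, closed=" + conn.isClosed());
        }
    }
}
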
Phoenix Query Server tenant_id

2017-02-15 Thread Michael Young
Is it possible to pass the TenantID attribute on the URL when using the
phoenix query server?  For example,

/usr/hdp/2.5.0.0-1245/phoenix/bin/sqlline-thin.py
http://pqshost.myhost.com:8765;TenantId=tenant1

This works fine for me when connecting via jdbc.  Just didn't seem to work
with the query server.

Thanks,
-Michael


Phoenix Query Server query logging

2017-01-31 Thread Michael Young
Does the Phoenix Query Server have an option to log the SQL statements
which are executed?

We see there are ways to get various PQS trace logs by modifying the log4j
settings used by queryserver.py:

log4j.org.apache.phoenix.jdbc.PhoenixStatementFactory=TRACE (or DEBUG)
log4j.org.apache.phoenix.jdbc.PhoenixStatement=TRACE
log4j.logger.org.apache.calcite.avatica=TRACE
log4j.logger.org.apache.phoenix.queryserver.server=TRACE
etc...

but the data in the trace logs (which show SQL statements) are not
particularly user friendly.  And it does not seem straightforward to get
end-to-end query execution times.

Any suggestions how to get simple SQL logs (raw query, execution time,
...)?  The idea is to monitor user activity and take action if query times
are slow, or timeout.

Thanks,
Michael
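
On the last point about taking action when queries are slow, plain JDBC offers a per-statement timeout that can complement whatever logging ends up in place. A rough sketch with a placeholder host and query; whether the thin driver enforces setQueryTimeout end to end should be verified against the Phoenix version in use.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class BoundedSelect {
    public static void main(String[] args) throws Exception {
        // Placeholder PQS host; URL format as in the thin-driver examples above.
        String url = "jdbc:phoenix:thin:url=http://pqs1.example.com:8765;serialization=PROTOBUF";
        String sql = "SELECT COUNT(*) FROM MY_TABLE";  // placeholder query

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            stmt.setQueryTimeout(60);  // ask the driver to cancel after 60 seconds
            long start = System.currentTimeMillis();
            try (ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    // consume rows so fetch time counts toward the measurement
                }
            } catch (SQLException slowOrTimedOut) {
                System.err.println("query failed or timed out after "
                        + (System.currentTimeMillis() - start) + " ms: " + sql);
                throw slowOrTimedOut;
            }
            System.out.println("elapsedMs=" + (System.currentTimeMillis() - start)
                    + " sql=" + sql);
        }
    }
}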