Re: "upsert select" with "limit" clause

2018-12-17 Thread Jonathan Leech
My guess is that in order to enforce the limit, it's effectively single-threaded in either the select or the upsert. > On Dec 17, 2018, at 6:43 PM, Shawn Li wrote: > > Hi Vincent, > > Thanks for explaining. That makes much more sense now and it explains the > high memory usage when
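For context, a minimal sketch of the kind of statement in question (table and column names hypothetical):

    UPSERT INTO target_table (id, val)
    SELECT id, val
    FROM source_table
    WHERE val IS NOT NULL
    LIMIT 1000;

Enforcing the LIMIT means rows from the parallel scans have to be funneled through one point before being written, which is consistent with the single-threading guess above.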

Re: Rolling hourly data

2018-11-27 Thread Jonathan Leech
I would try writing the hourly values as 24 columns in a daily row, or as an array type. I’m not up to speed on the latest Phoenix features, but if it could update a daily sum on the fly that might be ok. If that doesn’t exist yet or isn’t performant, it could be done in an Hbase coprocessor.
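A sketch of the two layouts suggested above (names hypothetical):

    -- 24 hourly values as columns in one daily row
    CREATE TABLE daily_metrics (
        metric_id  VARCHAR NOT NULL,
        metric_day DATE NOT NULL,
        h00 BIGINT, h01 BIGINT, h02 BIGINT, -- ... continue through h22
        h23 BIGINT,
        CONSTRAINT pk PRIMARY KEY (metric_id, metric_day)
    );

    -- or the same daily row with an array of 24 hourly values
    CREATE TABLE daily_metrics_arr (
        metric_id  VARCHAR NOT NULL,
        metric_day DATE NOT NULL,
        hourly BIGINT ARRAY[24],
        CONSTRAINT pk PRIMARY KEY (metric_id, metric_day)
    );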

Re: ABORTING region server and following HBase cluster "crash"

2018-09-13 Thread Jonathan Leech
This seems similar to a failure scenario I’ve seen a couple times. I believe after multiple restarts you got lucky and tables were brought up by Hbase in the correct order. What happens is some kind of semi-catastrophic failure where 1 or more region servers go down with edits that weren’t

Re: phoenix table with 50 salt buckets (regions) - now shows as 68 regions and 18 of them stale

2018-03-22 Thread Jonathan Leech
Did you set the split policy to ConstantSizeRegionSplitPolicy? > On Mar 22, 2018, at 2:56 PM, Adi Kadimetla wrote: > > Group, > TABLE - with 50 salt buckets and configured as time series table. > > Having pre split into 50 SALT buckets we disabled the region splits using
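For reference, the split policy can be set as table metadata from the hbase shell (table name hypothetical); ConstantSizeRegionSplitPolicy splits only on size, while DisabledRegionSplitPolicy disables splitting entirely:

    alter 'MY_TABLE', {METADATA => {'SPLIT_POLICY' =>
        'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}}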

Re: Secondary index question

2018-02-27 Thread Jonathan Leech
I’ve done what you’re looking for by selecting the pk from the index in a nested query and filtering the other column separately. > On Feb 27, 2018, at 6:39 AM, Alexey Karpov wrote: > > Thanks for quick answer, but my case is a slightly different. I've seen these > links
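A sketch of that shape (names hypothetical): the inner select is satisfied by the secondary index, and the other column is filtered in the outer query against the data table:

    SELECT t.*
    FROM my_table t
    WHERE t.pk IN (
        SELECT pk FROM my_table
        WHERE indexed_col = 'foo'   -- served from the secondary index
    )
    AND t.other_col = 'bar';        -- applied to the data table rows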

Re: Async get

2017-10-06 Thread Jonathan Leech
I agree here but will go farther. HBase needs an asynchronous API that goes further than its current capability, for example building lambda functions in the client tier that execute in a Java Streams manner. Being able to run mapping functions, aggregations, etc. without needing
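A purely hypothetical sketch of the shape such an API could take; none of these types or methods exist in the HBase client today:

    // Hypothetical fluent async scan API -- nothing here is a real HBase call;
    // this is the shape the post is wishing for:
    CompletableFuture<Double> avg =
        asyncTable.scan(scan)                           // hypothetical streaming scan
                  .map(row -> row.getDouble("cf", "v")) // mapping pushed to the server
                  .average();                           // aggregate without shipping rows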

Re: using index with "or" query

2017-10-03 Thread Jonathan Leech
I had an idea a while back that I’ll share here because it’s relevant. It’s basically a combined index, or index group, and it would work in this case. It could be implemented in both global and local indexes. The data for two or more indexes would be interleaved. For a local index, the

Re: quick question about apache phoenix.

2017-08-21 Thread Jonathan Leech
I recognize that name. Some of his posts were... memorable. I'm not surprised to hear he was banned. > On Aug 21, 2017, at 11:06 PM, James Taylor wrote: > > Hi Pawan, > Why would you listen to someone about the future of Apache Phoenix who has no > involvement in or

Re: Metrics and Phoenix

2017-07-26 Thread Jonathan Leech
I think you want scan_next_rate for reads and mutate_rate for writes. > On Jul 26, 2017, at 3:53 AM, Batyrshin Alexander <0x62...@gmail.com> wrote: > > >> On 26 Jul 2017, at 12:49, Batyrshin Alexander <0x62...@gmail.com> wrote: >> >> Hello, >> I'm collecting metrics from region servers - >>

Re: Best strategy for UPSERT SELECT in large table

2017-06-19 Thread Jonathan Leech
with that approach? For example, if I wanted > to change a PK column type from VARCHAR to FLOAT, is this possible? > > > >> On Sun, Jun 18, 2017 at 10:50 AM, Jonathan Leech <jonat...@gmail.com> wrote: >> Also, if you're updating that many values and not doing

Re: Best strategy for UPSERT SELECT in large table

2017-06-18 Thread Jonathan Leech
, such as building or rebuilding indexes. > On Jun 18, 2017, at 11:41 AM, Jonathan Leech <jonat...@gmail.com> wrote: > > Another thing to consider, but only if your 1:1 mapping keeps the primary > keys the same, is to snapshot the table and restore it with the new name, and > a sch

Re: Best strategy for UPSERT SELECT in large table

2017-06-18 Thread Jonathan Leech
Another thing to consider, but only if your 1:1 mapping keeps the primary keys the same, is to snapshot the table and restore it with the new name, and a schema that is the union of the old and new schemas. I would put the new columns in a new column family. Then use upsert select, mapreduce,
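A sketch of that flow (names hypothetical): snapshot and clone on the HBase side, then map the clone in Phoenix with the union of the old and new schemas, putting the new columns in a new column family:

    hbase> snapshot 'OLD_TABLE', 'old_snap'
    hbase> clone_snapshot 'old_snap', 'NEW_TABLE'

    -- then in Phoenix, same primary key, old columns plus new ones in family NEWCF:
    CREATE TABLE NEW_TABLE (
        id VARCHAR PRIMARY KEY,
        old_col VARCHAR,
        NEWCF.new_col VARCHAR
    );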

Re: build index on existing big table

2017-06-04 Thread Jonathan Leech
dex? > > > -- Original -- > From: "Jonathan Leech";<jonat...@gmail.com>; > Date: Sun, Jun 4, 2017 01:26 PM > To: "user"<user@phoenix.apache.org>; > Subject: Re: build index on existing big table > > Give Hbase region se

Re: build index on existing big table

2017-06-03 Thread Jonathan Leech
Give HBase region servers lots of memory, set the number of HBase store files and blocking files way high. Major compact before and after. You can create an index async with MapReduce but not rebuild it AFAIK. Also if rebuilding one or more local indexes, I found it better to drop it first in HBase,
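A sketch of the async index build (names and paths hypothetical):

    -- in Phoenix: create the index without populating it
    CREATE INDEX my_idx ON my_table (my_col) ASYNC;

    # then from a shell, populate it with the bundled MapReduce tool:
    hbase org.apache.phoenix.mapreduce.index.IndexTool \
        --data-table MY_TABLE --index-table MY_IDX \
        --output-path /tmp/my_idx_hfiles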

Re: Phoenix 4.9.0 with Spark 2.0

2017-05-31 Thread Jonathan Leech
There are edits to make in a few files due to API changes in Spark 2.x. They are all in one git commit in Phoenix-Spark. > On May 31, 2017, at 1:11 AM, cmbendre wrote: > > I saw that JIRA. But the issue is i am using Phoenix on AWS EMR, which comes > with 4.9.0. I

Re: Phoenix hbase question

2017-05-23 Thread Jonathan Leech
ta directly from HDFS, not go through > phoenix/hbase for access. > > Is this possible? > > > Best regards > > On May 23, 2017 3:35 PM, "Jonathan Leech" <jonat...@gmail.com> wrote: > I think you would use Spark for that, via the Phoenix spark plug
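A sketch of reading a Phoenix table into Spark via the phoenix-spark DataSource (table name and ZooKeeper quorum hypothetical), which scans the underlying data in parallel rather than pulling rows through a single JDBC connection:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class PhoenixSparkRead {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("phoenix-read").getOrCreate();
            // Load MY_TABLE through the phoenix-spark integration
            Dataset<Row> df = spark.read()
                .format("org.apache.phoenix.spark")
                .option("table", "MY_TABLE")      // hypothetical table name
                .option("zkUrl", "zk-host:2181")  // hypothetical quorum
                .load();
            df.show();
        }
    }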

Re: Bad performance of the first resultset.next()

2017-04-20 Thread Jonathan Leech
Client merge sort is just merging already sorted data from the parallel scan. Look into the number of simultaneous queries vs the Phoenix thread pool size and numActiveHandlers in Hbase region servers. Salting might not be helping you. Also try setting the fetch size on the query in JDBC. Make
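A minimal sketch of setting the fetch size in JDBC (URL and query hypothetical):

    import java.sql.*;

    public class FetchSizeExample {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
                 Statement stmt = conn.createStatement()) {
                stmt.setFetchSize(5000); // rows fetched per round trip; tune to your row size
                try (ResultSet rs = stmt.executeQuery("SELECT id, val FROM my_table")) {
                    while (rs.next()) {
                        // process rs.getString("id"), rs.getLong("val"), ...
                    }
                }
            }
        }
    }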

Re: Custom Indexing Plug-in for Phoenix

2017-04-03 Thread Jonathan Leech
Take a look at SOLR and Lucene. You should be able to do a text search on the HBase data written via Phoenix. It works via the HBase replication mechanism, so it should be near-real time. I think you would have to use the SOLR API to do the initial search, which would get you the HBase rowkey, which

Re: Select date range performance issue

2017-02-23 Thread Jonathan Leech
AND CREATE_DT < TIMESTAMP '2017-04-01 00:00:00.000')
>     SERVER AGGREGATE INTO DISTINCT ROWS BY [KEYWORD]
> CLIENT MERGE SORT
> CLIENT 100 ROW LIMIT
>
> 3. ROW_TIMESTAMP is time of current query execution time, right?
> Then it's not a right choice. :-(
>
> 2

Re: Select date range performance issue

2017-02-23 Thread Jonathan Leech
If there are not a large number of distinct values of obj_id, try a SKIP_SCAN hint. Otherwise, the secondary index should work, make sure it's actually used via explain. Finally, you might try the ROW_TIMESTAMP feature if it fits your use case. > On Feb 22, 2017, at 11:30 PM, NaHeon Kim
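A sketch of the hinted query (column names from the thread, values hypothetical):

    SELECT /*+ SKIP_SCAN */ obj_id, create_dt
    FROM my_table
    WHERE obj_id IN ('a', 'b', 'c')
      AND create_dt >= TIMESTAMP '2017-01-01 00:00:00'
      AND create_dt <  TIMESTAMP '2017-02-01 00:00:00';

Running EXPLAIN on the query shows whether the skip scan or the index is actually chosen.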

Re: Moving column family into new table

2017-01-19 Thread Jonathan Leech
Do an explain on your query to confirm that it's doing a full scan and not a skip scan. I typically use an IN () clause instead of OR, especially with compound keys. I have also had to hint queries to use a skip scan, e.g. /*+ SKIP_SCAN */. Phoenix seems to do a very good job not reading data
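A sketch of the IN () form on a compound key, with the hint (names hypothetical); Phoenix accepts row value constructors, so the whole compound key can go in one IN list:

    EXPLAIN
    SELECT /*+ SKIP_SCAN */ *
    FROM my_table
    WHERE (k1, k2) IN (('a', 1), ('b', 2), ('c', 3));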

Re: slow response on large # of columns

2016-12-27 Thread Jonathan Leech
I would try an array for that use case. From my experience in HBase, execution time querying the same data ranks more rows > more columns > fewer columns. Also note that Phoenix creates a query plan every time it runs a query, and the number of columns might matter there. Also the sqlline

Re: Memory leak

2016-12-05 Thread Jonathan Leech
ersion of > hbase and phoenix are you using? > >> On Mon, Dec 5, 2016 at 9:53 AM Jonathan Leech <jonat...@gmail.com> wrote: >> Looks like PHOENIX-2357 introduced a memory leak, at least for me... I end >> up with old gen filled up with objects - 100,000,000 instances e

Memory leak

2016-12-05 Thread Jonathan Leech
Looks like PHOENIX-2357 introduced a memory leak, at least for me... I end up with old gen filled up with objects - 100,000,000 instances each of WeakReference and LinkedBlockingQueue$Node, owned by ConnectionQueryServicesImpl.connectionsQueue. The PhoenixConnection referred to by the

Re: Decode rowkey

2016-09-16 Thread Jonathan Leech
This would be really useful. The use case I have that is similar is to map Phoenix data to Hive (but the subset of Hive that Impala understands). I imagine it could work by reading the SYSTEM.CATALOG table, or connection metadata, and generating Hive create table statements. There would need to
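A sketch of pulling column metadata out of SYSTEM.CATALOG (schema and table names hypothetical); note that DATA_TYPE comes back as the java.sql.Types integer code:

    SELECT COLUMN_NAME, COLUMN_FAMILY, DATA_TYPE, ORDINAL_POSITION
    FROM SYSTEM.CATALOG
    WHERE TABLE_SCHEM = 'MY_SCHEMA'
      AND TABLE_NAME = 'MY_TABLE'
      AND COLUMN_NAME IS NOT NULL
    ORDER BY ORDINAL_POSITION;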

Re: Cloning a table in Phoenix

2016-09-08 Thread Jonathan Leech
I think you're best off running DDL with a new table name, but you could probably upsert the values yourself into system.catalog. If you have a lot of data to copy, you can use hbase snapshots and restore into the new table name. This would also take care of creating the underlying hbase table,

Re: Phoenix has slow response times compared to HBase

2016-09-02 Thread Jonathan Leech
The direct HBase client probably made 500 simultaneous calls, whereas Phoenix maybe made fewer, with a little waiting, and hit a sweet spot for load on your configuration. > On Sep 2, 2016, at 7:06 PM, Mujtaba Chohan wrote: > > Single user average:

Re: copy table to remote cluster

2016-07-15 Thread Jonathan Leech
If the table is small, you can export to a flat file, copy it over, then import, all using Phoenix command-line utilities. If there is connectivity between the clusters, and the schema is identical, for small to mid-size tables, you can set up HBase replication, and do upsert into x select * from
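A sketch of the flat-file route (paths and names hypothetical): record query output to CSV in sqlline, copy the file over, then load it with psql.py:

    -- in sqlline on the source cluster:
    !outputformat csv
    !record /tmp/my_table.csv
    SELECT * FROM my_table;
    !record

    # copy the file, then on the target cluster:
    psql.py -t MY_TABLE target-zk-host /tmp/my_table.csv

One caveat: !record captures everything displayed, including the commands themselves, so the file may need a light trim before loading.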

Re: Cannot get more than 5 columns in result set

2016-04-04 Thread Jonathan Leech
!set maxWidth 2000 (or something like that, check the help) You can also set your terminal really wide prior to launching sqlline. > On Apr 4, 2016, at 1:30 PM, Ian Maloney wrote: > > Using Phoenix 4.4.0 to query a view created on an HBase 1.1.2 table, using >

Re: Local indexes not working with hbase replication

2016-03-01 Thread Jonathan Leech
be some internal state in the region server coprocessors that wouldn't be there unless the DDL is run in the cluster. Would like to avoid an hbase restart in the replica cluster. Thanks, Jonathan > On Feb 29, 2016, at 5:09 PM, Jonathan Leech <jonat...@gmail.com> wrote: >

Local indexes not working with hbase replication

2016-02-29 Thread Jonathan Leech
Some are and some aren't working... Version is 4.5.2-1.clabs_phoenix1.2.0.p0.774 on CDH5.5.1. Tried rebuilding on the destination, then on both sides, then doing snapshots to transfer the data, all to no avail. The data replicates but Phoenix doesn't see it. I don't see any obvious differences on

Re: Need Help Dropping Phoenix Table Without Dropping HBase Table

2016-02-25 Thread Jonathan Leech
with all fields made static, and then copy the data from one to the other. > > Thanks, > Steve > >> On Wed, Feb 24, 2016 at 7:11 PM, Jonathan Leech <jonat...@gmail.com> wrote: >> You could also take a snapshot in hbase just prior to the drop table, then >> res

Re: Need Help Dropping Phoenix Table Without Dropping HBase Table

2016-02-24 Thread Jonathan Leech
You could also take a snapshot in hbase just prior to the drop table, then restore it afterward. > On Feb 24, 2016, at 12:25 PM, Steve Terrell wrote: > > Thanks for your quick and accurate responses! > >> On Wed, Feb 24, 2016 at 1:18 PM, Ankit Singhal
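A sketch of that sequence (names hypothetical); Phoenix's DROP TABLE deletes the underlying HBase table, so the snapshot is what brings the data back:

    hbase> snapshot 'MY_TABLE', 'my_table_snap'
    -- in Phoenix: DROP TABLE my_table;
    hbase> clone_snapshot 'my_table_snap', 'MY_TABLE'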

Re: java core dump

2016-02-16 Thread Jonathan Leech
. > On Feb 15, 2016, at 12:25 PM, Andrew Purtell <andrew.purt...@gmail.com> wrote: > > You might also consider moving back down to 7u79 > >> On Feb 15, 2016, at 10:35 AM, Jonathan Leech <jonat...@gmail.com> wrote: >> >> Has anyone else seen this? Happ

java core dump

2016-02-15 Thread Jonathan Leech
Has anyone else seen this? Happening under load in jdk 1.7.0_80 / phoenix 4.5.2 - cloudera labs. Based on the source code, it seems the JVM is calling the wrong toObject(), and then dumping. The correct toObject() method is a couple of parent classes away with some generics, and Sun / Oracle must have