Re: Union PreparedStatement ParameterMetaData Parameter value unbound Issue

2019-09-04 Thread James Taylor
Yes, JIRA please. On Wed, Sep 4, 2019 at 1:24 PM lewjackman wrote: > Does this look like an issue for which I should write a Jira?

Re: On duplicate key update

2019-08-27 Thread James Taylor
The lock will happen every time an "on duplicate key" clause is executed. Conceptually it's like a checkAndPut, but it's batched. If two threads attempt to write the same non-existent row, one of them will get there first and get the lock, while the other would wait behind it (and subsequently
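
For illustration, a minimal sketch of the atomic upsert clause being described; the table and column names are hypothetical, not from the thread:

    -- Each execution of the ON DUPLICATE KEY clause takes the row lock,
    -- reads the current value, and applies the update atomically.
    UPSERT INTO page_hits (url, hit_count)
    VALUES ('/home', 1)
    ON DUPLICATE KEY UPDATE hit_count = hit_count + 1;

If the row does not exist yet, the VALUES are written as-is; concurrent writers to the same row serialize on the lock, as described above.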

Re: Buckets VS regions

2019-08-19 Thread James Taylor
It’ll start with 12 regions, but those regions may split as they’re written to. On Mon, Aug 19, 2019 at 4:34 PM jesse wrote: > I have a table is SALT_BUCKETS = 12, but it has 14 regions, is this > right? > > Thanks > > >

Re: Phoenix 4 to 5 Upgrade Path

2019-06-15 Thread James Taylor
The data format is not different between 4.0 and 5.0. The metadata can change, though. We make a best effort for a seamless, automatic upgrade, but you should test your specific scenario yourself to be confident that there are no problems. The larger the version difference, the greater the risk.

Re: Large differences in query execution time for similar queries

2019-04-17 Thread James Taylor
Hi Hieu, You could try adding the /*+ SERIAL */ hint to see if that has any impact. Also, have you tried not salting the table? The SALT_BUCKETS value of 128 is pretty high. For the other issue, do you have a lot of deleted cells? You might try running a major compaction. You might try adding a
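
For illustration, a sketch of where the hint goes (the query shape is hypothetical):

    -- SERIAL forces a single serial scan instead of parallel chunked
    -- scans, which can help on heavily salted tables.
    SELECT /*+ SERIAL */ id, val
    FROM my_table
    WHERE id > 1000
    LIMIT 100;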

Re: Access Hbase Cell Timestamp using Phoenix UDF

2019-03-05 Thread James Taylor
You should be able to declare your parameter as VARBINARY and you can then use any type. On Tue, Mar 5, 2019 at 5:56 PM OMKAR NALAWADE wrote: > Hello, > > I am trying to access cell level timestamp of a given column in a table > using a phoenix UDF. > *For example:* I have created a UDF named
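
A sketch of registering such a UDF with a VARBINARY parameter (the class name and jar path are placeholders):

    CREATE FUNCTION CELL_TS(VARBINARY) RETURNS BIGINT
        AS 'com.example.udf.CellTimestampFunction'
        USING JAR 'hdfs://namenode:8020/phoenix/udf/my-udfs.jar';

    -- Because the argument is declared VARBINARY, the function can be
    -- invoked against a column of any Phoenix type:
    SELECT CELL_TS(first_name) FROM my_table;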

Re: Phoenix JDBC Connection Warmup

2019-02-02 Thread James Taylor
Have you tried setting UPDATE_CACHE_FREQUENCY on your tables? On Fri, Feb 1, 2019 at 6:28 PM Jaanai Zhang wrote: > we experimented with issuing the same query repeatedly, and we observed a >> slow down not only on the first query > > I am not sure what the reasons are, perhaps you can enable

Re: Query All Dynamic Columns

2018-12-26 Thread James Taylor
Persisting dynamic column names+types in Phoenix is exactly what views are for. On Wed, Dec 26, 2018 at 12:05 PM Vincent Poon wrote: > A lot of work is currently going into handling large numbers of views - > splittable syscat, view management, etc... but agree that it's not ideal. > > There's
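
A sketch of the pattern (the schema is illustrative): each view persists the column names and types that would otherwise have to be supplied as dynamic columns on every query.

    CREATE TABLE events (
        event_type VARCHAR NOT NULL,
        event_id   VARCHAR NOT NULL,
        CONSTRAINT pk PRIMARY KEY (event_type, event_id));

    -- One view per event type pins down its dynamic columns.
    CREATE VIEW click_events (referrer VARCHAR, target_url VARCHAR)
        AS SELECT * FROM events WHERE event_type = 'click';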

Re: High-availability for transactions

2018-11-09 Thread James Taylor
…approach to merging 'omid2' > branch work with say, 4.14-cdh5.14? > > Curtis > > > On Tue, Nov 6, 2018 at 3:28 PM James Taylor > wrote: > >> The Omid integration will be available on all active 4.x branches (1.2, >> 1.3, and 1.4) as well as on 5.x. The Omid integratio

Re: High-availability for transactions

2018-11-06 Thread James Taylor
The Omid integration will be available on all active 4.x branches (1.2, 1.3, and 1.4) as well as on 5.x. The Omid integration has the same unit test coverage as Tephra. If you want to give it a spin, let us know. You'd just need to pull the phoenix-integration branch for Omid and the omid2 branch

Re: Null array elements with joins

2018-08-13 Thread James Taylor
I commented on the JIRA you filed here: PHOENIX-4791. Best to keep discussion there. Thanks, James On Mon, Aug 13, 2018 at 11:08 AM, Gerald Sangudi wrote: > Hello all, > > Any suggestions or pointers on the issue below? > > Projecting array elements works when not using joins, and does not work

Re: Spark-Phoenix Plugin

2018-08-06 Thread James Taylor
For the UPSERTs on a PreparedStatement that are done by Phoenix for writing in the Spark adapter, note that these are *not* doing RPCs to the HBase server to write data (i.e. they are never committed). Instead the UPSERTs are used to ensure that the correct serialization is performed given the

Re: Apache Phoenix with Google Big Table

2018-08-02 Thread James Taylor
Since Google Bigtable doesn't support coprocessors, you'd need a version of Phoenix that doesn't rely on coprocessors (some server-side hooks that HBase provides). It'd be non-trivial and you'd lose some functionality and performance in doing this, but theoretically it's possible. Here are the

Re: order by primary key desc return wrong results

2018-07-31 Thread James Taylor
Please file a JIRA. On Mon, Jul 30, 2018 at 10:12 PM, jie chen wrote: > phoenix-4.14-hbase-1.2 > > 0: jdbc:phoenix:localhost> create table test(id bigint not null primary >>> key, a bigint); >> >> No rows affected (1.242 seconds) >> >> 0: jdbc:phoenix:localhost> upsert into test values(1,11);

Re: Statements caching

2018-07-27 Thread James Taylor
There's no statement caching available in Phoenix. That would be a good contribution, though. Thanks, James On Thu, Jul 26, 2018 at 10:45 AM, Batyrshin Alexander <0x62...@gmail.com> wrote: > Hi all, > I'm wondering how to enable statement caching in Phoenix JDBC Driver. > Is there anything like

Re: Upsert is EXTREMELY slow

2018-07-13 Thread James Taylor
Phoenix won’t be slower to update secondary indexes than an application doing the index writes itself would be. Both have to do the writes to a second table to keep it in sync. On Fri, Jul 13, 2018 at 8:39 AM Josh Elser wrote: > Also, they're relying on Phoenix to do secondary index updates for them. > > Obviously, you can do

Re: Hash aggregation

2018-06-14 Thread James Taylor
Hi Gerald, No further suggestions beyond my comments on the JIRA. Maybe a good next step would be a patch? Thanks, James On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi wrote: > Hi Maryann and James, > > Any further guidance on PHOENIX-4751?

Re: Atomic UPSERT on indexed tables

2018-06-11 Thread James Taylor
It's possible that local indexes could be allowed for atomic upserts, but global indexes are problematic (in that under load your cluster would probably die). The reason is that there'd be a cross RS call made for each row being atomically upserted. If the call hangs due to the RS hosting the data

[ANNOUNCE] Apache Phoenix 4.14 released

2018-06-11 Thread James Taylor
The Apache Phoenix team is pleased to announce the immediate availability of the 4.14.0 release. Apache Phoenix enables SQL-based OLTP and operational analytics for Apache Hadoop using Apache HBase as its backing store and providing integration with other projects in the Apache ecosystem such as

Re: Problem with query with: limit, offset and order by

2018-05-25 Thread James Taylor
OFFSET will not scale well with large values as there is no way to implement it in HBase other than scanning from the beginning and skipping that many rows. I'd suggest using row value constructors instead. You can read more about that here: https://phoenix.apache.org/paged.html Thanks, James On
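
For illustration, the row value constructor paging pattern from the linked page (table and key columns are hypothetical):

    -- Each page starts after the last row of the previous page; the RVC
    -- comparison seeks directly to the start key instead of scanning and
    -- skipping OFFSET rows. Assumes (host, created_date) leads the
    -- primary key.
    SELECT host, created_date, metric
    FROM server_metrics
    WHERE (host, created_date) > (?, ?)  -- bind the last row returned
    ORDER BY host, created_date
    LIMIT 100;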

Re: Cannot access from jdbc

2018-05-23 Thread James Taylor
…commit log looks more recent than 2 years > > > 2018-05-23 18:16 GMT+02:00 James Taylor <jamestay...@apache.org>: > >> The 4.7 release is more than two years old. That's seven releases back >> from the current release we're voting on 4.14. I'd recommend working

Re: Cannot access from jdbc

2018-05-23 Thread James Taylor
The 4.7 release is more than two years old. That's seven releases back from the current release we're voting on, 4.14. I'd recommend working with your vendor and urging them to upgrade to a newer, supportable version. Thanks, James On Wed, May 23, 2018 at 9:10 AM, Nicolas Paris

Re: SORT_MERGE_JOIN on non-leading key: server-side sorting

2018-05-17 Thread James Taylor
Hi Gerald, The fix for PHOENIX-4508 will appear in the 4.14.0 release which we're working on now. We should have a second RC up shortly that you can use to verify. The fix isn't in 4.13 since it was checked in after the release. Thanks, James On Thu, May 17, 2018 at 4:44 PM, Maryann Xue

Re: Binary fields and compression

2018-05-13 Thread James Taylor
You can have a property only apply to a single column family by prefixing it with the family name: CREATE TABLE DOCUMENTS (HOST VARCHAR NOT NULL PRIMARY KEY, A.CONTENT VARBINARY, B.TEXT VARCHAR, B.LABEL VARCHAR, B.DATE_CREATE TIMESTAMP) B.COMPRESSION='GZ' On Sun, May 13, 2018 at 3:50 AM Nicolas

Re: UPSERT null values

2018-04-28 Thread James Taylor
…Thank you James, it was “immutable”. I didn't know that it affected this. > > > > *From:* James Taylor [mailto:jamestay...@apache.org] > *Sent:* Friday, April 27, 2018 5:37 PM > *To:* user@phoenix.apache.org > *Subject:* Re: UPSERT null values > > > > Hi Stepan, > > Please post your

Re: UPSERT null values

2018-04-27 Thread James Taylor
Hi Stepan, Please post your complete DDL and indicate the version of Phoenix and HBase you’re using. Your example should work as expected barring declaration of the table as immutable or COL2 being part of the primary key. Thanks, James On Fri, Apr 27, 2018 at 6:13 AM Stepan Migunov <

Re: Split and distribute regions of SYSTEM.STATS table

2018-04-23 Thread James Taylor
…Batyrshin Alexander <0x62...@gmail.com> wrote: > If all stats for a given table should be on the same region there is no > benefit in splitting. > > Another question: is it ok to set 'IN_MEMORY' => 'true' for the CF of SYSTEM.* > tables? > > > On 20 Apr 2018, at 23:39, James Tayl

Re: Split and distribute regions of SYSTEM.STATS table

2018-04-20 Thread James Taylor
…are on the same region. James On Fri, Apr 20, 2018 at 1:37 PM, James Taylor <jamestay...@apache.org> wrote: > Thanks for bringing this to our attention. There's a bug here in that the > SYSTEM.STATS > > On Wed, Apr 18, 2018 at 9:59 AM, Batyrshin Alexander <0x62...@gmail.com>

Re: Split and distribute regions of SYSTEM.STATS table

2018-04-20 Thread James Taylor
Thanks for bringing this to our attention. There's a bug here in that the SYSTEM.STATS On Wed, Apr 18, 2018 at 9:59 AM, Batyrshin Alexander <0x62...@gmail.com> wrote: > Hello, > I've discovered that SYSTEM.STATS has only 1 region with size 3.25 GB. Is > it ok to split it and distribute over

Re: hint to use a global index is not working - need to find out why

2018-04-20 Thread James Taylor
Ron - Salting is only recommended when your primary key is monotonically increasing. It's mainly used to prevent write hotspotting. Also, I think Ron forgot to mention, but I was working with him a bit earlier on this, and I couldn't repro the issue either (in current 4.x or in 4.7 release).

Re: using an array field as an index - PHOENIX-1544

2018-04-20 Thread James Taylor
I did a search using PHOENIX-1544 and could not find any updates to your > June 2015 post on the Phoenix list, so I wanted to ask: what is the current > status for indexing array fields over immutable (or even mutable) tables? > We could certainly use such. > > > > Ron >

Re: hbase cell storage different bewteen bulk load and direct api

2018-04-19 Thread James Taylor
I believe we still rely on that empty key value, even for compact storage formats (though theoretically it could likely be made so we don't - JIRA, please?) A quick test would confirm: - upsert a row with no last_name or first_name - select * from T where last_name IS NULL If the row isn't
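
The suggested probe, spelled out (assuming a table T with primary key ID and nullable LAST_NAME and FIRST_NAME columns):

    UPSERT INTO t (id) VALUES ('row1');       -- no last_name or first_name
    SELECT * FROM t WHERE last_name IS NULL;  -- does 'row1' come back?

If the row is returned, the empty key value is what made it visible to the scan.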

Re: hotspot in System.catalog table

2018-04-13 Thread James Taylor
…[1] https://phoenix.apache.org/language/index.html#options >> >> On Thu, Apr 12, 2018 at 11:06 PM, James Taylor <jamestay...@apache.org> >> wrote: >> >>> No, that won’t make a difference. >>> >>> On Thu, Apr 12, 2018 at 10:51

Re: hotspot in System.catalog table

2018-04-13 Thread James Taylor
…is not compiled)? > > On Thu, Apr 12, 2018 at 10:43 PM, James Taylor <jamestay...@apache.org> > wrote: > >> Try setting the UPDATE_CACHE_FREQUENCY table property (and configuring >> the phoenix.default.update.cache.frequency system-wide property). That'll >> prevent p

Re: hotspot in System.catalog table

2018-04-12 Thread James Taylor
Try setting the UPDATE_CACHE_FREQUENCY table property (and configuring the phoenix.default.update.cache.frequency system-wide property). That'll prevent pinging the region hosting SYSTEM.CATALOG every time a query is compiled. We've found a value of even 5 seconds makes a big difference. For more on
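
For illustration (the value is in milliseconds; the table name is hypothetical):

    -- Check SYSTEM.CATALOG for metadata changes at most once every
    -- 5 seconds for this table. The cluster-wide default is set via
    -- phoenix.default.update.cache.frequency in hbase-site.xml.
    ALTER TABLE my_table SET UPDATE_CACHE_FREQUENCY = 5000;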

Re: "Unable to discover transaction service" prompt out when tried to create table with transaction = true

2018-03-28 Thread James Taylor
I suspect the issue may be due to the version of Guava in the hbase/lib directory being too old. Try replacing it with a newer one - I think Tephra needs Guava 14 or above. On Wed, Mar 28, 2018 at 2:38 PM ivany...@gmail.com wrote: > Hi, I was trying to enable the transaction

Re: Slow query help

2018-03-16 Thread James Taylor
Hi Flavio, You'll need to add a secondary index to SOMEFIELD (or SOMEFIELD + VALID) to speed that up. You can write it more simply as SELECT COUNT(DISTINCT SOMEFIELD) FROM TEST.MYTABLE WHERE VALID AND SOMEFIELD IS NOT NULL. Otherwise, you'll end up doing a full table scan (and use a fair amount
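
A sketch following that advice (the index name is arbitrary):

    -- Covers the rewritten query so it scans the index rather than
    -- the full table.
    CREATE INDEX idx_somefield ON TEST.MYTABLE (SOMEFIELD, VALID);

    SELECT COUNT(DISTINCT SOMEFIELD)
    FROM TEST.MYTABLE
    WHERE VALID AND SOMEFIELD IS NOT NULL;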

Re: Direct HBase vs. Phoenix query performance

2018-03-15 Thread James Taylor
…correct to expect that it would cache the results of a subquery used in >> a join? If so, what are possible reasons why it would *not* do so? Any >> guidance on metrics / optimizations to look at would be appreciated. >> >> Thanks, >> Marcell >> >> On

Re: Direct HBase vs. Phoenix query performance

2018-03-08 Thread James Taylor
Hi Marcell, It'd be helpful to see the table DDL and the query too along with an idea of how many regions might be involved in the query. If a query is a commonly run query, usually you'll design the row key around optimizing it. If you have other, simpler queries that have determined your row

Re: Runtime DDL supported?

2018-03-08 Thread James Taylor
…On Wed, Feb 28, 2018 at 10:27 PM, James Taylor <jamestay...@apache.org> > wrote: > >> Please file a JIRA as it’d be feasible to change this limitation. The >> easiest way would be

Re: Runtime DDL supported?

2018-02-28 Thread James Taylor
…On Thu, Feb 22, 2018 at 7:42 AM, James Taylor <jamestay...@apache.org> > wrote: > >> Another option

Re: Secondary index question

2018-02-27 Thread James Taylor
Please file a JIRA and include the Phoenix and HBase version. Sounds like you’ve found a bug. On Tue, Feb 27, 2018 at 9:21 AM Jonathan Leech wrote: > I’ve done what you’re looking for by selecting the pk from the index in a > nested query and filtering the other column

Re: Secondary index question

2018-02-26 Thread James Taylor
See https://phoenix.apache.org/secondary_indexing.html#Index_Usage. We get this question a fair amount. We have an FAQ, here [1], but it's not a very complete answer (as it doesn't mention hinting or local indexes), so it'd be good if it was updated. Thanks, James [1]
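
For reference, the hint form the linked page describes (table, index, and column names here are placeholders):

    -- Forces the optimizer to use IDX_COL1 even when the query selects
    -- columns the index does not cover.
    SELECT /*+ INDEX(my_table idx_col1) */ *
    FROM my_table
    WHERE col1 = 'x';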

Re: Incorrect number of rows affected from DELETE query

2018-02-22 Thread James Taylor
Phoenix returns the number of delete markers that were placed if it’s a point delete, not the actual number of rows deleted. Otherwise you’d need to do a read before the delete (which would be costly). It’s possible that this could be made configurable - please file a JIRA. You could work around

Re: Runtime DDL supported?

2018-02-22 Thread James Taylor
…could also use the HBase API to run a point row get. We'd have to > reimplement decoding for Phoenix's column values, which is not ideal but > quite doable. > > On Feb 21, 2018, at 9:09 PM, James Taylor <jamestay...@apache.org> wrote: > > Have you tried a UNIO

Re: Large tables in phoenix, issues with relational queries

2018-02-21 Thread James Taylor
Hi Aman, Will all of your 210 relational tables only have a few million rows? If so, have you tried just using something like MySQL? What led you toward a distributed solution? When going from a single-node RDBMS system to Phoenix, you typically wouldn't use the schemas directly, but there'd be

Re: Runtime DDL supported?

2018-02-21 Thread James Taylor
…non-covered columns from >>> the main table, so we're not confident in using local indexes to optimize >>> queries. (I've looked through the 5.0-alpha release notes and couldn't find >>> anything related to this issue, so if desired I'll collect info for a >>> sep

Re: Runtime DDL supported?

2018-02-16 Thread James Taylor
…On Fri, Feb 16, 2018 at 2:49 PM, James Taylor <jamestay...@apach

Re: Runtime DDL supported?

2018-02-16 Thread James Taylor
Hi Miles, You'll be fine if you use views [1] and multi-tenancy [2] to limit the number of physical HBase tables. Make sure you read about the limitations of views too [3]. Here's the way I've seen this modeled successfully: - create one schema per use case. This will let you leverage some nice
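
A sketch of the views-plus-multi-tenancy pattern ([1], [2]); names are illustrative:

    -- Base table: the leading PK column identifies the tenant.
    CREATE TABLE base.events (
        tenant_id VARCHAR NOT NULL,
        event_id  VARCHAR NOT NULL,
        payload   VARCHAR,
        CONSTRAINT pk PRIMARY KEY (tenant_id, event_id))
        MULTI_TENANT = true;

    -- From a tenant-specific connection (TenantId set in the JDBC URL),
    -- each tenant creates views instead of physical HBase tables:
    CREATE VIEW my_events AS SELECT * FROM base.events;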

Re: Creating View Table Using the Date & Time

2018-02-13 Thread James Taylor
…few > chars are the date, and these dates are stored in a separate column as > BDATE as well. Do you think I could implement the ROW_TIMESTAMP in the BDATE > column? > > Thanks > Vaghawan > > On Wed, Feb 14, 2018 at 7:47 AM, James Taylor <jamestay...@apache.org> > wrote:

Re: Creating View Table Using the Date & Time

2018-02-13 Thread James Taylor
The standard way of doing this is to add a TTL for your table [1]. You can do this through the ALTER TABLE call [2]. Is the date/time column part of your primary key? If so, you can improve performance by declaring this column as a ROW_TIMESTAMP [3]. A view is not going to help you - it's not
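
Sketches of both suggestions (values and names are illustrative):

    -- [1], [2]: let HBase expire rows older than 7 days (TTL is in seconds).
    ALTER TABLE my_table SET TTL = 604800;

    -- [3]: a leading date/time PK column can be declared ROW_TIMESTAMP,
    -- mapping it onto the HBase cell timestamp.
    CREATE TABLE events (
        created DATE NOT NULL,
        id      VARCHAR NOT NULL,
        CONSTRAINT pk PRIMARY KEY (created ROW_TIMESTAMP, id))
        TTL = 604800;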

Re: Drop column timeout

2018-02-13 Thread James Taylor
Hi Jacobo, Please file a JIRA for asynchronous drop column functionality. There's a few ways that could be implemented. We could execute the call that issues the delete markers on the server-side in a separate thread (similar to what we do with UPDATE STATISTICS), or we could support a map-reduce

Re: Index table in SYSTEM.CATALOG without DATA_TABLE_NAME and INDEX_TYPE

2018-02-06 Thread James Taylor
Hi William, The system catalog table changes as new features are implemented. The API that you can count on being stable is JDBC and in particular for metadata, our DatabaseMetaData implementation. To understand how the system catalog changes from release to release you'd need to keep an eye on

Re: ROW_TIMESTAMP

2018-02-02 Thread James Taylor
…purpose. I'll > track > this JIRA to get updates about it. > > BTW, as it stands today there's no option except to update some date > type > field on the client side on every upsert? > > Thank you so much. > > Alberto > > > James Taylor wrote > > Hi Alberto, >

Re: Apache Phoenix integration

2018-02-02 Thread James Taylor
…It’s about time for us to do a meetup. A joint meetup perhaps? > > Saurabh > > > > > On Feb 2, 2018, at 11:13 AM, James Taylor <jamestay...@apache.org> > wrote: > > > > There's also a much deeper integration

Re: Apache Phoenix integration

2018-02-02 Thread James Taylor
…Sent: Friday, February 02, 2018 9:04 AM > To: u...@drill.apache.org > Cc: James Taylor <jamestay...@apache.org> > Subject: Re: Apache Phoenix integration > > Eventually I managed to integrate Phoenix with Drill! I debugged the > drill-embedded remotely via Eclipse and I dis

Re: ROW_TIMESTAMP

2018-02-02 Thread James Taylor
Hi Alberto, Sounds like you need PHOENIX-4552. If you agree, let's continue the discussion over there. Thanks, James On Fri, Feb 2, 2018 at 9:05 AM, Alberto Bengoa wrote: > Hello Folks, > > I'm working on a project where we need to identify when a row was changed >

Re: HBase Timeout on queries

2018-02-01 Thread James Taylor
I don’t think the HBase row_counter job is going to be faster than a count(*) query. Both require a full table scan, so neither will be particularly fast. A couple of alternatives if you’re ok with an approximate count: 1) enable stats collection (but you can leave off usage to parallelize

Re: Is first query to a table region way slower?

2018-01-28 Thread James Taylor
Did you do an rs.next() on the first query? Sounds related to HConnection establishment. Also, the least expensive query is SELECT 1 FROM T LIMIT 1. Thanks, James On Sun, Jan 28, 2018 at 5:39 PM Pedro Boado wrote: > Hi all, > > I'm running into issues with a java springboot

Re: Phoenix executeBatch

2018-01-23 Thread James Taylor
Writing to HDFS with a columnar format like Parquet will always be faster than writing to HBase. How about random access of a row? If you're not doing point lookups and small range scans, you probably don't want to use HBase (& Phoenix). HBase is writing more information than is written when using

Re: [ANNOUNCE] Apache Phoenix 4.13.2 for CDH 5.11.2 released

2018-01-20 Thread James Taylor
On Sat, Jan 20, 2018 at 12:29 PM Pedro Boado wrote: > The Apache Phoenix team is pleased to announce the immediate availability > of the 4.13.2 release for CDH 5.11.2. Apache Phoenix enables SQL-based OLTP > and operational analytics for Apache Hadoop using Apache HBase as

Re: phoenix query execution plan

2018-01-18 Thread James Taylor
This is a limitation of our optimizer (see PHOENIX-627). Patches are welcome. The fix would be isolated to WhereOptimizer.java. Thanks, James On Thu, Jan 18, 2018 at 9:46 AM, abhi1 kumar wrote: > Hi All, > > I am using phoenix 4.7(hbase 1.1.xx) and came across following

Re: Phoenix 4.13 on Hortonworks

2018-01-17 Thread James Taylor
Hi Sumanta, Phoenix is an Apache project and not tied to any vendor. At Salesforce we use the regular Apache Phoenix code base instead of any vendor specific variation and this has worked out well for us. Thanks, James On Wed, Jan 17, 2018 at 9:25 PM Sumanta Gh wrote: >

Re: UDF for lateral views

2018-01-17 Thread James Taylor
No, this isn't supported from a UDF. You're looking for PHOENIX-4311 to be implemented. Let's continue the discussion there. On Wed, Jan 17, 2018 at 7:16 PM, Krishna wrote: > According to this blog (http://phoenix-hbase.blogspot.in/2013/04/how-to- >

Re: How to reduce write amplification when exists a few global index tables?

2018-01-16 Thread James Taylor
You can use local indexes to reduce write amplification. In that case, all index writes are local writes so the impact of multiple secondary indexes is not as severe. Of course, there's a read penalty you'd pay, so make sure you're ok with that. On Tue, Jan 16, 2018 at 12:08 AM,
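
For illustration (names are hypothetical):

    -- Local index data lives in the same region as the table data, so
    -- index maintenance adds no cross-region-server writes; reads that
    -- use it pay the penalty mentioned above.
    CREATE LOCAL INDEX my_local_idx ON my_table (col1);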

Re: Query optimization

2017-12-27 Thread James Taylor
…a JIRA, and thanks for all the details. On Wed, Dec 27, 2017 at 3:44 PM Flavio Pompermaier <pomperma...@okkam.it> wrote: > Ok. So why does the 2nd query require more memory than the first one > (even though USE_SORT_MERGE_JOIN is used) and fail to complete? > > > On 28 Dec 2017

Re: Add automatic/default SALT

2017-12-27 Thread James Taylor
…farther to say I > consider it harmful for Phoenix to do that out of the box. > > However, I would flip the question upside down instead: what kind of > suggestions can Phoenix make as a database to the user to _recommend_ to > them that they enable salting on a table given its schema an

Re: Query optimization

2017-12-27 Thread James Taylor
…client side probably holds on to the iterators from >> both sides and crawls forward to do the merge sort. In that case there should >> be not much memory footprint either way where the filter is performed. >> >> On December 22, 2017 at 1:04:18 PM, James Taylor (jamestay...@apache.o

Re: Query optimization

2017-12-22 Thread James Taylor
…take out > hint USE_SORT_MERGE_JOIN, what will the plan be? > > > On December 22, 2017 at 12:46:25 PM, James Taylor (jamestay...@apache.org) > wrote: > > For sort merge join, both post-filtered table results are sorted on the > server side and then a merge sort is done on the client-side.

Re: Query optimization

2017-12-22 Thread James Taylor
For sort merge join, both post-filtered table results are sorted on the server side and then a merge sort is done on the client-side. On Fri, Dec 22, 2017 at 12:44 PM, Ethan wrote: > Hello Flavio, > > From the plan looks like to me the second query is doing the filter at >

Re: Efficient way to get the row count of a table

2017-12-19 Thread James Taylor
…http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html M/R. > > On Tue, Dec 19, 2017 at 3:18 PM, James Taylor <jamestay...@apache.org> > wrote: > >> If it needs to

Re: Efficient way to get the row count of a table

2017-12-19 Thread James Taylor
If it needs to be 100% accurate, then count(*) is the only way. If your data is write-once data, you might be able to track the row count at the application level through some kind of atomic counter in a different table (but this will likely be brittle). If you can live with an estimate, you could

Re: Hive UDF for creating row key in HBASE

2017-12-18 Thread James Taylor
Hi Chethan, As Ethan mentioned, take a look first at the Phoenix/Hive integration. If that doesn't work for you, the best way to get the row key for a phoenix table is to execute an UPSERT VALUES against the primary key columns without committing it. We have a utility function that will return the

Re: Add automatic/default SALT

2017-12-08 Thread James Taylor
Hi Flavio, I like the idea of “adaptable configuration” where you specify a config value as a % of some cluster resource (with relatively conservative defaults). Salting is somewhat of a gray area though as it’s not config based, but driven by your DDL. One solution you could implement on top of

[ANNOUNCE] Apache Phoenix 4.13.1 released

2017-12-07 Thread James Taylor
The Apache Phoenix team is pleased to announce the immediate availability of the 4.13.1 release. Apache Phoenix enables SQL-based OLTP and operational analytics for Apache Hadoop using Apache HBase as its backing store and providing integration with other projects in the Apache ecosystem such as

Re: upsert data with specific timestamp using CurrentSCN fail with error java.sql.SQLException: ERROR 518 (25502): Mutations are not permitted for a read-only connection.

2017-12-06 Thread James Taylor
…later using the index tool) is there a way to make the connection not read > only? > > > > *From:* James Taylor [mailto:jamestay...@apache.org] > *Sent:* Sunday, December 3, 2017 5:18 PM > *To:* user@phoenix.apache.org > *Subject:* Re: upsert data with specific timestam

Re: Boolean condition doesn't support IS operator

2017-12-05 Thread James Taylor
…this > topic. > If no one else asks for this feature you could ignore this :) > > On Tue, Dec 5, 2017 at 4:31 PM, James Taylor <jamestay...@apache.org> > wrote: > >> How about just using VALID = true or just VALID like this: select * from >> t where VALID >>

Re: Boolean condition doesn't support IS operator

2017-12-05 Thread James Taylor
How about just using VALID = true or just VALID like this: select * from t where VALID On Tue, Dec 5, 2017 at 2:52 AM Flavio Pompermaier wrote: > Hi to all, > I'm using Phoenix 4.7 and I cannot use IS operator on boolean values (e.g. > VALID IS TRUE) > Would it be that

Re: upsert data with specific timestamp using CurrentSCN fail with error java.sql.SQLException: ERROR 518 (25502): Mutations are not permitted for a read-only connection.

2017-12-03 Thread James Taylor
The CurrentSCN property may only be used for reading, not writing as of 4.13. We found that this kind of rewriting of history causes indexes to become corrupted. The documentation needs to be updated. On Sun, Dec 3, 2017 at 7:07 AM Bulvik, Noam wrote: > Hi, > > > > I want

Re: Help: setting hbase row timestamp in phoenix upserts ?

2017-11-30 Thread James Taylor
The only way I can think of accomplishing this is by using the raw HBase APIs to write the data but using our utilities to write it in a Phoenix-compatible manner. For example, you could run an UPSERT VALUES statement, use the PhoenixRuntime.getUncommittedDataIterator() method to get the Cells that

Re: phoenix 4.13.0 on hbase 1.2

2017-11-28 Thread James Taylor
…, Pedro. James On Mon, Nov 13, 2017 at 10:17 AM, James Taylor <jamestay...@apache.org> wrote: > We discussed whether or not we should continue with Phoenix releases for > HBase 1.2, but no one showed any interest in being the release manager > [1], so we concluded that we would

Re: 4.13.0-HBase-1.1 not released?

2017-11-28 Thread James Taylor
Hi James, > > Sorry for the delay, I wasn't on the dev mailing list; I'm interested to > help and I can take the lead for the HBase 1.1 release. > > Xavier > On 2017-11-18 03:22 PM, James Taylor wrote: > FYI, we'll do one final release for Phoenix on HBase 1.1 (look for a >

Re: [ANNOUNCE] Apache Phoenix 4.13 released

2017-11-24 Thread James Taylor
…kpalaniap...@marinsoftware.com> wrote: > >> @James, are you still planning to release 4.13 HBase 1.2? >> >> On Sun, Nov 19, 2017 at 1:21 PM, James Taylor <jamestay...@apache.org> >> wrote: >> >>> Hi Kumar, >>> I started a discussi

Re: [ANNOUNCE] Apache Phoenix 4.13 released

2017-11-19 Thread James Taylor
…/70cffa798d5f21ef87b02e07aeca8c7982b0b30251411b7be17fadf9@%3Cdev.phoenix.apache.org%3E On Sun, Nov 19, 2017 at 12:23 PM, Kumar Palaniappan <kpalaniap...@marinsoftware.com> wrote: > Are there any plans to release Phoenix 4.13 compatible with HBase 1.2? > > On Sat, Nov 11, 2017 at 5:57 PM, James T

Re: sqlline.py kills all regionservers

2017-11-18 Thread James Taylor
That’s quite an old version of 0.98 and we no longer support Hadoop 1. Would it be possible for you to upgrade your cluster and use Hadoop 2 instead? On Sat, Nov 18, 2017 at 1:16 PM Eisenhut, Roman wrote: > Dear Phoenix community, > > > > I’m trying to implement

Re: 4.13.0-HBase-1.1 not released?

2017-11-18 Thread James Taylor
FYI, we'll do one final release for Phoenix on HBase 1.1 (look for a 4.13.1 release soon). It looks like HBase 1.1 itself is nearing end-of-life, so probably good to move off of it. If someone is interested in being the RM for continued Phoenix HBase 1.1 releases, please volunteer. On Mon, Nov

Please do not use 4.12.0 release

2017-11-11 Thread James Taylor
FYI, the 4.12.0 release had a critical issue [1] that has been fixed in the 4.13.0 release. Please make sure you do not use the 4.12.0 release and instead use the 4.13.0 release. Sorry for any inconvenience. More details on the release may be found here [2]. Thanks, James [1]

[ANNOUNCE] Apache Phoenix 4.13 released

2017-11-11 Thread James Taylor
The Apache Phoenix team is pleased to announce the immediate availability of the 4.13.0 release. Apache Phoenix enables SQL-based OLTP and operational analytics for Apache Hadoop using Apache HBase as its backing store and providing integration with other projects in the Apache ecosystem such as

Re: Spark & UpgradeInProgressException: Cluster is being concurrently upgraded from 4.11.x to 4.12.x

2017-11-11 Thread James Taylor
Hi Stepan, We discussed whether or not we should continue with Phoenix releases for HBase 1.1, but no one showed any interest in being the release manager [1], so we concluded that we would stop doing them. It's important to remember that the ASF is a volunteer effort and anyone can step up and

Re: Enabling Tracing makes HMaster service fail to start

2017-11-09 Thread James Taylor
Please note that we're no longer doing releases for HBase 1.2 due to lack of interest. If this is important for you, I suggest you volunteer to be RM for this branch (4.x-HBase-1.2) and make sure to catch up the branch with the latest bug fixes from our upcoming 4.13 release (in particular

Re: Cloudera parcel update

2017-11-09 Thread James Taylor
I agree with JMS and there is interest from the PMC, but no bandwidth to do the work - we’d look toward others like you to do the work of putting together an initial pull request, regular pulls to keep things in sync, RMing releases, etc. These types of contributions would earn merit toward a

Re: SELECT + ORDER BY vs self-join

2017-10-31 Thread James Taylor
Please file a JIRA and include the explain plan for each of the queries. I suspect your index is not being used in the first query due to the selection of all the columns. You can try hinting the query to force your index to be used. See

Re: Indexes not used when ordering by primary key.

2017-10-31 Thread James Taylor
…sequence number is used. If this is the case, > similar optimizations could be made to choose the index that will scan > over a smaller dataset. > > On Sat, Oct 14, 2017 at 8:26 AM, James Taylor <jamestay...@apache.org> > wrote: > > Couple of follow-up comments: >

Re: Upserting in batch into a column of all rows by concatenating multiple columns

2017-10-28 Thread James Taylor
See http://phoenix.apache.org/language/index.html#upsert_select On Sat, Oct 28, 2017 at 4:25 AM Vaghawan Ojha wrote: > Hi, > > I want to update a column's value in all rows by concatenating values from > the multiple columns of the rows. In plain sql it's possible to do
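
A sketch of the UPSERT SELECT form linked above (the schema is hypothetical):

    -- Rewrites COMBINED for every row by concatenating two other columns.
    UPSERT INTO my_table (id, combined)
    SELECT id, col1 || '-' || col2 FROM my_table;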

Re: Cloudera parcel update

2017-10-27 Thread James Taylor
…other hand Hortonworks already keeps Phoenix within their own >>>> distribution so in this case the vendor itself sees strategic advantage in >>>> this. >>>> >>>> As long as I work for the same company I'll have to port Phoenix to >>>> CDH. So

Re: Cloudera parcel update

2017-10-26 Thread James Taylor
This is great, Pedro. Thanks so much for porting everything over. Would it make sense to try to have a CDH compatible release with each Phoenix release? Who would sign up to do this? Same question for HDP releases. Thanks, James On Thu, Oct 26, 2017 at 2:43 PM, Pedro Boado

Re: Load HFiles in Apache Phoenix

2017-10-20 Thread James Taylor
If you put together a nice example, we can post a link to it from the FAQ. Sorry, but with open source, the answer is often "go look at the source code". :-) On Fri, Oct 20, 2017 at 2:13 PM, snhir...@gmail.com <snhir...@gmail.com> wrote: > > > On 2017-10-20 17:0

Re: Load HFiles in Apache Phoenix

2017-10-20 Thread James Taylor
Load Phoenix into Eclipse and search for references to PhoenixRuntime.getUncommittedDataIterator(). There's even a unit test that does this. On Fri, Oct 20, 2017 at 2:04 PM, snhir...@gmail.com <snhir...@gmail.com> wrote: > > > On 2017-10-20 16:49, James Taylor <jamestay...

Re: Load HFiles in Apache Phoenix

2017-10-20 Thread James Taylor
Here's a little more info: https://phoenix.apache.org/faq.html#Why_empty_key_value Lots of hits here too: http://search-hadoop.com/?project=Phoenix=empty+key+value On Fri, Oct 20, 2017 at 1:45 PM, sn5 wrote: > It would be very helpful to see a complete, working example

Re: Async get

2017-10-20 Thread James Taylor
…being >> able to run mapping functions, aggregations, etc. without needing >> coprocessors would be a big win. If HBase doesn’t do it, the next thing >> will. >> >> On Oct 5, 2017, at 11:31 AM, James Taylor <jamestay...@apache.org> wrote: >> >> I do
