[GitHub] phoenix pull request: Phoenix 180
Github user ramkrish86 commented on a diff in the pull request: https://github.com/apache/phoenix/pull/8#discussion_r16943000

--- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelIteratorRegionSplitter.java ---
@@ -138,14 +146,10 @@ public boolean apply(HRegionLocation location) {
         // split each region in s splits such that:
         // s = max(x) where s * x <= t
-        // The idea is to align splits with region boundaries. If rows are not evenly
-        // distributed across regions, using this scheme compensates for regions that
-        // have more rows than others, by applying tighter splits and therefore spawning
-        // off more scans over the overloaded regions.
-        int splitsPerRegion = getSplitsPerRegion(regions.size());
         // Create a multi-map of ServerName to List<KeyRange> which we'll use to round robin from to ensure
         // that we keep each region server busy for each query.
-        ListMultimap<HRegionLocation, KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(), regions.size() * splitsPerRegion);;
+        int splitsPerRegion = getSplitsPerRegion(regions.size());
+        ListMultimap<HRegionLocation, KeyRange> keyRangesPerRegion = ArrayListMultimap.create(regions.size(), regions.size() * splitsPerRegion);
         if (splitsPerRegion == 1) {
             for (HRegionLocation region : regions) {
--- End diff --

Ok.. So in that case the stats will be associated with the table directly. For now I will first finish the case so that the PTable has a PColumnFamily and the stats (guidePosts) are part of this PColumnFamily.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
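The round-robin idea in the diff's comment ("which we'll use to round robin from to ensure that we keep each region server busy") can be sketched in isolation. This is a simplified, hypothetical model, not Phoenix's implementation: a plain java.util map of lists stands in for Guava's ListMultimap, and strings stand in for HRegionLocation and KeyRange.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of round-robin scheduling of key ranges across region
// servers. Strings stand in for HRegionLocation and KeyRange; a Map of Lists
// stands in for Guava's ListMultimap<HRegionLocation, KeyRange>.
public class RoundRobinSketch {

    // Interleave the per-server range lists so that consecutive scans hit
    // different region servers, keeping every server busy for the query.
    public static List<String> roundRobin(Map<String, List<String>> rangesPerServer) {
        List<String> ordered = new ArrayList<>();
        List<Iterator<String>> iters = new ArrayList<>();
        for (List<String> ranges : rangesPerServer.values()) {
            iters.add(ranges.iterator());
        }
        boolean progress = true;
        while (progress) {
            progress = false;
            for (Iterator<String> it : iters) { // one range per server per round
                if (it.hasNext()) {
                    ordered.add(it.next());
                    progress = true;
                }
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        Map<String, List<String>> perServer = new LinkedHashMap<>();
        perServer.put("server1", Arrays.asList("range1a", "range1b"));
        perServer.put("server2", Arrays.asList("range2a"));
        // Alternates between servers: [range1a, range2a, range1b]
        System.out.println(roundRobin(perServer));
    }
}
```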
[jira] [Commented] (PHOENIX-476) Support declaration of DEFAULT in CREATE statement
[ https://issues.apache.org/jira/browse/PHOENIX-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117125#comment-14117125 ] Gabriel Reid commented on PHOENIX-476:
--
[~jamestaylor] yes, definitely looks useful for bulk loading scenarios. I'll try to take a look at this one in the next little while, but I won't be able to get to it right away. It looks like the hard part (i.e. figuring out how to actually do it) is pretty much done, so hopefully the impl work shouldn't be too bad.

Support declaration of DEFAULT in CREATE statement
--
Key: PHOENIX-476
URL: https://issues.apache.org/jira/browse/PHOENIX-476
Project: Phoenix
Issue Type: Task
Affects Versions: 3.0-Release
Reporter: James Taylor
Labels: enhancement

Support the declaration of a default value in the CREATE TABLE/VIEW statement like this:

CREATE TABLE Persons (
    Pid int NOT NULL PRIMARY KEY,
    LastName varchar(255) NOT NULL,
    FirstName varchar(255),
    Address varchar(255),
    City varchar(255) DEFAULT 'Sandnes'
)

To implement this, we'd need to:
1. Add a new DEFAULT_VALUE key value column in SYSTEM.TABLE and pass through the value when the table is created (in MetaDataClient).
2. Always set NULLABLE to ResultSetMetaData.columnNoNulls if a default value is present, since the column will never be null.
3. Add a getDefaultValue() accessor in PColumn.
4. For a row key column, during UPSERT use the default value if no value was specified for that column. This could be done in the PTableImpl.newKey method.
5. For a key value column with a default value, we can get away without incurring any storage cost. Although this takes a bit more effort than persisting the default value on UPSERT, it has the benefit of not incurring any storage cost for a default value.
* Serialize any default value into KeyValueColumnExpression.
* In the evaluate method of KeyValueColumnExpression, conditionally use the default value if the column value is not present.
If doing partial evaluation, you should not yet return the default value, as we may not have encountered the KeyValue for the column yet (since a filter evaluates each time it sees each KeyValue, and there may be more than one KeyValue referenced in the expression). Partial evaluation is determined by calling Tuple.isImmutable(), where false means it is NOT doing partial evaluation, while true means it is.
* Modify EvaluateOnCompletionVisitor by adding a visitor method for RowKeyColumnExpression and KeyValueColumnExpression to set evaluateOnCompletion to true if they have a default value specified. This will cause filter evaluation to execute one final time after all KeyValues for a row have been seen, since it's at this time we know we should use the default value.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
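The default-value rule described above (use the stored value when present; apply the DEFAULT only once all KeyValues for the row have been seen) can be sketched as a tiny standalone model. This is a hypothetical simplification, not Phoenix's KeyValueColumnExpression: a row is modeled as a Map, and a boolean `finalEvaluation` flag stands in for the "all KeyValues seen" condition.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of DEFAULT-value resolution: an actual stored value always
// wins; the declared default applies only on the final evaluation pass, since
// during partial evaluation the column's KeyValue may simply not have been
// seen yet.
public class DefaultValueSketch {

    // Returns the effective column value, or null when the decision must be
    // deferred (partial evaluation in progress).
    public static String evaluate(Map<String, String> row, String column,
                                  String defaultValue, boolean finalEvaluation) {
        String stored = row.get(column);
        if (stored != null) {
            return stored;           // an actual stored value always wins
        }
        if (finalEvaluation) {
            return defaultValue;     // all KeyValues seen: safe to apply DEFAULT
        }
        return null;                 // partial evaluation: defer the decision
    }

    public static void main(String[] args) {
        Map<String, String> row = new HashMap<>();
        row.put("LastName", "Hansen");
        // City is missing; DEFAULT 'Sandnes' applies once the row is complete.
        System.out.println(evaluate(row, "City", "Sandnes", true));   // Sandnes
        System.out.println(evaluate(row, "City", "Sandnes", false));  // null
        System.out.println(evaluate(row, "LastName", "X", true));     // Hansen
    }
}
```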
[jira] [Commented] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes
[ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117194#comment-14117194 ] Hudson commented on PHOENIX-1203:
-
ABORTED: Integrated in Phoenix | 4.0 | Hadoop2 #83 (See [https://builds.apache.org/job/Phoenix-4.0-hadoop2/83/])
PHOENIX-1203 Uable to work for count (distinct col) queries via phoenix table with secondary indexes. (anoopsamjohn: rev 214a4ccf89da50c47158b71e9d17418a0c3e6fae)
* phoenix-core/src/it/java/org/apache/phoenix/end2end/DistinctCountIT.java
* phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeFactory.java

Uable to work for count (distinct col) queries via phoenix table with secondary indexes
---
Key: PHOENIX-1203
URL: https://issues.apache.org/jira/browse/PHOENIX-1203
Project: Phoenix
Issue Type: Bug
Affects Versions: 3.0.0
Environment: hadoop-2.2.0 hbase: Version 0.98.3-hadoop2
Reporter: Sun Fulin
Assignee: Anoop Sam John
Labels: distinct, secondaryIndex, test
Fix For: 5.0.0, 4.2, 3.2
Attachments: PHOENIX-1203.patch

I built the latest 4.1 rc0 from here: https://github.com/apache/phoenix/releases and examined the count (distinct col) query in the new environment. However, the problem still exists for index queries, as shown below, while the correct distinct query result for my project is expected to be 536:

0: jdbc:phoenix:zookeeper1> select count (distinct t.imsi) from ranapsignal t where t.pkttime>=140496480 and t.pkttime<=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0;
+-------------+
| COUNT(IMSI) |
+-------------+
| 2322        |
+-------------+
1 row selected (70.572 seconds)

As James suggested, I ran the query adding "group by t.imsi", with and without secondary indexes. The result seems fine, as both got the correct 536 distinct groups. Here are some considerations:
1. A count (distinct col) query over the index table did not work as expected.
2. Only a distinct query over the index table works fine.
3.
If the phoenix version got some wrong configuration, correct me.
Thanks and Best Regards,
Sun
---
Hi Sun,
Thanks for the detailed description. Yes, your syntax is correct, and it's definitely true that the count distinct query should return the same result with and without the index. Would you mind trying this on our latest 3.1 RC2 and/or 4.1 RC0 and, if the problem still occurs, filing a JIRA? One thing that may make it easier for your testing: do you know about our NO_INDEX hint, which forces the query *not* to use an index, like this: select /*+ NO_INDEX */ ...
Another question too. What about this query, with and without the index:
select count(*) from ranapsignal t where t.pkttime>=140496480 and t.pkttime<=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 group by t.imsi;
Thanks,
James
On Thu, Aug 21, 2014 at 10:38 PM, su...@certusnet.com.cn su...@certusnet.com.cn wrote:
Hi James,
Recently I ran into trouble while trying to conduct some query performance tests on my phoenix tables with secondary indexes. I created a table called RANAPSIGNAL for my projects in phoenix via the sqlline client and loaded data into the table. Then I created an index on the column PKTTIME of the table RANAPSIGNAL, including some more columns to cover my index query, with the following DDL:
create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include (SAC, NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE);
The index creation completed without any errors. So, when I tried to conduct a query such as:
select count (distinct t.imsi) from ranapsignal t where t.pkttime>=140496480 and t.pkttime<=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0;
Without secondary indexes, the final result got 536 distinct imsi, which is the right distinct count result.
However, after I created the above secondary index PKT_IDX and re-ran the above count (distinct imsi) query, I got 2322 imsi rows, which obviously is not the expected distinct count result. I used the explain grammar to observe the scan of the above select query and found that it definitely scanned over the index table PKT_IDX. I then tried to conduct the following query with no count function:
select distinct t.imsi from ranapsignal t where t.pkttime>=140496480 and t.pkttime<=140496569 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0;
And the
[GitHub] phoenix pull request: Phoenix 180
Github user ramkrish86 commented on the pull request: https://github.com/apache/phoenix/pull/8#issuecomment-54037058

DefaultParallelIteratorsRegionSplitterIT - what about these test cases? Do we need the new behaviour, or do we need to update the test cases to get the required result?
[jira] [Commented] (PHOENIX-1227) Upsert select of binary data doesn't always correctly coerce data into correct format
[ https://issues.apache.org/jira/browse/PHOENIX-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117344#comment-14117344 ] Gabriel Reid commented on PHOENIX-1227:
---
[~jamestaylor] do you have an opinion on the best way to approach this? I've looked at it from a few different angles -- to me the one that makes the most sense is just to disallow {code}UPSERT INTO MYTABLE (v) SELECT MD5(v) FROM MYTABLE{code} due to datatype mismatch.

Upsert select of binary data doesn't always correctly coerce data into correct format
-
Key: PHOENIX-1227
URL: https://issues.apache.org/jira/browse/PHOENIX-1227
Project: Phoenix
Issue Type: Bug
Reporter: Gabriel Reid

If you run an upsert select statement that selects a binary value and writes a numerical value (or probably other types as well), you can end up with invalid binary values stored in HBase. For example, in something like this if v is an {{INTEGER}} column:
{code}UPSERT INTO MYTABLE (v) SELECT MD5(v) FROM MYTABLE{code}
the literal 16-byte binary values from the MD5 function will be added verbatim into the field v. This is a really big problem if v is the key field, as it can even lead to multiple keys with what appear to be the same value. This happens if there are multiple (invalid) row keys that begin with the same 4 bytes, as only the first 4 bytes of the key will be shown when selecting data from the column, but the different full-length values of the row keys will lead to multiple records. Somewhat related to this, a statement like the following (with a constant binary value) will fail immediately due to datatype mismatch:
{code}UPSERT INTO MYTABLE (v) SELECT MD5(1) FROM MYTABLE{code}
It seems that the first expression above should probably fail in the same way as the expression with the constant binary value (or neither of them should fail). Obviously there shouldn't be any invalid values being written in to HBase.
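The size mismatch at the heart of this issue is easy to demonstrate outside Phoenix: an MD5 digest is always 16 bytes, while a serialized INTEGER is 4 bytes, so storing the digest verbatim yields an invalid value. A minimal, self-contained illustration (not Phoenix code; `fitsFixedWidth` is a hypothetical stand-in for the kind of coercion check the report argues for):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Demonstrates the size mismatch behind the bug: an MD5 digest (16 bytes)
// cannot be a valid serialized INTEGER (4 bytes), so a coercion check should
// reject it rather than store it verbatim.
public class Md5CoercionCheck {

    public static byte[] md5(String input) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is available on every JRE
        }
    }

    // Hypothetical check an UPSERT SELECT could apply: the value's byte length
    // must match the target type's fixed width.
    public static boolean fitsFixedWidth(byte[] value, int expectedWidth) {
        return value.length == expectedWidth;
    }

    public static void main(String[] args) {
        byte[] digest = md5("42");
        System.out.println(digest.length);                         // 16
        System.out.println(fitsFixedWidth(digest, Integer.BYTES)); // false
    }
}
```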
[GitHub] phoenix pull request: Phoenix 180
Github user JamesRTaylor commented on the pull request: https://github.com/apache/phoenix/pull/8#issuecomment-54078575 Yes, the DefaultParallelIteratorsRegionSplitterIT should be modified to test the new behavior.
[jira] [Commented] (PHOENIX-1220) NullPointerException in PArrayDataType.toObject() when baseType is CHAR or BINARY
[ https://issues.apache.org/jira/browse/PHOENIX-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117597#comment-14117597 ] Maryann Xue commented on PHOENIX-1220:
--
[~jamestaylor] Yes, you are right. I should have used another toObject in my testGetSampleValue(), while getSampleValue() does not call toObject() and has no problem with that. I'll just close this issue.
[~ram_krish] I was just reminded of another bug I had reported a while ago, https://github.com/forcedotcom/phoenix/issues/682, which may or may not be related. And I just checked it against the latest code; it returns null instead of throwing NullPointerException now. Would you mind taking a look at this one? Also, it seems that it had not been transferred into our apache issue list, so you might need to create one.

NullPointerException in PArrayDataType.toObject() when baseType is CHAR or BINARY
-
Key: PHOENIX-1220
URL: https://issues.apache.org/jira/browse/PHOENIX-1220
Project: Phoenix
Issue Type: Bug
Affects Versions: 5.0.0
Reporter: Maryann Xue
Priority: Minor
Original Estimate: 24h
Remaining Estimate: 24h

We now assume that for PDataType, if isFixedLength() returns true, we can use getByteSize() to get the byte array length of this type. But with BINARY and CHAR types, isFixedLength() returns true while getByteSize() returns null, and that's why we would get an NPE if we write code like:
{code:title=PArrayDataType.createPhoenixArray()}
if (!baseDataType.isFixedWidth()) {
    ...
} else {
    int elemLength = (maxLength == null ? baseDataType.getByteSize() : maxLength);
    ...
}
{code}
There is more than one occurrence of such code besides this one.
[jira] [Comment Edited] (PHOENIX-1220) NullPointerException in PArrayDataType.toObject() when baseType is CHAR or BINARY
[ https://issues.apache.org/jira/browse/PHOENIX-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14117597#comment-14117597 ] Maryann Xue edited comment on PHOENIX-1220 at 9/1/14 5:56 PM:
--
[~jamestaylor] Yes, you are right. I should have used another toObject in my testGetSampleValue(), while getSampleValue() does not call toObject() and has no problem with that. I'll just close this issue. [~ram_krish] Was just reminded of another bug I had reported a while ago, https://github.com/forcedotcom/phoenix/issues/682, which may or may not be related. And I just checked it against the latest code, it returns null instead of throwing NullPointerException now. Would you mind taking a look at this one? Also, seems that it had not been transferred into our apache issue list, so you might need to create one.

was (Author: maryannxue):
[~jamestaylor] Yes, you are right. I should have used another toObject in my testGetSampleValue(), while getSampleValue() does not call toObject() and has no problem with that. I'll just close this issue. [~ram_krish] Was just reminded of another bug I had reported a while ago, https://github.com/forcedotcom/phoenix/issues/682, which may or may not be related. And I just checked it against the latest code, it returns null instead of throwing NullPointerException now. Would you mind taking a look at of this one? Also, seems that it had not been transferred into our apache issue list, so you might need to create one.

NullPointerException in PArrayDataType.toObject() when baseType is CHAR or BINARY
-
Key: PHOENIX-1220
URL: https://issues.apache.org/jira/browse/PHOENIX-1220
Project: Phoenix
Issue Type: Bug
Affects Versions: 5.0.0
Reporter: Maryann Xue
Priority: Minor
Original Estimate: 24h
Remaining Estimate: 24h

We now assume that for PDataType, if isFixedLength() returns true, we can use getByteSize() to get the byte array length of this type.
But with BINARY and CHAR types, isFixedLength() returns true while getByteSize() returns null, and that's why we would get an NPE if we write code like:
{code:title=PArrayDataType.createPhoenixArray()}
if (!baseDataType.isFixedWidth()) {
    ...
} else {
    int elemLength = (maxLength == null ? baseDataType.getByteSize() : maxLength);
    ...
}
{code}
There is more than one occurrence of such code besides this one.
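The NPE described above comes from unboxing a null `Integer` into an `int`. A defensive version of the snippet can make the failure mode explicit. The sketch below is a standalone simplification: `SimpleType` is a hypothetical stand-in for Phoenix's PDataType, used only to reproduce the null-returning `getByteSize()` behavior.

```java
// Standalone sketch of the NPE: for CHAR and BINARY, isFixedWidth() is true
// but getByteSize() is null, so unboxing it into an int throws. The guard
// below fails with a clear message instead. (SimpleType is a stand-in, not
// Phoenix's PDataType.)
public class ElemLengthSketch {

    interface SimpleType {
        boolean isFixedWidth();
        Integer getByteSize(); // may be null for CHAR / BINARY
    }

    // Null-safe replacement for:
    //   int elemLength = (maxLength == null ? baseDataType.getByteSize() : maxLength);
    public static int elemLength(SimpleType baseDataType, Integer maxLength) {
        if (maxLength != null) {
            return maxLength;
        }
        Integer byteSize = baseDataType.getByteSize();
        if (byteSize == null) {
            throw new IllegalStateException(
                "fixed-width type with unknown byte size; maxLength is required");
        }
        return byteSize;
    }

    public static void main(String[] args) {
        SimpleType charLike = new SimpleType() {
            public boolean isFixedWidth() { return true; }
            public Integer getByteSize() { return null; } // like CHAR / BINARY
        };
        System.out.println(elemLength(charLike, 10)); // 10: maxLength wins
        try {
            elemLength(charLike, null); // would have been an NPE in the original
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```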
[jira] [Updated] (PHOENIX-852) Optimize child/parent foreign key joins
[ https://issues.apache.org/jira/browse/PHOENIX-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maryann Xue updated PHOENIX-852: Attachment: 852-4.patch Updated PDataTypeTest.testGetSampleValue() Optimize child/parent foreign key joins --- Key: PHOENIX-852 URL: https://issues.apache.org/jira/browse/PHOENIX-852 Project: Phoenix Issue Type: Improvement Reporter: James Taylor Assignee: Maryann Xue Attachments: 852-2.patch, 852-3.patch, 852-4.patch, 852.patch, PHOENIX-852.patch Often times a join will occur from a child to a parent. Our current algorithm would do a full scan of one side or the other. We can do much better than that if the HashCache contains the PK (or even part of the PK) from the table being joined to. In these cases, we should drive the second scan through a skip scan on the server side.
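The optimization idea described in PHOENIX-852 can be sketched in isolation: when the hash cache already holds the parent-side PK values, the second scan can be driven by those exact keys rather than a full table scan. The code below is an illustrative model only (strings stand in for row keys; it is not Phoenix's ScanRanges/skip-scan machinery).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Sketch of the child/parent foreign-key join optimization: turn the PK
// values captured in the hash cache into a sorted, deduplicated list of point
// lookups, so the second scan visits only those keys (a "skip scan") instead
// of scanning the whole table.
public class FkJoinSkipScanSketch {

    public static List<String> keyRangesFromHashCache(Collection<String> cachedPks) {
        // TreeSet dedupes and sorts, matching how point key ranges would be
        // handed to a scan in row-key order.
        return new ArrayList<>(new TreeSet<>(cachedPks));
    }

    public static void main(String[] args) {
        // Instead of scanning all of the parent table, scan only these PKs.
        List<String> ranges = keyRangesFromHashCache(Arrays.asList("k3", "k1", "k3", "k2"));
        System.out.println(ranges); // [k1, k2, k3]
    }
}
```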
[ANNOUNCE] Apache Phoenix 3.1 and 4.1 released
Hello everyone, On behalf of the Apache Phoenix team, I'm pleased to announce the immediate availability of our 3.1 and 4.1 releases: http://phoenix.apache.org/download.html These include many bug fixes along with support for nested/derived tables, tracing, and local indexing. For details of the release, please see our announcement here: https://blogs.apache.org/phoenix/entry/announcing_phoenix_3_1_and Regards, James
[jira] [Updated] (PHOENIX-1228) NPE in select max(c1) when c1 is a CHAR field
[ https://issues.apache.org/jira/browse/PHOENIX-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated PHOENIX-1228: Affects Version/s: 4.1 3.1 Fix Version/s: 3.2 4.2 NPE in select max(c1) when c1 is a CHAR field --- Key: PHOENIX-1228 URL: https://issues.apache.org/jira/browse/PHOENIX-1228 Project: Phoenix Issue Type: Bug Affects Versions: 3.1, 4.1 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 4.2, 3.2 Reported by MaryAnn in https://github.com/forcedotcom/phoenix/issues/682. Need to see if this still causes NPE.
[jira] [Created] (PHOENIX-1228) NPE in select max(c1) when c1 is a CHAR field
ramkrishna.s.vasudevan created PHOENIX-1228: --- Summary: NPE in select max(c1) when c1 is a CHAR field Key: PHOENIX-1228 URL: https://issues.apache.org/jira/browse/PHOENIX-1228 Project: Phoenix Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Reported by MaryAnn in https://github.com/forcedotcom/phoenix/issues/682. Need to see if this still causes NPE.
Phoenix Mail archives search on search-hadoop.com
Hi All,
For years, I have been using the nice, user-friendly search-hadoop.com website to search for mail threads related to Hadoop projects. It would be great if Phoenix could also be added to that website. AFAIK, this website is owned by Sematext. At present, searching Phoenix mail threads means falling back on the crude approach of using http://mail-archives.apache.org
This is just a suggestion. I don't have the know-how of getting a project added to that website.
--
Thanks & Regards,
Anil Gupta
[ANNOUNCE] Rajeshbabu Chintaguntla added as Apache Phoenix committer
On behalf of the Apache Phoenix PMC, I'm pleased to announce that Rajeshbabu Chintaguntla has been added as a committer to the Apache Phoenix project. He's responsible for adding local indexing[1] to our recent 4.1 release, a complementary secondary index strategy to global indexing for write-heavy, space constrained use cases in which the index data co-resides with the table data on the same region server through a custom load balancer. Excellent work, Rajeshbabu - looking forward to your continued contributions. Regards, James [1] http://phoenix.apache.org/secondary_indexing.html#Local_Indexing
Re: Phoenix Mail archives search on search-hadoop.com
+1. Anyone have any connections that'll help us get added? James On Mon, Sep 1, 2014 at 10:24 PM, anil gupta anilgupt...@gmail.com wrote: Hi All, For years, I have been using this nice user friendly search-hadoop.com website to search for mail threads related to Hadoop projects. It would be great if Phoenix can be also be added on that website. AFAIK, this website is owned by Sematext. At present, searching for mail thread of Phoenix seems to be crude way of using http://mail-archives.apache.org This is just a suggestion. I dont have the know-how of how to get a Project added to that website. -- Thanks Regards, Anil Gupta
[ANNOUNCE] Ravi Magham added as Apache Phoenix committer
On behalf of the Apache Phoenix PMC, I'm pleased to announce that Ravi Magham has been added as a committer to the Apache Phoenix project. He's been the force behind much of our integration with other Apache projects such as the Pig Loader [1], our Flume plugin [2], and the work-in-progress Sqoop integration. Great job, Ravi. Looking forward to many more contributions! Regards, James [1] http://phoenix.apache.org/pig_integration.html#Pig_Loader [2] http://phoenix.apache.org/flume.html
Re: Phoenix Mail archives search on search-hadoop.com
Looks like there's a form you can fill in to request Phoenix be added - you need to fill it in under Other here: https://docs.google.com/a/sematext.com/spreadsheet/viewform?formkey=dFlVbUNxOHR6UFlQem5XeGIzTjV6Qmc6MQ You can also mail spm-supp...@sematext.com directly. Thanks, James
On Mon, Sep 1, 2014 at 10:35 PM, James Taylor jamestay...@apache.org wrote: +1. Anyone have any connections that'll help us get added? James
[GitHub] phoenix pull request: Phoenix 180
Github user ramkrish86 commented on a diff in the pull request: https://github.com/apache/phoenix/pull/8#discussion_r16971170

--- Diff: phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java ---
@@ -664,14 +698,161 @@ private PTable getTable(RegionScanner scanner, long clientTimeStamp, long tableT
         } else {
             addColumnToTable(results, colName, famName, colKeyValues, columns, saltBucketNum != null);
         }
+    }
+    byte[] tenIdBytes = QueryConstants.EMPTY_BYTE_ARRAY;
+    if (tenantId != null) {
+        tenIdBytes = tenantId.getBytes();
     }
-
+    byte[] schNameInBytes = QueryConstants.EMPTY_BYTE_ARRAY;
+    if (schemaName != null) {
+        schNameInBytes = Bytes.toBytes(schemaName.getString());
+    }
+    PTableStats stats = updateStats(SchemaUtil.getTableKey(tenIdBytes, schNameInBytes, tableNameBytes));
     return PTableImpl.makePTable(tenantId, schemaName, tableName, tableType, indexState, timeStamp,
         tableSeqNum, pkName, saltBucketNum, columns, tableType == INDEX ? dataTableName : null,
         indexes, isImmutableRows, physicalTables, defaultFamilyName, viewStatement, disableWAL,
-        multiTenant, viewType, viewIndexId, indexType);
+        multiTenant, viewType, viewIndexId, indexType, stats);
 }

+private PTableStats updateStats(final byte[] tableNameBytes) {
+    lock.readLock().lock();
+    try {
+        PTableStats stats = tableStatsMap.get(Bytes.toString(tableNameBytes));
+        return stats;
+    } finally {
+        lock.readLock().unlock();
+    }
+}
+
+private void updateStatsInternal(byte[] tableNameBytes, RegionCoprocessorEnvironment env)
--- End diff --

That's right. It is the best way to do it. One more thing: in order to retrieve the latest time when the stats got updated, so that based on that we don't issue concurrent update-stats queries, should we do that in MetaDataClient with a select query, or should it be a normal get from the HTable? I am asking this as a matter of good practice.
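The updateStats() pattern in the diff above (a read lock around a map lookup, presumably paired with a write lock wherever the map is repopulated) can be sketched in isolation. This is a generic ReentrantReadWriteLock cache sketch, not the MetaDataEndpointImpl code; the Long "stats" value is a hypothetical stand-in for PTableStats.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Generic sketch of the read/write-locked stats cache pattern: many readers
// may look up cached stats concurrently under the shared lock, while a writer
// takes the exclusive lock to refresh an entry.
public class StatsCacheSketch {

    private final Map<String, Long> statsByTable = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Mirrors the updateStats() lookup in the diff: shared lock for reads.
    public Long getStats(String tableName) {
        lock.readLock().lock();
        try {
            return statsByTable.get(tableName);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Mirrors the (unshown) refresh path: exclusive lock for writes.
    public void putStats(String tableName, long guidePostCount) {
        lock.writeLock().lock();
        try {
            statsByTable.put(tableName, guidePostCount);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        StatsCacheSketch cache = new StatsCacheSketch();
        System.out.println(cache.getStats("T1")); // null: nothing cached yet
        cache.putStats("T1", 42L);
        System.out.println(cache.getStats("T1")); // 42
    }
}
```

Note the try/finally shape: the unlock must run even if the map access throws, which is why both paths release the lock in finally, exactly as the diff does.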