[jira] [Commented] (PHOENIX-2565) Store data for immutable tables in single KeyValue
[ https://issues.apache.org/jira/browse/PHOENIX-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502249#comment-15502249 ]

James Taylor commented on PHOENIX-2565:
---

Thanks for the explanation, [~tdsilva]. Would you mind filing a follow-up JIRA to get rid of ReplaceArrayColumnWithKeyValueColumnExpressionVisitor, with the above explanation? +1 to commit to the encodecolumns branch.

> Store data for immutable tables in single KeyValue
> --
>
> Key: PHOENIX-2565
> URL: https://issues.apache.org/jira/browse/PHOENIX-2565
> Project: Phoenix
> Issue Type: Improvement
> Reporter: James Taylor
> Assignee: Thomas D'Silva
> Fix For: 4.9.0
>
> Attachments: PHOENIX-2565-wip.patch, PHOENIX-2565.patch
>
>
> Since an immutable table (i.e. declared with IMMUTABLE_ROWS=true) will never
> update a column value, it'd be more efficient to store all column values for
> a row in a single KeyValue. We could use the existing format we have for
> variable-length arrays.
> For backward compatibility, we'd need to support the current mechanism. Also,
> you'd no longer be allowed to transition an existing table to/from being
> immutable. I think the best approach would be to introduce a new IMMUTABLE
> keyword and use it like this:
> {code}
> CREATE IMMUTABLE TABLE ...
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
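As a rough illustration of the single-KeyValue proposal, the sketch below packs every column value of a row into one byte array (the payload a single KeyValue would carry) and reads one column back by position through a trailing offset table. The layout and the `PackedRow` class are hypothetical simplifications for illustration only; they are not Phoenix's actual variable-length array serialization.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical sketch (NOT Phoenix's real array format): all column values of
// one immutable row are packed into a single byte[] that would be stored as the
// value of one KeyValue. Layout:
//   [value 0][value 1]...[value n-1][offset 0]...[offset n-1][n]
// where every offset and n is a 4-byte big-endian int.
final class PackedRow {
    private PackedRow() {}

    static byte[] pack(List<byte[]> values) {
        int dataLength = 0;
        for (byte[] v : values) {
            dataLength += v.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(dataLength + 4 * values.size() + 4);
        int[] offsets = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            offsets[i] = buf.position(); // start of value i
            buf.put(values.get(i));
        }
        for (int offset : offsets) {
            buf.putInt(offset);
        }
        buf.putInt(values.size());
        return buf.array();
    }

    // Reads back the value at one column position without decoding the others.
    static byte[] get(byte[] packed, int index) {
        ByteBuffer buf = ByteBuffer.wrap(packed);
        int count = buf.getInt(packed.length - 4);
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("no column at position " + index);
        }
        int offsetTable = packed.length - 4 - 4 * count; // also where the data ends
        int start = buf.getInt(offsetTable + 4 * index);
        int end = index + 1 < count ? buf.getInt(offsetTable + 4 * (index + 1)) : offsetTable;
        byte[] value = new byte[end - start];
        System.arraycopy(packed, start, value, 0, value.length);
        return value;
    }
}
```

Because every value's position is fixed by the offset table, changing one column means rewriting the whole cell, which is acceptable only for rows that are never updated.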
[jira] [Created] (PHOENIX-3294) Support approximate COUNT(*) by using stats.
Lars Hofhansl created PHOENIX-3294:
--

Summary: Support approximate COUNT(*) by using stats.
Key: PHOENIX-3294
URL: https://issues.apache.org/jira/browse/PHOENIX-3294
Project: Phoenix
Issue Type: Sub-task
Reporter: Lars Hofhansl
[jira] [Resolved] (PHOENIX-3225) Distinct Queries are slower than expected at scale.
[ https://issues.apache.org/jira/browse/PHOENIX-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl resolved PHOENIX-3225.
Resolution: Duplicate

> Distinct Queries are slower than expected at scale.
> ---
>
> Key: PHOENIX-3225
> URL: https://issues.apache.org/jira/browse/PHOENIX-3225
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: Lars Hofhansl
>
> In our large-scale tests we found that we can easily sort 400G on a few hundred
> machines, but that a simple DISTINCT would just time out. Perhaps that's
> expected, as we have to keep track of the unique values, but we should
> investigate.
[jira] [Commented] (PHOENIX-2565) Store data for immutable tables in single KeyValue
[ https://issues.apache.org/jira/browse/PHOENIX-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501963#comment-15501963 ]

Thomas D'Silva commented on PHOENIX-2565:
-

Thanks for the feedback. ReplaceArrayColumnWithKeyValueColumnExpressionVisitor is used in only one place, IndexUtil.generateIndexData, because there we use a ValueGetter to get the value of the data table column through the original data table column reference. This is also why ArrayColumnExpression needs to keep track of the original key value column expression: if we don't replace the array column expression with the original column expression, the ValueGetter won't find the column when it looks it up by qualifier. I will make the other changes you suggested.

{code}
ValueGetter valueGetter = new ValueGetter() {

    @Override
    public byte[] getRowKey() {
        return dataMutation.getRow();
    }

    @Override
    public ImmutableBytesWritable getLatestValue(ColumnReference ref) {
        // Always return null for our empty key value, as this will cause the index
        // maintainer to always treat this Put as a new row.
        if (isEmptyKeyValue(table, ref)) {
            return null;
        }
        byte[] family = ref.getFamily();
        byte[] qualifier = ref.getQualifier();
        // valuesMap and ptr come from the enclosing scope
        RowMutationState rowMutationState = valuesMap.get(ptr);
        PColumn column = null;
        try {
            column = table.getColumnFamily(family).getPColumnForColumnQualifier(qualifier);
        } catch (ColumnNotFoundException | ColumnFamilyNotFoundException e) {
            // Not a column of this table: fall through and return null
        }
        if (rowMutationState != null && column != null) {
            byte[] value = rowMutationState.getColumnValues().get(column);
            // Separate name so the enclosing ptr is not shadowed
            ImmutableBytesPtr valuePtr = new ImmutableBytesPtr();
            valuePtr.set(value == null ? ByteUtil.EMPTY_BYTE_ARRAY : value);
            SchemaUtil.padData(table.getName().getString(), column, valuePtr);
            return valuePtr;
        }
        return null;
    }
};
{code}
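A toy model may make the lookup problem concrete. In the sketch below, which uses hypothetical stand-in classes rather than Phoenix's real expression types, pending values are keyed by the original column reference (modelled as a "family:qualifier" string), so a lookup through the packed array column's reference finds nothing until the expression is swapped back to the original key value column expression it remembers.

```java
import java.util.Map;

// Hypothetical, simplified stand-ins for the Phoenix classes discussed above.
// Column references are modelled as "family:qualifier" strings.
interface ColumnExpr {}

// Refers to a column by its original data table column reference.
final class KeyValueColumnExpr implements ColumnExpr {
    final String ref;

    KeyValueColumnExpr(String ref) {
        this.ref = ref;
    }
}

// Refers to the packed single-KeyValue column but, like ArrayColumnExpression,
// keeps the original key value column expression around.
final class ArrayColumnExpr implements ColumnExpr {
    final String packedRef;
    final KeyValueColumnExpr original;

    ArrayColumnExpr(String packedRef, KeyValueColumnExpr original) {
        this.packedRef = packedRef;
        this.original = original;
    }
}

final class IndexDataSketch {
    private IndexDataSketch() {}

    // Role of ReplaceArrayColumnWithKeyValueColumnExpressionVisitor:
    // swap an array column expression back to its original column expression.
    static ColumnExpr replaceArrayColumn(ColumnExpr e) {
        return e instanceof ArrayColumnExpr ? ((ArrayColumnExpr) e).original : e;
    }

    // Role of ValueGetter.getLatestValue: pending values are keyed by the
    // original column reference, so only that reference resolves to a value.
    static String latestValue(Map<String, String> valuesByOriginalRef, ColumnExpr e) {
        String ref = e instanceof KeyValueColumnExpr
                ? ((KeyValueColumnExpr) e).ref
                : ((ArrayColumnExpr) e).packedRef;
        return valuesByOriginalRef.get(ref);
    }
}
```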
[jira] [Commented] (PHOENIX-3118) Increase default value of hbase.client.scanner.max.result.size
[ https://issues.apache.org/jira/browse/PHOENIX-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501783#comment-15501783 ]

Lars Hofhansl commented on PHOENIX-3118:

In general, I would strongly discourage increasing the max result size.

> Increase default value of hbase.client.scanner.max.result.size
> --
>
> Key: PHOENIX-3118
> URL: https://issues.apache.org/jira/browse/PHOENIX-3118
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: James Taylor
> Fix For: 4.9.0, 4.8.2
>
>
> See parent JIRA for a discussion on how to handle partial scan results. An
> easy workaround would be to increase
> {{hbase.client.scanner.max.result.size}} above the default 2 MB limit. In
> combination with this, we could detect in BaseScannerRegionObserver.nextRaw()
> whether partial results are being returned and throw an exception. Silently
> ignoring this is bad because it can lead to incorrect query results, as
> demonstrated by the parent JIRA.
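The detection idea in the description could look roughly like the wrapper below: instead of handing an incomplete row to query evaluation, it fails loudly. `ScanBatch`, its `partialRow` flag, and `FailFastScanner` are hypothetical stand-ins invented for this sketch; they are not the HBase or Phoenix scanner APIs, and the real check would live in BaseScannerRegionObserver.nextRaw().

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for what one scanner call returns: the cells for a row,
// plus whether the row was cut short by the result size limit.
final class ScanBatch {
    final List<String> cells;
    final boolean partialRow;

    ScanBatch(List<String> cells, boolean partialRow) {
        this.cells = cells;
        this.partialRow = partialRow;
    }
}

// Sketch of the proposed check: throw instead of silently processing a
// partial row, which could otherwise produce incorrect query results.
final class FailFastScanner {
    private final Iterator<ScanBatch> delegate;

    FailFastScanner(Iterator<ScanBatch> delegate) {
        this.delegate = delegate;
    }

    // Returns the next row's cells, or null when the scan is exhausted.
    List<String> nextRaw() {
        if (!delegate.hasNext()) {
            return null;
        }
        ScanBatch batch = delegate.next();
        if (batch.partialRow) {
            throw new IllegalStateException(
                    "Partial row returned by scanner; refusing to evaluate an incomplete row");
        }
        return batch.cells;
    }
}
```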
[jira] [Commented] (PHOENIX-2565) Store data for immutable tables in single KeyValue
[ https://issues.apache.org/jira/browse/PHOENIX-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501271#comment-15501271 ]

James Taylor commented on PHOENIX-2565:
---

Also, more of a note to [~samarthjain]: on further thought, encoded columns will work fine for transactional tables, so any special cases around those should be removed:

{code}
 public static boolean setMinMaxQualifiersOnScan(PTable table) {
-    return EncodedColumnsUtil.usesEncodedColumnNames(table) && !table.isTransactional() && !hasDynamicColumns(table);
+    return table.getStorageScheme() != null && table.getStorageScheme() == StorageScheme.ENCODED_COLUMN_NAMES
+            && !table.isTransactional() && !hasDynamicColumns(table);
{code}
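Taking the suggestion literally, the predicate with the transactional special case dropped reduces to something like the sketch below. `TableInfo` and this `StorageScheme` enum are hypothetical stand-ins for PTable and Phoenix's own enum (only ENCODED_COLUMN_NAMES appears in the diff), so this illustrates the boolean logic only, not the eventual patch.

```java
// Hypothetical stand-ins for PTable and its storage scheme; only
// ENCODED_COLUMN_NAMES comes from the diff, the other constant is illustrative.
enum StorageScheme { NON_ENCODED_COLUMN_NAMES, ENCODED_COLUMN_NAMES }

final class TableInfo {
    final StorageScheme storageScheme; // may be null, as the diff's null check suggests
    final boolean transactional;       // kept only to show it no longer matters
    final boolean dynamicColumns;

    TableInfo(StorageScheme storageScheme, boolean transactional, boolean dynamicColumns) {
        this.storageScheme = storageScheme;
        this.transactional = transactional;
        this.dynamicColumns = dynamicColumns;
    }
}

final class ScanQualifierCheck {
    private ScanQualifierCheck() {}

    // Transactional tables are no longer special-cased. A null scheme compares
    // unequal to ENCODED_COLUMN_NAMES, so no separate null check is needed.
    static boolean setMinMaxQualifiersOnScan(TableInfo table) {
        return table.storageScheme == StorageScheme.ENCODED_COLUMN_NAMES
                && !table.dynamicColumns;
    }
}
```

Since `==` against an enum constant is simply false for a null reference, the `getStorageScheme() != null` guard in the diff is not needed for correctness.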
[jira] [Commented] (PHOENIX-476) Support declaration of DEFAULT in CREATE statement
[ https://issues.apache.org/jira/browse/PHOENIX-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501243#comment-15501243 ]

James Taylor commented on PHOENIX-476:
--

Thanks for chiming in, [~julianhyde]. So we can still implement DEFAULT without storing the default value for non-PK columns, [~kliew]. This is important because, for a multi-billion-row table, it will save a lot of space.

Instead of using CoalesceFunction during the wrapping, we can create a new Expression called DefaultValueExpression and register it in ExpressionType. Its evaluate method can distinguish between the case where there's no value for the column (i.e. colRefChildExpression.evaluate returns false) and the case where the value is null (i.e. colRefChildExpression.evaluate returns true but ptr.getLength() is zero).

We'd want to force STORE_NULLS=true when a table is created with default values. See MetaDataClient.createTableInternal() and how we force this option to true for transactional tables; we'd want to do something similar here if any of the columns of the new table define a DEFAULT. With this option in place, Phoenix will store an empty byte array as the column value instead of issuing an HBase delete.
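The absent-versus-null distinction, and why STORE_NULLS=true matters, can be sketched with a row modelled as a map from column name to stored bytes. Without STORE_NULLS, setting a column to null issues a delete, so the column would look absent and the default would wrongly reappear. `DefaultValueSketch` and the map-based row are hypothetical simplifications, not Phoenix's Tuple or Expression API.

```java
import java.util.Map;

// Hypothetical sketch of the distinction DefaultValueExpression relies on.
// A row is modelled as column name -> stored bytes:
//   key absent            -> no KeyValue was ever written: use the DEFAULT
//   key present, length 0 -> explicit NULL (with STORE_NULLS=true an empty
//                            byte array is stored instead of a delete): stays NULL
final class DefaultValueSketch {
    private DefaultValueSketch() {}

    static byte[] valueOrDefault(Map<String, byte[]> row, String column, byte[] defaultValue) {
        byte[] stored = row.get(column);
        if (stored == null) {
            return defaultValue; // column never written
        }
        return stored; // written value, possibly an empty array meaning NULL
    }
}
```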
The evaluate method of DefaultValueExpression would look something like this (just one very subtle change from the CoalesceFunction implementation):

{code}
@Override
public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
    boolean evaluated = children.get(0).evaluate(tuple, ptr);
    if (evaluated) {
        // Will potentially evaluate to null without evaluating the second expression
        return true;
    }
    if (tuple.isImmutable()) {
        // This is true if it's the last time an evaluation is happening on the row
        Expression secondChild = children.get(1);
        if (secondChild.evaluate(tuple, ptr)) {
            // Coerce the type of the second child to the type of the first child
            getDataType().coerceBytes(ptr, secondChild.getDataType(), secondChild.getSortOrder(), getSortOrder());
            return true;
        }
    }
    return false;
}
{code}

> Support declaration of DEFAULT in CREATE statement
> --
>
> Key: PHOENIX-476
> URL: https://issues.apache.org/jira/browse/PHOENIX-476
> Project: Phoenix
> Issue Type: Task
> Affects Versions: 3.0-Release
> Reporter: James Taylor
> Assignee: Kevin Liew
> Labels: enhancement
> Attachments: PHOENIX-476.2.patch, PHOENIX-476.patch
>
>
> Support the declaration of a default value in the CREATE TABLE/VIEW statement
> like this:
> CREATE TABLE Persons (
>     Pid int NOT NULL PRIMARY KEY,
>     LastName varchar(255) NOT NULL,
>     FirstName varchar(255),
>     Address varchar(255),
>     City varchar(255) DEFAULT 'Sandnes'
> )
> To implement this, we'd need to:
> 1. add a new DEFAULT_VALUE key value column in SYSTEM.TABLE and pass through
> the value when the table is created (in MetaDataClient).
> 2. always set NULLABLE to ResultSetMetaData.columnNoNulls if a default value
> is present, since the column will never be null.
> 3. add a getDefaultValue() accessor in PColumn.
> 4. for a row key column, during UPSERT use the default value if no value was
> specified for that column. This could be done in the PTableImpl.newKey method.
> 5. for a key value column with a default value, we can avoid incurring any
> storage cost. This takes a little more effort than persisting the default
> value on an UPSERT for key value columns, but it has the benefit of not
> storing the default value at all.
> * serialize any default value into KeyValueColumnExpression
> * in the evaluate method of KeyValueColumnExpression, conditionally use
> the default value if the column value is not present. If doing partial
> evaluation, you should not yet return the default value, as we may not have
> encountered the KeyValue for the column yet (since a filter evaluates
> each time it sees each KeyValue, and there may be more than one KeyValue
> referenced in the expression). Partial evaluation is determined by calling
> Tuple.isImmutable(), where true means it is NOT doing partial evaluation,
> while false means it is.
> * modify EvaluateOnCompletionVisitor by adding a visitor method for
> RowKeyColumnExpression and KeyValueColumnExpression to set
> evaluateOnCompletion to true if they have a default value specified. This
> will cause filter evaluation to execute one final time after all KeyValues
> for a row have been seen, since only then do we know we should use the
> default value.
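The partial-evaluation rule in step 5 can be simulated outside Phoenix. In this sketch, `RowInProgress.isImmutable()` stands in for Tuple.isImmutable(): it only becomes true for the final evaluation after all KeyValues of the row have been seen, which is the extra pass that setting evaluateOnCompletion would trigger. The classes are hypothetical and model only the control flow; they are not KeyValueColumnExpression or Phoenix's filter machinery.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of a row that a filter sees one KeyValue at a time.
final class RowInProgress {
    private final Map<String, String> cells = new HashMap<>();
    private boolean complete; // plays the role of Tuple.isImmutable()

    void add(String column, String value) { cells.put(column, value); }
    void markComplete() { complete = true; }
    boolean isImmutable() { return complete; }
    String get(String column) { return cells.get(column); }
}

// Column reference with a DEFAULT, following the evaluation rules in step 5.
final class DefaultedColumn {
    private final String column;
    private final String defaultValue;

    DefaultedColumn(String column, String defaultValue) {
        this.column = column;
        this.defaultValue = defaultValue;
    }

    // Optional.empty() means "cannot decide yet", not SQL NULL.
    Optional<String> evaluate(RowInProgress row) {
        String value = row.get(column);
        if (value != null) {
            return Optional.of(value);
        }
        if (!row.isImmutable()) {
            return Optional.empty(); // partial evaluation: the KeyValue may still arrive
        }
        return Optional.of(defaultValue); // all KeyValues seen: the column is truly absent
    }

    // Evaluates after every KeyValue, then once more on completion
    // (the effect of evaluateOnCompletion=true).
    static Optional<String> filterRow(List<String[]> keyValues, DefaultedColumn expr) {
        RowInProgress row = new RowInProgress();
        for (String[] kv : keyValues) {
            row.add(kv[0], kv[1]);
            Optional<String> result = expr.evaluate(row);
            if (result.isPresent()) {
                return result;
            }
        }
        row.markComplete();
        return expr.evaluate(row);
    }
}
```

A row without a City value yields 'Sandnes' only on the final pass, while a row with City='Oslo' is decided as soon as that KeyValue arrives.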