[jira] [Commented] (PHOENIX-2565) Store data for immutable tables in single KeyValue

2016-09-18 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15502249#comment-15502249
 ] 

James Taylor commented on PHOENIX-2565:
---

Thanks for the explanation, [~tdsilva]. Would you mind filing a follow-up JIRA
to get rid of ReplaceArrayColumnWithKeyValueColumnExpressionVisitor and including
the above explanation in it?

+1 to commit to the encodecolumns branch. 

> Store data for immutable tables in single KeyValue
> --
>
> Key: PHOENIX-2565
> URL: https://issues.apache.org/jira/browse/PHOENIX-2565
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: James Taylor
>Assignee: Thomas D'Silva
> Fix For: 4.9.0
>
> Attachments: PHOENIX-2565-wip.patch, PHOENIX-2565.patch
>
>
> Since an immutable table (i.e. declared with IMMUTABLE_ROWS=true) will never 
> update a column value, it'd be more efficient to store all column values for 
> a row in a single KeyValue. We could use the existing format we have for 
> variable length arrays.
> For backward compatibility, we'd need to support the current mechanism. Also, 
> you'd no longer be allowed to transition an existing table to/from being 
> immutable. I think the best approach would be to introduce a new IMMUTABLE 
> keyword and use it like this:
> {code}
> CREATE IMMUTABLE TABLE ...
> {code}
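To make the single-KeyValue idea concrete, here is a minimal sketch of packing a row's column values into one cell value and reading a column back by position. It assumes a made-up layout (values concatenated, then one int offset per value, then an int count) and a hypothetical `PackedRowCodec` class. It is not Phoenix's variable-length array serialization, which the description proposes reusing.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical layout for storing all column values of a row in one cell value:
 *   [value 0][value 1]...[value n-1][offset 0 (int)]...[offset n-1 (int)][n (int)]
 * Illustrative only; NOT Phoenix's array format.
 */
final class PackedRowCodec {
    static byte[] pack(byte[][] values) {
        int dataLength = 0;
        for (byte[] v : values) {
            dataLength += (v == null ? 0 : v.length); // null is stored as a zero-length value
        }
        ByteBuffer buf = ByteBuffer.allocate(dataLength + 4 * values.length + 4);
        int[] offsets = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            offsets[i] = buf.position(); // start of value i inside the cell
            if (values[i] != null) {
                buf.put(values[i]);
            }
        }
        for (int offset : offsets) {
            buf.putInt(offset);
        }
        buf.putInt(values.length);
        return buf.array(); // capacity was sized exactly, so the whole array is used
    }

    /** Returns the value at the given column position (zero-length if it was null). */
    static byte[] get(byte[] cell, int index) {
        ByteBuffer buf = ByteBuffer.wrap(cell);
        int count = buf.getInt(cell.length - 4);
        int offsetTableStart = cell.length - 4 - 4 * count;
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("column " + index + " of " + count);
        }
        int start = buf.getInt(offsetTableStart + 4 * index);
        // A value ends where the next one starts, or where the offset table begins.
        int end = (index + 1 < count)
                ? buf.getInt(offsetTableStart + 4 * (index + 1))
                : offsetTableStart;
        byte[] value = new byte[end - start];
        System.arraycopy(cell, start, value, 0, value.length);
        return value;
    }
}
```

Because a value's end is the next value's start, a null column costs only its 4-byte offset entry in this layout.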



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-3294) Support approximate COUNT(*) by using stats.

2016-09-18 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created PHOENIX-3294:
--

 Summary: Support approximate COUNT(*) by using stats.
 Key: PHOENIX-3294
 URL: https://issues.apache.org/jira/browse/PHOENIX-3294
 Project: Phoenix
  Issue Type: Sub-task
Reporter: Lars Hofhansl








[jira] [Resolved] (PHOENIX-3225) Distinct Queries are slower than expected at scale.

2016-09-18 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved PHOENIX-3225.

Resolution: Duplicate

> Distinct Queries are slower than expected at scale.
> ---
>
> Key: PHOENIX-3225
> URL: https://issues.apache.org/jira/browse/PHOENIX-3225
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Lars Hofhansl
>
> In our large-scale tests we found that we can easily sort 400 GB on a few hundred
> machines, but a simple DISTINCT would just time out. Perhaps that's
> expected, since we have to keep track of the unique values, but we should
> investigate.
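The point about tracking unique values can be illustrated with a small sketch in plain Java (a hypothetical `DistinctSketch` class, not Phoenix's DISTINCT implementation): a hash-based distinct has to retain every unique value it has seen, while a distinct over input that is already sorted only needs the previous value. It shows the memory trade-off only and says nothing about the actual cause of the timeout.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

final class DistinctSketch {
    /** Hash-based distinct: the 'seen' set grows with the number of unique values. */
    static <T> List<T> hashDistinct(Iterable<T> input) {
        Set<T> seen = new HashSet<>();
        List<T> out = new ArrayList<>(); // collected here for demonstration; a real operator would stream
        for (T value : input) {
            if (seen.add(value)) { // add() returns false for a value already seen
                out.add(value);
            }
        }
        return out;
    }

    /** Distinct over sorted input: only the previous value is kept, so state is O(1). */
    static <T> List<T> sortedDistinct(Iterable<T> sortedInput) {
        List<T> out = new ArrayList<>();
        boolean first = true;
        T previous = null;
        for (T value : sortedInput) {
            if (first || !Objects.equals(value, previous)) {
                out.add(value);
            }
            previous = value;
            first = false;
        }
        return out;
    }
}
```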





[jira] [Commented] (PHOENIX-2565) Store data for immutable tables in single KeyValue

2016-09-18 Thread Thomas D'Silva (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501963#comment-15501963
 ] 

Thomas D'Silva commented on PHOENIX-2565:
-

Thanks for the feedback. 

ReplaceArrayColumnWithKeyValueColumnExpressionVisitor is used in only one place,
IndexUtil.generateIndexData, because there we use a ValueGetter to get the value
of the data table column through the original data table column reference. This
is also why ArrayColumnExpression needs to keep track of the original key value
column expression: if we don't replace the array column expression with the
original column expression, the lookup of the column by qualifier won't find it.
I will make the other changes you suggested.

{code}
ValueGetter valueGetter = new ValueGetter() {

    @Override
    public byte[] getRowKey() {
        return dataMutation.getRow();
    }

    @Override
    public ImmutableBytesWritable getLatestValue(ColumnReference ref) {
        // Always return null for our empty key value, as this will cause the index
        // maintainer to always treat this Put as a new row.
        if (isEmptyKeyValue(table, ref)) {
            return null;
        }
        byte[] family = ref.getFamily();
        byte[] qualifier = ref.getQualifier();
        // ptr comes from the enclosing scope and is this row's key in valuesMap
        RowMutationState rowMutationState = valuesMap.get(ptr);
        PColumn column = null;
        try {
            column = table.getColumnFamily(family).getPColumnForColumnQualifier(qualifier);
        } catch (ColumnNotFoundException e) {
            // No such column in this family: fall through and return null
        } catch (ColumnFamilyNotFoundException e) {
            // No such column family in this table: fall through and return null
        }
        if (rowMutationState != null && column != null) {
            byte[] value = rowMutationState.getColumnValues().get(column);
            // Separate name so the enclosing row-key ptr is not shadowed
            ImmutableBytesPtr valuePtr = new ImmutableBytesPtr();
            valuePtr.set(value == null ? ByteUtil.EMPTY_BYTE_ARRAY : value);
            SchemaUtil.padData(table.getName().getString(), column, valuePtr);
            return valuePtr;
        }
        return null;
    }

};
{code}
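Why the replacement is needed can be shown with a stripped-down model. In this hypothetical sketch, `ColumnRef`, `ArrayColumnRef`, and `ValueLookup` are simplified stand-ins, not Phoenix classes: values are keyed by the original data table column reference (as in the ValueGetter above), so looking up the packed-cell reference finds nothing until the expression is mapped back to its original column, which is the role described for ReplaceArrayColumnWithKeyValueColumnExpressionVisitor.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical (family, qualifier) pair standing in for a column reference. */
final class ColumnRef {
    final String family;
    final String qualifier;

    ColumnRef(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ColumnRef)) {
            return false;
        }
        ColumnRef other = (ColumnRef) o;
        return family.equals(other.family) && qualifier.equals(other.qualifier);
    }

    @Override
    public int hashCode() {
        return Objects.hash(family, qualifier);
    }
}

/** Hypothetical column stored inside a packed cell; it remembers the column it replaced. */
final class ArrayColumnRef {
    final ColumnRef packedCell; // the single cell holding all of the row's values
    final int position;         // this column's index inside the packed cell
    final ColumnRef original;   // the original data table column reference

    ArrayColumnRef(ColumnRef packedCell, int position, ColumnRef original) {
        this.packedCell = packedCell;
        this.position = position;
        this.original = original;
    }
}

final class ValueLookup {
    /** Values are found only through the original column reference. */
    static byte[] getLatestValue(Map<ColumnRef, byte[]> valuesByColumn, ColumnRef ref) {
        return valuesByColumn.get(ref);
    }

    /** Maps an array column back to its original column before the lookup. */
    static ColumnRef replaceWithOriginal(ArrayColumnRef arrayRef) {
        return arrayRef.original;
    }
}
```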






[jira] [Commented] (PHOENIX-3118) Increase default value of hbase.client.scanner.max.result.size

2016-09-18 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501783#comment-15501783
 ] 

Lars Hofhansl commented on PHOENIX-3118:


In general, I would strongly discourage increasing the max result size across the board.

> Increase default value of hbase.client.scanner.max.result.size
> --
>
> Key: PHOENIX-3118
> URL: https://issues.apache.org/jira/browse/PHOENIX-3118
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: James Taylor
> Fix For: 4.9.0, 4.8.2
>
>
> See parent JIRA for a discussion on how to handle partial scan results. An 
> easy workaround would be to increase the 
> {{hbase.client.scanner.max.result.size}} above the default 2MB limit. In 
> combination with this, we could detect in BaseScannerRegionObserver.nextRaw() 
> if partial results are being returned and throw an exception. Silently 
> ignoring this is bad because it can lead to incorrect query results as 
> demonstrated by the parent JIRA.
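The proposed guard (fail loudly rather than silently returning part of a row) might look roughly like the following. This is a hypothetical sketch in plain Java: `RowBatch` and its `partial` flag stand in for whatever partial-result signal the HBase scanner actually exposes, and the real check would live in BaseScannerRegionObserver.nextRaw().

```java
import java.util.List;

/** Hypothetical stand-in for one batch of cells returned by a region scanner. */
final class RowBatch {
    final List<String> cells;
    final boolean partial; // true if the server cut the row short (e.g. max result size reached)

    RowBatch(List<String> cells, boolean partial) {
        this.cells = cells;
        this.partial = partial;
    }
}

final class PartialResultGuard {
    /** Returns the batch unchanged if complete; otherwise throws instead of risking wrong results. */
    static RowBatch requireComplete(RowBatch batch) {
        if (batch.partial) {
            throw new IllegalStateException(
                    "Partial row returned by scanner; refusing to produce possibly incorrect results");
        }
        return batch;
    }
}
```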





[jira] [Commented] (PHOENIX-2565) Store data for immutable tables in single KeyValue

2016-09-18 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501271#comment-15501271
 ] 

James Taylor commented on PHOENIX-2565:
---

Also, more of a note to [~samarthjain], but on further thought, encoded
columns will work fine for transactional tables, so any special cases around
them should be removed:
{code}
 public static boolean setMinMaxQualifiersOnScan(PTable table) {
-    return EncodedColumnsUtil.usesEncodedColumnNames(table) && !table.isTransactional() && !hasDynamicColumns(table);
+    return table.getStorageScheme() != null && table.getStorageScheme() == StorageScheme.ENCODED_COLUMN_NAMES
+            && !table.isTransactional() && !hasDynamicColumns(table);
{code}
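With the transactional special case dropped, the predicate would reduce to something like this sketch. `MinMaxQualifierCheck`, its `StorageScheme` enum, and `TableInfo` are simplified, hypothetical stand-ins for PTable and Phoenix's own types. As a side note, comparing an enum reference with `==` is already false when it is null, so the explicit `!= null` check in the patch above is redundant.

```java
final class MinMaxQualifierCheck {
    /** Hypothetical stand-in for Phoenix's storage scheme enum. */
    enum StorageScheme { NON_ENCODED_COLUMN_NAMES, ENCODED_COLUMN_NAMES }

    /** Hypothetical stand-in for the PTable properties the check reads. */
    static final class TableInfo {
        final StorageScheme storageScheme; // may be null, as the patch's null check implies
        final boolean transactional;       // deliberately ignored by the check below
        final boolean hasDynamicColumns;

        TableInfo(StorageScheme storageScheme, boolean transactional, boolean hasDynamicColumns) {
            this.storageScheme = storageScheme;
            this.transactional = transactional;
            this.hasDynamicColumns = hasDynamicColumns;
        }
    }

    /** The patch's check without the transactional special case. */
    static boolean setMinMaxQualifiersOnScan(TableInfo table) {
        // == on an enum is simply false for null, so no separate null check is needed.
        return table.storageScheme == StorageScheme.ENCODED_COLUMN_NAMES
                && !table.hasDynamicColumns;
    }
}
```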






[jira] [Commented] (PHOENIX-476) Support declaration of DEFAULT in CREATE statement

2016-09-18 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15501243#comment-15501243
 ] 

James Taylor commented on PHOENIX-476:
--

Thanks for chiming in, [~julianhyde]. So we can still implement DEFAULT without
storing the default value for non-PK columns, [~kliew]. This is important because,
for a multi-billion-row table, it will save a lot of space.

Instead of using CoalesceFunction during the wrapping, we can create a 
new Expression and register it in ExpressionType, called 
DefaultValueExpression. The evaluate method can distinguish between the case 
where there's no value for the column (i.e. colRefChildExpression.evaluate 
returns false) versus it returning a null value (i.e. 
colRefChildExpression.evaluate returns true but ptr.getLength() is zero). We'd 
want to force STORE_NULLS=true when a table is created with default values. See 
MetaDataClient.createTableInternal() and how we force this option to true for 
transactional tables - we'd want to do something similar here if any of the 
columns for the new table define a DEFAULT. With this option in place, Phoenix 
will store an empty byte array as the column value instead of issuing an HBase 
delete.

The evaluate method of DefaultValueExpression would look something like this 
(just one very subtle change from the CoalesceFunction implementation):
{code}
@Override
public boolean evaluate(Tuple tuple, ImmutableBytesWritable ptr) {
    boolean evaluated = children.get(0).evaluate(tuple, ptr);
    if (evaluated) {
        // Will potentially evaluate to null without evaluating the second expression
        return true;
    }
    if (tuple.isImmutable()) { // This is true if it's the last time an evaluation is happening on the row
        Expression secondChild = children.get(1);
        if (secondChild.evaluate(tuple, ptr)) {
            // Coerce the type of the second child to the type of the first child
            getDataType().coerceBytes(ptr, secondChild.getDataType(),
                    secondChild.getSortOrder(), getSortOrder());
            return true;
        }
    }
    return false;
}
{code}
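The three cases DefaultValueExpression has to tell apart (no cell at all, an explicitly stored null, a real value), plus the rule that the default is applied only on the final evaluation of the row, can be modeled outside Phoenix. In this hypothetical sketch a row is a `Map` from column name to stored bytes: a missing key means no cell was written, and a zero-length array is the empty value STORE_NULLS=true persists instead of a delete. Returning `null` plays the role of `evaluate` returning false, and `rowComplete` plays the role of `tuple.isImmutable()`.

```java
import java.util.Map;

final class DefaultValueSketch {
    /**
     * @param row          hypothetical row: column name -> stored bytes (absent key = no cell)
     * @param column       column being evaluated
     * @param defaultValue the column's DEFAULT
     * @param rowComplete  true on the final evaluation of the row (like tuple.isImmutable())
     * @return the value to use, or null if it cannot be determined yet
     */
    static byte[] evaluate(Map<String, byte[]> row, String column, byte[] defaultValue,
                           boolean rowComplete) {
        byte[] stored = row.get(column);
        if (stored != null) {
            // A real value or an explicit null (length 0): the default must not override it.
            return stored;
        }
        if (rowComplete) {
            // No cell was ever written for this column, so the default applies.
            return defaultValue;
        }
        // Partial evaluation: the cell may still appear later in the row.
        return null;
    }
}
```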

> Support declaration of DEFAULT in CREATE statement
> --
>
> Key: PHOENIX-476
> URL: https://issues.apache.org/jira/browse/PHOENIX-476
> Project: Phoenix
>  Issue Type: Task
>Affects Versions: 3.0-Release
>Reporter: James Taylor
>Assignee: Kevin Liew
>  Labels: enhancement
> Attachments: PHOENIX-476.2.patch, PHOENIX-476.patch
>
>
> Support the declaration of a default value in the CREATE TABLE/VIEW statement 
> like this:
> {code}
> CREATE TABLE Persons (
>     Pid int NOT NULL PRIMARY KEY,
>     LastName varchar(255) NOT NULL,
>     FirstName varchar(255),
>     Address varchar(255),
>     City varchar(255) DEFAULT 'Sandnes'
> )
> {code}
> To implement this, we'd need to:
> 1. add a new DEFAULT_VALUE key value column in SYSTEM.TABLE and pass through 
> the value when the table is created (in MetaDataClient).
> 2. always set NULLABLE to ResultSetMetaData.columnNoNulls if a default value 
> is present, since the column will never be null.
> 3. add a getDefaultValue() accessor in PColumn
> 4. for a row key column, during UPSERT use the default value if no value was 
> specified for that column. This could be done in the PTableImpl.newKey method.
> 5. for a key value column with a default value, we can avoid any storage cost 
> by not persisting the default value on UPSERT. This takes a little more effort 
> than persisting it, and involves the following:
> * serialize any default value into KeyValueColumnExpression
> * in the evaluate method of KeyValueColumnExpression, conditionally use 
> the default value if the column value is not present. If doing partial 
> evaluation, you should not yet return the default value, as we may not have 
> encountered the KeyValue for the column yet (since a filter evaluates 
> each time it sees each KeyValue, and there may be more than one KeyValue 
> referenced in the expression). Partial evaluation is determined by calling 
> Tuple.isImmutable(), where true means it is NOT doing partial evaluation 
> (the row will no longer change), while false means it is.
> * modify EvaluateOnCompletionVisitor by adding a visitor method for 
> RowKeyColumnExpression and KeyValueColumnExpression to set 
> evaluateOnCompletion to true if they have a default value specified. This 
> will cause filter evaluation to execute one final time after all KeyValues 
> for a row have been seen, since it's at this time we know we should use the 
> default value.
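Steps 2 and 4 of the plan can be sketched in plain Java. `java.sql.ResultSetMetaData.columnNoNulls` and `columnNullable` are real JDBC constants; everything else (`PkDefaultsSketch`, `PkColumn`, and the key-filling helper) is hypothetical and only mirrors what the steps describe, not PTableImpl.newKey.

```java
import java.sql.ResultSetMetaData;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PkDefaultsSketch {
    /** Hypothetical primary key column: a name plus an optional default (null if none). */
    static final class PkColumn {
        final String name;
        final String defaultValue;

        PkColumn(String name, String defaultValue) {
            this.name = name;
            this.defaultValue = defaultValue;
        }
    }

    /** Step 2: a column with a default is reported as never null. */
    static int nullability(boolean declaredNullable, String defaultValue) {
        if (defaultValue != null || !declaredNullable) {
            return ResultSetMetaData.columnNoNulls;
        }
        return ResultSetMetaData.columnNullable;
    }

    /** Step 4: collect PK values for an UPSERT, using the default for any column not supplied. */
    static List<String> keyValues(List<PkColumn> pk, Map<String, String> upsertValues) {
        List<String> key = new ArrayList<>();
        for (PkColumn column : pk) {
            String value = upsertValues.get(column.name);
            // May still be null if there is no default; null PK handling is out of scope here.
            key.add(value != null ? value : column.defaultValue);
        }
        return key;
    }
}
```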


