Phoenix 5.0 + Cloudera CDH 6.0 Integration

2018-09-13 Thread Curtis Howard
Hi all,

Is there anyone working towards Phoenix 5.0 / Cloudera (CDH) 6.0
integration at this point?  I could not find any related JIRA for this
after a quick search, and wanted to check here first before adding one.

If I were to attempt this myself, is there a suggested approach?  I can see
from previous 4.x-cdh5.* branches supporting these releases that the
changes for PHOENIX-4372 (https://issues.apache.org/jira/browse/PHOENIX-4372)
move the builds to CDH dependencies - for example:
https://github.com/apache/phoenix/commit/024f0f22a5929da6f095dc0025b8e899e2f0c47b

Would following the pattern of that commit (or attempting a cherry-pick)
onto the v5.0.0-HBase-2.0 tagged release (
https://github.com/apache/phoenix/tree/v5.0.0-HBase-2.0) be a reasonable
starting point?

Thanks in advance
Curtis


[jira] [Updated] (PHOENIX-4849) UPSERT SELECT fails with stale region boundary exception after a split

2018-09-13 Thread Thomas D'Silva (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva updated PHOENIX-4849:

Attachment: SerialIterators.diff

> UPSERT SELECT fails with stale region boundary exception after a split
> --
>
> Key: PHOENIX-4849
> URL: https://issues.apache.org/jira/browse/PHOENIX-4849
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Akshita Malhotra
>Assignee: Lars Hofhansl
>Priority: Critical
> Attachments: PHOENIX-4849-complete-1.4.txt, PHOENIX-4849-fix.txt, 
> PHOENIX-4849-v2.patch, PHOENIX-4849-v3.patch, PHOENIX-4849-v4.patch, 
> PHOENIX-4849.patch, SerialIterators.diff
>
>
> UPSERT SELECT throws a StaleRegionBoundaryCacheException immediately after a 
> split. An upsert followed by a separate select, on the other hand, works 
> fine.
> org.apache.phoenix.schema.StaleRegionBoundaryCacheException: ERROR 1108 
> (XCL08): Cache of region boundaries are out of date.
> at 
> org.apache.phoenix.exception.SQLExceptionCode$14.newException(SQLExceptionCode.java:365)
>  at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
>  at 
> org.apache.phoenix.util.ServerUtil.parseRemoteException(ServerUtil.java:183)
>  at 
> org.apache.phoenix.util.ServerUtil.parseServerExceptionOrNull(ServerUtil.java:167)
>  at 
> org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:134)
>  at 
> org.apache.phoenix.iterate.ScanningResultIterator.next(ScanningResultIterator.java:153)
>  at 
> org.apache.phoenix.iterate.TableResultIterator.next(TableResultIterator.java:228)
>  at 
> org.apache.phoenix.iterate.LookAheadResultIterator$1.advance(LookAheadResultIterator.java:47)
>  at 
> org.apache.phoenix.iterate.LookAheadResultIterator.init(LookAheadResultIterator.java:59)
>  at 
> org.apache.phoenix.iterate.LookAheadResultIterator.peek(LookAheadResultIterator.java:73)
>  at 
> org.apache.phoenix.iterate.SerialIterators$SerialIterator.nextIterator(SerialIterators.java:187)
>  at 
> org.apache.phoenix.iterate.SerialIterators$SerialIterator.currentIterator(SerialIterators.java:160)
>  at 
> org.apache.phoenix.iterate.SerialIterators$SerialIterator.peek(SerialIterators.java:218)
>  at 
> org.apache.phoenix.iterate.ConcatResultIterator.currentIterator(ConcatResultIterator.java:100)
>  at 
> org.apache.phoenix.iterate.ConcatResultIterator.next(ConcatResultIterator.java:117)
>  at 
> org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
>  at 
> org.apache.phoenix.iterate.LimitingResultIterator.next(LimitingResultIterator.java:47)
>  at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:805)
>  at 
> org.apache.phoenix.compile.UpsertCompiler.upsertSelect(UpsertCompiler.java:219)
>  at 
> org.apache.phoenix.compile.UpsertCompiler$ClientUpsertSelectMutationPlan.execute(UpsertCompiler.java:1292)
>  at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:408)
>  at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:391)
>  at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>  at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:390)
>  at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:378)
>  at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:173)
>  at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:183)
>  at 
> org.apache.phoenix.end2end.UpsertSelectAfterSplitTest.upsertSelectData1(UpsertSelectAfterSplitTest.java:109)
>  at 
> org.apache.phoenix.end2end.UpsertSelectAfterSplitTest.testUpsertSelect(UpsertSelectAfterSplitTest.java:59)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>  at 

Re: Salting based on partial rowkeys

2018-09-13 Thread Josh Elser

Ahh, I get you now.

For a composite primary key made up of columns 1 through N, you want 
similar controls to compute the value of the salt based on a sequence of 
the columns 1 through M where M <= N (instead of always on all columns).


For large numbers of salt buckets and a scan over a facet, you prune 
your search space considerably. Makes sense to me!
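The pruning Josh describes can be illustrated with a small sketch (hypothetical hash and helper names, not Phoenix's actual salting code): when the salt byte is derived only from the facet column being queried, every row for one facet value lands in a single bucket, so a scan touches one bucket instead of all of them.

```python
import hashlib

def salt_byte(key_parts, buckets):
    # Toy stand-in for a salt computation: hash only the chosen key parts.
    h = hashlib.md5("|".join(key_parts).encode()).digest()
    return h[0] % buckets

BUCKETS = 60

# Full-rowkey salting: rows sharing id_2 = 42 scatter across buckets,
# so a query on id_2 alone must fan out to every bucket.
full = {salt_byte([str(id_1), "42"], BUCKETS) for id_1 in range(1000)}

# Partial salting on id_2 alone: all rows with id_2 = 42 share one bucket.
partial = {salt_byte(["42"], BUCKETS) for _ in range(1000)}

print(len(full), len(partial))  # partial salting hits exactly one bucket
```

The search-space reduction is exactly the gap between those two set sizes: one bucket to scan versus (up to) all sixty.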


On 9/13/18 6:37 PM, Gerald Sangudi wrote:

In case the text formatting is lost below, I also added it as a comment in
the JIRA ticket:

https://issues.apache.org/jira/browse/PHOENIX-4757


On Thu, Sep 13, 2018 at 3:24 PM, Gerald Sangudi 
wrote:


Sorry I missed Josh's reply; I've subscribed to the dev list now.

Below is a copy-and-paste from our internal document. Thanks in advance
for your review and additional feedback on this.

Gerald

[jira] [Updated] (PHOENIX-4880) Phoenix IndexTool doesn't work on HBase2 per documentation

2018-09-13 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated PHOENIX-4880:

Fix Version/s: (was: 4.15.0)

> Phoenix IndexTool doesn't work on HBase2 per documentation
> --
>
> Key: PHOENIX-4880
> URL: https://issues.apache.org/jira/browse/PHOENIX-4880
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 5.1.0
>
> Attachments: PHOENIX-4880.001.patch
>
>
> The website documentation states that to run {{IndexTool}}, you should do:
> {code}
> $ hbase org.apache.phoenix.mapreduce.index.IndexTool
> {code}
> This ends up running the class using the phoenix-server jar which fails 
> because we have conflicting versions of commons-cli, as described by 
> HBASE-20201. Phoenix-client.jar does not have this problem as we did the 
> workaround there as well.
> {code}
> $ hadoop jar $PHOENIX_HOME/phoenix-*client.jar 
> org.apache.phoenix.mapreduce.index.IndexTool
> {code}
> Does work, however. I suppose we still want to fix phoenix-server.jar? (no 
> reason not to?)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Salting based on partial rowkeys

2018-09-13 Thread Gerald Sangudi
In case the text formatting is lost below, I also added it as a comment in
the JIRA ticket:

https://issues.apache.org/jira/browse/PHOENIX-4757

Re: Salting based on partial rowkeys

2018-09-13 Thread Gerald Sangudi
Sorry I missed Josh's reply; I've subscribed to the dev list now.

Below is a copy-and-paste from our internal document. Thanks in advance for
your review and additional feedback on this.

Gerald


Background

We make extensive use of multi-column rowkeys and salting in our different
Apache Phoenix deployments. We frequently perform group-by aggregations on
these data along a specific dimension that would benefit from predictably
partitioning the data along that dimension.

Proposal

We propose to add table metadata to allow schema designers to constrain
salting to a subset of the rowkey, rather than the full rowkey as it is
today. This will introduce a mechanism to partition data on a per-table
basis along a single dimension without application changes or much change to
the Phoenix runtime logic. We expect this will result in substantially
faster group-bys along the salted dimension and negligible penalties
elsewhere. This feature has also been proposed in PHOENIX-4757, where it was
pointed out that partitioning and sorting data along different dimensions is
a common pattern in other datastores as well. Theoretically, it could cause
hotspotting when querying along the salted dimension without the leading
rowkey; that would be an anti-pattern.

Usage Example

Current schema:

CREATE TABLE relationship (
    id_1 BIGINT NOT NULL,
    id_2 BIGINT NOT NULL,
    other_key BIGINT NOT NULL,
    val SMALLINT,
    CONSTRAINT pk PRIMARY KEY (id_1, id_2, other_key)
) SALT_BUCKETS=60;

Query:

Select id_2, sum(val)
From relationship
Where id_1 in (2,3)
Group by id_2

Explain:

0: jdbc:phoenix:> EXPLAIN Select id_2, sum(val) From relationship Where id_1 in (2,3) Group by id_2;
CLIENT 60-CHUNK PARALLEL 60-WAY SKIP SCAN ON 120 KEYS OVER RELATIONSHIP [0,2] - [59,3]
SERVER AGGREGATE INTO DISTINCT ROWS BY [ID_2]
CLIENT MERGE SORT
3 rows selected (0.048 seconds)

In this case, although the group by is performed on both the client and the
regionservers, almost all of the actual grouping happens on the client
because the id_2 values are randomly distributed across the regionservers.
As a result, a lot of unnecessary data is serialized to the client and
grouped serially there. This can become quite material with large
resultsets.

Proposed schema:

CREATE TABLE relationship (
    id_1 BIGINT NOT NULL,
    id_2 BIGINT NOT NULL,
    other_key BIGINT NOT NULL,
    val SMALLINT,
    CONSTRAINT pk PRIMARY KEY (id_1, id_2, other_key)
) SALT_BUCKETS=60, SALT_COLUMN = id_2;

Query (unchanged):

Select id_2, sum(val)
From relationship
Where id_1 in (2,3)
Group by id_2

Explain (unchanged).

Under the proposal, the data are merely partitioned so that all rows
containing the same id_2 are on the same regionserver; the above query will
perform almost all of the grouping in parallel on the regionservers. No
special hint or changes to the query plan would be required to benefit.
Tables would need to be re-salted to take advantage of the new
functionality.

Technical changes proposed to Phoenix:

- Create a new piece of table-level metadata: SALT_COLUMN. SALT_COLUMN will
  instruct the salting logic to generate a salt byte based only on the
  specified column. If unspecified, it will behave as it does today and
  default to salting the entire rowkey. This metadata may be specified only
  when the table is created and may not be modified. The specified column
  must be part of the rowkey.
- Modify all callers of getSaltingByte(byte[] value, int offset, int length,
  int bucketNum) to consistently leverage the new metadata.
- Tests
- Docs

Design points:

One salt column vs. multiple salt columns: Based on the existing signature
for getSaltingByte, it seems simpler to only support a single SALT_COLUMN
rather than multiple arbitrary SALT_COLUMNS. Known use-cases are completely
supported by a single column.

Syntax: PHOENIX-4757 suggests an alternate, less verbose syntax for defining
the salt bucket. The SALT_COLUMN syntax is suggested for clarity and
consistency with other Phoenix table options.

Future Enhancements (not in scope)

Different aspects of the query execution runtime could take advantage of the
new metadata and implied knowledge that the data are partitioned in a
predictable manner. For example:

- It could be that client side grouping is completely unnecessary in cases
  where the 
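As a rough illustration of the proposed change (a sketch only; the toy hash below is not Phoenix's real getSaltingByte algorithm, and the byte layout is assumed), restricting the hashed span to the SALT_COLUMN's slice of the rowkey makes every row with the same id_2 share a salt byte:

```python
def get_salting_byte(value: bytes, offset: int, length: int, bucket_num: int) -> int:
    # Toy stand-in for Phoenix's salt hash over value[offset:offset+length].
    h = 0
    for b in value[offset:offset + length]:
        h = (h * 31 + b) & 0x7FFFFFFF
    return h % bucket_num

# Assumed rowkey layout for illustration: id_1 | id_2 | other_key, 8 bytes each.
rowkey = (2).to_bytes(8, "big") + (42).to_bytes(8, "big") + (7).to_bytes(8, "big")

salt_full    = get_salting_byte(rowkey, 0, len(rowkey), 60)  # current behavior: whole key
salt_partial = get_salting_byte(rowkey, 8, 8, 60)            # SALT_COLUMN = id_2: one slice

# Any rowkey sharing id_2 = 42 gets the same partial salt byte,
# regardless of its other key columns.
rowkey2 = (999).to_bytes(8, "big") + (42).to_bytes(8, "big") + (1).to_bytes(8, "big")
assert get_salting_byte(rowkey2, 8, 8, 60) == salt_partial
```

The proposal's "modify all callers" step is essentially this change of offset/length arguments: callers stop passing the whole key span and pass only the configured column's span.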

[jira] [Assigned] (PHOENIX-4594) Perform binary search on guideposts during query compilation

2018-09-13 Thread Bin Shi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bin Shi reassigned PHOENIX-4594:


  Assignee: Bin Shi  (was: Abhishek Singh Chouhan)
Attachment: PHOENIX-4594-0913.patch

Please review. Thanks!

> Perform binary search on guideposts during query compilation
> 
>
> Key: PHOENIX-4594
> URL: https://issues.apache.org/jira/browse/PHOENIX-4594
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: James Taylor
>Assignee: Bin Shi
>Priority: Major
> Attachments: PHOENIX-4594-0913.patch
>
>
> If there are many guideposts, performance will suffer during query 
> compilation because we do a linear search of the guideposts to find the 
> intersection with the scan ranges. Instead, in 
> BaseResultIterators.getParallelScans() we should populate an array of 
> guideposts and perform a binary search. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
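The improvement described in PHOENIX-4594 amounts to replacing a linear walk of the sorted guidepost list with a binary search. A minimal sketch of the idea (not the actual patch; keys and helper name are made up):

```python
from bisect import bisect_left

def first_intersecting_guidepost(guideposts, scan_start_key):
    # Guideposts are sorted rowkeys; binary-search for the first one
    # that is not below the scan's start key, in O(log n) instead of O(n).
    return bisect_left(guideposts, scan_start_key)

guideposts = [b"a", b"d", b"h", b"m", b"r", b"w"]
i = first_intersecting_guidepost(guideposts, b"j")
print(i, guideposts[i])  # first guidepost at or after b"j" is b"m"
```

With many thousands of guideposts per table, doing this once per scan range during compilation is where the linear search becomes costly and the binary search pays off.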


Access Client Side Metrics for PhoenixRDD usage

2018-09-13 Thread William Shen
Hi all,

I see that LLAM-1819 implemented a client-side metric collection mechanism in
the PhoenixRecordReader class, which I believe is then used by
PhoenixInputFormat and in turn by PhoenixRDD. However, I am having trouble
finding an example of how to access these metrics;
https://phoenix.apache.org/metrics.html only covers the Java client.

I understand that PHOENIX-4701 will send the metrics to SYSTEM.LOG
asynchronously in 4.14, but I am wondering whether there is a way to access
the metrics in 4.13?

Thanks in advance!

- Will


Re: Salting based on partial rowkeys

2018-09-13 Thread Thomas D'Silva
Gerald,

I think you missed Josh's reply here :
https://lists.apache.org/thread.html/c5145461805429622a410c23c1199d578e146a5c94511b2d5833438b@%3Cdev.phoenix.apache.org%3E

Could you explain how using a subset of the pk columns to generate the salt
byte helps with partitioning, aggregations etc?

Thanks,
Thomas

On Thu, Sep 13, 2018 at 8:32 AM, Gerald Sangudi 
wrote:

> Hi folks,
>
> Any thoughts or feedback on this?
>
> Thanks,
> Gerald
>
> On Mon, Sep 10, 2018 at 1:56 PM, Gerald Sangudi 
> wrote:
>
>> Hello folks,
>>
>> We have a requirement for salting based on partial, rather than full,
>> rowkeys. My colleague Mike Polcari has identified the requirement and
>> proposed an approach.
>>
>> I found an already-open JIRA ticket for the same issue:
>> https://issues.apache.org/jira/browse/PHOENIX-4757. I can provide more
>> details from the proposal.
>>
>> The JIRA proposes a syntax of SALT_BUCKETS(col, ...) = N, whereas Mike
>> proposes SALT_COLUMN=col or SALT_COLUMNS=col, ... .
>>
>> The benefit at issue is that users gain more control over partitioning,
>> and this can be used to push some additional aggregations and hash joins
>> down to region servers.
>>
>> I would appreciate any go-ahead / thoughts / guidance / objections /
>> feedback. I'd like to be sure that the concept at least is not
>> objectionable. We would like to work on this and submit a patch down the
>> road. I'll also add a note to the JIRA ticket.
>>
>> Thanks,
>> Gerald
>>
>>
>


Re: Salting based on partial rowkeys

2018-09-13 Thread Gerald Sangudi
Hi folks,

Any thoughts or feedback on this?

Thanks,
Gerald

On Mon, Sep 10, 2018 at 1:56 PM, Gerald Sangudi 
wrote:

> Hello folks,
>
> We have a requirement for salting based on partial, rather than full,
> rowkeys. My colleague Mike Polcari has identified the requirement and
> proposed an approach.
>
> I found an already-open JIRA ticket for the same issue:
> https://issues.apache.org/jira/browse/PHOENIX-4757. I can provide more
> details from the proposal.
>
> The JIRA proposes a syntax of SALT_BUCKETS(col, ...) = N, whereas Mike
> proposes SALT_COLUMN=col or SALT_COLUMNS=col, ... .
>
> The benefit at issue is that users gain more control over partitioning,
> and this can be used to push some additional aggregations and hash joins
> down to region servers.
>
> I would appreciate any go-ahead / thoughts / guidance / objections /
> feedback. I'd like to be sure that the concept at least is not
> objectionable. We would like to work on this and submit a patch down the
> road. I'll also add a note to the JIRA ticket.
>
> Thanks,
> Gerald
>
>


[jira] [Created] (PHOENIX-4904) NPE exception when use non-existing filed in function

2018-09-13 Thread Jaanai (JIRA)
Jaanai created PHOENIX-4904:
---

 Summary: NPE exception when use non-existing filed in function 
 Key: PHOENIX-4904
 URL: https://issues.apache.org/jira/browse/PHOENIX-4904
 Project: Phoenix
  Issue Type: New Feature
Affects Versions: 4.14.0, 4.13.0, 4.12.0
Reporter: Jaanai
Assignee: Jaanai


Use the following SQL to reproduce the error:

{code:sql}
create table "test_truncate"("ROW" varchar primary key,"f"."0" varchar,"f"."1" 
varchar);
select * from "test_truncate" order by TO_NUMBER("f.1");
{code}

Exception information:

{code}
java.lang.NullPointerException     at 
org.apache.phoenix.util.SchemaUtil.getSchemaNameFromFullName(SchemaUtil.java:632)
     at 
org.apache.phoenix.schema.TableNotFoundException.<init>(TableNotFoundException.java:44)
     at 
org.apache.phoenix.compile.FromCompiler$MultiTableColumnResolver.resolveTable(FromCompiler.java:858)
     at 
org.apache.phoenix.compile.FromCompiler$ProjectedTableColumnResolver.resolveColumn(FromCompiler.java:984)
     at 
org.apache.phoenix.compile.ExpressionCompiler.resolveColumn(ExpressionCompiler.java:372)
     at 
org.apache.phoenix.compile.ExpressionCompiler.visit(ExpressionCompiler.java:408)
     at 
org.apache.phoenix.compile.ExpressionCompiler.visit(ExpressionCompiler.java:146)
     at 
org.apache.phoenix.parse.ColumnParseNode.accept(ColumnParseNode.java:56)     at 
org.apache.phoenix.parse.CompoundParseNode.acceptChildren(CompoundParseNode.java:64)
     at 
org.apache.phoenix.parse.FunctionParseNode.accept(FunctionParseNode.java:84)    
 at 
org.apache.phoenix.compile.OrderByCompiler.compile(OrderByCompiler.java:123)    
 at 
org.apache.phoenix.compile.QueryCompiler.compileSingleFlatQuery(QueryCompiler.java:562)
     at 
org.apache.phoenix.compile.QueryCompiler.compileSingleQuery(QueryCompiler.java:507)
     at 
org.apache.phoenix.compile.QueryCompiler.compileSelect(QueryCompiler.java:202)  
   at org.apache.phoenix.compile.QueryCompiler.compile(QueryCompiler.java:157)  
   at 
org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:478)
     at 
org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:444)
     a
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
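The NPE in the trace above appears to originate from splitting a schema-qualified name without a null guard. Purely as a hypothetical illustration of that failure mode (this is not Phoenix's SchemaUtil code):

```python
def get_schema_name_from_full_name(full_name):
    # Without this guard, full_name.find(...) raises on None,
    # analogous to the NPE reported in the stack trace above.
    if full_name is None:
        return ""
    idx = full_name.find(".")
    return "" if idx < 0 else full_name[:idx]

print(get_schema_name_from_full_name("S.T"))  # schema part of a qualified name
print(get_schema_name_from_full_name(None))   # guarded: empty string, no crash
```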


[jira] [Updated] (PHOENIX-4904) NPE exception when use non-existing filed in function

2018-09-13 Thread Jaanai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaanai updated PHOENIX-4904:

Issue Type: Bug  (was: New Feature)

> NPE exception when use non-existing filed in function 
> --
>
> Key: PHOENIX-4904
> URL: https://issues.apache.org/jira/browse/PHOENIX-4904
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0, 4.13.0, 4.14.0
>Reporter: Jaanai
>Assignee: Jaanai
>Priority: Major
>
> Use the following SQL to reproduce the error:
> {code:sql}
> create table "test_truncate"("ROW" varchar primary key,"f"."0" 
> varchar,"f"."1" varchar);
> select * from "test_truncate" order by TO_NUMBER("f.1");
> {code}
> Exception information:
> {code}
> java.lang.NullPointerException     at 
> org.apache.phoenix.util.SchemaUtil.getSchemaNameFromFullName(SchemaUtil.java:632)
>      at 
> org.apache.phoenix.schema.TableNotFoundException.<init>(TableNotFoundException.java:44)
>      at 
> org.apache.phoenix.compile.FromCompiler$MultiTableColumnResolver.resolveTable(FromCompiler.java:858)
>      at 
> org.apache.phoenix.compile.FromCompiler$ProjectedTableColumnResolver.resolveColumn(FromCompiler.java:984)
>      at 
> org.apache.phoenix.compile.ExpressionCompiler.resolveColumn(ExpressionCompiler.java:372)
>      at 
> org.apache.phoenix.compile.ExpressionCompiler.visit(ExpressionCompiler.java:408)
>      at 
> org.apache.phoenix.compile.ExpressionCompiler.visit(ExpressionCompiler.java:146)
>      at 
> org.apache.phoenix.parse.ColumnParseNode.accept(ColumnParseNode.java:56)     
> at 
> org.apache.phoenix.parse.CompoundParseNode.acceptChildren(CompoundParseNode.java:64)
>      at 
> org.apache.phoenix.parse.FunctionParseNode.accept(FunctionParseNode.java:84)  
>    at 
> org.apache.phoenix.compile.OrderByCompiler.compile(OrderByCompiler.java:123)  
>    at 
> org.apache.phoenix.compile.QueryCompiler.compileSingleFlatQuery(QueryCompiler.java:562)
>      at 
> org.apache.phoenix.compile.QueryCompiler.compileSingleQuery(QueryCompiler.java:507)
>      at 
> org.apache.phoenix.compile.QueryCompiler.compileSelect(QueryCompiler.java:202)
>      at 
> org.apache.phoenix.compile.QueryCompiler.compile(QueryCompiler.java:157)     
> at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:478)
>      at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:444)
>      a
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)