[jira] [Commented] (PHOENIX-1334) Issue when LIKE expression contains Chinese characters on Key column

2014-10-09 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164878#comment-14164878
 ] 

jay wong commented on PHOENIX-1334:
---

[~wuyang630]

The reason is that a Chinese character takes more than one byte when encoded.

maybe you can use: where VAL like '%中%';

 Issue when LIKE expression contains Chinese characters on Key column
 

 Key: PHOENIX-1334
 URL: https://issues.apache.org/jira/browse/PHOENIX-1334
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.1
 Environment: jdk 1.8 linux
Reporter: wuyang

 When I use a LIKE expression in a SELECT query, it works well when I put 
 *Chinese* characters in a LIKE expression on NON-PRIMARY-KEY columns, BUT 
 when I put them in a LIKE expression on a *PRIMARY KEY* column, it throws an exception:
  select * from test3 where PK like '中%';
 ||COLUMN_NAME||DATA_TYPE||TYPE_NAME||
 |PK|12|VARCHAR|
 |VAL|12|VARCHAR|
 {quote}
 org.apache.phoenix.schema.IllegalDataException: CHAR types may only contain 
 single byte characters (中)
 at org.apache.phoenix.schema.PDataType$2.toBytes(PDataType.java:216)
 at org.apache.phoenix.compile.WhereOptimizer$KeyExpressionVisitor.visitLeave(WhereOptimizer.java:829)
 at org.apache.phoenix.compile.WhereOptimizer$KeyExpressionVisitor.visitLeave(WhereOptimizer.java:349)
 at org.apache.phoenix.expression.LikeExpression.accept(LikeExpression.java:269)
 at 
 {quote}
 The types of the PRIMARY KEY and NON-PRIMARY-KEY columns are both VARCHAR.
 The relevant source code:
 {code}
 byte[] b = VARCHAR.toBytes(object);
 if (b.length != ((String) object).length()) {
     throw new IllegalDataException("CHAR types may only contain " +
             "single byte characters (" + object + ")");
 }
 {code}
 Actually, Chinese (or other non-Latin) characters will never satisfy the 
 condition b.length == ((String) object).length(), since the default encoding is 
 UTF-8.
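 The mismatch is easy to demonstrate in plain Java (a standalone illustration, not Phoenix code): String.length() counts UTF-16 code units, while UTF-8 encodes each CJK character as three bytes.

```java
import java.nio.charset.StandardCharsets;

// Standalone illustration of why b.length != ((String) object).length()
// for non-Latin text: length() counts UTF-16 code units, while the UTF-8
// encoding uses multiple bytes per CJK character.
public class Utf8LengthDemo {
    public static int charCount(String s) {
        return s.length();
    }

    public static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(charCount("中文"));      // 2 code units
        System.out.println(utf8ByteCount("中文"));  // 6 bytes in UTF-8
    }
}
```

 So for any string containing a multi-byte character the two counts differ, and the check above rejects it.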
 *Use the following statements to reproduce:*
 create table test_c ( pk varchar primary key , val varchar);
 upsert into test_c values ('中文','中文'); 
 select * from test_c where VAL like '中%';
 _// it works well until now_
 select * from test_c where PK like '中%';
 _// oops..._



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2014-10-08 Thread jay wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay wong updated PHOENIX-1267:
--
Attachment: smallscan2.patch

 Set scan.setSmall(true) when appropriate
 

 Key: PHOENIX-1267
 URL: https://issues.apache.org/jira/browse/PHOENIX-1267
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor
Assignee: jay wong
 Attachments: smallscan.patch, smallscan2.patch


 There's a nice optimization that has been in HBase for a while now to set a 
 scan as small. This prevents extra RPC calls, I believe. We should add a 
 hint for queries that forces it to be set/not set, and make our best guess on 
 when it should default to true.





[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2014-10-08 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163236#comment-14163236
 ] 

jay wong commented on PHOENIX-1267:
---

[~jamestaylor]
About the isScanForbidden logic: it is not really about topN, it's about ORDER 
BY, but I found that an ORDER BY query is flagged as TOPN.
For example, assume a small scan fetches 100 rows per query and we have a result 
like:
|key|col1|
|1|2|
|2|1|
|...|...|
|100|55|
|2|101|

On the next query, it looks for the rows whose rowkey is greater than the last 
rowkey of the previous page.

Since the rows are ordered by col1 rather than by rowkey, the scan falls into an 
infinite loop.
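A toy illustration of the hazard (standalone code, not Phoenix's; the rows are taken from the table above): in ORDER BY output, the last rowkey of a page is not the largest rowkey already returned, so resuming at "rowkey greater than the last rowkey seen" jumps backwards and re-reads rows.

```java
// Toy model of the paging hazard: rows are sorted by col1 (ORDER BY), so
// rowkeys are no longer monotonic. Resuming the next page at "rowkey >
// last rowkey seen" can jump backwards and re-fetch rows that were
// already returned, which is how the scan loops forever.
public class OrderByPagingDemo {
    // (rowkey, col1) pairs in ORDER BY col1 order, from the example table
    static final int[][] PAGE = {{1, 2}, {2, 1}, {100, 55}, {2, 101}};

    public static int lastRowKey() {
        return PAGE[PAGE.length - 1][0];
    }

    public static int maxRowKeyReturned() {
        int max = Integer.MIN_VALUE;
        for (int[] row : PAGE) max = Math.max(max, row[0]);
        return max;
    }

    public static void main(String[] args) {
        // Resuming at "rowkey > 2" re-reads rowkey 100, which was already
        // returned: the scan never makes progress past it.
        System.out.println(lastRowKey());        // 2
        System.out.println(maxRowKeyReturned()); // 100
    }
}
```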



 Set scan.setSmall(true) when appropriate
 

 Key: PHOENIX-1267
 URL: https://issues.apache.org/jira/browse/PHOENIX-1267
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor
Assignee: jay wong
 Attachments: smallscan.patch, smallscan2.patch







[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2014-10-08 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163246#comment-14163246
 ] 

jay wong commented on PHOENIX-1267:
---

[~jamestaylor]
About joins: I think a join query is usually a big query that scans almost the 
whole table.

For big queries the small scan performs badly, which is why I set smallScan to 
false before.

 Set scan.setSmall(true) when appropriate
 

 Key: PHOENIX-1267
 URL: https://issues.apache.org/jira/browse/PHOENIX-1267
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor
Assignee: jay wong
 Attachments: smallscan.patch, smallscan2.patch







[jira] [Assigned] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2014-10-07 Thread jay wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay wong reassigned PHOENIX-1267:
-

Assignee: jay wong

 Set scan.setSmall(true) when appropriate
 

 Key: PHOENIX-1267
 URL: https://issues.apache.org/jira/browse/PHOENIX-1267
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor
Assignee: jay wong
 Attachments: smallscan.patch







[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate

2014-10-07 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163003#comment-14163003
 ] 

jay wong commented on PHOENIX-1267:
---

I was on holiday for the past several days, so sorry for the late reply.

I see what you mean: a hint is more structured and a better way, and I think 
using a hint to control the small scan is a good idea.

The small scan is set to true by default when both the startkey and stopkey are 
set. If we have an ORDER BY query while small is true, the result will be an 
infinite loop.

So the small scan is not only a query optimization for the user; it can cause a 
bug.
So I think the smallScanForbidden check is needed as well.






 Set scan.setSmall(true) when appropriate
 

 Key: PHOENIX-1267
 URL: https://issues.apache.org/jira/browse/PHOENIX-1267
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor
Assignee: jay wong
 Attachments: smallscan.patch







[jira] [Updated] (PHOENIX-1258) RegexpSubstrFunction is not useable with group by query

2014-09-18 Thread jay wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay wong updated PHOENIX-1258:
--
Attachment: regex_substr_func_group3.patch

 RegexpSubstrFunction is not useable with group by query 
 

 Key: PHOENIX-1258
 URL: https://issues.apache.org/jira/browse/PHOENIX-1258
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.1
Reporter: jay wong
Assignee: jay wong
 Fix For: 3.1

 Attachments: regex_substr_func_group.patch, 
 regex_substr_func_group2.patch, regex_substr_func_group3.patch


 0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select gmt subgmt from 
 test_salt1 limit 2;
 |   SUBGMT   |
 | asdfasf?fas |
 | basdfasf?fas |
 2 rows selected (0.071 seconds)
 0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select REGEXP_SUBSTR(gmt, 
 '[^\\?]+') subgmt from test_salt1 limit 2;
 |   SUBGMT   |
 | asdfasf    |
 | basdfasf   |
 2 rows selected (0.07 seconds)
 0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select REGEXP_SUBSTR(gmt, 
 '[^\\?]+') subgmt, sum(int_A) from test_salt1 group by subgmt;
 |   SUBGMT   | SUM(A.INT_A) |
 | null       | 17498        |
 1 row selected (0.082 seconds)
 The GROUP BY result is not as expected; the expected result is:
 |   SUBGMT   | SUM(A.INT_A) |
 | asdfasf    | 8342         |
 | basdfasf   | 5449         |
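 For reference, the pattern [^\\?]+ simply captures the prefix before the first '?'. A standalone Java illustration of what the REGEXP_SUBSTR call above computes (not Phoenix's implementation):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// What REGEXP_SUBSTR(gmt, '[^\?]+') computes for the sample values:
// the longest run of characters before the first '?'.
public class RegexpSubstrDemo {
    public static String substr(String input) {
        Matcher m = Pattern.compile("[^\\?]+").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(substr("asdfasf?fas"));  // asdfasf
        System.out.println(substr("basdfasf?fas")); // basdfasf
    }
}
```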





[jira] [Commented] (PHOENIX-1258) RegexpSubstrFunction is not useable with group by query

2014-09-18 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138615#comment-14138615
 ] 

jay wong commented on PHOENIX-1258:
---

[~samarthjain]
Of course.
Thanks for the review and suggestions.


 RegexpSubstrFunction is not useable with group by query 
 

 Key: PHOENIX-1258
 URL: https://issues.apache.org/jira/browse/PHOENIX-1258
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.1
Reporter: jay wong
Assignee: jay wong
 Fix For: 3.1

 Attachments: regex_substr_func_group.patch, 
 regex_substr_func_group2.patch, regex_substr_func_group3.patch







[jira] [Updated] (PHOENIX-1258) RegexpSubstrFunction is not useable with group by query

2014-09-18 Thread jay wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay wong updated PHOENIX-1258:
--
Attachment: regex_substr_func.patch

 RegexpSubstrFunction is not useable with group by query 
 

 Key: PHOENIX-1258
 URL: https://issues.apache.org/jira/browse/PHOENIX-1258
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.1
Reporter: jay wong
Assignee: jay wong
 Fix For: 3.1

 Attachments: regex_substr_func.patch, regex_substr_func_group.patch, 
 regex_substr_func_group2.patch, regex_substr_func_group3.patch







[jira] [Commented] (PHOENIX-1258) RegexpSubstrFunction is not useable with group by query

2014-09-18 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139908#comment-14139908
 ] 

jay wong commented on PHOENIX-1258:
---

[~gabriel.reid]
I see what you mean, and I agree with it.

Thanks for the review and the great suggestion.
 

 RegexpSubstrFunction is not useable with group by query 
 

 Key: PHOENIX-1258
 URL: https://issues.apache.org/jira/browse/PHOENIX-1258
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.1
Reporter: jay wong
Assignee: jay wong
 Fix For: 3.1

 Attachments: regex_substr_func.patch, regex_substr_func_group.patch, 
 regex_substr_func_group2.patch, regex_substr_func_group3.patch







[jira] [Commented] (PHOENIX-1252) Potential lock leak for ungrouped aggregation

2014-09-17 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137007#comment-14137007
 ] 

jay wong commented on PHOENIX-1252:
---

Thanks for the review and commit, [~jamestaylor].

 Potential lock leak for ungrouped aggregation
 -

 Key: PHOENIX-1252
 URL: https://issues.apache.org/jira/browse/PHOENIX-1252
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.1
Reporter: jay wong
Assignee: jay wong
 Fix For: 4.2, 3.2

 Attachments: lock-leak.patch


 If innerScanner.close() throws an exception, 
 region.closeRegionOperation() will not be executed at all.
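 The usual fix for this kind of leak is a try/finally block; a standalone sketch of the pattern (illustrative only, not the actual patch contents):

```java
// Standalone sketch of the lock-leak pattern: if close() throws and the
// cleanup call is not in a finally block, closeRegionOperation() is
// skipped and the region lock leaks. finally guarantees the release.
public class LockLeakDemo {
    static boolean lockReleased;

    static void innerScannerClose() {
        throw new RuntimeException("close failed");
    }

    static void closeRegionOperation() {
        lockReleased = true; // stands in for releasing the region lock
    }

    public static boolean runGuarded() {
        lockReleased = false;
        try {
            innerScannerClose();
        } catch (RuntimeException e) {
            // the exception would propagate in real code; swallowed for the demo
        } finally {
            closeRegionOperation(); // runs even though close() threw
        }
        return lockReleased;
    }

    public static void main(String[] args) {
        System.out.println(runGuarded()); // true
    }
}
```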





[jira] [Created] (PHOENIX-1258) RegexpSubstrFunction is not useable with group by query

2014-09-17 Thread jay wong (JIRA)
jay wong created PHOENIX-1258:
-

 Summary: RegexpSubstrFunction is not useable with group by query 
 Key: PHOENIX-1258
 URL: https://issues.apache.org/jira/browse/PHOENIX-1258
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.1
Reporter: jay wong
Assignee: jay wong
 Fix For: 3.1


0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select gmt subgmt from 
test_salt1 limit 2;
++
|   SUBGMT   |
++
| asdfasf?fas |
| basdfasf?fas |
++
2 rows selected (0.071 seconds)
0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select REGEXP_SUBSTR(gmt, 
'[^\\?]+') subgmt from test_salt1 limit 2;
++
|   SUBGMT   |
++
| asdfasf|
| basdfasf   |
++
2 rows selected (0.07 seconds)
0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select REGEXP_SUBSTR(gmt, 
'[^\\?]+') subgmt,sum(int_A) from test_salt1 group by subgmt;
|   SUBGMT   | SUM(A.INT_A) |
| null   | 17498|
1 row selected (0.082 seconds)





[jira] [Updated] (PHOENIX-1258) RegexpSubstrFunction is not useable with group by query

2014-09-17 Thread jay wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay wong updated PHOENIX-1258:
--
Description: 
0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select gmt subgmt from 
test_salt1 limit 2;
|   SUBGMT   |
| asdfasf?fas |
| basdfasf?fas |
2 rows selected (0.071 seconds)
0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select REGEXP_SUBSTR(gmt, 
'[^\\?]+') subgmt from test_salt1 limit 2;
|   SUBGMT   |
| asdfasf|
| basdfasf   |
2 rows selected (0.07 seconds)
0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select REGEXP_SUBSTR(gmt, 
'[^\\?]+') subgmt,sum(int_A) from test_salt1 group by subgmt;
|   SUBGMT   | SUM(A.INT_A) |
| null   | 17498|
1 row selected (0.082 seconds)

the group by result is not expected

  was:
0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select gmt subgmt from 
test_salt1 limit 2;
++
|   SUBGMT   |
++
| asdfasf?fas |
| basdfasf?fas |
++
2 rows selected (0.071 seconds)
0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select REGEXP_SUBSTR(gmt, 
'[^\\?]+') subgmt from test_salt1 limit 2;
++
|   SUBGMT   |
++
| asdfasf|
| basdfasf   |
++
2 rows selected (0.07 seconds)
0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select REGEXP_SUBSTR(gmt, 
'[^\\?]+') subgmt,sum(int_A) from test_salt1 group by subgmt;
|   SUBGMT   | SUM(A.INT_A) |
| null   | 17498|
1 row selected (0.082 seconds)


 RegexpSubstrFunction is not useable with group by query 
 

 Key: PHOENIX-1258
 URL: https://issues.apache.org/jira/browse/PHOENIX-1258
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.1
Reporter: jay wong
Assignee: jay wong
 Fix For: 3.1







[jira] [Updated] (PHOENIX-1258) RegexpSubstrFunction is not useable with group by query

2014-09-17 Thread jay wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay wong updated PHOENIX-1258:
--
Description: 
0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select gmt subgmt from 
test_salt1 limit 2;
|   SUBGMT   |
| asdfasf?fas |
| basdfasf?fas |
2 rows selected (0.071 seconds)
0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select REGEXP_SUBSTR(gmt, 
'[^\\?]+') subgmt from test_salt1 limit 2;
|   SUBGMT   |
| asdfasf|
| basdfasf   |
2 rows selected (0.07 seconds)
0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select REGEXP_SUBSTR(gmt, 
'[^\\?]+') subgmt,sum(int_A) from test_salt1 group by subgmt;
|   SUBGMT   | SUM(A.INT_A) |
| null   | 17498|
1 row selected (0.082 seconds)

the group by result is not expected


below result is expected
|SUBGMT |SUM(A.INT_A)|
| asdfasf|   8342 |
| basdfasf   | 5449 |

  was:
0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select gmt subgmt from 
test_salt1 limit 2;
|   SUBGMT   |
| asdfasf?fas |
| basdfasf?fas |
2 rows selected (0.071 seconds)
0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select REGEXP_SUBSTR(gmt, 
'[^\\?]+') subgmt from test_salt1 limit 2;
|   SUBGMT   |
| asdfasf|
| basdfasf   |
2 rows selected (0.07 seconds)
0: jdbc:phoenix:10.232.98.99:/hbase-phoenix-l> select REGEXP_SUBSTR(gmt, 
'[^\\?]+') subgmt,sum(int_A) from test_salt1 group by subgmt;
|   SUBGMT   | SUM(A.INT_A) |
| null   | 17498|
1 row selected (0.082 seconds)

the group by result is not expected


 RegexpSubstrFunction is not useable with group by query 
 

 Key: PHOENIX-1258
 URL: https://issues.apache.org/jira/browse/PHOENIX-1258
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.1
Reporter: jay wong
Assignee: jay wong
 Fix For: 3.1







[jira] [Commented] (PHOENIX-1258) RegexpSubstrFunction is not useable with group by query

2014-09-17 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138395#comment-14138395
 ] 

jay wong commented on PHOENIX-1258:
---

[~gabriel.reid]
Yes, overriding readFields would be a better way.

As for why I named the test case GroupByWithFunctionIT:

a query that only uses RegexpSubstrFunction returns the correct result;

the bad case is a query with both GROUP BY and the function,

since the expression is set into a scan attribute whose key is 
UNORDERED_GROUP_BY_EXPRESSIONS.

So I think it is a GROUP BY test.



 RegexpSubstrFunction is not useable with group by query 
 

 Key: PHOENIX-1258
 URL: https://issues.apache.org/jira/browse/PHOENIX-1258
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.1
Reporter: jay wong
Assignee: jay wong
 Fix For: 3.1

 Attachments: regex_substr_func_group.patch







[jira] [Comment Edited] (PHOENIX-1258) RegexpSubstrFunction is not useable with group by query

2014-09-17 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138395#comment-14138395
 ] 

jay wong edited comment on PHOENIX-1258 at 9/18/14 2:07 AM:


[~gabriel.reid]
Yes, overriding readFields would be a better way.

As for why I named the test case GroupByWithFunctionIT:

a query that only uses RegexpSubstrFunction returns the correct result;

the bad case is a query with both GROUP BY and the function,

since it is serialized and set into a scan attribute whose key is 
UNORDERED_GROUP_BY_EXPRESSIONS.

So I think it is a GROUP BY test.

Thanks for the review and suggestions.




was (Author: jaywong):
[~gabriel.reid]
yes. override readFields will be a better way.

But why the test case I named it GroupByWithFunctionIT. 

It's because of that only query use RegexpSubstrFunction ,the result is all 
right.

the bad case is query with both group by and function. 

as It set into scan attribute which key named UNORDERED_GROUP_BY_EXPRESSIONS.

So It is a group by test. I think



 RegexpSubstrFunction is not useable with group by query 
 

 Key: PHOENIX-1258
 URL: https://issues.apache.org/jira/browse/PHOENIX-1258
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.1
Reporter: jay wong
Assignee: jay wong
 Fix For: 3.1

 Attachments: regex_substr_func_group.patch







[jira] [Comment Edited] (PHOENIX-1258) RegexpSubstrFunction is not useable with group by query

2014-09-17 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138395#comment-14138395
 ] 

jay wong edited comment on PHOENIX-1258 at 9/18/14 2:11 AM:


[~gabriel.reid]
Yes, overriding readFields would be a better way.

As for why I named the test case GroupByWithFunctionIT:

a query that only uses RegexpSubstrFunction returns the correct result;

the bad case is a query with both GROUP BY and the function,

since it is serialized and set into a scan attribute whose key is 
UNORDERED_GROUP_BY_EXPRESSIONS.

And **other functions** may have the same problem with GROUP BY.
So I think it is a GROUP BY test.

Thanks for the review and suggestions.




was (Author: jaywong):
[~gabriel.reid]
yes. override readFields will be a better way.

But why the test case I named it GroupByWithFunctionIT. 

It's because of that only query use RegexpSubstrFunction ,the result is all 
right.

the bad case is query with both group by and function. 

And as It serialized and set into scan attribute which key named 
UNORDERED_GROUP_BY_EXPRESSIONS.

So It is a group by test. I think.

thanks for review and suggest



 RegexpSubstrFunction is not useable with group by query 
 

 Key: PHOENIX-1258
 URL: https://issues.apache.org/jira/browse/PHOENIX-1258
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.1
Reporter: jay wong
Assignee: jay wong
 Fix For: 3.1

 Attachments: regex_substr_func_group.patch







[jira] [Updated] (PHOENIX-1258) RegexpSubstrFunction is not useable with group by query

2014-09-17 Thread jay wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay wong updated PHOENIX-1258:
--
Attachment: regex_substr_func_group2.patch

 RegexpSubstrFunction is not useable with group by query 
 

 Key: PHOENIX-1258
 URL: https://issues.apache.org/jira/browse/PHOENIX-1258
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.1
Reporter: jay wong
Assignee: jay wong
 Fix For: 3.1

 Attachments: regex_substr_func_group.patch, 
 regex_substr_func_group2.patch







[jira] [Updated] (PHOENIX-1252) A lock maybe leak.

2014-09-15 Thread jay wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay wong updated PHOENIX-1252:
--
Attachment: lock-leak.patch

 A lock maybe leak.
 --

 Key: PHOENIX-1252
 URL: https://issues.apache.org/jira/browse/PHOENIX-1252
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.1
Reporter: jay wong
 Fix For: 3.1

 Attachments: lock-leak.patch







[jira] [Updated] (PHOENIX-1074) ParallelIteratorRegionSplitterFactory get Splits is not rational

2014-07-21 Thread jay wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay wong updated PHOENIX-1074:
--

Attachment: SkipScanFilterSaltedIntersectTest.java

 ParallelIteratorRegionSplitterFactory get Splits is not rational
 

 Key: PHOENIX-1074
 URL: https://issues.apache.org/jira/browse/PHOENIX-1074
 Project: Phoenix
  Issue Type: Bug
Reporter: jay wong
 Attachments: SkipScanFilterSaltedIntersectTest.java


 create a table 
 {code}
 create table if not exists table1(
   gmt VARCHAR NOT NULL, 
   spm_type VARCHAR NOT NULL, 
   spm VARCHAR NOT NULL, 
   A.int_a INTEGER, 
   B.int_b INTEGER, 
   B.int_c INTEGER 
   CONSTRAINT pk PRIMARY KEY (gmt, spm_type, spm)) SALT_BUCKETS = 4, 
 bloomfilter='ROW';
 {code}
 and split the table into 29 partitions, as follows:
 |startrow|endrow|
 | |\x0020140201|
 |\x0020140201|\x0020140202|
 |\x0020140202|\x0020140203|
 |\x0020140203|\x0020140204|
 |\x0020140204|\x0020140205|   
 |\x0020140205|\x0020140206|   
 |\x0020140206|\x0020140207|
 |\x0020140207|\x0120140201|
 |\x0120140201|\x0120140202|
 |\x0120140202|\x0120140203|
 |\x0120140203|\x0120140204|
 |\x0120140204|\x0120140205|
 |\x0120140205|\x0120140206|
 |\x0120140206|\x0120140207|
 |\x0120140207|\x0220140201|
 |\x0220140201|\x0220140202|
 |\x0220140202|\x0220140203|
 |\x0220140203|\x0220140204|
 |\x0220140204|\x0220140205|
 |\x0220140205|\x0220140206|
 |\x0220140206|\x0220140207|
 |\x0220140207|\x0320140201|
 |\x0320140201|\x0320140202|
 |\x0320140202|\x0320140203|
 |\x0320140203|\x0320140204|
 |\x0320140204|\x0320140205|
 |\x0320140205|\x0320140206|
 |\x0320140206|\x0320140207|
 |\x0320140207| |  
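 As background for the \x00 through \x03 prefixes in the boundaries above: with SALT_BUCKETS = 4, Phoenix prepends a one-byte bucket number derived from a hash of the rowkey. A rough standalone sketch of the idea (the hash function here is a stand-in, not Phoenix's actual one):

```java
import java.util.Arrays;

// Rough sketch of rowkey salting: a single prefix byte in [0, buckets)
// derived from a hash of the original rowkey. Arrays.hashCode is a
// stand-in for Phoenix's own hash, but the modulo idea is the same.
public class SaltSketch {
    public static byte saltByte(byte[] rowKey, int buckets) {
        int h = Arrays.hashCode(rowKey);
        return (byte) ((h % buckets + buckets) % buckets); // non-negative
    }

    public static byte[] salted(byte[] rowKey, int buckets) {
        byte[] out = new byte[rowKey.length + 1];
        out[0] = saltByte(rowKey, buckets);
        System.arraycopy(rowKey, 0, out, 1, rowKey.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] key = "20140201".getBytes();
        System.out.println(salted(key, 4).length); // original length + 1
    }
}
```

 Because each logical key range is replicated under every salt byte, a range scan over gmt has to intersect one key range per bucket with the region boundaries, which is why the split counts below are larger than a single contiguous range would suggest.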
 Then insert some data:
 | GMT      | SPM_TYPE | SPM           | INT_A | INT_B | INT_C |
 | 20140201 | 1        | 1.2.3.4546    | 218   | 218   | null  |
 | 20140201 | 1        | 1.2.44545     | 190   | 190   | null  |
 | 20140201 | 1        | 1.353451312   | 246   | 246   | null  |
 | 20140201 | 2        | 1.2.3.6775    | 183   | 183   | null  |
 | ...      | ...      | ...           | ...   | ...   | ...   |
 | 20140207 | 3        | 1.2.3.4546    | 224   | 224   | null  |
 | 20140207 | 3        | 1.2.44545     | 196   | 196   | null  |
 | 20140207 | 3        | 1.353451312   | 168   | 168   | null  |
 | 20140207 | 4        | 1.2.3.6775    | 189   | 189   | null  |
 | 20140207 | 4        | 1.23.345345   | 217   | 217   | null  |
 | 20140207 | 4        | 1.23234234234 | 245   | 245   | null  |
 I print a log like this:
 {code}
 public class ParallelIterators extends ExplainTable implements ResultIterators {
 
     @Override
     public List<PeekingResultIterator> getIterators() throws SQLException {
         boolean success = false;
         final ConnectionQueryServices services = context.getConnection().getQueryServices();
         ReadOnlyProps props = services.getProps();
         int numSplits = splits.size();
         List<PeekingResultIterator> iterators = new ArrayList<PeekingResultIterator>(numSplits);
         List<Pair<byte[],Future<PeekingResultIterator>>> futures =
                 new ArrayList<Pair<byte[],Future<PeekingResultIterator>>>(numSplits);
         final UUID scanId = UUID.randomUUID();
         try {
             ExecutorService executor = services.getExecutor();
             System.out.println("the split size is " + numSplits);
             ...
         }
     }
 }
 {code}
 then execute some SQL:
 {code}
 select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type = '2' and spm like '1.%'
 the split size is 31
 select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type = '2'
 the split size is 31
 select * from table1 where gmt > '20140202' and gmt < '20140207'
 the split size is 27
 select * from table1 where gmt > '20140202' and gmt < '20140204' and spm_type = '2' and spm like '1.%'
 the split size is 28
 select * from table1 where gmt > '20140202' and gmt < '20140204' and spm_type = '2'
 the split size is 28
 select * from table1 where gmt > '20140202' and gmt < '20140204'
 the split size is 12
 {code}
 but I think 
 {code}
 select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type = '2' and spm like '1.%'
 {code}
 and 
 {code}
 select * from table1 where gmt > '20140202' and gmt < '20140207'
 {code}
 should produce the same splits. Why don't they?
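For illustration only (this is not Phoenix's actual code, and all names are hypothetical), a minimal Java sketch of one reason split counts can differ on a salted table: with SALT_BUCKETS = N, a single logical range on the leading PK column is expanded into one range per salt prefix, and each of those must be intersected with the region boundaries separately.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a salted table prefixes every row key with a salt
// byte, so one logical range on gmt becomes one range per salt bucket.
public class SaltedRangeSketch {
    static List<String[]> saltRanges(String start, String stop, int buckets) {
        List<String[]> ranges = new ArrayList<>();
        for (int b = 0; b < buckets; b++) {
            // Salt byte rendered as text (\x00..\x03) to match the
            // region-boundary notation in the table above.
            String prefix = String.format("\\x%02d", b);
            ranges.add(new String[] { prefix + start, prefix + stop });
        }
        return ranges;
    }

    public static void main(String[] args) {
        // One logical range over gmt becomes 4 salted ranges for 4 buckets.
        for (String[] r : saltRanges("20140202", "20140204", 4)) {
            System.out.println(r[0] + " -> " + r[1]);
        }
    }
}
```

Each salted range then intersects its own subset of regions, so adding or removing trailing PK conditions can change how the intersection is computed per bucket.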



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1074) ParallelIteratorRegionSplitterFactory get Splits is not rational

2014-07-21 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068585#comment-14068585
 ] 

jay wong commented on PHOENIX-1074:
---

[~jamestaylor]

I've added a test case; the second test case demonstrates the problem.

I know the rule is hard to fix, but the current behavior is not advisable.

 ParallelIteratorRegionSplitterFactory get Splits is not rational
 

 Key: PHOENIX-1074
 URL: https://issues.apache.org/jira/browse/PHOENIX-1074
 Project: Phoenix
  Issue Type: Bug
Reporter: jay wong
 Attachments: SkipScanFilterSaltedIntersectTest.java





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1074) ParallelIteratorRegionSplitterFactory get Splits is not rational

2014-07-10 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14057369#comment-14057369
 ] 

jay wong commented on PHOENIX-1074:
---

[~jamestaylor]
Please check my problem again.
This is my primary key and salt_buckets:
{code}
CONSTRAINT pk PRIMARY KEY (gmt, spm_type, spm)) SALT_BUCKETS = 4
{code}

{code}
select * from table1 where gmt > '20140202' and gmt < '20140204'

the split size is 12  (this is logical)
{code}

{code}
select * from table1 where gmt > '20140202' and gmt < '20140204' and spm_type = '2'

the split size is 28  (I think the logical split size here is also 12)
{code}

This is only a small example.

My production table has 1,900 regions.

With a logical split policy it would produce only about 20 splits,

but it actually produces 1,900 splits.



 ParallelIteratorRegionSplitterFactory get Splits is not rational
 

 Key: PHOENIX-1074
 URL: https://issues.apache.org/jira/browse/PHOENIX-1074
 Project: Phoenix
  Issue Type: Bug
Reporter: jay wong


[jira] [Created] (PHOENIX-1074) ParallelIteratorRegionSplitterFactory get Splits is not rational

2014-07-09 Thread jay wong (JIRA)
jay wong created PHOENIX-1074:
-

 Summary: ParallelIteratorRegionSplitterFactory get Splits is not 
rational
 Key: PHOENIX-1074
 URL: https://issues.apache.org/jira/browse/PHOENIX-1074
 Project: Phoenix
  Issue Type: Wish
Reporter: jay wong








--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PHOENIX-1074) ParallelIteratorRegionSplitterFactory get Splits is not rational

2014-07-09 Thread jay wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay wong updated PHOENIX-1074:
--

Issue Type: Bug  (was: Wish)

 ParallelIteratorRegionSplitterFactory get Splits is not rational
 

 Key: PHOENIX-1074
 URL: https://issues.apache.org/jira/browse/PHOENIX-1074
 Project: Phoenix
  Issue Type: Bug
Reporter: jay wong




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (PHOENIX-1065) In order by case, I think nulls last will be default

2014-07-08 Thread jay wong (JIRA)
jay wong created PHOENIX-1065:
-

 Summary: In order by case, I think nulls last will be default
 Key: PHOENIX-1065
 URL: https://issues.apache.org/jira/browse/PHOENIX-1065
 Project: Phoenix
  Issue Type: New Feature
Reporter: jay wong


1.  jdbc:phoenix:ip:/hbase-phoenix-l select * from testorder;

|PK1 |COL1|COL2|
| row1   | 2  | 3  |
| row2   | 3  | 4  |
| row3   | 4  | 5  |
| row4   | 5  | null   |
| row5   | 5  | null   |

2. jdbc:phoenix:ip:/hbase-phoenix-l select * from testorder order by col2;

|PK1 |COL1|COL2|
| row4   | 5  | null   |
| row5   | 5  | null   |
| row1   | 2  | 3  |
| row2   | 3  | 4  |
| row3   | 4  | 5  |

3. jdbc:phoenix:ip:/hbase-phoenix-l select * from testorder order by col2 desc;

|PK1 |COL1|COL2|
| row4   | 5  | null   |
| row5   | 5  | null   |
| row3   | 4  | 5  |
| row2   | 3  | 4  |
| row1   | 2  | 3  |

I think null values should sort last by default when ordering,

as MySQL and Oracle put them last.

That is more in line with normal expectations.
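For comparison, the nulls-last ordering requested here can be sketched outside Phoenix with plain Java's `Comparator.nullsLast`; this is an illustration of the desired behavior, not Phoenix code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NullsLastDemo {
    // Sorts ascending with nulls after all non-null values, matching the
    // MySQL/Oracle default the report asks for.
    static List<Integer> sortNullsLast(List<Integer> xs) {
        xs.sort(Comparator.nullsLast(Comparator.naturalOrder()));
        return xs;
    }

    public static void main(String[] args) {
        // COL2 values from the example; nulls for row4/row5.
        List<Integer> col2 = new ArrayList<>(Arrays.asList(3, 4, 5, null, null));
        System.out.println(sortNullsLast(col2)); // [3, 4, 5, null, null]
    }
}
```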



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (PHOENIX-1064) In order by case, I think nulls last will be default

2014-07-08 Thread jay wong (JIRA)
jay wong created PHOENIX-1064:
-

 Summary: In order by case, I think nulls last will be default
 Key: PHOENIX-1064
 URL: https://issues.apache.org/jira/browse/PHOENIX-1064
 Project: Phoenix
  Issue Type: New Feature
Reporter: jay wong


1.  jdbc:phoenix:ip:/hbase-phoenix-l select * from testorder;

|PK1 |COL1|COL2|
| row1   | 2  | 3  |
| row2   | 3  | 4  |
| row3   | 4  | 5  |
| row4   | 5  | null   |
| row5   | 5  | null   |

2. jdbc:phoenix:ip:/hbase-phoenix-l select * from testorder order by col2;

|PK1 |COL1|COL2|
| row4   | 5  | null   |
| row5   | 5  | null   |
| row1   | 2  | 3  |
| row2   | 3  | 4  |
| row3   | 4  | 5  |

3. jdbc:phoenix:ip:/hbase-phoenix-l select * from testorder order by col2 desc;

|PK1 |COL1|COL2|
| row4   | 5  | null   |
| row5   | 5  | null   |
| row3   | 4  | 5  |
| row2   | 3  | 4  |
| row1   | 2  | 3  |

I think null values should sort last by default when ordering,

as MySQL and Oracle put them last.

That is more in line with normal expectations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PHOENIX-1064) In order by case, I think nulls last will be default

2014-07-08 Thread jay wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay wong updated PHOENIX-1064:
--

Description: 
1.  jdbc:phoenix:ip:/hbase-phoenix-l select * from testorder;

|PK1 |COL1|COL2|
| row1   | 2  | 3  |
| row2   | 3  | 4  |
| row3   | 4  | 5  |
| row4   | 5  | null   |
| row5   | 5  | null   |


2.jdbc:phoenix:ip:/hbase-phoenix-l select * from testorder order by col2;

|PK1 |COL1|COL2|
| row4   | 5  | null   |
| row5   | 5  | null   |
| row1   | 2  | 3  |
| row2   | 3  | 4  |
| row3   | 4  | 5  |

3. jdbc:phoenix:ip:/hbase-phoenix-l select * from testorder order by col2 desc;


|PK1 |COL1|COL2|
| row4   | 5  | null   |
| row5   | 5  | null   |
| row3   | 4  | 5  |
| row2   | 3  | 4  |
| row1   | 2  | 3  |



I think it will be at last when the column is null as default 

As mysql or oracle DB is at last.

It is More in line with normal logic of thinking.

  was:
1.  jdbc:phoenix:ip:/hbase-phoenix-l select * from testorder;

|PK1 |COL1|COL2|

| row1   | 2  | 3  |

| row2   | 3  | 4  |

| row3   | 4  | 5  |

| row4   | 5  | null   |

| row5   | 5  | null   |


2.jdbc:phoenix:ip:/hbase-phoenix-l select * from testorder order by col2;

|PK1 |COL1|COL2|

| row4   | 5  | null   |

| row5   | 5  | null   |

| row1   | 2  | 3  |

| row2   | 3  | 4  |

| row3   | 4  | 5  |

3. jdbc:phoenix:ip:/hbase-phoenix-l select * from testorder order by col2 desc;


|PK1 |COL1|COL2|

| row4   | 5  | null   |

| row5   | 5  | null   |

| row3   | 4  | 5  |

| row2   | 3  | 4  |

| row1   | 2  | 3  |



I think it will be at last when the column is null as default 

As mysql or oracle DB is at last.

It is More in line with normal logic of thinking.


 In order by case, I think nulls last will be default
 

 Key: PHOENIX-1064
 URL: https://issues.apache.org/jira/browse/PHOENIX-1064
 Project: Phoenix
  Issue Type: New Feature
Reporter: jay wong




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PHOENIX-1064) In order by case, I think nulls last will be default

2014-07-08 Thread jay wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay wong updated PHOENIX-1064:
--

Attachment: PHOENIX-1064.patch

 In order by case, I think nulls last will be default
 

 Key: PHOENIX-1064
 URL: https://issues.apache.org/jira/browse/PHOENIX-1064
 Project: Phoenix
  Issue Type: New Feature
Reporter: jay wong
 Attachments: PHOENIX-1064.patch





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1062) A SQL Trimmer for log sql execute times

2014-07-08 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055756#comment-14055756
 ] 

jay wong commented on PHOENIX-1062:
---

[~jamestaylor]

When any SQL executes, just like:

select col1 from table1 where col2=1;
select col1 from table1 where col2=2;
select col1 from table1 where col2=3;
...
select col1 from table1 where col2=99;

I count these as the SQL select col1 from table1 where col2=?; executed 99 times,

so the tool trims any SQL condition value or limit value to '?'.
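A minimal sketch of the trimming idea, assuming a simple regex over quoted strings and standalone numeric literals is sufficient (the attached SQLTrimmer.java may work differently; `SqlTrimSketch` is a hypothetical name):

```java
import java.util.regex.Pattern;

public class SqlTrimSketch {
    // Replace quoted strings and bare numbers with '?' so repeated
    // executions of the same statement shape collapse into one key.
    // \b guards keep digits inside identifiers like col2/table1 intact.
    private static final Pattern LITERAL = Pattern.compile("'[^']*'|\\b\\d+\\b");

    static String trim(String sql) {
        return LITERAL.matcher(sql).replaceAll("?");
    }

    public static void main(String[] args) {
        System.out.println(trim("select col1 from table1 where col2=99"));
        // -> select col1 from table1 where col2=?
    }
}
```

A statistics job can then use the trimmed string as a map key and count executions per statement shape.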

 A SQL Trimmer for log sql execute times
 ---

 Key: PHOENIX-1062
 URL: https://issues.apache.org/jira/browse/PHOENIX-1062
 Project: Phoenix
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: jay wong
Priority: Critical
 Fix For: 3.1

 Attachments: SQLTrimmer.java


 If we need statistics on how many times each SQL statement executes,
 just like: select a,b,c from table1 where d=13 and e='abc' limit 20;
 the literal condition values are not needed, because they vary between executions,
 so the statement will be trimmed to: select a,b,c from table1 where d=? and e=? limit ?;
 This tool now does that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-08 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055809#comment-14055809
 ] 

jay wong commented on PHOENIX-1056:
---

[~jamestaylor]
Yes, it creates both the table data and the index data (HFiles) in a single job.

The patch is an alpha version; I built it as a preview.

Eventually it will become part of CsvImportTsv.


 A ImportTsv tool for phoenix to build table data and all index data.
 

 Key: PHOENIX-1056
 URL: https://issues.apache.org/jira/browse/PHOENIX-1056
 Project: Phoenix
  Issue Type: Task
Affects Versions: 3.0.0
Reporter: jay wong
 Fix For: 3.1

 Attachments: PHOENIX-1056.patch


 I have built a tool that generates table data and index table data, like the 
 ImportTsv job.
 http://hbase.apache.org/book/ops_mgt.html#importtsv
 When ImportTsv runs, it writes HFiles under a path named for each column family.
 For example, a table with two CFs, A and B, produces the output:
 ./outputpath/A
 ./outputpath/B
 In my job, we have a table, TableOne, and two indexes, IdxOne and IdxTwo;
 the output will be:
 ./outputpath/TableOne/A
 ./outputpath/TableOne/B
 ./outputpath/IdxOne
 ./outputpath/IdxTwo
 If anyone needs it, I will build a clean version.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (PHOENIX-1073) A memory table in every region is needed?

2014-07-08 Thread jay wong (JIRA)
jay wong created PHOENIX-1073:
-

 Summary: A memory table in every region is needed?
 Key: PHOENIX-1073
 URL: https://issues.apache.org/jira/browse/PHOENIX-1073
 Project: Phoenix
  Issue Type: Wish
Reporter: jay wong


When I run a GROUP BY query:

Assume a region holds 30 MB of data,

100K rows with 30 KVs per row,

and the response time of GroupedAggregateRegionObserver is about 3 sec.

Most of that time, nearly 2.2 sec, is in fact spent in RegionScanner scanning all 
the rows.

I ran a test: the first pass scans all of the data into memory;
the second pass loads the data only from memory.

GroupedAggregateRegionObserver then takes only 0.7 sec.

So, is an in-memory table needed for Phoenix's compute-intensive scenarios?
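A toy Java sketch of the experiment described above: the first pass pays the scan cost and caches the rows; later aggregations read only from memory. All names and the row shape are hypothetical, and real memstore/blockcache behavior is far more involved:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: cache rows scanned from the region on the first
// pass, then run GROUP BY-style aggregation from memory on later passes.
public class MemoryTableSketch {
    private List<int[]> cache; // cached (group key, count) pairs per row

    // Stands in for RegionScanner: expensive on the first call only.
    List<int[]> scan() {
        if (cache == null) {
            cache = new ArrayList<>();
            for (int row = 0; row < 100_000; row++) {
                cache.add(new int[] { row % 10, 1 });
            }
        }
        return cache;
    }

    long aggregate() { // total count over all rows
        long total = 0;
        for (int[] kv : scan()) total += kv[1];
        return total;
    }

    public static void main(String[] args) {
        MemoryTableSketch t = new MemoryTableSketch();
        System.out.println(t.aggregate()); // first pass: fills the cache
        System.out.println(t.aggregate()); // second pass: memory only
    }
}
```

The reported 3 s vs 0.7 s difference is the gap between the first and second passes in this structure: the scan dominates, not the aggregation.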



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-07 Thread jay wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay wong updated PHOENIX-1056:
--

Attachment: PHOENIX-1056.patch

a preview tool patch for review

 A ImportTsv tool for phoenix to build table data and all index data.
 

 Key: PHOENIX-1056
 URL: https://issues.apache.org/jira/browse/PHOENIX-1056
 Project: Phoenix
  Issue Type: Task
Affects Versions: 3.0.0
Reporter: jay wong
 Fix For: 3.1

 Attachments: PHOENIX-1056.patch





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-07 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053505#comment-14053505
 ] 

jay wong commented on PHOENIX-1056:
---

The normal HBase ImportTsv is:

bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv 
-Dimporttsv.columns=HBASE_ROW_KEY,cf:a,cf:b,cf:c  
-Dimporttsv.bulk.output=hdfs://storefile-outputdir tablename hdfs-inputdir

The Phoenix ImportTsv is as follows, and it supports Phoenix data types:

bin/hbase org.apache.hadoop.hbase.mapreduce.PhoneixImportTsv 
-Dimporttsv.columns=HBASE_ROW_KEY,CF:A:PH_INT,CF:B:PH:BIGINT,cf:c 
-Dimporttsv.index.all=true  -Dimporttsv.bulk.output=hdfs://storefile-outputdir 
tablename hdfs-inputdir

If the primary key is multi-column, the following rule is supported:
replace HBASE_ROW_KEY with HBASE_ROW_KEY^CF1:Q1:PH_INT^CF2:Q2^0^CF1:Q3:PH_INT

Parameters:
-Dimporttsv.index.all=true: whether to build all index table data; default is false.
-Dimporttsv.build.table=true: whether to build the data table; default is true.
-Dimporttsv.index.names=INDEX1,INDEX2: which index tables to build; requires 
-Dimporttsv.index.all=false.
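A minimal sketch of parsing the multi-column row-key rule above, assuming the spec is simply split on '^' into components, each being either a column spec (CF:Qualifier[:PhoenixType]) or a field index like "0". The real patch may parse this differently; `RowKeySpecSketch` is a hypothetical name:

```java
import java.util.Arrays;
import java.util.List;

public class RowKeySpecSketch {
    // Split a spec like HBASE_ROW_KEY^CF1:Q1:PH_INT^CF2:Q2^0^CF1:Q3:PH_INT
    // into its components ('^' is a regex metacharacter, so escape it).
    static List<String> parse(String spec) {
        return Arrays.asList(spec.split("\\^"));
    }

    public static void main(String[] args) {
        List<String> parts =
            parse("HBASE_ROW_KEY^CF1:Q1:PH_INT^CF2:Q2^0^CF1:Q3:PH_INT");
        System.out.println(parts.size()); // 5 components
        System.out.println(parts.get(1)); // CF1:Q1:PH_INT
    }
}
```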






 A ImportTsv tool for phoenix to build table data and all index data.
 

 Key: PHOENIX-1056
 URL: https://issues.apache.org/jira/browse/PHOENIX-1056
 Project: Phoenix
  Issue Type: Task
Affects Versions: 3.0.0
Reporter: jay wong
 Fix For: 3.1

 Attachments: PHOENIX-1056.patch







[jira] [Commented] (PHOENIX-1056) A ImportTsv tool for phoenix to build table data and all index data.

2014-07-07 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053507#comment-14053507
 ] 

jay wong commented on PHOENIX-1056:
---

Anyway, in my ImportTsv:

It supports multi-byte and Unicode separators, whereas the Apache code only supports 
single-byte separators, e.g.:

  -Dimporttsv.separator=\001 
  -Dimporttsv.separator=\u0019

 A ImportTsv tool for phoenix to build table data and all index data.
 

 Key: PHOENIX-1056
 URL: https://issues.apache.org/jira/browse/PHOENIX-1056
 Project: Phoenix
  Issue Type: Task
Affects Versions: 3.0.0
Reporter: jay wong
 Fix For: 3.1

 Attachments: PHOENIX-1056.patch







[jira] [Updated] (PHOENIX-1062) A SQL Trimmer for log sql execute times

2014-07-07 Thread jay wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay wong updated PHOENIX-1062:
--

Attachment: SQLTrimmer.java

 A SQL Trimmer for log sql execute times
 ---

 Key: PHOENIX-1062
 URL: https://issues.apache.org/jira/browse/PHOENIX-1062
 Project: Phoenix
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: jay wong
Priority: Critical
 Fix For: 3.1

 Attachments: SQLTrimmer.java


 If we need statistics on how many times each SQL statement is executed,
 for example: select a,b,c from table1 where d=13 and e='abc' limit 20;
 the condition values are not needed, since they would keep identical statements 
 from being counted together.
 So the statement will be trimmed to: select a,b,c from table1 where d=? and e=? limit ?;
 This tool now does that.





[jira] [Created] (PHOENIX-1062) A SQL Trimmer for log sql execute times

2014-07-07 Thread jay wong (JIRA)
jay wong created PHOENIX-1062:
-

 Summary: A SQL Trimmer for log sql execute times
 Key: PHOENIX-1062
 URL: https://issues.apache.org/jira/browse/PHOENIX-1062
 Project: Phoenix
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: jay wong
Priority: Critical
 Fix For: 3.1
 Attachments: SQLTrimmer.java

If we need statistics on how many times each SQL statement is executed,

for example: select a,b,c from table1 where d=13 and e='abc' limit 20;

the condition values are not needed, since they would keep identical statements from being counted together.

So the statement will be trimmed to: select a,b,c from table1 where d=? and e=? limit ?;

This tool now does that.
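The attached SQLTrimmer.java is the real implementation; purely as an illustration of the idea (the class, pattern, and method names below are my own, not taken from the patch), a minimal literal normalizer might look like:

```java
import java.util.regex.Pattern;

public class SqlTrimSketch {
    // Matches single-quoted string literals and standalone integer literals.
    private static final Pattern LITERAL = Pattern.compile("'[^']*'|\\b\\d+\\b");

    // Replace every literal with '?' so that statements differing only in
    // their condition values collapse to the same key for counting.
    static String trim(String sql) {
        return LITERAL.matcher(sql).replaceAll("?");
    }

    public static void main(String[] args) {
        System.out.println(trim("select a,b,c from table1 where d=13 and e='abc' limit 20"));
        // prints: select a,b,c from table1 where d=? and e=? limit ?
    }
}
```

Note that `\b\d+\b` deliberately leaves digits embedded in identifiers such as table1 untouched, since there is no word boundary inside them.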





[jira] [Created] (PHOENIX-1053) UngroupedAggregateRegionObserver make lock unlock

2014-07-01 Thread jay wong (JIRA)
jay wong created PHOENIX-1053:
-

 Summary: UngroupedAggregateRegionObserver make lock unlock
 Key: PHOENIX-1053
 URL: https://issues.apache.org/jira/browse/PHOENIX-1053
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: jay wong
 Fix For: 3.1
 Attachments: PHOENIX-1053.patch

{code}
region.startRegionOperation();
try {
 //do something
} finally {
innerScanner.close();
region.closeRegionOperation();
}
{code}

But if innerScanner.close() throws an exception, closeRegionOperation() is never reached, so the region lock is never released.
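A common fix for this pattern (sketched here with stand-in methods rather than the real HBase HRegion/RegionScanner API) is to nest the two cleanup calls in their own try/finally, so the region lock is released even when the scanner's close() throws:

```java
public class NestedFinallyDemo {
    static boolean lockHeld = false;

    static void startRegionOperation() { lockHeld = true; }
    static void closeRegionOperation() { lockHeld = false; }
    // Stand-in for innerScanner.close() failing:
    static void closeScanner() { throw new RuntimeException("close failed"); }

    public static void main(String[] args) {
        try {
            startRegionOperation();
            try {
                // do something
            } finally {
                try {
                    closeScanner();           // may throw
                } finally {
                    closeRegionOperation();   // still runs: lock released
                }
            }
        } catch (RuntimeException e) {
            System.out.println("scanner close failed; lock held = " + lockHeld);
        }
    }
}
```

With the original flat finally block, the exception from closeScanner() would skip closeRegionOperation() entirely; the nested finally guarantees it runs.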






[jira] [Commented] (PHOENIX-136) Support derived tables

2014-06-25 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043287#comment-14043287
 ] 

jay wong commented on PHOENIX-136:
--

[~maryannxue]
Could you attach a patch for the 3.0.0 release version? Thanks.

 Support derived tables
 --

 Key: PHOENIX-136
 URL: https://issues.apache.org/jira/browse/PHOENIX-136
 Project: Phoenix
  Issue Type: Task
Reporter: James Taylor
Assignee: Maryann Xue
  Labels: enhancement
 Fix For: 5.0.0, 3.1, 4.1


 Add support for derived queries of the form:
 SELECT * FROM ( SELECT company, revenue FROM Company ORDER BY revenue) LIMIT 
 10
 Adding support for this requires a compile time change as well as a runtime 
 execution change. The first version of the compile-time change could limit 
 aggregation to only be allowed in the inner or the outer query, but not both. 
 In this case, the inner and outer queries can be combined into a single query 
 with the outer select becoming just a remapping of a subset of the projection 
 from the inner select. The second version of the compile-time change could 
 handle aggregation in the inner and outer select by performing the outer 
 aggregation client side (this is likely a less common scenario). 
 For the runtime execution change, the UngroupedAggregateRegionObserver would 
 be modified to look for a new TopNLimit attribute with an int value in the 
 Scan. This would control the maximum number of values for the coprocessor to 
 hold on to as the scan is performed. Then the 
 GroupedAggregatingResultIterator would be modified to handle keeping the topN 
 values received back from all the child iterators.





[jira] [Commented] (PHOENIX-1040) support qurey using metadata cache

2014-06-10 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027362#comment-14027362
 ] 

jay wong commented on PHOENIX-1040:
---

[~jamestaylor]

I think the improvement is remarkable for simple queries, such as primary-key 
condition queries.

Before this, the Phoenix metadata cache was not used as a rule.

But when to update the metadata cache should be an option that the user can set.

 support  qurey using metadata cache
 ---

 Key: PHOENIX-1040
 URL: https://issues.apache.org/jira/browse/PHOENIX-1040
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: daniel meng
  Labels: PointLookup, metadata
 Attachments: PHOENIX-1040.PATCH


 I have a use case like: select * from t where ROW = ? (ROW is the primary key), 
 so it is just an HBase get.
 It turns out a single query does two RPCs for metadata, both calling into:
 {code}
 public static ColumnResolver getResolverForQuery(SelectStatement statement,
         PhoenixConnection connection) throws SQLException {
     List<TableNode> fromNodes = statement.getFrom();
     if (fromNodes.size() == 1)
         return new SingleTableColumnResolver(connection,
                 (NamedTableNode) fromNodes.get(0), true);
 {code}
 As you can see in SingleTableColumnResolver(connection, 
 (NamedTableNode) fromNodes.get(0), true), the third parameter is hard-coded 
 to true, which means the cache is never used. For performance, we let the 
 query use the metadata cache; our tests show a 30% improvement in response 
 time in our use case.
 In our patch, we:
 1. add a timer-based updater to the global cache (the one in 
 ConnectionQueryServices)
 2. make the PhoenixConnection hold a reference to the global cache instead of 
 its own cache
 I'm looking forward to your advice, thanks.





[jira] [Commented] (PHOENIX-1027) Why position be reset in SkipScanFilter

2014-06-07 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020753#comment-14020753
 ] 

jay wong commented on PHOENIX-1027:
---

[~jamestaylor]

OK, thanks.

I tried to reproduce the case in another table or in a unit test, but I could 
not find a way to reproduce it. In a production table, however, the error 
definitely exists. So I created this issue to ask for help, in case there is 
anything I can do to reproduce the error.

If the code

{code}
 (position[i] = (position[i] + 1) % slots.get(i).size()) == 0)
is changed to
 ((position[i] + 1) % slots.get(i).size()) == 0)
{code}

then everything works. I want to know whether that change is correct or not.
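For clarity, the difference between the two conditions is that the original one assigns the wrapped value back into position[i] as a side effect of the check. A standalone sketch with plain int arrays (not the Phoenix classes) shows this:

```java
public class SideEffectDemo {
    public static void main(String[] args) {
        int size = 4;

        // Original form: the check *mutates* position[0], wrapping it to 0.
        int[] position = {3};
        boolean wrapped = (position[0] = (position[0] + 1) % size) == 0;
        System.out.println(wrapped + " " + position[0]);   // true 0

        // Proposed form: same boolean result, but position[0] is untouched.
        int[] position2 = {3};
        boolean wrapped2 = ((position2[0] + 1) % size) == 0;
        System.out.println(wrapped2 + " " + position2[0]); // true 3
    }
}
```

So the proposed change keeps the wrap-around test but drops the reset of position[i]; whether dropping that reset is safe for SkipScanFilter's state machine is exactly the question being asked here.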

 Why position be reset in SkipScanFilter
 ---

 Key: PHOENIX-1027
 URL: https://issues.apache.org/jira/browse/PHOENIX-1027
 Project: Phoenix
  Issue Type: Bug
Reporter: jay wong

 the method in SkipScanFilter.java
 {code}
  private int nextPosition(int i) {
      while (i >= 0 && slots.get(i).get(position[i]).isSingleKey()
              && (position[i] = (position[i] + 1) % slots.get(i).size()) == 0) {
          i--;
      }
      return i;
  }
 {code}
 In a query case, position[] has two elements: position[0] = 3, position[1] = 
 0, and slots.get(i).size() >= 1. 
 and slots.get( i ).size() = 1. 
 After the method executes, both elements of position[] are 0.
 then
 {code}
  private boolean intersect(byte[] lowerInclusiveKey, byte[] upperExclusiveKey,
          List<List<KeyRange>> newSlots) {
      // ...
      for (int i = 0; i <= lastSlot; i++) {
          List<KeyRange> newRanges = slots.get(i).subList(lowerPosition[i],
                  Math.min(position[i] + 1, slots.get(i).size()));
          // ...
      }
      return true;
  }
 {code}
 lowerPosition[i] = 3, but Math.min(position[i] + 1, 
 slots.get(i).size()) = 1 as a result of position[i] having been reset to 0,
 so:
 {code}
 java.lang.IndexOutOfBoundsException: end index (1) must not be less than start index (3)
 at com.google.common.base.Preconditions.checkPositionIndexes(Preconditions.java:384)
 at com.google.common.collect.RegularImmutableList.subList(RegularImmutableList.java:118)
 at com.google.common.collect.RegularImmutableList.subList(RegularImmutableList.java:31)
 at org.apache.phoenix.filter.SkipScanFilter.intersect(SkipScanFilter.java:221)
 at org.apache.phoenix.filter.SkipScanFilter.hasIntersect(SkipScanFilter.java:121)
 at org.apache.phoenix.compile.ScanRanges.intersect(ScanRanges.java:251)
 at org.apache.phoenix.iterate.SkipRangeParallelIteratorRegionSplitter$1.apply(SkipRangeParallelIteratorRegionSplitter.java:84)
 at org.apache.phoenix.iterate.SkipRangeParallelIteratorRegionSplitter$1.apply(SkipRangeParallelIteratorRegionSplitter.java:65)
 at com.google.common.collect.Iterators$7.computeNext(Iterators.java:649)
 at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
 at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
 at com.google.common.collect.Lists.newArrayList(Lists.java:138)
 at com.google.common.collect.Lists.newArrayList(Lists.java:119)
 at org.apache.phoenix.iterate.SkipRangeParallelIteratorRegionSplitter.filterRegions(SkipRangeParallelIteratorRegionSplitter.java:93)
 at org.apache.phoenix.iterate.SkipRangeParallelIteratorRegionSplitter.getAllRegions(SkipRangeParallelIteratorRegionSplitter.java:54)
 at org.apache.phoenix.iterate.DefaultParallelIteratorRegionSplitter.getSplits(DefaultParallelIteratorRegionSplitter.java:244)
 at org.apache.phoenix.iterate.ParallelIterators.getSplits(ParallelIterators.java:205)
 at org.apache.phoenix.iterate.ParallelIterators.init(ParallelIterators.java:78)
 at org.apache.phoenix.execute.AggregatePlan.newIterator(AggregatePlan.java:162)
 at org.apache.phoenix.execute.BasicQueryPlan.iterator(BasicQueryPlan.java:144)
 at org.apache.phoenix.execute.BasicQueryPlan.iterator(BasicQueryPlan.java:125)
 at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:202)
 at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:922)
 at sqlline.SqlLine$Commands.execute(SqlLine.java:3673)
 at sqlline.SqlLine$Commands.sql(SqlLine.java:3584)
 at sqlline.SqlLine.dispatch(SqlLine.java:821)
 at sqlline.SqlLine.begin(SqlLine.java:699)
 at sqlline.SqlLine.mainWithInputRedirection(SqlLine.java:441)
 at sqlline.SqlLine.main(SqlLine.java:424)
 {code}





[jira] [Updated] (PHOENIX-1025) PhoenixConnection constructor Properties set isn't deep copy

2014-06-06 Thread jay wong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jay wong updated PHOENIX-1025:
--

Attachment: PHOENIX-1025.patch

 PhoenixConnection constructor Properties set isn't deep copy
 

 Key: PHOENIX-1025
 URL: https://issues.apache.org/jira/browse/PHOENIX-1025
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: jay wong
 Fix For: 3.0.0

 Attachments: PHOENIX-1025.patch


 {code}
  public PhoenixConnection(ConnectionQueryServices services, String url,
          Properties info, PMetaData metaData) throws SQLException {
      this.url = url;
      // Copy so client cannot change
      this.info = info == null ? new Properties() : new Properties(info);
      final PName tenantId = JDBCUtil.getTenantId(url, info);
      if (this.info.isEmpty() && tenantId == null) {
          this.services = services;
      } else {
          // ...
      }
  }
 {code}
 Since this.info is set with new Properties(info), which stores info only as 
 defaults, this.info.isEmpty() in the check above always returns true.
 I think this.info should be set with PropertiesUtil.deepCopy(info) instead. 
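This hinges on a java.util.Properties detail: the one-argument constructor stores the argument only as defaults, while isEmpty() (inherited from Hashtable) inspects only the table itself and ignores the defaults. A standalone demonstration (the "TenantId" key is just an illustrative name):

```java
import java.util.Properties;

public class PropertiesDefaultsDemo {
    public static void main(String[] args) {
        Properties info = new Properties();
        info.setProperty("TenantId", "abc");

        // new Properties(info) keeps info only as *defaults*; the new
        // table itself is empty, so isEmpty() reports true even though
        // getProperty() still sees the entry through the defaults.
        Properties shallow = new Properties(info);
        System.out.println(shallow.isEmpty());               // true
        System.out.println(shallow.getProperty("TenantId")); // abc

        // A deep copy puts the entries into the table itself:
        Properties deep = new Properties();
        deep.putAll(info);
        System.out.println(deep.isEmpty());                  // false
    }
}
```

This is why the isEmpty() check in the constructor can never see the caller's properties, and why a copy that populates the table itself (as PropertiesUtil.deepCopy is suggested to do) behaves differently.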


