[jira] [Commented] (PHOENIX-1056) An ImportTsv tool for phoenix to build table data and all index data.

2014-07-10 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057258#comment-14057258
 ] 

James Taylor commented on PHOENIX-1056:
---

Good point, [~jaywong].

 An ImportTsv tool for phoenix to build table data and all index data.
 

 Key: PHOENIX-1056
 URL: https://issues.apache.org/jira/browse/PHOENIX-1056
 Project: Phoenix
  Issue Type: Task
Affects Versions: 3.0.0
Reporter: jay wong
 Fix For: 3.1

 Attachments: PHOENIX-1056.patch


 I have just built a tool that builds table data and index table data, just like 
 the ImportTsv job:
 http://hbase.apache.org/book/ops_mgt.html#importtsv
 When ImportTsv runs, it writes HFiles into one path per column family.
 For example, if a table has two column families, A and B, the output is:
 ./outputpath/A
 ./outputpath/B
 In my job, we have a table, TableOne, and two indexes, IdxOne and IdxTwo.
 The output will be:
 ./outputpath/TableOne/A
 ./outputpath/TableOne/B
 ./outputpath/IdxOne
 ./outputpath/IdxTwo
 If anyone needs it, I will build a clean tool.





[jira] [Updated] (PHOENIX-1016) Support MINVALUE, MAXVALUE, and CYCLE options in CREATE SEQUENCE

2014-07-10 Thread Thomas D'Silva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva updated PHOENIX-1016:


Attachment: PHOENIX-1016.v2.patch
PHOENIX-1016.v2.3.0.patch

 Support MINVALUE, MAXVALUE, and CYCLE options in CREATE SEQUENCE
 

 Key: PHOENIX-1016
 URL: https://issues.apache.org/jira/browse/PHOENIX-1016
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor
Assignee: Thomas D'Silva
 Attachments: PHOENIX-1016.3.0.patch, PHOENIX-1016.patch, 
 PHOENIX-1016.v2.3.0.patch, PHOENIX-1016.v2.patch


 We currently don't support MINVALUE, MAXVALUE, and CYCLE options in CREATE 
 SEQUENCE, but we should. See 
 http://msdn.microsoft.com/en-us/library/ff878091.aspx for the syntax.
 I believe MINVALUE applies if the INCREMENT is negative while MAXVALUE 
 applies otherwise. If the value of a sequence goes beyond MINVALUE/MAXVALUE, 
 then:
 - if CYCLE is true, then the sequence value should start again at the START 
 WITH value (or the MINVALUE if specified too? Not sure about this).
 - if CYCLE is false, then an exception should be thrown.
 To implement this:
 - make the grammar changes in PhoenixSQL.g
 - add member variables for MINVALUE, MAXVALUE, and CYCLE to 
 CreateSequenceStatement
 - add the appropriate error checking and handle bind variables for these new 
 options in CreateSequenceCompiler
 - modify the MetaDataClient.createSequence() call by passing along these new 
 parameters.
 - same for ConnectionQueryServices.createSequence() call
 - same for Sequence.createSequence().
 - pass along these parameters as new KeyValues in the Append that constitutes 
 the RPC call
 - act on these in the SequenceRegionObserver coprocessor as indicated above.
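 
 For illustration, here is a minimal JDBC sketch of how these options might look 
 once implemented. This is a sketch only: it assumes the SQL-standard syntax from 
 the link above and a Phoenix instance at localhost.
 {code}
 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.sql.ResultSet;
 import java.sql.Statement;
 
 public class SequenceOptionsExample {
     public static void main(String[] args) throws Exception {
         Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
         Statement stmt = conn.createStatement();
         // A sequence that counts 1..100 and then wraps around (CYCLE),
         // using the MINVALUE/MAXVALUE/CYCLE options this issue adds.
         stmt.execute("CREATE SEQUENCE my_schema.my_seq"
                 + " START WITH 1 INCREMENT BY 1"
                 + " MINVALUE 1 MAXVALUE 100 CYCLE");
         ResultSet rs = stmt.executeQuery("SELECT NEXT VALUE FOR my_schema.my_seq");
         rs.next();
         System.out.println("next value = " + rs.getLong(1));
         conn.close();
     }
 }
 {code}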





[jira] [Resolved] (PHOENIX-1079) ConnectionQueryServicesImpl : Close HTable after use

2014-07-10 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved PHOENIX-1079.
-

   Resolution: Fixed
Fix Version/s: (was: 4.0.0)
   (was: 3.0.0)
   4.1
   3.1

Committed to all branches. Thanks for the patch, Samarth Jain.

 ConnectionQueryServicesImpl : Close HTable after use
 

 Key: PHOENIX-1079
 URL: https://issues.apache.org/jira/browse/PHOENIX-1079
 Project: Phoenix
  Issue Type: Bug
Reporter: Samarth Jain
Assignee: Samarth Jain
 Fix For: 5.0.0, 3.1, 4.1

 Attachments: master.patch








[jira] [Commented] (PHOENIX-1074) ParallelIteratorRegionSplitterFactory get Splits is not rational

2014-07-10 Thread jay wong (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057369#comment-14057369
 ] 

jay wong commented on PHOENIX-1074:
---

[~jamestaylor]
Please check my problem again.
This is my primary key and SALT_BUCKETS:
{code}
CONSTRAINT pk PRIMARY KEY (gmt, spm_type, spm)) SALT_BUCKETS = 4
{code}

{code}
select * from table1 where gmt > '20140202' and gmt < '20140204'

the split size is 12 (which is logical)
{code}

{code}
select * from table1 where gmt > '20140202' and gmt < '20140204' and spm_type = 
'2'

the split size is 28 (I think the split size should also be 12)
{code}

This is only a small example. My production table has 1900 regions. With a 
logical split policy the query would get only about 20 splits, but it gets 
1900 splits.



 ParallelIteratorRegionSplitterFactory get Splits is not rational
 

 Key: PHOENIX-1074
 URL: https://issues.apache.org/jira/browse/PHOENIX-1074
 Project: Phoenix
  Issue Type: Bug
Reporter: jay wong

 Create a table:
 {code}
 create table if not exists table1(
   gmt VARCHAR NOT NULL, 
   spm_type VARCHAR NOT NULL, 
   spm VARCHAR NOT NULL, 
   A.int_a INTEGER, 
   B.int_b INTEGER, 
   B.int_c INTEGER, 
   CONSTRAINT pk PRIMARY KEY (gmt, spm_type, spm)) SALT_BUCKETS = 4, 
 bloomfilter='ROW';
 {code}
 and split the table into 29 regions, as follows:
 |startrow|endrow|
 | |\x0020140201|
 |\x0020140201|\x0020140202|
 |\x0020140202|\x0020140203|
 |\x0020140203|\x0020140204|
 |\x0020140204|\x0020140205|   
 |\x0020140205|\x0020140206|   
 |\x0020140206|\x0020140207|
 |\x0020140207|\x0120140201|
 |\x0120140201|\x0120140202|
 |\x0120140202|\x0120140203|
 |\x0120140203|\x0120140204|
 |\x0120140204|\x0120140205|
 |\x0120140205|\x0120140206|
 |\x0120140206|\x0120140207|
 |\x0120140207|\x0220140201|
 |\x0220140201|\x0220140202|
 |\x0220140202|\x0220140203|
 |\x0220140203|\x0220140204|
 |\x0220140204|\x0220140205|
 |\x0220140205|\x0220140206|
 |\x0220140206|\x0220140207|
 |\x0220140207|\x0320140201|
 |\x0320140201|\x0320140202|
 |\x0320140202|\x0320140203|
 |\x0320140203|\x0320140204|
 |\x0320140204|\x0320140205|
 |\x0320140205|\x0320140206|
 |\x0320140206|\x0320140207|
 |\x0320140207| |  
 Then insert some data:
 |GMT|SPM_TYPE|SPM|INT_A|INT_B|INT_C|
 |20140201|1|1.2.3.4546|218|218|null|
 |20140201|1|1.2.44545|190|190|null|
 |20140201|1|1.353451312|246|246|null|
 |20140201|2|1.2.3.6775|183|183|null|
 |...|...|...|...|...|...|
 |20140207|3|1.2.3.4546|224|224|null|
 |20140207|3|1.2.44545|196|196|null|
 |20140207|3|1.353451312|168|168|null|
 |20140207|4|1.2.3.6775|189|189|null|
 |20140207|4|1.23.345345|217|217|null|
 |20140207|4|1.23234234234|245|245|null|
 I added a log statement in ParallelIterators to print the split count:
 {code}
 public class ParallelIterators extends ExplainTable implements ResultIterators {
 
     @Override
     public List<PeekingResultIterator> getIterators() throws SQLException {
         boolean success = false;
         final ConnectionQueryServices services = context.getConnection().getQueryServices();
         ReadOnlyProps props = services.getProps();
         int numSplits = splits.size();
         List<PeekingResultIterator> iterators = new ArrayList<PeekingResultIterator>(numSplits);
         List<Pair<byte[],Future<PeekingResultIterator>>> futures = new ArrayList<Pair<byte[],Future<PeekingResultIterator>>>(numSplits);
         final UUID scanId = UUID.randomUUID();
         try {
             ExecutorService executor = services.getExecutor();
             System.out.println("the split size is " + numSplits);
             // ...
         }
     }
 }
 {code}
 then execute some SQL:
 {code}
 select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type 
 = '2' and spm like '1.%'
 the split size is 31
 select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type 
 = '2'
 the split size is 31
 select * from table1 where gmt > '20140202' and gmt < '20140207'
 the split size is 27
 select * from table1 where gmt > '20140202' and gmt < '20140204' and spm_type 
 = '2' and spm like '1.%'
 the split size is 28
 select * from table1 where gmt > '20140202' and gmt < '20140204' and spm_type 
 = '2'
 the split size is 28
 select * from table1 where gmt > '20140202' and gmt < '20140204'
 the split size is 12
 {code}
 but I think 
 {code}
 select * from table1 where gmt > '20140202' and gmt 

[jira] [Commented] (PHOENIX-1079) ConnectionQueryServicesImpl : Close HTable after use

2014-07-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057394#comment-14057394
 ] 

Hudson commented on PHOENIX-1079:
-

SUCCESS: Integrated in Phoenix | Master | Hadoop1 #266 (See 
[https://builds.apache.org/job/Phoenix-master-hadoop1/266/])
PHOENIX-1079 ConnectionQueryServicesImpl : Close HTable after use.(Samarth) 
(anoopsamjohn: rev 1e61578555e2c54ed801ffa166fcab2cc4499971)
* 
phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java


 ConnectionQueryServicesImpl : Close HTable after use
 

 Key: PHOENIX-1079
 URL: https://issues.apache.org/jira/browse/PHOENIX-1079
 Project: Phoenix
  Issue Type: Bug
Reporter: Samarth Jain
Assignee: Samarth Jain
 Fix For: 5.0.0, 3.1, 4.1

 Attachments: master.patch








[jira] [Commented] (PHOENIX-1080) Fix PhoenixRuntime.decodepk for salted tables. Add integration tests.

2014-07-10 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057418#comment-14057418
 ] 

James Taylor commented on PHOENIX-1080:
---

[~anoop.hbase] - would you mind committing this one too?

 Fix PhoenixRuntime.decodepk for salted tables. Add integration tests.
 -

 Key: PHOENIX-1080
 URL: https://issues.apache.org/jira/browse/PHOENIX-1080
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.0.0, 4.0.0, 5.0.0
Reporter: Samarth Jain
Assignee: Samarth Jain
 Attachments: encodeDecode_3.patch, encodeDecode_master_4.patch








[GitHub] phoenix pull request: PHOENIX-933 Local index support to Phoenix

2014-07-10 Thread JamesRTaylor
Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/1#discussion_r14762643
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/compile/TrackOrderPreservingExpressionCompiler.java
 ---
@@ -69,6 +70,7 @@
         boolean isSharedViewIndex = table.getViewIndexId() != null;
         // TODO: util for this offset, as it's computed in numerous places
         positionOffset = (isSalted ? 1 : 0) + (isMultiTenant ? 1 : 0) + (isSharedViewIndex ? 1 : 0);
+        this.isOrderPreserving = table.getIndexType() != IndexType.LOCAL;
--- End diff --

One thing that's necessary, though, to maintain rows in row key order is to 
modify ScanPlan.java:118 to do a merge sort instead of a concat:

    if ((isSalted || isLocalIndex)
            && (context.getConnection().getQueryServices().getProps().getBoolean(
                    QueryServices.ROW_KEY_ORDER_SALTED_TABLE_ATTRIB,
                    QueryServicesOptions.DEFAULT_ROW_KEY_ORDER_SALTED_TABLE)
                || orderBy == OrderBy.FWD_ROW_KEY_ORDER_BY
                || orderBy == OrderBy.REV_ROW_KEY_ORDER_BY)) { // ORDER BY was optimized out b/c query is in row key order
        scanner = new MergeSortRowKeyResultIterator(iterators,
                SaltingUtil.NUM_SALTING_BYTES, orderBy == OrderBy.REV_ROW_KEY_ORDER_BY);
    } else {
        scanner = new ConcatResultIterator(iterators);
    }

Local indexes are similar to salted tables in that the parallel scans will 
all be within a region, ordered correctly. As long as we do a merge sort across 
the results of these scans, the rows will be ordered correctly.
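
To make the merge-sort idea concrete, here is a minimal, self-contained sketch 
(plain Java with hypothetical names, not Phoenix's actual iterator classes) of 
merging already-sorted per-region scan results:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.Iterator;
    import java.util.List;
    import java.util.PriorityQueue;

    public class MergeSortSketch {
        // One entry per stream: the stream's current head plus the stream itself.
        private static class Entry {
            final String value;
            final Iterator<String> it;
            Entry(String value, Iterator<String> it) { this.value = value; this.it = it; }
        }

        // Merge several individually sorted streams into one sorted list,
        // analogous to merge-sorting the per-region scan results.
        static List<String> mergeSorted(List<Iterator<String>> streams) {
            PriorityQueue<Entry> heap = new PriorityQueue<Entry>(
                    Math.max(1, streams.size()), new Comparator<Entry>() {
                        public int compare(Entry a, Entry b) {
                            return a.value.compareTo(b.value);
                        }
                    });
            for (Iterator<String> it : streams) {
                if (it.hasNext()) heap.add(new Entry(it.next(), it));
            }
            List<String> merged = new ArrayList<String>();
            while (!heap.isEmpty()) {
                Entry e = heap.poll(); // smallest head across all streams
                merged.add(e.value);
                if (e.it.hasNext()) heap.add(new Entry(e.it.next(), e.it));
            }
            return merged;
        }

        public static void main(String[] args) {
            // Two "region scans", each already sorted by row key.
            List<Iterator<String>> scans = new ArrayList<Iterator<String>>();
            scans.add(Arrays.asList("a", "d", "f").iterator());
            scans.add(Arrays.asList("b", "c", "e").iterator());
            System.out.println(mergeSorted(scans)); // prints [a, b, c, d, e, f]
        }
    }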




[jira] [Commented] (PHOENIX-933) Local index support to Phoenix

2014-07-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057420#comment-14057420
 ] 

ASF GitHub Bot commented on PHOENIX-933:


Github user JamesRTaylor commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/1#discussion_r14762643
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/compile/TrackOrderPreservingExpressionCompiler.java
 ---
@@ -69,6 +70,7 @@
         boolean isSharedViewIndex = table.getViewIndexId() != null;
         // TODO: util for this offset, as it's computed in numerous places
         positionOffset = (isSalted ? 1 : 0) + (isMultiTenant ? 1 : 0) + (isSharedViewIndex ? 1 : 0);
+        this.isOrderPreserving = table.getIndexType() != IndexType.LOCAL;
--- End diff --

One thing that's necessary, though, to maintain rows in row key order is to 
modify ScanPlan.java:118 to do a merge sort instead of a concat:

    if ((isSalted || isLocalIndex)
            && (context.getConnection().getQueryServices().getProps().getBoolean(
                    QueryServices.ROW_KEY_ORDER_SALTED_TABLE_ATTRIB,
                    QueryServicesOptions.DEFAULT_ROW_KEY_ORDER_SALTED_TABLE)
                || orderBy == OrderBy.FWD_ROW_KEY_ORDER_BY
                || orderBy == OrderBy.REV_ROW_KEY_ORDER_BY)) { // ORDER BY was optimized out b/c query is in row key order
        scanner = new MergeSortRowKeyResultIterator(iterators,
                SaltingUtil.NUM_SALTING_BYTES, orderBy == OrderBy.REV_ROW_KEY_ORDER_BY);
    } else {
        scanner = new ConcatResultIterator(iterators);
    }

Local indexes are similar to salted tables in that the parallel scans will 
all be within a region, ordered correctly. As long as we do a merge sort across 
the results of these scans, the rows will be ordered correctly.


 Local index support to Phoenix
 --

 Key: PHOENIX-933
 URL: https://issues.apache.org/jira/browse/PHOENIX-933
 Project: Phoenix
  Issue Type: New Feature
Reporter: rajeshbabu

 Hindex (https://github.com/Huawei-Hadoop/hindex) provides local indexing 
 support to HBase. It stores region-level indexes in a separate table, and 
 co-locates the user and index table regions with a custom load balancer.
 See http://goo.gl/phkhwC and http://goo.gl/EswlxC for more information. 
 This JIRA addresses integrating that local indexing solution into Phoenix.





[jira] [Commented] (PHOENIX-1074) ParallelIteratorRegionSplitterFactory get Splits is not rational

2014-07-10 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057436#comment-14057436
 ] 

James Taylor commented on PHOENIX-1074:
---

The second query is using a skip scan because there's range information for all 
columns in your PK:
{code}
select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type = 
'2' and spm like '1.%'
{code}
So it'll run the skip scan over all regions since the table is salted. How is 
performance for this query? 

You can force it to do a range scan with a hint like this:
{code}
select /*+ RANGE_SCAN */ * from table1 where gmt > '20140202' and gmt < 
'20140207' and spm_type = '2' and spm like '1.%'
{code}

Please let us know how performance compares between the two.
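
For reference, here is a minimal JDBC sketch that compares the two plans via 
EXPLAIN (a sketch only; it assumes a Phoenix instance at localhost and the 
table1 schema above):
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ComparePlans {
    // Print each step of the query plan (EXPLAIN returns one step per row).
    static void explain(Statement stmt, String sql) throws Exception {
        ResultSet rs = stmt.executeQuery("EXPLAIN " + sql);
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
    }

    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
        Statement stmt = conn.createStatement();
        String where = " from table1 where gmt > '20140202' and gmt < '20140207'"
                + " and spm_type = '2' and spm like '1.%'";
        explain(stmt, "select *" + where);                   // skip scan plan
        explain(stmt, "select /*+ RANGE_SCAN */ *" + where); // forced range scan plan
        conn.close();
    }
}
{code}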


 ParallelIteratorRegionSplitterFactory get Splits is not rational
 

 Key: PHOENIX-1074
 URL: https://issues.apache.org/jira/browse/PHOENIX-1074
 Project: Phoenix
  Issue Type: Bug
Reporter: jay wong

 Create a table:
 {code}
 create table if not exists table1(
   gmt VARCHAR NOT NULL, 
   spm_type VARCHAR NOT NULL, 
   spm VARCHAR NOT NULL, 
   A.int_a INTEGER, 
   B.int_b INTEGER, 
   B.int_c INTEGER, 
   CONSTRAINT pk PRIMARY KEY (gmt, spm_type, spm)) SALT_BUCKETS = 4, 
 bloomfilter='ROW';
 {code}
 and split the table into 29 regions, as follows:
 |startrow|endrow|
 | |\x0020140201|
 |\x0020140201|\x0020140202|
 |\x0020140202|\x0020140203|
 |\x0020140203|\x0020140204|
 |\x0020140204|\x0020140205|   
 |\x0020140205|\x0020140206|   
 |\x0020140206|\x0020140207|
 |\x0020140207|\x0120140201|
 |\x0120140201|\x0120140202|
 |\x0120140202|\x0120140203|
 |\x0120140203|\x0120140204|
 |\x0120140204|\x0120140205|
 |\x0120140205|\x0120140206|
 |\x0120140206|\x0120140207|
 |\x0120140207|\x0220140201|
 |\x0220140201|\x0220140202|
 |\x0220140202|\x0220140203|
 |\x0220140203|\x0220140204|
 |\x0220140204|\x0220140205|
 |\x0220140205|\x0220140206|
 |\x0220140206|\x0220140207|
 |\x0220140207|\x0320140201|
 |\x0320140201|\x0320140202|
 |\x0320140202|\x0320140203|
 |\x0320140203|\x0320140204|
 |\x0320140204|\x0320140205|
 |\x0320140205|\x0320140206|
 |\x0320140206|\x0320140207|
 |\x0320140207| |  
 Then insert some data:
 |GMT|SPM_TYPE|SPM|INT_A|INT_B|INT_C|
 |20140201|1|1.2.3.4546|218|218|null|
 |20140201|1|1.2.44545|190|190|null|
 |20140201|1|1.353451312|246|246|null|
 |20140201|2|1.2.3.6775|183|183|null|
 |...|...|...|...|...|...|
 |20140207|3|1.2.3.4546|224|224|null|
 |20140207|3|1.2.44545|196|196|null|
 |20140207|3|1.353451312|168|168|null|
 |20140207|4|1.2.3.6775|189|189|null|
 |20140207|4|1.23.345345|217|217|null|
 |20140207|4|1.23234234234|245|245|null|
 I added a log statement in ParallelIterators to print the split count:
 {code}
 public class ParallelIterators extends ExplainTable implements ResultIterators {
 
     @Override
     public List<PeekingResultIterator> getIterators() throws SQLException {
         boolean success = false;
         final ConnectionQueryServices services = context.getConnection().getQueryServices();
         ReadOnlyProps props = services.getProps();
         int numSplits = splits.size();
         List<PeekingResultIterator> iterators = new ArrayList<PeekingResultIterator>(numSplits);
         List<Pair<byte[],Future<PeekingResultIterator>>> futures = new ArrayList<Pair<byte[],Future<PeekingResultIterator>>>(numSplits);
         final UUID scanId = UUID.randomUUID();
         try {
             ExecutorService executor = services.getExecutor();
             System.out.println("the split size is " + numSplits);
             // ...
         }
     }
 }
 {code}
 then execute some SQL:
 {code}
 select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type 
 = '2' and spm like '1.%'
 the split size is 31
 select * from table1 where gmt > '20140202' and gmt < '20140207' and spm_type 
 = '2'
 the split size is 31
 select * from table1 where gmt > '20140202' and gmt < '20140207'
 the split size is 27
 select * from table1 where gmt > '20140202' and gmt < '20140204' and spm_type 
 = '2' and spm like '1.%'
 the split size is 28
 select * from table1 where gmt > '20140202' and gmt < '20140204' and spm_type 
 = '2'
 the split size is 28
 select * from table1 where gmt > '20140202' and gmt < '20140204'
 the split size is 12
 {code}
 but I think 
 {code}
 select * from table1 where gmt > '20140202' and gmt < '20140207' and 

[jira] [Commented] (PHOENIX-938) Use higher priority queue for index updates to prevent deadlock

2014-07-10 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057462#comment-14057462
 ] 

James Taylor commented on PHOENIX-938:
--

How about this as a plan, [~jesse_yates] and [~apurtell]?
- check in your patch and document that it fixes the issue for 0.98.3 only 
(assuming it doesn't break Phoenix for 0.98.2- and 0.98.3+).
- work with the HBase community to make these APIs public and evolving for the 
0.98.next release.
- implement the solution on top of these public APIs.
I don't think transactions are really going to help with this, so it'd be good 
to get an as-permanent-as-possible solution IMO.

 Use higher priority queue for index updates to prevent deadlock
 ---

 Key: PHOENIX-938
 URL: https://issues.apache.org/jira/browse/PHOENIX-938
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.0.0, 4.1
Reporter: James Taylor
Assignee: Jesse Yates
 Fix For: 5.0.0, 4.1

 Attachments: phoenix-938-4.0-v0.patch, phoenix-938-master-v0.patch, 
 phoenix-938-master-v1.patch


 With our current global secondary indexing solution, a batched Put of table 
 data causes a RS to do a batch Put to other RSs. This has the potential to 
 lead to a deadlock if all RS are overloaded and unable to process the pending 
 batched Put. To prevent this, we should use a higher priority queue to submit 
 these Puts so that they're always processed before other Puts. This will 
 prevent the potential for a deadlock under high load. Note that this will 
 likely require some HBase 0.98 code changes and would not be feasible to 
 implement for HBase 0.94.





[jira] [Commented] (PHOENIX-1071) Provide integration for exposing Phoenix tables as Spark RDDs

2014-07-10 Thread Josh Mahonin (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057549#comment-14057549
 ] 

Josh Mahonin commented on PHOENIX-1071:
---

Hi Andrew,

It's definitely a starting point. The Pig integration doesn't quite have the 
full JDBC feature set yet, so there's a fair bit of client-side processing 
necessary that could be handled server-side instead.

The DSL you describe above, including the on-demand save / schema-creation 
feature, would be an amazing addition. That said, the fact that today we can 
read and process a full Phoenix data set across a Spark cluster is pretty neat.

Josh

 Provide integration for exposing Phoenix tables as Spark RDDs
 -

 Key: PHOENIX-1071
 URL: https://issues.apache.org/jira/browse/PHOENIX-1071
 Project: Phoenix
  Issue Type: New Feature
Reporter: Andrew Purtell

 A core concept of Apache Spark is the resilient distributed dataset (RDD), a 
 fault-tolerant collection of elements that can be operated on in parallel. 
 One can create RDDs referencing a dataset in any external storage system 
 offering a Hadoop InputFormat, like PhoenixInputFormat and 
 PhoenixOutputFormat. There could be opportunities for additional interesting 
 and deep integration. 
 Add the ability to save RDDs back to Phoenix with a {{saveAsPhoenixTable}} 
 action, implicitly creating necessary schema on demand.
 Add support for {{filter}} transformations that push predicates to the server.
 Add a new {{select}} transformation supporting a LINQ-like DSL, for example:
 {code}
 // Count the number of different coffee varieties offered by each
 // supplier from Guatemala
 phoenixTable("coffees")
   .select(c =>
     where(c.origin == "GT"))
   .countByKey()
   .foreach(r => println(r._1 + " = " + r._2))
 {code} 
 Support conversions between Scala and Java types and Phoenix table data.
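 
 As a rough illustration of what is already possible today via plain JDBC, here 
 is a hedged Java sketch that parallelizes Phoenix queries across a Spark 
 cluster. It borrows the table1 schema from PHOENIX-1074 purely as an example, 
 assumes a Phoenix instance at localhost, and uses the Spark 1.0-era Java API 
 rather than the proposed DSL above:
 {code}
 import java.sql.Connection;
 import java.sql.DriverManager;
 import java.sql.ResultSet;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;
 
 import org.apache.spark.api.java.JavaRDD;
 import org.apache.spark.api.java.JavaSparkContext;
 import org.apache.spark.api.java.function.FlatMapFunction;
 
 public class PhoenixRddSketch {
     public static void main(String[] args) {
         JavaSparkContext sc = new JavaSparkContext("local[2]", "phoenix-rdd-sketch");
         // One day per partition; each executor pulls its slice over JDBC.
         List<String> days = Arrays.asList("20140201", "20140202", "20140203");
         JavaRDD<String> rows = sc.parallelize(days, days.size())
             .flatMap(new FlatMapFunction<String, String>() {
                 public Iterable<String> call(String day) throws Exception {
                     Connection conn =
                         DriverManager.getConnection("jdbc:phoenix:localhost");
                     List<String> result = new ArrayList<String>();
                     ResultSet rs = conn.createStatement().executeQuery(
                         "select spm from table1 where gmt = '" + day + "'");
                     while (rs.next()) {
                         result.add(rs.getString(1));
                     }
                     conn.close();
                     return result;
                 }
             });
         System.out.println("row count: " + rows.count());
         sc.stop();
     }
 }
 {code}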





[jira] [Updated] (PHOENIX-1081) With phoenix case CPU usage 100%

2014-07-10 Thread yang ming (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yang ming updated PHOENIX-1081:
---

Description: 
The concurrency of the system is not high, but CPU usage often goes up to 100%.
I had stopped the system, but the regionserver's CPU usage is still high.
What can cause this problem?

table row count: 6000 million
table ddl:
create table if not exists summary
(
videoid integer not null,
date date not null,
platform varchar not null,
device varchar not null,
systemgroup varchar not null,
system varchar not null,
vv bigint,
ts bigint,
up bigint,
down bigint,
comment bigint,
favori bigint,
favord bigint,
quote bigint,
reply bigint,
constraint pk primary key (videoid, date, platform, device, systemgroup, system)
) salt_buckets = 30, versions=1, compression='snappy';

query 1:
select sum(vv) as sumvv, sum(comment) as sumcomment, sum(up) as sumup, sum(down) 
as sumdown, sum(reply) as sumreply, count(*) as count from summary(reply bigint) 
where videoid 
in(137102991,151113895,171559204,171559439,171573932,171573932,171573932,171574082,171574082,171574164,171677219,171794335,171902734,172364368,172475141,172700554,172700554,172700554,172716705,172784258,172835778,173112067,173165316,173165316,173379601,173448315,173503961,173692664,173911358,174077089,174099017,174349633,174349877,174651474,174651474,174759297,174883566,174883566,174987670,174987670,175131298)
 and date >= to_date('2013-09-01','yyyy-MM-dd') and 
date <= to_date('2014-07-07','yyyy-MM-dd')

  was:
The system is not highly concurrent access,but CPU usage often 100%.
I had stopped the system,but regionserver's CPU usage is still high.
what can case this problem?

table row count:6000 million
table ddl:
create table if not exists summary
(
videoid integer not null,
date date not null,
platform varchar not null,
device varchar not null,
systemgroup varchar not null,
system varchar not null,
vv bigint,
ts bigint,
up bigint,
down bigint,
comment bigint,
favori bigint,
favord bigint,
quote bigint,
reply bigint
constraint pk primary key (videoid, date,platform, device, systemgroup,system)
)salt_buckets = 30,versions=1,compression='snappy';

query 1:
select sum(vv) as sumvv,sum(comment) as sumcomment,sum(up) as sumup,sum(down) 
as sumdown,sum(reply) as sumreply,count(*) as count from summary(reply bigint) 
where videoid 
in(137102991,151113895,171559204,171559439,171573932,171573932,171573932,171574082,171574082,171574164,171677219,171794335,171902734,172364368,172475141,172700554,172700554,172700554,172716705,172784258,172835778,173112067,173165316,173165316,173379601,173448315,173503961,173692664,173911358,174077089,174099017,174349633,174349877,174651474,174651474,174759297,174883566,174883566,174987670,174987670,175131298)
 and date >= to_date('2013-09-01','yyyy-MM-dd') and 
date <= to_date('2014-07-07','yyyy-MM-dd')


 With phoenix case CPU usage 100%
 

 Key: PHOENIX-1081
 URL: https://issues.apache.org/jira/browse/PHOENIX-1081
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: yang ming
Priority: Critical

 The concurrency of the system is not high, but CPU usage often goes up to 100%.
 I had stopped the system, but the regionserver's CPU usage is still high.
 What can cause this problem?
 table row count:6000 million
 table ddl:
 create table if not exists summary
 (
 videoid integer not null,
 date date not null,
 platform varchar not null,
 device varchar not null,
 systemgroup varchar not null,
 system varchar not null,
 vv bigint,
 ts bigint,
 up bigint,
 down bigint,
 comment bigint,
 favori bigint,
 favord bigint,
 quote bigint,
 reply bigint,
 constraint pk primary key (videoid, date, platform, device, systemgroup, system)
 ) salt_buckets = 30, versions=1, compression='snappy';
 query 1:
 select sum(vv) as sumvv,sum(comment) as sumcomment,sum(up) as sumup,sum(down) 
 as sumdown,sum(reply) as sumreply,count(*) as count from summary(reply 
 bigint) where videoid 
 in(137102991,151113895,171559204,171559439,171573932,171573932,171573932,171574082,171574082,171574164,171677219,171794335,171902734,172364368,172475141,172700554,172700554,172700554,172716705,172784258,172835778,173112067,173165316,173165316,173379601,173448315,173503961,173692664,173911358,174077089,174099017,174349633,174349877,174651474,174651474,174759297,174883566,174883566,174987670,174987670,175131298)
  and date >= to_date('2013-09-01','yyyy-MM-dd') and 
 date <= to_date('2014-07-07','yyyy-MM-dd')





[jira] [Updated] (PHOENIX-1081) CPU usage 100% With phoenix

2014-07-10 Thread yang ming (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yang ming updated PHOENIX-1081:
---

Summary: CPU usage 100% With phoenix   (was: With phoenix case CPU usage 
100%)

 CPU usage 100% With phoenix 
 

 Key: PHOENIX-1081
 URL: https://issues.apache.org/jira/browse/PHOENIX-1081
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: yang ming
Priority: Critical
 Attachments: JMX.jpg, jstat.jpg, the jstack of all threads, the 
 jstack of thread 12725.jpg, the jstack of thread 12748.jpg, the threads of 
 regionserver process.jpg


 The concurrency of the system is not high, but CPU usage often goes up to 100%.
 I had stopped the system, but the regionserver's CPU usage is still high.
 What can cause this problem?
 table row count:6000 million
 table ddl:
 create table if not exists summary
 (
 videoid integer not null,
 date date not null,
 platform varchar not null,
 device varchar not null,
 systemgroup varchar not null,
 system varchar not null,
 vv bigint,
 ts bigint,
 up bigint,
 down bigint,
 comment bigint,
 favori bigint,
 favord bigint,
 quote bigint,
 reply bigint,
 constraint pk primary key (videoid, date, platform, device, systemgroup, system)
 ) salt_buckets = 30, versions=1, compression='snappy';
 query 1:
 select sum(vv) as sumvv,sum(comment) as sumcomment,sum(up) as sumup,sum(down) 
 as sumdown,sum(reply) as sumreply,count(*) as count from summary(reply 
 bigint) where videoid 
 in(137102991,151113895,171559204,171559439,171573932,171573932,171573932,171574082,171574082,171574164,171677219,171794335,171902734,172364368,172475141,172700554,172700554,172700554,172716705,172784258,172835778,173112067,173165316,173165316,173379601,173448315,173503961,173692664,173911358,174077089,174099017,174349633,174349877,174651474,174651474,174759297,174883566,174883566,174987670,174987670,175131298)
  and date >= to_date('2013-09-01','yyyy-MM-dd') and 
 date <= to_date('2014-07-07','yyyy-MM-dd')





[jira] [Commented] (PHOENIX-1081) CPU usage 100% With phoenix

2014-07-10 Thread yang ming (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057581#comment-14057581
 ] 

yang ming commented on PHOENIX-1081:


[~jamestaylor]

 CPU usage 100% With phoenix 
 

 Key: PHOENIX-1081
 URL: https://issues.apache.org/jira/browse/PHOENIX-1081
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: yang ming
Priority: Critical
 Attachments: JMX.jpg, jstat.jpg, the jstack of all threads, the 
 jstack of thread 12725.jpg, the jstack of thread 12748.jpg, the threads of 
 regionserver process.jpg


 The concurrency of the system is not high, but CPU usage often goes up to 100%.
 I had stopped the system, but the regionserver's CPU usage is still high.
 What can cause this problem?
 table row count:6000 million
 table ddl:
 create table if not exists summary
 (
 videoid integer not null,
 date date not null,
 platform varchar not null,
 device varchar not null,
 systemgroup varchar not null,
 system varchar not null,
 vv bigint,
 ts bigint,
 up bigint,
 down bigint,
 comment bigint,
 favori bigint,
 favord bigint,
 quote bigint,
 reply bigint,
 constraint pk primary key (videoid, date, platform, device, systemgroup, system)
 ) salt_buckets = 30, versions=1, compression='snappy';
 query 1:
 select sum(vv) as sumvv,sum(comment) as sumcomment,sum(up) as sumup,sum(down) 
 as sumdown,sum(reply) as sumreply,count(*) as count from summary(reply 
 bigint) where videoid 
 in(137102991,151113895,171559204,171559439,171573932,171573932,171573932,171574082,171574082,171574164,171677219,171794335,171902734,172364368,172475141,172700554,172700554,172700554,172716705,172784258,172835778,173112067,173165316,173165316,173379601,173448315,173503961,173692664,173911358,174077089,174099017,174349633,174349877,174651474,174651474,174759297,174883566,174883566,174987670,174987670,175131298)
  and date >= to_date('2013-09-01','yyyy-MM-dd') and 
 date <= to_date('2014-07-07','yyyy-MM-dd')





[jira] [Updated] (PHOENIX-950) Improve Secondary Index Update Failure Handling

2014-07-10 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated PHOENIX-950:
--

Attachment: TransactionSupportPhoenixSecondaryIndexUpdate.pdf

 Improve Secondary Index Update Failure Handling
 ---

 Key: PHOENIX-950
 URL: https://issues.apache.org/jira/browse/PHOENIX-950
 Project: Phoenix
  Issue Type: Improvement
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Attachments: Improve Phoenix Secondary Index Update Failure 
 Handling.pdf, TransactionSupportPhoenixSecondaryIndexUpdate.pdf


 The current secondary index update path could trigger chained region server 
 failures, which isn't friendly to end users. Even if we disable the index 
 after an update failure instead of aborting, a lot of human intervention is 
 still required, because index update failures aren't rare.
 In this JIRA, I propose a 2PC-like protocol. The "like" means it's not a real 
 2PC: there is no indefinite blocking, but it requires read-time (query) 
 reconciliation of inconsistencies between the index and the data. Since I'm 
 not familiar with the query-time logic, please let me know if the proposal 
 could fly.
 Thanks.


