[jira] [Commented] (PHOENIX-3360) Secondary index configuration is wrong

2017-02-13 Thread William Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865298#comment-15865298
 ] 

William Yang commented on PHOENIX-3360:
---

New patch attached. 

There is another reason to create a single, shared connection for index 
updates. {{CoprocessorHConnection#getConnectionForEnvironment()}} creates a new 
connection on each call, which in turn invokes the constructor of 
{{HConnectionImplementation}}. That constructor hits ZK to read the cluster id 
by calling {{retrieveClusterId()}}. This is unacceptable: besides the extra 
network round trip, it leaves many CLOSE-WAIT TCP connections on the ZK 
cluster, and ZK is a critical resource that we should avoid touching unless we 
have to. If the connection limit in zoo.cfg ({{maxClientCnxns}}) is not 
configured large enough, index updates will fail while getting an 
HTableInterface because the extra ZK connection requests are rejected.
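
For illustration only, a minimal sketch of the sharing we want: one lazily 
created connection per coprocessor host, reused for all index updates. It uses 
only the plain HBase 1.x client API and is not the actual patch; the class 
name is hypothetical.
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Hypothetical holder: one shared connection instead of one per call, so
// ZooKeeper is contacted once (retrieveClusterId) rather than for every
// batch of index updates.
public final class SharedIndexConnection {
    private static volatile Connection connection;

    private SharedIndexConnection() {}

    public static Connection get(Configuration conf) throws IOException {
        if (connection == null) {
            synchronized (SharedIndexConnection.class) {
                if (connection == null) {
                    connection = ConnectionFactory.createConnection(conf);
                }
            }
        }
        return connection;
    }
}
{code}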

Has anyone ever encountered this problem?

> Secondary index configuration is wrong
> --
>
> Key: PHOENIX-3360
> URL: https://issues.apache.org/jira/browse/PHOENIX-3360
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Rajeshbabu Chintaguntla
>Priority: Critical
> Fix For: 4.10.0
>
> Attachments: ConfCP.java, PHOENIX-3360.patch, PHOENIX-3360-v2.PATCH, 
> PHOENIX-3360-v3.PATCH, PHOENIX-3360-v4.PATCH
>
>
> IndexRpcScheduler allocates some handler threads and uses a higher priority 
> for RPCs. The corresponding IndexRpcController is not used by default as it 
> is, but used through ServerRpcControllerFactory that we configure from Ambari 
> by default which sets the priority of the outgoing RPCs to either metadata 
> priority, or the index priority.
> However, after reading code of IndexRpcController / ServerRpcController it 
> seems that the IndexRPCController DOES NOT look at whether the outgoing RPC 
> is for an Index table or not. It just sets ALL rpc priorities to be the index 
> priority. The intention seems to be the case that ONLY on servers, we 
> configure ServerRpcControllerFactory, and with clients we NEVER configure 
> ServerRpcControllerFactory, but instead use ClientRpcControllerFactory. We 
> configure ServerRpcControllerFactory from Ambari, which in effect makes it so 
> that ALL rpcs from Phoenix are only handled by the index handlers by default. 
> It means all deadlock cases are still there. 
> The documentation in https://phoenix.apache.org/secondary_indexing.html is 
> also wrong in this sense. It does not talk about server side / client side. 
> Plus this way of configuring different values is not how HBase configuration 
> is deployed. We cannot have the configuration show the 
> ServerRpcControllerFactory even only for server nodes, because the clients 
> running on those nodes will also see the wrong values. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3360) Secondary index configuration is wrong

2017-02-13 Thread William Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Yang updated PHOENIX-3360:
--
Attachment: PHOENIX-3360-v4.PATCH

> Secondary index configuration is wrong
> --
>
> Key: PHOENIX-3360
> URL: https://issues.apache.org/jira/browse/PHOENIX-3360
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Rajeshbabu Chintaguntla
>Priority: Critical
> Fix For: 4.10.0
>
> Attachments: ConfCP.java, PHOENIX-3360.patch, PHOENIX-3360-v2.PATCH, 
> PHOENIX-3360-v3.PATCH, PHOENIX-3360-v4.PATCH
>
>
> IndexRpcScheduler allocates some handler threads and uses a higher priority 
> for RPCs. The corresponding IndexRpcController is not used by default as it 
> is, but used through ServerRpcControllerFactory that we configure from Ambari 
> by default which sets the priority of the outgoing RPCs to either metadata 
> priority, or the index priority.
> However, after reading code of IndexRpcController / ServerRpcController it 
> seems that the IndexRPCController DOES NOT look at whether the outgoing RPC 
> is for an Index table or not. It just sets ALL rpc priorities to be the index 
> priority. The intention seems to be the case that ONLY on servers, we 
> configure ServerRpcControllerFactory, and with clients we NEVER configure 
> ServerRpcControllerFactory, but instead use ClientRpcControllerFactory. We 
> configure ServerRpcControllerFactory from Ambari, which in effect makes it so 
> that ALL rpcs from Phoenix are only handled by the index handlers by default. 
> It means all deadlock cases are still there. 
> The documentation in https://phoenix.apache.org/secondary_indexing.html is 
> also wrong in this sense. It does not talk about server side / client side. 
> Plus this way of configuring different values is not how HBase configuration 
> is deployed. We cannot have the configuration show the 
> ServerRpcControllerFactory even only for server nodes, because the clients 
> running on those nodes will also see the wrong values. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3360) Secondary index configuration is wrong

2017-02-13 Thread William Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865236#comment-15865236
 ] 

William Yang commented on PHOENIX-3360:
---

bq. CompoundConfiguration treats the added configs as immutable, and has an 
internal mutable config (see the code). This means that with the original 
patch, the rest of region server (including replication) will not be affected.

I've done a simple test; see {{ConfCP.java}}. If we change the 
RegionServer-level configuration in a coprocessor, then all the other Regions 
opened on the same RS will see the change. This has nothing to do with the 
implementation of the Configuration class or any other internal classes; it is 
determined by where a region's Configuration object comes from.

I checked the code in both hbase 1.1.2 and 0.94. See 
{{RegionCoprocessorHost#getTableCoprocessorAttrsFromSchema()}} for 1.1 and 
{{RegionCoprocessorHost#loadTableCoprocessors()}} for 0.94. 

Each region has its own copy of the Configuration, copied from the region 
server's configuration object. So it is safe to change the configuration 
returned by {{CoprocessorEnvironment#getConfiguration()}}; the change is 
visible only within that Region. But we should never change the Configuration 
returned by 
{{RegionCoprocessorEnvironment#getRegionServerServices().getConfiguration()}}, 
because that changes the conf of all the other Regions.
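
As a hedged illustration of that distinction (this is not the attached 
ConfCP.java), a coprocessor's {{start()}} hook might look like the sketch 
below; the property name is hypothetical.
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;

public class ConfScopeObserver extends BaseRegionObserver {
    @Override
    public void start(CoprocessorEnvironment e) throws IOException {
        // Safe: this Configuration is the region's own copy, so the change
        // stays local to the region hosting this coprocessor.
        Configuration regionConf = e.getConfiguration();
        regionConf.set("example.region.scoped.key", "XX"); // hypothetical key

        // Unsafe: this is the region server's Configuration, shared by every
        // region on the RS, so it must never be mutated here.
        Configuration rsConf = ((RegionCoprocessorEnvironment) e)
                .getRegionServerServices().getConfiguration();
        // rsConf.set(...) would leak the change into all other Regions' conf.
    }
}
{code}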

How to use ConfCP.java:
 * create 'test1', 'cf'
 * create 'test2', 'cf'
 * make sure that all regions of the above two tables are hosted on the same 
regionserver
 * add coprocessor ConfCP for test1, check the log; you should see the output 
below:
{code}
YHYH1: [test1]conf hashCode = 2027310658
YHYH2: [test1]put conf (yh.special.key,XX)
YHYH3: [test1]get conf (yh.special.key,XX)
{code}
 * add coprocessor ConfCP for test2, check the log again; you should see the 
output below:
{code}
YHYH1: [test2]conf hashCode = 2027310658
YHYH3: [test2]get conf (yh.special.key,XX)
{code}

Note that {{conf}} can be assigned in two ways. Currently
{code}
conf = 
((RegionCoprocessorEnvironment)e).getRegionServerServices().getConfiguration();
{code}
is used, which is what the v1 patch does.

Change it to 
{code}
conf = e.getConfiguration();
{code}
and table test2 will no longer see the change that test1 made.

In summary, we can use the v1 patch with a small modification: just set the 
conf returned by {{CoprocessorEnvironment#getConfiguration()}}. And for 
PHOENIX-3271, UPSERT SELECT's writes will still have the higher priority.

WDYT? Ping [~jamestaylor], [~enis], [~rajeshbabu].


> Secondary index configuration is wrong
> --
>
> Key: PHOENIX-3360
> URL: https://issues.apache.org/jira/browse/PHOENIX-3360
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Rajeshbabu Chintaguntla
>Priority: Critical
> Fix For: 4.10.0
>
> Attachments: ConfCP.java, PHOENIX-3360.patch, PHOENIX-3360-v2.PATCH, 
> PHOENIX-3360-v3.PATCH
>
>
> IndexRpcScheduler allocates some handler threads and uses a higher priority 
> for RPCs. The corresponding IndexRpcController is not used by default as it 
> is, but used through ServerRpcControllerFactory that we configure from Ambari 
> by default which sets the priority of the outgoing RPCs to either metadata 
> priority, or the index priority.
> However, after reading code of IndexRpcController / ServerRpcController it 
> seems that the IndexRPCController DOES NOT look at whether the outgoing RPC 
> is for an Index table or not. It just sets ALL rpc priorities to be the index 
> priority. The intention seems to be the case that ONLY on servers, we 
> configure ServerRpcControllerFactory, and with clients we NEVER configure 
> ServerRpcControllerFactory, but instead use ClientRpcControllerFactory. We 
> configure ServerRpcControllerFactory from Ambari, which in effect makes it so 
> that ALL rpcs from Phoenix are only handled by the index handlers by default. 
> It means all deadlock cases are still there. 
> The documentation in https://phoenix.apache.org/secondary_indexing.html is 
> also wrong in this sense. It does not talk about server side / client side. 
> Plus this way of configuring different values is not how HBase configuration 
> is deployed. We cannot have the configuration show the 
> ServerRpcControllerFactory even only for server nodes, because the clients 
> running on those nodes will also see the wrong values. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3360) Secondary index configuration is wrong

2017-02-13 Thread William Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Yang updated PHOENIX-3360:
--
Attachment: ConfCP.java

> Secondary index configuration is wrong
> --
>
> Key: PHOENIX-3360
> URL: https://issues.apache.org/jira/browse/PHOENIX-3360
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Rajeshbabu Chintaguntla
>Priority: Critical
> Fix For: 4.10.0
>
> Attachments: ConfCP.java, PHOENIX-3360.patch, PHOENIX-3360-v2.PATCH, 
> PHOENIX-3360-v3.PATCH
>
>
> IndexRpcScheduler allocates some handler threads and uses a higher priority 
> for RPCs. The corresponding IndexRpcController is not used by default as it 
> is, but used through ServerRpcControllerFactory that we configure from Ambari 
> by default which sets the priority of the outgoing RPCs to either metadata 
> priority, or the index priority.
> However, after reading code of IndexRpcController / ServerRpcController it 
> seems that the IndexRPCController DOES NOT look at whether the outgoing RPC 
> is for an Index table or not. It just sets ALL rpc priorities to be the index 
> priority. The intention seems to be the case that ONLY on servers, we 
> configure ServerRpcControllerFactory, and with clients we NEVER configure 
> ServerRpcControllerFactory, but instead use ClientRpcControllerFactory. We 
> configure ServerRpcControllerFactory from Ambari, which in effect makes it so 
> that ALL rpcs from Phoenix are only handled by the index handlers by default. 
> It means all deadlock cases are still there. 
> The documentation in https://phoenix.apache.org/secondary_indexing.html is 
> also wrong in this sense. It does not talk about server side / client side. 
> Plus this way of configuring different values is not how HBase configuration 
> is deployed. We cannot have the configuration show the 
> ServerRpcControllerFactory even only for server nodes, because the clients 
> running on those nodes will also see the wrong values. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3662) PhoenixStorageHandler throws ClassCastException.

2017-02-13 Thread Jeongdae Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864902#comment-15864902
 ] 

Jeongdae Kim commented on PHOENIX-3662:
---

Could anyone review this patch?

> PhoenixStorageHandler throws ClassCastException.
> 
>
> Key: PHOENIX-3662
> URL: https://issues.apache.org/jira/browse/PHOENIX-3662
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.9.0
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
> Attachments: PHOENIX-3662.1.patch, PHOENIX-3662.2.patch
>
>
> When executing a query with BETWEEN clauses wrapped in a function, the 
> Phoenix storage handler throws a ClassCastException like the one below.
> In addition, I found some bugs in the handling of push-down predicates.
> {code}
> 2017-02-06T16:35:26,019 ERROR [7d29d400-2ec5-4ab8-84c2-041b55c3e24b 
> HiveServer2-Handler-Pool: Thread-57]: ql.Driver 
> (SessionState.java:printError(1097)) - FAILED: ClassCastException 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc cannot be cast to 
> org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc
> java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc cannot be cast to 
> org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.processingBetweenOperator(IndexPredicateAnalyzer.java:229)
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.analyzeExpr(IndexPredicateAnalyzer.java:369)
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.access$000(IndexPredicateAnalyzer.java:72)
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer$1.process(IndexPredicateAnalyzer.java:165)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.analyzePredicate(IndexPredicateAnalyzer.java:176)
>   at 
> org.apache.phoenix.hive.ppd.PhoenixPredicateDecomposer.decomposePredicate(PhoenixPredicateDecomposer.java:63)
>   at 
> org.apache.phoenix.hive.PhoenixStorageHandler.decomposePredicate(PhoenixStorageHandler.java:238)
>   at 
> org.apache.hadoop.hive.ql.ppd.OpProcFactory.pushFilterToStorageHandler(OpProcFactory.java:1004)
>   at 
> org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:910)
>   at 
> org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:880)
>   at 
> org.apache.hadoop.hive.ql.ppd.OpProcFactory$TableScanPPD.process(OpProcFactory.java:429)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>   at 
> org.apache.hadoop.hive.ql.ppd.SimplePredicatePushDown.transform(SimplePredicatePushDown.java:102)
>   at 
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:242)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10921)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:246)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:471)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1242)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1229)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:191)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:276)
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:499)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:486)
>   at 
> 

[jira] [Commented] (PHOENIX-3536) Remove creating unnecessary phoenix connections in MR Tasks of Hive

2017-02-13 Thread Jeongdae Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864901#comment-15864901
 ] 

Jeongdae Kim commented on PHOENIX-3536:
---

The failed tests are not related to this patch. Could anyone review it?

> Remove creating unnecessary phoenix connections in MR Tasks of Hive
> ---
>
> Key: PHOENIX-3536
> URL: https://issues.apache.org/jira/browse/PHOENIX-3536
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>  Labels: HivePhoenix
> Attachments: PHOENIX-3536.1.patch
>
>
> PhoenixStorageHandler creates Phoenix connections to build a QueryPlan in the 
> getSplits phase (MR preparation) and again in the getRecordReader phase (map) 
> while running an MR job.
> In Phoenix, creating the first connection (QueryServices) for a specific URL 
> is expensive, because it checks and loads Phoenix schema information.
> I found it is possible to avoid building the query plan again in the map 
> phase (getRecordReader()) by serializing the QueryPlan created in the input 
> format and passing this plan to the record reader.
> This approach improves scan performance by removing the unnecessary 
> connection attempt in the map phase.
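
For illustration only, one shape such serialization could take: a Hadoop 
{{Writable}} wrapper that lets the input split carry the already-serialized 
plan bytes from getSplits() to getRecordReader(). The class name is 
hypothetical, and turning the QueryPlan itself into bytes is assumed to be 
handled elsewhere.
{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Hypothetical carrier: the split serializes these bytes once in the
// getSplits() phase, and the record reader deserializes them in the map
// phase instead of opening another Phoenix connection to rebuild the plan.
public class SerializedPlan implements Writable {
    private byte[] planBytes = new byte[0];

    public SerializedPlan() {}                         // required by Hadoop

    public SerializedPlan(byte[] planBytes) {
        this.planBytes = planBytes;
    }

    public byte[] getPlanBytes() {
        return planBytes;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(planBytes.length);
        out.write(planBytes);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        planBytes = new byte[in.readInt()];
        in.readFully(planBytes);
    }
}
{code}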



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3515) CsvLineParser Improvement

2017-02-13 Thread Jeongdae Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864900#comment-15864900
 ] 

Jeongdae Kim commented on PHOENIX-3515:
---

The failed tests are not related to this patch. Could anyone review it?

> CsvLineParser Improvement
> -
>
> Key: PHOENIX-3515
> URL: https://issues.apache.org/jira/browse/PHOENIX-3515
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>Priority: Minor
> Attachments: PHOENIX-3515.1.patch
>
>
> CsvLineParser creates a new parser (Apache Commons CSVParser) for every 
> single line, which is terribly inefficient.
> I addressed this by adding a new string reader so that the parser is created 
> once and reused for all lines.
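
A minimal sketch of the create-once pattern with Apache Commons CSV 
(illustrative only, not the attached patch): a single CSVParser is built over 
a Reader and iterated for all lines, instead of constructing a new parser per 
line.
{code}
import java.io.Reader;
import java.io.StringReader;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class CsvParseOnce {
    public static void main(String[] args) throws Exception {
        // All input lines flow through one Reader and one parser instance.
        Reader in = new StringReader("a,b,c\n1,2,3\n");
        try (CSVParser parser = CSVFormat.DEFAULT.parse(in)) {
            for (CSVRecord record : parser) {   // same parser for every line
                System.out.println(record.get(0));
            }
        }
    }
}
{code}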



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3512) PhoenixStorageHandler makes erroneous query string when handling between clauses with date constants.

2017-02-13 Thread Jeongdae Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864898#comment-15864898
 ] 

Jeongdae Kim commented on PHOENIX-3512:
---

The failed tests are not related to this patch. Could anyone review it?

> PhoenixStorageHandler makes erroneous query string when handling between 
> clauses with date constants.
> -
>
> Key: PHOENIX-3512
> URL: https://issues.apache.org/jira/browse/PHOENIX-3512
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>  Labels: HivePhoenix
> Attachments: PHOENIX-3512.patch
>
>
> ex) l_shipdate BETWEEN '1992-01-02' AND '1992-02-02' --> l_shipdate between 
> to_date('69427800') and to_date('69695640')



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3486) RoundRobinResultIterator doesn't work correctly because of setting Scan's cache size inappropriately in PhoenixInputFormat

2017-02-13 Thread Jeongdae Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864897#comment-15864897
 ] 

Jeongdae Kim commented on PHOENIX-3486:
---

The failed tests are not related to this patch. Could anyone review it?

> RoundRobinResultIterator doesn't work correctly because of setting Scan's 
> cache size inappropriately in PhoenixInputFormat
> --
>
> Key: PHOENIX-3486
> URL: https://issues.apache.org/jira/browse/PHOENIX-3486
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>  Labels: HivePhoenix
> Attachments: PHOENIX-3486.patch
>
>
> RoundRobinResultIterator uses "hbase.client.scanner.caching" to fill caches 
> in parallel for all scans. However, because PhoenixInputFormat (phoenix-hive) 
> calls Scan.setCaching(), RoundRobinResultIterator does not work correctly: 
> when a Scan has a cache size set via setCaching(), HBase fills the cache 
> using Scan.getCaching() rather than "hbase.client.scanner.caching", while 
> RoundRobinResultIterator still scans the table in parallel to refill caches 
> every "hbase.client.scanner.caching" rows. The result is an unintended 
> parallel scan pattern and degraded scan performance.
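
Illustrative only: leaving the per-Scan caching unset keeps HBase on the 
cluster-wide "hbase.client.scanner.caching" value that RoundRobinResultIterator 
also keys its parallel refills on.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.Scan;

public class ScanCachingExample {
    public static Scan newScan(Configuration conf) {
        // The value RoundRobinResultIterator uses to batch its cache refills.
        int clusterCaching =
                conf.getInt(HConstants.HBASE_CLIENT_SCANNER_CACHING, 100);
        System.out.println("hbase.client.scanner.caching = " + clusterCaching);

        Scan scan = new Scan();
        // Deliberately not calling scan.setCaching(...): with no per-Scan
        // value, HBase falls back to hbase.client.scanner.caching, so the
        // scanner batching and the iterator's refill cadence stay aligned.
        return scan;
    }

    public static void main(String[] args) {
        newScan(HBaseConfiguration.create());
    }
}
{code}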



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3503) PhoenixStorageHandler doesn't work properly when execution engine of Hive is Tez.

2017-02-13 Thread Jeongdae Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864899#comment-15864899
 ] 

Jeongdae Kim commented on PHOENIX-3503:
---

The failed tests are not related to this patch. Could anyone review it?

> PhoenixStorageHandler doesn't work properly when execution engine of Hive is 
> Tez.
> --
>
> Key: PHOENIX-3503
> URL: https://issues.apache.org/jira/browse/PHOENIX-3503
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>  Labels: HivePhoenix
> Attachments: PHOENIX-3503.patch
>
>
> The Hive storage handler can't correctly parse column types that have 
> parameters (length, precision, scale, ...) from serdeConstants.LIST_COLUMN_TYPES 
> when the execution engine of Hive is Tez.
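
For illustration (not the patch itself), Hive's own TypeInfoUtils can split a 
LIST_COLUMN_TYPES string safely, so parameterized types such as decimal(10,2) 
or varchar(20) are not broken apart by a naive split on ','.
{code}
import java.util.List;

import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;

public class ColumnTypeParsing {
    public static void main(String[] args) {
        // Example value of serdeConstants.LIST_COLUMN_TYPES with
        // parameterized types.
        String columnTypes = "int:decimal(10,2):varchar(20)";
        List<TypeInfo> types =
                TypeInfoUtils.getTypeInfosFromTypeString(columnTypes);
        for (TypeInfo t : types) {
            System.out.println(t.getTypeName()); // int, decimal(10,2), varchar(20)
        }
    }
}
{code}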



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3668) Resolve Date/Time/Timestamp incompatibility in bind variables

2017-02-13 Thread Maryann Xue (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maryann Xue updated PHOENIX-3668:
-
Labels: calcite  (was: )

> Resolve Date/Time/Timestamp incompatibility in bind variables
> -
>
> Key: PHOENIX-3668
> URL: https://issues.apache.org/jira/browse/PHOENIX-3668
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>  Labels: calcite
>
> Avatica TypedValue converts Date and Time objects to integer values and 
> takes the local time as the input for the conversion. So we need to adjust 
> the Date/Time/Timestamp object value before setting the bind parameter.
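
A rough sketch of the kind of adjustment meant here (a hypothetical helper, 
not Avatica or Phoenix code): shift the epoch millis by the default time 
zone's offset so that a conversion performed in local time round-trips to the 
intended value.
{code}
import java.sql.Timestamp;
import java.util.TimeZone;

public class LocalTimeShift {
    // Hypothetical helper: returns a Timestamp whose epoch millis are shifted
    // by the local UTC offset, compensating for a conversion that interprets
    // the value in local time.
    static Timestamp shiftToLocal(Timestamp ts) {
        long utcMillis = ts.getTime();
        long offset = TimeZone.getDefault().getOffset(utcMillis);
        return new Timestamp(utcMillis + offset);
    }

    public static void main(String[] args) {
        Timestamp original = Timestamp.valueOf("2017-02-13 00:00:00");
        System.out.println(original + " -> " + shiftToLocal(original));
    }
}
{code}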



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (PHOENIX-3669) YEAR/MONTH/DAY/HOUR/MINUTES/SECOND built-in functions do not work in Calcite-Phoenix

2017-02-13 Thread Maryann Xue (JIRA)
Maryann Xue created PHOENIX-3669:


 Summary: YEAR/MONTH/DAY/HOUR/MINUTES/SECOND built-in functions do 
not work in Calcite-Phoenix
 Key: PHOENIX-3669
 URL: https://issues.apache.org/jira/browse/PHOENIX-3669
 Project: Phoenix
  Issue Type: Bug
Reporter: Maryann Xue
Assignee: Maryann Xue


Calcite rewrites these functions as the EXTRACT function, and Phoenix does not 
implement the EXTRACT function yet.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (PHOENIX-3668) Resolve Date/Time/Timestamp incompatibility in bind variables

2017-02-13 Thread Maryann Xue (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maryann Xue resolved PHOENIX-3668.
--
Resolution: Fixed

> Resolve Date/Time/Timestamp incompatibility in bind variables
> -
>
> Key: PHOENIX-3668
> URL: https://issues.apache.org/jira/browse/PHOENIX-3668
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>
> Avatica TypedValue converts Date and Time objects to integer values and 
> takes the local time as the input for the conversion. So we need to adjust 
> the Date/Time/Timestamp object value before setting the bind parameter.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (PHOENIX-3668) Resolve Date/Time/Timestamp incompatibility in bind variables

2017-02-13 Thread Maryann Xue (JIRA)
Maryann Xue created PHOENIX-3668:


 Summary: Resolve Date/Time/Timestamp incompatibility in bind 
variables
 Key: PHOENIX-3668
 URL: https://issues.apache.org/jira/browse/PHOENIX-3668
 Project: Phoenix
  Issue Type: Sub-task
Reporter: Maryann Xue
Assignee: Maryann Xue


Avatica TypedValue converts Date and Time objects to integer values and takes 
the local time as the input for the conversion. So we need to adjust the 
Date/Time/Timestamp object value before setting the bind parameter.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (PHOENIX-3640) Upgrading from 4.8 or before to encodecolumns2 branch fails

2017-02-13 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain resolved PHOENIX-3640.
---
Resolution: Fixed

> Upgrading from 4.8 or before to encodecolumns2 branch fails
> ---
>
> Key: PHOENIX-3640
> URL: https://issues.apache.org/jira/browse/PHOENIX-3640
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Samarth Jain
>Assignee: Samarth Jain
> Attachments: PHOENIX-3640.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (PHOENIX-3666) Make use of EncodedColumnQualifierCellsList for all column name mapping schemes

2017-02-13 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain resolved PHOENIX-3666.
---
Resolution: Fixed

> Make use of EncodedColumnQualifierCellsList for all column name mapping 
> schemes
> ---
>
> Key: PHOENIX-3666
> URL: https://issues.apache.org/jira/browse/PHOENIX-3666
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Samarth Jain
>Assignee: Samarth Jain
> Attachments: PHOENIX-3666.patch, PHOENIX-3666_wip.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3654) Load Balancer for thin client

2017-02-13 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864717#comment-15864717
 ] 

James Taylor commented on PHOENIX-3654:
---

+1 to having a high level design doc to discuss. I think we could have an 
interface-based solution through which ZK would be one implementation if we 
want to have a more indirect ZK dependency.

> Load Balancer for thin client
> -
>
> Key: PHOENIX-3654
> URL: https://issues.apache.org/jira/browse/PHOENIX-3654
> Project: Phoenix
>  Issue Type: New Feature
>Affects Versions: 4.8.0
> Environment: Linux 3.13.0-107-generic kernel, v4.9.0-HBase-0.98
>Reporter: Rahul Shrivastava
> Fix For: 4.9.0
>
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> We have been having internal discussions about a load balancer for the PQS 
> thin client. The general consensus is to embed a load balancer in the thin 
> client instead of using an external load balancer such as HAProxy. The idea 
> is not to add another layer between the client and PQS; this reduces 
> operational cost for the system, which currently leads to delays in 
> executing projects.
> But this also comes with the challenge of building an embedded load balancer 
> that can maintain sticky sessions and do fair load balancing while knowing 
> the load downstream on the PQS servers. In addition, the load balancer needs 
> to know the locations of the multiple PQS servers, so the thin client needs 
> to keep track of PQS servers via ZooKeeper (or other means).
> In the new design, it is proposed that the client (the PQS client) have an 
> embedded load balancer.
> Where will the load balancer sit?
> The load balancer will be embedded within the app server client.
> How will the load balancer work?
> The load balancer will contact ZooKeeper to get the locations of PQS 
> instances. In this case, PQS needs to register itself with ZooKeeper once it 
> comes online; the ZooKeeper location is in hbase-site.xml. The balancer will 
> maintain a small cache of connections to PQS, and when a request comes in it 
> will check the cache for an open connection.
> How will the load balancer know the load on PQS?
> To start with, it will pick a random open connection to PQS, which means the 
> load balancer does not know the PQS load. Later, we can augment the code so 
> that the thin client receives load information from PQS and makes 
> intelligent decisions.
> How will the load balancer maintain sticky sessions?
> We still need to investigate how to implement sticky sessions; we can look 
> for an open source implementation of the same.
> How will PQS register itself with the service locator?
> PQS will have the location of ZooKeeper in hbase-site.xml and will register 
> itself with ZooKeeper. The thin client will find the PQS locations using 
> ZooKeeper.
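
Purely as an illustration of the registration/discovery half of this proposal 
(not a design decision), a PQS instance could publish an ephemeral znode and 
the thin client could pick a random live entry. Apache Curator is assumed, and 
every path and name below is hypothetical.
{code}
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Random;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class PqsDiscoverySketch {
    private static final String BASE = "/pqs/servers";   // hypothetical path

    // Called by a PQS instance once it is online; the ephemeral node
    // disappears automatically if the server dies.
    static void register(CuratorFramework zk, String hostPort) throws Exception {
        zk.create().creatingParentsIfNeeded()
          .withMode(CreateMode.EPHEMERAL)
          .forPath(BASE + "/" + hostPort,
                   hostPort.getBytes(StandardCharsets.UTF_8));
    }

    // Called by the thin client's embedded balancer: pick a random live PQS.
    static String pickRandom(CuratorFramework zk) throws Exception {
        List<String> servers = zk.getChildren().forPath(BASE);
        return servers.get(new Random().nextInt(servers.size()));
    }

    public static void main(String[] args) throws Exception {
        CuratorFramework zk = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        zk.start();
        register(zk, "pqs-host-1:8765");
        System.out.println("chose " + pickRandom(zk));
        zk.close();
    }
}
{code}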



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] Some licensing issues to resolve before the next release

2017-02-13 Thread Andrew Purtell
For the other issue, there's no reason not to move up to more recent minors
of those HBase releases without the dependency problem as long as we don't
detect a regression by doing so.


On Thu, Feb 9, 2017 at 1:10 PM, Josh Elser  wrote:

> Sweetness. Thanks for taking that on!
>
>
> Josh Mahonin wrote:
>
>> Re: the flume dependency, I suspect we can swap out the org.json:json
>> dependency with com.tdunning:json without too much pain. I've assigned
>> PHOENIX-3658 to myself to look at, will try and attend to it in the next
>> week.
>>
>> https://github.com/tdunning/open-json
>>
>>
>> On Thu, Feb 9, 2017 at 12:10 PM, Josh Elser  wrote:
>>
>> See https://issues.apache.org/jira/browse/PHOENIX-3658 and
>>> https://issues.apache.org/jira/browse/PHOENIX-3659 for the full details.
>>>
>>> The summary is that I noticed two dependencies that we're including (one
>>> direct, one transitive) that are disallowed.
>>>
>>> The direct dependency (org.json:json by phoenix-flume) is technically
>>> "ok"
>>> but only until 2017/04/30 when the grace-period expires. Essentially,
>>> we've
>>> used up half of the time allotted to fix this one already ;)
>>>
>>> The latter is one that we inherited from HBase. We can address it by
>>> bumping the 1.1 and 1.2 hbase version -- but I'd be interested in hearing
>>> if others have opinions on whether we do that or try to surgically remove
>>> the dependency from our bundling.
>>>
>>> - Josh
>>>
>>>
>>


-- 
Best regards,

   - Andy

If you are given a choice, you believe you have acted freely. - Raymond
Teller (via Peter Watts)


[jira] [Commented] (PHOENIX-3654) Load Balancer for thin client

2017-02-13 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864715#comment-15864715
 ] 

Josh Elser commented on PHOENIX-3654:
-

bq. You can set a read-only ACL that doesn't need auth.

Yup, AFAIK, that's not a big deal.

bq. You can build a service discovery mechanism backed by ZooKeeper yet 
providing its own client facing API that is not kerberized. And so on.

Yes! This is ultimately what I'd like to see some more thought put into. There 
are _tons_ of options that could be leveraged. Would be nice to see some simple 
pros/cons laid out so we can back up why one was chosen over others :)

> Load Balancer for thin client
> -
>
> Key: PHOENIX-3654
> URL: https://issues.apache.org/jira/browse/PHOENIX-3654
> Project: Phoenix
>  Issue Type: New Feature
>Affects Versions: 4.8.0
> Environment: Linux 3.13.0-107-generic kernel, v4.9.0-HBase-0.98
>Reporter: Rahul Shrivastava
> Fix For: 4.9.0
>
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> We have been having internal discussion on load balancer for thin client for 
> PQS. The general consensus we have is to have an embedded load balancer with 
> the thin client instead of using external load balancer such as haproxy. The 
> idea is to not to have another layer between client and PQS. This reduces 
> operational cost for system, which currently leads to delay in executing 
> projects.
> But this also comes with challenge of having an embedded load balancer which 
> can maintain sticky sessions, do fair load balancing knowing the load 
> downstream of PQS server. In addition, load balancer needs to know location 
> of multiple PQS server. Now, the thin client needs to keep track of PQS 
> servers via zookeeper ( or other means). 
> In the new design, it is proposed that the client (the PQS client) have an 
> embedded load balancer.
> Where will the load balancer sit?
> The load balancer will be embedded within the app server client.
> How will the load balancer work ? 
> Load balancer will contact zookeeper to get location of PQS. In this case, 
> PQS needs to register to ZK itself once it comes online. Zookeeper location 
> is in hbase-site.xml. It will maintain a small cache of connection to the 
> PQS. When a request comes in, it will check for an open connection from the 
> cache. 
> How will load balancer know load on PQS ?
> To start with, it will pick a random open connection to PQS. This means that 
> load balancer does not know PQS load. Later , we can augment the code so that 
> thin client can receive load info from PQS and make intelligent decisions.  
> How will load balancer maintain sticky sessions ?
> While we still need to investigate how to implement sticky sessions. We can 
> look for some open source implementation for the same.
> How will PQS register itself to service locator ?
> PQS will have location of zookeeper in hbase-site.xml and it would register 
> itself to the zookeeper. Thin client will find out PQS location using 
> zookeeper.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (PHOENIX-3661) Make phoenix tool select file system dynamically

2017-02-13 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell reassigned PHOENIX-3661:
---

   Resolution: Fixed
 Assignee: Yishan Yang
Fix Version/s: 4.10.0

Committed. 

> Make phoenix tool select file system dynamically
> 
>
> Key: PHOENIX-3661
> URL: https://issues.apache.org/jira/browse/PHOENIX-3661
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0, 4.8.0
>Reporter: Yishan Yang
>Assignee: Yishan Yang
> Fix For: 4.10.0
>
> Attachments: phoenix-3661-1.patch
>
>
> The Phoenix indexing tool assumes that the root directory is on the default 
> Hadoop FileSystem. With this patch, the Phoenix index tool gets the file 
> system dynamically, which prevents “Wrong FileSystem” errors.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (PHOENIX-3654) Load Balancer for thin client

2017-02-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864701#comment-15864701
 ] 

Andrew Purtell edited comment on PHOENIX-3654 at 2/13/17 11:50 PM:
---

bq. I meant ZK ACLs help ensure the PQS instances are able to register 
themselves in a trusted location which clients can then refer to.
Oh sure that makes sense. How the client discovers PQS endpoints registered by 
ZooKeeper without requiring SASL auth is an interesting question but there are 
a couple of options. You can set a read-only ACL that doesn't need auth. You 
can build a service discovery mechanism backed by ZooKeeper yet providing its 
own client facing API that is not kerberized. And so on.


was (Author: apurtell):
bq. I meant ZK ACLs help ensure the PQS instances are able to register 
themselves in a trusted location which clients can then refer to.
Oh sure that makes sense.

> Load Balancer for thin client
> -
>
> Key: PHOENIX-3654
> URL: https://issues.apache.org/jira/browse/PHOENIX-3654
> Project: Phoenix
>  Issue Type: New Feature
>Affects Versions: 4.8.0
> Environment: Linux 3.13.0-107-generic kernel, v4.9.0-HBase-0.98
>Reporter: Rahul Shrivastava
> Fix For: 4.9.0
>
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> We have been having internal discussion on load balancer for thin client for 
> PQS. The general consensus we have is to have an embedded load balancer with 
> the thin client instead of using external load balancer such as haproxy. The 
> idea is to not to have another layer between client and PQS. This reduces 
> operational cost for system, which currently leads to delay in executing 
> projects.
> But this also comes with challenge of having an embedded load balancer which 
> can maintain sticky sessions, do fair load balancing knowing the load 
> downstream of PQS server. In addition, load balancer needs to know location 
> of multiple PQS server. Now, the thin client needs to keep track of PQS 
> servers via zookeeper ( or other means). 
> In the new design, it is proposed that the client (the PQS client) have an 
> embedded load balancer.
> Where will the load balancer sit?
> The load balancer will be embedded within the app server client.
> How will the load balancer work ? 
> Load balancer will contact zookeeper to get location of PQS. In this case, 
> PQS needs to register to ZK itself once it comes online. Zookeeper location 
> is in hbase-site.xml. It will maintain a small cache of connection to the 
> PQS. When a request comes in, it will check for an open connection from the 
> cache. 
> How will load balancer know load on PQS ?
> To start with, it will pick a random open connection to PQS. This means that 
> load balancer does not know PQS load. Later , we can augment the code so that 
> thin client can receive load info from PQS and make intelligent decisions.  
> How will load balancer maintain sticky sessions ?
> While we still need to investigate how to implement sticky sessions. We can 
> look for some open source implementation for the same.
> How will PQS register itself to service locator ?
> PQS will have location of zookeeper in hbase-site.xml and it would register 
> itself to the zookeeper. Thin client will find out PQS location using 
> zookeeper.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3654) Load Balancer for thin client

2017-02-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864701#comment-15864701
 ] 

Andrew Purtell commented on PHOENIX-3654:
-

bq. I meant ZK ACLs help ensure the PQS instances are able to register 
themselves in a trusted location which clients can then refer to.
Oh sure that makes sense.

> Load Balancer for thin client
> -
>
> Key: PHOENIX-3654
> URL: https://issues.apache.org/jira/browse/PHOENIX-3654
> Project: Phoenix
>  Issue Type: New Feature
>Affects Versions: 4.8.0
> Environment: Linux 3.13.0-107-generic kernel, v4.9.0-HBase-0.98
>Reporter: Rahul Shrivastava
> Fix For: 4.9.0
>
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> We have been having internal discussion on load balancer for thin client for 
> PQS. The general consensus we have is to have an embedded load balancer with 
> the thin client instead of using external load balancer such as haproxy. The 
> idea is to not to have another layer between client and PQS. This reduces 
> operational cost for system, which currently leads to delay in executing 
> projects.
> But this also comes with challenge of having an embedded load balancer which 
> can maintain sticky sessions, do fair load balancing knowing the load 
> downstream of PQS server. In addition, load balancer needs to know location 
> of multiple PQS server. Now, the thin client needs to keep track of PQS 
> servers via zookeeper ( or other means). 
> In the new design, it is proposed that the client (the PQS client) have an 
> embedded load balancer.
> Where will the load balancer sit?
> The load balancer will be embedded within the app server client.
> How will the load balancer work ? 
> Load balancer will contact zookeeper to get location of PQS. In this case, 
> PQS needs to register to ZK itself once it comes online. Zookeeper location 
> is in hbase-site.xml. It will maintain a small cache of connection to the 
> PQS. When a request comes in, it will check for an open connection from the 
> cache. 
> How will load balancer know load on PQS ?
> To start with, it will pick a random open connection to PQS. This means that 
> load balancer does not know PQS load. Later , we can augment the code so that 
> thin client can receive load info from PQS and make intelligent decisions.  
> How will load balancer maintain sticky sessions ?
> While we still need to investigate how to implement sticky sessions. We can 
> look for some open source implementation for the same.
> How will PQS register itself to service locator ?
> PQS will have location of zookeeper in hbase-site.xml and it would register 
> itself to the zookeeper. Thin client will find out PQS location using 
> zookeeper.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3666) Make use of EncodedColumnQualifierCellsList for all column name mapping schemes

2017-02-13 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864680#comment-15864680
 ] 

James Taylor commented on PHOENIX-3666:
---

+1. That's a reasonable solution, [~samarthjain].

> Make use of EncodedColumnQualifierCellsList for all column name mapping 
> schemes
> ---
>
> Key: PHOENIX-3666
> URL: https://issues.apache.org/jira/browse/PHOENIX-3666
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Samarth Jain
>Assignee: Samarth Jain
> Attachments: PHOENIX-3666.patch, PHOENIX-3666_wip.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3585) MutableIndexIT testSplitDuringIndexScan and testIndexHalfStoreFileReader fail for transactional tables and local indexes

2017-02-13 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864670#comment-15864670
 ] 

James Taylor commented on PHOENIX-3585:
---

[~rajeshbabu] - can't the LocalIndexStoreFileScanner delegate to the 
InternalScanner for its next calls? The alternative is to not allow local 
indexes on transactional tables, which would be a shame. The current logic 
would be pretty disastrous, as I think the local index would become corrupt, 
no? 

> MutableIndexIT testSplitDuringIndexScan and testIndexHalfStoreFileReader fail 
> for transactional tables and local indexes
> 
>
> Key: PHOENIX-3585
> URL: https://issues.apache.org/jira/browse/PHOENIX-3585
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Attachments: diff.patch
>
>
> The tests fail if we use HDFSTransactionStateStorage instead of 
> InMemoryTransactionStateStorage when we create the TransactionManager in 
> BaseTest.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3655) Metrics for PQS

2017-02-13 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864668#comment-15864668
 ] 

Josh Elser commented on PHOENIX-3655:
-

bq. we want the PQS driver to export the same metrics through the same 
mechanism(s) as the fat driver. That way we can swap one for the other with 
minimal operational changes including visibility into operations via metrics.

Makes sense. I've spent a lot (too much?) time thinking about this from the 
Avatica standpoint (understanding the perf/characteristics of Avatica, 
regardless of database), so I may be conflating what [~rahulshrivastava] is 
planning with the big picture of what I'd like to see :)

If the goal is to just expose the thick-driver's metrics via PQS, this one 
should be pretty easy. If we want to go farther and really understand the rest 
of the picture, it gets trickier pretty fast :)

> Metrics for PQS
> ---
>
> Key: PHOENIX-3655
> URL: https://issues.apache.org/jira/browse/PHOENIX-3655
> Project: Phoenix
>  Issue Type: New Feature
>Affects Versions: 4.8.0
> Environment: Linux 3.13.0-107-generic kernel, v4.9.0-HBase-0.98
>Reporter: Rahul Shrivastava
> Fix For: 4.9.0
>
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Phoenix Query Server runs as a separate process from its thin client. 
> Metrics collection is currently done by PhoenixRuntime.java, i.e. at the 
> Phoenix driver level. We need the following:
> 1. For every JDBC statement/prepared statement run by PQS, the capability to 
> collect metrics at the PQS level and push the data to an external sink, 
> i.e. a file, JMX, or other external custom sources.
> 2. In addition, global metrics could be periodically collected and pushed to 
> the sink.
> 3. PQS can be configured to turn on metrics collection and the type of 
> collection (runtime or global) via hbase-site.xml.
> 4. The sink could be configured via an interface in hbase-site.xml.
> All metrics are defined at https://phoenix.apache.org/metrics.html
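
One possible shape for the configurable sink (item 4 above), shown only as a 
sketch; the interface, the property name, and the classes below are 
hypothetical, not an existing Phoenix or Avatica API.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

// Hypothetical pluggable sink, chosen via a class name in hbase-site.xml.
interface PqsMetricsSink {
    void publish(String metricName, long value);
}

// Trivial default implementation.
class LoggingSink implements PqsMetricsSink {
    public void publish(String metricName, long value) {
        System.out.println(metricName + "=" + value);
    }
}

public class SinkLoader {
    public static PqsMetricsSink load(Configuration conf) {
        // "phoenix.queryserver.metrics.sink.class" is a made-up property name.
        Class<? extends PqsMetricsSink> clazz = conf.getClass(
                "phoenix.queryserver.metrics.sink.class",
                LoggingSink.class, PqsMetricsSink.class);
        return ReflectionUtils.newInstance(clazz, conf);
    }

    public static void main(String[] args) {
        load(new Configuration()).publish("SELECT_SQL_COUNTER", 42L);
    }
}
{code}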



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3661) Make phoenix tool select file system dynamically

2017-02-13 Thread Zach York (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864667#comment-15864667
 ] 

Zach York commented on PHOENIX-3661:


Thanks for the quick review guys!

> Make phoenix tool select file system dynamically
> 
>
> Key: PHOENIX-3661
> URL: https://issues.apache.org/jira/browse/PHOENIX-3661
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0, 4.8.0
>Reporter: Yishan Yang
> Attachments: phoenix-3661-1.patch
>
>
> The Phoenix indexing tool assumes that the root directory is on the default 
> Hadoop FileSystem. With this patch, the Phoenix index tool gets the file 
> system dynamically, which prevents “Wrong FileSystem” errors.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3661) Make phoenix tool select file system dynamically

2017-02-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864654#comment-15864654
 ] 

Andrew Purtell commented on PHOENIX-3661:
-

Me too, I'll commit now

> Make phoenix tool select file system dynamically
> 
>
> Key: PHOENIX-3661
> URL: https://issues.apache.org/jira/browse/PHOENIX-3661
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0, 4.8.0
>Reporter: Yishan Yang
> Attachments: phoenix-3661-1.patch
>
>
> The Phoenix indexing tool assumes that the root directory is on the default 
> Hadoop FileSystem. With this patch, the Phoenix index tool gets the file 
> system dynamically, which prevents “Wrong FileSystem” errors.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3655) Metrics for PQS

2017-02-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864651#comment-15864651
 ] 

Andrew Purtell commented on PHOENIX-3655:
-

I think wherever it makes sense, we want the PQS driver to export the same 
metrics through the same mechanism(s) as the fat driver. That way we can swap 
one for the other with minimal operational changes including visibility into 
operations via metrics.

> Metrics for PQS
> ---
>
> Key: PHOENIX-3655
> URL: https://issues.apache.org/jira/browse/PHOENIX-3655
> Project: Phoenix
>  Issue Type: New Feature
>Affects Versions: 4.8.0
> Environment: Linux 3.13.0-107-generic kernel, v4.9.0-HBase-0.98
>Reporter: Rahul Shrivastava
> Fix For: 4.9.0
>
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Phoenix Query Server runs as a separate process from its thin client. 
> Metrics collection is currently done by PhoenixRuntime.java, i.e. at the 
> Phoenix driver level. We need the following:
> 1. For every JDBC statement/prepared statement run by PQS, the capability to 
> collect metrics at the PQS level and push the data to an external sink, 
> i.e. a file, JMX, or other external custom sources.
> 2. In addition, global metrics could be periodically collected and pushed to 
> the sink.
> 3. PQS can be configured to turn on metrics collection and the type of 
> collection (runtime or global) via hbase-site.xml.
> 4. The sink could be configured via an interface in hbase-site.xml.
> All metrics are defined at https://phoenix.apache.org/metrics.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (PHOENIX-3654) Load Balancer for thin client

2017-02-13 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864647#comment-15864647
 ] 

Andrew Purtell edited comment on PHOENIX-3654 at 2/13/17 11:18 PM:
---

bq. On the security of malicious PQS, kerborzing the PQS and ZK, will probably 
help the situation.

bq. Kerberos and ZK ACLs should give us sufficient control to solve the problem

FWIW we'd like to use the PQS as a fulcrum to switch away from Kerberos auth to 
TLS auth to, eventually, avoid any client having to deal with Kerberos.


was (Author: apurtell):
bq. Kerberos and ZK ACLs should give us sufficient control to solve the problem

FWIW we'd like to use the PQS as a fulcrum to switch away from Kerberos auth to 
TLS auth to, eventually, avoid any client having to deal with Kerberos.

> Load Balancer for thin client
> -
>
> Key: PHOENIX-3654
> URL: https://issues.apache.org/jira/browse/PHOENIX-3654
> Project: Phoenix
>  Issue Type: New Feature
>Affects Versions: 4.8.0
> Environment: Linux 3.13.0-107-generic kernel, v4.9.0-HBase-0.98
>Reporter: Rahul Shrivastava
> Fix For: 4.9.0
>
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> We have been having internal discussion on load balancer for thin client for 
> PQS. The general consensus we have is to have an embedded load balancer with 
> the thin client instead of using external load balancer such as haproxy. The 
> idea is to not to have another layer between client and PQS. This reduces 
> operational cost for system, which currently leads to delay in executing 
> projects.
> But this also comes with challenge of having an embedded load balancer which 
> can maintain sticky sessions, do fair load balancing knowing the load 
> downstream of PQS server. In addition, load balancer needs to know location 
> of multiple PQS server. Now, the thin client needs to keep track of PQS 
> servers via zookeeper ( or other means). 
> In the new design, it is proposed that the client (the PQS client) have an 
> embedded load balancer.
> Where will the load balancer sit?
> The load balancer will be embedded within the app server client.
> How will the load balancer work ? 
> Load balancer will contact zookeeper to get location of PQS. In this case, 
> PQS needs to register to ZK itself once it comes online. Zookeeper location 
> is in hbase-site.xml. It will maintain a small cache of connection to the 
> PQS. When a request comes in, it will check for an open connection from the 
> cache. 
> How will load balancer know load on PQS ?
> To start with, it will pick a random open connection to PQS. This means that 
> load balancer does not know PQS load. Later , we can augment the code so that 
> thin client can receive load info from PQS and make intelligent decisions.  
> How will load balancer maintain sticky sessions ?
> While we still need to investigate how to implement sticky sessions. We can 
> look for some open source implementation for the same.
> How will PQS register itself to service locator ?
> PQS will have location of zookeeper in hbase-site.xml and it would register 
> itself to the zookeeper. Thin client will find out PQS location using 
> zookeeper.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3667) Optimize BooleanExpressionFilter for tables with encoded columns

2017-02-13 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-3667:
--
Description: The client side of Phoenix determines the subclass of 
BooleanExpressionFilter we use based on how many column families and column 
qualifiers are being referenced. The idea is to minimize the lookup cost during 
filter evaluation. For encoded columns, instead of using a Map or Set, we can 
create a few new subclasses of BooleanExpressionFilter that use an array 
instead. No need for any lookups or equality checks - just fill in the position 
based on the column qualifier value instead. Since filters are applied on every 
row between the start/stop key, this will improve performance quite a bit.  
(was: The client side of Phoenix determines the subclass of 
BooleanExpressionFilter we use based on how many column families and column 
qualifiers are being referenced. The idea is to minimize the lookup cost during 
filter evaluation. For encoded columns, instead of using a Map or Set, we can 
use an array. No need for any lookups or equality checks - just fill in the 
position based on the column qualifier value instead. Since filters are applied 
on every row between the start/stop key, this will help quite a bit.)

> Optimize BooleanExpressionFilter for tables with encoded columns
> 
>
> Key: PHOENIX-3667
> URL: https://issues.apache.org/jira/browse/PHOENIX-3667
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: James Taylor
>Assignee: Samarth Jain
>
> The client side of Phoenix determines the subclass of BooleanExpressionFilter 
> we use based on how many column families and column qualifiers are being 
> referenced. The idea is to minimize the lookup cost during filter evaluation. 
> For encoded columns, instead of using a Map or Set, we can create a few new 
> subclasses of BooleanExpressionFilter that use an array instead. No need for 
> any lookups or equality checks - just fill in the position based on the 
> column qualifier value instead. Since filters are applied on every row 
> between the start/stop key, this will improve performance quite a bit.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (PHOENIX-3667) Optimize BooleanExpressionFilter for tables with encoded columns

2017-02-13 Thread James Taylor (JIRA)
James Taylor created PHOENIX-3667:
-

 Summary: Optimize BooleanExpressionFilter for tables with encoded 
columns
 Key: PHOENIX-3667
 URL: https://issues.apache.org/jira/browse/PHOENIX-3667
 Project: Phoenix
  Issue Type: Improvement
Reporter: James Taylor
Assignee: Samarth Jain


The client side of Phoenix determines the subclass of BooleanExpressionFilter 
we use based on how many column families and column qualifiers are being 
referenced. The idea is to minimize the lookup cost during filter evaluation. 
For encoded columns, instead of using a Map or Set, we can use an array. No 
need for any lookups or equality checks - just fill in the position based on 
the column qualifier value instead. Since filters are applied on every row 
between the start/stop key, this will help quite a bit.
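
Illustrative only (all names hypothetical): the array-based variant replaces 
per-cell map lookups with a direct index computed from the encoded qualifier.
{code}
import java.util.Arrays;

import org.apache.hadoop.hbase.Cell;

// Sketch of the lookup structure: encoded qualifiers are small integers, so a
// flat array indexed by (qualifier - minQualifier) replaces a Map/Set lookup.
public class EncodedColumnSlots {
    private final int minQualifier;
    private final Cell[] slots;

    public EncodedColumnSlots(int minQualifier, int maxQualifier) {
        this.minQualifier = minQualifier;
        this.slots = new Cell[maxQualifier - minQualifier + 1];
    }

    public void put(int encodedQualifier, Cell cell) {
        slots[encodedQualifier - minQualifier] = cell;  // no hashing or equals()
    }

    public Cell get(int encodedQualifier) {
        return slots[encodedQualifier - minQualifier];
    }

    public void reset() {                               // reuse between rows
        Arrays.fill(slots, null);
    }
}
{code}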



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3585) MutableIndexIT testSplitDuringIndexScan and testIndexHalfStoreFileReader fail for transactional tables and local indexes

2017-02-13 Thread Thomas D'Silva (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864635#comment-15864635
 ] 

Thomas D'Silva commented on PHOENIX-3585:
-

[~rajeshbabu]

Do you know how we can combine the InternalScanner that is passed into  
IndexHalfStoreFileReaderGenerator.preCompactScannerOpen() with the scanner that 
it creates?

> MutableIndexIT testSplitDuringIndexScan and testIndexHalfStoreFileReader fail 
> for transactional tables and local indexes
> 
>
> Key: PHOENIX-3585
> URL: https://issues.apache.org/jira/browse/PHOENIX-3585
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Attachments: diff.patch
>
>
> the tests fail if we use HDFSTransactionStateStorage instead of  
> InMemoryTransactionStateStorage when we create the TransactionManager in 
> BaseTest



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3660) Don't pass statement properties while adding columns to a table that already exists that had APPEND_ONLY_SCHEMA=true

2017-02-13 Thread Thomas D'Silva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva updated PHOENIX-3660:

Fix Version/s: 4.10.0

> Don't pass statement properties while adding columns to a table that already 
> exists that had APPEND_ONLY_SCHEMA=true
> 
>
> Key: PHOENIX-3660
> URL: https://issues.apache.org/jira/browse/PHOENIX-3660
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3660.patch
>
>
> If the table has APPEND_ONLY_SCHEMA set to true, we should only add new 
> columns and ignore any supplied properties.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3660) Don't pass statement properties while adding columns to a table that already exists that had APPEND_ONLY_SCHEMA=true

2017-02-13 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864533#comment-15864533
 ] 

Samarth Jain commented on PHOENIX-3660:
---

+1

> Don't pass statement properties while adding columns to a table that already 
> exists that had APPEND_ONLY_SCHEMA=true
> 
>
> Key: PHOENIX-3660
> URL: https://issues.apache.org/jira/browse/PHOENIX-3660
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Attachments: PHOENIX-3660.patch
>
>
> If the table has APPEND_ONLY_SCHEMA set to true, we should only add new 
> columns and ignore any supplied properties.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (PHOENIX-2051) Link record is in the format CHILD-PARENT for phoenix views and it has to scan the entire table to find the parent suffix.

2017-02-13 Thread Thomas D'Silva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva reassigned PHOENIX-2051:
---

Assignee: Thomas D'Silva

> Link record is in the format CHILD-PARENT for phoenix views and it has to 
> scan the entire table to find the parent suffix.
> --
>
> Key: PHOENIX-2051
> URL: https://issues.apache.org/jira/browse/PHOENIX-2051
> Project: Phoenix
>  Issue Type: Sub-task
>Affects Versions: 4.3.1
>Reporter: Arun Kumaran Sabtharishi
>Assignee: Thomas D'Silva
>
> When a phoenix view is dropped, it runs a scan on the SYSTEM.CATALOG table 
> looking for the link record. Since the link record is in the format 
> CHILD-PARENT, it has to scan the entire table to find the parent suffix. As a 
> long-term solution, we can write two link records, the existing CHILD-PARENT 
> and a new PARENT-CHILD, so that the findChildViews() method can use a key 
> range scan.
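
A minimal sketch of why a PARENT-CHILD link would make findChildViews() cheaper. 
The row-key layout (parent view name as the leading part of the link row key) is 
an assumption for illustration only, not the actual SYSTEM.CATALOG schema; the 
point is just that a known prefix turns a full-table scan into a bounded key 
range scan.

{code}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative only: build a prefix scan over hypothetical PARENT-CHILD link rows
// whose keys start with the parent view's full name.
public class ParentChildLinkScanSketch {
    public static Scan childLinksOf(String parentFullName) {
        byte[] startRow = Bytes.toBytes(parentFullName);
        byte[] stopRow = Bytes.copy(startRow);
        // Simplified exclusive stop row: bump the last byte (ignores a trailing 0xFF).
        stopRow[stopRow.length - 1]++;
        Scan scan = new Scan();
        scan.setStartRow(startRow);   // scan only keys with this prefix...
        scan.setStopRow(stopRow);     // ...instead of the entire catalog table
        return scan;
    }
}
{code}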



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3660) Don't pass statement properties while adding columns to a table that already exists that had APPEND_ONLY_SCHEMA=true

2017-02-13 Thread Thomas D'Silva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva updated PHOENIX-3660:

Attachment: PHOENIX-3660.patch

> Don't pass statement properties while adding columns to a table that already 
> exists that had APPEND_ONLY_SCHEMA=true
> 
>
> Key: PHOENIX-3660
> URL: https://issues.apache.org/jira/browse/PHOENIX-3660
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Attachments: PHOENIX-3660.patch
>
>
> If the table has APPEND_ONLY_SCHEMA set to true, we should only add new 
> columns and ignore any supplied properties.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3360) Secondary index configuration is wrong

2017-02-13 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864347#comment-15864347
 ] 

James Taylor commented on PHOENIX-3360:
---

Thanks for checking that out, [~enis]. So it sounds like [~rajeshbabu]'s patch 
is the way to go, right?

> Secondary index configuration is wrong
> --
>
> Key: PHOENIX-3360
> URL: https://issues.apache.org/jira/browse/PHOENIX-3360
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Rajeshbabu Chintaguntla
>Priority: Critical
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3360.patch, PHOENIX-3360-v2.PATCH, 
> PHOENIX-3360-v3.PATCH
>
>
> IndexRpcScheduler allocates some handler threads and uses a higher priority 
> for RPCs. The corresponding IndexRpcController is not used by default as it 
> is, but used through ServerRpcControllerFactory that we configure from Ambari 
> by default which sets the priority of the outgoing RPCs to either metadata 
> priority, or the index priority.
> However, after reading code of IndexRpcController / ServerRpcController it 
> seems that the IndexRPCController DOES NOT look at whether the outgoing RPC 
> is for an Index table or not. It just sets ALL rpc priorities to be the index 
> priority. The intention seems to be the case that ONLY on servers, we 
> configure ServerRpcControllerFactory, and with clients we NEVER configure 
> ServerRpcControllerFactory, but instead use ClientRpcControllerFactory. We 
> configure ServerRpcControllerFactory from Ambari, which in affect makes it so 
> that ALL rpcs from Phoenix are only handled by the index handlers by default. 
> It means all deadlock cases are still there. 
> The documentation in https://phoenix.apache.org/secondary_indexing.html is 
> also wrong in this sense. It does not talk about server side / client side. 
> Plus this way of configuring different values is not how HBase configuration 
> is deployed. We cannot have the configuration show the 
> ServerRpcControllerFactory even only for server nodes, because the clients 
> running on those nodes will also see the wrong values. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3666) Make use of EncodedColumnQualifierCellsList for all column name mapping schemes

2017-02-13 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864340#comment-15864340
 ] 

James Taylor commented on PHOENIX-3666:
---

I wouldn't want to impact performance with an extra sort (as perf is one of the 
main reasons we're doing this). I think using 2 bytes is reasonable, since if 
you need more than 65K columns the sparseness is going to end up being a problem.

The client side should be able to use the number of encoded bytes of the 
concrete PTable for the length of the column qualifier in any reserved column 
qualifiers. If you're stuck on a subquery issue, I'd ping [~maryannxue], and if 
you're stuck on a local index issue, I'd ping [~rajeshbabu]. I don't think we 
should wait any longer to figure it out for 4.10. Not surfacing the setting of 
the number of bytes is fine for now.
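
For reference, a small sketch of the arithmetic behind the 65K figure: two bytes 
address 2^16 = 65,536 distinct qualifiers. This is not Phoenix's actual 
qualifier codec, just an illustrative big-endian two-byte encode/decode pair.

{code}
// Illustrative only, not Phoenix's encoding scheme: a fixed two-byte qualifier
// gives 2^16 = 65,536 addressable columns.
public class TwoByteQualifierSketch {

    static byte[] encode(int qualifier) {
        if (qualifier < 0 || qualifier > 0xFFFF) {
            throw new IllegalArgumentException("qualifier out of 2-byte range: " + qualifier);
        }
        return new byte[] { (byte) (qualifier >>> 8), (byte) qualifier };
    }

    static int decode(byte[] encoded) {
        return ((encoded[0] & 0xFF) << 8) | (encoded[1] & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(4242)));  // 4242 round-trips
        System.out.println(1 << 16);               // 65536 addressable qualifiers
    }
}
{code}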

> Make use of EncodedColumnQualifierCellsList for all column name mapping 
> schemes
> ---
>
> Key: PHOENIX-3666
> URL: https://issues.apache.org/jira/browse/PHOENIX-3666
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Samarth Jain
>Assignee: Samarth Jain
> Attachments: PHOENIX-3666_wip.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (PHOENIX-3446) Parameterize tests for different encoding and storage schemes

2017-02-13 Thread Thomas D'Silva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva resolved PHOENIX-3446.
-
Resolution: Fixed

> Parameterize tests for different encoding and storage schemes
> -
>
> Key: PHOENIX-3446
> URL: https://issues.apache.org/jira/browse/PHOENIX-3446
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Samarth Jain
>Assignee: Thomas D'Silva
> Attachments: PHOENIX-3446.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3446) Parameterize tests for different encoding and storage schemes

2017-02-13 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864287#comment-15864287
 ] 

Samarth Jain commented on PHOENIX-3446:
---

+1, looks great. Thanks, Thomas!

> Parameterize tests for different encoding and storage schemes
> -
>
> Key: PHOENIX-3446
> URL: https://issues.apache.org/jira/browse/PHOENIX-3446
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Samarth Jain
>Assignee: Thomas D'Silva
> Attachments: PHOENIX-3446.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3572) Support FETCH NEXT| n ROWS from Cursor

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864210#comment-15864210
 ] 

ASF GitHub Bot commented on PHOENIX-3572:
-

Github user bijugs commented on the issue:

https://github.com/apache/phoenix/pull/229
  
@ankitsinghal, thanks for the review comments. I have made the changes 
addressing them, and will rebase the code into a single commit once the review 
process is complete.


> Support FETCH NEXT| n ROWS from Cursor
> --
>
> Key: PHOENIX-3572
> URL: https://issues.apache.org/jira/browse/PHOENIX-3572
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Biju Nair
>Assignee: Biju Nair
>
> Implement required changes to support 
> - {{DECLARE}} and {{OPEN}} a cursor
> - query {{FETCH NEXT | n ROWS}} from the cursor
> - {{CLOSE}} the cursor
> Based on the feedback in [PR 
> #192|https://github.com/apache/phoenix/pull/192], implement the changes using 
> {{ResultSet}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (PHOENIX-3601) PhoenixRDD doesn't expose the preferred node locations to Spark

2017-02-13 Thread Josh Mahonin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Mahonin resolved PHOENIX-3601.
---
   Resolution: Fixed
Fix Version/s: 4.10.0

> PhoenixRDD doesn't expose the preferred node locations to Spark
> ---
>
> Key: PHOENIX-3601
> URL: https://issues.apache.org/jira/browse/PHOENIX-3601
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: Josh Mahonin
>Assignee: Josh Mahonin
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3601.patch
>
>
> Follow-up to PHOENIX-3600, in order to let Spark know the preferred node 
> locations to assign partitions to, we need to update PhoenixRDD to retrieve 
> the underlying node location information from the splits.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3600) Core MapReduce classes don't provide location info

2017-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863986#comment-15863986
 ] 

Hudson commented on PHOENIX-3600:
-

FAILURE: Integrated in Jenkins build Phoenix-master #1550 (See 
[https://builds.apache.org/job/Phoenix-master/1550/])
PHOENIX-3600 Core MapReduce classes don't provide location info (jmahonin: rev 
267323da8242fb6f0953c1a75cf96c5fde3d49ed)
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/PhoenixInputFormat.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/PhoenixInputSplit.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/util/PhoenixConfigurationUtil.java


> Core MapReduce classes don't provide location info
> --
>
> Key: PHOENIX-3600
> URL: https://issues.apache.org/jira/browse/PHOENIX-3600
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: Josh Mahonin
>Assignee: Josh Mahonin
> Attachments: PHOENIX-3600.patch, PHOENIX-3600_v2.patch
>
>
> The core MapReduce classes {{org.apache.phoenix.mapreduce.PhoenixInputSplit}} 
> and {{org.apache.phoenix.mapreduce.PhoenixInputFormat}} don't provide region 
> size or location information, leaving the execution engine (MR, Spark, etc.) 
> to randomly assign splits to nodes.
> Interestingly, the phoenix-hive module has reimplemented these classes, 
> including the node-aware functionality. We should port a subset of those 
> changes back to the core code so that other engines can make use of them.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3666) Make use of EncodedColumnQualifierCellsList for all column name mapping schemes

2017-02-13 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863980#comment-15863980
 ] 

James Taylor commented on PHOENIX-3666:
---

Let's just hard code column qualifiers as two bytes and not expose an option to 
the user to change it for now. Leave the new table column for the property, 
though, so that we can potentially fix in a point release.

> Make use of EncodedColumnQualifierCellsList for all column name mapping 
> schemes
> ---
>
> Key: PHOENIX-3666
> URL: https://issues.apache.org/jira/browse/PHOENIX-3666
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Samarth Jain
>Assignee: Samarth Jain
> Attachments: PHOENIX-3666_wip.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3600) Core MapReduce classes don't provide location info

2017-02-13 Thread Josh Mahonin (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863925#comment-15863925
 ] 

Josh Mahonin commented on PHOENIX-3600:
---

Looks like I broke 4.x-HBase-0.98. Fixing ASAP.

> Core MapReduce classes don't provide location info
> --
>
> Key: PHOENIX-3600
> URL: https://issues.apache.org/jira/browse/PHOENIX-3600
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0
>Reporter: Josh Mahonin
>Assignee: Josh Mahonin
> Attachments: PHOENIX-3600.patch, PHOENIX-3600_v2.patch
>
>
> The core MapReduce classes {{org.apache.phoenix.mapreduce.PhoenixInputSplit}} 
> and {{org.apache.phoenix.mapreduce.PhoenixInputFormat}} don't provide region 
> size or location information, leaving the execution engine (MR, Spark, etc.) 
> to randomly assign splits to nodes.
> Interestingly, the phoenix-hive module has reimplemented these classes, 
> including the node-aware functionality. We should port a subset of those 
> changes back to the core code so that other engines can make use of them.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3664) Pyspark: pushing filter by date against apache phoenix

2017-02-13 Thread Josh Mahonin (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863720#comment-15863720
 ] 

Josh Mahonin commented on PHOENIX-3664:
---

Hi [~pablo.castellanos]

I've not seen this before, although I wonder if there are perhaps a few issues 
at play.
1) Some sort of date translation issue between python datetime, pySpark and 
phoenix-spark
2) An issue with how Spark treats the 'java.sql.Date' type, and how Phoenix 
stores it internally

Re: 1) Is it possible to attempt a similar code block using Scala in the 
spark-shell? I think it should be pretty much the same code, just replace 
{{datetime.datetime.now}} with {{System.currentTimeMillis}}

Re: 2) You might have some success passing the 'dateAsTimestamp' flag to Spark. 
Effectively Spark truncates the HH:MM:SS part of a date off, even though it is 
present in the Phoenix data type. I wonder if pyspark is doing anything strange 
with that.

https://github.com/apache/phoenix/blob/a0e5efcec5a1a732b2dce9794251242c3d66eea6/phoenix-spark/src/it/scala/org/apache/phoenix/spark/PhoenixSparkIT.scala#L622-L633
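
For what it's worth, here is a rough Java equivalent of the suggested experiment 
(Spark 2.x SparkSession API). The table name and zkUrl are placeholders, and the 
'dateAsTimestamp' option is the flag from the PhoenixSparkIT link above; whether 
it changes the pushdown behaviour here is exactly what the experiment would show.

{code}
import java.sql.Timestamp;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PhoenixDateFilterCheck {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("phoenix-date-filter").getOrCreate();

        // Placeholder table/zkUrl; 'dateAsTimestamp' maps Phoenix DATE to a Spark Timestamp.
        Dataset<Row> df = spark.read()
                .format("org.apache.phoenix.spark")
                .option("table", "TABLENAME")
                .option("zkUrl", "zookeeperhost:2181:/hbase-unsecure")
                .option("dateAsTimestamp", "true")
                .load();

        Timestamp startValidation = new Timestamp(System.currentTimeMillis());
        // explain(true) shows whether the comparison is pushed down to Phoenix or
        // rewritten as a string cast, as in the plans quoted in the report.
        df.filter(df.col("FH").gt(startValidation)).explain(true);
    }
}
{code}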

> Pyspark: pushing filter by date against apache phoenix
> --
>
> Key: PHOENIX-3664
> URL: https://issues.apache.org/jira/browse/PHOENIX-3664
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Azure HDInsight - pyspark using phoenix client.
>Reporter: Pablo Castilla
>
> I am trying to filter by date in apache phoenix from pyspark. The column in 
> phoenix is created as Date and the filter is a datetime. When I use explain I 
> see spark doesn't push the filter to phoenix. I have tried a lot of 
> combinations without luck.
> Any way to do it?
> df = sqlContext.read \
>.format("org.apache.phoenix.spark") \
>   .option("table", "TABLENAME") \
>   .option("zkUrl",zookepperServer +":2181:/hbase-unsecure" ) \
>   .load()
> print(df.printSchema())
> startValidation = datetime.datetime.now()
> print(df.filter(df['FH'] >startValidation).explain(True))
> Results:
> root
>  |-- METER_ID: string (nullable = true)
>  |-- FH: date (nullable = true)
> None
>== Parsed Logical Plan ==
> 'Filter (FH#53 > 1486726683446150)
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Analyzed Logical Plan ==
> METER_ID: string, FH: date, SUMMERTIME: string, MAGNITUDE: int, SOURCE: int, 
> ENTRY_DATETIME: date, BC: string, T_VAL_AE: int, T_VAL_AI: int, T_VAL_R1: 
> int, T_VAL_R2: int, T_VAL_R3: int, T_VAL_R4: int
> Filter (cast(FH#53 as string) > cast(1486726683446150 as string))
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Optimized Logical Plan ==
> Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Physical Plan ==
> Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
> +- Scan 
> PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
> None
> if I set the FH column as timestamp it pushes the filter but throws an 
> exception:
> Caused by: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 
> (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "12" at line 
> 1, column 219.
> at 
> org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33)
> at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1280)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:1363)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1373)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.optimizeQuery(PhoenixStatement.java:1368)
> at 
> org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:122)
> ... 102 more
> Caused by: MismatchedTokenException(106!=129)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.recoverFromMismatchedToken(PhoenixSQLParser.java:360)
> at 
> org.apache.phoenix.shaded.org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
> 

[jira] [Commented] (PHOENIX-3665) Dataset api is missing phoenix spark connector for spark 2.0.2

2017-02-13 Thread Josh Mahonin (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863708#comment-15863708
 ] 

Josh Mahonin commented on PHOENIX-3665:
---

Can you verify if the fix for PHOENIX- solves this?

> Dataset api is missing phoenix spark connector for spark 2.0.2
> --
>
> Key: PHOENIX-3665
> URL: https://issues.apache.org/jira/browse/PHOENIX-3665
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Aavesh
>
> We have used the DataFrameFunctions class API to write DataFrames into HBase 
> tables. But for Spark 2.0.2, DataFrame is no longer available in the Java and 
> Scala APIs, so we need a phoenix-spark API for Spark 2.0.2 that can ingest 
> data into HBase tables using Dataset.
> Please advise whether this is already available, or else suggest a workaround.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3666) Make use of EncodedColumnQualifierCellsList for all column name mapping schemes

2017-02-13 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-3666:
--
Attachment: PHOENIX-3666_wip.patch

This turned out to be trickier than I anticipated. Essentially, I wanted to 
make sure that we are able to use all the different column mapping schemes, 
which turned up an issue in the way we are hardcoding "reserved" column 
qualifiers. To help resolve this, I thought serializing the encoding scheme in 
ProjectedColumnExpressions would help, but unfortunately that was a dark abyss. 
We generate various column expressions using intermediate PTable 
representations that don't (and can't) have the right encoding schemes in them. 
That took me down the path of attempting to use the right encoding scheme when 
we deserialize the expressions on the server side, but it made the code really 
fragile, as we serialize expressions everywhere and having to fix the scheme in 
all those places was just ugly. I ultimately decided to hard code the reserved 
column qualifiers (range 1-10) to be serialized using the ONE_BYTE_QUALIFIER 
scheme. I also relaxed the constraints in the encoding schemes so that byte 
arrays of size 1 are decoded with the ONE_BYTE_QUALIFIER encoding/decoding 
scheme. A side effect of this change is that the EncodedColumnQualifierCellsList 
is no longer sorted with respect to column qualifiers, because a one-byte 
qualifier representation of 0 lexicographically sorts after, say, a four-byte 
qualifier representation of 11. As a result, I need to sort the array of cells 
before creating a ResultTuple out of it. I am parking this patch as WIP since I 
only want to do the sorting when needed. 
All tests pass with this patch, though.
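
To make the last point concrete, a generic sketch of the extra sorting step (not 
the EncodedColumnQualifierCellsList internals): once 1-byte reserved qualifiers 
and wider encoded qualifiers can be mixed, the cells may need re-ordering before 
a ResultTuple is built. Which comparator is actually required (raw qualifier 
bytes, as below, or decoded qualifier values) is part of what still needs to be 
worked out here.

{code}
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.util.Bytes;

// Generic sketch: sort cells by their raw qualifier bytes before handing them off.
final class QualifierOrderSketch {
    static final Comparator<Cell> BY_QUALIFIER_BYTES = new Comparator<Cell>() {
        @Override
        public int compare(Cell a, Cell b) {
            return Bytes.compareTo(CellUtil.cloneQualifier(a), CellUtil.cloneQualifier(b));
        }
    };

    static void sortBeforeBuildingTuple(List<Cell> cells) {
        Collections.sort(cells, BY_QUALIFIER_BYTES);
    }
}
{code}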

> Make use of EncodedColumnQualifierCellsList for all column name mapping 
> schemes
> ---
>
> Key: PHOENIX-3666
> URL: https://issues.apache.org/jira/browse/PHOENIX-3666
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Samarth Jain
>Assignee: Samarth Jain
> Attachments: PHOENIX-3666_wip.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3666) Make use of EncodedColumnQualifierCellsList for all column name mapping schemes

2017-02-13 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-3666:
--
Issue Type: Sub-task  (was: Task)
Parent: PHOENIX-1598

> Make use of EncodedColumnQualifierCellsList for all column name mapping 
> schemes
> ---
>
> Key: PHOENIX-3666
> URL: https://issues.apache.org/jira/browse/PHOENIX-3666
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Samarth Jain
>Assignee: Samarth Jain
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (PHOENIX-3666) Make use of EncodedColumnQualifierCellsList for all column name mapping schemes

2017-02-13 Thread Samarth Jain (JIRA)
Samarth Jain created PHOENIX-3666:
-

 Summary: Make use of EncodedColumnQualifierCellsList for all 
column name mapping schemes
 Key: PHOENIX-3666
 URL: https://issues.apache.org/jira/browse/PHOENIX-3666
 Project: Phoenix
  Issue Type: Task
Reporter: Samarth Jain
Assignee: Samarth Jain






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (PHOENIX-3665) Dataset api is missing phoenix spark connector for spark 2.0.2

2017-02-13 Thread Aavesh (JIRA)
Aavesh created PHOENIX-3665:
---

 Summary: Dataset api is missing phoenix spark connector for spark 
2.0.2
 Key: PHOENIX-3665
 URL: https://issues.apache.org/jira/browse/PHOENIX-3665
 Project: Phoenix
  Issue Type: Bug
Reporter: Aavesh


We have used the DataFrameFunctions class API to write DataFrames into HBase 
tables. But for Spark 2.0.2, DataFrame is no longer available in the Java and 
Scala APIs, so we need a phoenix-spark API for Spark 2.0.2 that can ingest data 
into HBase tables using Dataset.

Please advise whether this is already available, or else suggest a workaround.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (PHOENIX-3664) Pyspark: pushing filter by date against apache phoenix

2017-02-13 Thread Pablo Castilla (JIRA)
Pablo Castilla created PHOENIX-3664:
---

 Summary: Pyspark: pushing filter by date against apache phoenix
 Key: PHOENIX-3664
 URL: https://issues.apache.org/jira/browse/PHOENIX-3664
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.7.0
 Environment: Azure HDInsight - pyspark using phoenix client.
Reporter: Pablo Castilla


I am trying to filter by date in apache phoenix from pyspark. The column in 
phoenix is created as Date and the filter is a datetime. When I use explain I 
see spark doesn't push the filter to phoenix. I have tried a lot of 
combinations without luck.

Any way to do it?

df = sqlContext.read \
   .format("org.apache.phoenix.spark") \
  .option("table", "TABLENAME") \
  .option("zkUrl",zookepperServer +":2181:/hbase-unsecure" ) \
  .load()
print(df.printSchema())

startValidation = datetime.datetime.now()

print(df.filter(df['FH'] >startValidation).explain(True))

Results:
root
 |-- METER_ID: string (nullable = true)
 |-- FH: date (nullable = true)

None
   == Parsed Logical Plan ==
'Filter (FH#53 > 1486726683446150)
+- 
Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
 PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)

== Analyzed Logical Plan ==
METER_ID: string, FH: date, SUMMERTIME: string, MAGNITUDE: int, SOURCE: int, 
ENTRY_DATETIME: date, BC: string, T_VAL_AE: int, T_VAL_AI: int, T_VAL_R1: int, 
T_VAL_R2: int, T_VAL_R3: int, T_VAL_R4: int
Filter (cast(FH#53 as string) > cast(1486726683446150 as string))
+- 
Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
 PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)

== Optimized Logical Plan ==
Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
+- 
Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
 PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)

== Physical Plan ==
Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
+- Scan 
PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
None

if I set the FH column as timestamp it pushes the filter but throws an 
exception:

Caused by: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 
(42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "12" at line 
1, column 219.
at 
org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33)
at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111)
at 
org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1280)
at 
org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:1363)
at 
org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1373)
at 
org.apache.phoenix.jdbc.PhoenixStatement.optimizeQuery(PhoenixStatement.java:1368)
at 
org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:122)
... 102 more
Caused by: MismatchedTokenException(106!=129)
at 
org.apache.phoenix.parse.PhoenixSQLParser.recoverFromMismatchedToken(PhoenixSQLParser.java:360)
at 
org.apache.phoenix.shaded.org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
at 
org.apache.phoenix.parse.PhoenixSQLParser.not_expression(PhoenixSQLParser.java:6862)
at 
org.apache.phoenix.parse.PhoenixSQLParser.and_expression(PhoenixSQLParser.java:6677)
at 
org.apache.phoenix.parse.PhoenixSQLParser.or_expression(PhoenixSQLParser.java:6614)
at 
org.apache.phoenix.parse.PhoenixSQLParser.expression(PhoenixSQLParser.java:6579)
at 
org.apache.phoenix.parse.PhoenixSQLParser.single_select(PhoenixSQLParser.java:4615)
at 
org.apache.phoenix.parse.PhoenixSQLParser.unioned_selects(PhoenixSQLParser.java:4697)
at 
org.apache.phoenix.parse.PhoenixSQLParser.select_node(PhoenixSQLParser.java:4763)
at 
org.apache.phoenix.parse.PhoenixSQLParser.oneStatement(PhoenixSQLParser.java:789)
at 
org.apache.phoenix.parse.PhoenixSQLParser.statement(PhoenixSQLParser.java:508)
at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:108)
... 107 more



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3572) Support FETCH NEXT| n ROWS from Cursor

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863298#comment-15863298
 ] 

ASF GitHub Bot commented on PHOENIX-3572:
-

Github user ankitsinghal commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/229#discussion_r100738646
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/execute/CursorFetchPlan.java ---
@@ -0,0 +1,87 @@
+package org.apache.phoenix.execute;
+
+import java.sql.ParameterMetaData;
+import java.sql.SQLException;
+import java.util.List;
+import java.util.Set;
+
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.phoenix.compile.ExplainPlan;
+import org.apache.phoenix.compile.GroupByCompiler.GroupBy;
+import org.apache.phoenix.compile.OrderByCompiler.OrderBy;
+import org.apache.phoenix.compile.QueryPlan;
+import org.apache.phoenix.compile.RowProjector;
+import org.apache.phoenix.compile.StatementContext;
+import org.apache.phoenix.iterate.CursorResultIterator;
+import org.apache.phoenix.iterate.ParallelScanGrouper;
+import org.apache.phoenix.iterate.ResultIterator;
+import org.apache.phoenix.jdbc.PhoenixStatement.Operation;
+import org.apache.phoenix.parse.FilterableStatement;
+import org.apache.phoenix.query.KeyRange;
+import org.apache.phoenix.schema.TableRef;
+
+public class CursorFetchPlan extends DelegateQueryPlan {
+
+   //QueryPlan cursorQueryPlan;
+   private CursorResultIterator resultIterator;
+   private int fetchSize;
+
+   public CursorFetchPlan(QueryPlan cursorQueryPlan) {
+   super(cursorQueryPlan);
+   }
+
+
+   @Override
+   public ResultIterator iterator() throws SQLException {
+   // TODO Auto-generated method stub
+   StatementContext context = delegate.getContext();
+   if (resultIterator != null) {
+   return resultIterator;
+   } else {
+   context.getOverallQueryMetrics().startQuery();
+   resultIterator = (CursorResultIterator) 
delegate.iterator();
+   return resultIterator;
+   }
+   }
+
+   @Override
+   public ResultIterator iterator(ParallelScanGrouper scanGrouper) throws 
SQLException {
+   // TODO Auto-generated method stub
+   StatementContext context = delegate.getContext();
+   if (resultIterator != null) {
+   return resultIterator;
+   } else {
+   context.getOverallQueryMetrics().startQuery();
+   resultIterator = (CursorResultIterator) 
delegate.iterator(scanGrouper);
+   return resultIterator;
+   }
+   }
+
+   @Override
+   public ResultIterator iterator(ParallelScanGrouper scanGrouper, Scan 
scan) throws SQLException {
+   // TODO Auto-generated method stub
--- End diff --

Can you merge these iterators? They all do the same thing, and the base class 
will be calling iterator(ParallelScanGrouper scanGrouper, Scan scan) internally 
from the other overloaded methods with the special parameter values.
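
Roughly what that consolidation could look like (a fragment of the 
CursorFetchPlan class in the diff above, assuming DelegateQueryPlan really does 
funnel the other overloads through this method): keep the caching in one place 
and drop the duplicated overrides.

{code}
// Sketch only: single override, assuming the no-arg and single-arg iterator()
// overloads in the base class delegate to this method.
@Override
public ResultIterator iterator(ParallelScanGrouper scanGrouper, Scan scan) throws SQLException {
    if (resultIterator == null) {
        delegate.getContext().getOverallQueryMetrics().startQuery();
        resultIterator = (CursorResultIterator) delegate.iterator(scanGrouper, scan);
    }
    return resultIterator;
}
{code}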


> Support FETCH NEXT| n ROWS from Cursor
> --
>
> Key: PHOENIX-3572
> URL: https://issues.apache.org/jira/browse/PHOENIX-3572
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Biju Nair
>Assignee: Biju Nair
>
> Implement required changes to support 
> - {{DECLARE}} and {{OPEN}} a cursor
> - query {{FETCH NEXT | n ROWS}} from the cursor
> - {{CLOSE}} the cursor
> Based on the feedback in [PR 
> #192|https://github.com/apache/phoenix/pull/192], implement the changes using 
> {{ResultSet}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3572) Support FETCH NEXT| n ROWS from Cursor

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863300#comment-15863300
 ] 

ASF GitHub Bot commented on PHOENIX-3572:
-

Github user ankitsinghal commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/229#discussion_r100738871
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/util/CursorUtil.java ---
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.phoenix.util;
+
+import java.sql.Connection;
+import java.sql.SQLException;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.phoenix.compile.QueryPlan;
+import org.apache.phoenix.compile.OrderByCompiler.OrderBy;
+import org.apache.phoenix.execute.CursorFetchPlan;
+import org.apache.phoenix.iterate.CursorResultIterator;
+import org.apache.phoenix.parse.CloseStatement;
+import org.apache.phoenix.parse.DeclareCursorStatement;
+import org.apache.phoenix.parse.OpenStatement;
+import org.apache.phoenix.schema.tuple.Tuple;
+
+public final class CursorUtil {
+
+private static class CursorWrapper {
+private final String cursorName;
+private final String selectSQL;
+private boolean isOpen = false;
+QueryPlan queryPlan;
+ImmutableBytesWritable row;
+ImmutableBytesWritable previousRow;
+private Scan scan;
+private boolean moreValues=true;
+private boolean isReversed;
+private boolean islastCallNext;
+private CursorFetchPlan fetchPlan;
+private int offset = -1;
+
+private CursorWrapper(String cursorName, String selectSQL, 
QueryPlan queryPlan){
+this.cursorName = cursorName;
+this.selectSQL = selectSQL;
+this.queryPlan = queryPlan;
+this.islastCallNext = true;
+this.fetchPlan = new CursorFetchPlan(queryPlan);
+}
+
+private synchronized void openCursor(Connection conn) throws 
SQLException {
+if(isOpen){
+return;
+}
+this.scan = this.queryPlan.getContext().getScan();
+
isReversed=OrderBy.REV_ROW_KEY_ORDER_BY.equals(this.queryPlan.getOrderBy());
+isOpen = true;
+}
+
+private void closeCursor() throws SQLException {
+isOpen = false;
+((CursorResultIterator) fetchPlan.iterator()).closeCursor();
+//TODO: Determine if the cursor should be removed from the 
HashMap at this point.
+//Semantically it makes sense that something which is 'Closed' 
one should be able to 'Open' again.
+mapCursorIDQuery.remove(this.cursorName);
+}
+
+private QueryPlan getFetchPlan(boolean isNext, int fetchSize) 
throws SQLException {
+if (!isOpen)
+throw new SQLException("Fetch call on closed cursor '" + 
this.cursorName + "'!");
+
((CursorResultIterator)fetchPlan.iterator()).setFetchSize(fetchSize);
+if (!queryPlan.getStatement().isAggregate() || 
!queryPlan.getStatement().isDistinct()) { 
+   if (islastCallNext != isNext) {
+if (islastCallNext && !isReversed){
+   ScanUtil.setReversed(scan);
+} else {
+   ScanUtil.unsetReversed(scan);
+}
--- End diff --

This code seems to be for reverse/prior support and belongs to another JIRA. 
Can we remove it, in case it affects the functionality?


> Support FETCH NEXT| n ROWS from Cursor
> --
>
> Key: PHOENIX-3572
> URL: 

[jira] [Commented] (PHOENIX-3572) Support FETCH NEXT| n ROWS from Cursor

2017-02-13 Thread Ankit Singhal (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863302#comment-15863302
 ] 

Ankit Singhal commented on PHOENIX-3572:


[~gsbiju], this requires some cleanup, so I have left some feedback.
Ping [~jamestaylor] for review.

> Support FETCH NEXT| n ROWS from Cursor
> --
>
> Key: PHOENIX-3572
> URL: https://issues.apache.org/jira/browse/PHOENIX-3572
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Biju Nair
>Assignee: Biju Nair
>
> Implement required changes to support 
> - {{DECLARE}} and {{OPEN}} a cursor
> - query {{FETCH NEXT | n ROWS}} from the cursor
> - {{CLOSE}} the cursor
> Based on the feedback in [PR 
> #192|https://github.com/apache/phoenix/pull/192], implement the changes using 
> {{ResultSet}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3471) Allow accessing full (legacy) Phoenix EXPLAIN information via Calcite

2017-02-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863321#comment-15863321
 ] 

ASF GitHub Bot commented on PHOENIX-3471:
-

GitHub user gabrielreid opened a pull request:

https://github.com/apache/phoenix/pull/231

PHOENIX-3471 Add query plan matching system

Add a generic system for parsing and matching Calcite query
plans using Hamcrest matchers. The general intention is to make
matching of query plans less brittle and somewhat easier to write
than simply matching the full text of the query plan.
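
Not the matcher API added by this PR, but for a sense of the direction: even 
with stock Hamcrest, asserting on plan fragments is less brittle than comparing 
the full plan text, and the PR's plan-parsing matchers take that further.

{code}
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.allOf;
import static org.hamcrest.Matchers.containsString;

// Illustrative only (stock Hamcrest, not the matchers added by this PR).
public class PlanFragmentMatchingSketch {
    public static void main(String[] args) {
        String calcitePlan =
                "PhoenixToEnumerableConverter\n"
              + "  PhoenixServerAggregate(group=[{}], EXPR$0=[MAX($0)])\n"
              + "    PhoenixTableScan(table=[[phoenix, T1]])";

        // A full-text equality assertion breaks on any cosmetic change; matching
        // only the fragments the test cares about is more robust.
        assertThat(calcitePlan, allOf(
                containsString("PhoenixServerAggregate"),
                containsString("PhoenixTableScan(table=[[phoenix, T1]])")));
    }
}
{code}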

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gabrielreid/phoenix PHOENIX-3471_explain_plan

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/phoenix/pull/231.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #231


commit 128ff0a3288b0cfc48b068fc21704ca07278a33c
Author: Gabriel Reid 
Date:   2016-11-18T09:58:18Z

PHOENIX-3471 Add query plan matching system

Add a generic system for parsing and matching Calcite query
plans using Hamcrest matchers. The general intention is to make
matching of query plans less brittle and somewhat easier to write
than simply matching the full text of the query plan.




> Allow accessing full (legacy) Phoenix EXPLAIN information via Calcite
> -
>
> Key: PHOENIX-3471
> URL: https://issues.apache.org/jira/browse/PHOENIX-3471
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Gabriel Reid
>Assignee: Gabriel Reid
>
> The EXPLAIN syntax in Calcite-Phoenix (either "EXPLAIN " or "EXPLAIN 
> PLAN FOR ") currently returns the Calcite plan for a query. For example:
> {code}
> EXPLAIN SELECT MAX(I) FROM T1
> {code}
> results in the following Calcite explain plan:
> {code}
> PhoenixToEnumerableConverter
>   PhoenixServerAggregate(group=[{}], EXPR$0=[MAX($0)])
> PhoenixTableScan(table=[[phoenix, T1]])
> {code}
> and the following (legacy) Phoenix explain plan:
> {code}
> CLIENT PARALLEL 1-WAY FULL SCAN OVER T1
> SERVER FILTER BY FIRST KEY ONLY
> {code}
> There are currently a large number of integration tests which depend on the 
> legacy Phoenix format of explain plan, and this format is no longer available 
> when running via Calcite. PHOENIX-3105 added support for accessing the 
> explain plan via the "EXPLAIN " syntax, but this update to the syntax 
> still only provides the Calcite-specific explain plan.
> There are three main approaches which can be taken here:
> h4. Option 1: Custom EXPLAIN execution
> This approach extends the work done in PHOENIX-3105 to plug in a custom 
> SqlPhoenixExplain
> node which returns the legacy Phoenix explain plan, with the "EXPLAIN PLAN 
> FOR "
> syntax still returning the Calcite explain plan.
> h4. Option 2: Add the legacy Phoenix explain plan to the Calcite plan as a 
> top-level attribute
> This approach results in an explain plan that looks as follows:
> {code}
> PhoenixToEnumerableConverter(PhoenixExecutionPlan=[CLIENT PARALLEL 1-WAY FULL 
> SCAN OVER T1
> SERVER FILTER BY FIRST KEY ONLY])
>   PhoenixServerAggregate(group=[{}], EXPR$0=[MAX($0)])
> PhoenixTableScan(table=[[phoenix, T1]])
> {code}
> The disadvantage of this approach is that it's not really "correct" -- we're 
> just tacking 
> a different representation of the explain plan into the Calcite explain plan.
> The advantage of this approach is that it's very quick and easy to implement 
> (i.e. it
> can be done immediately), and it will require minimal changes to the many 
> test cases which have
> hard-coded explain plans that things are checked against. All we need to do 
> is have a 
> utility to extract the PhoenixExecutionPlan value from the full Calcite plan, 
> and other
> than that all test cases stay the same.
> h4. Option 3: Add all relevant information to the correct parts of the 
> Calcite explain plan
> This approach would result in an explain plan that looks as follows:
> {code}
> PhoenixToEnumerableConverter
>   PhoenixServerAggregate(group=[{}], EXPR$0=[MAX($0)])
> PhoenixTableScan(table=[[phoenix, T1]], scanType[CLIENT PARALLEL 1-WAY 
> FULL ])
> {code}
> This is undoubtedly the "right" way to do things. However, it has the major 
> disadvantage
> that it will require a large amount of work to do the following:
> * add all relevant information into various implementations of 
> {{AbstractRelNode.explainTerms}}
> * rework all test cases which verify things against an expected explain plan
> It is of course also an option is to start with option 2 here, and eventually 
> migrate to option 3.
> If we go for 

[jira] [Commented] (PHOENIX-3112) Partial row scan not handled correctly

2017-02-13 Thread Manivel Poomalai (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863149#comment-15863149
 ] 

Manivel Poomalai commented on PHOENIX-3112:
---

Hi James,

Thanks for responding by email. As per your suggestion, I have commented in 
JIRA instead of via email. Do you have any solution or workaround for this 
issue, and do you have a tentative date for when this JIRA will be resolved?

Thanks,
-Manivel


> Partial row scan not handled correctly
> --
>
> Key: PHOENIX-3112
> URL: https://issues.apache.org/jira/browse/PHOENIX-3112
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
>Reporter: Pierre Lacave
>
> When doing a select on a relatively large table (a few thousand rows), some 
> rows are returned with values partially missing.
> When tightening the filter to return those specific rows, the values appear 
> as expected.
> {noformat}
> CREATE TABLE IF NOT EXISTS TEST (
> BUCKET VARCHAR,
> TIMESTAMP_DATE TIMESTAMP,
> TIMESTAMP UNSIGNED_LONG NOT NULL,
> SRC VARCHAR,
> DST VARCHAR,
> ID VARCHAR,
> ION VARCHAR,
> IC BOOLEAN NOT NULL,
> MI UNSIGNED_LONG,
> AV UNSIGNED_LONG,
> MA UNSIGNED_LONG,
> CNT UNSIGNED_LONG,
> DUMMY VARCHAR
> CONSTRAINT pk PRIMARY KEY (BUCKET, TIMESTAMP DESC, SRC, DST, ID, ION, IC)
> );{noformat}
> using a python script to generate a CSV with 5000 rows
> {noformat}
> for i in xrange(5000):
> print "5SEC,2016-07-21 
> 07:25:35.{i},146908593500{i},,AAA,,,false,{i}1181000,1788000{i},2497001{i},{i},a{i}".format(i=i)
> {noformat}
> bulk inserting the csv in the table
> {noformat}
> phoenix/bin/psql.py localhost -t TEST large.csv
> {noformat}
> here we can see one row that contains no TIMESTAMP_DATE and null values in MI 
> and MA
> {noformat}
> 0: jdbc:phoenix:localhost:2181> select * from TEST 
> 
> +-+--+---+---+--+---+---++--+--+--+---++
> | BUCKET  |  TIMESTAMP_DATE  | TIMESTAMP |SRC| DST  | 
>  ID   |ION|   IC   |  MI  |  AV  |  MA  |  
> CNT  |   DUMMY
> |
> +-+--+---+---+--+---+---++--+--+--+---++
> | 5SEC| 2016-07-21 07:25:35.100  | 1469085935001000  |   | AAA  | 
>   |   | false  | 10001181000  | 17880001000  | 24970011000  | 
> 1000  | 
> a1000  |
> | 5SEC| 2016-07-21 07:25:35.999  | 146908593500999   |   | AAA  | 
>   |   | false  | 9991181000   | 1788000999   | 2497001999   | 999 
>   | a999  
>  |
> | 5SEC| 2016-07-21 07:25:35.998  | 146908593500998   |   | AAA  | 
>   |   | false  | 9981181000   | 1788000998   | 2497001998   | 998 
>   | a998  
>  |
> | 5SEC|  | 146908593500997   |   | AAA  | 
>   |   | false  | null | 1788000997   | null | 997 
>   |   
>  |
> | 5SEC| 2016-07-21 07:25:35.996  | 146908593500996   |   | AAA  | 
>   |   | false  | 9961181000   | 1788000996   | 2497001996   | 996 
>   | a996  
>  |
> | 5SEC| 2016-07-21 07:25:35.995  | 146908593500995   |   | AAA  | 
>   |   | false  | 9951181000   | 1788000995   | 2497001995   | 995 
>   | a995  
>  |
> | 5SEC| 2016-07-21 07:25:35.994  | 146908593500994   |   | AAA  | 
>   |   | false  | 9941181000   | 1788000994   | 2497001994   | 994 
>   | a994  
>  |
> 
> {noformat}
> but when selecting that row specifically the values are correct
> {noformat}
> 0: jdbc:phoenix:localhost:2181> select * from TEST where timestamp = 
> 146908593500997;
> 

[jira] [Commented] (PHOENIX-3471) Allow accessing full (legacy) Phoenix EXPLAIN information via Calcite

2017-02-13 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863323#comment-15863323
 ] 

Gabriel Reid commented on PHOENIX-3471:
---

(Finally) added a PR to pull the plumbing for this into the Calcite branch: 
https://github.com/apache/phoenix/pull/231. This PR adds the basics to be able 
to interpret and match Calcite query plans as outlined in comments above.

> Allow accessing full (legacy) Phoenix EXPLAIN information via Calcite
> -
>
> Key: PHOENIX-3471
> URL: https://issues.apache.org/jira/browse/PHOENIX-3471
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Gabriel Reid
>Assignee: Gabriel Reid
>
> The EXPLAIN syntax in Calcite-Phoenix (either "EXPLAIN " or "EXPLAIN 
> PLAN FOR ") currently returns the Calcite plan for a query. For example:
> {code}
> EXPLAIN SELECT MAX(I) FROM T1
> {code}
> results in the following Calcite explain plan:
> {code}
> PhoenixToEnumerableConverter
>   PhoenixServerAggregate(group=[{}], EXPR$0=[MAX($0)])
> PhoenixTableScan(table=[[phoenix, T1]])
> {code}
> and the following (legacy) Phoenix explain plan:
> {code}
> CLIENT PARALLEL 1-WAY FULL SCAN OVER T1
> SERVER FILTER BY FIRST KEY ONLY
> {code}
> There are currently a large number of integration tests which depend on the 
> legacy Phoenix format of explain plan, and this format is no longer available 
> when running via Calcite. PHOENIX-3105 added support for accessing the 
> explain plan via the "EXPLAIN " syntax, but this update to the syntax 
> still only provides the Calcite-specific explain plan.
> There are three main approaches which can be taken here:
> h4. Option 1: Custom EXPLAIN execution
> This approach extends the work done in PHOENIX-3105 to plug in a custom 
> SqlPhoenixExplain
> node which returns the legacy Phoenix explain plan, with the "EXPLAIN PLAN 
> FOR "
> syntax still returning the Calcite explain plan.
> h4. Option 2: Add the legacy Phoenix explain plan to the Calcite plan as a 
> top-level attribute
> This approach results in an explain plan that looks as follows:
> {code}
> PhoenixToEnumerableConverter(PhoenixExecutionPlan=[CLIENT PARALLEL 1-WAY FULL 
> SCAN OVER T1
> SERVER FILTER BY FIRST KEY ONLY])
>   PhoenixServerAggregate(group=[{}], EXPR$0=[MAX($0)])
> PhoenixTableScan(table=[[phoenix, T1]])
> {code}
> The disadvantage of this approach is that it's not really "correct" -- we're 
> just tacking 
> a different representation of the explain plan into the Calcite explain plan.
> The advantage of this approach is that it's very quick and easy to implement 
> (i.e. it
> can be done immediately), and it will require minimal changes to the many 
> test cases which have
> hard-coded explain plans that things are checked against. All we need to do 
> is have a 
> utility to extract the PhoenixExecutionPlan value from the full Calcite plan, 
> and other
> than that all test cases stay the same.
> h4. Option 3: Add all relevant information to the correct parts of the 
> Calcite explain plan
> This approach would result in an explain plan that looks as follows:
> {code}
> PhoenixToEnumerableConverter
>   PhoenixServerAggregate(group=[{}], EXPR$0=[MAX($0)])
> PhoenixTableScan(table=[[phoenix, T1]], scanType[CLIENT PARALLEL 1-WAY 
> FULL ])
> {code}
> This is undoubtedly the "right" way to do things. However, it has the major 
> disadvantage
> that it will require a large amount of work to do the following:
> * add all relevant information into various implementations of 
> {{AbstractRelNode.explainTerms}}
> * rework all test cases which verify things against an expected explain plan
> It is of course also an option is to start with option 2 here, and eventually 
> migrate to option 3.
> If we go for option 2 or option 3, we should probably remove the custom 
> EXPLAIN parsing.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] phoenix pull request #231: PHOENIX-3471 Add query plan matching system

2017-02-13 Thread gabrielreid
GitHub user gabrielreid opened a pull request:

https://github.com/apache/phoenix/pull/231

PHOENIX-3471 Add query plan matching system

Add a generic system for parsing and matching Calcite query
plans using Hamcrest matchers. The general intention is to make
matching of query plans less brittle and somewhat easier to write
than simply matching the full text of the query plan.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gabrielreid/phoenix PHOENIX-3471_explain_plan

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/phoenix/pull/231.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #231


commit 128ff0a3288b0cfc48b068fc21704ca07278a33c
Author: Gabriel Reid 
Date:   2016-11-18T09:58:18Z

PHOENIX-3471 Add query plan matching system

Add a generic system for parsing and matching Calcite query
plans using Hamcrest matchers. The general intention is to make
matching of query plans less brittle and somewhat easier to write
than simply matching the full text of the query plan.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---