[jira] [Commented] (PHOENIX-3360) Secondary index configuration is wrong
[ https://issues.apache.org/jira/browse/PHOENIX-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865298#comment-15865298 ]

William Yang commented on PHOENIX-3360:
---------------------------------------

New patch attached. There is another reason we have to create a single connection for index updates. See {{CoprocessorHConnection#getConnectionForEnvironment()}}: it creates a new connection on each call, which invokes the ctor of {{HConnectionImplementation}}. In that ctor, it hits ZK to read the cluster id by calling {{retrieveClusterId()}}. This is totally unacceptable. Apart from the extra network operation, it also leaves many CLOSE-WAIT TCP connections on the ZK cluster. ZK is a critical resource, so we should do our best not to access it unless we have to. If the connection limit in zoo.cfg ({{maxClientCnxns}}) is not configured high enough, index updates will fail in the getHTableInterface phase because ZK rejects connection requests once there are already too many. Has anyone else encountered this problem?

> Secondary index configuration is wrong
> --------------------------------------
>
>                 Key: PHOENIX-3360
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3360
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Rajeshbabu Chintaguntla
>            Priority: Critical
>             Fix For: 4.10.0
>
>         Attachments: ConfCP.java, PHOENIX-3360.patch, PHOENIX-3360-v2.PATCH, PHOENIX-3360-v3.PATCH, PHOENIX-3360-v4.PATCH
>
> IndexRpcScheduler allocates some handler threads and uses a higher priority for RPCs. The corresponding IndexRpcController is not used by default as is, but through ServerRpcControllerFactory, which we configure from Ambari by default and which sets the priority of outgoing RPCs to either the metadata priority or the index priority.
> However, after reading the code of IndexRpcController / ServerRpcController, it seems that IndexRpcController does NOT look at whether the outgoing RPC is for an index table or not. It just sets ALL RPC priorities to the index priority. The intention seems to be that ONLY on servers do we configure ServerRpcControllerFactory, and clients NEVER configure ServerRpcControllerFactory but instead use ClientRpcControllerFactory. We configure ServerRpcControllerFactory from Ambari, which in effect makes ALL RPCs from Phoenix be handled only by the index handlers by default. It means all the deadlock cases are still there.
> The documentation at https://phoenix.apache.org/secondary_indexing.html is also wrong in this sense: it does not distinguish server side from client side. Moreover, this way of configuring different values is not how HBase configuration is deployed. We cannot have the configuration show ServerRpcControllerFactory even only for server nodes, because clients running on those nodes will also see the wrong values.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
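The single-connection idea described in the comment above can be sketched in a standalone way. This is a hypothetical illustration, not Phoenix's actual code: a lazily created, shared connection object reused across index updates, instead of constructing a new one (and paying a ZK round trip) per call. The `Connection` interface here is a stand-in for an HBase connection.

```java
import java.util.concurrent.atomic.AtomicReference;

public class SharedConnectionHolder {
    // Stand-in for an HBase connection; the real code would hold an HConnection.
    public interface Connection {}

    private static final AtomicReference<Connection> CACHED = new AtomicReference<>();

    // Returns the shared connection, creating it at most once (thread-safe).
    public static Connection getConnection() {
        Connection conn = CACHED.get();
        if (conn == null) {
            // In real life this construction is expensive: it hits ZK for the cluster id.
            Connection created = new Connection() {};
            if (CACHED.compareAndSet(null, created)) {
                conn = created;
            } else {
                conn = CACHED.get(); // another thread won the race; reuse its connection
            }
        }
        return conn;
    }

    public static void main(String[] args) {
        Connection a = getConnection();
        Connection b = getConnection();
        System.out.println(a == b); // prints "true": the same instance is reused
    }
}
```

With this pattern, repeated index updates share one connection, so the ZK access and CLOSE-WAIT accumulation described above happen at most once.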
[jira] [Updated] (PHOENIX-3360) Secondary index configuration is wrong
[ https://issues.apache.org/jira/browse/PHOENIX-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Yang updated PHOENIX-3360:
----------------------------------
    Attachment: PHOENIX-3360-v4.PATCH

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (PHOENIX-3360) Secondary index configuration is wrong
[ https://issues.apache.org/jira/browse/PHOENIX-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865236#comment-15865236 ]

William Yang commented on PHOENIX-3360:
---------------------------------------

bq. CompoundConfiguration treats the added configs as immutable, and has an internal mutable config (see the code). This means that with the original patch, the rest of the region server (including replication) will not be affected.

I've done a simple test, see {{ConfCP.java}}. If we change the RegionServer-level configuration in a coprocessor, all the other Regions opened on the same RS will see the change. It has nothing to do with the implementation of the Configuration class or any other internal classes; it is determined by where a region's Configuration object comes from. I checked the code in both HBase 1.1.2 and 0.94: see {{RegionCoprocessorHost#getTableCoprocessorAttrsFromSchema()}} for 1.1 and {{RegionCoprocessorHost#loadTableCoprocessors()}} for 0.94. Each region has its own copy of the Configuration, copied from the region server's configuration object. So it is safe to change the configuration returned by {{CoprocessorEnvironment#getConfiguration()}}, and that change is visible only within this Region. But we should never change the Configuration returned by {{RegionCoprocessorEnvironment#getRegionServerServices().getConfiguration()}}, for this will change every other Region's conf.
How to use ConfCP.java:
* create 'test1', 'cf'
* create 'test2', 'cf'
* make sure that all regions of the above two tables are hosted on the same regionserver
* add coprocessor ConfCP to test1 and check the log; you should see:
{code}
YHYH1: [test1]conf hashCode = 2027310658
YHYH2: [test1]put conf (yh.special.key,XX)
YHYH3: [test1]get conf (yh.special.key,XX)
{code}
* add coprocessor ConfCP to test2 and check the log again; you should see:
{code}
YHYH1: [test2]conf hashCode = 2027310658
YHYH3: [test2]get conf (yh.special.key,XX)
{code}

Note that {{conf}} can be assigned one of two values.
{code}
conf = ((RegionCoprocessorEnvironment)e).getRegionServerServices().getConfiguration();
{code}
is used now, and this is what we do in the V1 patch. Change it to
{code}
conf = e.getConfiguration();
{code}
and table test2 will no longer see the change that test1 made.

Above all, we can use the v1 patch with a small modification: just set the conf returned by {{CoprocessorEnvironment#getConfiguration()}}. And for PHOENIX-3271, UPSERT SELECT's writes will still have the higher priority. WDYT? Ping [~jamestaylor], [~enis], [~rajeshbabu].

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
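The copy-vs-shared behavior described in the comment above can be demonstrated without HBase at all. The sketch below uses `java.util.Properties` as a stand-in for Hadoop's `Configuration`: mutating a per-region copy (analogous to `CoprocessorEnvironment#getConfiguration()`) stays local, while mutating the shared server object (analogous to `getRegionServerServices().getConfiguration()`) would be visible to every region.

```java
import java.util.Properties;

public class RegionConfDemo {
    // Copy-then-mutate: the mutation stays local to the copy, mirroring how each
    // region's Configuration is copied from the region server's configuration.
    static Properties regionLocalChange(Properties serverConf) {
        Properties regionConf = new Properties();
        regionConf.putAll(serverConf); // per-region copy, as in RegionCoprocessorHost
        regionConf.setProperty("yh.special.key", "XX"); // safe: region-local
        return regionConf;
    }

    public static void main(String[] args) {
        Properties serverConf = new Properties(); // the region server's shared conf
        Properties regionConf = regionLocalChange(serverConf);

        System.out.println(regionConf.getProperty("yh.special.key")); // XX
        System.out.println(serverConf.getProperty("yh.special.key")); // null: unchanged

        // By contrast, writing to serverConf directly would leak to all regions.
        serverConf.setProperty("yh.special.key", "XX");
        System.out.println(serverConf.getProperty("yh.special.key")); // XX: now shared
    }
}
```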
[jira] [Updated] (PHOENIX-3360) Secondary index configuration is wrong
[ https://issues.apache.org/jira/browse/PHOENIX-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Yang updated PHOENIX-3360:
----------------------------------
    Attachment: ConfCP.java
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3662) PhoenixStorageHandler throws ClassCastException.
[ https://issues.apache.org/jira/browse/PHOENIX-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864902#comment-15864902 ]

Jeongdae Kim commented on PHOENIX-3662:
---------------------------------------

Could anyone review this patch?

> PhoenixStorageHandler throws ClassCastException.
> ------------------------------------------------
>
>                 Key: PHOENIX-3662
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3662
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.9.0
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>         Attachments: PHOENIX-3662.1.patch, PHOENIX-3662.2.patch
>
> When executing a query that has BETWEEN clauses wrapped in a function, the Phoenix storage handler throws a ClassCastException like the one below. In addition, I found some bugs in the handling of push-down predicates.
> {code}
> 2017-02-06T16:35:26,019 ERROR [7d29d400-2ec5-4ab8-84c2-041b55c3e24b HiveServer2-Handler-Pool: Thread-57]: ql.Driver (SessionState.java:printError(1097)) - FAILED: ClassCastException org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc
> java.lang.ClassCastException: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc cannot be cast to org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc
>         at org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.processingBetweenOperator(IndexPredicateAnalyzer.java:229)
>         at org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.analyzeExpr(IndexPredicateAnalyzer.java:369)
>         at org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.access$000(IndexPredicateAnalyzer.java:72)
>         at org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer$1.process(IndexPredicateAnalyzer.java:165)
>         at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>         at org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.analyzePredicate(IndexPredicateAnalyzer.java:176)
>         at org.apache.phoenix.hive.ppd.PhoenixPredicateDecomposer.decomposePredicate(PhoenixPredicateDecomposer.java:63)
>         at org.apache.phoenix.hive.PhoenixStorageHandler.decomposePredicate(PhoenixStorageHandler.java:238)
>         at org.apache.hadoop.hive.ql.ppd.OpProcFactory.pushFilterToStorageHandler(OpProcFactory.java:1004)
>         at org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:910)
>         at org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:880)
>         at org.apache.hadoop.hive.ql.ppd.OpProcFactory$TableScanPPD.process(OpProcFactory.java:429)
>         at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>         at org.apache.hadoop.hive.ql.ppd.SimplePredicatePushDown.transform(SimplePredicatePushDown.java:102)
>         at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:242)
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10921)
>         at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:246)
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:471)
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1242)
>         at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1229)
>         at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:191)
>         at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:276)
>         at org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
>         at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:499)
>         at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:486)
> {code}
[jira] [Commented] (PHOENIX-3536) Remove creating unnecessary phoenix connections in MR Tasks of Hive
[ https://issues.apache.org/jira/browse/PHOENIX-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864901#comment-15864901 ]

Jeongdae Kim commented on PHOENIX-3536:
---------------------------------------

The failed tests are not related to this patch. Could anyone review it?

> Remove creating unnecessary phoenix connections in MR Tasks of Hive
> -------------------------------------------------------------------
>
>                 Key: PHOENIX-3536
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3536
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>              Labels: HivePhoenix
>         Attachments: PHOENIX-3536.1.patch
>
> PhoenixStorageHandler creates Phoenix connections to build a QueryPlan in the getSplits phase (MR preparation) and again in the getRecordReader phase (map) while running an MR job. In Phoenix, creating the first connection (QueryServices) for a specific URL is expensive (checking and loading Phoenix schema information).
> I found it is possible to avoid building the query plan again in the map phase (getRecordReader()) by serializing the QueryPlan created in the input format and passing that plan to the record reader. This approach improves scan performance by removing the unnecessary connection attempt in the map phase.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
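The serialize-the-plan idea described above can be sketched without Phoenix. In this hypothetical illustration, `Plan` stands in for Phoenix's QueryPlan: the split phase serializes it to bytes, and the record reader deserializes it instead of opening a new connection to rebuild it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class PlanPassingDemo {
    // Stand-in for a query plan; the real QueryPlan would need its own serialization.
    public static class Plan implements Serializable {
        final String sql;
        Plan(String sql) { this.sql = sql; }
    }

    // getSplits phase: build the plan once and serialize it into the split.
    static byte[] serialize(Plan p) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(p);
        }
        return bos.toByteArray();
    }

    // getRecordReader phase: restore the plan without a new Phoenix connection.
    static Plan deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Plan) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] wire = serialize(new Plan("SELECT * FROM T")); // split phase
        Plan restored = deserialize(wire);                    // map phase
        System.out.println(restored.sql);
    }
}
```

This removes the expensive first-connection cost (schema checking and loading) from every map task.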
[jira] [Commented] (PHOENIX-3515) CsvLineParser Improvement
[ https://issues.apache.org/jira/browse/PHOENIX-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864900#comment-15864900 ]

Jeongdae Kim commented on PHOENIX-3515:
---------------------------------------

The failed tests are not related to this patch. Could anyone review it?

> CsvLineParser Improvement
> -------------------------
>
>                 Key: PHOENIX-3515
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3515
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>            Priority: Minor
>         Attachments: PHOENIX-3515.1.patch
>
> CsvLineParser creates a new parser (Apache Commons CSVParser) for every single line, which is terribly inefficient. I fixed this by adding a new string reader so the parser is created once and reused for all lines.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
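The reuse pattern described above can be illustrated with a self-contained sketch (this is not Phoenix's actual CsvLineParser, and it handles only simple quoting): one parser object whose per-line state is reset, so parsing a line allocates no new parser.

```java
import java.util.ArrayList;
import java.util.List;

public class ReusableCsvLineParser {
    // Reused across lines; in the real fix, a resettable reader feeds a single
    // Commons CSV parser instead of constructing a CSVParser per line.
    public List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                quoted = !quoted;            // toggle quoted region
            } else if (c == ',' && !quoted) {
                fields.add(cur.toString());  // field separator outside quotes
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        ReusableCsvLineParser parser = new ReusableCsvLineParser();
        // The same parser instance handles every line of the input.
        System.out.println(parser.parse("a,b,\"c,d\"")); // [a, b, c,d]
        System.out.println(parser.parse("x,y"));          // [x, y]
    }
}
```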
[jira] [Commented] (PHOENIX-3512) PhoenixStorageHandler makes erroneous query string when handling between clauses with date constants.
[ https://issues.apache.org/jira/browse/PHOENIX-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864898#comment-15864898 ]

Jeongdae Kim commented on PHOENIX-3512:
---------------------------------------

The failed tests are not related to this patch. Could anyone review it?

> PhoenixStorageHandler makes erroneous query string when handling between clauses with date constants.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3512
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3512
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.8.0
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>              Labels: HivePhoenix
>         Attachments: PHOENIX-3512.patch
>
> ex) l_shipdate BETWEEN '1992-01-02' AND '1992-02-02' --> l_shipdate between to_date('69427800') and to_date('69695640')

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (PHOENIX-3486) RoundRobinResultIterator doesn't work correctly because of setting Scan's cache size inappropriately in PhoenixInputFormat
[ https://issues.apache.org/jira/browse/PHOENIX-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864897#comment-15864897 ]

Jeongdae Kim commented on PHOENIX-3486:
---------------------------------------

The failed tests are not related to this patch. Could anyone review it?

> RoundRobinResultIterator doesn't work correctly because of setting Scan's cache size inappropriately in PhoenixInputFormat
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3486
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3486
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>              Labels: HivePhoenix
>         Attachments: PHOENIX-3486.patch
>
> RoundRobinResultIterator uses "hbase.client.scanner.caching" to fill the caches of all its scans in parallel. However, PhoenixInputFormat (phoenix-hive) calls Scan.setCaching(), and when a Scan carries its own cache size, HBase fills the cache using Scan.getCaching() rather than "hbase.client.scanner.caching". RoundRobinResultIterator still schedules a parallel refill every "hbase.client.scanner.caching" rows, so the result is unintended parallel scan operations and degraded scan performance.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
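The precedence rule at the heart of the issue above can be modeled in a few lines (names here are illustrative, not HBase's internals): when a Scan carries an explicit caching value, that value wins and the configured default is ignored, which silently breaks the iterator's assumption about batch size.

```java
public class ScanCachingDemo {
    // Mirrors the rule described in the issue: the Scan-level value, when set,
    // takes precedence over hbase.client.scanner.caching.
    static int effectiveCaching(Integer scanCaching, int configuredDefault) {
        return scanCaching != null ? scanCaching : configuredDefault;
    }

    public static void main(String[] args) {
        int configured = 100; // hbase.client.scanner.caching

        // No Scan-level override: RoundRobinResultIterator's assumption holds.
        System.out.println(effectiveCaching(null, configured));  // 100

        // PhoenixInputFormat calls Scan.setCaching(5000): the iterator still
        // schedules refills every 100 rows, but HBase fetches 5000 per refill.
        System.out.println(effectiveCaching(5000, configured));  // 5000
    }
}
```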
[jira] [Commented] (PHOENIX-3503) PhoenixStorageHandler doesn't work properly when execution engine of Hive is Tez.
[ https://issues.apache.org/jira/browse/PHOENIX-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864899#comment-15864899 ]

Jeongdae Kim commented on PHOENIX-3503:
---------------------------------------

The failed tests are not related to this patch. Could anyone review it?

> PhoenixStorageHandler doesn't work properly when execution engine of Hive is Tez.
> ---------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3503
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3503
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>              Labels: HivePhoenix
>         Attachments: PHOENIX-3503.patch
>
> The Hive storage handler can't correctly parse column types that carry parameters (length, precision, scale, ...) from serdeConstants.LIST_COLUMN_TYPES when the execution engine of Hive is Tez.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Updated] (PHOENIX-3668) Resolve Date/Time/Timestamp incompatibility in bind variables
[ https://issues.apache.org/jira/browse/PHOENIX-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maryann Xue updated PHOENIX-3668:
---------------------------------
    Labels: calcite  (was: )

> Resolve Date/Time/Timestamp incompatibility in bind variables
> -------------------------------------------------------------
>
>                 Key: PHOENIX-3668
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3668
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Maryann Xue
>            Assignee: Maryann Xue
>              Labels: calcite
>
> Avatica TypedValue converts Date and Time objects to integer values and meanwhile takes the local time as the input for the conversion. So we need to adjust the Date/Time/Timestamp object value before setting the bind parameter.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
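The adjustment described above amounts to shifting an epoch-millis value by the local zone's UTC offset before binding. This is a hedged, standalone sketch of that idea, not Avatica's or Phoenix's actual code; a fixed-offset zone is used so the demo is deterministic.

```java
import java.util.TimeZone;

public class BindTimeAdjuster {
    // Shift a UTC instant into "local wall-clock millis", the form a
    // local-time-based conversion expects. The inverse (subtracting the same
    // offset) restores the original value exactly.
    static long adjustForBind(long utcMillis, TimeZone tz) {
        return utcMillis + tz.getOffset(utcMillis);
    }

    public static void main(String[] args) {
        TimeZone tz = TimeZone.getTimeZone("GMT+08:00"); // fixed offset, no DST
        long utc = 1486972800000L;                        // 2017-02-13T08:00:00Z
        long local = adjustForBind(utc, tz);
        System.out.println(local - utc); // 28800000, i.e. the 8-hour offset
    }
}
```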
[jira] [Created] (PHOENIX-3669) YEAR/MONTH/DAY/HOUR/MINUTES/SECOND built-in functions do not work in Calcite-Phoenix
Maryann Xue created PHOENIX-3669:
------------------------------------

             Summary: YEAR/MONTH/DAY/HOUR/MINUTES/SECOND built-in functions do not work in Calcite-Phoenix
                 Key: PHOENIX-3669
                 URL: https://issues.apache.org/jira/browse/PHOENIX-3669
             Project: Phoenix
          Issue Type: Bug
            Reporter: Maryann Xue
            Assignee: Maryann Xue

Calcite rewrites these functions as the EXTRACT function, and Phoenix does not implement the EXTRACT function yet.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
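For context, Calcite rewrites `YEAR(ts)` as `EXTRACT(YEAR FROM ts)`, and so on for the other units. The sketch below shows what the evaluation side of an EXTRACT could look like using `java.time` (illustrative only; Phoenix's expression framework works on its own type system, and these names are assumptions):

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoField;

public class ExtractDemo {
    enum TimeUnit { YEAR, MONTH, DAY, HOUR, MINUTE, SECOND }

    // Evaluate EXTRACT(unit FROM ts) for a timestamp value.
    static int extract(TimeUnit unit, LocalDateTime ts) {
        switch (unit) {
            case YEAR:   return ts.get(ChronoField.YEAR);
            case MONTH:  return ts.get(ChronoField.MONTH_OF_YEAR);
            case DAY:    return ts.get(ChronoField.DAY_OF_MONTH);
            case HOUR:   return ts.get(ChronoField.HOUR_OF_DAY);
            case MINUTE: return ts.get(ChronoField.MINUTE_OF_HOUR);
            default:     return ts.get(ChronoField.SECOND_OF_MINUTE);
        }
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2017, 2, 13, 10, 30, 45);
        System.out.println(extract(TimeUnit.YEAR, ts));  // 2017
        System.out.println(extract(TimeUnit.MONTH, ts)); // 2
    }
}
```

Implementing EXTRACT once would cover all six rewritten built-ins at the same time.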
[jira] [Resolved] (PHOENIX-3668) Resolve Date/Time/Timestamp incompatibility in bind variables
[ https://issues.apache.org/jira/browse/PHOENIX-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maryann Xue resolved PHOENIX-3668.
----------------------------------
    Resolution: Fixed

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Created] (PHOENIX-3668) Resolve Date/Time/Timestamp incompatibility in bind variables
Maryann Xue created PHOENIX-3668:
------------------------------------

             Summary: Resolve Date/Time/Timestamp incompatibility in bind variables
                 Key: PHOENIX-3668
                 URL: https://issues.apache.org/jira/browse/PHOENIX-3668
             Project: Phoenix
          Issue Type: Sub-task
            Reporter: Maryann Xue
            Assignee: Maryann Xue

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Resolved] (PHOENIX-3640) Upgrading from 4.8 or before to encodecolumns2 branch fails
[ https://issues.apache.org/jira/browse/PHOENIX-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain resolved PHOENIX-3640.
-----------------------------------
    Resolution: Fixed

> Upgrading from 4.8 or before to encodecolumns2 branch fails
> -----------------------------------------------------------
>
>                 Key: PHOENIX-3640
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3640
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>         Attachments: PHOENIX-3640.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Resolved] (PHOENIX-3666) Make use of EncodedColumnQualifierCellsList for all column name mapping schemes
[ https://issues.apache.org/jira/browse/PHOENIX-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain resolved PHOENIX-3666.
-----------------------------------
    Resolution: Fixed

> Make use of EncodedColumnQualifierCellsList for all column name mapping schemes
> -------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3666
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3666
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>         Attachments: PHOENIX-3666.patch, PHOENIX-3666_wip.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (PHOENIX-3654) Load Balancer for thin client
[ https://issues.apache.org/jira/browse/PHOENIX-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864717#comment-15864717 ]

James Taylor commented on PHOENIX-3654:
---------------------------------------

+1 to having a high-level design doc to discuss. I think we could have an interface-based solution in which ZK would be one implementation, if we want a more indirect ZK dependency.

> Load Balancer for thin client
> -----------------------------
>
>                 Key: PHOENIX-3654
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3654
>             Project: Phoenix
>          Issue Type: New Feature
>    Affects Versions: 4.8.0
>         Environment: Linux 3.13.0-107-generic kernel, v4.9.0-HBase-0.98
>            Reporter: Rahul Shrivastava
>             Fix For: 4.9.0
>
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> We have been having an internal discussion on a load balancer for the PQS thin client. The general consensus is to embed the load balancer in the thin client instead of using an external load balancer such as HAProxy. The idea is not to have another layer between the client and PQS; this reduces the operational cost of the system, which currently leads to delays in executing projects.
> But this also comes with the challenge of an embedded load balancer that can maintain sticky sessions and do fair load balancing while knowing the load downstream of the PQS servers. In addition, the load balancer needs to know the locations of the PQS servers, so the thin client needs to keep track of them via ZooKeeper (or other means).
> In the new design, it is proposed that the PQS client have an embedded load balancer.
> Where will the load balancer sit?
> The load balancer will be embedded within the app server client.
> How will the load balancer work?
> The load balancer will contact ZooKeeper to get the locations of PQS instances. For this, PQS needs to register itself with ZK once it comes online; the ZooKeeper location is in hbase-site.xml. The balancer will maintain a small cache of connections to PQS. When a request comes in, it will check for an open connection in the cache.
> How will the load balancer know the load on PQS?
> To start with, it will pick a random open connection to PQS. This means the load balancer does not know the PQS load. Later, we can augment the code so that the thin client receives load information from PQS and makes intelligent decisions.
> How will the load balancer maintain sticky sessions?
> We still need to investigate how to implement sticky sessions; we can look for an open-source implementation of the same.
> How will PQS register itself with the service locator?
> PQS will have the ZooKeeper location from hbase-site.xml and will register itself with ZooKeeper. The thin client will find the PQS locations using ZooKeeper.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
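The "pick a random open connection" starting point in the proposal above can be sketched as follows. All names are hypothetical; in the real design the server list would be refreshed from ZooKeeper rather than passed in.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class PqsLoadBalancer {
    private final List<String> servers; // PQS endpoints, discovered via ZK in the real design
    private final Random random;

    public PqsLoadBalancer(List<String> servers, Random random) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no PQS servers registered");
        }
        this.servers = servers;
        this.random = random;
    }

    // Uniform random pick: no load information yet, exactly as the proposal's
    // starting point describes. Load-aware selection could replace this later.
    public String pick() {
        return servers.get(random.nextInt(servers.size()));
    }

    public static void main(String[] args) {
        PqsLoadBalancer lb = new PqsLoadBalancer(
            Arrays.asList("pqs1:8765", "pqs2:8765", "pqs3:8765"), new Random());
        System.out.println(lb.pick()); // one of the three registered endpoints
    }
}
```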
Re: [DISCUSS] Some licensing issues to resolve before the next release
For the other issue, there's no reason not to move up to more recent minors of those HBase releases without the dependency problem, as long as we don't detect a regression by doing so.

On Thu, Feb 9, 2017 at 1:10 PM, Josh Elser wrote:
> Sweetness. Thanks for taking that on!
>
> Josh Mahonin wrote:
>> Re: the flume dependency, I suspect we can swap out the org.json:json
>> dependency with com.tdunning:json without too much pain. I've assigned
>> PHOENIX-3658 to myself to look at, will try and attend to it in the next
>> week.
>>
>> https://github.com/tdunning/open-json
>>
>> On Thu, Feb 9, 2017 at 12:10 PM, Josh Elser wrote:
>>
>>> See https://issues.apache.org/jira/browse/PHOENIX-3658 and
>>> https://issues.apache.org/jira/browse/PHOENIX-3659 for the full details.
>>>
>>> The summary is that I noticed two dependencies that we're including (one
>>> direct, one transitive) that are disallowed.
>>>
>>> The direct dependency (org.json:json by phoenix-flume) is technically "ok"
>>> but only until 2017/04/30 when the grace period expires. Essentially, we've
>>> used up half of the time allotted to fix this one already ;)
>>>
>>> The latter is one that we inherited from HBase. We can address it by
>>> bumping the 1.1 and 1.2 HBase versions -- but I'd be interested in hearing
>>> if others have opinions on whether we do that or try to surgically remove
>>> the dependency from our bundling.
>>>
>>> - Josh

--
Best regards,
   - Andy

If you are given a choice, you believe you have acted freely.
  - Raymond Teller (via Peter Watts)
[jira] [Commented] (PHOENIX-3654) Load Balancer for thin client
[ https://issues.apache.org/jira/browse/PHOENIX-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864715#comment-15864715 ] Josh Elser commented on PHOENIX-3654: - bq. You can set a read-only ACL that doesn't need auth. Yup, AFAIK, that's not a big deal. bq. You can build a service discovery mechanism backed by ZooKeeper yet providing its own client facing API that is not kerberized. ANd so on. Yes! This is ultimately what I'd like to see some more thought put into. There are _tons_ of options that could be leveraged. Would be nice to see some simple pros/cons laid out so we can back up why one was chosen over others :) > Load Balancer for thin client > - > > Key: PHOENIX-3654 > URL: https://issues.apache.org/jira/browse/PHOENIX-3654 > Project: Phoenix > Issue Type: New Feature >Affects Versions: 4.8.0 > Environment: Linux 3.13.0-107-generic kernel, v4.9.0-HBase-0.98 >Reporter: Rahul Shrivastava > Fix For: 4.9.0 > > Original Estimate: 240h > Remaining Estimate: 240h > > We have been having internal discussion on load balancer for thin client for > PQS. The general consensus we have is to have an embedded load balancer with > the thin client instead of using external load balancer such as haproxy. The > idea is to not to have another layer between client and PQS. This reduces > operational cost for system, which currently leads to delay in executing > projects. > But this also comes with challenge of having an embedded load balancer which > can maintain sticky sessions, do fair load balancing knowing the load > downstream of PQS server. In addition, load balancer needs to know location > of multiple PQS server. Now, the thin client needs to keep track of PQS > servers via zookeeper ( or other means). > In the new design, the client ( PQS client) , it is proposed to have an > embedded load balancer. > Where will the load Balancer sit ? > The load load balancer will embedded within the app server client. > How will the load balancer work ? 
> The load balancer will contact ZooKeeper to get the locations of PQS instances. In this case, > PQS needs to register itself with ZK once it comes online. The ZooKeeper location > is in hbase-site.xml. The load balancer will maintain a small cache of connections to > PQS. When a request comes in, it will check for an open connection from the > cache. > How will the load balancer know the load on PQS? > To start with, it will pick a random open connection to PQS. This means that > the load balancer does not know the PQS load. Later, we can augment the code so that > the thin client can receive load info from PQS and make intelligent decisions. > How will the load balancer maintain sticky sessions? > We still need to investigate how to implement sticky sessions; we can > look for an open source implementation of the same. > How will PQS register itself with the service locator? > PQS will have the location of ZooKeeper in hbase-site.xml and it will register > itself with ZooKeeper. The thin client will find out PQS locations using > ZooKeeper. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
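The "pick a random open connection" starting policy described above is simple enough to sketch. Everything here (class and method names, the string-based location list) is a hypothetical illustration, not a Phoenix API:

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of the initial load-balancing policy: with no load
// information from PQS yet, just pick a random entry from the cached list
// of open PQS locations (as discovered via ZooKeeper registration).
public class RandomPqsPicker {
    /** Picks a random entry from the cached list of open PQS locations. */
    public static String pick(List<String> openPqsLocations) {
        if (openPqsLocations.isEmpty()) {
            throw new IllegalStateException("no PQS instance registered");
        }
        int i = ThreadLocalRandom.current().nextInt(openPqsLocations.size());
        return openPqsLocations.get(i);
    }

    public static void main(String[] args) {
        List<String> cache = List.of("pqs1:8765", "pqs2:8765");
        System.out.println("routing request to " + pick(cache));
    }
}
```

A later iteration could replace the random choice with a weighted one once PQS starts reporting load.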
[jira] [Assigned] (PHOENIX-3661) Make phoenix tool select file system dynamically
[ https://issues.apache.org/jira/browse/PHOENIX-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell reassigned PHOENIX-3661: --- Resolution: Fixed Assignee: Yishan Yang Fix Version/s: 4.10.0 Committed. > Make phoenix tool select file system dynamically > > > Key: PHOENIX-3661 > URL: https://issues.apache.org/jira/browse/PHOENIX-3661 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.7.0, 4.8.0 >Reporter: Yishan Yang >Assignee: Yishan Yang > Fix For: 4.10.0 > > Attachments: phoenix-3661-1.patch > > > The Phoenix indexing tool assumes that the root directory is the default Hadoop > FileSystem. With this patch, > the Phoenix index tool will get the file system dynamically, which will prevent “Wrong > FileSystem” errors. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
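The fix amounts to asking the path itself which file system serves it (in Hadoop terms, `Path.getFileSystem(conf)`) rather than always using the cluster default (`FileSystem.get(conf)`). A dependency-free sketch of the idea using plain URIs; the class and method names are illustrative, not Hadoop APIs:

```java
import java.net.URI;

// Dependency-free illustration of the fix: derive the file system from the
// path's own scheme instead of assuming the cluster-wide default (e.g. hdfs).
public class FsResolver {
    /** Returns the scheme identifying which FileSystem should serve this path. */
    public static String schemeFor(String path, String defaultScheme) {
        String scheme = URI.create(path).getScheme();
        return scheme != null ? scheme : defaultScheme;
    }

    public static void main(String[] args) {
        // An s3a output path must not be handed to the default hdfs FileSystem.
        System.out.println(schemeFor("s3a://bucket/index", "hdfs")); // s3a
        System.out.println(schemeFor("/tmp/index", "hdfs"));         // hdfs
    }
}
```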
[jira] [Comment Edited] (PHOENIX-3654) Load Balancer for thin client
[ https://issues.apache.org/jira/browse/PHOENIX-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864701#comment-15864701 ] Andrew Purtell edited comment on PHOENIX-3654 at 2/13/17 11:50 PM: --- bq. I meant ZK ACLs help ensure the PQS instances are able to register themselves in a trusted location which clients can then refer to. Oh sure that makes sense. How the client discovers PQS endpoints registered by ZooKeeper without requiring SASL auth is an interesting question but there are a couple of options. You can set a read-only ACL that doesn't need auth. You can build a service discovery mechanism backed by ZooKeeper yet providing its own client facing API that is not kerberized. And so on. was (Author: apurtell): bq. I meant ZK ACLs help ensure the PQS instances are able to register themselves in a trusted location which clients can then refer to. Oh sure that makes sense. > Load Balancer for thin client > - > > Key: PHOENIX-3654 > URL: https://issues.apache.org/jira/browse/PHOENIX-3654 > Project: Phoenix > Issue Type: New Feature >Affects Versions: 4.8.0 > Environment: Linux 3.13.0-107-generic kernel, v4.9.0-HBase-0.98 >Reporter: Rahul Shrivastava > Fix For: 4.9.0 > > Original Estimate: 240h > Remaining Estimate: 240h > > We have been having internal discussions on a load balancer for the thin client for > PQS. The general consensus is to have an embedded load balancer in > the thin client instead of using an external load balancer such as haproxy. The > idea is not to have another layer between the client and PQS. This reduces > operational cost for the system, which currently leads to delays in executing > projects. > But this also comes with the challenge of having an embedded load balancer which > can maintain sticky sessions and do fair load balancing knowing the load > downstream of the PQS servers. In addition, the load balancer needs to know the locations > of multiple PQS servers. 
Now, the thin client needs to keep track of PQS > servers via ZooKeeper (or other means). > In the new design, it is proposed that the client (the PQS client) have an > embedded load balancer. > Where will the load balancer sit? > The load balancer will be embedded within the app server client. > How will the load balancer work? > The load balancer will contact ZooKeeper to get the locations of PQS instances. In this case, > PQS needs to register itself with ZK once it comes online. The ZooKeeper location > is in hbase-site.xml. The load balancer will maintain a small cache of connections to > PQS. When a request comes in, it will check for an open connection from the > cache. > How will the load balancer know the load on PQS? > To start with, it will pick a random open connection to PQS. This means that > the load balancer does not know the PQS load. Later, we can augment the code so that > the thin client can receive load info from PQS and make intelligent decisions. > How will the load balancer maintain sticky sessions? > We still need to investigate how to implement sticky sessions; we can > look for an open source implementation of the same. > How will PQS register itself with the service locator? > PQS will have the location of ZooKeeper in hbase-site.xml and it will register > itself with ZooKeeper. The thin client will find out PQS locations using > ZooKeeper. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3654) Load Balancer for thin client
[ https://issues.apache.org/jira/browse/PHOENIX-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864701#comment-15864701 ] Andrew Purtell commented on PHOENIX-3654: - bq. I meant ZK ACLs help ensure the PQS instances are able to register themselves in a trusted location which clients can then refer to. Oh sure that makes sense. > Load Balancer for thin client > - > > Key: PHOENIX-3654 > URL: https://issues.apache.org/jira/browse/PHOENIX-3654 > Project: Phoenix > Issue Type: New Feature >Affects Versions: 4.8.0 > Environment: Linux 3.13.0-107-generic kernel, v4.9.0-HBase-0.98 >Reporter: Rahul Shrivastava > Fix For: 4.9.0 > > Original Estimate: 240h > Remaining Estimate: 240h > > We have been having internal discussions on a load balancer for the thin client for > PQS. The general consensus is to have an embedded load balancer in > the thin client instead of using an external load balancer such as haproxy. The > idea is not to have another layer between the client and PQS. This reduces > operational cost for the system, which currently leads to delays in executing > projects. > But this also comes with the challenge of having an embedded load balancer which > can maintain sticky sessions and do fair load balancing knowing the load > downstream of the PQS servers. In addition, the load balancer needs to know the locations > of multiple PQS servers. Now, the thin client needs to keep track of PQS > servers via ZooKeeper (or other means). > In the new design, it is proposed that the client (the PQS client) have an > embedded load balancer. > Where will the load balancer sit? > The load balancer will be embedded within the app server client. > How will the load balancer work? > The load balancer will contact ZooKeeper to get the locations of PQS instances. In this case, > PQS needs to register itself with ZK once it comes online. The ZooKeeper location > is in hbase-site.xml. The load balancer will maintain a small cache of connections to > PQS. 
When a request comes in, it will check for an open connection from the > cache. > How will the load balancer know the load on PQS? > To start with, it will pick a random open connection to PQS. This means that > the load balancer does not know the PQS load. Later, we can augment the code so that > the thin client can receive load info from PQS and make intelligent decisions. > How will the load balancer maintain sticky sessions? > We still need to investigate how to implement sticky sessions; we can > look for an open source implementation of the same. > How will PQS register itself with the service locator? > PQS will have the location of ZooKeeper in hbase-site.xml and it will register > itself with ZooKeeper. The thin client will find out PQS locations using > ZooKeeper. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3666) Make use of EncodedColumnQualifierCellsList for all column name mapping schemes
[ https://issues.apache.org/jira/browse/PHOENIX-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864680#comment-15864680 ] James Taylor commented on PHOENIX-3666: --- +1. That's a reasonable solution, [~samarthjain]. > Make use of EncodedColumnQualifierCellsList for all column name mapping > schemes > --- > > Key: PHOENIX-3666 > URL: https://issues.apache.org/jira/browse/PHOENIX-3666 > Project: Phoenix > Issue Type: Sub-task >Reporter: Samarth Jain >Assignee: Samarth Jain > Attachments: PHOENIX-3666.patch, PHOENIX-3666_wip.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3585) MutableIndexIT testSplitDuringIndexScan and testIndexHalfStoreFileReader fail for transactional tables and local indexes
[ https://issues.apache.org/jira/browse/PHOENIX-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864670#comment-15864670 ] James Taylor commented on PHOENIX-3585: --- [~rajeshbabu] - can't the LocalIndexStoreFileScanner delegate to the InternalScanner for its next calls? The alternative is to not allow local indexes on transactional tables, which would be a shame. The current logic would be pretty disastrous, as I think the local index would become corrupt, no? > MutableIndexIT testSplitDuringIndexScan and testIndexHalfStoreFileReader fail > for transactional tables and local indexes > > > Key: PHOENIX-3585 > URL: https://issues.apache.org/jira/browse/PHOENIX-3585 > Project: Phoenix > Issue Type: Bug >Reporter: Thomas D'Silva >Assignee: Thomas D'Silva > Attachments: diff.patch > > > The tests fail if we use HDFSTransactionStateStorage instead of > InMemoryTransactionStateStorage when we create the TransactionManager in > BaseTest -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3655) Metrics for PQS
[ https://issues.apache.org/jira/browse/PHOENIX-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864668#comment-15864668 ] Josh Elser commented on PHOENIX-3655: - bq. we want the PQS driver to export the same metrics through the same mechanism(s) as the fat driver. That way we can swap one for the other with minimal operational changes including visibility into operations via metrics. Makes sense. I've spent a lot (too much?) time thinking about this from the Avatica standpoint (understanding the perf/characteristics of Avatica, regardless of database), so I may be conflating what [~rahulshrivastava] is planning with the big picture of what I'd like to see :) If the goal is to just expose the thick-driver's metrics via PQS, this one should be pretty easy. If we want to go farther and really understand the rest of the picture, it gets trickier pretty fast :) > Metrics for PQS > --- > > Key: PHOENIX-3655 > URL: https://issues.apache.org/jira/browse/PHOENIX-3655 > Project: Phoenix > Issue Type: New Feature >Affects Versions: 4.8.0 > Environment: Linux 3.13.0-107-generic kernel, v4.9.0-HBase-0.98 >Reporter: Rahul Shrivastava > Fix For: 4.9.0 > > Original Estimate: 240h > Remaining Estimate: 240h > > Phoenix Query Server runs as a separate process from its thin client. > Metrics collection is currently done by PhoenixRuntime.java, i.e. at the Phoenix > driver level. We need the following: > 1. For every JDBC statement/prepared statement run by PQS, we need the > capability to collect metrics at the PQS level and push the data to an external sink, > i.e. file, JMX, or other custom sources. > 2. Besides this, global metrics could be periodically collected and pushed to > the sink. > 3. PQS can be configured to turn on metrics collection and the type of collection ( > runtime or global) via hbase-site.xml. > 4. The sink could be configured via an interface in hbase-site.xml. 
> All metrics definition https://phoenix.apache.org/metrics.html -- This message was sent by Atlassian JIRA (v6.3.15#6346)
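The configurable-sink idea above could take a shape like the following; the interface, class, and metric names are illustrative assumptions, not the eventual PQS API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a pluggable metrics sink for PQS: the server
// collects per-statement or global metric snapshots and hands them to
// whatever sink implementation hbase-site.xml names (file, JMX, custom).
interface MetricsSink {
    void publish(Map<String, Long> metrics);
}

// Trivial in-memory sink standing in for file/JMX implementations.
class InMemorySink implements MetricsSink {
    final List<Map<String, Long>> batches = new ArrayList<>();
    @Override public void publish(Map<String, Long> metrics) { batches.add(metrics); }
}

public class PqsMetricsDemo {
    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        // A periodic reporter would push a metrics snapshot like this.
        sink.publish(Map.of("SELECT_SQL_COUNTER", 12L, "MUTATION_SQL_COUNTER", 3L));
        System.out.println(sink.batches.size() + " batch(es) published");
    }
}
```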
[jira] [Commented] (PHOENIX-3661) Make phoenix tool select file system dynamically
[ https://issues.apache.org/jira/browse/PHOENIX-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864667#comment-15864667 ] Zach York commented on PHOENIX-3661: Thanks for the quick review guys! > Make phoenix tool select file system dynamically > > > Key: PHOENIX-3661 > URL: https://issues.apache.org/jira/browse/PHOENIX-3661 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.7.0, 4.8.0 >Reporter: Yishan Yang > Attachments: phoenix-3661-1.patch > > > The Phoenix indexing tool assumes that the root directory is the default Hadoop > FileSystem. With this patch, > the Phoenix index tool will get the file system dynamically, which will prevent “Wrong > FileSystem” errors. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3661) Make phoenix tool select file system dynamically
[ https://issues.apache.org/jira/browse/PHOENIX-3661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864654#comment-15864654 ] Andrew Purtell commented on PHOENIX-3661: - Me too, I'll commit now > Make phoenix tool select file system dynamically > > > Key: PHOENIX-3661 > URL: https://issues.apache.org/jira/browse/PHOENIX-3661 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.7.0, 4.8.0 >Reporter: Yishan Yang > Attachments: phoenix-3661-1.patch > > > The Phoenix indexing tool assumes that the root directory is the default Hadoop > FileSystem. With this patch, > the Phoenix index tool will get the file system dynamically, which will prevent “Wrong > FileSystem” errors. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3655) Metrics for PQS
[ https://issues.apache.org/jira/browse/PHOENIX-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864651#comment-15864651 ] Andrew Purtell commented on PHOENIX-3655: - I think wherever it makes sense, we want the PQS driver to export the same metrics through the same mechanism(s) as the fat driver. That way we can swap one for the other with minimal operational changes, including visibility into operations via metrics. > Metrics for PQS > --- > > Key: PHOENIX-3655 > URL: https://issues.apache.org/jira/browse/PHOENIX-3655 > Project: Phoenix > Issue Type: New Feature >Affects Versions: 4.8.0 > Environment: Linux 3.13.0-107-generic kernel, v4.9.0-HBase-0.98 >Reporter: Rahul Shrivastava > Fix For: 4.9.0 > > Original Estimate: 240h > Remaining Estimate: 240h > > Phoenix Query Server runs as a separate process from its thin client. > Metrics collection is currently done by PhoenixRuntime.java, i.e. at the Phoenix > driver level. We need the following: > 1. For every JDBC statement/prepared statement run by PQS, we need the > capability to collect metrics at the PQS level and push the data to an external sink, > i.e. file, JMX, or other custom sources. > 2. Besides this, global metrics could be periodically collected and pushed to > the sink. > 3. PQS can be configured to turn on metrics collection and the type of collection ( > runtime or global) via hbase-site.xml. > 4. The sink could be configured via an interface in hbase-site.xml. > All metrics definitions: https://phoenix.apache.org/metrics.html -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (PHOENIX-3654) Load Balancer for thin client
[ https://issues.apache.org/jira/browse/PHOENIX-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864647#comment-15864647 ] Andrew Purtell edited comment on PHOENIX-3654 at 2/13/17 11:18 PM: --- bq. On the security of malicious PQS, kerberizing the PQS and ZK will probably help the situation. bq. Kerberos and ZK ACLs should give us sufficient control to solve the problem FWIW we'd like to use the PQS as a fulcrum to switch away from Kerberos auth to TLS auth to, eventually, avoid any client having to deal with Kerberos. was (Author: apurtell): bq. Kerberos and ZK ACLs should give us sufficient control to solve the problem FWIW we'd like to use the PQS as a fulcrum to switch away from Kerberos auth to TLS auth to, eventually, avoid any client having to deal with Kerberos. > Load Balancer for thin client > - > > Key: PHOENIX-3654 > URL: https://issues.apache.org/jira/browse/PHOENIX-3654 > Project: Phoenix > Issue Type: New Feature >Affects Versions: 4.8.0 > Environment: Linux 3.13.0-107-generic kernel, v4.9.0-HBase-0.98 >Reporter: Rahul Shrivastava > Fix For: 4.9.0 > > Original Estimate: 240h > Remaining Estimate: 240h > > We have been having internal discussions on a load balancer for the thin client for > PQS. The general consensus is to have an embedded load balancer in > the thin client instead of using an external load balancer such as haproxy. The > idea is not to have another layer between the client and PQS. This reduces > operational cost for the system, which currently leads to delays in executing > projects. > But this also comes with the challenge of having an embedded load balancer which > can maintain sticky sessions and do fair load balancing knowing the load > downstream of the PQS servers. In addition, the load balancer needs to know the locations > of multiple PQS servers. Now, the thin client needs to keep track of PQS > servers via ZooKeeper (or other means). 
> In the new design, it is proposed that the client (the PQS client) have an > embedded load balancer. > Where will the load balancer sit? > The load balancer will be embedded within the app server client. > How will the load balancer work? > The load balancer will contact ZooKeeper to get the locations of PQS instances. In this case, > PQS needs to register itself with ZK once it comes online. The ZooKeeper location > is in hbase-site.xml. The load balancer will maintain a small cache of connections to > PQS. When a request comes in, it will check for an open connection from the > cache. > How will the load balancer know the load on PQS? > To start with, it will pick a random open connection to PQS. This means that > the load balancer does not know the PQS load. Later, we can augment the code so that > the thin client can receive load info from PQS and make intelligent decisions. > How will the load balancer maintain sticky sessions? > We still need to investigate how to implement sticky sessions; we can > look for an open source implementation of the same. > How will PQS register itself with the service locator? > PQS will have the location of ZooKeeper in hbase-site.xml and it will register > itself with ZooKeeper. The thin client will find out PQS locations using > ZooKeeper. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PHOENIX-3667) Optimize BooleanExpressionFilter for tables with encoded columns
[ https://issues.apache.org/jira/browse/PHOENIX-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-3667: -- Description: The client side of Phoenix determines the subclass of BooleanExpressionFilter we use based on how many column families and column qualifiers are being referenced. The idea is to minimize the lookup cost during filter evaluation. For encoded columns, instead of using a Map or Set, we can create a few new subclasses of BooleanExpressionFilter that use an array instead. No need for any lookups or equality checks - just fill in the position based on the column qualifier value instead. Since filters are applied on every row between the start/stop key, this will improve performance quite a bit. (was: The client side of Phoenix determines the subclass of BooleanExpressionFilter we use based on how many column families and column qualifiers are being referenced. The idea is to minimize the lookup cost during filter evaluation. For encoded columns, instead of using a Map or Set, we can use an array. No need for any lookups or equality checks - just fill in the position based on the column qualifier value instead. Since filters are applied on every row between the start/stop key, this will help quite a bit.) > Optimize BooleanExpressionFilter for tables with encoded columns > > > Key: PHOENIX-3667 > URL: https://issues.apache.org/jira/browse/PHOENIX-3667 > Project: Phoenix > Issue Type: Improvement >Reporter: James Taylor >Assignee: Samarth Jain > > The client side of Phoenix determines the subclass of BooleanExpressionFilter > we use based on how many column families and column qualifiers are being > referenced. The idea is to minimize the lookup cost during filter evaluation. > For encoded columns, instead of using a Map or Set, we can create a few new > subclasses of BooleanExpressionFilter that use an array instead. 
No need for > any lookups or equality checks - just fill in the position based on the > column qualifier value instead. Since filters are applied on every row > between the start/stop key, this will improve performance quite a bit. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
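The described optimization can be sketched as follows. This is only an illustration of the array-indexed lookup, not the actual BooleanExpressionFilter subclasses: with encoded column qualifiers, the qualifier is a small integer, so per-row filter state can live in a flat array indexed by qualifier instead of a Map or Set keyed by qualifier bytes.

```java
// Sketch of array-indexed per-cell state for encoded column qualifiers:
// O(1) indexing with no hashing or equals() checks on the filter hot path.
public class EncodedQualifierLookup {
    private final byte[][] valuesByQualifier;

    public EncodedQualifierLookup(int maxQualifier) {
        // One slot per possible encoded qualifier referenced by the expression.
        valuesByQualifier = new byte[maxQualifier + 1][];
    }

    /** Called once per cell while filtering a row: fill in by position. */
    public void addCell(int encodedQualifier, byte[] value) {
        valuesByQualifier[encodedQualifier] = value;
    }

    /** Called during expression evaluation; null means the column was absent. */
    public byte[] get(int encodedQualifier) {
        return valuesByQualifier[encodedQualifier];
    }

    public static void main(String[] args) {
        EncodedQualifierLookup lookup = new EncodedQualifierLookup(10);
        lookup.addCell(3, new byte[]{42});
        System.out.println(lookup.get(3)[0]); // 42
    }
}
```

Since the filter runs for every row between the start/stop key, removing the hash lookup from this path is where the claimed speedup comes from.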
[jira] [Created] (PHOENIX-3667) Optimize BooleanExpressionFilter for tables with encoded columns
James Taylor created PHOENIX-3667: - Summary: Optimize BooleanExpressionFilter for tables with encoded columns Key: PHOENIX-3667 URL: https://issues.apache.org/jira/browse/PHOENIX-3667 Project: Phoenix Issue Type: Improvement Reporter: James Taylor Assignee: Samarth Jain The client side of Phoenix determines the subclass of BooleanExpressionFilter we use based on how many column families and column qualifiers are being referenced. The idea is to minimize the lookup cost during filter evaluation. For encoded columns, instead of using a Map or Set, we can use an array. No need for any lookups or equality checks - just fill in the position based on the column qualifier value instead. Since filters are applied on every row between the start/stop key, this will help quite a bit. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3585) MutableIndexIT testSplitDuringIndexScan and testIndexHalfStoreFileReader fail for transactional tables and local indexes
[ https://issues.apache.org/jira/browse/PHOENIX-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864635#comment-15864635 ] Thomas D'Silva commented on PHOENIX-3585: - [~rajeshbabu] Do you know how we can combine the InternalScanner that is passed into IndexHalfStoreFileReaderGenerator.preCompactScannerOpen() with the scanner that it creates? > MutableIndexIT testSplitDuringIndexScan and testIndexHalfStoreFileReader fail > for transactional tables and local indexes > > > Key: PHOENIX-3585 > URL: https://issues.apache.org/jira/browse/PHOENIX-3585 > Project: Phoenix > Issue Type: Bug >Reporter: Thomas D'Silva >Assignee: Thomas D'Silva > Attachments: diff.patch > > > The tests fail if we use HDFSTransactionStateStorage instead of > InMemoryTransactionStateStorage when we create the TransactionManager in > BaseTest -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PHOENIX-3660) Don't pass statement properties while adding columns to a table that already exists that had APPEND_ONLY_SCHEMA=true
[ https://issues.apache.org/jira/browse/PHOENIX-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas D'Silva updated PHOENIX-3660: Fix Version/s: 4.10.0 > Don't pass statement properties while adding columns to a table that already > exists that had APPEND_ONLY_SCHEMA=true > > > Key: PHOENIX-3660 > URL: https://issues.apache.org/jira/browse/PHOENIX-3660 > Project: Phoenix > Issue Type: Bug >Reporter: Thomas D'Silva >Assignee: Thomas D'Silva > Fix For: 4.10.0 > > Attachments: PHOENIX-3660.patch > > > If the table has APPEND_ONLY_SCHEMA set to true, we should only add new > columns and ignore any supplied properties. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3660) Don't pass statement properties while adding columns to a table that already exists that had APPEND_ONLY_SCHEMA=true
[ https://issues.apache.org/jira/browse/PHOENIX-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864533#comment-15864533 ] Samarth Jain commented on PHOENIX-3660: --- +1 > Don't pass statement properties while adding columns to a table that already > exists that had APPEND_ONLY_SCHEMA=true > > > Key: PHOENIX-3660 > URL: https://issues.apache.org/jira/browse/PHOENIX-3660 > Project: Phoenix > Issue Type: Bug >Reporter: Thomas D'Silva >Assignee: Thomas D'Silva > Attachments: PHOENIX-3660.patch > > > If the table has APPEND_ONLY_SCHEMA set to true, we should only add new > columns and ignore any supplied properties. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (PHOENIX-2051) Link record is in the format CHILD-PARENT for phoenix views and it has to scan the entire table to find the parent suffix.
[ https://issues.apache.org/jira/browse/PHOENIX-2051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas D'Silva reassigned PHOENIX-2051: --- Assignee: Thomas D'Silva > Link record is in the format CHILD-PARENT for phoenix views and it has to > scan the entire table to find the parent suffix. > -- > > Key: PHOENIX-2051 > URL: https://issues.apache.org/jira/browse/PHOENIX-2051 > Project: Phoenix > Issue Type: Sub-task >Affects Versions: 4.3.1 >Reporter: Arun Kumaran Sabtharishi >Assignee: Thomas D'Silva > > When a phoenix view is dropped, it runs a scan on the SYSTEM.CATALOG table > looking for the link record. Since the link record is in the format > CHILD-PARENT, it has to scan the entire table to find the parent suffix. For > the long term solution, we can write two link records, the existing > CHILD-PARENT and a new PARENT-CHILD so that the findChildViews() method can > use a key range scan. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
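The proposed PARENT-CHILD link can be illustrated with a sorted map standing in for SYSTEM.CATALOG. The row-key layout below is deliberately simplified (tenant and schema omitted, a single separator byte) and is not the actual catalog encoding:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Why a PARENT->CHILD link row turns findChildViews() into a key range scan:
// with keys sorted lexicographically (as HBase row keys are), all links
// prefixed by the parent name are contiguous, so a prefix scan finds them.
public class LinkRowDemo {
    static final char SEP = '\0';

    /** Range ("prefix") scan: all rows whose key starts with parent + SEP. */
    static SortedMap<String, String> childrenOf(TreeMap<String, String> catalog,
                                                String parent) {
        return catalog.subMap(parent + SEP, parent + (char) (SEP + 1));
    }

    public static void main(String[] args) {
        TreeMap<String, String> catalog = new TreeMap<>();
        // Existing CHILD->PARENT links are keyed by the child, so finding all
        // children of T requires scanning the entire table.
        catalog.put("VIEW_A" + SEP + "T", "child->parent");
        catalog.put("VIEW_B" + SEP + "T", "child->parent");
        // Proposed additional PARENT->CHILD links are keyed by the parent.
        catalog.put("T" + SEP + "VIEW_A", "parent->child");
        catalog.put("T" + SEP + "VIEW_B", "parent->child");

        System.out.println(childrenOf(catalog, "T").size()
            + " child link(s) found by range scan"); // 2
    }
}
```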
[jira] [Updated] (PHOENIX-3660) Don't pass statement properties while adding columns to a table that already exists that had APPEND_ONLY_SCHEMA=true
[ https://issues.apache.org/jira/browse/PHOENIX-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas D'Silva updated PHOENIX-3660: Attachment: PHOENIX-3660.patch > Don't pass statement properties while adding columns to a table that already > exists that had APPEND_ONLY_SCHEMA=true > > > Key: PHOENIX-3660 > URL: https://issues.apache.org/jira/browse/PHOENIX-3660 > Project: Phoenix > Issue Type: Bug >Reporter: Thomas D'Silva >Assignee: Thomas D'Silva > Attachments: PHOENIX-3660.patch > > > If the table has APPEND_ONLY_SCHEMA set to true, we should only add new > columns and ignore any supplied properties. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3360) Secondary index configuration is wrong
[ https://issues.apache.org/jira/browse/PHOENIX-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864347#comment-15864347 ] James Taylor commented on PHOENIX-3360: --- Thanks for checking that out, [~enis]. So it sounds like [~rajeshbabu]'s patch is the way to go, right? > Secondary index configuration is wrong > -- > > Key: PHOENIX-3360 > URL: https://issues.apache.org/jira/browse/PHOENIX-3360 > Project: Phoenix > Issue Type: Bug >Reporter: Enis Soztutar >Assignee: Rajeshbabu Chintaguntla >Priority: Critical > Fix For: 4.10.0 > > Attachments: PHOENIX-3360.patch, PHOENIX-3360-v2.PATCH, > PHOENIX-3360-v3.PATCH > > > IndexRpcScheduler allocates some handler threads and uses a higher priority > for RPCs. The corresponding IndexRpcController is not used by default as it > is, but used through ServerRpcControllerFactory that we configure from Ambari > by default, which sets the priority of the outgoing RPCs to either metadata > priority, or the index priority. > However, after reading the code of IndexRpcController / ServerRpcController it > seems that the IndexRPCController DOES NOT look at whether the outgoing RPC > is for an Index table or not. It just sets ALL rpc priorities to be the index > priority. The intention seems to be that ONLY on servers, we > configure ServerRpcControllerFactory, and with clients we NEVER configure > ServerRpcControllerFactory, but instead use ClientRpcControllerFactory. We > configure ServerRpcControllerFactory from Ambari, which in effect makes it so > that ALL rpcs from Phoenix are only handled by the index handlers by default. > It means all deadlock cases are still there. > The documentation in https://phoenix.apache.org/secondary_indexing.html is > also wrong in this sense. It does not talk about server side / client side. > Plus this way of configuring different values is not how HBase configuration > is deployed. 
We cannot have the configuration include the > ServerRpcControllerFactory for server nodes only, because clients > running on those nodes will also see the wrong values. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3666) Make use of EncodedColumnQualifierCellsList for all column name mapping schemes
[ https://issues.apache.org/jira/browse/PHOENIX-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864340#comment-15864340 ] James Taylor commented on PHOENIX-3666: --- I wouldn't want to impact performance with an extra sort (as perf is one of the main reasons we're doing this). I think using 2 bytes is reasonable as if you need more than 65K columns the sparseness is going to end up being a problem. The client-side should be able to use the number of encoded bytes of the concrete PTable for the length of the column qualifier in any reserved column qualifiers. If you're stuck on a subquery issue, I'd ping [~maryannxue] and if you're stuck on a local index issue, I'd ping [~rajeshbabu]. I don't think we should wait any longer to figure it out for 4.10. Not surfacing the setting of the number of bytes is fine for now. > Make use of EncodedColumnQualifierCellsList for all column name mapping > schemes > --- > > Key: PHOENIX-3666 > URL: https://issues.apache.org/jira/browse/PHOENIX-3666 > Project: Phoenix > Issue Type: Sub-task >Reporter: Samarth Jain >Assignee: Samarth Jain > Attachments: PHOENIX-3666_wip.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
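A two-byte qualifier caps out at 65535 distinct values, which is where the 65K figure above comes from. A minimal sketch of such an encoding, assuming an illustrative big-endian layout that is not necessarily Phoenix's actual byte format:

```java
// Two-byte encoded column qualifier: compact, and decodable back to a small
// integer that can index directly into per-row arrays (see PHOENIX-3667).
public class TwoByteQualifier {
    static final int MAX_QUALIFIER = 0xFFFF; // 65535 columns max

    static byte[] encode(int qualifier) {
        if (qualifier < 0 || qualifier > MAX_QUALIFIER) {
            throw new IllegalArgumentException("out of two-byte range: " + qualifier);
        }
        // Big-endian: high byte first so encoded bytes sort like the integers.
        return new byte[] { (byte) (qualifier >>> 8), (byte) qualifier };
    }

    static int decode(byte[] bytes) {
        return ((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(65535))); // 65535
    }
}
```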
[jira] [Resolved] (PHOENIX-3446) Parameterize tests for different encoding and storage schemes
[ https://issues.apache.org/jira/browse/PHOENIX-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas D'Silva resolved PHOENIX-3446. - Resolution: Fixed > Parameterize tests for different encoding and storage schemes > - > > Key: PHOENIX-3446 > URL: https://issues.apache.org/jira/browse/PHOENIX-3446 > Project: Phoenix > Issue Type: Sub-task >Reporter: Samarth Jain >Assignee: Thomas D'Silva > Attachments: PHOENIX-3446.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3446) Parameterize tests for different encoding and storage schemes
[ https://issues.apache.org/jira/browse/PHOENIX-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864287#comment-15864287 ] Samarth Jain commented on PHOENIX-3446: --- +1, looks great. Thanks, Thomas! > Parameterize tests for different encoding and storage schemes > - > > Key: PHOENIX-3446 > URL: https://issues.apache.org/jira/browse/PHOENIX-3446 > Project: Phoenix > Issue Type: Sub-task >Reporter: Samarth Jain >Assignee: Thomas D'Silva > Attachments: PHOENIX-3446.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3572) Support FETCH NEXT| n ROWS from Cursor
[ https://issues.apache.org/jira/browse/PHOENIX-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864210#comment-15864210 ] ASF GitHub Bot commented on PHOENIX-3572: - Github user bijugs commented on the issue: https://github.com/apache/phoenix/pull/229 @ankitsinghal, Thanks for the review comments. I have made the changes for the comments. Will rebase the code to a single commit once the review process is complete. > Support FETCH NEXT| n ROWS from Cursor > -- > > Key: PHOENIX-3572 > URL: https://issues.apache.org/jira/browse/PHOENIX-3572 > Project: Phoenix > Issue Type: Sub-task >Reporter: Biju Nair >Assignee: Biju Nair > > Implement required changes to support > - {{DECLARE}} and {{OPEN}} a cursor > - query {{FETCH NEXT | n ROWS}} from the cursor > - {{CLOSE}} the cursor > Based on the feedback in [PR > #192|https://github.com/apache/phoenix/pull/192], implement the changes using > {{ResultSet}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (PHOENIX-3601) PhoenixRDD doesn't expose the preferred node locations to Spark
[ https://issues.apache.org/jira/browse/PHOENIX-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Mahonin resolved PHOENIX-3601. --- Resolution: Fixed Fix Version/s: 4.10.0 > PhoenixRDD doesn't expose the preferred node locations to Spark > --- > > Key: PHOENIX-3601 > URL: https://issues.apache.org/jira/browse/PHOENIX-3601 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: Josh Mahonin >Assignee: Josh Mahonin > Fix For: 4.10.0 > > Attachments: PHOENIX-3601.patch > > > Follow-up to PHOENIX-3600, in order to let Spark know the preferred node > locations to assign partitions to, we need to update PhoenixRDD to retrieve > the underlying node location information from the splits. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3600) Core MapReduce classes don't provide location info
[ https://issues.apache.org/jira/browse/PHOENIX-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863986#comment-15863986 ] Hudson commented on PHOENIX-3600: - FAILURE: Integrated in Jenkins build Phoenix-master #1550 (See [https://builds.apache.org/job/Phoenix-master/1550/]) PHOENIX-3600 Core MapReduce classes don't provide location info (jmahonin: rev 267323da8242fb6f0953c1a75cf96c5fde3d49ed) * (edit) phoenix-core/src/main/java/org/apache/phoenix/mapreduce/PhoenixInputFormat.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/mapreduce/PhoenixInputSplit.java * (edit) phoenix-core/src/main/java/org/apache/phoenix/mapreduce/util/PhoenixConfigurationUtil.java > Core MapReduce classes don't provide location info > -- > > Key: PHOENIX-3600 > URL: https://issues.apache.org/jira/browse/PHOENIX-3600 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: Josh Mahonin >Assignee: Josh Mahonin > Attachments: PHOENIX-3600.patch, PHOENIX-3600_v2.patch > > > The core MapReduce classes {{org.apache.phoenix.mapreduce.PhoenixInputSplit}} > and {{org.apache.phoenix.mapreduce.PhoenixInputFormat}} don't provide region > size or location information, leaving the execution engine (MR, Spark, etc.) > to randomly assign splits to nodes. > Interestingly, the phoenix-hive module has reimplemented these classes, > including the node-aware functionality. We should port a subset of those > changes back to the core code so that other engines can make use of them. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3666) Make use of EncodedColumnQualifierCellsList for all column name mapping schemes
[ https://issues.apache.org/jira/browse/PHOENIX-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863980#comment-15863980 ] James Taylor commented on PHOENIX-3666: --- Let's just hard code column qualifiers as two bytes and not expose an option to the user to change it for now. Leave the new table column for the property, though, so that we can potentially fix in a point release. > Make use of EncodedColumnQualifierCellsList for all column name mapping > schemes > --- > > Key: PHOENIX-3666 > URL: https://issues.apache.org/jira/browse/PHOENIX-3666 > Project: Phoenix > Issue Type: Sub-task >Reporter: Samarth Jain >Assignee: Samarth Jain > Attachments: PHOENIX-3666_wip.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3600) Core MapReduce classes don't provide location info
[ https://issues.apache.org/jira/browse/PHOENIX-3600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863925#comment-15863925 ] Josh Mahonin commented on PHOENIX-3600: --- Looks like I broke 4.x-HBase-0.98. Fixing ASAP. > Core MapReduce classes don't provide location info > -- > > Key: PHOENIX-3600 > URL: https://issues.apache.org/jira/browse/PHOENIX-3600 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.8.0 >Reporter: Josh Mahonin >Assignee: Josh Mahonin > Attachments: PHOENIX-3600.patch, PHOENIX-3600_v2.patch > > > The core MapReduce classes {{org.apache.phoenix.mapreduce.PhoenixInputSplit}} > and {{org.apache.phoenix.mapreduce.PhoenixInputFormat}} don't provide region > size or location information, leaving the execution engine (MR, Spark, etc.) > to randomly assign splits to nodes. > Interestingly, the phoenix-hive module has reimplemented these classes, > including the node-aware functionality. We should port a subset of those > changes back to the core code so that other engines can make use of them. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3664) Pyspark: pushing filter by date against apache phoenix
[ https://issues.apache.org/jira/browse/PHOENIX-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863720#comment-15863720 ] Josh Mahonin commented on PHOENIX-3664: --- Hi [~pablo.castellanos] I've not seen this before, although I wonder if there are perhaps a few issues at play:
1) Some sort of date translation issue between python datetime, pySpark and phoenix-spark
2) An issue with how Spark treats the 'java.sql.Date' type, and how Phoenix stores it internally
Re: 1) Is it possible to attempt a similar code block using Scala in the spark-shell? I think it should be pretty much the same code; just replace {{datetime.datetime.now}} with {{System.currentTimeMillis}}.
Re: 2) You might have some success passing the 'dateAsTimestamp' flag to Spark. Effectively, Spark truncates the HH:MM:SS part of a date off, even though it is present in the Phoenix data type. I wonder if pyspark is doing anything strange with that.
https://github.com/apache/phoenix/blob/a0e5efcec5a1a732b2dce9794251242c3d66eea6/phoenix-spark/src/it/scala/org/apache/phoenix/spark/PhoenixSparkIT.scala#L622-L633
> Pyspark: pushing filter by date against apache phoenix > -- > > Key: PHOENIX-3664 > URL: https://issues.apache.org/jira/browse/PHOENIX-3664 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.7.0 > Environment: Azure HDInsight - pyspark using phoenix client. >Reporter: Pablo Castilla > > I am trying to filter by date in apache phoenix from pyspark. The column in > phoenix is created as Date and the filter is a datetime. When I use explain I > see spark doesn't push the filter to phoenix. I have tried a lot of > combinations without luck. > Any way to do it? 
> df = sqlContext.read \ >.format("org.apache.phoenix.spark") \ > .option("table", "TABLENAME") \ > .option("zkUrl",zookepperServer +":2181:/hbase-unsecure" ) \ > .load() > print(df.printSchema()) > startValidation = datetime.datetime.now() > print(df.filter(df['FH'] >startValidation).explain(True)) > Results: > root > |-- METER_ID: string (nullable = true) > |-- FH: date (nullable = true) > None >== Parsed Logical Plan == > 'Filter (FH#53 > 1486726683446150) > +- > Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64] > PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure) > == Analyzed Logical Plan == > METER_ID: string, FH: date, SUMMERTIME: string, MAGNITUDE: int, SOURCE: int, > ENTRY_DATETIME: date, BC: string, T_VAL_AE: int, T_VAL_AI: int, T_VAL_R1: > int, T_VAL_R2: int, T_VAL_R3: int, T_VAL_R4: int > Filter (cast(FH#53 as string) > cast(1486726683446150 as string)) > +- > Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64] > PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure) > == Optimized Logical Plan == > Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615) > +- > Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64] > PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure) > == Physical Plan == > Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615) > +- Scan > PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64] > None > if I set the FH column as timestamp it pushes the filter but throws an > exception: > Caused by: 
org.apache.phoenix.exception.PhoenixParserException: ERROR 604 > (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "12" at line > 1, column 219. > at > org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33) > at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111) > at > org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1280) > at > org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:1363) > at > org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1373) > at > org.apache.phoenix.jdbc.PhoenixStatement.optimizeQuery(PhoenixStatement.java:1368) > at > org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:122) > ... 102 more > Caused by: MismatchedTokenException(106!=129) > at > org.apache.phoenix.parse.PhoenixSQLParser.recoverFromMismatchedToken(PhoenixSQLParser.java:360) > at > org.apache.phoenix.shaded.org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115) >
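Josh's second point above, that Spark's {{java.sql.Date}} view truncates the time-of-day component that Phoenix actually stores, can be sketched without Spark at all. This is a hypothetical plain-Python illustration (the variable names are illustrative, not from any patch) of how a same-day filter can silently exclude rows once the HH:MM:SS part is dropped:

```python
from datetime import datetime, time

# A Phoenix DATE cell that actually carries a time component.
stored = datetime(2017, 2, 10, 11, 38, 3)

# Spark's java.sql.Date view of the same cell drops HH:MM:SS,
# leaving midnight of that day.
spark_view = datetime.combine(stored.date(), time.min)

# A filter like df['FH'] > cutoff, evaluated the same morning:
cutoff = datetime(2017, 2, 10, 9, 0, 0)

assert stored > cutoff            # the real value passes the filter
assert not (spark_view > cutoff)  # the truncated view does not
```

This is consistent with why the 'dateAsTimestamp' option can help: keeping the full timestamp preserves the intended comparison semantics.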
[jira] [Commented] (PHOENIX-3665) Dataset api is missing phoenix spark connector for spark 2.0.2
[ https://issues.apache.org/jira/browse/PHOENIX-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863708#comment-15863708 ] Josh Mahonin commented on PHOENIX-3665: --- Can you verify if the fix for PHOENIX- solves this? > Dataset api is missing phoenix spark connector for spark 2.0.2 > -- > > Key: PHOENIX-3665 > URL: https://issues.apache.org/jira/browse/PHOENIX-3665 > Project: Phoenix > Issue Type: Bug >Reporter: Aavesh > > We have used the DataFrameFunctions class API to write DataFrames into HBase > tables. For Spark 2.0.2, DataFrame is no longer available for Java and Scala > code, so we need a phoenix-spark API for Spark 2.0.2 that can ingest data > into HBase tables using Dataset. Please help if it is already available, or > suggest a workaround. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PHOENIX-3666) Make use of EncodedColumnQualifierCellsList for all column name mapping schemes
[ https://issues.apache.org/jira/browse/PHOENIX-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-3666: -- Attachment: PHOENIX-3666_wip.patch This turned out to be trickier than I anticipated. Essentially, I wanted to make sure that we are able to use all the different column mapping schemes, which turned up an issue in the way we are hardcoding "reserved" column qualifiers. To help resolve this, I thought serializing the encoding scheme in ProjectedColumnExpressions would help, but unfortunately that was a dark abyss. We generate various column expressions using intermediate PTable representations that don't (and can't) have the right encoding schemes in them. This took me down the path of attempting to use the right encoding scheme when we deserialize the expressions on the server side. But that made the code really fragile, as we serialize expressions everywhere, and having to fix the scheme in all those places was just ugly. I ultimately decided to hard code the reserved column qualifiers (range 1-10) to be serialized using the ONE_BYTE_QUALIFIER scheme. I also relaxed the constraints in the encoding schemes to decode byte arrays of size 1 with the ONE_BYTE_QUALIFIER encoding/decoding scheme. A side effect of this change is that the EncodedColumnQualifierCellsList is no longer sorted with respect to column qualifiers. This is because a one-byte qualifier representation of 0 lexicographically sorts after, say, a four-byte qualifier representation of 11. As a result, I need to sort the array of cells before creating a ResultTuple out of it. I am parking this patch as WIP since I only want to do the sorting when needed. All tests pass with this patch, though. 
> Make use of EncodedColumnQualifierCellsList for all column name mapping > schemes > --- > > Key: PHOENIX-3666 > URL: https://issues.apache.org/jira/browse/PHOENIX-3666 > Project: Phoenix > Issue Type: Sub-task >Reporter: Samarth Jain >Assignee: Samarth Jain > Attachments: PHOENIX-3666_wip.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
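The ordering quirk described in the update above — a cell list keyed by mixed-width qualifier encodings is no longer numerically sorted — can be sketched with hypothetical one-byte and four-byte encodings. The exact Phoenix qualifier encodings differ, so treat this purely as an illustration of how lexicographic byte order can disagree with numeric order:

```python
import struct

# Hypothetical encodings, for illustration only: a qualifier may be
# serialized as a single byte or as a four-byte big-endian integer.
one_byte = struct.pack('B', 200)    # qualifier 200 -> b'\xc8'
four_byte = struct.pack('>I', 11)   # qualifier 11  -> b'\x00\x00\x00\x0b'

# Numerically 11 < 200, but byte-wise comparison (how HBase orders
# cells within a row) puts the four-byte encoding first, because its
# leading byte 0x00 is smaller than 0xc8.
assert 11 < 200
assert four_byte < one_byte  # lexicographic order disagrees with numeric order
```

This is why a list holding cells with mixed-width qualifier encodings has to be re-sorted before the cells are consumed in qualifier order.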
[jira] [Updated] (PHOENIX-3666) Make use of EncodedColumnQualifierCellsList for all column name mapping schemes
[ https://issues.apache.org/jira/browse/PHOENIX-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-3666: -- Issue Type: Sub-task (was: Task) Parent: PHOENIX-1598 > Make use of EncodedColumnQualifierCellsList for all column name mapping > schemes > --- > > Key: PHOENIX-3666 > URL: https://issues.apache.org/jira/browse/PHOENIX-3666 > Project: Phoenix > Issue Type: Sub-task >Reporter: Samarth Jain >Assignee: Samarth Jain > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (PHOENIX-3666) Make use of EncodedColumnQualifierCellsList for all column name mapping schemes
Samarth Jain created PHOENIX-3666: - Summary: Make use of EncodedColumnQualifierCellsList for all column name mapping schemes Key: PHOENIX-3666 URL: https://issues.apache.org/jira/browse/PHOENIX-3666 Project: Phoenix Issue Type: Task Reporter: Samarth Jain Assignee: Samarth Jain -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (PHOENIX-3665) Dataset api is missing phoenix spark connector for spark 2.0.2
Aavesh created PHOENIX-3665: --- Summary: Dataset api is missing phoenix spark connector for spark 2.0.2 Key: PHOENIX-3665 URL: https://issues.apache.org/jira/browse/PHOENIX-3665 Project: Phoenix Issue Type: Bug Reporter: Aavesh We have used the DataFrameFunctions class API to write DataFrames into HBase tables. For Spark 2.0.2, DataFrame is no longer available for Java and Scala code, so we need a phoenix-spark API for Spark 2.0.2 that can ingest data into HBase tables using Dataset. Please help if it is already available, or suggest a workaround. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (PHOENIX-3664) Pyspark: pushing filter by date against apache phoenix
Pablo Castilla created PHOENIX-3664: --- Summary: Pyspark: pushing filter by date against apache phoenix Key: PHOENIX-3664 URL: https://issues.apache.org/jira/browse/PHOENIX-3664 Project: Phoenix Issue Type: Bug Affects Versions: 4.7.0 Environment: Azure HDInsight - pyspark using phoenix client. Reporter: Pablo Castilla

I am trying to filter by date in apache phoenix from pyspark. The column in phoenix is created as Date and the filter is a datetime. When I use explain I see spark doesn't push the filter to phoenix. I have tried a lot of combinations without luck. Any way to do it?

df = sqlContext.read \
    .format("org.apache.phoenix.spark") \
    .option("table", "TABLENAME") \
    .option("zkUrl", zookepperServer + ":2181:/hbase-unsecure") \
    .load()
print(df.printSchema())
startValidation = datetime.datetime.now()
print(df.filter(df['FH'] > startValidation).explain(True))

Results:

root
 |-- METER_ID: string (nullable = true)
 |-- FH: date (nullable = true)
None

== Parsed Logical Plan ==
'Filter (FH#53 > 1486726683446150)
+- Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64] PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)

== Analyzed Logical Plan ==
METER_ID: string, FH: date, SUMMERTIME: string, MAGNITUDE: int, SOURCE: int, ENTRY_DATETIME: date, BC: string, T_VAL_AE: int, T_VAL_AI: int, T_VAL_R1: int, T_VAL_R2: int, T_VAL_R3: int, T_VAL_R4: int
Filter (cast(FH#53 as string) > cast(1486726683446150 as string))
+- Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64] PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)

== Optimized Logical Plan ==
Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
+- Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64] PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)

== Physical Plan ==
Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
+- Scan PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
None

If I set the FH column as timestamp it pushes the filter but throws an exception:

Caused by: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "12" at line 1, column 219.
at org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33)
at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111)
at org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1280)
at org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:1363)
at org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1373)
at org.apache.phoenix.jdbc.PhoenixStatement.optimizeQuery(PhoenixStatement.java:1368)
at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:122)
... 102 more
Caused by: MismatchedTokenException(106!=129)
at org.apache.phoenix.parse.PhoenixSQLParser.recoverFromMismatchedToken(PhoenixSQLParser.java:360)
at org.apache.phoenix.shaded.org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
at org.apache.phoenix.parse.PhoenixSQLParser.not_expression(PhoenixSQLParser.java:6862)
at org.apache.phoenix.parse.PhoenixSQLParser.and_expression(PhoenixSQLParser.java:6677)
at org.apache.phoenix.parse.PhoenixSQLParser.or_expression(PhoenixSQLParser.java:6614)
at org.apache.phoenix.parse.PhoenixSQLParser.expression(PhoenixSQLParser.java:6579)
at org.apache.phoenix.parse.PhoenixSQLParser.single_select(PhoenixSQLParser.java:4615)
at org.apache.phoenix.parse.PhoenixSQLParser.unioned_selects(PhoenixSQLParser.java:4697)
at org.apache.phoenix.parse.PhoenixSQLParser.select_node(PhoenixSQLParser.java:4763)
at org.apache.phoenix.parse.PhoenixSQLParser.oneStatement(PhoenixSQLParser.java:789)
at org.apache.phoenix.parse.PhoenixSQLParser.statement(PhoenixSQLParser.java:508)
at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:108)
... 107 more
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3572) Support FETCH NEXT| n ROWS from Cursor
[ https://issues.apache.org/jira/browse/PHOENIX-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863298#comment-15863298 ] ASF GitHub Bot commented on PHOENIX-3572: - Github user ankitsinghal commented on a diff in the pull request: https://github.com/apache/phoenix/pull/229#discussion_r100738646

--- Diff: phoenix-core/src/main/java/org/apache/phoenix/execute/CursorFetchPlan.java ---
@@ -0,0 +1,87 @@
+package org.apache.phoenix.execute;
+
+import java.sql.ParameterMetaData;
+import java.sql.SQLException;
+import java.util.List;
+import java.util.Set;
+
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.phoenix.compile.ExplainPlan;
+import org.apache.phoenix.compile.GroupByCompiler.GroupBy;
+import org.apache.phoenix.compile.OrderByCompiler.OrderBy;
+import org.apache.phoenix.compile.QueryPlan;
+import org.apache.phoenix.compile.RowProjector;
+import org.apache.phoenix.compile.StatementContext;
+import org.apache.phoenix.iterate.CursorResultIterator;
+import org.apache.phoenix.iterate.ParallelScanGrouper;
+import org.apache.phoenix.iterate.ResultIterator;
+import org.apache.phoenix.jdbc.PhoenixStatement.Operation;
+import org.apache.phoenix.parse.FilterableStatement;
+import org.apache.phoenix.query.KeyRange;
+import org.apache.phoenix.schema.TableRef;
+
+public class CursorFetchPlan extends DelegateQueryPlan {
+
+    //QueryPlan cursorQueryPlan;
+    private CursorResultIterator resultIterator;
+    private int fetchSize;
+
+    public CursorFetchPlan(QueryPlan cursorQueryPlan) {
+        super(cursorQueryPlan);
+    }
+
+    @Override
+    public ResultIterator iterator() throws SQLException {
+        // TODO Auto-generated method stub
+        StatementContext context = delegate.getContext();
+        if (resultIterator != null) {
+            return resultIterator;
+        } else {
+            context.getOverallQueryMetrics().startQuery();
+            resultIterator = (CursorResultIterator) delegate.iterator();
+            return resultIterator;
+        }
+    }
+
+    @Override
+    public ResultIterator iterator(ParallelScanGrouper scanGrouper) throws SQLException {
+        // TODO Auto-generated method stub
+        StatementContext context = delegate.getContext();
+        if (resultIterator != null) {
+            return resultIterator;
+        } else {
+            context.getOverallQueryMetrics().startQuery();
+            resultIterator = (CursorResultIterator) delegate.iterator(scanGrouper);
+            return resultIterator;
+        }
+    }
+
+    @Override
+    public ResultIterator iterator(ParallelScanGrouper scanGrouper, Scan scan) throws SQLException {
+        // TODO Auto-generated method stub
--- End diff --

Can you merge these iterators, as all are doing the same thing? The base class will call iterator(ParallelScanGrouper scanGrouper, Scan scan) internally from the other overloaded methods with special parameter values.

> Support FETCH NEXT| n ROWS from Cursor
> --
>
> Key: PHOENIX-3572
> URL: https://issues.apache.org/jira/browse/PHOENIX-3572
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: Biju Nair
> Assignee: Biju Nair
>
> Implement required changes to support
> - {{DECLARE}} and {{OPEN}} a cursor
> - query {{FETCH NEXT | n ROWS}} from the cursor
> - {{CLOSE}} the cursor
> Based on the feedback in [PR #192|https://github.com/apache/phoenix/pull/192], implement the changes using {{ResultSet}}.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
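The reviewer's suggestion above — collapse the three {{iterator}} overloads into one implementation that the others delegate to — can be sketched in Python, where default parameters play the role of Java overloads. Class and method names here are illustrative, not from the patch:

```python
class CursorFetchPlanSketch:
    """Hypothetical sketch: a single real iterator() implementation,
    with the narrower call shapes collapsing into it."""

    def __init__(self, delegate):
        self.delegate = delegate
        self._result_iterator = None  # cached across fetch calls

    def iterator(self, scan_grouper=None, scan=None):
        # Single entry point: the caching logic lives here once,
        # instead of being copied into every overload.
        if self._result_iterator is None:
            self._result_iterator = self.delegate.iterator(scan_grouper, scan)
        return self._result_iterator


class FakeDelegate:
    """Stand-in for the delegate QueryPlan, counting iterator calls."""

    def __init__(self):
        self.calls = 0

    def iterator(self, scan_grouper, scan):
        self.calls += 1
        return object()


plan = CursorFetchPlanSketch(FakeDelegate())
first = plan.iterator()                    # no-arg shape
second = plan.iterator("grouper", "scan")  # most specific shape
assert first is second                     # cached: delegate consulted once
assert plan.delegate.calls == 1
```

In the Java code this corresponds to having the two narrower overloads forward to the most specific one, so duplicated caching and metrics logic appears only once.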
[jira] [Commented] (PHOENIX-3572) Support FETCH NEXT| n ROWS from Cursor
[ https://issues.apache.org/jira/browse/PHOENIX-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863300#comment-15863300 ] ASF GitHub Bot commented on PHOENIX-3572: - Github user ankitsinghal commented on a diff in the pull request: https://github.com/apache/phoenix/pull/229#discussion_r100738871

--- Diff: phoenix-core/src/main/java/org/apache/phoenix/util/CursorUtil.java ---
@@ -0,0 +1,203 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.phoenix.util;
+
+import java.sql.Connection;
+import java.sql.SQLException;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.phoenix.compile.QueryPlan;
+import org.apache.phoenix.compile.OrderByCompiler.OrderBy;
+import org.apache.phoenix.execute.CursorFetchPlan;
+import org.apache.phoenix.iterate.CursorResultIterator;
+import org.apache.phoenix.parse.CloseStatement;
+import org.apache.phoenix.parse.DeclareCursorStatement;
+import org.apache.phoenix.parse.OpenStatement;
+import org.apache.phoenix.schema.tuple.Tuple;
+
+public final class CursorUtil {
+
+    private static class CursorWrapper {
+        private final String cursorName;
+        private final String selectSQL;
+        private boolean isOpen = false;
+        QueryPlan queryPlan;
+        ImmutableBytesWritable row;
+        ImmutableBytesWritable previousRow;
+        private Scan scan;
+        private boolean moreValues = true;
+        private boolean isReversed;
+        private boolean islastCallNext;
+        private CursorFetchPlan fetchPlan;
+        private int offset = -1;
+
+        private CursorWrapper(String cursorName, String selectSQL, QueryPlan queryPlan) {
+            this.cursorName = cursorName;
+            this.selectSQL = selectSQL;
+            this.queryPlan = queryPlan;
+            this.islastCallNext = true;
+            this.fetchPlan = new CursorFetchPlan(queryPlan);
+        }
+
+        private synchronized void openCursor(Connection conn) throws SQLException {
+            if (isOpen) {
+                return;
+            }
+            this.scan = this.queryPlan.getContext().getScan();
+            isReversed = OrderBy.REV_ROW_KEY_ORDER_BY.equals(this.queryPlan.getOrderBy());
+            isOpen = true;
+        }
+
+        private void closeCursor() throws SQLException {
+            isOpen = false;
+            ((CursorResultIterator) fetchPlan.iterator()).closeCursor();
+            //TODO: Determine if the cursor should be removed from the HashMap at this point.
+            //Semantically it makes sense that something which is 'Closed' one should be able to 'Open' again.
+            mapCursorIDQuery.remove(this.cursorName);
+        }
+
+        private QueryPlan getFetchPlan(boolean isNext, int fetchSize) throws SQLException {
+            if (!isOpen)
+                throw new SQLException("Fetch call on closed cursor '" + this.cursorName + "'!");
+            ((CursorResultIterator) fetchPlan.iterator()).setFetchSize(fetchSize);
+            if (!queryPlan.getStatement().isAggregate() || !queryPlan.getStatement().isDistinct()) {
+                if (islastCallNext != isNext) {
+                    if (islastCallNext && !isReversed) {
+                        ScanUtil.setReversed(scan);
+                    } else {
+                        ScanUtil.unsetReversed(scan);
+                    }
--- End diff --

This code seems to be for reverse/prior and belongs to another JIRA. Can we remove this if it can affect the functionality?

> Support FETCH NEXT| n ROWS from Cursor
> --
>
> Key: PHOENIX-3572
> URL:
[jira] [Commented] (PHOENIX-3572) Support FETCH NEXT| n ROWS from Cursor
[ https://issues.apache.org/jira/browse/PHOENIX-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863302#comment-15863302 ] Ankit Singhal commented on PHOENIX-3572: [~gsbiju], requires some cleanup so have left some feedback. ping [~jamestaylor] for review. > Support FETCH NEXT| n ROWS from Cursor > -- > > Key: PHOENIX-3572 > URL: https://issues.apache.org/jira/browse/PHOENIX-3572 > Project: Phoenix > Issue Type: Sub-task >Reporter: Biju Nair >Assignee: Biju Nair > > Implement required changes to support > - {{DECLARE}} and {{OPEN}} a cursor > - query {{FETCH NEXT | n ROWS}} from the cursor > - {{CLOSE}} the cursor > Based on the feedback in [PR > #192|https://github.com/apache/phoenix/pull/192], implement the changes using > {{ResultSet}}. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3471) Allow accessing full (legacy) Phoenix EXPLAIN information via Calcite
[ https://issues.apache.org/jira/browse/PHOENIX-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863321#comment-15863321 ] ASF GitHub Bot commented on PHOENIX-3471: - GitHub user gabrielreid opened a pull request: https://github.com/apache/phoenix/pull/231 PHOENIX-3471 Add query plan matching system Add a generic system for parsing and matching Calcite query plans using Hamcrest matchers. The general intention is to make matching of query plans less brittle and somewhat easier to write than simply matching the full text of the query plan. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gabrielreid/phoenix PHOENIX-3471_explain_plan Alternatively you can review and apply these changes as the patch at: https://github.com/apache/phoenix/pull/231.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #231 commit 128ff0a3288b0cfc48b068fc21704ca07278a33c Author: Gabriel ReidDate: 2016-11-18T09:58:18Z PHOENIX-3471 Add query plan matching system Add a generic system for parsing and matching Calcite query plans using Hamcrest matchers. The general intention is to make matching of query plans less brittle and somewhat easier to write than simply matching the full text of the query plan. > Allow accessing full (legacy) Phoenix EXPLAIN information via Calcite > - > > Key: PHOENIX-3471 > URL: https://issues.apache.org/jira/browse/PHOENIX-3471 > Project: Phoenix > Issue Type: Sub-task >Reporter: Gabriel Reid >Assignee: Gabriel Reid > > The EXPLAIN syntax in Calcite-Phoenix (either "EXPLAIN " or "EXPLAIN > PLAN FOR ") currently returns the Calcite plan for a query. 
For example: > {code} > EXPLAIN SELECT MAX(I) FROM T1 > {code} > results in the following Calcite explain plan: > {code} > PhoenixToEnumerableConverter > PhoenixServerAggregate(group=[{}], EXPR$0=[MAX($0)]) > PhoenixTableScan(table=[[phoenix, T1]]) > {code} > and the following (legacy) Phoenix explain plan: > {code} > CLIENT PARALLEL 1-WAY FULL SCAN OVER T1 > SERVER FILTER BY FIRST KEY ONLY > {code} > There are currently a large number of integration tests which depend on the > legacy Phoenix format of explain plan, and this format is no longer available > when running via Calcite. PHOENIX-3105 added support for accessing the > explain plan via the "EXPLAIN " syntax, but this update to the syntax > still only provides the Calcite-specific explain plan. > There are three main approaches which can be taken here: > h4. Option 1: Custom EXPLAIN execution > This approach extends the work done in PHOENIX-3105 to plug in a custom > SqlPhoenixExplain > node which returns the legacy Phoenix explain plan, with the "EXPLAIN PLAN > FOR " > syntax still returning the Calcite explain plan. > h4. Option 2: Add the legacy Phoenix explain plan to the Calcite plan as a > top-level attribute > This approach results in an explain plan that looks as follows: > {code} > PhoenixToEnumerableConverter(PhoenixExecutionPlan=[CLIENT PARALLEL 1-WAY FULL > SCAN OVER T1 > SERVER FILTER BY FIRST KEY ONLY]) > PhoenixServerAggregate(group=[{}], EXPR$0=[MAX($0)]) > PhoenixTableScan(table=[[phoenix, T1]]) > {code} > The disadvantage of this approach is that it's not really "correct" -- we're > just tacking > a different representation of the explain plan into the Calcite explain plan. > The advantage of this approach is that it's very quick and easy to implement > (i.e. it > can be done immediately), and it will require minimal changes to the many > test cases which have > hard-coded explain plans that things are checked against. 
All we need to do > is have a > utility to extract the PhoenixExecutionPlan value from the full Calcite plan, > and other > than that all test cases stay the same. > h4. Option 3: Add all relevant information to the correct parts of the > Calcite explain plan > This approach would result in an explain plan that looks as follows: > {code} > PhoenixToEnumerableConverter > PhoenixServerAggregate(group=[{}], EXPR$0=[MAX($0)]) > PhoenixTableScan(table=[[phoenix, T1]], scanType[CLIENT PARALLEL 1-WAY > FULL ]) > {code} > This is undoubtedly the "right" way to do things. However, it has the major > disadvantage > that it will require a large amount of work to do the following: > * add all relevant information into various implementations of > {{AbstractRelNode.explainTerms}} > * rework all test cases which verify things against an expected explain plan > It is of course also an option is to start with option 2 here, and eventually > migrate to option 3. > If we go for
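The pull request discussed above matches parsed Calcite plans rather than raw plan text, to make plan assertions less brittle. A minimal sketch of the parsing half — in Python rather than the patch's Java/Hamcrest, and assuming two-space indentation per plan level — looks like this:

```python
def parse_plan(plan_text):
    """Parse an indented Calcite-style plan into (depth, operator) pairs.
    Hypothetical helper; the real patch uses Hamcrest matchers in Java."""
    nodes = []
    for line in plan_text.strip('\n').splitlines():
        stripped = line.lstrip(' ')
        depth = (len(line) - len(stripped)) // 2  # two spaces per level
        operator = stripped.split('(')[0]  # drop attributes, keep node name
        nodes.append((depth, operator))
    return nodes


plan = """\
PhoenixToEnumerableConverter
  PhoenixServerAggregate(group=[{}], EXPR$0=[MAX($0)])
    PhoenixTableScan(table=[[phoenix, T1]])"""

parsed = parse_plan(plan)
assert parsed == [
    (0, 'PhoenixToEnumerableConverter'),
    (1, 'PhoenixServerAggregate'),
    (2, 'PhoenixTableScan'),
]
```

Matching against the operator tree rather than the full text means tests survive cosmetic changes to node attributes, which is the motivation stated in the PR description.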
[jira] [Commented] (PHOENIX-3112) Partial row scan not handled correctly
[ https://issues.apache.org/jira/browse/PHOENIX-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863149#comment-15863149 ] Manivel Poomalai commented on PHOENIX-3112: --- Hi James, thanks for responding by email. As per your suggestion I have commented in JIRA instead of via email. Do you have a solution or workaround for this issue? And is there a tentative date for when this JIRA will be resolved? Thanks, -Manivel
> Partial row scan not handled correctly
> --
>
> Key: PHOENIX-3112
> URL: https://issues.apache.org/jira/browse/PHOENIX-3112
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.7.0
> Reporter: Pierre Lacave
>
> When doing a select on a relatively large table (a few thousand rows), some
> rows are returned with fields missing.
> When narrowing the filter to return those specific rows, the values appear
> as expected.
> {noformat}
> CREATE TABLE IF NOT EXISTS TEST (
>     BUCKET VARCHAR,
>     TIMESTAMP_DATE TIMESTAMP,
>     TIMESTAMP UNSIGNED_LONG NOT NULL,
>     SRC VARCHAR,
>     DST VARCHAR,
>     ID VARCHAR,
>     ION VARCHAR,
>     IC BOOLEAN NOT NULL,
>     MI UNSIGNED_LONG,
>     AV UNSIGNED_LONG,
>     MA UNSIGNED_LONG,
>     CNT UNSIGNED_LONG,
>     DUMMY VARCHAR,
>     CONSTRAINT pk PRIMARY KEY (BUCKET, TIMESTAMP DESC, SRC, DST, ID, ION, IC)
> );
> {noformat}
> Using a python script to generate a CSV with 5000 rows:
> {noformat}
> for i in xrange(5000):
>     print "5SEC,2016-07-21 07:25:35.{i},146908593500{i},,AAA,,,false,{i}1181000,1788000{i},2497001{i},{i},a{i}".format(i=i)
> {noformat}
> Bulk inserting the CSV into the table:
> {noformat}
> phoenix/bin/psql.py localhost -t TEST large.csv
> {noformat}
> Here we can see one row that contains no TIMESTAMP_DATE and null values in MI
> and MA:
> {noformat}
> 0: jdbc:phoenix:localhost:2181> select * from TEST
> +--------+--------------------------+------------------+-----+-----+-----+-----+-------+-------------+-------------+-------------+------+-------+
> | BUCKET | TIMESTAMP_DATE           | TIMESTAMP        | SRC | DST | ID  | ION | IC    | MI          | AV          | MA          | CNT  | DUMMY |
> +--------+--------------------------+------------------+-----+-----+-----+-----+-------+-------------+-------------+-------------+------+-------+
> | 5SEC   | 2016-07-21 07:25:35.100  | 1469085935001000 |     | AAA |     |     | false | 10001181000 | 17880001000 | 24970011000 | 1000 | a1000 |
> | 5SEC   | 2016-07-21 07:25:35.999  | 146908593500999  |     | AAA |     |     | false | 9991181000  | 1788000999  | 2497001999  | 999  | a999  |
> | 5SEC   | 2016-07-21 07:25:35.998  | 146908593500998  |     | AAA |     |     | false | 9981181000  | 1788000998  | 2497001998  | 998  | a998  |
> | 5SEC   |                          | 146908593500997  |     | AAA |     |     | false | null        | 1788000997  | null        | 997  |       |
> | 5SEC   | 2016-07-21 07:25:35.996  | 146908593500996  |     | AAA |     |     | false | 9961181000  | 1788000996  | 2497001996  | 996  | a996  |
> | 5SEC   | 2016-07-21 07:25:35.995  | 146908593500995  |     | AAA |     |     | false | 9951181000  | 1788000995  | 2497001995  | 995  | a995  |
> | 5SEC   | 2016-07-21 07:25:35.994  | 146908593500994  |     | AAA |     |     | false | 9941181000  | 1788000994  | 2497001994  | 994  | a994  |
> {noformat}
> But when selecting that row specifically the values are correct:
> {noformat}
> 0: jdbc:phoenix:localhost:2181> select * from TEST where timestamp = 146908593500997;
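It's worth noting that the generator above always emits all 13 columns, including TIMESTAMP_DATE, MI and MA, so the nulls in the query output appear at scan time rather than coming from the input data (consistent with the issue title). A minimal sanity check on that claim, with the field layout assumed from the CREATE TABLE and python script above (the CsvCheck class is a hypothetical helper, not Phoenix code):

```java
// Hypothetical helper: rebuilds the CSV line the python generator would
// print for a given i, and checks that every generated line is complete.
public class CsvCheck {

    /** Builds the CSV line the python script above would print for a given i. */
    static String line(int i) {
        return String.format(
            "5SEC,2016-07-21 07:25:35.%d,146908593500%d,,AAA,,,false,%d1181000,1788000%d,2497001%d,%d,a%d",
            i, i, i, i, i, i, i);
    }

    /** True if the line has all 13 columns and a non-empty TIMESTAMP_DATE. */
    static boolean isComplete(String csvLine) {
        // Limit -1 keeps trailing empty fields, so empty columns still count.
        String[] fields = csvLine.split(",", -1);
        return fields.length == 13 && !fields[1].isEmpty();
    }
}
```

Running isComplete over every generated line (including i=997, the row that comes back partial from the scan) returns true, which points at the read path rather than the bulk load.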
[jira] [Commented] (PHOENIX-3471) Allow accessing full (legacy) Phoenix EXPLAIN information via Calcite
[ https://issues.apache.org/jira/browse/PHOENIX-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15863323#comment-15863323 ] Gabriel Reid commented on PHOENIX-3471: --- (Finally) added a PR to pull the plumbing for this into the Calcite branch: https://github.com/apache/phoenix/pull/231. This PR adds the basics to be able to interpret and match Calcite query plans as outlined in the comments above. > Allow accessing full (legacy) Phoenix EXPLAIN information via Calcite > - > > Key: PHOENIX-3471 > URL: https://issues.apache.org/jira/browse/PHOENIX-3471 > Project: Phoenix > Issue Type: Sub-task >Reporter: Gabriel Reid >Assignee: Gabriel Reid > > The EXPLAIN syntax in Calcite-Phoenix (either "EXPLAIN <query>" or "EXPLAIN PLAN FOR <query>") currently returns the Calcite plan for a query. For example: > {code} > EXPLAIN SELECT MAX(I) FROM T1 > {code} > results in the following Calcite explain plan: > {code} > PhoenixToEnumerableConverter > PhoenixServerAggregate(group=[{}], EXPR$0=[MAX($0)]) > PhoenixTableScan(table=[[phoenix, T1]]) > {code} > and the following (legacy) Phoenix explain plan: > {code} > CLIENT PARALLEL 1-WAY FULL SCAN OVER T1 > SERVER FILTER BY FIRST KEY ONLY > {code} > There are currently a large number of integration tests which depend on the > legacy Phoenix format of explain plan, and this format is no longer available > when running via Calcite. PHOENIX-3105 added support for accessing the > explain plan via the "EXPLAIN <query>" syntax, but this update to the syntax > still only provides the Calcite-specific explain plan. > There are three main approaches which can be taken here: > h4. Option 1: Custom EXPLAIN execution > This approach extends the work done in PHOENIX-3105 to plug in a custom > SqlPhoenixExplain > node which returns the legacy Phoenix explain plan, with the "EXPLAIN PLAN FOR <query>" > syntax still returning the Calcite explain plan. > h4.
Option 2: Add the legacy Phoenix explain plan to the Calcite plan as a > top-level attribute > This approach results in an explain plan that looks as follows: > {code} > PhoenixToEnumerableConverter(PhoenixExecutionPlan=[CLIENT PARALLEL 1-WAY FULL > SCAN OVER T1 > SERVER FILTER BY FIRST KEY ONLY]) > PhoenixServerAggregate(group=[{}], EXPR$0=[MAX($0)]) > PhoenixTableScan(table=[[phoenix, T1]]) > {code} > The disadvantage of this approach is that it's not really "correct" -- we're > just tacking > a different representation of the explain plan onto the Calcite explain plan. > The advantage of this approach is that it's very quick and easy to implement > (i.e. it > can be done immediately), and it will require minimal changes to the many > test cases which have > hard-coded explain plans that results are checked against. All we need to do > is have a > utility to extract the PhoenixExecutionPlan value from the full Calcite plan, > and other > than that all test cases stay the same. > h4. Option 3: Add all relevant information to the correct parts of the > Calcite explain plan > This approach would result in an explain plan that looks as follows: > {code} > PhoenixToEnumerableConverter > PhoenixServerAggregate(group=[{}], EXPR$0=[MAX($0)]) > PhoenixTableScan(table=[[phoenix, T1]], scanType[CLIENT PARALLEL 1-WAY > FULL ]) > {code} > This is undoubtedly the "right" way to do things. However, it has the major > disadvantage > that it will require a large amount of work to do the following: > * add all relevant information into various implementations of > {{AbstractRelNode.explainTerms}} > * rework all test cases which verify things against an expected explain plan > It is of course also an option to start with option 2 here and eventually > migrate to option 3. > If we go for option 2 or option 3, we should probably remove the custom > EXPLAIN parsing. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
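For option 2, the extraction utility the description mentions could be little more than a regular expression over the Calcite plan text. A hedged sketch (the class and method names are hypothetical, not part of the Phoenix code base, and it assumes the embedded legacy plan text never contains a ']' character):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the legacy Phoenix plan out of a Calcite explain
// string that carries it as a top-level PhoenixExecutionPlan=[...] attribute,
// as in the option 2 example above.
public class PhoenixExecutionPlanExtractor {

    // Reluctant (.*?) stops at the first ']'; DOTALL lets it span the line
    // breaks inside a multi-line legacy plan.
    private static final Pattern PLAN_ATTR =
            Pattern.compile("PhoenixExecutionPlan=\\[(.*?)\\]", Pattern.DOTALL);

    /** Returns the embedded legacy Phoenix plan, or null if the attribute is absent. */
    public static String extract(String calcitePlan) {
        Matcher m = PLAN_ATTR.matcher(calcitePlan);
        return m.find() ? m.group(1) : null;
    }
}
```

With something like this in the test harness, existing tests could keep asserting against the legacy plan string unchanged.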
[GitHub] phoenix pull request #231: PHOENIX-3471 Add query plan matching system
GitHub user gabrielreid opened a pull request: https://github.com/apache/phoenix/pull/231 PHOENIX-3471 Add query plan matching system Add a generic system for parsing and matching Calcite query plans using Hamcrest matchers. The general intention is to make matching of query plans less brittle and somewhat easier to write than simply matching the full text of the query plan. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gabrielreid/phoenix PHOENIX-3471_explain_plan Alternatively you can review and apply these changes as the patch at: https://github.com/apache/phoenix/pull/231.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #231 commit 128ff0a3288b0cfc48b068fc21704ca07278a33c Author: Gabriel Reid Date: 2016-11-18T09:58:18Z PHOENIX-3471 Add query plan matching system Add a generic system for parsing and matching Calcite query plans using Hamcrest matchers. The general intention is to make matching of query plans less brittle and somewhat easier to write than simply matching the full text of the query plan. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
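The PR builds its matchers on Hamcrest; as a dependency-free illustration of the underlying idea (asserting on plan shape rather than string-comparing the entire plan text), an in-order node check might look like the sketch below. The class and method names are mine, not code from pull request #231:

```java
// Sketch of the idea behind the PR: check that the expected RelNode names
// appear in the explain output in order, instead of matching the full plan
// text verbatim. The real PR expresses this via Hamcrest matchers.
public class PlanShapeCheck {

    /** True if each expected node name occurs in the plan, in the given order. */
    public static boolean containsNodesInOrder(String plan, String... nodes) {
        int from = 0;
        for (String node : nodes) {
            int idx = plan.indexOf(node, from);
            if (idx < 0) {
                return false;
            }
            // Resume searching after this match so order is enforced.
            from = idx + node.length();
        }
        return true;
    }
}
```

A test written this way keeps passing when incidental details of the plan (attribute values, indentation) change, which is exactly the brittleness the PR aims to remove.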