[jira] [Commented] (PHOENIX-1315) Optimize query for Pig loader
[ https://issues.apache.org/jira/browse/PHOENIX-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161560#comment-14161560 ]

James Taylor commented on PHOENIX-1315:
---
Thanks, [~maghamravikiran]. Please resolve the issue as fixed.

Optimize query for Pig loader
---
Key: PHOENIX-1315
URL: https://issues.apache.org/jira/browse/PHOENIX-1315
Project: Phoenix
Issue Type: Bug
Reporter: James Taylor
Assignee: maghamravikiran
Fix For: 4.2, 3.2
Attachments: PHOENIX-1315.patch, PHOENIX-1315_v2.patch, PHOENIX-1315_v3.patch, PHOENIX-1315_v4.patch

I came across this with a recent change I was making. Why is the call to queryPlan.iterator() necessary in PhoenixInputFormat?
{code}
private QueryPlan getQueryPlan(final JobContext context) throws IOException {
    Preconditions.checkNotNull(context);
    if (queryPlan == null) {
        try {
            final Connection connection = getConnection();
            final String selectStatement = getConf().getSelectStatement();
            Preconditions.checkNotNull(selectStatement);
            final Statement statement = connection.createStatement();
            final PhoenixStatement pstmt = statement.unwrap(PhoenixStatement.class);
            this.queryPlan = pstmt.compileQuery(selectStatement);
            // FIXME: why is getting the iterator necessary here, as it will
            // cause the query to run.
            this.queryPlan.iterator();
        } catch (Exception exception) {
            LOG.error(String.format("Failed to get the query plan with error [%s]", exception.getMessage()));
            throw new RuntimeException(exception);
        }
    }
    return queryPlan;
}
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
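For context, the FIXME above is about lazy versus eager execution: compiling a plan is cheap, but asking for its iterator is what starts the scan. A minimal standalone sketch of that distinction (all names here are illustrative stand-ins, not Phoenix's real API):

```java
// Hypothetical sketch: a compiled "plan" that touches its data source
// only when an iterator is requested. Not Phoenix code.
import java.util.Iterator;
import java.util.List;

class LazyPlan {
    private final List<String> rows;
    private boolean executed = false;

    LazyPlan(List<String> rows) { this.rows = rows; }   // "compile" step: no I/O

    Iterator<String> iterator() {
        executed = true;                                // execution starts here
        return rows.iterator();
    }

    boolean hasExecuted() { return executed; }
}

public class LazyPlanDemo {
    public static void main(String[] args) {
        LazyPlan plan = new LazyPlan(List.of("a", "b"));
        System.out.println(plan.hasExecuted());  // compiling alone did nothing
        plan.iterator();
        System.out.println(plan.hasExecuted());  // iterator() ran the "query"
    }
}
```

This is why calling queryPlan.iterator() inside getQueryPlan() is surprising: an InputFormat only needs the compiled plan to generate splits, not an executed one.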
[jira] [Commented] (PHOENIX-1315) Optimize query for Pig loader
[ https://issues.apache.org/jira/browse/PHOENIX-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161563#comment-14161563 ]

Hudson commented on PHOENIX-1315:
---
FAILURE: Integrated in Phoenix | Master #407 (See [https://builds.apache.org/job/Phoenix-master/407/])
PHOENIX-1315-Test Load from Index table. (ravimagham: rev e35503374393b0428f4e6603c8e05d87a073e3c3)
* phoenix-pig/src/it/java/org/apache/phoenix/pig/PhoenixHBaseLoaderIT.java
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PHOENIX-1317) Cleanup non phoenix-core pom files
[ https://issues.apache.org/jira/browse/PHOENIX-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161574#comment-14161574 ]

Hudson commented on PHOENIX-1317:
---
FAILURE: Integrated in Phoenix | 3.0 | Hadoop1 #243 (See [https://builds.apache.org/job/Phoenix-3.0-hadoop1/243/])
PHOENIX-1317-Test on loading data from Index table (ravimagham: rev 4b0d3ba199c0a9c64d247376a033c7761843a5c7)
* phoenix-pig/src/it/java/org/apache/phoenix/pig/PhoenixHBaseLoaderIT.java

Cleanup non phoenix-core pom files
---
Key: PHOENIX-1317
URL: https://issues.apache.org/jira/browse/PHOENIX-1317
Project: Phoenix
Issue Type: Bug
Affects Versions: 3.1, 4.1
Reporter: James Taylor
Assignee: maghamravikiran
Attachments: 0001-PHOENIX-1317-4.1.0.patch

The phoenix-core pom is in much better shape after PHOENIX-1272, but the non phoenix-core poms need to be updated as well. For one particular issue, take a look at the following comment in BIGTOP-1420: https://issues.apache.org/jira/browse/BIGTOP-1420?focusedCommentId=14125245&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14125245
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PHOENIX-1321) Cleanup setting of timestamps when collecting and using stats
[ https://issues.apache.org/jira/browse/PHOENIX-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161614#comment-14161614 ]

Hudson commented on PHOENIX-1321:
---
SUCCESS: Integrated in Phoenix | 3.0 | Hadoop1 #244 (See [https://builds.apache.org/job/Phoenix-3.0-hadoop1/244/])
PHOENIX-1321 Cleanup setting of timestamps when collecting and using stats (jtaylor: rev 12fa6f7004fe70a657ebaea3d745296611b2b80e)
* phoenix-core/src/it/java/org/apache/phoenix/end2end/MultiCfQueryExecIT.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java
* phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/StatsCollectorIT.java
* phoenix-core/src/main/java/org/apache/phoenix/query/QueryServicesOptions.java
* phoenix-core/src/test/java/org/apache/phoenix/query/QueryServicesTestImpl.java
* phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsUtils.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/KeyOnlyIT.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/ParallelIteratorsIT.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsScanner.java
* phoenix-core/src/main/java/org/apache/phoenix/compile/ExpressionCompiler.java
* phoenix-core/src/main/java/org/apache/phoenix/query/QueryServices.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/index/SaltedIndexIT.java
* phoenix-core/src/main/java/org/apache/phoenix/util/MetaDataUtil.java
* phoenix-core/src/test/java/org/apache/phoenix/util/TestUtil.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsTable.java
* phoenix-core/src/main/java/org/apache/phoenix/schema/stat/StatisticsCollector.java
* phoenix-core/src/it/java/org/apache/phoenix/mapreduce/CsvBulkLoadToolIT.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/BaseTenantSpecificTablesIT.java

Cleanup setting of timestamps when collecting and using stats
---
Key: PHOENIX-1321
URL: https://issues.apache.org/jira/browse/PHOENIX-1321
Project: Phoenix
Issue Type: Sub-task
Reporter: James Taylor
Assignee: James Taylor
Attachments: PHOENIX-1321_4.patch

We're currently not using the max timestamp that was passed through the Scan when we do an ANALYZE. In the same way, we're not using the client timestamp when we read the stats and cache them on PTable. The tricky thing is what timestamp to use for the stats when a split or compaction occurs, because in those cases we don't have a user-supplied timestamp (if they're managing timestamps themselves).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (PHOENIX-1297) Adding utility methods to get primary key information from the optimized query plan
[ https://issues.apache.org/jira/browse/PHOENIX-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-1297: -- Attachment: PHOENIX-1297_v2.patch [~jamestaylor] - attached is the patch for master branch. Please review. Thanks! Adding utility methods to get primary key information from the optimized query plan --- Key: PHOENIX-1297 URL: https://issues.apache.org/jira/browse/PHOENIX-1297 Project: Phoenix Issue Type: Task Affects Versions: 5.0.0, 4.2, 3.2 Reporter: Samarth Jain Assignee: Samarth Jain Attachments: PHOENIX-1297.patch, PHOENIX-1297_v2.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (PHOENIX-1327) Disallow creating arrays of fixed width base type without the max length being specified
[ https://issues.apache.org/jira/browse/PHOENIX-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain reassigned PHOENIX-1327:
---
Assignee: Samarth Jain

Disallow creating arrays of fixed width base type without the max length being specified
---
Key: PHOENIX-1327
URL: https://issues.apache.org/jira/browse/PHOENIX-1327
Project: Phoenix
Issue Type: Bug
Reporter: Samarth Jain
Assignee: Samarth Jain
Fix For: 5.0.0, 4.2, 3.2

Today, we allow a user to specify an array whose base type is of fixed width as: CREATE TABLE foo (k BINARY_ARRAY NOT NULL PRIMARY KEY) This shouldn't be allowed, as for fixed width data types like CHAR and BINARY, specifying a max length is mandatory. These alternate statements properly enforce the max length constraint: CREATE TABLE foo (k BINARY ARRAY NOT NULL PRIMARY KEY) CREATE TABLE foo (k BINARY[] NOT NULL PRIMARY KEY)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Resolved] (PHOENIX-1327) Disallow creating arrays of fixed width base type without the max length being specified
[ https://issues.apache.org/jira/browse/PHOENIX-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain resolved PHOENIX-1327.
---
Resolution: Duplicate

Resolved as part of PHOENIX-1297
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Resolved] (PHOENIX-1320) Update stats atomically
[ https://issues.apache.org/jira/browse/PHOENIX-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor resolved PHOENIX-1320. --- Resolution: Fixed Fix Version/s: 3.2 4.2 5.0.0 Update stats atomically --- Key: PHOENIX-1320 URL: https://issues.apache.org/jira/browse/PHOENIX-1320 Project: Phoenix Issue Type: Sub-task Reporter: James Taylor Assignee: James Taylor Fix For: 5.0.0, 4.2, 3.2 Attachments: PHOENIX-1320.patch To prevent partially updated stats or a mix of old stats and new stats in the event of a write failure, commit the stats atomically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (PHOENIX-1321) Cleanup setting of timestamps when collecting and using stats
[ https://issues.apache.org/jira/browse/PHOENIX-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Taylor resolved PHOENIX-1321.
---
Resolution: Fixed
Fix Version/s: 3.2
4.2
5.0.0
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (PHOENIX-1328) Update ANALYZE syntax to collect stats on index tables and all tables
[ https://issues.apache.org/jira/browse/PHOENIX-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-1328: -- Description: Based on the discussion in PHOENIX-1309 we will now modify the ANALYZE query to collect the stats for index table and all the tables associated with the main table (includes index). (was: Based on the discussion in Phoenix-1309 we will now modify the ANALYZE query to collect the stats for index table and all the tables associated with the main table (includes index).) Update ANALYZE syntax to collect stats on index tables and all tables - Key: PHOENIX-1328 URL: https://issues.apache.org/jira/browse/PHOENIX-1328 Project: Phoenix Issue Type: Sub-task Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Based on the discussion in PHOENIX-1309 we will now modify the ANALYZE query to collect the stats for index table and all the tables associated with the main table (includes index). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-1328) Update ANALYZE syntax to collect stats on index tables and all tables
[ https://issues.apache.org/jira/browse/PHOENIX-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161649#comment-14161649 ]

James Taylor commented on PHOENIX-1328:
---
You can do an UPDATE STATISTICS on a table/index, its indexes, or both the table and indexes (the default). Like this:
- UPDATE STATISTICS table INDEX -- Updates the statistics of all indexes on the table
- UPDATE STATISTICS table ALL -- Updates both the table and index statistics
- UPDATE STATISTICS table -- Same as ALL
- UPDATE STATISTICS table COLUMNS -- Updates only the table statistics
- UPDATE STATISTICS index -- Updates only the index statistics
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
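As a rough illustration of how the scope keyword composes with the statement, here is a tiny hypothetical helper (not Phoenix code; only the SQL strings follow the variants listed above, where an empty scope defaults to ALL):

```java
// Illustrative only: builds the UPDATE STATISTICS statement variants
// described in the comment above. The helper class is hypothetical.
public class UpdateStatsSql {
    static String forTable(String table, String scope) {
        // scope is one of "INDEX", "ALL", "COLUMNS", or "" (same as ALL)
        return scope.isEmpty()
            ? "UPDATE STATISTICS " + table
            : "UPDATE STATISTICS " + table + " " + scope;
    }

    public static void main(String[] args) {
        System.out.println(forTable("my_table", "INDEX"));
        System.out.println(forTable("my_table", ""));
    }
}
```

In practice these strings would be passed to a plain JDBC Statement.execute() on a Phoenix connection.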
[jira] [Updated] (PHOENIX-1322) Add documentation for UPDATE STATISTICS command
[ https://issues.apache.org/jira/browse/PHOENIX-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Taylor updated PHOENIX-1322:
---
Issue Type: Sub-task (was: Bug)
Parent: PHOENIX-1177

Add documentation for UPDATE STATISTICS command
---
Key: PHOENIX-1322
URL: https://issues.apache.org/jira/browse/PHOENIX-1322
Project: Phoenix
Issue Type: Sub-task
Reporter: James Taylor
Assignee: ramkrishna.s.vasudevan

Four places need to be updated:
- Add a new webpage and add the webpage to the Using menu (/site/source/src/site/site.xml). The webpage can talk about the new ANALYZE table command and give a couple of examples. It'd be good to document that stats are updated automatically during splits and compaction. Also, mention the new property values you added to control: number of bytes before a guidepost is put in, and min time before another analyze may be done. Don't talk about implementation, though, other than why we're doing this (i.e. to improve parallelization).
- Add the new ANALYZE call with a short explanation to ./phoenix-docs/src/docsrc/help/phoenix.csv. This will cause it to appear here: http://phoenix.apache.org/language/index.html
- Add an item at the top for Statistics Collection with a short explanation here: site/source/src/site/markdown/recent.md
- Remove the first item from Cost-based Query Optimization, or change the font to strike-through with a note that it's implemented in 3.2/4.2, here: site/source/src/site/markdown/roadmap.md
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (PHOENIX-1030) Change Expression.isDeterministic() to return an enum of values ALWAYS, PER_STATEMENT, PER_ROW
[ https://issues.apache.org/jira/browse/PHOENIX-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas D'Silva updated PHOENIX-1030:
---
Attachment: PHOENIX-1030-3.0.patch

[~jamestaylor] I have attached a patch that works for all versions. I also modified NULL and TYPED_NULL expressions to be statically initialized using all the possible values of the Determinism enum. Thanks, Thomas

Change Expression.isDeterministic() to return an enum of values ALWAYS, PER_STATEMENT, PER_ROW
---
Key: PHOENIX-1030
URL: https://issues.apache.org/jira/browse/PHOENIX-1030
Project: Phoenix
Issue Type: Improvement
Reporter: Thomas D'Silva
Assignee: Thomas D'Silva
Attachments: PHOENIX-1030-3.0.patch, PHOENIX-1030-3.0.patch, PHOENIX-1030-3.0.patch, PHOENIX-1030-3.0.patch, PHOENIX-1030-4.0.patch, PHOENIX-1030-4.0.patch, PHOENIX-1030-master.patch

Change Expression.isDeterministic() to return an enum with three values:
DETERMINISTIC - the expression returns the same output every time given the same input.
UNDETERMINISTIC_ROW - the expression should be computed for every row.
UNDETERMINISTIC_STMT - the expression should be computed for a given statement only once.
See PHOENIX-1001
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PHOENIX-1328) Update ANALYZE syntax to collect stats on index tables and all tables
[ https://issues.apache.org/jira/browse/PHOENIX-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161662#comment-14161662 ]

ramkrishna.s.vasudevan commented on PHOENIX-1328:
---
bq. UPDATE STATISTICS index -- Updates only the index statistics
How do you identify this - is it a specific index? From the name given? Because UPDATE STATISTICS table will also follow the same syntax, right?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PHOENIX-1328) Update ANALYZE syntax to collect stats on index tables and all tables
[ https://issues.apache.org/jira/browse/PHOENIX-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161688#comment-14161688 ]

ramkrishna.s.vasudevan commented on PHOENIX-1328:
---
Working on this. In any case, we need to collect all the indexes for the given table and issue an update stats in a future call, right? That would ensure that the stats are collected in parallel across all the tables in question.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Resolved] (PHOENIX-1300) Allow sub-queries to choose different execution path other than hash-join
[ https://issues.apache.org/jira/browse/PHOENIX-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maryann Xue resolved PHOENIX-1300. -- Resolution: Fixed Fix Version/s: 5.0.0 4.0.0 3.0.0 Covered by fix for PHOENIX-167 Allow sub-queries to choose different execution path other than hash-join - Key: PHOENIX-1300 URL: https://issues.apache.org/jira/browse/PHOENIX-1300 Project: Phoenix Issue Type: Improvement Affects Versions: 3.0.0, 4.0.0, 5.0.0 Reporter: Maryann Xue Assignee: Maryann Xue Fix For: 3.0.0, 4.0.0, 5.0.0 Original Estimate: 240h Remaining Estimate: 240h We can take a different approach (like PHOENIX-1179) for sub-queries where the required hash-set cannot fit into memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-1300) Allow sub-queries to choose different execution path other than hash-join
[ https://issues.apache.org/jira/browse/PHOENIX-1300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maryann Xue updated PHOENIX-1300:
---
Fix Version/s: (was: 4.0.0)
(was: 3.0.0)
3.2
4.2
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PHOENIX-1328) Update ANALYZE syntax to collect stats on index tables and all tables
[ https://issues.apache.org/jira/browse/PHOENIX-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162112#comment-14162112 ]

James Taylor commented on PHOENIX-1328:
---
For a first pass it's fine if you just collect the stats for the table and/or indexes serially in MetaDataClient. Just loop over the indexes after you've resolved the table, like this:
{code}
public MutationState updateStatistics(UpdateStatisticsStatement updateStatisticsStmt) throws SQLException {
    // Check before updating the stats if we have reached the configured time to reupdate the stats once again
    long msMinBetweenUpdates = connection.getQueryServices().getProps()
            .getLong(QueryServices.MIN_STATS_UPDATE_FREQ_MS_ATTRIB, QueryServicesOptions.DEFAULT_MIN_STATS_UPDATE_FREQ_MS);
    ColumnResolver resolver = FromCompiler.getResolver(updateStatisticsStmt, connection);
    PTable table = resolver.getTables().get(0).getTable();
    if (updateStatisticsStmt.updateColumns()) {
        doUpdateStatistics(table);
    }
    if (updateStatisticsStmt.updateIndexes()) {
        for (PTable index : table.getIndexes()) {
            doUpdateStatistics(index);
        }
    }
}

private void doUpdateStatistics(PTable table) {
    // TODO: refactored code here
}
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PHOENIX-1317) Cleanup non phoenix-core pom files
[ https://issues.apache.org/jira/browse/PHOENIX-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162118#comment-14162118 ]

Andrew Purtell commented on PHOENIX-1317:
---
+1
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PHOENIX-1030) Change Expression.isDeterministic() to return an enum of values ALWAYS, PER_STATEMENT, PER_ROW
[ https://issues.apache.org/jira/browse/PHOENIX-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162274#comment-14162274 ]

James Taylor commented on PHOENIX-1030:
---
Thanks, [~tdsilva]. Here's some feedback on some minor stuff:
- I don't think you need a loop here, as you should be able to index into BOOLEAN_EXPRESSIONS using the child.getDeterminism().ordinal() value.
{code}
public static boolean isFalse(Expression child) {
-    return child == FALSE_EXPRESSION || child == ND_FALSE_EXPRESSION;
+    for (Determinism determinism : Determinism.values()) {
+        if (child == BOOLEAN_EXPRESSIONS[determinism.ordinal()])
+            return true;
+    }
+    return false;
}

public static boolean isTrue(Expression child) {
-    return child == TRUE_EXPRESSION || child == ND_TRUE_EXPRESSION;
+    for (Determinism determinism : Determinism.values()) {
+        if (child == BOOLEAN_EXPRESSIONS[Determinism.values().length + determinism.ordinal()])
+            return true;
+    }
+    return false;
}
{code}
- How about some static helper functions for these?
{code}
NULL_EXPRESSIONS[determinism.ordinal()]
BOOLEAN_EXPRESSIONS[Determinism.values().length + determinism.ordinal()]
TYPED_NULL_EXPRESSIONS[type.ordinal() + PDataType.values().length * determinism.ordinal()]
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
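The array-indexing suggestion above (pre-build one expression per Determinism value, then index by ordinal() instead of comparing against individual singleton constants) can be sketched in isolation like this; all types below are stand-ins, not Phoenix's Expression classes:

```java
// Stand-in for Phoenix's Determinism enum; Strings stand in for the
// cached LiteralExpression singletons.
enum Determinism { ALWAYS, PER_STATEMENT, PER_ROW }

public class OrdinalLookupDemo {
    // layout mirrors the comment: one FALSE entry per Determinism value,
    // followed by one TRUE entry per value
    static final String[] BOOLEAN_EXPRESSIONS = new String[Determinism.values().length * 2];
    static {
        for (Determinism d : Determinism.values()) {
            BOOLEAN_EXPRESSIONS[d.ordinal()] = "FALSE_" + d;
            BOOLEAN_EXPRESSIONS[Determinism.values().length + d.ordinal()] = "TRUE_" + d;
        }
    }

    // O(1) lookups instead of looping over Determinism.values()
    static String falseFor(Determinism d) {
        return BOOLEAN_EXPRESSIONS[d.ordinal()];
    }

    static String trueFor(Determinism d) {
        return BOOLEAN_EXPRESSIONS[Determinism.values().length + d.ordinal()];
    }

    public static void main(String[] args) {
        System.out.println(falseFor(Determinism.PER_ROW));
        System.out.println(trueFor(Determinism.ALWAYS));
    }
}
```

An isFalse(child) check then reduces to a single identity comparison against falseFor(child.getDeterminism()), which is the point of the review comment.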
[jira] [Updated] (PHOENIX-167) Support semi/anti-joins
[ https://issues.apache.org/jira/browse/PHOENIX-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maryann Xue updated PHOENIX-167:
---
Fix Version/s: (was: 4.0.0)
(was: 3.0.0)
3.2
4.2

Support semi/anti-joins
---
Key: PHOENIX-167
URL: https://issues.apache.org/jira/browse/PHOENIX-167
Project: Phoenix
Issue Type: Sub-task
Reporter: James Taylor
Assignee: Maryann Xue
Labels: enhancement
Fix For: 5.0.0, 4.2, 3.2
Attachments: 167-2.patch, 167.patch

A semi-join between two tables returns rows from the first table where one or more matches are found in the second table. The difference between a semi-join and a conventional join is that rows in the first table will be returned at most once. Even if the second table contains two matches for a row in the first table, only one copy of the row will be returned. Semi-joins are written using the EXISTS or IN constructs. An anti-join is the opposite of a semi-join and is written using the NOT EXISTS or NOT IN constructs. There's a pretty good write-up [here](http://www.dbspecialists.com/files/presentations/semijoins.html) on semi/anti joins.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
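The at-most-once semantics described above can be illustrated with plain Java collections (a toy model of the row sets, not Phoenix's join machinery):

```java
// Toy model: an inner join emits one output row per matching pair,
// while a semi-join (EXISTS/IN) emits each left row at most once.
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SemiJoinDemo {
    public static void main(String[] args) {
        List<Integer> left = List.of(1, 2, 3);
        List<Integer> right = List.of(2, 2, 3);   // 2 matches twice

        // inner join: pairs (2,2), (2,2), (3,3) -> 3 output rows
        long innerCount = left.stream()
                .flatMap(l -> right.stream().filter(r -> r.equals(l)))
                .count();

        // semi-join: each left row kept once if ANY match exists -> [2, 3]
        Set<Integer> rightSet = new HashSet<>(right);
        List<Integer> semi = left.stream()
                .filter(rightSet::contains)
                .collect(Collectors.toList());

        System.out.println(innerCount);
        System.out.println(semi);
    }
}
```

The anti-join is the complement: filter with the negated membership test, keeping left rows with no match at all.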
[jira] [Created] (PHOENIX-1329) Correctly support varbinary arrays
Jesse Yates created PHOENIX-1329:
---
Summary: Correctly support varbinary arrays
Key: PHOENIX-1329
URL: https://issues.apache.org/jira/browse/PHOENIX-1329
Project: Phoenix
Issue Type: Bug
Reporter: Jesse Yates
Fix For: 5.0.0, 4.3

Stored arrays of binary data can contain 0x00, which Phoenix uses as the field separator. This leads Phoenix to return arrays incorrectly, shortening them prematurely.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
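The truncation described here is easy to reproduce in isolation: once elements are joined with a 0x00 separator, an element that itself contains a 0x00 byte cannot be recovered by splitting on that byte. A standalone sketch (not Phoenix's actual encoding code):

```java
// Demonstrates the failure mode: splitting a 0x00-joined encoding
// cuts any element that contains its own embedded 0x00 byte.
import java.util.ArrayList;
import java.util.List;

public class ZeroByteSeparatorDemo {
    static List<byte[]> splitOnZero(byte[] encoded) {
        List<byte[]> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= encoded.length; i++) {
            if (i == encoded.length || encoded[i] == 0) {
                byte[] part = new byte[i - start];
                System.arraycopy(encoded, start, part, 0, part.length);
                parts.add(part);
                start = i + 1;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // two logical elements {1, 0, 2} and {3}, joined with a 0x00 separator
        byte[] encoded = {1, 0, 2, 0, 3};
        // splitting yields three parts, not two: the first element was
        // cut short at its embedded zero byte
        System.out.println(splitOnZero(encoded).size());
    }
}
```

This is why the ticket says a real fix requires an encoding change (e.g. length-prefixing or escaping) rather than a smarter split.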
[jira] [Updated] (PHOENIX-1329) Correctly support varbinary arrays
[ https://issues.apache.org/jira/browse/PHOENIX-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesse Yates updated PHOENIX-1329:
---
Attachment: phoenix-1329-bug.patch

Attaching patch to _demonstrate_ the issue. It's going to take an encoding change to actually do this correctly.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PHOENIX-1297) Adding utility methods to get primary key information from the optimized query plan
[ https://issues.apache.org/jira/browse/PHOENIX-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162746#comment-14162746 ]

James Taylor commented on PHOENIX-1297:
---
Looks good, [~samarthjain]. Want me to wait for any other changes, or should I pull this in?

Adding utility methods to get primary key information from the optimized query plan
---
Key: PHOENIX-1297
URL: https://issues.apache.org/jira/browse/PHOENIX-1297
Project: Phoenix
Issue Type: Task
Affects Versions: 5.0.0, 4.2, 3.2
Reporter: Samarth Jain
Assignee: Samarth Jain
Attachments: PHOENIX-1297.patch, PHOENIX-1297_v2.patch, PHOENIX-1297_v3.patch
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PHOENIX-1329) Correctly support varbinary arrays
[ https://issues.apache.org/jira/browse/PHOENIX-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162763#comment-14162763 ]

James Taylor commented on PHOENIX-1329:
---
Our arrays were really designed to store arrays of other primitive types, not arbitrary arrays of arbitrary bytes (i.e. VARBINARY VARBINARY[] isn't really supported and we should flag it as an error). We should brainstorm about this - maybe for your use case you can just use a VARBINARY and serialize the raw bytes yourself? If you're planning on querying the data, then the story might be different, but otherwise, VARBINARY is the way to go.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (PHOENIX-1302) Query against tenant specific view should use index
[ https://issues.apache.org/jira/browse/PHOENIX-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated PHOENIX-1302:
---
Assignee: James Taylor

Query against tenant specific view should use index
---
Key: PHOENIX-1302
URL: https://issues.apache.org/jira/browse/PHOENIX-1302
Project: Phoenix
Issue Type: Bug
Affects Versions: 5.0.0, 4.2, 3.2
Reporter: Samarth Jain
Assignee: James Taylor

Test that can be added in QueryOptimizerTest.java:
{code}
@Test
public void testAssertQueryAgainstTenantSpecificViewGoesThroughIndex() throws Exception {
    Connection conn = DriverManager.getConnection(getUrl(), new Properties());
    // create table
    conn.createStatement().execute("create table XYZ.ABC"
        + "(organization_id char(15) not null, \n"
        + "entity_id char(15) not null,\n"
        + "a_string_array varchar(100) array[] not null,\n"
        + "b_string varchar(100),\n"
        + "a_integer integer,\n"
        + "a_date date,\n"
        + "CONSTRAINT pk PRIMARY KEY (organization_id, entity_id, a_string_array)\n"
        + ") MULTI_TENANT=true");
    // create index
    conn.createStatement().execute("CREATE INDEX ABC_IDX ON XYZ.ABC (a_integer) INCLUDE (a_date)");
    conn.close();
    // switch to a tenant specific connection
    conn = DriverManager.getConnection(getUrl(tenantId));
    // create a tenant specific view
    conn.createStatement().execute("CREATE VIEW ABC_VIEW AS SELECT * FROM XYZ.ABC");
    // query against the tenant specific view
    String sql = "SELECT a_date FROM ABC_VIEW where a_integer = ?";
    PreparedStatement stmt = conn.prepareStatement(sql);
    stmt.setInt(1, 1000);
    QueryPlan plan = stmt.unwrap(PhoenixPreparedStatement.class).optimizeQuery();
    assertEquals("Query should use index", PTableType.INDEX, plan.getTableRef().getTable().getType());
}
{code}
Error:
java.lang.AssertionError: Query should use index expected:<INDEX> but was:<VIEW>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (PHOENIX-1302) Query against tenant specific view should use index
[ https://issues.apache.org/jira/browse/PHOENIX-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162824#comment-14162824 ] Samarth Jain commented on PHOENIX-1302: --- Tests added in QueryOptimizerTest with @Ignore annotation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-1297) Adding utility methods to get primary key information from the optimized query plan
[ https://issues.apache.org/jira/browse/PHOENIX-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-1297: -- Attachment: PHOENIX-1297_v4.patch Updated patch that takes into consideration the tenant id of the connection while determining the offset. Changed encode and decode pk methods to take into consideration the view index id too. Adding utility methods to get primary key information from the optimized query plan --- Key: PHOENIX-1297 URL: https://issues.apache.org/jira/browse/PHOENIX-1297 Project: Phoenix Issue Type: Task Affects Versions: 5.0.0, 4.2, 3.2 Reporter: Samarth Jain Assignee: Samarth Jain Attachments: PHOENIX-1297.patch, PHOENIX-1297_v2.patch, PHOENIX-1297_v3.patch, PHOENIX-1297_v4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PHOENIX-1330) Flag VARBINARY VARBINARY ARRAY declaration in DDL as an error
James Taylor created PHOENIX-1330: - Summary: Flag VARBINARY VARBINARY ARRAY declaration in DDL as an error Key: PHOENIX-1330 URL: https://issues.apache.org/jira/browse/PHOENIX-1330 Project: Phoenix Issue Type: Bug Reporter: James Taylor As [~jesse_yates] pointed out in PHOENIX-1329, our variable length array encoding does not handle arrays of arbitrary variable length data. We should flag attempts to declare this at DDL time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PHOENIX-1331) DropIndexDuringUpsertIT.testWriteFailureDropIndex fails on Mac
James Taylor created PHOENIX-1331: - Summary: DropIndexDuringUpsertIT.testWriteFailureDropIndex fails on Mac Key: PHOENIX-1331 URL: https://issues.apache.org/jira/browse/PHOENIX-1331 Project: Phoenix Issue Type: Bug Reporter: James Taylor The DropIndexDuringUpsertIT.testWriteFailureDropIndex() test consistently fails by timing out on my Mac laptop and Mac desktop with the following exception:
{code}
testWriteFailureDropIndex(org.apache.phoenix.end2end.index.DropIndexDuringUpsertIT)  Time elapsed: 341.902 sec  <<< ERROR!
java.lang.Exception: test timed out after 30 milliseconds
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
	at org.apache.phoenix.end2end.index.DropIndexDuringUpsertIT.testWriteFailureDropIndex(DropIndexDuringUpsertIT.java:150)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-945) Support correlated subqueries in comparison without ANY/SOME/ALL
[ https://issues.apache.org/jira/browse/PHOENIX-945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maryann Xue updated PHOENIX-945: Attachment: 945.patch Support correlated subqueries in comparison without ANY/SOME/ALL Key: PHOENIX-945 URL: https://issues.apache.org/jira/browse/PHOENIX-945 Project: Phoenix Issue Type: Sub-task Affects Versions: 3.0.0, 4.0.0, 5.0.0 Reporter: Maryann Xue Assignee: Maryann Xue Fix For: 3.0.0, 4.0.0, 5.0.0 Attachments: 945.patch Original Estimate: 336h Remaining Estimate: 336h Example: SELECT employee_number, name FROM employees AS Bob WHERE salary > (SELECT AVG(salary) FROM employees WHERE department = Bob.department); Basically we can optimize these queries into join queries, like: SELECT employees.employee_number, employees.name FROM employees INNER JOIN (SELECT department, AVG(salary) AS department_average FROM employees GROUP BY department) AS temp ON employees.department = temp.department WHERE employees.salary > temp.department_average; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (PHOENIX-1332) Support correlated subqueries in comparison with ANY/SOME/ALL
[ https://issues.apache.org/jira/browse/PHOENIX-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maryann Xue reassigned PHOENIX-1332: Assignee: Maryann Xue Support correlated subqueries in comparison with ANY/SOME/ALL - Key: PHOENIX-1332 URL: https://issues.apache.org/jira/browse/PHOENIX-1332 Project: Phoenix Issue Type: Sub-task Affects Versions: 3.0.0, 4.0.0, 5.0.0 Reporter: Maryann Xue Assignee: Maryann Xue Fix For: 3.0.0, 4.0.0, 5.0.0 Support grammar like: select * from OrderTable o where quantity = ALL(select quantity from OrderTable where item_id = o.item_id) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PHOENIX-1332) Support correlated subqueries in comparison with ANY/SOME/ALL
Maryann Xue created PHOENIX-1332: Summary: Support correlated subqueries in comparison with ANY/SOME/ALL Key: PHOENIX-1332 URL: https://issues.apache.org/jira/browse/PHOENIX-1332 Project: Phoenix Issue Type: Sub-task Affects Versions: 3.0.0, 4.0.0, 5.0.0 Reporter: Maryann Xue Support grammar like: select * from OrderTable o where quantity = ALL(select quantity from OrderTable where item_id = o.item_id) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (PHOENIX-1179) Support many-to-many joins
[ https://issues.apache.org/jira/browse/PHOENIX-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maryann Xue reassigned PHOENIX-1179: Assignee: Maryann Xue Support many-to-many joins -- Key: PHOENIX-1179 URL: https://issues.apache.org/jira/browse/PHOENIX-1179 Project: Phoenix Issue Type: Sub-task Reporter: James Taylor Assignee: Maryann Xue Fix For: 3.0.0, 4.0.0, 5.0.0 Enhance our join capabilities to support many-to-many joins where both sides of the join are too big to fit into memory (and thus cannot use our hash join mechanism). One technique would be to order both sides of the join by their join key and merge-sort the results on the client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
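The merge-sort technique sketched in that description can be pictured as follows (a hedged illustration only: plain Java lists stand in for Phoenix's sorted result iterators, and `MergeJoinSketch` is not a Phoenix class). Both sides arrive ordered by join key, and a single forward pass emits all matching pairs, including many-to-many matches, without buffering either side entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of a client-side sort-merge join over two inputs already sorted
// by join key (key -> value pairs). Not Phoenix's implementation.
public class MergeJoinSketch {
    static List<String> mergeJoin(List<Map.Entry<Integer, String>> left,
                                  List<Map.Entry<Integer, String>> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).getKey().compareTo(right.get(j).getKey());
            if (cmp < 0) i++;          // advance the side with the smaller key
            else if (cmp > 0) j++;
            else {
                // emit all pairs sharing this key (handles many-to-many)
                int key = left.get(i).getKey();
                int jStart = j;
                while (i < left.size() && left.get(i).getKey() == key) {
                    for (j = jStart; j < right.size() && right.get(j).getKey() == key; j++) {
                        out.add(left.get(i).getValue() + "-" + right.get(j).getValue());
                    }
                    i++;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // left has two rows with key 1, so key 1 joins many-to-many
        List<Map.Entry<Integer, String>> left =
                List.of(Map.entry(1, "a"), Map.entry(1, "b"), Map.entry(3, "c"));
        List<Map.Entry<Integer, String>> right =
                List.of(Map.entry(1, "x"), Map.entry(2, "y"), Map.entry(3, "z"));
        System.out.println(mergeJoin(left, right));
    }
}
```

The memory cost per step is bounded by the number of rows sharing one join key, which is what makes this viable when neither side fits in memory.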
[jira] [Commented] (PHOENIX-945) Support correlated subqueries in comparison without ANY/SOME/ALL
[ https://issues.apache.org/jira/browse/PHOENIX-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162897#comment-14162897 ] James Taylor commented on PHOENIX-945: -- Wow, +1. You implemented correlated subquery support by adding about 50 lines of code?! That's pretty awesome! What kind of limitations are there outside of only allowing the correlation in a comparison expression? Minor nit: might be worth having a copy constructor to help with readability:
{code}
+            subquery = NODE_FACTORY.select(subquery.getFrom(), subquery.getHint(), subquery.isDistinct(),
+                    selectNodes, where, groupbyNodes, subquery.getHaving(), subquery.getOrderBy(),
+                    subquery.getLimit(), subquery.getBindCount(), true, subquery.hasSequence());
+
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-945) Support correlated subqueries in comparison without ANY/SOME/ALL
[ https://issues.apache.org/jira/browse/PHOENIX-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14162940#comment-14162940 ] Maryann Xue commented on PHOENIX-945: - The limitations are:
1. The correlation condition in the inner query has to be what we currently allow for ON conditions in joins.
2. The inner query must be a non-group-by aggregate query, such as select max(c1) from table1 where correlation_condition.
At least half of the cases in limitation 2 will benefit from PHOENIX-1299, which will convert ANY/SOME/ALL queries into exactly the queries covered by this fix. Besides, I just opened PHOENIX-1332, which will eliminate the second limitation completely, though that will require a bit more work. For those cases where the optimization of PHOENIX-1299 is not enough, it will depend on the completion of PHOENIX-944, plus a special aggregate function that returns the values in the same group as an array. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (PHOENIX-945) Support correlated subqueries in comparison without ANY/SOME/ALL
[ https://issues.apache.org/jira/browse/PHOENIX-945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maryann Xue updated PHOENIX-945: Attachment: 945-2.patch Minor change: added copy constructors in ParseNodeFactory -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jay wong reassigned PHOENIX-1267: - Assignee: jay wong Set scan.setSmall(true) when appropriate Key: PHOENIX-1267 URL: https://issues.apache.org/jira/browse/PHOENIX-1267 Project: Phoenix Issue Type: Bug Reporter: James Taylor Assignee: jay wong Attachments: smallscan.patch There's a nice optimization that has been in HBase for a while now to set a scan as small. This prevents extra RPC calls, I believe. We should add a hint for queries that forces it to be set/not set, and make our best guess on when it should default to true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-1267) Set scan.setSmall(true) when appropriate
[ https://issues.apache.org/jira/browse/PHOENIX-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163003#comment-14163003 ] jay wong commented on PHOENIX-1267: --- I was on holiday the past several days, so sorry for the late reply. I see what you mean: a hint is normally more structured and the better way, and I agree that using a hint to control the small scan is a good idea. The small scan would be set to true by default whenever both the start key and stop key are set. But if we have an ORDER BY query and small is true, the result is an infinite loop. So the small scan is not just a query optimization for the user; it can cause a bug. That is why I think the smallScanForbidden check is also needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
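The default-plus-hint decision the comment describes can be sketched as pure logic (a hedged illustration: the names `SMALL` and `NO_SMALL_SCAN` are placeholders, not Phoenix's actual hint names, and Phoenix's real policy may differ): default small scans to true for scans bounded on both ends, never allow them when the query orders rows (the reported infinite-loop case), and let a hint override the default either way.

```java
// Sketch of a small-scan decision policy combining a default heuristic
// with hint overrides and a hard "forbidden" case. Illustrative only.
public class SmallScanPolicy {
    enum Hint { NONE, SMALL, NO_SMALL_SCAN } // hypothetical hint names

    static boolean useSmallScan(boolean hasStartKey, boolean hasStopKey,
                                boolean hasOrderBy, Hint hint) {
        // smallScanForbidden: ORDER BY + small scan reportedly loops forever,
        // so no hint may force it on.
        if (hasOrderBy) return false;
        if (hint == Hint.SMALL) return true;
        if (hint == Hint.NO_SMALL_SCAN) return false;
        // default heuristic: bounded on both ends -> likely a small scan
        return hasStartKey && hasStopKey;
    }

    public static void main(String[] args) {
        assert useSmallScan(true, true, false, Hint.NONE);          // default on
        assert !useSmallScan(true, false, false, Hint.NONE);        // unbounded
        assert !useSmallScan(true, true, true, Hint.SMALL);         // forbidden wins
        assert useSmallScan(false, false, false, Hint.SMALL);       // hint forces on
    }
}
```

Separating the hard "forbidden" case from the overridable default is the design point: correctness constraints should not be expressible away via hints.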
[jira] [Commented] (PHOENIX-945) Support correlated subqueries in comparison without ANY/SOME/ALL
[ https://issues.apache.org/jira/browse/PHOENIX-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163014#comment-14163014 ] Hudson commented on PHOENIX-945: SUCCESS: Integrated in Phoenix | Master #410 (See [https://builds.apache.org/job/Phoenix-master/410/]) PHOENIX-945 Support correlated subqueries in comparison without ANY/SOME/ALL (maryannxue: rev 5282a8a09fec1ea7a6241565ff034246e3b30b92)
* phoenix-core/src/main/java/org/apache/phoenix/compile/SubqueryRewriter.java
* phoenix-core/src/main/java/org/apache/phoenix/compile/StatementNormalizer.java
* phoenix-core/src/main/java/org/apache/phoenix/compile/ExpressionCompiler.java
* phoenix-core/src/it/java/org/apache/phoenix/end2end/SubqueryIT.java
* phoenix-core/src/main/java/org/apache/phoenix/parse/ParseNodeFactory.java
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (PHOENIX-1333) Store statistics guideposts as VARBINARY
James Taylor created PHOENIX-1333: - Summary: Store statistics guideposts as VARBINARY Key: PHOENIX-1333 URL: https://issues.apache.org/jira/browse/PHOENIX-1333 Project: Phoenix Issue Type: Bug Reporter: James Taylor Assignee: ramkrishna.s.vasudevan Priority: Critical There's a potential problem with storing the guideposts as a VARBINARY ARRAY, as pointed out by PHOENIX-1329. We'd run into this issue if we're collecting stats for a table with a trailing VARBINARY row key column whose values contain embedded null bytes. Because of this, we're better off storing guideposts as VARBINARY and serializing/deserializing in the following manner: <byte length as vint><bytes><byte length as vint><bytes>... We should also store the total number of guideposts as a separate KeyValue column. So the schema of SYSTEM.STATS would look like this now instead:
{code}
public static final String CREATE_STATS_TABLE_METADATA =
        "CREATE TABLE " + SYSTEM_CATALOG_SCHEMA + ".\"" + SYSTEM_STATS_TABLE + "\"(\n" +
        // PK columns
        PHYSICAL_NAME + " VARCHAR NOT NULL," +
        COLUMN_FAMILY + " VARCHAR," +
        REGION_NAME + " VARCHAR," +
        GUIDE_POSTS + " VARBINARY," +
        GUIDE_POSTS_COUNT + " SMALLINT," +
        MIN_KEY + " VARBINARY," +
        MAX_KEY + " VARBINARY," +
        LAST_STATS_UPDATE_TIME + " DATE," +
        "CONSTRAINT " + SYSTEM_TABLE_PK_NAME + " PRIMARY KEY (" + PHYSICAL_NAME + ","
        + COLUMN_FAMILY + "," + REGION_NAME + "))\n" +
        // TODO: should we support versioned stats?
        // Install split policy to prevent a physical table's stats from being split across regions.
        HTableDescriptor.SPLIT_POLICY + "='" + MetaDataSplitPolicy.class.getName() + "'\n";
{code}
Then the serialization code in StatisticsTable.addStats() would need to change to populate the GUIDE_POSTS_COUNT and serialize the GUIDE_POSTS in the new format. The deserialization code is isolated to StatisticsUtil.readStatistics(). It would need to read the GUIDE_POSTS_COUNT first for estimated sizing, and then deserialize the GUIDE_POSTS in the new format. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
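A minimal sketch of the proposed layout (assumptions: `GuidepostCodec` and the LEB128-style vint helpers below are illustrative, not Phoenix's StatisticsTable/StatisticsUtil code, which would likely use Hadoop's WritableUtils for vints): a length prefix per guidepost makes embedded 0x00 bytes safe, and a separately stored count lets the reader presize its result.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the <byte length as vint><bytes>... guidepost layout, with the
// guidepost count carried out of band (as GUIDE_POSTS_COUNT would be).
public class GuidepostCodec {
    // Simple unsigned LEB128-style varint, for illustration only.
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) { out.write((v & 0x7F) | 0x80); v >>>= 7; }
        out.write(v);
    }

    static int readVInt(ByteArrayInputStream in) {
        int v = 0, shift = 0, b;
        do { b = in.read(); v |= (b & 0x7F) << shift; shift += 7; } while ((b & 0x80) != 0);
        return v;
    }

    static byte[] encode(List<byte[]> guideposts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] gp : guideposts) {
            writeVInt(out, gp.length);   // length prefix instead of a separator
            out.write(gp, 0, gp.length); // raw bytes; 0x00 is now safe
        }
        return out.toByteArray();
    }

    static List<byte[]> decode(byte[] encoded, int count) {
        ByteArrayInputStream in = new ByteArrayInputStream(encoded);
        List<byte[]> result = new ArrayList<>(count); // count enables presizing
        for (int i = 0; i < count; i++) {
            byte[] gp = new byte[readVInt(in)];
            in.read(gp, 0, gp.length);
            result.add(gp);
        }
        return result;
    }

    public static void main(String[] args) {
        List<byte[]> gps = List.of(new byte[]{1, 0, 2}, new byte[]{3, 4});
        List<byte[]> back = decode(encode(gps), gps.size());
        // the embedded 0x00 survives the round trip
        assert Arrays.equals(back.get(0), new byte[]{1, 0, 2});
    }
}
```

Compared with the separator-based ARRAY encoding that motivated PHOENIX-1329, the length-prefixed form never has to inspect the payload bytes at all, which is what makes arbitrary binary guideposts safe.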