[jira] [Commented] (PHOENIX-7258) Query Optimizer should pick Index hint even for point lookup queries

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824642#comment-17824642
 ] 

ASF GitHub Bot commented on PHOENIX-7258:
-

sanjeet006py commented on code in PR #1851:
URL: https://github.com/apache/phoenix/pull/1851#discussion_r1517343363


##
phoenix-core-client/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java:
##
@@ -247,6 +248,10 @@ private List<QueryPlan> getApplicablePlansForSingleFlatQuery(QueryPlan dataPlan,
             }
             plans.add(0, hintedPlan);
         }
+        if (dataPlan.getContext().getScanRanges().isPointLookup()
+                && stopAtBestPlan && dataPlan.isApplicable()) {
+            return Collections.<QueryPlan> singletonList(dataPlan);
+        }

Review Comment:
   Please correct me if I misunderstood something.





> Query Optimizer should pick Index hint even for point lookup queries
> 
>
> Key: PHOENIX-7258
> URL: https://issues.apache.org/jira/browse/PHOENIX-7258
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.2.0, 5.1.3
>Reporter: Viraj Jasani
>Assignee: Sanjeet Malhotra
>Priority: Major
>
> For better performance, users can create covered indexes such that the indexed 
> columns are the same as the composite primary key of the data table, but in a 
> different order. For instance, create a data table with columns PK1, PK2, PK3, 
> C1, C2, with the primary key being PK1, PK2, PK3. To get better performance 
> from HBase block caching, if rows with the same PK3 value should reside as 
> close to each other as possible, we can create an index on PK3, PK2, PK1 that 
> also includes columns C1 and C2.
> For point lookups on the data table, it might still be helpful to query the 
> index table, depending on the use case. We should allow an index hint to route 
> a point lookup to the index table.
> When the query optimizer identifies that the query is a point lookup on the 
> data table and "stop at best plan" is true, it immediately returns without 
> checking the hint. We should check for the hint and, if an applicable hinted 
> index plan is identified, use it.
>  
> Assuming getHintedPlanIfApplicable() retrieves the hinted plan, it can look 
> something like this:
> {code:java}
> if (dataPlan.getContext().getScanRanges().isPointLookup() && stopAtBestPlan
>         && dataPlan.isApplicable()) {
>     if (indexes.isEmpty() || select.getHint().getHint(Hint.INDEX) == null) {
>         return Collections.singletonList(dataPlan);
>     }
>     QueryPlan hintedPlan = getHintedPlanIfApplicable(dataPlan, statement, targetColumns,
>             parallelIteratorFactory, select, indexes);
>     if (hintedPlan != null) {
>         PTable index = hintedPlan.getTableRef().getTable();
>         if (hintedPlan.isApplicable() && (index.getIndexWhere() == null
>                 || isPartialIndexUsable(select, dataPlan, index))) {
>             return Collections.singletonList(hintedPlan);
>         }
>     }
>     return Collections.singletonList(dataPlan);
> } {code}
> We still need to be optimal, i.e. if the hinted index plan is not applicable 
> or useful, we still need to immediately return the point lookup data plan.
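For illustration, a minimal JDBC sketch of the scenario above. Table, column, and index names are assumed, not from the Jira; the hint uses Phoenix's documented /*+ INDEX(table index) */ syntax.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class IndexHintPointLookupExample {
    public static void main(String[] args) throws Exception {
        // Assumed connection URL; adjust for your cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
                Statement stmt = conn.createStatement()) {
            // Data table with composite primary key (PK1, PK2, PK3).
            stmt.execute("CREATE TABLE DATA_TABLE (PK1 VARCHAR NOT NULL, PK2 VARCHAR NOT NULL,"
                    + " PK3 VARCHAR NOT NULL, C1 VARCHAR, C2 VARCHAR"
                    + " CONSTRAINT PK PRIMARY KEY (PK1, PK2, PK3))");
            // Covered index: same PK columns in a different order, plus C1 and C2.
            stmt.execute("CREATE INDEX IDX_PK3 ON DATA_TABLE (PK3, PK2, PK1) INCLUDE (C1, C2)");
            // A point lookup on the data table; the hint asks the optimizer to
            // consider IDX_PK3, which this Jira proposes to honor.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT /*+ INDEX(DATA_TABLE IDX_PK3) */ C1, C2 FROM DATA_TABLE"
                    + " WHERE PK1 = 'a' AND PK2 = 'b' AND PK3 = 'c'")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + ", " + rs.getString(2));
                }
            }
        }
    }
}
{code}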



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PHOENIX-7258) Query Optimizer should pick Index hint even for point lookup queries

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824641#comment-17824641
 ] 

ASF GitHub Bot commented on PHOENIX-7258:
-

sanjeet006py commented on code in PR #1851:
URL: https://github.com/apache/phoenix/pull/1851#discussion_r1517343103


##
phoenix-core-client/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java:
##
@@ -247,6 +248,10 @@ private List<QueryPlan> getApplicablePlansForSingleFlatQuery(QueryPlan dataPlan,
             }
             plans.add(0, hintedPlan);
         }
+        if (dataPlan.getContext().getScanRanges().isPointLookup()
+                && stopAtBestPlan && dataPlan.isApplicable()) {
+            return Collections.<QueryPlan> singletonList(dataPlan);
+        }

Review Comment:
   Yes, this fix honors the index hint even if the original data plan is a point 
lookup. But if the hinted plan is null or not applicable, then we need to return 
the original data plan, as we are already maximally optimized.





> Query Optimizer should pick Index hint even for point lookup queries
> 
>
> Key: PHOENIX-7258
> URL: https://issues.apache.org/jira/browse/PHOENIX-7258
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.2.0, 5.1.3
>Reporter: Viraj Jasani
>Assignee: Sanjeet Malhotra
>Priority: Major
>
> For better performance, users can create covered indexes such that the indexed 
> columns are the same as the composite primary key of the data table, but in a 
> different order. For instance, create a data table with columns PK1, PK2, PK3, 
> C1, C2, with the primary key being PK1, PK2, PK3. To get better performance 
> from HBase block caching, if rows with the same PK3 value should reside as 
> close to each other as possible, we can create an index on PK3, PK2, PK1 that 
> also includes columns C1 and C2.
> For point lookups on the data table, it might still be helpful to query the 
> index table, depending on the use case. We should allow an index hint to route 
> a point lookup to the index table.
> When the query optimizer identifies that the query is a point lookup on the 
> data table and "stop at best plan" is true, it immediately returns without 
> checking the hint. We should check for the hint and, if an applicable hinted 
> index plan is identified, use it.
>  
> Assuming getHintedPlanIfApplicable() retrieves the hinted plan, it can look 
> something like this:
> {code:java}
> if (dataPlan.getContext().getScanRanges().isPointLookup() && stopAtBestPlan
>         && dataPlan.isApplicable()) {
>     if (indexes.isEmpty() || select.getHint().getHint(Hint.INDEX) == null) {
>         return Collections.singletonList(dataPlan);
>     }
>     QueryPlan hintedPlan = getHintedPlanIfApplicable(dataPlan, statement, targetColumns,
>             parallelIteratorFactory, select, indexes);
>     if (hintedPlan != null) {
>         PTable index = hintedPlan.getTableRef().getTable();
>         if (hintedPlan.isApplicable() && (index.getIndexWhere() == null
>                 || isPartialIndexUsable(select, dataPlan, index))) {
>             return Collections.singletonList(hintedPlan);
>         }
>     }
>     return Collections.singletonList(dataPlan);
> } {code}
> We still need to be optimal, i.e. if the hinted index plan is not applicable 
> or useful, we still need to immediately return the point lookup data plan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] PHOENIX-7258: Query Optimizer should pick Index hint even for point lookup queries [phoenix]

2024-03-07 Thread via GitHub


sanjeet006py commented on code in PR #1851:
URL: https://github.com/apache/phoenix/pull/1851#discussion_r1517343363


##
phoenix-core-client/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java:
##
@@ -247,6 +248,10 @@ private List<QueryPlan> getApplicablePlansForSingleFlatQuery(QueryPlan dataPlan,
             }
             plans.add(0, hintedPlan);
         }
+        if (dataPlan.getContext().getScanRanges().isPointLookup()
+                && stopAtBestPlan && dataPlan.isApplicable()) {
+            return Collections.<QueryPlan> singletonList(dataPlan);
+        }

Review Comment:
   Please correct me if I misunderstood something.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] PHOENIX-7258: Query Optimizer should pick Index hint even for point lookup queries [phoenix]

2024-03-07 Thread via GitHub


sanjeet006py commented on code in PR #1851:
URL: https://github.com/apache/phoenix/pull/1851#discussion_r1517343103


##
phoenix-core-client/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java:
##
@@ -247,6 +248,10 @@ private List<QueryPlan> getApplicablePlansForSingleFlatQuery(QueryPlan dataPlan,
             }
             plans.add(0, hintedPlan);
         }
+        if (dataPlan.getContext().getScanRanges().isPointLookup()
+                && stopAtBestPlan && dataPlan.isApplicable()) {
+            return Collections.<QueryPlan> singletonList(dataPlan);
+        }

Review Comment:
   Yes, this fix honors the index hint even if the original data plan is a point 
lookup. But if the hinted plan is null or not applicable, then we need to return 
the original data plan, as we are already maximally optimized.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (PHOENIX-7265) Add 5.2 versions to BackwardsCompatibilityIT once released

2024-03-07 Thread Istvan Toth (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824633#comment-17824633
 ] 

Istvan Toth commented on PHOENIX-7265:
--

We should also add this step to the release steps on the website.

> Add 5.2 versions to BackwardsCompatibilityIT once released
> --
>
> Key: PHOENIX-7265
> URL: https://issues.apache.org/jira/browse/PHOENIX-7265
> Project: Phoenix
>  Issue Type: Task
>  Components: core
>Affects Versions: 5.2.0, 5.3.0
>Reporter: Istvan Toth
>Priority: Critical
>
> This is a reminder to add the new 5.2 versions once they are released.
> We cannot add them until the 5.2.0 artifacts are available publicly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PHOENIX-7255) Non-existent artifacts referred in compatible_client_versions.json

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824632#comment-17824632
 ] 

ASF GitHub Bot commented on PHOENIX-7255:
-

stoty opened a new pull request, #1852:
URL: https://github.com/apache/phoenix/pull/1852

   …sions.json
   
   also remove old HBase 1.x entries




> Non-existent artifacts referred in compatible_client_versions.json
> --
>
> Key: PHOENIX-7255
> URL: https://issues.apache.org/jira/browse/PHOENIX-7255
> Project: Phoenix
>  Issue Type: Bug
>  Components: core
>Affects Versions: 5.2.0, 5.3.0
>Reporter: Istvan Toth
>Priority: Major
>
> The compatible_client_versions.json file refers to HBase 2.3 support for Phoenix 
> 5.2, which was removed some time ago, but the file hasn't been updated.
> We need to keep in mind to update this file as new HBase 
> profiles are added or old ones are dropped.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] PHOENIX-7255 Non-existent artifacts referred in compatible_client_ver… [phoenix]

2024-03-07 Thread via GitHub


stoty opened a new pull request, #1852:
URL: https://github.com/apache/phoenix/pull/1852

   …sions.json
   
   also remove old HBase 1.x entries


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (PHOENIX-7263) Row value constructor split keys not allowed on indexes

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824629#comment-17824629
 ] 

ASF GitHub Bot commented on PHOENIX-7263:
-

stoty commented on PR #1850:
URL: https://github.com/apache/phoenix/pull/1850#issuecomment-1985099655

   The checkstyle errors also look relevant.




> Row value constructor split keys not allowed on indexes
> ---
>
> Key: PHOENIX-7263
> URL: https://issues.apache.org/jira/browse/PHOENIX-7263
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Rajeshbabu Chintaguntla
>Assignee: Rajeshbabu Chintaguntla
>Priority: Major
> Fix For: 5.2.0, 5.3.0, 5.1.4
>
>
> While creating indexes, if we pass row value constructor split keys, we get the 
> following error. The same statement passes with CREATE TABLE, because table 
> creation properly builds the split keys using the expression compiler, which is 
> not the case with index creation.
> {noformat}
> java.lang.ClassCastException: 
> org.apache.phoenix.expression.RowValueConstructorExpression cannot be cast to 
> org.apache.phoenix.expression.LiteralExpression
>   at 
> org.apache.phoenix.compile.CreateIndexCompiler.compile(CreateIndexCompiler.java:77)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableCreateIndexStatement.compilePlan(PhoenixStatement.java:1205)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableCreateIndexStatement.compilePlan(PhoenixStatement.java:1191)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:435)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:425)
>   at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:424)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:412)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:2009)
>   at sqlline.Commands.executeSingleQuery(Commands.java:1054)
>   at sqlline.Commands.execute(Commands.java:1003)
>   at sqlline.Commands.sql(Commands.java:967)
>   at sqlline.SqlLine.dispatch(SqlLine.java:734)
>   at sqlline.SqlLine.begin(SqlLine.java:541)
>   at sqlline.SqlLine.start(SqlLine.java:267)
>   at sqlline.SqlLine.main(SqlLine.java:206)
> {noformat}
> In create table:
> {code:java}
> final byte[][] splits = new byte[splitNodes.size()][];
> ImmutableBytesWritable ptr = context.getTempPtr();
> ExpressionCompiler expressionCompiler = new ExpressionCompiler(context);
> for (int i = 0; i < splits.length; i++) {
>     ParseNode node = splitNodes.get(i);
>     if (node instanceof BindParseNode) {
>         context.getBindManager().addParamMetaData((BindParseNode) node, VARBINARY_DATUM);
>     }
>     if (node.isStateless()) {
>         Expression expression = node.accept(expressionCompiler);
>         if (expression.evaluate(null, ptr)) {
>             splits[i] = ByteUtil.copyKeyBytesIfNecessary(ptr);
>             continue;
>         }
>     }
>     throw new SQLExceptionInfo.Builder(SQLExceptionCode.SPLIT_POINT_NOT_CONSTANT)
>             .setMessage("Node: " + node).build().buildException();
> }
> {code}
> Whereas index creation expects only literals:
> {code:java}
> final byte[][] splits = new byte[splitNodes.size()][];
> for (int i = 0; i < splits.length; i++) {
>     ParseNode node = splitNodes.get(i);
>     if (!node.isStateless()) {
>         throw new SQLExceptionInfo.Builder(SQLExceptionCode.SPLIT_POINT_NOT_CONSTANT)
>                 .setMessage("Node: " + node).build().buildException();
>     }
>     LiteralExpression expression = (LiteralExpression) node.accept(expressionCompiler);
>     splits[i] = expression.getBytes();
> }
> {code}
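For illustration, a minimal JDBC sketch (assumed table, index, and connection names) of DDL that reproduces the mismatch: the same style of row value constructor split keys is accepted by CREATE TABLE but, before this fix, failed CREATE INDEX with the ClassCastException above.

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RvcSplitKeyExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
                Statement stmt = conn.createStatement()) {
            // RVC split keys are accepted on CREATE TABLE, which compiles them
            // with ExpressionCompiler.
            stmt.execute("CREATE TABLE T (K1 VARCHAR NOT NULL, K2 INTEGER NOT NULL, V VARCHAR"
                    + " CONSTRAINT PK PRIMARY KEY (K1, K2))"
                    + " SPLIT ON (('a', 1), ('m', 2))");
            // Before the fix, equivalent RVC split keys on CREATE INDEX threw
            // ClassCastException, since only LiteralExpression was expected.
            stmt.execute("CREATE INDEX IDX ON T (K2, K1)"
                    + " SPLIT ON ((1, 'a'), (2, 'm'))");
        }
    }
}
{code}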



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] PHOENIX-7263 Row value constructor split keys not allowed on indexes [phoenix]

2024-03-07 Thread via GitHub


stoty commented on PR #1850:
URL: https://github.com/apache/phoenix/pull/1850#issuecomment-1985099655

   The checkstyle errors also look relevant.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (PHOENIX-7263) Row value constructor split keys not allowed on indexes

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824628#comment-17824628
 ] 

ASF GitHub Bot commented on PHOENIX-7263:
-

stoty commented on code in PR #1850:
URL: https://github.com/apache/phoenix/pull/1850#discussion_r1517205871


##
phoenix-core-client/src/main/java/org/apache/phoenix/compile/SplitKeyUtil.java:
##
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.compile;
+
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.phoenix.exception.SQLExceptionCode;
+import org.apache.phoenix.exception.SQLExceptionInfo;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.parse.BindParseNode;
+import org.apache.phoenix.parse.ParseNode;
+import org.apache.phoenix.schema.PDatum;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PVarbinary;
+import org.apache.phoenix.util.ByteUtil;
+
+import java.sql.SQLException;
+import java.util.List;
+
+public class SplitKeyUtil {

Review Comment:
   Do we need a new Util class ?
   Do we have an existing Util class that would make sense for this ?



##
phoenix-core/src/it/java/org/apache/phoenix/end2end/index/BaseIndexIT.java:
##
@@ -134,6 +134,51 @@ protected BaseIndexIT(boolean localIndex, boolean uncovered, boolean mutable, St
         this.tableDDLOptions = optionBuilder.toString();
     }
 
+    @Test
+    public void testCreateIndexWithRowValueConstructorSplitKeys() throws Exception {
+        Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
+        String tableName = "TBL_" + generateUniqueName();
+        String indexName = "IND_" + generateUniqueName();
+        String fullTableName = SchemaUtil.getTableName(TestUtil.DEFAULT_SCHEMA_NAME, tableName);
+        String fullIndexName = SchemaUtil.getTableName(TestUtil.DEFAULT_SCHEMA_NAME, indexName);
+
+        try (Connection conn = DriverManager.getConnection(getUrl(), props)) {
+            conn.setAutoCommit(false);
+            String splitKeys = "('foo', 'a', 1, 100, 1.0, TO_DATE('2024-01-01')), " +
+                    "('john', 'b', 2, 101, 2.0, TO_DATE('2024-01-01')), " +
+                    "('tom', 'c', 3, 102, 3.0, TO_DATE('2024-01-01'))";
+            String ddl = "CREATE TABLE " + fullTableName + TestUtil.TEST_TABLE_SCHEMA +
+                    tableDDLOptions + " SPLIT ON (" + splitKeys + ")";
+            Statement stmt = conn.createStatement();
+            stmt.execute(ddl);
+            BaseTest.populateTestTable(fullTableName);
+            String indexSplitKeys = "('a',111, 'foo', 'a', 1, 100, 1.0, TO_DATE('2024-01-01')), " +
+                    "('e','222','john', 'b', 2, 101, 2.0, TO_DATE('2024-01-01')), " +
+                    "('i', '333', 'tom', 'c', 3, 102, 3.0, TO_DATE('2024-01-01'))";
+            ddl = "CREATE " + (localIndex ? "LOCAL" : "") + (uncovered ? "UNCOVERED" : "")
+                    + " INDEX " + indexName + " ON " + fullTableName
+                    + " (char_col1 ASC, int_col1 ASC)"
+                    + (uncovered ? "" : " INCLUDE (long_col1, long_col2)")
+                    + (localIndex ? "" : " SPLIT ON (" + indexSplitKeys + ")");
+            stmt.execute(ddl);
+
+            String query = "SELECT d.char_col1, int_col1 from " + fullTableName + " as d";
+            ResultSet rs = conn.createStatement().executeQuery(query);
+            assertTrue(rs.next());
+            assertEquals("chara", rs.getString(1));
+            assertEquals("chara", rs.getString("char_col1"));
+            assertEquals(2, rs.getInt(2));
+            assertTrue(rs.next());
+            assertEquals("chara", rs.getString(1));
+            assertEquals(3, rs.getInt(2));
+            assertTrue(rs.next());
+            assertEquals("chara", rs.getString(1));
+            assertEquals(4, rs.getInt(2));
+            assertFalse(rs.next());
+            conn.createStatement().execute("DROP INDEX " + indexName + " ON " + fullTableName);

Review Comment:
   Nit: We already have a stmt object. We could also create the stmt in the 
try-with-resources block.

Re: [PR] PHOENIX-7263 Row value constructor split keys not allowed on indexes [phoenix]

2024-03-07 Thread via GitHub


stoty commented on code in PR #1850:
URL: https://github.com/apache/phoenix/pull/1850#discussion_r1517205871


##
phoenix-core-client/src/main/java/org/apache/phoenix/compile/SplitKeyUtil.java:
##
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.compile;
+
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.phoenix.exception.SQLExceptionCode;
+import org.apache.phoenix.exception.SQLExceptionInfo;
+import org.apache.phoenix.expression.Expression;
+import org.apache.phoenix.parse.BindParseNode;
+import org.apache.phoenix.parse.ParseNode;
+import org.apache.phoenix.schema.PDatum;
+import org.apache.phoenix.schema.SortOrder;
+import org.apache.phoenix.schema.types.PDataType;
+import org.apache.phoenix.schema.types.PVarbinary;
+import org.apache.phoenix.util.ByteUtil;
+
+import java.sql.SQLException;
+import java.util.List;
+
+public class SplitKeyUtil {

Review Comment:
   Do we need a new Util class ?
   Do we have an existing Util class that would make sense for this ?



##
phoenix-core/src/it/java/org/apache/phoenix/end2end/index/BaseIndexIT.java:
##
@@ -134,6 +134,51 @@ protected BaseIndexIT(boolean localIndex, boolean uncovered, boolean mutable, St
         this.tableDDLOptions = optionBuilder.toString();
     }
 
+    @Test
+    public void testCreateIndexWithRowValueConstructorSplitKeys() throws Exception {
+        Properties props = PropertiesUtil.deepCopy(TEST_PROPERTIES);
+        String tableName = "TBL_" + generateUniqueName();
+        String indexName = "IND_" + generateUniqueName();
+        String fullTableName = SchemaUtil.getTableName(TestUtil.DEFAULT_SCHEMA_NAME, tableName);
+        String fullIndexName = SchemaUtil.getTableName(TestUtil.DEFAULT_SCHEMA_NAME, indexName);
+
+        try (Connection conn = DriverManager.getConnection(getUrl(), props)) {
+            conn.setAutoCommit(false);
+            String splitKeys = "('foo', 'a', 1, 100, 1.0, TO_DATE('2024-01-01')), " +
+                    "('john', 'b', 2, 101, 2.0, TO_DATE('2024-01-01')), " +
+                    "('tom', 'c', 3, 102, 3.0, TO_DATE('2024-01-01'))";
+            String ddl = "CREATE TABLE " + fullTableName + TestUtil.TEST_TABLE_SCHEMA +
+                    tableDDLOptions + " SPLIT ON (" + splitKeys + ")";
+            Statement stmt = conn.createStatement();
+            stmt.execute(ddl);
+            BaseTest.populateTestTable(fullTableName);
+            String indexSplitKeys = "('a',111, 'foo', 'a', 1, 100, 1.0, TO_DATE('2024-01-01')), " +
+                    "('e','222','john', 'b', 2, 101, 2.0, TO_DATE('2024-01-01')), " +
+                    "('i', '333', 'tom', 'c', 3, 102, 3.0, TO_DATE('2024-01-01'))";
+            ddl = "CREATE " + (localIndex ? "LOCAL" : "") + (uncovered ? "UNCOVERED" : "")
+                    + " INDEX " + indexName + " ON " + fullTableName
+                    + " (char_col1 ASC, int_col1 ASC)"
+                    + (uncovered ? "" : " INCLUDE (long_col1, long_col2)")
+                    + (localIndex ? "" : " SPLIT ON (" + indexSplitKeys + ")");
+            stmt.execute(ddl);
+
+            String query = "SELECT d.char_col1, int_col1 from " + fullTableName + " as d";
+            ResultSet rs = conn.createStatement().executeQuery(query);
+            assertTrue(rs.next());
+            assertEquals("chara", rs.getString(1));
+            assertEquals("chara", rs.getString("char_col1"));
+            assertEquals(2, rs.getInt(2));
+            assertTrue(rs.next());
+            assertEquals("chara", rs.getString(1));
+            assertEquals(3, rs.getInt(2));
+            assertTrue(rs.next());
+            assertEquals("chara", rs.getString(1));
+            assertEquals(4, rs.getInt(2));
+            assertFalse(rs.next());
+            conn.createStatement().execute("DROP INDEX " + indexName + " ON " + fullTableName);

Review Comment:
   Nit: We already have a stmt object. We could also create the stmt in the 
try-with-resources block.
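
A quick sketch of that suggestion, reusing the names from the test above:

```java
// Create the Statement in the try-with-resources header so it is closed
// automatically, and reuse it instead of calling conn.createStatement() again.
try (Connection conn = DriverManager.getConnection(getUrl(), props);
        Statement stmt = conn.createStatement()) {
    conn.setAutoCommit(false);
    stmt.execute(ddl);
    // ... populate, create the index, and run assertions with the same stmt ...
    stmt.execute("DROP INDEX " + indexName + " ON " + fullTableName);
}
```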




[jira] [Commented] (PHOENIX-7251) Refactor server-side code to support multiple ServerMetadataCache for HA tests

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824623#comment-17824623
 ] 

ASF GitHub Bot commented on PHOENIX-7251:
-

shahrs87 merged PR #1845:
URL: https://github.com/apache/phoenix/pull/1845




> Refactor server-side code to support multiple ServerMetadataCache for HA tests
> --
>
> Key: PHOENIX-7251
> URL: https://issues.apache.org/jira/browse/PHOENIX-7251
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Palash Chauhan
>Assignee: Palash Chauhan
>Priority: Major
>
> In the metadata caching re-design, `ServerMetadataCache` is required to be a 
> singleton in the implementation. This affects tests for the HA use case 
> because the coprocessors on the two clusters end up using the same 
> `ServerMetadataCache`. All tests that execute queries with one of the clusters 
> unavailable will fail. 
> We can refactor the implementation in the following way to support HA test 
> cases:
> 1. Create a `ServerMetadataCache` interface and use the current 
> implementation as `ServerMetadataCacheImpl` for all other tests. This would 
> be a singleton.
> 2. Implement `ServerMetadataCacheHAImpl` with a map of instances keyed on 
> config.
> 3. Extend `PhoenixRegionServerEndpoint` and use `ServerMetadataCacheHAImpl`. 
> 4. In HA tests, load this new endpoint on the region servers. 
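
A minimal sketch of steps 1 and 2 as described above; method names and signatures are assumed, not taken from the actual patch:

{code:java}
import java.sql.SQLException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.conf.Configuration;

// Step 1: extract an interface from the current singleton implementation.
interface ServerMetadataCache {
    // Assumed method; the real interface mirrors whatever the singleton exposes.
    long getLastDDLTimestampForTable(byte[] tenantId, byte[] schemaName, byte[] tableName)
            throws SQLException;
}

// Step 2: HA-test variant keeping one cache instance per cluster Configuration,
// so coprocessors of two mini clusters no longer share state.
final class ServerMetadataCacheHAImpl {
    private static final Map<Configuration, ServerMetadataCache> INSTANCES =
            new ConcurrentHashMap<>();

    static ServerMetadataCache getInstance(Configuration conf) {
        // Configuration uses identity semantics here: one entry per cluster config.
        return INSTANCES.computeIfAbsent(conf, ServerMetadataCacheHAImpl::createCache);
    }

    private static ServerMetadataCache createCache(Configuration conf) {
        // Placeholder: the real implementation would delegate to the regular
        // ServerMetadataCacheImpl built for this cluster's configuration.
        return (tenantId, schemaName, tableName) -> 0L;
    }
}
{code}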



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] PHOENIX-7251 : Refactor server-side code to support multiple ServerMetadataCache for ITs which create multiple RSs or mini clusters [phoenix]

2024-03-07 Thread via GitHub


shahrs87 merged PR #1845:
URL: https://github.com/apache/phoenix/pull/1845


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (PHOENIX-7258) Query Optimizer should pick Index hint even for point lookup queries

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824612#comment-17824612
 ] 

ASF GitHub Bot commented on PHOENIX-7258:
-

virajjasani commented on code in PR #1851:
URL: https://github.com/apache/phoenix/pull/1851#discussion_r1517163722


##
phoenix-core-client/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java:
##
@@ -247,6 +248,10 @@ private List<QueryPlan> getApplicablePlansForSingleFlatQuery(QueryPlan dataPlan,
             }
             plans.add(0, hintedPlan);
         }
+        if (dataPlan.getContext().getScanRanges().isPointLookup()
+                && stopAtBestPlan && dataPlan.isApplicable()) {
+            return Collections.<QueryPlan> singletonList(dataPlan);
+        }

Review Comment:
   I was thinking about the same.
   
   It seems that we might have more plans in the list than just the data 
plan, so if we return a single plan here as the data plan, it would help cover this 
case: even if the hinted index is not applicable and the data table has a point 
lookup, we return the point lookup on the data table as the best plan.





> Query Optimizer should pick Index hint even for point lookup queries
> 
>
> Key: PHOENIX-7258
> URL: https://issues.apache.org/jira/browse/PHOENIX-7258
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.2.0, 5.1.3
>Reporter: Viraj Jasani
>Assignee: Sanjeet Malhotra
>Priority: Major
>
> For better performance, users can create covered indexes such that the indexed 
> columns are the same as the composite primary key of the data table, but in a 
> different order. For instance, create a data table with columns PK1, PK2, PK3, 
> C1, C2, with the primary key being PK1, PK2, PK3. To get better performance 
> from HBase block caching, if rows with the same PK3 value should reside as 
> close to each other as possible, we can create an index on PK3, PK2, PK1 that 
> also includes columns C1 and C2.
> For point lookups on the data table, it might still be helpful to query the 
> index table, depending on the use case. We should allow an index hint to route 
> a point lookup to the index table.
> When the query optimizer identifies that the query is a point lookup on the 
> data table and "stop at best plan" is true, it immediately returns without 
> checking the hint. We should check for the hint and, if an applicable hinted 
> index plan is identified, use it.
>  
> Assuming getHintedPlanIfApplicable() retrieves the hinted plan, it can look 
> something like this:
> {code:java}
> if (dataPlan.getContext().getScanRanges().isPointLookup() && stopAtBestPlan
>         && dataPlan.isApplicable()) {
>     if (indexes.isEmpty() || select.getHint().getHint(Hint.INDEX) == null) {
>         return Collections.singletonList(dataPlan);
>     }
>     QueryPlan hintedPlan = getHintedPlanIfApplicable(dataPlan, statement, targetColumns,
>             parallelIteratorFactory, select, indexes);
>     if (hintedPlan != null) {
>         PTable index = hintedPlan.getTableRef().getTable();
>         if (hintedPlan.isApplicable() && (index.getIndexWhere() == null
>                 || isPartialIndexUsable(select, dataPlan, index))) {
>             return Collections.singletonList(hintedPlan);
>         }
>     }
>     return Collections.singletonList(dataPlan);
> } {code}
> We still need to be optimal, i.e. if the hinted index plan is not applicable 
> or useful, we still need to immediately return the point lookup data plan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] PHOENIX-7258: Query Optimizer should pick Index hint even for point lookup queries [phoenix]

2024-03-07 Thread via GitHub


virajjasani commented on code in PR #1851:
URL: https://github.com/apache/phoenix/pull/1851#discussion_r1517163722


##
phoenix-core-client/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java:
##
@@ -247,6 +248,10 @@ private List<QueryPlan> getApplicablePlansForSingleFlatQuery(QueryPlan dataPlan,
             }
             plans.add(0, hintedPlan);
         }
+        if (dataPlan.getContext().getScanRanges().isPointLookup()
+                && stopAtBestPlan && dataPlan.isApplicable()) {
+            return Collections.<QueryPlan> singletonList(dataPlan);
+        }

Review Comment:
   I was thinking about the same.
   
   It seems that we might have more plans in the list than just the data 
plan, so if we return a single plan here as the data plan, it would help cover this 
case: even if the hinted index is not applicable and the data table has a point 
lookup, we return the point lookup on the data table as the best plan.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (PHOENIX-7253) Perf improvement for non-full scan queries on large table

2024-03-07 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824569#comment-17824569
 ] 

Viraj Jasani edited comment on PHOENIX-7253 at 3/7/24 11:37 PM:


{quote}AFAIK for point lookups the code short circuits and we don't fetch any 
region information.
{quote}
Yes, that's only for point lookups. However, it is not for range scans or even full 
scans on a tenant connection. So the significant perf impact comes for:
 * Range Scan
 * Range scan on Salted table
 * Range scan on Salted table with Tenant id and/or View index id
 * Range Scan on Tenant connection
 * Full Scan on Tenant connection

 

I think I missed removing point lookup from the PR comment, whereas the Jira 
description is the correct one:
{quote}Unless the query is a strict point lookup, any query on any tenant view 
would end up retrieving the region locations of all regions of the base table.
{quote}


was (Author: vjasani):
{quote}AFAIK for point lookups the code short circuits and we don't fetch any 
region information.
{quote}
Yes, that's only for point lookup. However, not for range scan or even full 
scan for tenant connection. So the significant perf impact comes for:
 * Range Scan
 * Range scan on Salted table
 * Range scan on Salted table with Tenant id and/or View index id
 * Range Scan on Tenant connection
 * Full Scan on Tenant connection

> Perf improvement for non-full scan queries on large table
> -
>
> Key: PHOENIX-7253
> URL: https://issues.apache.org/jira/browse/PHOENIX-7253
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.2.0, 5.1.3
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Critical
> Fix For: 5.2.0, 5.1.4
>
>
> Any considerably large table with more than 100k regions can give problematic 
> performance if we access all region locations from meta for the given table 
> before generating parallel or sequential scans for the given query. The perf 
> impact can really hurt range scan queries.
> Consider a table with hundreds of thousands of tenant views. Unless the query 
> is a strict point lookup, any query on any tenant view would end up retrieving 
> the region locations of all regions of the base table. If an IOException is 
> thrown by the HBase client during any region location lookup in meta, we only 
> perform a single retry.
> Proposal:
>  # All non point lookup queries should only retrieve region locations that 
> cover the scan boundary. Avoid fetching all region locations of the base 
> table.
>  # Make retries configurable with higher default value.
>  
> The proposal should improve the performance of queries:
>  * Range Scan
>  * Range scan on Salted table
>  * Range scan on Salted table with Tenant id and/or View index id
>  * Range Scan on Tenant connection
>  * Full Scan on Tenant connection
>  
> Sample stacktrace from the multiple failures observed:
> {code:java}
> java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table regions.Stack 
> trace: java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table 
> regions.
>     at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:620)
>     at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:229)
>     at 
> org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:781)
>     at 
> org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
>     at 
> org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
>     at 
> org.apache.phoenix.iterate.DefaultParallelScanGrouper.getRegionBoundaries(DefaultParallelScanGrouper.java:74)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getRegionBoundaries(BaseResultIterators.java:587)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:936)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:669)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.<init>(BaseResultIterators.java:555)
>     at 
> org.apache.phoenix.iterate.SerialIterators.<init>(SerialIterators.java:69)
>     at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:278)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:374)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:222)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:217)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:212)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:370)
>     at 

[jira] [Commented] (PHOENIX-7253) Perf improvement for non-full scan queries on large table

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824574#comment-17824574
 ] 

ASF GitHub Bot commented on PHOENIX-7253:
-

virajjasani commented on PR #1848:
URL: https://github.com/apache/phoenix/pull/1848#issuecomment-1984781605

   Even earlier, without this patch, once the CQSI connection is cached, multiple 
queries executed from the same client, using the same CQSI, end up calling 
the same metadata API from HBase's ClusterConnection:
   ```
 /**
  * Find region location hosting passed row
  * @param tableName table name
  * @param row Row to find.
  * @param reload If true do not use cache, otherwise bypass.
  * @return Location of row.
  * @throws IOException if a remote or network exception occurs
  */
 HRegionLocation getRegionLocation(TableName tableName, byte[] row, boolean reload)
   throws IOException;
   ```
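
For context, a minimal sketch (assumed names and signature, not the actual patch) of how proposal #1 can use this API to fetch only the region locations that overlap a scan range, rather than all regions of the table:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

final class ScanRangeRegionLookup {
    // Walk meta from startRowKey, following each region's end key, and stop once
    // the scan boundary (endRowKey) is covered or the last region is reached.
    static List<HRegionLocation> getTableRegions(Connection conn, TableName table,
            byte[] startRowKey, byte[] endRowKey) throws IOException {
        List<HRegionLocation> locations = new ArrayList<>();
        try (RegionLocator locator = conn.getRegionLocator(table)) {
            byte[] currentKey = startRowKey;
            do {
                HRegionLocation location = locator.getRegionLocation(currentKey, false);
                locations.add(location);
                currentKey = location.getRegion().getEndKey();
            } while (!Bytes.equals(currentKey, HConstants.EMPTY_END_ROW)
                    && (Bytes.equals(endRowKey, HConstants.EMPTY_END_ROW)
                        || Bytes.compareTo(currentKey, endRowKey) < 0));
        }
        return locations;
    }
}
```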




> Perf improvement for non-full scan queries on large table
> -
>
> Key: PHOENIX-7253
> URL: https://issues.apache.org/jira/browse/PHOENIX-7253
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.2.0, 5.1.3
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Critical
> Fix For: 5.2.0, 5.1.4
>
>
> Any considerably large table with more than 100k regions can give problematic 
> performance if we access all region locations from meta for the given table 
> before generating parallel or sequential scans for the given query. The perf 
> impact can really hurt range scan queries.
> Consider a table with hundreds of thousands of tenant views. Unless the query 
> is a strict point lookup, any query on any tenant view would end up retrieving 
> the region locations of all regions of the base table. If an IOException is 
> thrown by the HBase client during any region location lookup in meta, we only 
> perform a single retry.
> Proposal:
>  # All non point lookup queries should only retrieve region locations that 
> cover the scan boundary. Avoid fetching all region locations of the base 
> table.
>  # Make retries configurable with higher default value.
>  
> The proposal should improve the performance of queries:
>  * Range Scan
>  * Range scan on Salted table
>  * Range scan on Salted table with Tenant id and/or View index id
>  * Range Scan on Tenant connection
>  * Full Scan on Tenant connection
>  
> Sample stacktrace from the multiple failures observed:
> {code:java}
> java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table regions.Stack 
> trace: java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table 
> regions.
>     at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:620)
>     at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:229)
>     at 
> org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:781)
>     at 
> org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
>     at 
> org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
>     at 
> org.apache.phoenix.iterate.DefaultParallelScanGrouper.getRegionBoundaries(DefaultParallelScanGrouper.java:74)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getRegionBoundaries(BaseResultIterators.java:587)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:936)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:669)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.<init>(BaseResultIterators.java:555)
>     at 
> org.apache.phoenix.iterate.SerialIterators.<init>(SerialIterators.java:69)
>     at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:278)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:374)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:222)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:217)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:212)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:370)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:328)
>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:328)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:320)
>     at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.executeQuery(PhoenixPreparedStatement.java:188)

Re: [PR] PHOENIX-7253 Perf improvement for non-full scan queries on large table [phoenix]

2024-03-07 Thread via GitHub


virajjasani commented on PR #1848:
URL: https://github.com/apache/phoenix/pull/1848#issuecomment-1984781605

   Even earlier, without this patch, once the CQSI connection is cached, multiple 
queries executed from the same client, using the same CQSI, end up calling 
the same metadata API from HBase's ClusterConnection:
   ```
 /**
  * Find region location hosting passed row
  * @param tableName table name
  * @param row Row to find.
  * @param reload If true do not use cache, otherwise bypass.
  * @return Location of row.
  * @throws IOException if a remote or network exception occurs
  */
 HRegionLocation getRegionLocation(TableName tableName, byte[] row, boolean reload)
   throws IOException;
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Comment Edited] (PHOENIX-7253) Perf improvement for non-full scan queries on large table

2024-03-07 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824569#comment-17824569
 ] 

Viraj Jasani edited comment on PHOENIX-7253 at 3/7/24 11:17 PM:


{quote}AFAIK for point lookups the code short circuits and we don't fetch any 
region information.
{quote}
Yes, that's only for point lookups. However, it is not for range scans or even full 
scans on a tenant connection. So the significant perf impact comes for:
 * Range Scan
 * Range scan on Salted table
 * Range scan on Salted table with Tenant id and/or View index id
 * Range Scan on Tenant connection
 * Full Scan on Tenant connection


was (Author: vjasani):
{quote}AFAIK for point lookups the code short circuits and we don't fetch any 
region information.
{quote}
Yes, that's only for point lookup. However, not for range scan or even full 
scan for tenant connection. So the significant perf impact comes for:
 * Range Scan
 * Any queries using tenant connection - full scan, range scan
 * Range scan on Salted table
 * Range scan on Salted table with Tenant id and/or View index id
 * Range Scan on Tenant connection
 * Full Scan on Tenant connection

> Perf improvement for non-full scan queries on large table
> -
>
> Key: PHOENIX-7253
> URL: https://issues.apache.org/jira/browse/PHOENIX-7253
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.2.0, 5.1.3
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Critical
> Fix For: 5.2.0, 5.1.4
>
>
> Any considerably large table with more than 100k regions can give problematic 
> performance if we access all region locations from meta for the given table 
> before generating parallel or sequential scans for the given query. The perf 
> impact can really hurt range scan queries.
> Consider a table with hundreds of thousands of tenant views. Unless the query 
> is a strict point lookup, any query on any tenant view would end up retrieving 
> the region locations of all regions of the base table. If an IOException is 
> thrown by the HBase client during any region location lookup in meta, we only 
> perform a single retry.
> Proposal:
>  # All non point lookup queries should only retrieve region locations that 
> cover the scan boundary. Avoid fetching all region locations of the base 
> table.
>  # Make retries configurable with higher default value.
>  
> Sample stacktrace from the multiple failures observed:
> {code:java}
> java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table regions.Stack 
> trace: java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table 
> regions.
>     at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:620)
>     at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:229)
>     at 
> org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:781)
>     at 
> org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
>     at 
> org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
>     at 
> org.apache.phoenix.iterate.DefaultParallelScanGrouper.getRegionBoundaries(DefaultParallelScanGrouper.java:74)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getRegionBoundaries(BaseResultIterators.java:587)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:936)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:669)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.<init>(BaseResultIterators.java:555)
>     at 
> org.apache.phoenix.iterate.SerialIterators.<init>(SerialIterators.java:69)
>     at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:278)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:374)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:222)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:217)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:212)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:370)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:328)
>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:328)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:320)
>     at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.executeQuery(PhoenixPreparedStatement.java:188)
>     ...
>    

[jira] [Comment Edited] (PHOENIX-7253) Perf improvement for non-full scan queries on large table

2024-03-07 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824569#comment-17824569
 ] 

Viraj Jasani edited comment on PHOENIX-7253 at 3/7/24 11:16 PM:


{quote}AFAIK for point lookups the code short circuits and we don't fetch any 
region information.
{quote}
Yes, that's only for point lookups. However, it is not for range scans or even full 
scans on a tenant connection. So the significant perf impact comes for:
 * Range Scan
 * Any queries using tenant connection - full scan, range scan
 * Range scan on Salted table
 * Range scan on Salted table with Tenant id and/or View index id
 * Range Scan on Tenant connection
 * Full Scan on Tenant connection


was (Author: vjasani):
{quote}AFAIK for point lookups the code short circuits and we don't fetch any 
region information.
{quote}
Yes, that's only for point lookup. However, not for range scan or even full 
scan for tenant connection. So the significant perf impact comes for:
 * Range Scan
 * Any queries using tenant connection - full scan and range scan
 * Range scan on Salted table
 * Range scan on Salted table with Tenant id and/or View index id

> Perf improvement for non-full scan queries on large table
> -
>
> Key: PHOENIX-7253
> URL: https://issues.apache.org/jira/browse/PHOENIX-7253
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.2.0, 5.1.3
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Critical
> Fix For: 5.2.0, 5.1.4
>
>
> Any considerably large table with more than 100k regions can give problematic 
> performance if we access all region locations from meta for the given table 
> before generating parallel or sequential scans for the given query. The perf 
> impact can really hurt range scan queries.
> Consider a table with hundreds of thousands of tenant views. Unless the query 
> is a strict point lookup, any query on any tenant view would end up retrieving 
> the region locations of all regions of the base table. If an IOException is 
> thrown by the HBase client during any region location lookup in meta, we only 
> perform a single retry.
> Proposal:
>  # All non point lookup queries should only retrieve region locations that 
> cover the scan boundary. Avoid fetching all region locations of the base 
> table.
>  # Make retries configurable with higher default value.
>  
> Sample stacktrace from the multiple failures observed:
> {code:java}
> java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table regions.Stack 
> trace: java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table 
> regions.
>     at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:620)
>     at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:229)
>     at 
> org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:781)
>     at 
> org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
>     at 
> org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
>     at 
> org.apache.phoenix.iterate.DefaultParallelScanGrouper.getRegionBoundaries(DefaultParallelScanGrouper.java:74)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getRegionBoundaries(BaseResultIterators.java:587)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:936)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:669)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.<init>(BaseResultIterators.java:555)
>     at 
> org.apache.phoenix.iterate.SerialIterators.<init>(SerialIterators.java:69)
>     at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:278)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:374)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:222)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:217)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:212)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:370)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:328)
>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:328)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:320)
>     at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.executeQuery(PhoenixPreparedStatement.java:188)
>     ...
>     

[jira] [Commented] (PHOENIX-7253) Perf improvement for non-full scan queries on large table

2024-03-07 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824569#comment-17824569
 ] 

Viraj Jasani commented on PHOENIX-7253:
---

{quote}AFAIK for point lookups the code short circuits and we don't fetch any 
region information.
{quote}
Yes, that's only for point lookups. However, it is not for range scans or even full 
scans on a tenant connection. So the significant perf impact comes for:
 * Range Scan
 * Any queries using tenant connection - full scan and range scan
 * Range scan on Salted table
 * Range scan on Salted table with Tenant id and/or View index id

> Perf improvement for non-full scan queries on large table
> -
>
> Key: PHOENIX-7253
> URL: https://issues.apache.org/jira/browse/PHOENIX-7253
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.2.0, 5.1.3
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Critical
> Fix For: 5.2.0, 5.1.4
>
>
> Any considerably large table with more than 100k regions can give problematic 
> performance if we access all region locations from meta for the given table 
> before generating parallel or sequential scans for the given query. The perf 
> impact can really hurt range scan queries.
> Consider a table with hundreds of thousands of tenant views. Unless the query 
> is a strict point lookup, any query on any tenant view would end up retrieving 
> the region locations of all regions of the base table. If an IOException is 
> thrown by the HBase client during any region location lookup in meta, we only 
> perform a single retry.
> Proposal:
>  # All non point lookup queries should only retrieve region locations that 
> cover the scan boundary. Avoid fetching all region locations of the base 
> table.
>  # Make retries configurable with higher default value.
>  
> Sample stacktrace from the multiple failures observed:
> {code:java}
> java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table regions.Stack 
> trace: java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table 
> regions.
>     at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:620)
>     at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:229)
>     at 
> org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:781)
>     at 
> org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
>     at 
> org.apache.phoenix.query.DelegateConnectionQueryServices.getAllTableRegions(DelegateConnectionQueryServices.java:87)
>     at 
> org.apache.phoenix.iterate.DefaultParallelScanGrouper.getRegionBoundaries(DefaultParallelScanGrouper.java:74)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getRegionBoundaries(BaseResultIterators.java:587)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:936)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:669)
>     at 
> org.apache.phoenix.iterate.BaseResultIterators.<init>(BaseResultIterators.java:555)
>     at 
> org.apache.phoenix.iterate.SerialIterators.<init>(SerialIterators.java:69)
>     at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:278)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:374)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:222)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:217)
>     at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:212)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:370)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:328)
>     at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:328)
>     at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:320)
>     at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.executeQuery(PhoenixPreparedStatement.java:188)
>     ...
>     ...
>     Caused by: java.io.InterruptedIOException: Origin: InterruptedException
>         at 
> org.apache.hadoop.hbase.util.ExceptionUtil.asInterrupt(ExceptionUtil.java:72)
>         at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.takeUserRegionLock(ConnectionImplementation.java:1129)
>         at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:994)
>         at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:895)
>         at 
> 

[jira] [Commented] (PHOENIX-7253) Perf improvement for non-full scan queries on large table

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824568#comment-17824568
 ] 

ASF GitHub Bot commented on PHOENIX-7253:
-

dbwong commented on PR #1848:
URL: https://github.com/apache/phoenix/pull/1848#issuecomment-1984753107

   @virajjasani This change as set up minimally looks reasonable to me... one 
area I do think we may need to be careful on, however, is the single-threadedness 
of the HConnection. I didn't look into the details of how you were optimizing 
this, but we should be careful not to allow multiple calls into the new 
getTableRegions(byte[] tableName, byte[] startRowKey, byte[] endRowKey) from 
multiple threads, especially since the ranges can overlap to a degree. While I 
haven't looked in detail, I have a small concern about things like 
[phoenix-core-client/src/main/java/org/apache/phoenix/iterate/DefaultParallelScanGrouper.java](https://github.com/apache/phoenix/pull/1848/files#diff-f487983a399ca9d3c947e1b8b3a5093791b4a0947395851cee4bae006a4236cd)
 doing this.
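
To make the concern concrete, here is a minimal sketch of one way to serialize overlapping region-location lookups per table; the `RegionLookup` wrapper and the `Fetcher` callback are illustrative assumptions, not code from this PR:

```java
// Hypothetical guard that serializes region-location lookups per table so
// concurrent, overlapping calls cannot race on the shared HConnection state.
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.util.Bytes;

public final class RegionLookup {
    /** Pluggable lookup, e.g. the PR's new getTableRegions(table, start, end). */
    public interface Fetcher {
        List<HRegionLocation> fetch(byte[] tableName, byte[] startRowKey,
                byte[] endRowKey) throws Exception;
    }

    // One lock object per table name.
    private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();

    public List<HRegionLocation> getTableRegions(Fetcher fetcher, byte[] tableName,
            byte[] startRowKey, byte[] endRowKey) throws Exception {
        Object lock = locks.computeIfAbsent(Bytes.toString(tableName), k -> new Object());
        synchronized (lock) {
            return fetcher.fetch(tableName, startRowKey, endRowKey);
        }
    }
}
```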







[jira] [Commented] (PHOENIX-7253) Perf improvement for non-full scan queries on large table

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824567#comment-17824567
 ] 

ASF GitHub Bot commented on PHOENIX-7253:
-

virajjasani commented on code in PR #1848:
URL: https://github.com/apache/phoenix/pull/1848#discussion_r1516955054


##
phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java:
##
@@ -1005,6 +1005,43 @@ private ScansWithRegionLocations getParallelScans(byte[] startKey, byte[] stopKey

 int regionIndex = 0;
 int startRegionIndex = 0;
+
+List<HRegionLocation> regionLocations;
+if (isSalted && !isLocalIndex) {
+    // key prefix = salt num + view index id + tenant id
+    // If salting is used with tenant or view index id, scan start and end
+    // rowkeys will not be empty. We need to generate region locations for
+    // all the scan range such that we cover (each salt bucket num) + (prefix starting from
+    // index position 1 to cover view index and/or tenant id and/or remaining prefix).
+    if (scan.getStartRow().length > 0 && scan.getStopRow().length > 0) {
+        regionLocations = new ArrayList<>();
+        for (int i = 0; i < getTable().getBucketNum(); i++) {
+            byte[] saltStartRegionKey = new byte[scan.getStartRow().length];
+            saltStartRegionKey[0] = (byte) i;
+            System.arraycopy(scan.getStartRow(), 1, saltStartRegionKey, 1,
+                    scan.getStartRow().length - 1);
+
+            byte[] saltStopRegionKey = new byte[scan.getStopRow().length];
+            saltStopRegionKey[0] = (byte) i;
+            System.arraycopy(scan.getStopRow(), 1, saltStopRegionKey, 1,
+                    scan.getStopRow().length - 1);
+
+            regionLocations.addAll(
+                    getRegionBoundaries(scanGrouper, saltStartRegionKey, saltStopRegionKey));
+        }

Review Comment:
   `traverseAllRegions` is only used here, to use scan boundaries:
   ```
   } else if (!traverseAllRegions) {
       byte[] scanStartRow = scan.getStartRow();
       if (scanStartRow.length != 0 && Bytes.compareTo(scanStartRow, startKey) > 0) {
           startRegionBoundaryKey = startKey = scanStartRow;
       }
       byte[] scanStopRow = scan.getStopRow();
       if (stopKey.length == 0
               || (scanStopRow.length != 0 && Bytes.compareTo(scanStopRow, stopKey) < 0)) {
           stopRegionBoundaryKey = stopKey = scanStopRow;
       }
   }
   ```
   
   Here, with this work, we will be able to restrict a range scan on a salted 
table to only the relevant regions, given that a single salt bucket can span 
many more regions as the table grows.
   Moreover, a salted table with a tenant id makes the restriction even 
narrower for a large table. Hence, this work also covers the optimization for 
large salted tables.
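
As a worked illustration of the explanation above (the bucket count and key bytes are invented for the example):

```java
// Illustrative only: a logical scan range over rowkey = <salt><tenant><pk>
// expands into one narrow range per salt bucket, so only the regions covering
// each slice are fetched instead of every region of the table.
int saltBuckets = 3; // assumed bucket count for the example
byte[] logicalStart = { 0, 't', '1', 'a' };
byte[] logicalStop  = { 0, 't', '1', 'f' };
for (int bucket = 0; bucket < saltBuckets; bucket++) {
    byte[] bucketStart = logicalStart.clone();
    byte[] bucketStop = logicalStop.clone();
    bucketStart[0] = (byte) bucket; // overwrite the salt byte
    bucketStop[0] = (byte) bucket;
    // getRegionBoundaries(scanGrouper, bucketStart, bucketStop) from the diff
    // above would then return just the regions covering this bucket's slice.
}
```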








[jira] [Commented] (PHOENIX-7253) Perf improvement for non-full scan queries on large table

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824566#comment-17824566
 ] 

ASF GitHub Bot commented on PHOENIX-7253:
-

tkhurana commented on code in PR #1848:
URL: https://github.com/apache/phoenix/pull/1848#discussion_r1516942676


##
phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java:
##
@@ -1005,6 +1005,43 @@ private ScansWithRegionLocations getParallelScans(byte[] startKey, byte[] stopKey

 int regionIndex = 0;
 int startRegionIndex = 0;
+
+List<HRegionLocation> regionLocations;
+if (isSalted && !isLocalIndex) {
+    // key prefix = salt num + view index id + tenant id
+    // If salting is used with tenant or view index id, scan start and end
+    // rowkeys will not be empty. We need to generate region locations for
+    // all the scan range such that we cover (each salt bucket num) + (prefix starting from
+    // index position 1 to cover view index and/or tenant id and/or remaining prefix).
+    if (scan.getStartRow().length > 0 && scan.getStopRow().length > 0) {
+        regionLocations = new ArrayList<>();
+        for (int i = 0; i < getTable().getBucketNum(); i++) {
+            byte[] saltStartRegionKey = new byte[scan.getStartRow().length];
+            saltStartRegionKey[0] = (byte) i;
+            System.arraycopy(scan.getStartRow(), 1, saltStartRegionKey, 1,
+                    scan.getStartRow().length - 1);
+
+            byte[] saltStopRegionKey = new byte[scan.getStopRow().length];
+            saltStopRegionKey[0] = (byte) i;
+            System.arraycopy(scan.getStopRow(), 1, saltStopRegionKey, 1,
+                    scan.getStopRow().length - 1);
+
+            regionLocations.addAll(
+                    getRegionBoundaries(scanGrouper, saltStartRegionKey, saltStopRegionKey));
+        }

Review Comment:
   Is all this necessary? The variable `traverseAllRegions` is set to true 
for salted tables, so what are we gaining with this extra work?








[jira] [Commented] (PHOENIX-7258) Query Optimizer should pick Index hint even for point lookup queries

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824565#comment-17824565
 ] 

ASF GitHub Bot commented on PHOENIX-7258:
-

tkhurana commented on code in PR #1851:
URL: https://github.com/apache/phoenix/pull/1851#discussion_r1516937729


##
phoenix-core-client/src/main/java/org/apache/phoenix/optimize/QueryOptimizer.java:
##
@@ -247,6 +248,10 @@ private List<QueryPlan> getApplicablePlansForSingleFlatQuery(QueryPlan dataPlan,
 }
 plans.add(0, hintedPlan);
 }
+if (dataPlan.getContext().getScanRanges().isPointLookup()
+        && stopAtBestPlan && dataPlan.isApplicable()) {
+    return Collections.<QueryPlan> singletonList(dataPlan);
+}

Review Comment:
   Why is this logic added here? The dataPlan is already added to the list of 
plans. This feels like a code smell.








[jira] [Commented] (PHOENIX-7253) Perf improvement for non-full scan queries on large table

2024-03-07 Thread Tanuj Khurana (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824564#comment-17824564
 ] 

Tanuj Khurana commented on PHOENIX-7253:


[~vjasani] I didn't understand what the perf issue is for point lookups when 
using tenant connections or salted tables? AFAIK, for point lookups the code 
short-circuits and we don't fetch any region information.


[jira] [Commented] (PHOENIX-7258) Query Optimizer should pick Index hint even for point lookup queries

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824548#comment-17824548
 ] 

ASF GitHub Bot commented on PHOENIX-7258:
-

kadirozde commented on code in PR #1851:
URL: https://github.com/apache/phoenix/pull/1851#discussion_r1516869330


##
phoenix-core/src/it/java/org/apache/phoenix/end2end/RowValueConstructorOffsetIT.java:
##
@@ -1221,4 +1222,29 @@ public void rvcOffsetTrailingNullKeyTest() throws Exception {
 }
 }
 
+// Test point lookup over data table with index hint where the hinted plan is not applicable
+@Test
+public void testRVCOffsetWithNotApplicableIndexHint() throws Exception {
+    String sql = String.format("SELECT /*+ INDEX(%s %s)*/ %s FROM %s "
+            + "WHERE t_id = 'b' AND k1 = 2 AND k2 = 3 OFFSET (%s)=('a', 1, 2)",
+            TABLE_NAME, INDEX_NAME, TABLE_ROW_KEY, TABLE_NAME, TABLE_ROW_KEY);
+    try (Statement statement = conn.createStatement()) {
+        ResultSet rs = statement.executeQuery("EXPLAIN " + sql);
+        String actualQueryPlan = QueryUtil.getExplainPlan(rs);
+        // As the hinted plan is not applicable, the optimizer should fall back
+        // to the data plan, which is a point lookup
+        assertTrue(actualQueryPlan.contains("POINT LOOKUP ON 1 KEY OVER " + TABLE_NAME));
+    }
+}
+
+@Test
+public void testRVCOffsetWithNotApplicableAndPointLookup() throws Exception {

Review Comment:
   Is this test applicable to this PR?








[jira] [Commented] (PHOENIX-7253) Perf improvement for non-full scan queries on large table

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824546#comment-17824546
 ] 

ASF GitHub Bot commented on PHOENIX-7253:
-

virajjasani commented on PR #1848:
URL: https://github.com/apache/phoenix/pull/1848#issuecomment-1984557213

   @dbwong here are some of the observations:
   In both HBase 1 and 2, region locations for the given table are cached at 
the HBase Connection level. Phoenix CQSI connections are by default cached for 
~24 hr. In fact, the issue we have seen with range scan queries on large 
tables occurs after that 24 hr window, such that multiple range scan queries 
get hit by the performance issue. Sometimes the queries take ~5-10 min, and 
sometimes the thread that performs the meta table lookup gets interrupted. 
However, even after 24 hr, a significant number of queries are still affected.
   
   In the incident, what we observed is that the base table has ~138k regions, 
but the queries are being done using tenant connections. Because we fetch all 
table regions regardless of the nature of the query, we end up spending 
significant time retrieving region locations and filling up the connection 
cache, even when the given query on the tenant view likely does not need to 
touch more than 5-10 regions.
   
   Hence, this fix would significantly improve the performance of queries on 
large tables (and tables shared by more tenants are usually larger). I am not 
proposing any change to how the getAllTableRegions API is written; we can 
continue to follow the same pattern, because HBase still does not provide an 
API that takes the start and end key of a scan range and returns the covering 
region locations in a single call. I can file a jira for that as well.
   
   The current state of the PR addresses the performance issues for these 
queries:
   
   - Range scan
   - Any queries using a tenant connection - full scan, range scan, point lookup
   - Point lookup or range scan on a salted table
   - Point lookup or range scan on a salted table with tenant id and/or view 
index id
   
   The only queries that should still require all table region locations are 
the ones that need a full base table scan.
   
   Does this look good to you?
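
A minimal sketch of the bounded lookup described above, using HBase's public `RegionLocator` API (HBase 2.x method names) to walk only the regions covering a scan range; this illustrates the idea under stated assumptions and is not the PR's code:

```java
// Illustrative sketch: fetch only the region locations that cover a scan
// range, instead of all regions of the table. Error handling and the
// configurable retries discussed in this jira are omitted.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public final class BoundedRegionLookup {
    public static List<HRegionLocation> locate(Connection conn, TableName table,
            byte[] startKey, byte[] stopKey) throws IOException {
        List<HRegionLocation> locations = new ArrayList<>();
        try (RegionLocator locator = conn.getRegionLocator(table)) {
            byte[] current = startKey;
            while (true) {
                HRegionLocation loc = locator.getRegionLocation(current, false);
                locations.add(loc);
                byte[] endKey = loc.getRegion().getEndKey();
                // Empty end key means the last region; otherwise stop once we
                // have covered the requested range.
                if (endKey.length == 0
                        || (stopKey.length > 0 && Bytes.compareTo(endKey, stopKey) >= 0)) {
                    break;
                }
                current = endKey;
            }
        }
        return locations;
    }
}
```

Walking boundaries this way touches only the regions the scan covers, so the connection's location cache fills lazily instead of all at once.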







[jira] [Commented] (PHOENIX-7258) Query Optimizer should pick Index hint even for point lookup queries

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824542#comment-17824542
 ] 

ASF GitHub Bot commented on PHOENIX-7258:
-

virajjasani commented on code in PR #1851:
URL: https://github.com/apache/phoenix/pull/1851#discussion_r1516856703


##
phoenix-core/src/it/java/org/apache/phoenix/end2end/TenantSpecificTablesDDLIT.java:
##
@@ -616,6 +621,52 @@ public void testTableMetadataScan() throws Exception {
 conn.close();
 }
 }
+
+@Test
+public void testIndexHintWithTenantView() throws Exception {
+    String schemaName = generateUniqueName();
+    String dataTableName = generateUniqueName();
+    String fullDataTableName = SchemaUtil.getTableName(schemaName, dataTableName);
+    String viewName = generateUniqueName();
+    String fullViewName = SchemaUtil.getTableName(schemaName, viewName);
+    String viewIndexName = generateUniqueName();
+    try (Connection conn = DriverManager.getConnection(getUrl());
+            Statement stmt = conn.createStatement()) {
+        String createDataTable = "create table " + fullDataTableName
+                + " (orgid varchar(10) not null, "
+                + "id1 varchar(10) not null, id2 varchar(10) not null, id3 integer not null, "
+                + "val1 varchar(10), val2 varchar(10) "
+                + "CONSTRAINT PK PRIMARY KEY (orgid, id1, id2, id3)) MULTI_TENANT=true";
+        stmt.execute(createDataTable);
+        stmt.execute("create view " + fullViewName + " as select * from " + fullDataTableName);
+        stmt.execute("create index " + viewIndexName + " on " + fullViewName
+                + "(id3, id2, id1) include (val1, val2)");
+    }
+    try (Connection conn = DriverManager.getConnection(PHOENIX_JDBC_TENANT_SPECIFIC_URL);
+            Statement stmt = conn.createStatement()) {
+        String grandChildViewName = generateUniqueName();
+        String fullGrandChildViewName = SchemaUtil.getTableName(schemaName, grandChildViewName);
+        stmt.execute("create view " + fullGrandChildViewName + " as select * from " + fullViewName);
+        PhoenixConnection pconn = conn.unwrap(PhoenixConnection.class);
+        pconn.getTableNoCache(pconn.getTenantId(), fullGrandChildViewName);
+        stmt.execute("upsert into " + fullGrandChildViewName + " values ('a1', 'a2', 3, 'a4', 'a5')");
+        conn.commit();
+        stmt.execute("upsert into " + fullGrandChildViewName + " values ('b1', 'b2', 3, 'b4', 'b5')");
+        conn.commit();
+        String physicalViewIndexTableName = MetaDataUtil.getViewIndexPhysicalName(fullDataTableName);
+        TableName viewIndexHBaseTable = TableName.valueOf(physicalViewIndexTableName);
+        TestUtil.assertRawRowCount(conn, viewIndexHBaseTable, 2);
+        String sql = "SELECT /*+ INDEX(" + fullGrandChildViewName + " " + viewIndexName + ")*/ "
+                + "val2, id2, val1, id3, id1 FROM " + fullGrandChildViewName
+                + " WHERE id2 = 'a2' AND (id1 = 'a1' OR id1 = 'b1') AND id3 = 3";
+        ResultSet rs = stmt.executeQuery("EXPLAIN " + sql);
+        String actualQueryPlan = QueryUtil.getExplainPlan(rs);
+        String expectedQueryPlan = "CLIENT PARALLEL 1-WAY POINT LOOKUP ON 2 KEYS OVER "
+                + physicalViewIndexTableName;

Review Comment:
   Could you please also use the same query without the hint and assert that the 
explain plan then uses 1-WAY POINT LOOKUP on the base table?
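
A sketch of the requested follow-up assertion, written as if inside the same tenant-connection block as the test above; the expected plan fragments are assumptions based on this comment rather than verified EXPLAIN output:

```java
// Hypothetical no-hint variant of the query above: without the INDEX hint,
// the optimizer should choose a point lookup on the data table instead of
// the view index.
String noHintSql = "SELECT val2, id2, val1, id3, id1 FROM " + fullGrandChildViewName
        + " WHERE id2 = 'a2' AND (id1 = 'a1' OR id1 = 'b1') AND id3 = 3";
try (ResultSet noHintRs = stmt.executeQuery("EXPLAIN " + noHintSql)) {
    String noHintPlan = QueryUtil.getExplainPlan(noHintRs);
    assertTrue(noHintPlan.contains("1-WAY POINT LOOKUP")); // assumed fragment
    assertTrue(noHintPlan.contains(fullDataTableName));    // assumed target table
}
```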








[jira] [Commented] (PHOENIX-7251) Refactor server-side code to support multiple ServerMetadataCache for HA tests

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824541#comment-17824541
 ] 

ASF GitHub Bot commented on PHOENIX-7251:
-

shahrs87 commented on PR #1845:
URL: https://github.com/apache/phoenix/pull/1845#issuecomment-1984524062

   > So I think we do need to read all site files?
   
   Looks like we do have to use `this.conf = HBaseConfiguration.create(conf)`. 
   But since we are caching the ServerMetadataCache object and this instance will 
be created just once for each RS, I think it is fine.
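
A sketch of the pattern being agreed on here; the constructor shape is an assumption for illustration, while `HBaseConfiguration.create(Configuration)` is the actual HBase API:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

class ServerMetadataCacheImpl {
    private final Configuration conf;

    // Assumed constructor shape. create(conf) loads hbase-default.xml and
    // hbase-site.xml and merges the passed-in conf over them, so properties
    // like the ZK quorum needed for the server-side connection URL are
    // present. This runs once per region server because the cache instance
    // itself is cached.
    ServerMetadataCacheImpl(Configuration conf) {
        this.conf = HBaseConfiguration.create(conf);
    }
}
```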




> Refactor server-side code to support multiple ServerMetadataCache for HA tests
> --
>
> Key: PHOENIX-7251
> URL: https://issues.apache.org/jira/browse/PHOENIX-7251
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Palash Chauhan
>Assignee: Palash Chauhan
>Priority: Major
>
> In the metadata caching re-design, `ServerMetadataCache` is required to be a 
> singleton in the implementation. This affects tests for the HA use case 
> because the coprocessors on the 2 clusters end up using the same 
> `ServerMetadataCache`. All tests which execute queries with 1 of the clusters 
> unavailable will fail. 
> We can refactor the implementation in the following way to support HA test 
> cases:
> 1. Create a `ServerMetadataCache` interface and use the current 
> implementation as `ServerMetadataCacheImpl` for all other tests. This would 
> be a singleton.
> 2. Implement `ServerMetadataCacheHAImpl` with a map of instances keyed on 
> config.
> 3. Extend `PhoenixRegionServerEndpoint` and use `ServerMetadataCacheHAImpl`. 
> 4. In HA tests, load this new endpoint on the region servers. 
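
A minimal sketch of steps 1 and 2 above; the interface method, signatures, and constructor shapes are assumptions for illustration, not the actual Phoenix code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.conf.Configuration;

// Step 1: extract an interface from the current singleton implementation.
interface ServerMetadataCache {
    // Signature assumed for illustration.
    long getLastDDLTimestampForTable(byte[] tenantId, byte[] schemaName, byte[] tableName);
}

class ServerMetadataCacheImpl implements ServerMetadataCache {
    ServerMetadataCacheImpl(Configuration conf) { /* ... */ }

    @Override
    public long getLastDDLTimestampForTable(byte[] tenantId, byte[] schemaName,
            byte[] tableName) {
        return 0L; // stub: the real code looks this up via a server connection
    }
}

// Step 2: HA variant keeping one cache per cluster Configuration, so the
// coprocessors of two mini clusters in one JVM no longer share state.
final class ServerMetadataCacheHAImpl {
    private static final Map<Configuration, ServerMetadataCache> INSTANCES =
            new ConcurrentHashMap<>();

    static ServerMetadataCache getInstance(Configuration conf) {
        // Configuration uses identity equality, which gives exactly the
        // per-cluster granularity the HA tests need.
        return INSTANCES.computeIfAbsent(conf, ServerMetadataCacheImpl::new);
    }
}
```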






[jira] [Commented] (PHOENIX-7258) Query Optimizer should pick Index hint even for point lookup queries

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824538#comment-17824538
 ] 

ASF GitHub Bot commented on PHOENIX-7258:
-

sanjeet006py opened a new pull request, #1851:
URL: https://github.com/apache/phoenix/pull/1851

   Ensuring the index hint is honored even if the query is a point lookup over 
the data table; if the hinted plan is null or not applicable, then return the 
original data table plan.






[jira] [Commented] (PHOENIX-7243) Add connectionType property to ConnectionInfo class.

2024-03-07 Thread Palash Chauhan (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824515#comment-17824515
 ] 

Palash Chauhan commented on PHOENIX-7243:
-

[~shahrs87] Please let me know when we have this change in our feature branch. 

> Add connectionType property to ConnectionInfo class.
> 
>
> Key: PHOENIX-7243
> URL: https://issues.apache.org/jira/browse/PHOENIX-7243
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Rushabh Shah
>Assignee: Palash Chauhan
>Priority: Major
> Fix For: 5.3.0
>
>
> In PhoenixDriver, we have a cache of ConnectionQueryServices which is keyed 
> by ConnectionInfo object. Refer 
> [here|https://github.com/apache/phoenix/blob/master/phoenix-core-client/src/main/java/org/apache/phoenix/jdbc/PhoenixDriver.java#L258-L270]
>  for more details.
> Let's say we want to create a server connection (with property 
> IS_SERVER_CONNECTION set to true) and we already have a _non-server_ 
> connection present in the cache (with the same user, principal, keytab, 
> haGroup); the cache will return the non-server connection.
> We need to add an isServerConnection property to the 
> [ConnectionInfo|https://github.com/apache/phoenix/blob/master/phoenix-core-client/src/main/java/org/apache/phoenix/jdbc/ConnectionInfo.java#L317-L334]
> class to differentiate between server and non-server connections.
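
A minimal sketch (enum and field names are assumptions, not the actual patch) 
of how a connection type can participate in the cache key by taking part in 
ConnectionInfo's equals/hashCode:

{code:java}
// Sketch only: names are assumed, not taken from the PHOENIX-7243 patch.
public abstract class ConnectionInfo {
    enum ConnectionType { CLIENT, SERVER }

    private final ConnectionType connectionType;
    // ... existing fields: user, principal, keytab, haGroup, ...

    protected ConnectionInfo(ConnectionType connectionType) {
        this.connectionType = connectionType;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ConnectionInfo)) return false;
        ConnectionInfo other = (ConnectionInfo) o;
        // Including connectionType in equality ensures a server connection
        // never resolves to a cached non-server ConnectionQueryServices.
        return connectionType == other.connectionType
                /* && the existing field comparisons */;
    }

    @Override
    public int hashCode() {
        return 31 * connectionType.hashCode() /* + existing field hashes */;
    }
}
{code}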



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PHOENIX-7251) Refactor server-side code to support multiple ServerMetadataCache for HA tests

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824514#comment-17824514
 ] 

ASF GitHub Bot commented on PHOENIX-7251:
-

palashc commented on PR #1845:
URL: https://github.com/apache/phoenix/pull/1845#issuecomment-1984251635

   > Let's see why this is breaking our test.
   
   Creating a connection on the server side fails because of a malformed URL 
exception. 
   
   ```
   Caused by: java.sql.SQLException: ERROR 102 (08001): Malformed connection 
url. Quorum not specified and hbase.client.zookeeper.quorum is not set in 
configuration : jdbc:phoenix
at 
org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:656)
at 
org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:229)
at 
org.apache.phoenix.jdbc.ConnectionInfo.getMalFormedUrlException(ConnectionInfo.java:82)
at 
org.apache.phoenix.jdbc.ZKConnectionInfo$Builder.normalize(ZKConnectionInfo.java:322)
at 
org.apache.phoenix.jdbc.ZKConnectionInfo$Builder.create(ZKConnectionInfo.java:175)
at 
org.apache.phoenix.jdbc.ConnectionInfo.create(ConnectionInfo.java:174)
at 
org.apache.phoenix.jdbc.ConnectionInfo.createNoLogin(ConnectionInfo.java:119)
at 
org.apache.phoenix.util.QueryUtil.getConnectionUrl(QueryUtil.java:454)
at 
org.apache.phoenix.util.QueryUtil.getConnectionUrl(QueryUtil.java:438)
at org.apache.phoenix.util.QueryUtil.getConnection(QueryUtil.java:429)
at 
org.apache.phoenix.util.QueryUtil.getConnectionOnServer(QueryUtil.java:410)
at 
org.apache.phoenix.cache.ServerMetadataCacheImpl.getConnection(ServerMetadataCacheImpl.java:162)
at 
org.apache.phoenix.end2end.ServerMetadataCacheTestImpl.getConnection(ServerMetadataCacheTestImpl.java:87)
at 
org.apache.phoenix.cache.ServerMetadataCacheImpl.getLastDDLTimestampForTable(ServerMetadataCacheImpl.java:134)
at 
org.apache.phoenix.coprocessor.VerifyLastDDLTimestamp.verifyLastDDLTimestamp(VerifyLastDDLTimestamp.java:57)
at 
org.apache.phoenix.coprocessor.PhoenixRegionServerEndpoint.validateLastDDLTimestamp(PhoenixRegionServerEndpoint.java:76)
   ```
   
   
   So I think we do need to read all site files? 




> Refactor server-side code to support multiple ServerMetadataCache for HA tests
> --
>
> Key: PHOENIX-7251
> URL: https://issues.apache.org/jira/browse/PHOENIX-7251
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Palash Chauhan
>Assignee: Palash Chauhan
>Priority: Major
>
> In the metadata caching re-design, `ServerMetadataCache` is required to be a 
> singleton in the implementation. This affects tests for the HA use case 
> because the coprocessors on the 2 clusters end up using the same 
> `ServerMetadataCache`. All tests which execute queries with 1 of the clusters 
> unavailable will fail. 
> We can refactor the implementation in the following way to support HA test 
> cases:
> 1. Create a `ServerMetadataCache` interface and use the current 
> implementation as `ServerMetadataCacheImpl` for all other tests. This would 
> be a singleton.
> 2. Implement `ServerMetadataCacheHAImpl` with a map of instances keyed on 
> config.
> 3. Extend `PhoenixRegionServerEndpoint` and use `ServerMetadataCacheHAImpl`. 
> 4. In HA tests, load this new endpoint on the region servers. 
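
As a rough illustration of steps 1 and 2 above (interface and class names 
follow the Jira text; the method signature and internals are assumptions):

{code:java}
import java.sql.SQLException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.conf.Configuration;

// Step 1: the interface extracted from the current singleton
// (method shape assumed for illustration).
interface ServerMetadataCache {
    long getLastDDLTimestampForTable(byte[] tenantId, byte[] schemaName,
            byte[] tableName) throws SQLException;
}

// Step 2: an HA-test implementation keyed on the cluster's Configuration,
// so coprocessors of two mini clusters no longer share one cache instance.
final class ServerMetadataCacheHAImpl {
    private static final Map<Configuration, ServerMetadataCache> INSTANCES =
            new ConcurrentHashMap<>();

    private ServerMetadataCacheHAImpl() { }

    static ServerMetadataCache getInstance(Configuration conf) {
        // One cache per Configuration instead of one JVM-wide singleton.
        return INSTANCES.computeIfAbsent(conf,
                c -> createCache(c)); // createCache: placeholder factory
    }

    private static ServerMetadataCache createCache(Configuration conf) {
        throw new UnsupportedOperationException("sketch only");
    }
}
{code}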



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] PHOENIX-7251 : Refactor server-side code to support multiple ServerMetadataCache for ITs which create multiple RSs or mini clusters [phoenix]

2024-03-07 Thread via GitHub


palashc commented on PR #1845:
URL: https://github.com/apache/phoenix/pull/1845#issuecomment-1984251635

   > Let's see why this is breaking our test.
   
   Creating a connection on the server side fails because of a malformed URL 
exception. 
   
   ```
   Caused by: java.sql.SQLException: ERROR 102 (08001): Malformed connection 
url. Quorum not specified and hbase.client.zookeeper.quorum is not set in 
configuration : jdbc:phoenix
at 
org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:656)
at 
org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:229)
at 
org.apache.phoenix.jdbc.ConnectionInfo.getMalFormedUrlException(ConnectionInfo.java:82)
at 
org.apache.phoenix.jdbc.ZKConnectionInfo$Builder.normalize(ZKConnectionInfo.java:322)
at 
org.apache.phoenix.jdbc.ZKConnectionInfo$Builder.create(ZKConnectionInfo.java:175)
at 
org.apache.phoenix.jdbc.ConnectionInfo.create(ConnectionInfo.java:174)
at 
org.apache.phoenix.jdbc.ConnectionInfo.createNoLogin(ConnectionInfo.java:119)
at 
org.apache.phoenix.util.QueryUtil.getConnectionUrl(QueryUtil.java:454)
at 
org.apache.phoenix.util.QueryUtil.getConnectionUrl(QueryUtil.java:438)
at org.apache.phoenix.util.QueryUtil.getConnection(QueryUtil.java:429)
at 
org.apache.phoenix.util.QueryUtil.getConnectionOnServer(QueryUtil.java:410)
at 
org.apache.phoenix.cache.ServerMetadataCacheImpl.getConnection(ServerMetadataCacheImpl.java:162)
at 
org.apache.phoenix.end2end.ServerMetadataCacheTestImpl.getConnection(ServerMetadataCacheTestImpl.java:87)
at 
org.apache.phoenix.cache.ServerMetadataCacheImpl.getLastDDLTimestampForTable(ServerMetadataCacheImpl.java:134)
at 
org.apache.phoenix.coprocessor.VerifyLastDDLTimestamp.verifyLastDDLTimestamp(VerifyLastDDLTimestamp.java:57)
at 
org.apache.phoenix.coprocessor.PhoenixRegionServerEndpoint.validateLastDDLTimestamp(PhoenixRegionServerEndpoint.java:76)
   ```
   
   
   So I think we do need to read all site files? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (PHOENIX-7243) Add isServerConnection property to ConnectionInfo class.

2024-03-07 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824513#comment-17824513
 ] 

Rushabh Shah commented on PHOENIX-7243:
---

Thank you [~palashc] for the PR. 

> Add isServerConnection property to ConnectionInfo class.
> 
>
> Key: PHOENIX-7243
> URL: https://issues.apache.org/jira/browse/PHOENIX-7243
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Rushabh Shah
>Assignee: Palash Chauhan
>Priority: Major
> Fix For: 5.3.0
>
>
> In PhoenixDriver, we have a cache of ConnectionQueryServices which is keyed 
> by ConnectionInfo object. Refer 
> [here|https://github.com/apache/phoenix/blob/master/phoenix-core-client/src/main/java/org/apache/phoenix/jdbc/PhoenixDriver.java#L258-L270]
>  for more details.
> Let's say we want to create a server connection (with property 
> IS_SERVER_CONNECTION set to true) and we already have a _non-server_ 
> connection present in the cache (with the same user, principal, keytab, 
> haGroup); the cache will return the non-server connection.
> We need to add an isServerConnection property to the 
> [ConnectionInfo|https://github.com/apache/phoenix/blob/master/phoenix-core-client/src/main/java/org/apache/phoenix/jdbc/ConnectionInfo.java#L317-L334]
> class to differentiate between server and non-server connections.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (PHOENIX-7243) Add isServerConnection property to ConnectionInfo class.

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824511#comment-17824511
 ] 

ASF GitHub Bot commented on PHOENIX-7243:
-

shahrs87 commented on PR #1839:
URL: https://github.com/apache/phoenix/pull/1839#issuecomment-1984248090

   Forgot to approve the PR. Belated +1.




> Add isServerConnection property to ConnectionInfo class.
> 
>
> Key: PHOENIX-7243
> URL: https://issues.apache.org/jira/browse/PHOENIX-7243
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Rushabh Shah
>Assignee: Palash Chauhan
>Priority: Major
> Fix For: 5.3.0
>
>
> In PhoenixDriver, we have a cache of ConnectionQueryServices which is keyed 
> by ConnectionInfo object. Refer 
> [here|https://github.com/apache/phoenix/blob/master/phoenix-core-client/src/main/java/org/apache/phoenix/jdbc/PhoenixDriver.java#L258-L270]
>  for more details.
> Let's say we want to create a server connection (with property 
> IS_SERVER_CONNECTION set to true) and we already have a _non-server_ 
> connection present in the cache (with the same user, principal, keytab, 
> haGroup); the cache will return the non-server connection.
> We need to add an isServerConnection property to the 
> [ConnectionInfo|https://github.com/apache/phoenix/blob/master/phoenix-core-client/src/main/java/org/apache/phoenix/jdbc/ConnectionInfo.java#L317-L334]
> class to differentiate between server and non-server connections.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] PHOENIX-7243 : Add connectionType property to ConnectionInfo [phoenix]

2024-03-07 Thread via GitHub


shahrs87 commented on PR #1839:
URL: https://github.com/apache/phoenix/pull/1839#issuecomment-1984248090

   Forgot to approve the PR. Belated +1.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (PHOENIX-7243) Add isServerConnection property to ConnectionInfo class.

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824510#comment-17824510
 ] 

ASF GitHub Bot commented on PHOENIX-7243:
-

shahrs87 merged PR #1839:
URL: https://github.com/apache/phoenix/pull/1839




> Add isServerConnection property to ConnectionInfo class.
> 
>
> Key: PHOENIX-7243
> URL: https://issues.apache.org/jira/browse/PHOENIX-7243
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Rushabh Shah
>Assignee: Palash Chauhan
>Priority: Major
>
> In PhoenixDriver, we have a cache of ConnectionQueryServices which is keyed 
> by ConnectionInfo object. Refer 
> [here|https://github.com/apache/phoenix/blob/master/phoenix-core-client/src/main/java/org/apache/phoenix/jdbc/PhoenixDriver.java#L258-L270]
>  for more details.
> Let's say we want to create a server connection (with property 
> IS_SERVER_CONNECTION set to true) and we already have a _non-server_ 
> connection present in the cache (with the same user, principal, keytab, 
> haGroup); the cache will return the non-server connection.
> We need to add an isServerConnection property to the 
> [ConnectionInfo|https://github.com/apache/phoenix/blob/master/phoenix-core-client/src/main/java/org/apache/phoenix/jdbc/ConnectionInfo.java#L317-L334]
> class to differentiate between server and non-server connections.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] PHOENIX-7243 : Add connectionType property to ConnectionInfo [phoenix]

2024-03-07 Thread via GitHub


shahrs87 merged PR #1839:
URL: https://github.com/apache/phoenix/pull/1839


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (PHOENIX-7251) Refactor server-side code to support multiple ServerMetadataCache for HA tests

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824507#comment-17824507
 ] 

ASF GitHub Bot commented on PHOENIX-7251:
-

shahrs87 commented on PR #1845:
URL: https://github.com/apache/phoenix/pull/1845#issuecomment-1984244698

   > Maybe we need HBaseConfiguration.create(conf); - Yes, that works!
   
   Let's not do this. This reads all the site.xml files (hbase-site.xml, 
hbase-default.xml, hdfs-site.xml) and then overrides those properties with the 
passed conf. 
   
   > This change breaks our tests; we need to look into it.
   
   Let's see why this is breaking our test.
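
   For illustration, a sketch of the difference between the two construction 
paths under discussion (standard Hadoop/HBase Configuration behavior; the key 
shown matches the error in the test):

   ```java
   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.hbase.HBaseConfiguration;

   public class ConfCopyExample {
       public static void main(String[] args) {
           Configuration passed = new Configuration(false);
           passed.set("my.custom.key", "value");

           // Plain copy: clones only what is already set in 'passed'. If the
           // caller's conf never loaded hbase-site.xml, properties such as
           // hbase.client.zookeeper.quorum stay unset, matching the
           // malformed-URL failure in the test.
           Configuration plainCopy = new Configuration(passed);

           // HBaseConfiguration.create(conf): loads hbase-default.xml and
           // hbase-site.xml from the classpath first, then overlays 'passed'.
           Configuration withSiteFiles = HBaseConfiguration.create(passed);

           System.out.println(plainCopy.get("hbase.client.zookeeper.quorum"));
           System.out.println(withSiteFiles.get("hbase.client.zookeeper.quorum"));
       }
   }
   ```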




> Refactor server-side code to support multiple ServerMetadataCache for HA tests
> --
>
> Key: PHOENIX-7251
> URL: https://issues.apache.org/jira/browse/PHOENIX-7251
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Palash Chauhan
>Assignee: Palash Chauhan
>Priority: Major
>
> In the metadata caching re-design, `ServerMetadataCache` is required to be a 
> singleton in the implementation. This affects tests for the HA use case 
> because the coprocessors on the 2 clusters end up using the same 
> `ServerMetadataCache`. All tests which execute queries with 1 of the clusters 
> unavailable will fail. 
> We can refactor the implementation in the following way to support HA test 
> cases:
> 1. Create a `ServerMetadataCache` interface and use the current 
> implementation as `ServerMetadataCacheImpl` for all other tests. This would 
> be a singleton.
> 2. Implement `ServerMetadataCacheHAImpl` with a map of instances keyed on 
> config.
> 3. Extend `PhoenixRegionServerEndpoint` and use `ServerMetadataCacheHAImpl`. 
> 4. In HA tests, load this new endpoint on the region servers. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] PHOENIX-7251 : Refactor server-side code to support multiple ServerMetadataCache for ITs which create multiple RSs or mini clusters [phoenix]

2024-03-07 Thread via GitHub


shahrs87 commented on PR #1845:
URL: https://github.com/apache/phoenix/pull/1845#issuecomment-1984244698

   > Maybe we need HBaseConfiguration.create(conf); - Yes, that works!
   
   Let's not do this. This reads all the site.xml files (hbase-site.xml, 
hbase-default.xml, hdfs-site.xml) and then overrides those properties with the 
passed conf. 
   
   > This change breaks our tests; we need to look into it.
   
   Let's see why this is breaking our test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (PHOENIX-7251) Refactor server-side code to support multiple ServerMetadataCache for HA tests

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824505#comment-17824505
 ] 

ASF GitHub Bot commented on PHOENIX-7251:
-

palashc commented on PR #1845:
URL: https://github.com/apache/phoenix/pull/1845#issuecomment-1984217352

   `this.conf = new Configuration(conf);` - This change breaks our tests; we 
need to look into it. 




> Refactor server-side code to support multiple ServerMetadataCache for HA tests
> --
>
> Key: PHOENIX-7251
> URL: https://issues.apache.org/jira/browse/PHOENIX-7251
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Palash Chauhan
>Assignee: Palash Chauhan
>Priority: Major
>
> In the metadata caching re-design, `ServerMetadataCache` is required to be a 
> singleton in the implementation. This affects tests for the HA use case 
> because the coprocessors on the 2 clusters end up using the same 
> `ServerMetadataCache`. All tests which execute queries with 1 of the clusters 
> unavailable will fail. 
> We can refactor the implementation in the following way to support HA test 
> cases:
> 1. Create a `ServerMetadataCache` interface and use the current 
> implementation as `ServerMetadataCacheImpl` for all other tests. This would 
> be a singleton.
> 2. Implement `ServerMetadataCacheHAImpl` with a map of instances keyed on 
> config.
> 3. Extend `PhoenixRegionServerEndpoint` and use `ServerMetadataCacheHAImpl`. 
> 4. In HA tests, load this new endpoint on the region servers. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] PHOENIX-7251 : Refactor server-side code to support multiple ServerMetadataCache for ITs which create multiple RSs or mini clusters [phoenix]

2024-03-07 Thread via GitHub


palashc commented on PR #1845:
URL: https://github.com/apache/phoenix/pull/1845#issuecomment-1984217352

   `this.conf = new Configuration(conf);` - This change breaks our tests; we 
need to look into it. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (PHOENIX-7251) Refactor server-side code to support multiple ServerMetadataCache for HA tests

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824501#comment-17824501
 ] 

ASF GitHub Bot commented on PHOENIX-7251:
-

shahrs87 commented on PR #1845:
URL: https://github.com/apache/phoenix/pull/1845#issuecomment-1984174529

   > The INSTANCE variable does not match the static variable name pattern, but 
it was already present in the code; we just refactored it in this PR.
   
   Can we fix this just by following the pattern that it expects? 
   
   > The new class does have a license header, but it's not a line-by-line 
match to what is expected; I took it from other classes in the codebase.
   
   Do we know what is expected? If yes, then let's make checkstyle happy.
   
   
   > This is about storing the conf object inside the server metadata cache, 
also something that was only refactored into a new class in this PR.
   
   Agree that this is just refactoring. But I think the correct fix is creating 
a new conf object, something like this:
   `this.conf = new Configuration(conf);`
   
   Let's do these changes to make checkstyle and spotbugs happy. @palashc 
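
   A minimal sketch (class shape assumed) of the suggested defensive copy, 
which is the usual fix for spotbugs' exposed-internal-representation warnings:

   ```java
   import org.apache.hadoop.conf.Configuration;

   // Sketch only: constructor shape assumed, not the actual PR code.
   class ServerMetadataCacheImpl {
       private final Configuration conf;

       ServerMetadataCacheImpl(Configuration conf) {
           // Copy rather than alias the caller's conf, so later external
           // mutations cannot silently change the cache's configuration.
           this.conf = new Configuration(conf);
       }
   }
   ```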




> Refactor server-side code to support multiple ServerMetadataCache for HA tests
> --
>
> Key: PHOENIX-7251
> URL: https://issues.apache.org/jira/browse/PHOENIX-7251
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Palash Chauhan
>Assignee: Palash Chauhan
>Priority: Major
>
> In the metadata caching re-design, `ServerMetadataCache` is required to be a 
> singleton in the implementation. This affects tests for the HA use case 
> because the coprocessors on the 2 clusters end up using the same 
> `ServerMetadataCache`. All tests which execute queries with 1 of the clusters 
> unavailable will fail. 
> We can refactor the implementation in the following way to support HA test 
> cases:
> 1. Create a `ServerMetadataCache` interface and use the current 
> implementation as `ServerMetadataCacheImpl` for all other tests. This would 
> be a singleton.
> 2. Implement `ServerMetadataCacheHAImpl` with a map of instances keyed on 
> config.
> 3. Extend `PhoenixRegionServerEndpoint` and use `ServerMetadataCacheHAImpl`. 
> 4. In HA tests, load this new endpoint on the region servers. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] PHOENIX-7251 : Refactor server-side code to support multiple ServerMetadataCache for ITs which create multiple RSs or mini clusters [phoenix]

2024-03-07 Thread via GitHub


shahrs87 commented on PR #1845:
URL: https://github.com/apache/phoenix/pull/1845#issuecomment-1984174529

   > The INSTANCE variable does not match the static variable name pattern, but 
it was already present in the code; we just refactored it in this PR.
   
   Can we fix this just by following the pattern that it expects? 
   
   > The new class does have a license header, but it's not a line-by-line 
match to what is expected; I took it from other classes in the codebase.
   
   Do we know what is expected? If yes, then let's make checkstyle happy.
   
   
   > This is about storing the conf object inside the server metadata cache, 
also something that was only refactored into a new class in this PR.
   
   Agree that this is just refactoring. But I think the correct fix is creating 
a new conf object, something like this:
   `this.conf = new Configuration(conf);`
   
   Let's do these changes to make checkstyle and spotbugs happy. @palashc 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (PHOENIX-7243) Add isServerConnection property to ConnectionInfo class.

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824500#comment-17824500
 ] 

ASF GitHub Bot commented on PHOENIX-7243:
-

shahrs87 commented on PR #1839:
URL: https://github.com/apache/phoenix/pull/1839#issuecomment-1984168480

   Change the title of the PR and commit to "PHOENIX-7243 : Add connectionType 
property to ConnectionInfo"
   There is 1 checkstyle warning that needs to be addressed.




> Add isServerConnection property to ConnectionInfo class.
> 
>
> Key: PHOENIX-7243
> URL: https://issues.apache.org/jira/browse/PHOENIX-7243
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Rushabh Shah
>Assignee: Palash Chauhan
>Priority: Major
>
> In PhoenixDriver, we have a cache of ConnectionQueryServices which is keyed 
> by ConnectionInfo object. Refer 
> [here|https://github.com/apache/phoenix/blob/master/phoenix-core-client/src/main/java/org/apache/phoenix/jdbc/PhoenixDriver.java#L258-L270]
>  for more details.
> Let's say we want to create a server connection (with property 
> IS_SERVER_CONNECTION set to true) and we already have a _non-server_ 
> connection present in the cache (with the same user, principal, keytab, 
> haGroup); the cache will return the non-server connection.
> We need to add an isServerConnection property to the 
> [ConnectionInfo|https://github.com/apache/phoenix/blob/master/phoenix-core-client/src/main/java/org/apache/phoenix/jdbc/ConnectionInfo.java#L317-L334]
> class to differentiate between server and non-server connections.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] PHOENIX-7243 : Add isServerConnection property to ConnectionInfo [phoenix]

2024-03-07 Thread via GitHub


shahrs87 commented on PR #1839:
URL: https://github.com/apache/phoenix/pull/1839#issuecomment-1984168480

   Change the title of the PR and commit to "PHOENIX-7243 : Add connectionType 
property to ConnectionInfo"
   There is 1 checkstyle warning that needs to be addressed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (PHOENIX-7263) Row value constructor split keys not allowed on indexes

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824497#comment-17824497
 ] 

ASF GitHub Bot commented on PHOENIX-7263:
-

chrajeshbabu opened a new pull request, #1850:
URL: https://github.com/apache/phoenix/pull/1850

   (no comment)




> Row value constructor split keys not allowed on indexes
> ---
>
> Key: PHOENIX-7263
> URL: https://issues.apache.org/jira/browse/PHOENIX-7263
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Rajeshbabu Chintaguntla
>Assignee: Rajeshbabu Chintaguntla
>Priority: Major
>
> While creating indexes, if we pass row value constructor split keys, we get 
> the following error. The same passes with CREATE TABLE, because table 
> creation properly builds the split keys using the expression compiler, which 
> is not the case with index creation.
> {noformat}
> java.lang.ClassCastException: 
> org.apache.phoenix.expression.RowValueConstructorExpression cannot be cast to 
> org.apache.phoenix.expression.LiteralExpression
>   at 
> org.apache.phoenix.compile.CreateIndexCompiler.compile(CreateIndexCompiler.java:77)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableCreateIndexStatement.compilePlan(PhoenixStatement.java:1205)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableCreateIndexStatement.compilePlan(PhoenixStatement.java:1191)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:435)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:425)
>   at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:424)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:412)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:2009)
>   at sqlline.Commands.executeSingleQuery(Commands.java:1054)
>   at sqlline.Commands.execute(Commands.java:1003)
>   at sqlline.Commands.sql(Commands.java:967)
>   at sqlline.SqlLine.dispatch(SqlLine.java:734)
>   at sqlline.SqlLine.begin(SqlLine.java:541)
>   at sqlline.SqlLine.start(SqlLine.java:267)
>   at sqlline.SqlLine.main(SqlLine.java:206)
> {noformat}
> In create table:
> {code:java}
> final byte[][] splits = new byte[splitNodes.size()][];
> ImmutableBytesWritable ptr = context.getTempPtr();
> ExpressionCompiler expressionCompiler = new ExpressionCompiler(context);
> for (int i = 0; i < splits.length; i++) {
>     ParseNode node = splitNodes.get(i);
>     if (node instanceof BindParseNode) {
>         context.getBindManager().addParamMetaData((BindParseNode) node,
>                 VARBINARY_DATUM);
>     }
>     if (node.isStateless()) {
>         Expression expression = node.accept(expressionCompiler);
>         if (expression.evaluate(null, ptr)) {
>             splits[i] = ByteUtil.copyKeyBytesIfNecessary(ptr);
>             continue;
>         }
>     }
>     throw new SQLExceptionInfo.Builder(SQLExceptionCode.SPLIT_POINT_NOT_CONSTANT)
>             .setMessage("Node: " + node).build().buildException();
> }
> {code}
> Whereas index creation expects only literals:
> {code:java}
> final byte[][] splits = new byte[splitNodes.size()][];
> for (int i = 0; i < splits.length; i++) {
>     ParseNode node = splitNodes.get(i);
>     if (!node.isStateless()) {
>         throw new SQLExceptionInfo.Builder(SQLExceptionCode.SPLIT_POINT_NOT_CONSTANT)
>                 .setMessage("Node: " + node).build().buildException();
>     }
>     LiteralExpression expression =
>             (LiteralExpression) node.accept(expressionCompiler);
>     splits[i] = expression.getBytes();
> }
> {code}
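
A plausible shape of the fix (a sketch that simply mirrors the CREATE TABLE 
snippet above for the index path; not necessarily the actual change in PR 
#1850): evaluate any stateless expression, including a row value constructor, 
instead of casting to LiteralExpression.

{code:java}
// Sketch: reuse the CREATE TABLE evaluation logic in CreateIndexCompiler.
final byte[][] splits = new byte[splitNodes.size()][];
ImmutableBytesWritable ptr = context.getTempPtr();
ExpressionCompiler expressionCompiler = new ExpressionCompiler(context);
for (int i = 0; i < splits.length; i++) {
    ParseNode node = splitNodes.get(i);
    if (node.isStateless()) {
        // Evaluation writes the composite key bytes into ptr, which works
        // for LiteralExpression and RowValueConstructorExpression alike.
        Expression expression = node.accept(expressionCompiler);
        if (expression.evaluate(null, ptr)) {
            splits[i] = ByteUtil.copyKeyBytesIfNecessary(ptr);
            continue;
        }
    }
    throw new SQLExceptionInfo.Builder(SQLExceptionCode.SPLIT_POINT_NOT_CONSTANT)
            .setMessage("Node: " + node).build().buildException();
}
{code}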



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] PHOENIX-7263 Row value constructor split keys not allowed on indexes [phoenix]

2024-03-07 Thread via GitHub


chrajeshbabu opened a new pull request, #1850:
URL: https://github.com/apache/phoenix/pull/1850

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (PHOENIX-7251) Refactor server-side code to support multiple ServerMetadataCache for HA tests

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824493#comment-17824493
 ] 

ASF GitHub Bot commented on PHOENIX-7251:
-

palashc commented on PR #1845:
URL: https://github.com/apache/phoenix/pull/1845#issuecomment-1984109150

   Checkstyle
   - The INSTANCE variable does not match the static variable name pattern, but 
it was already present in the code; we just refactored it in this PR.
   - The new class does have a license header, but it's not a line-by-line 
match to what is expected; I took it from other classes in the codebase.

   Spotbugs
   - This is about storing the conf object inside the server metadata cache, 
also something that was only refactored into a new class in this PR. 




> Refactor server-side code to support multiple ServerMetadataCache for HA tests
> --
>
> Key: PHOENIX-7251
> URL: https://issues.apache.org/jira/browse/PHOENIX-7251
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Palash Chauhan
>Assignee: Palash Chauhan
>Priority: Major
>
> In the metadata caching re-design, `ServerMetadataCache` is required to be a 
> singleton in the implementation. This affects tests for the HA use case 
> because the coprocessors on the 2 clusters end up using the same 
> `ServerMetadataCache`. All tests which execute queries with 1 of the clusters 
> unavailable will fail. 
> We can refactor the implementation in the following way to support HA test 
> cases:
> 1. Create a `ServerMetadataCache` interface and use the current 
> implementation as `ServerMetadataCacheImpl` for all other tests. This would 
> be a singleton.
> 2. Implement `ServerMetadataCacheHAImpl` with a map of instances keyed on 
> config.
> 3. Extend `PhoenixRegionServerEndpoint` and use `ServerMetadataCacheHAImpl`. 
> 4. In HA tests, load this new endpoint on the region servers. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] PHOENIX-7251 : Refactor server-side code to support multiple ServerMetadataCache for ITs which create multiple RSs or mini clusters [phoenix]

2024-03-07 Thread via GitHub


palashc commented on PR #1845:
URL: https://github.com/apache/phoenix/pull/1845#issuecomment-1984109150

   Checkstyle
   - The INSTANCE variable does not match the static variable name pattern, but 
it was already present in the code; we just refactored it in this PR.
   - The new class does have a license header, but it's not a line-by-line 
match to what is expected; I took it from other classes in the codebase.

   Spotbugs
   - This is about storing the conf object inside the server metadata cache, 
also something that was only refactored into a new class in this PR. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (PHOENIX-7251) Refactor server-side code to support multiple ServerMetadataCache for HA tests

2024-03-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824482#comment-17824482
 ] 

ASF GitHub Bot commented on PHOENIX-7251:
-

shahrs87 commented on PR #1845:
URL: https://github.com/apache/phoenix/pull/1845#issuecomment-1984033676

   @palashc  There are still 2 checkstyle warnings and 1 spotbugs warning. Mind 
taking a look? 
   The spotbugs warning is relevant in this case.




> Refactor server-side code to support multiple ServerMetadataCache for HA tests
> --
>
> Key: PHOENIX-7251
> URL: https://issues.apache.org/jira/browse/PHOENIX-7251
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Palash Chauhan
>Assignee: Palash Chauhan
>Priority: Major
>
> In the metadata caching re-design, `ServerMetadataCache` is required to be a 
> singleton in the implementation. This affects tests for the HA use case 
> because the coprocessors on the 2 clusters end up using the same 
> `ServerMetadataCache`. All tests which execute queries with 1 of the clusters 
> unavailable will fail. 
> We can refactor the implementation in the following way to support HA test 
> cases:
> 1. Create a `ServerMetadataCache` interface and use the current 
> implementation as `ServerMetadataCacheImpl` for all other tests. This would 
> be a singleton.
> 2. Implement `ServerMetadataCacheHAImpl` with a map of instances keyed on 
> config.
> 3. Extend `PhoenixRegionServerEndpoint` and use `ServerMetadataCacheHAImpl`. 
> 4. In HA tests, load this new endpoint on the region servers. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] PHOENIX-7251 : Refactor server-side code to support multiple ServerMetadataCache for ITs which create multiple RSs or mini clusters [phoenix]

2024-03-07 Thread via GitHub


shahrs87 commented on PR #1845:
URL: https://github.com/apache/phoenix/pull/1845#issuecomment-1984033676

   @palashc  There are still 2 checkstyle warnings and 1 spotbugs warning. Mind 
taking a look? 
   The spotbugs warning is relevant in this case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (PHOENIX-7123) Support for multi-column split keys

2024-03-07 Thread Rajeshbabu Chintaguntla (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-7123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824377#comment-17824377
 ] 

Rajeshbabu Chintaguntla commented on PHOENIX-7123:
--

I think this has been supported using row value constructors. 
However, dynamic splitting of a table post-creation using an ALTER command is 
still not supported; that can be handled as part of this issue.
FYI [~nihaljain.cs].

> Support for multi-column split keys
> ---
>
> Key: PHOENIX-7123
> URL: https://issues.apache.org/jira/browse/PHOENIX-7123
> Project: Phoenix
>  Issue Type: New Feature
>  Components: phoenix
>Reporter: Rajeshbabu Chintaguntla
>Assignee: Nihal Jain
>Priority: Major
>
> Currently the only way to create split keys is by passing an array of strings 
> which form the leading parts of the row key. If a table has a multi-column 
> row key or a non-VARCHAR key, application developers need to do a lot of 
> research to prepare the split keys according to the column types, and need to 
> dig into the internal details of how the row key is formed with separators in 
> the case of variable-length columns, etc. We can support multi-column split 
> keys by passing an array of arrays of literals, so that the split keys are 
> formed taking into account data types and fixed-length or variable-length 
> column types.
> We can even support splitting existing regions with ALTER-style queries, so 
> that users need not rely on HBase APIs to split the regions.
> Example syntax:
> {code:sql}
> create table test(a integer not null, b varchar not null, c float, d bigint, 
> constraint pk primary key(a,b)) split on ([1, 'bob'],[5, 'fan'],[7,'nob'])
> {code}
> Similarly for dynamic splitting existing regions we can define alter command 
> also as below
> {code:sql}
> alter table test split on ([3, 'cob'],[4, 'end'])
> {code}
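
For contrast, a sketch of what hand-building the [1, 'bob'] split point looks 
like today without this feature (helper classes are from the Phoenix codebase; 
the exact row-key layout depends on the column types):

{code:java}
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.phoenix.schema.types.PInteger;
import org.apache.phoenix.schema.types.PVarchar;
import org.apache.phoenix.util.ByteUtil;

public class ManualSplitKey {
    public static void main(String[] args) {
        // For primary key (a INTEGER NOT NULL, b VARCHAR NOT NULL):
        // a fixed-width 4-byte integer (sign bit flipped for sort order)
        // followed by raw VARCHAR bytes; the trailing variable-length
        // column needs no separator.
        byte[] splitKey = ByteUtil.concat(
                PInteger.INSTANCE.toBytes(1),
                PVarchar.INSTANCE.toBytes("bob"));
        System.out.println(Bytes.toStringBinary(splitKey));
    }
}
{code}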



--
This message was sent by Atlassian Jira
(v8.20.10#820010)