[jira] [Assigned] (PHOENIX-5344) MapReduce Jobs Over Salted Snapshots Give Wrong Results
[ https://issues.apache.org/jira/browse/PHOENIX-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akshita Malhotra reassigned PHOENIX-5344:
-----------------------------------------

    Assignee: Akshita Malhotra

> MapReduce Jobs Over Salted Snapshots Give Wrong Results
> -------------------------------------------------------
>
>                 Key: PHOENIX-5344
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5344
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Geoffrey Jacoby
>            Assignee: Akshita Malhotra
>            Priority: Major
>
> I'm modifying an existing MapReduce job to use Phoenix's MapReduce/HBase snapshot integration. While testing, I noticed that tests that had previously passed for this job against salted Phoenix tables began to fail when run against a snapshot of those tables. They pass when running identical logic against the live table, and unsalted tables give the same, correct result whether run against a live table or a snapshot.
> The symptom on salted snapshots is that the row count is far too high (by a factor of about 7x), and the exact amount appears non-deterministic.
> My working theory is that the snapshot MapReduce integration for Phoenix sets up its scans incorrectly for salted tables.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
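For context on why salting matters here: Phoenix salted tables prepend a bucket byte derived from a hash of the row key, so the table is physically split into SALT_BUCKETS contiguous key ranges. The sketch below is a simplified, hypothetical model (the hash and bucket count are stand-ins, not Phoenix's actual implementation) of why a scan setup that mishandles bucket boundaries can visit the same logical rows more than once and inflate counts.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

// Simplified, illustrative model of Phoenix table salting -- NOT Phoenix's
// actual hash function. A salt byte derived from the row key is prepended,
// splitting the table into SALT_BUCKETS contiguous key ranges.
public class SaltingSketch {
    static final int SALT_BUCKETS = 8; // hypothetical bucket count

    static byte saltByte(byte[] rowKey) {
        int h = 17;
        for (byte b : rowKey) h = 31 * h + b; // stand-in hash
        return (byte) Math.floorMod(h, SALT_BUCKETS);
    }

    static byte[] salted(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[key.length + 1];
        out[0] = saltByte(key);
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    public static void main(String[] args) {
        // Each logical row lives in exactly one bucket, so a full-table scan
        // must cover every bucket's key range exactly once. If snapshot-based
        // input splits overlap bucket boundaries, rows are counted repeatedly.
        Set<Byte> buckets = new HashSet<>();
        for (int i = 0; i < 100; i++) {
            buckets.add(salted("row" + i)[0]);
        }
        System.out.println("buckets used: " + buckets.size());
    }
}
```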
[jira] [Updated] (PHOENIX-3817) VerifyReplication using SQL
[ https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akshita Malhotra updated PHOENIX-3817:
--------------------------------------
    Attachment: PHOENIX-3817-final2.patch

> VerifyReplication using SQL
> ---------------------------
>
>                 Key: PHOENIX-3817
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3817
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Alex Araujo
>            Assignee: Akshita Malhotra
>            Priority: Minor
>             Fix For: 4.15.0
>
>         Attachments: PHOENIX-3817-final.patch, PHOENIX-3817-final2.patch, PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch, PHOENIX-3817.v5.patch, PHOENIX-3817.v6.patch, PHOENIX-3817.v7.patch
>
> Certain use cases copy or replicate a subset of a table to a different table or cluster. For example, application topologies may map data for specific tenants to different peer clusters.
> It would be useful to have a Phoenix VerifyReplication tool that accepts a SQL query, a target table, and an optional target cluster. The tool would compare the data returned by the query on the two tables and update various result counters (similar to HBase's VerifyReplication).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
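The counter-based comparison the description alludes to can be sketched in plain Java. Counter names like GOODROWS/BADROWS follow the convention of HBase's VerifyReplication; the row model and method names below are illustrative, not the actual Phoenix implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative sketch of verify-replication comparison logic (not the
// Phoenix implementation): walk source and target rows keyed by row key
// and tally matching, differing, and missing rows.
public class CompareSketch {
    public static Map<String, Integer> compare(SortedMap<String, String> source,
                                               SortedMap<String, String> target) {
        Map<String, Integer> counters = new HashMap<>();
        for (Map.Entry<String, String> e : source.entrySet()) {
            String t = target.get(e.getKey());
            if (t == null) bump(counters, "ONLY_IN_SOURCE");      // row missing on target
            else if (t.equals(e.getValue())) bump(counters, "GOODROWS"); // values match
            else bump(counters, "BADROWS");                        // values differ
        }
        for (String k : target.keySet()) {
            if (!source.containsKey(k)) bump(counters, "ONLY_IN_TARGET"); // extra row on target
        }
        return counters;
    }

    private static void bump(Map<String, Integer> counters, String key) {
        counters.merge(key, 1, Integer::sum);
    }

    public static void main(String[] args) {
        SortedMap<String, String> source = new TreeMap<>();
        SortedMap<String, String> target = new TreeMap<>();
        source.put("row1", "a");
        source.put("row2", "b");
        target.put("row1", "a");
        target.put("row2", "x");
        System.out.println(compare(source, target));
    }
}
```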
[jira] [Assigned] (PHOENIX-4867) Document Verify Replication tool (PHOENIX-3817)
[ https://issues.apache.org/jira/browse/PHOENIX-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akshita Malhotra reassigned PHOENIX-4867:
-----------------------------------------

    Assignee: Akshita Malhotra

> Document Verify Replication tool (PHOENIX-3817)
> -----------------------------------------------
>
>                 Key: PHOENIX-4867
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4867
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Akshita Malhotra
>            Assignee: Akshita Malhotra
>            Priority: Minor
>
> Create a Phoenix user-level doc explaining the features and limitations of the VerifyReplication tool (PHOENIX-3817).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (PHOENIX-4867) Document Verify Replication tool (PHOENIX-3817)
Akshita Malhotra created PHOENIX-4867:
--------------------------------------

             Summary: Document Verify Replication tool (PHOENIX-3817)
                 Key: PHOENIX-4867
                 URL: https://issues.apache.org/jira/browse/PHOENIX-4867
             Project: Phoenix
          Issue Type: Bug
            Reporter: Akshita Malhotra

Create a Phoenix user-level doc explaining the features and limitations of the VerifyReplication tool (PHOENIX-3817).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (PHOENIX-3817) VerifyReplication using SQL
[ https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akshita Malhotra updated PHOENIX-3817:
--------------------------------------
    Attachment: PHOENIX-3817-final.patch

> VerifyReplication using SQL
> ---------------------------
>
>                 Key: PHOENIX-3817
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3817
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Alex Araujo
>            Assignee: Akshita Malhotra
>            Priority: Minor
>             Fix For: 4.15.0
>
>         Attachments: PHOENIX-3817-final.patch, PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch, PHOENIX-3817.v5.patch, PHOENIX-3817.v6.patch, PHOENIX-3817.v7.patch
>
> Certain use cases copy or replicate a subset of a table to a different table or cluster. For example, application topologies may map data for specific tenants to different peer clusters.
> It would be useful to have a Phoenix VerifyReplication tool that accepts a SQL query, a target table, and an optional target cluster. The tool would compare the data returned by the query on the two tables and update various result counters (similar to HBase's VerifyReplication).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (PHOENIX-4849) UPSERT SELECT fails with stale region boundary exception after a split
[ https://issues.apache.org/jira/browse/PHOENIX-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akshita Malhotra updated PHOENIX-4849:
--------------------------------------
    Attachment: PHOENIX-4849.patch

> UPSERT SELECT fails with stale region boundary exception after a split
> ----------------------------------------------------------------------
>
>                 Key: PHOENIX-4849
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4849
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Akshita Malhotra
>            Priority: Major
>         Attachments: PHOENIX-4849.patch
>
> UPSERT SELECT throws a StaleRegionBoundaryCacheException immediately after a split. By contrast, an upsert followed by a separate select works fine.
>
> org.apache.phoenix.schema.StaleRegionBoundaryCacheException: ERROR 1108 (XCL08): Cache of region boundaries are out of date.
> 	at org.apache.phoenix.exception.SQLExceptionCode$14.newException(SQLExceptionCode.java:365)
> 	at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
> 	at org.apache.phoenix.util.ServerUtil.parseRemoteException(ServerUtil.java:183)
> 	at org.apache.phoenix.util.ServerUtil.parseServerExceptionOrNull(ServerUtil.java:167)
> 	at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:134)
> 	at org.apache.phoenix.iterate.ScanningResultIterator.next(ScanningResultIterator.java:153)
> 	at org.apache.phoenix.iterate.TableResultIterator.next(TableResultIterator.java:228)
> 	at org.apache.phoenix.iterate.LookAheadResultIterator$1.advance(LookAheadResultIterator.java:47)
> 	at org.apache.phoenix.iterate.LookAheadResultIterator.init(LookAheadResultIterator.java:59)
> 	at org.apache.phoenix.iterate.LookAheadResultIterator.peek(LookAheadResultIterator.java:73)
> 	at org.apache.phoenix.iterate.SerialIterators$SerialIterator.nextIterator(SerialIterators.java:187)
> 	at org.apache.phoenix.iterate.SerialIterators$SerialIterator.currentIterator(SerialIterators.java:160)
> 	at org.apache.phoenix.iterate.SerialIterators$SerialIterator.peek(SerialIterators.java:218)
> 	at org.apache.phoenix.iterate.ConcatResultIterator.currentIterator(ConcatResultIterator.java:100)
> 	at org.apache.phoenix.iterate.ConcatResultIterator.next(ConcatResultIterator.java:117)
> 	at org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
> 	at org.apache.phoenix.iterate.LimitingResultIterator.next(LimitingResultIterator.java:47)
> 	at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:805)
> 	at org.apache.phoenix.compile.UpsertCompiler.upsertSelect(UpsertCompiler.java:219)
> 	at org.apache.phoenix.compile.UpsertCompiler$ClientUpsertSelectMutationPlan.execute(UpsertCompiler.java:1292)
> 	at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:408)
> 	at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:391)
> 	at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
> 	at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:390)
> 	at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:378)
> 	at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:173)
> 	at org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:183)
> 	at org.apache.phoenix.end2end.UpsertSelectAfterSplitTest.upsertSelectData1(UpsertSelectAfterSplitTest.java:109)
> 	at org.apache.phoenix.end2end.UpsertSelectAfterSplitTest.testUpsertSelect(UpsertSelectAfterSplitTest.java:59)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
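One common client-side mitigation for transient stale-cache errors (an illustrative sketch only; this is not the fix in the attached PHOENIX-4849.patch) is a bounded retry, on the assumption that the client refreshes its cached region boundaries after such a failure. The "XCL08" code is taken from the error message above:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side mitigation sketch (not the PHOENIX-4849 fix):
// retry a statement a bounded number of times when the XCL08
// stale-region-boundary error surfaces, assuming the client refreshes its
// cached region boundaries after the failure.
public class StaleCacheRetry {
    static <T> T withRetries(Callable<T> op, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                String msg = String.valueOf(e.getMessage());
                if (!msg.contains("XCL08")) {
                    throw e; // only retry the stale-boundary error
                }
                last = e; // boundaries refreshed on failure; try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated statement that fails twice with XCL08, then succeeds.
        AtomicInteger calls = new AtomicInteger();
        String result = withRetries(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new RuntimeException(
                        "ERROR 1108 (XCL08): Cache of region boundaries are out of date.");
            }
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```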
[jira] [Created] (PHOENIX-4849) UPSERT SELECT fails with stale region boundary exception after a split
Akshita Malhotra created PHOENIX-4849:
--------------------------------------

             Summary: UPSERT SELECT fails with stale region boundary exception after a split
                 Key: PHOENIX-4849
                 URL: https://issues.apache.org/jira/browse/PHOENIX-4849
             Project: Phoenix
          Issue Type: Bug
            Reporter: Akshita Malhotra

UPSERT SELECT throws a StaleRegionBoundaryCacheException (ERROR 1108 (XCL08): Cache of region boundaries are out of date) immediately after a split. By contrast, an upsert followed by a separate select works fine. The full stack trace is quoted in the update notification above.
[GitHub] phoenix pull request #309: [Do Not Merge] PHOENIX-3817 Verify Replication us...
Github user akshita-malhotra commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/309#discussion_r204187343

    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/mapreduce/VerifyReplicationTool.java ---
    @@ -0,0 +1,477 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements. See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership. The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License. You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.phoenix.mapreduce;
    +
    +import java.io.IOException;
    +import java.sql.SQLException;
    +import java.util.Collections;
    +import java.util.Map;
    +
    +import org.apache.commons.cli.CommandLine;
    +import org.apache.commons.cli.CommandLineParser;
    +import org.apache.commons.cli.HelpFormatter;
    +import org.apache.commons.cli.Option;
    +import org.apache.commons.cli.Options;
    +import org.apache.commons.cli.ParseException;
    +import org.apache.commons.cli.PosixParser;
    +import org.apache.hadoop.conf.Configuration;
    +import org.apache.hadoop.hbase.HBaseConfiguration;
    +import org.apache.hadoop.hbase.HConstants;
    +import org.apache.hadoop.hbase.client.Scan;
    +import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    +import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    +import org.apache.hadoop.hbase.util.Bytes;
    +import org.apache.hadoop.io.NullWritable;
    +import org.apache.hadoop.mapreduce.Job;
    +import org.apache.hadoop.mapreduce.Mapper;
    +import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
    +import org.apache.hadoop.util.Tool;
    +import org.apache.hadoop.util.ToolRunner;
    +import org.apache.phoenix.compile.QueryPlan;
    +import org.apache.phoenix.coprocessor.BaseScannerRegionObserver;
    +import org.apache.phoenix.iterate.ResultIterator;
    +import org.apache.phoenix.jdbc.PhoenixResultSet;
    +import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil;
    +import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;
    +import org.apache.phoenix.util.EnvironmentEdgeManager;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import com.google.common.annotations.VisibleForTesting;
    +import com.google.common.base.Preconditions;
    +import com.google.common.base.Strings;
    +
    +/**
    + * Map only job that compares data across a source and target table. The target table can be on the
    + * same cluster or on a remote cluster. SQL conditions may be specified to compare only a subset of
    + * both tables.
    + */
    +public class VerifyReplicationTool implements Tool {
    +    private static final Logger LOG = LoggerFactory.getLogger(VerifyReplicationTool.class);
    +
    +    static final Option ZK_QUORUM_OPT =
    +            new Option("z", "zookeeper", true, "ZooKeeper connection details (optional)");
    +    static final Option TABLE_NAME_OPT =
    +            new Option("t", "table", true, "Phoenix table name (required)");
    +    static final Option TARGET_TABLE_NAME_OPT =
    +            new Option("tt", "target-table", true, "Target Phoenix table name (optional)");
    +    static final Option TARGET_ZK_QUORUM_OPT =
    +            new Option("tz", "target-zookeeper", true,
    +                    "Target ZooKeeper connection details (optional)");
    +    static final Option CONDITIONS_OPT =
    +            new Option("c", "conditions", true,
    +                    "Conditions for select query WHERE clause (optional)");
    +    static final Option TIMESTAMP =
    +            new Option("ts", "timestamp", true,
    +                    "Timestamp in millis used to compare the two tables. Defaults to current time minus 60 seconds");
    +
    +    static final Option HELP_OPT = new Option("h", "help", false, "Show this help and quit");
    +
    +    private Configuration conf;
    +
    +    private String zkQuorum;
    +    private String tableName;
[GitHub] phoenix pull request #309: [Do Not Merge] PHOENIX-3817 Verify Replication us...
Github user akshita-malhotra commented on a diff in the pull request: https://github.com/apache/phoenix/pull/309#discussion_r204181390 --- Diff: phoenix-core/src/main/java/org/apache/phoenix/mapreduce/VerifyReplicationTool.java --- @@ -0,0 +1,477 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you maynot use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicablelaw or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.phoenix.mapreduce; + +import java.io.IOException; +import java.sql.SQLException; +import java.util.Collections; +import java.util.Map; + +import org.apache.commons.cli.CommandLine; +import org.apache.commons.cli.CommandLineParser; +import org.apache.commons.cli.HelpFormatter; +import org.apache.commons.cli.Option; +import org.apache.commons.cli.Options; +import org.apache.commons.cli.ParseException; +import org.apache.commons.cli.PosixParser; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hbase.HBaseConfiguration; +import org.apache.hadoop.hbase.HConstants; +import org.apache.hadoop.hbase.client.Scan; +import org.apache.hadoop.hbase.io.ImmutableBytesWritable; +import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil; +import org.apache.hadoop.hbase.util.Bytes; +import org.apache.hadoop.io.NullWritable; +import org.apache.hadoop.mapreduce.Job; +import org.apache.hadoop.mapreduce.Mapper; +import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat; +import org.apache.hadoop.util.Tool; +import org.apache.hadoop.util.ToolRunner; +import org.apache.phoenix.compile.QueryPlan; +import org.apache.phoenix.coprocessor.BaseScannerRegionObserver; +import org.apache.phoenix.iterate.ResultIterator; +import org.apache.phoenix.jdbc.PhoenixResultSet; +import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil; +import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil; +import org.apache.phoenix.util.EnvironmentEdgeManager; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Preconditions; +import com.google.common.base.Strings; + +/** + * Map only job that compares data across a source and target table. The target table can be on the + * same cluster or on a remote cluster. SQL conditions may be specified to compare only a subset of + * both tables. 
+ */ +public class VerifyReplicationTool implements Tool { +private static final Logger LOG = LoggerFactory.getLogger(VerifyReplicationTool.class); + +static final Option +ZK_QUORUM_OPT = +new Option("z", "zookeeper", true, "ZooKeeper connection details (optional)"); +static final Option +TABLE_NAME_OPT = +new Option("t", "table", true, "Phoenix table name (required)"); +static final Option +TARGET_TABLE_NAME_OPT = +new Option("tt", "target-table", true, "Target Phoenix table name (optional)"); +static final Option +TARGET_ZK_QUORUM_OPT = +new Option("tz", "target-zookeeper", true, +"Target ZooKeeper connection details (optional)"); +static final Option +CONDITIONS_OPT = +new Option("c", "conditions", true, +"Conditions for select query WHERE clause (optional)"); +static final Option TIMESTAMP = +new Option("ts", "timestamp", true, +"Timestamp in millis used to compare the two tables. Defaults to current time minus 60 seconds"); + +static final Option HELP_OPT = new Option("h", "help", false, "Show this help and quit"); + +private Configuration conf; + +private String zkQuorum; +private String tableName;
[GitHub] phoenix pull request #309: [Do Not Merge] PHOENIX-3817 Verify Replication us...
Github user akshita-malhotra commented on a diff in the pull request: https://github.com/apache/phoenix/pull/309#discussion_r204181281 --- Diff: phoenix-core/src/main/java/org/apache/phoenix/mapreduce/VerifyReplicationTool.java --- @@ -0,0 +1,477 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you maynot use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicablelaw or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.phoenix.mapreduce; + +import java.io.IOException; +import java.sql.SQLException; +import java.util.Collections; +import java.util.Map; + +import org.apache.commons.cli.CommandLine; +import org.apache.commons.cli.CommandLineParser; +import org.apache.commons.cli.HelpFormatter; +import org.apache.commons.cli.Option; +import org.apache.commons.cli.Options; +import org.apache.commons.cli.ParseException; +import org.apache.commons.cli.PosixParser; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hbase.HBaseConfiguration; +import org.apache.hadoop.hbase.HConstants; +import org.apache.hadoop.hbase.client.Scan; +import org.apache.hadoop.hbase.io.ImmutableBytesWritable; +import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil; +import org.apache.hadoop.hbase.util.Bytes; +import org.apache.hadoop.io.NullWritable; +import org.apache.hadoop.mapreduce.Job; +import org.apache.hadoop.mapreduce.Mapper; +import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat; +import org.apache.hadoop.util.Tool; +import org.apache.hadoop.util.ToolRunner; +import org.apache.phoenix.compile.QueryPlan; +import org.apache.phoenix.coprocessor.BaseScannerRegionObserver; +import org.apache.phoenix.iterate.ResultIterator; +import org.apache.phoenix.jdbc.PhoenixResultSet; +import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil; +import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil; +import org.apache.phoenix.util.EnvironmentEdgeManager; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Preconditions; +import com.google.common.base.Strings; + +/** + * Map only job that compares data across a source and target table. The target table can be on the + * same cluster or on a remote cluster. SQL conditions may be specified to compare only a subset of + * both tables. 
+ */ +public class VerifyReplicationTool implements Tool { +private static final Logger LOG = LoggerFactory.getLogger(VerifyReplicationTool.class); + +static final Option +ZK_QUORUM_OPT = +new Option("z", "zookeeper", true, "ZooKeeper connection details (optional)"); +static final Option +TABLE_NAME_OPT = +new Option("t", "table", true, "Phoenix table name (required)"); +static final Option +TARGET_TABLE_NAME_OPT = +new Option("tt", "target-table", true, "Target Phoenix table name (optional)"); +static final Option +TARGET_ZK_QUORUM_OPT = +new Option("tz", "target-zookeeper", true, +"Target ZooKeeper connection details (optional)"); +static final Option +CONDITIONS_OPT = +new Option("c", "conditions", true, +"Conditions for select query WHERE clause (optional)"); +static final Option TIMESTAMP = +new Option("ts", "timestamp", true, +"Timestamp in millis used to compare the two tables. Defaults to current time minus 60 seconds"); + +static final Option HELP_OPT = new Option("h", "help", false, "Show this help and quit"); + +private Configuration conf; + +private String zkQuorum; +private String tableName;
[GitHub] phoenix pull request #309: [Do Not Merge] PHOENIX-3817 Verify Replication us...
Github user akshita-malhotra commented on a diff in the pull request: https://github.com/apache/phoenix/pull/309#discussion_r204180883 --- Diff: phoenix-core/src/main/java/org/apache/phoenix/mapreduce/VerifyReplicationTool.java --- @@ -0,0 +1,477 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you maynot use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicablelaw or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.phoenix.mapreduce; + +import java.io.IOException; +import java.sql.SQLException; +import java.util.Collections; +import java.util.Map; + +import org.apache.commons.cli.CommandLine; +import org.apache.commons.cli.CommandLineParser; +import org.apache.commons.cli.HelpFormatter; +import org.apache.commons.cli.Option; +import org.apache.commons.cli.Options; +import org.apache.commons.cli.ParseException; +import org.apache.commons.cli.PosixParser; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.hbase.HBaseConfiguration; +import org.apache.hadoop.hbase.HConstants; +import org.apache.hadoop.hbase.client.Scan; +import org.apache.hadoop.hbase.io.ImmutableBytesWritable; +import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil; +import org.apache.hadoop.hbase.util.Bytes; +import org.apache.hadoop.io.NullWritable; +import org.apache.hadoop.mapreduce.Job; +import org.apache.hadoop.mapreduce.Mapper; +import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat; +import org.apache.hadoop.util.Tool; +import org.apache.hadoop.util.ToolRunner; +import org.apache.phoenix.compile.QueryPlan; +import org.apache.phoenix.coprocessor.BaseScannerRegionObserver; +import org.apache.phoenix.iterate.ResultIterator; +import org.apache.phoenix.jdbc.PhoenixResultSet; +import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil; +import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil; +import org.apache.phoenix.util.EnvironmentEdgeManager; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Preconditions; +import com.google.common.base.Strings; + +/** + * Map only job that compares data across a source and target table. The target table can be on the + * same cluster or on a remote cluster. SQL conditions may be specified to compare only a subset of + * both tables. 
+ */ +public class VerifyReplicationTool implements Tool { +private static final Logger LOG = LoggerFactory.getLogger(VerifyReplicationTool.class); + +static final Option +ZK_QUORUM_OPT = +new Option("z", "zookeeper", true, "ZooKeeper connection details (optional)"); +static final Option +TABLE_NAME_OPT = +new Option("t", "table", true, "Phoenix table name (required)"); +static final Option +TARGET_TABLE_NAME_OPT = +new Option("tt", "target-table", true, "Target Phoenix table name (optional)"); +static final Option +TARGET_ZK_QUORUM_OPT = +new Option("tz", "target-zookeeper", true, +"Target ZooKeeper connection details (optional)"); +static final Option +CONDITIONS_OPT = +new Option("c", "conditions", true, +"Conditions for select query WHERE clause (optional)"); +static final Option TIMESTAMP = +new Option("ts", "timestamp", true, +"Timestamp in millis used to compare the two tables. Defaults to current time minus 60 seconds"); + +static final Option HELP_OPT = new Option("h", "help", false, "Show this help and quit"); + +private Configuration conf; + +private String zkQuorum; +private String tableName;
[GitHub] phoenix pull request #309: [Do Not Merge] PHOENIX-3817 Verify Replication us...
Github user akshita-malhotra commented on a diff in the pull request: https://github.com/apache/phoenix/pull/309#discussion_r204180658 --- Diff: phoenix-core/src/main/java/org/apache/phoenix/mapreduce/VerifyReplicationTool.java --- @@ -0,0 +1,477 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you maynot use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicablelaw or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+package org.apache.phoenix.mapreduce;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.Collections;
+import java.util.Map;
+
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.CommandLineParser;
+import org.apache.commons.cli.HelpFormatter;
+import org.apache.commons.cli.Option;
+import org.apache.commons.cli.Options;
+import org.apache.commons.cli.ParseException;
+import org.apache.commons.cli.PosixParser;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.HConstants;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapreduce.Job;
+import org.apache.hadoop.mapreduce.Mapper;
+import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
+import org.apache.hadoop.util.Tool;
+import org.apache.hadoop.util.ToolRunner;
+import org.apache.phoenix.compile.QueryPlan;
+import org.apache.phoenix.coprocessor.BaseScannerRegionObserver;
+import org.apache.phoenix.iterate.ResultIterator;
+import org.apache.phoenix.jdbc.PhoenixResultSet;
+import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil;
+import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;
+import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import com.google.common.base.Strings;
+
+/**
+ * Map only job that compares data across a source and target table. The target table can be on the
+ * same cluster or on a remote cluster. SQL conditions may be specified to compare only a subset of
+ * both tables.
+ */
+public class VerifyReplicationTool implements Tool {
+    private static final Logger LOG = LoggerFactory.getLogger(VerifyReplicationTool.class);
+
+    static final Option ZK_QUORUM_OPT =
+            new Option("z", "zookeeper", true, "ZooKeeper connection details (optional)");
+    static final Option TABLE_NAME_OPT =
+            new Option("t", "table", true, "Phoenix table name (required)");
+    static final Option TARGET_TABLE_NAME_OPT =
+            new Option("tt", "target-table", true, "Target Phoenix table name (optional)");
+    static final Option TARGET_ZK_QUORUM_OPT =
+            new Option("tz", "target-zookeeper", true,
+                    "Target ZooKeeper connection details (optional)");
+    static final Option CONDITIONS_OPT =
+            new Option("c", "conditions", true,
+                    "Conditions for select query WHERE clause (optional)");
+    static final Option TIMESTAMP =
+            new Option("ts", "timestamp", true,
+                    "Timestamp in millis used to compare the two tables. Defaults to current time minus 60 seconds");
+
+    static final Option HELP_OPT = new Option("h", "help", false, "Show this help and quit");
+
+    private Configuration conf;
+
+    private String zkQuorum;
+    private String tableName;
[GitHub] phoenix pull request #309: [Do Not Merge] PHOENIX-3817 Verify Replication us...
Github user akshita-malhotra commented on a diff in the pull request: https://github.com/apache/phoenix/pull/309#discussion_r203943433
--- Diff: phoenix-core/src/it/java/org/apache/phoenix/mapreduce/VerifyReplicationToolIT.java ---
@@ -0,0 +1,323 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.mapreduce;
+
+import java.io.IOException;
+import java.sql.*;
+import java.util.*;
+
+import com.google.common.collect.Maps;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.MiniHBaseCluster;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.HBaseAdmin;
+import org.apache.hadoop.hbase.master.HMaster;
+import org.apache.hadoop.hbase.regionserver.HRegionServer;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hadoop.mapreduce.Counters;
+import org.apache.hadoop.mapreduce.Job;
+import org.apache.phoenix.end2end.BaseUniqueNamesOwnClusterIT;
+import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.apache.phoenix.util.ReadOnlyProps;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.phoenix.util.TestUtil.TEST_PROPERTIES;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotEquals;
+
+public class VerifyReplicationToolIT extends BaseUniqueNamesOwnClusterIT {
+    private static final Logger LOG = LoggerFactory.getLogger(VerifyReplicationToolIT.class);
+    private static final String CREATE_USER_TABLE = "CREATE TABLE IF NOT EXISTS %s ( " +
+            " TENANT_ID VARCHAR NOT NULL, USER_ID VARCHAR NOT NULL, AGE INTEGER " +
+            " CONSTRAINT pk PRIMARY KEY ( TENANT_ID, USER_ID ))";
+    private static final String UPSERT_USER = "UPSERT INTO %s VALUES (?, ?, ?)";
+    private static final String UPSERT_SELECT_USERS =
+            "UPSERT INTO %s SELECT TENANT_ID, USER_ID, %d FROM %s WHERE TENANT_ID = ? LIMIT %d";
+    private static final Random RANDOM = new Random();
+
+    private static int tenantNum = 0;
+    private static int userNum = 0;
+    private static String sourceTableName;
+    private static String targetTableName;
+    private List<String> sourceTenants;
+    private String sourceOnlyTenant;
+    private String sourceAndTargetTenant;
+    private String targetOnlyTenant;
+
+    @BeforeClass
+    public static void createTables() throws Exception {
+        NUM_SLAVES_BASE = 2;
+        Map<String, String> props = Maps.newHashMapWithExpectedSize(1);
+        setUpTestDriver(new ReadOnlyProps(props.entrySet().iterator()));
+        Connection conn = DriverManager.getConnection(getUrl());
+        sourceTableName = generateUniqueName();
+        targetTableName = generateUniqueName();
+        // tables will have the same schema, but a different number of regions
+        conn.createStatement().execute(String.format(CREATE_USER_TABLE, sourceTableName));
+        conn.createStatement().execute(String.format(CREATE_USER_TABLE, targetTableName));
+        conn.commit();
+    }
+
+    @Before
+    public void setupTenants() throws Exception {
+        sourceTenants = new ArrayList<>(2);
+        sourceTenants.add("tenant" + tenantNum++);
+        sourceTenants.add("tenant" + tenantNum++);
+        sourceOnlyTenant = sourceTenants.get(0);
+        sourceAndTargetTenant = sourceTenants.get(1);
+        targetOnlyTenant = "tenant" + tenantNum++;
+        upsertData();
+        split(sourceTableName, 4);
+        split(targetTableName, 2);
+        // ensure scans
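The IT above seeds tenants that exist only in the source, in both tables, and only in the target, then verifies the job's counters. The row-by-row comparison at the heart of such a verifier can be sketched with plain sorted maps. This is a self-contained illustration, not the actual Phoenix mapper code; the four tallies echo HBase's VerifyReplication counters (GOODROWS, BADROWS, ONLY_IN_SOURCE, ONLY_IN_TARGET).

```java
import java.util.Map;
import java.util.TreeMap;

public class CompareSketch {
    // Walk two sorted key->value maps and tally rows that match, differ,
    // or exist on only one side. Stand-in for comparing two row-key-ordered
    // scans of a source and target table.
    static int[] compare(TreeMap<String, Integer> source, TreeMap<String, Integer> target) {
        int good = 0, bad = 0, onlySource = 0, onlyTarget = 0;
        for (Map.Entry<String, Integer> e : source.entrySet()) {
            Integer t = target.get(e.getKey());
            if (t == null) {
                onlySource++;
            } else if (t.equals(e.getValue())) {
                good++;
            } else {
                bad++;
            }
        }
        for (String k : target.keySet()) {
            if (!source.containsKey(k)) {
                onlyTarget++;
            }
        }
        return new int[] { good, bad, onlySource, onlyTarget };
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> source = new TreeMap<>();
        TreeMap<String, Integer> target = new TreeMap<>();
        source.put("tenant1/u1", 30); // matches target
        source.put("tenant1/u2", 31); // differs in target
        source.put("tenant0/u1", 25); // source-only tenant
        target.put("tenant1/u1", 30);
        target.put("tenant1/u2", 99);
        target.put("tenant2/u1", 40); // target-only tenant
        int[] c = compare(source, target);
        System.out.println(c[0] + "," + c[1] + "," + c[2] + "," + c[3]); // 1,1,1,1
    }
}
```

The test splits the source into 4 regions and the target into 2 precisely to exercise this comparison across mismatched region boundaries.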
[jira] [Commented] (PHOENIX-3817) VerifyReplication using SQL
[ https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549976#comment-16549976 ] Akshita Malhotra commented on PHOENIX-3817: --- [~gjacoby] Added the support for scn setting in the latest patch > VerifyReplication using SQL > --- > > Key: PHOENIX-3817 > URL: https://issues.apache.org/jira/browse/PHOENIX-3817 > Project: Phoenix > Issue Type: Improvement >Reporter: Alex Araujo > Assignee: Akshita Malhotra >Priority: Minor > Fix For: 4.15.0 > > Attachments: PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, > PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch, PHOENIX-3817.v5.patch, > PHOENIX-3817.v6.patch > > > Certain use cases may copy or replicate a subset of a table to a different > table or cluster. For example, application topologies may map data for > specific tenants to different peer clusters. > It would be useful to have a Phoenix VerifyReplication tool that accepts an > SQL query, a target table, and an optional target cluster. The tool would > compare data returned by the query on the different tables and update various > result counters (similar to HBase's VerifyReplication). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-3817) VerifyReplication using SQL
[ https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra updated PHOENIX-3817: -- Attachment: PHOENIX-3817.v6.patch > VerifyReplication using SQL > --- > > Key: PHOENIX-3817 > URL: https://issues.apache.org/jira/browse/PHOENIX-3817 > Project: Phoenix > Issue Type: Improvement >Reporter: Alex Araujo > Assignee: Akshita Malhotra >Priority: Minor > Fix For: 4.15.0 > > Attachments: PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, > PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch, PHOENIX-3817.v5.patch, > PHOENIX-3817.v6.patch > > > Certain use cases may copy or replicate a subset of a table to a different > table or cluster. For example, application topologies may map data for > specific tenants to different peer clusters. > It would be useful to have a Phoenix VerifyReplication tool that accepts an > SQL query, a target table, and an optional target cluster. The tool would > compare data returned by the query on the different tables and update various > result counters (similar to HBase's VerifyReplication). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-3817) VerifyReplication using SQL
[ https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra updated PHOENIX-3817: -- Attachment: (was: PHOENIX-3817.v4.patch) > VerifyReplication using SQL > --- > > Key: PHOENIX-3817 > URL: https://issues.apache.org/jira/browse/PHOENIX-3817 > Project: Phoenix > Issue Type: Improvement >Reporter: Alex Araujo > Assignee: Akshita Malhotra >Priority: Minor > Fix For: 4.15.0 > > Attachments: PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, > PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch, PHOENIX-3817.v5.patch > > > Certain use cases may copy or replicate a subset of a table to a different > table or cluster. For example, application topologies may map data for > specific tenants to different peer clusters. > It would be useful to have a Phoenix VerifyReplication tool that accepts an > SQL query, a target table, and an optional target cluster. The tool would > compare data returned by the query on the different tables and update various > result counters (similar to HBase's VerifyReplication). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-3817) VerifyReplication using SQL
[ https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra updated PHOENIX-3817: -- Attachment: PHOENIX-3817.v5.patch > VerifyReplication using SQL > --- > > Key: PHOENIX-3817 > URL: https://issues.apache.org/jira/browse/PHOENIX-3817 > Project: Phoenix > Issue Type: Improvement >Reporter: Alex Araujo > Assignee: Akshita Malhotra >Priority: Minor > Fix For: 4.15.0 > > Attachments: PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, > PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch, PHOENIX-3817.v5.patch > > > Certain use cases may copy or replicate a subset of a table to a different > table or cluster. For example, application topologies may map data for > specific tenants to different peer clusters. > It would be useful to have a Phoenix VerifyReplication tool that accepts an > SQL query, a target table, and an optional target cluster. The tool would > compare data returned by the query on the different tables and update various > result counters (similar to HBase's VerifyReplication). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-3817) VerifyReplication using SQL
[ https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra updated PHOENIX-3817: -- Attachment: PHOENIX-3817.v4.patch > VerifyReplication using SQL > --- > > Key: PHOENIX-3817 > URL: https://issues.apache.org/jira/browse/PHOENIX-3817 > Project: Phoenix > Issue Type: Improvement >Reporter: Alex Araujo > Assignee: Akshita Malhotra >Priority: Minor > Fix For: 4.15.0 > > Attachments: PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, > PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch, PHOENIX-3817.v4.patch > > > Certain use cases may copy or replicate a subset of a table to a different > table or cluster. For example, application topologies may map data for > specific tenants to different peer clusters. > It would be useful to have a Phoenix VerifyReplication tool that accepts an > SQL query, a target table, and an optional target cluster. The tool would > compare data returned by the query on the different tables and update various > result counters (similar to HBase's VerifyReplication). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] phoenix pull request #309: [Do Not Merge] PHOENIX-3817 Verify Replication us...
GitHub user akshita-malhotra opened a pull request: https://github.com/apache/phoenix/pull/309 [Do Not Merge] PHOENIX-3817 Verify Replication using SQL conditions You can merge this pull request into a Git repository by running: $ git pull https://github.com/akshita-malhotra/phoenix Phoenix3817 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/phoenix/pull/309.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #309 commit abf570eac4f7678148d1498f6140b74fa61e1bd3 Author: Akshita Malhotra Date: 2018-06-01T17:38:43Z Verify Replication using SQL conditions ---
[jira] [Commented] (PHOENIX-4771) Deleting tenant rows using a global connection on the base table does not work.
[ https://issues.apache.org/jira/browse/PHOENIX-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500995#comment-16500995 ] Akshita Malhotra commented on PHOENIX-4771: --- Thanks [~tdsilva] > Deleting tenant rows using a global connection on the base table does not > work. > --- > > Key: PHOENIX-4771 > URL: https://issues.apache.org/jira/browse/PHOENIX-4771 > Project: Phoenix > Issue Type: Bug > Reporter: Akshita Malhotra >Priority: Major > Attachments: deletes.diff > > > Phoenix point deletes on the base table using a global connection are silently not > deleting data created by a tenant view. > Ques 1: Is this the right behavior? > Ques 2: If yes, should Phoenix validate and throw an error/exception? If > no, should Phoenix delete the data correctly? > > The attached test fails. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PHOENIX-4771) Deleting tenant rows using a global connection on the base table does not work.
[ https://issues.apache.org/jira/browse/PHOENIX-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500963#comment-16500963 ] Akshita Malhotra commented on PHOENIX-4771: --- fyi, [~tdsilva] [~jamestaylor] [~gjacoby] > Deleting tenant rows using a global connection on the base table does not > work. > --- > > Key: PHOENIX-4771 > URL: https://issues.apache.org/jira/browse/PHOENIX-4771 > Project: Phoenix > Issue Type: Bug > Reporter: Akshita Malhotra >Priority: Major > Attachments: deletes.diff > > > Phoenix point deletes on the base table using a global connection are silently not > deleting data created by a tenant view. > Ques 1: Is this the right behavior? > Ques 2: If yes, should Phoenix validate and throw an error/exception? If > no, should Phoenix delete the data correctly? > > The attached test fails. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-4771) Deleting tenant rows using a global connection on the base table does not work.
[ https://issues.apache.org/jira/browse/PHOENIX-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra updated PHOENIX-4771: -- Description: Phoenix point deletes on the base table using a global connection are silently not deleting data created by a tenant view. Ques 1: Is this the right behavior? Ques 2: If yes, should Phoenix validate and throw an error/exception? If no, should Phoenix delete the data correctly? The attached test fails. was: Phoenix point deletes on the base table are silently not deleting data created by a tenant view. Ques 1: Is this the right behavior? Ques 2: If yes, should Phoenix validate and throw an error/exception? If no, should Phoenix delete the data correctly? The attached test fails. > Deleting tenant rows using a global connection on the base table does not > work. > --- > > Key: PHOENIX-4771 > URL: https://issues.apache.org/jira/browse/PHOENIX-4771 > Project: Phoenix > Issue Type: Bug > Reporter: Akshita Malhotra >Priority: Major > Attachments: deletes.diff > > > Phoenix point deletes on the base table using a global connection are silently not > deleting data created by a tenant view. > Ques 1: Is this the right behavior? > Ques 2: If yes, should Phoenix validate and throw an error/exception? If > no, should Phoenix delete the data correctly? > > The attached test fails. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-4771) Deleting tenant rows using a global connection on the base table does not work.
[ https://issues.apache.org/jira/browse/PHOENIX-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra updated PHOENIX-4771: -- Summary: Deleting tenant rows using a global connection on the base table does not work. (was: Deleting tenant rows using a tenant connection on the base table does not work.) > Deleting tenant rows using a global connection on the base table does not > work. > --- > > Key: PHOENIX-4771 > URL: https://issues.apache.org/jira/browse/PHOENIX-4771 > Project: Phoenix > Issue Type: Bug > Reporter: Akshita Malhotra >Priority: Major > Attachments: deletes.diff > > > Phoenix point deletes on the base table are silently not deleting data created by a > tenant view. > Ques 1: Is this the right behavior? > Ques 2: If yes, should Phoenix validate and throw an error/exception? If > no, should Phoenix delete the data correctly? > > The attached test fails. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PHOENIX-4771) Deleting tenant rows using a tenant connection on the base table does not work.
Akshita Malhotra created PHOENIX-4771: - Summary: Deleting tenant rows using a tenant connection on the base table does not work. Key: PHOENIX-4771 URL: https://issues.apache.org/jira/browse/PHOENIX-4771 Project: Phoenix Issue Type: Bug Reporter: Akshita Malhotra Attachments: deletes.diff Phoenix point deletes on the base table are silently not deleting data created by a tenant view. Ques 1: Is this the right behavior? Ques 2: If yes, should Phoenix validate and throw an error/exception? If no, should Phoenix delete the data correctly? The attached test fails. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PHOENIX-3817) VerifyReplication using SQL
[ https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500599#comment-16500599 ] Akshita Malhotra commented on PHOENIX-3817: --- [~alexaraujo] From the various tests I have run, it seems like there are certain assumptions being made with the Multi-Table RecordReader approach. For example, while setting the start row for a target region scan based on the source scan start row, if the target start row is strictly greater and the size of the target scan is smaller than the source scan, this approach would fail to determine the correct number of good/bad rows (a subset scenario). Similarly, it would yield incorrect results if there are holes in the target scan, which is a likely error scenario in case a map reduce job discards nondeterministically processed rows (not very likely in our migration scenario, but generally with M/R). I was going through the HBase VerifyReplication approach; one way to resolve these issues would be to do something similar, i.e. for every source row processed, find the corresponding target scan (start row = current source row and end row = source split end row), thereby eliminating the need for a multi-table record reader. fyi, [~gjacoby] > VerifyReplication using SQL > --- > > Key: PHOENIX-3817 > URL: https://issues.apache.org/jira/browse/PHOENIX-3817 > Project: Phoenix > Issue Type: Improvement >Reporter: Alex Araujo >Assignee: Alex Araujo >Priority: Minor > Fix For: 4.15.0 > > Attachments: PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, > PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch > > > Certain use cases may copy or replicate a subset of a table to a different > table or cluster. For example, application topologies may map data for > specific tenants to different peer clusters. > It would be useful to have a Phoenix VerifyReplication tool that accepts an 
The tool would > compare data returned by the query on the different tables and update various > result counters (similar to HBase's VerifyReplication). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
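The per-row strategy proposed in the comment above (for each source row, consult the target only over the key range from the current source row to the source split's end row) can be sketched with a sorted map standing in for the target table. This is illustrative only; a real implementation would issue a bounded HBase Scan rather than a map lookup, and the helper name below is invented for the sketch.

```java
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

public class PerRowScanSketch {
    // For each source row, look only at the target key window
    // [sourceRow, splitEnd), mirroring the proposed fix: start row =
    // current source row, end row = the source split's end row. Bounding
    // the window per row tolerates holes and subset ranges in the target
    // that trip up a lock-step multi-table record reader.
    static String firstTargetAtOrAfter(NavigableMap<String, String> target,
                                       String sourceRow, String splitEnd) {
        SortedMap<String, String> window = target.subMap(sourceRow, splitEnd);
        return window.isEmpty() ? null : window.firstKey();
    }

    public static void main(String[] args) {
        NavigableMap<String, String> target = new TreeMap<>();
        target.put("row-b", "v");
        target.put("row-e", "v"); // hole between row-b and row-e
        // Source row "row-c" has no exact target match; the bounded window
        // still lets the verifier classify it without losing alignment.
        System.out.println(firstTargetAtOrAfter(target, "row-c", "row-z")); // row-e
    }
}
```

A lock-step reader, by contrast, would have already advanced past the hole and miscounted every row after it.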
[jira] [Updated] (PHOENIX-4667) Create index on a view should return error if any of the REPLICATION_SCOPE/TTL/KEEP_DELETED_CELLS attributes are set
[ https://issues.apache.org/jira/browse/PHOENIX-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra updated PHOENIX-4667: -- Description: As the physical view index table is shared, create index on a view statements should return error if the user tries to set attributes which affect the physical table such as REPLICATION_SCOPE, TTL, KEEP_DELETED_CELLS etc. (was: As the physical view index table is shared, create index on a view statements should return error if the user tries to set attributes which affect the physical table such as SOR settings, TTL, KEEP_DELETED_CELLS etc.) Summary: Create index on a view should return error if any of the REPLICATION_SCOPE/TTL/KEEP_DELETED_CELLS attributes are set (was: Create index on a view should return error if any of the SOR/TTL/KEEP_DELETED_CELLS attributes are set) > Create index on a view should return error if any of the > REPLICATION_SCOPE/TTL/KEEP_DELETED_CELLS attributes are set > > > Key: PHOENIX-4667 > URL: https://issues.apache.org/jira/browse/PHOENIX-4667 > Project: Phoenix > Issue Type: Bug > Reporter: Akshita Malhotra >Priority: Minor > Labels: index, schema > Fix For: 4.13.0, 4.14.0 > > > As the physical view index table is shared, create index on a view statements > should return error if the user tries to set attributes which affect the > physical table such as REPLICATION_SCOPE, TTL, KEEP_DELETED_CELLS etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PHOENIX-4667) Create index on a view should return error if any of the SOR/TTL/KEEP_DELETED_CELLS attributes are set
Akshita Malhotra created PHOENIX-4667: - Summary: Create index on a view should return error if any of the SOR/TTL/KEEP_DELETED_CELLS attributes are set Key: PHOENIX-4667 URL: https://issues.apache.org/jira/browse/PHOENIX-4667 Project: Phoenix Issue Type: Bug Reporter: Akshita Malhotra Fix For: 4.13.0, 4.14.0 As the physical view index table is shared, create index on a view statements should return error if the user tries to set attributes which affect the physical table such as SOR settings, TTL, KEEP_DELETED_CELLS etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (PHOENIX-4623) Inconsistent physical view index name
[ https://issues.apache.org/jira/browse/PHOENIX-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372211#comment-16372211 ] Akshita Malhotra edited comment on PHOENIX-4623 at 2/21/18 11:31 PM: - [~jamestaylor] As per an offline discussion with [~tdsilva], this seems to be a naming bug during creation of the physical view index table, unless it was intended, which doesn't seem plausible. A simple bug fix would be to modify the getViewIndexName API to return "_IDX_SCH.TABLE". Might need to follow up on other implications of this. was (Author: akshita.malhotra): [~jamestaylor] As per an offline discussion with [~tdsilva], this seems to be a naming bug during creation of the physical view index table, unless it was intended, which doesn't seem plausible. A simple bug fix would be to modify the getViewIndexName API to return "_IDX_SCH.TABLE". Due to this Hgrate is not correctly identifying the physical view indexes. What could be other implications of this? > Inconsistent physical view index name > - > > Key: PHOENIX-4623 > URL: https://issues.apache.org/jira/browse/PHOENIX-4623 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.13.0 >Reporter: Akshita Malhotra >Priority: Major > Labels: easyfix > Fix For: 4.14.0 > > > The physical view indexes are incorrectly named when the table has a schema. For > instance, if a table name is "SCH.TABLE", during creation the physical index > table is named as "_IDX_SCH.TABLE" which doesn't look right. In case > namespaces are enabled, the physical index table is named as "SCH:_IDX_TABLE" > The client APIs on the other hand, such as the > MetaDataUtil.getViewIndexName(String schemaName, String tableName) API to > retrieve the physical view index name, return "SCH._IDX_TABLE", which as per > convention is the right name but functionally leads to wrong results as > this is not how the physical indexes are named during construction. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PHOENIX-4623) Inconsistent physical view index name
[ https://issues.apache.org/jira/browse/PHOENIX-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372211#comment-16372211 ] Akshita Malhotra commented on PHOENIX-4623: --- [~jamestaylor] As per an offline discussion with [~tdsilva], this seems to be a naming bug during creation of the physical view index table, unless it was intended, which doesn't seem plausible. A simple bug fix would be to modify the getViewIndexName API to return "_IDX_SCH.TABLE". Due to this Hgrate is not correctly identifying the physical view indexes. What could be other implications of this? > Inconsistent physical view index name > - > > Key: PHOENIX-4623 > URL: https://issues.apache.org/jira/browse/PHOENIX-4623 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.13.0 > Reporter: Akshita Malhotra >Priority: Major > Labels: easyfix > Fix For: 4.14.0 > > > The physical view indexes are incorrectly named when the table has a schema. For > instance, if a table name is "SCH.TABLE", during creation the physical index > table is named as "_IDX_SCH.TABLE" which doesn't look right. In case > namespaces are enabled, the physical index table is named as "SCH:_IDX_TABLE" > The client APIs on the other hand, such as the > MetaDataUtil.getViewIndexName(String schemaName, String tableName) API to > retrieve the physical view index name, return "SCH._IDX_TABLE", which as per > convention is the right name but functionally leads to wrong results as > this is not how the physical indexes are named during construction. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PHOENIX-4623) Inconsistent physical view index name
Akshita Malhotra created PHOENIX-4623: - Summary: Inconsistent physical view index name Key: PHOENIX-4623 URL: https://issues.apache.org/jira/browse/PHOENIX-4623 Project: Phoenix Issue Type: Bug Affects Versions: 4.13.0 Reporter: Akshita Malhotra Fix For: 4.14.0 The physical view indexes are incorrectly named when the table has a schema. For instance, if a table name is "SCH.TABLE", during creation the physical index table is named as "_IDX_SCH.TABLE" which doesn't look right. In case namespaces are enabled, the physical index table is named as "SCH:_IDX_TABLE" The client APIs on the other hand, such as the MetaDataUtil.getViewIndexName(String schemaName, String tableName) API to retrieve the physical view index name, return "SCH._IDX_TABLE", which as per convention is the right name but functionally leads to wrong results as this is not how the physical indexes are named during construction. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
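The naming mismatch reported in PHOENIX-4623 is easy to see with plain string handling. The sketch below uses hypothetical helper names; the real logic lives in MetaDataUtil and the table-creation path, and only the two resulting strings come from the bug report.

```java
public class ViewIndexNameSketch {
    static final String PREFIX = "_IDX_";

    // Name as actually constructed on disk, per the bug report: the prefix
    // is applied in front of the full schema-qualified name.
    static String physicalName(String schema, String table) {
        return PREFIX + schema + "." + table;
    }

    // Name as returned by the client API, per the bug report: the prefix
    // is applied to the table part only.
    static String clientApiName(String schema, String table) {
        return schema + "." + PREFIX + table;
    }

    public static void main(String[] args) {
        System.out.println(physicalName("SCH", "TABLE"));  // _IDX_SCH.TABLE
        System.out.println(clientApiName("SCH", "TABLE")); // SCH._IDX_TABLE
        // The two never agree for schema-qualified tables, so lookups made
        // through the client API miss the physical view index table.
    }
}
```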
[jira] [Commented] (PHOENIX-4344) MapReduce Delete Support
[ https://issues.apache.org/jira/browse/PHOENIX-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363286#comment-16363286 ] Akshita Malhotra commented on PHOENIX-4344: --- [~jamestaylor] Can you explain why it would do a point scan? Maybe I am thinking in the wrong direction, but as [~gjacoby] explained, even if the initial delete is deleting over a non-PK column, when a point Phoenix delete query is being issued, I can provide the PK information (obtained from the map reduce scan) along with the extra predicate that would include the non-PK column. > MapReduce Delete Support > > > Key: PHOENIX-4344 > URL: https://issues.apache.org/jira/browse/PHOENIX-4344 > Project: Phoenix > Issue Type: New Feature >Affects Versions: 4.12.0 >Reporter: Geoffrey Jacoby >Assignee: Geoffrey Jacoby >Priority: Major > > Phoenix already has the ability to use MapReduce for asynchronous handling of > long-running SELECTs. It would be really useful to have this capability for > long-running DELETEs, particularly of tables with indexes where using HBase's > own MapReduce integration would be prohibitively complicated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
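The idea in this comment, a point DELETE keyed on the PK values harvested from the map-reduce scan with the job's original non-PK predicate appended, can be sketched as SQL assembly. Table and column names below are hypothetical, and this is not the actual PHOENIX-4344 implementation.

```java
public class PointDeleteSketch {
    // Build a point DELETE: PK equality placeholders for the scanned row,
    // plus the job's original non-PK predicate, so the statement stays a
    // point operation even though the selection criterion is a non-PK column.
    static String buildDelete(String table, String[] pkCols, String extraPredicate) {
        StringBuilder sql = new StringBuilder("DELETE FROM ").append(table).append(" WHERE ");
        for (int i = 0; i < pkCols.length; i++) {
            if (i > 0) sql.append(" AND ");
            sql.append(pkCols[i]).append(" = ?");
        }
        if (extraPredicate != null && !extraPredicate.isEmpty()) {
            sql.append(" AND ").append(extraPredicate);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildDelete("MY_TABLE",
                new String[] { "TENANT_ID", "USER_ID" }, "AGE > 100"));
        // DELETE FROM MY_TABLE WHERE TENANT_ID = ? AND USER_ID = ? AND AGE > 100
    }
}
```

Each mapper would then bind the PK values from the row it just scanned into the placeholders via a PreparedStatement.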
[jira] [Assigned] (PHOENIX-4353) Constraint violation error in Snapshot based index rebuild job
[ https://issues.apache.org/jira/browse/PHOENIX-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra reassigned PHOENIX-4353: - Assignee: Akshita Malhotra > Constraint violation error in Snapshot based index rebuild job > -- > > Key: PHOENIX-4353 > URL: https://issues.apache.org/jira/browse/PHOENIX-4353 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.13.0 >Reporter: Monani Mihir >Assignee: Akshita Malhotra >Priority: Critical > > When we try to rebuild index with data table snapshot, many mappers fails > with ERROR 218 (23018): Constraint violation. Example below :- > Cmd to run snapshot based index job :- > bin/hbase org.apache.phoenix.mapreduce.index.IndexTool -it DATA_INDEX -dt > DATA -s SCHEMA -snap -op /TEST/DATA_INDEX > Mappers failed with error :- > {code} > 2017-11-06 09:25:24,380 INFO [main] regionserver.HRegion - Onlined > eac5484a276e8d942e9eebf8275f114f; next sequenceid=18399282 > 2017-11-06 09:25:24,522 ERROR [main] index.PhoenixIndexImportMapper - Error > ERROR 218 (23018): Constraint violation. SCHEMA.DATA_INDEX.:DATA_ID may not > be null while read/write of a record > 2017-11-06 09:25:24,545 INFO [42e9eebf8275f114f.-1] regionserver.HStore - > Closed 0 > 2017-11-06 09:25:24,546 INFO [main] regionserver.HRegion - Closed > SCHEMA.DATA_INDEX,userID1234orgid1234,1509939061852.eac5484a276e8d942e9eebf8275f114f. > 2017-11-06 09:25:24,547 INFO [main] mapred.MapTask - Starting flush of map > output > 2017-11-06 09:25:24,557 INFO [main] compress.CodecPool - Got brand-new > compressor [.snappy] > 2017-11-06 09:25:24,560 WARN [main] mapred.YarnChild - Exception running > child : java.lang.RuntimeException: java.sql.SQLException: ERROR 218 (23018): > Constraint violation. 
SCHEMA.DATA_INDEX.:DATA_ID may not be null > at > org.apache.phoenix.mapreduce.index.PhoenixIndexImportMapper.map(PhoenixIndexImportMapper.java:122) > at > org.apache.phoenix.mapreduce.index.PhoenixIndexImportMapper.map(PhoenixIndexImportMapper.java:48) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1751) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164) > Caused by: java.sql.SQLException: ERROR 218 (23018): Constraint violation. > SCHEMA.DATA_INDEX.:DATA_ID may not be null > at > org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:488) > at > org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150) > at > org.apache.phoenix.schema.ConstraintViolationException.(ConstraintViolationException.java:39) > at org.apache.phoenix.schema.PTableImpl.newKey(PTableImpl.java:753) > at > org.apache.phoenix.compile.UpsertCompiler.setValues(UpsertCompiler.java:154) > at > org.apache.phoenix.compile.UpsertCompiler.access$500(UpsertCompiler.java:116) > at > org.apache.phoenix.compile.UpsertCompiler$4.execute(UpsertCompiler.java:1078) > at > org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:393) > at > org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:376) > at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53) > at > org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:374) > at > org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:363) > at > 
org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:269) > at > org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172) > at > org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177) > at > org.apache.phoenix.mapreduce.index.PhoenixIndexImportMapper.map(PhoenixIndexImportMapper.java:101) > ... 9 more > 2017-11-06 09:25:24,563 INFO [main] mapred.Task - Runnning cleanup for the > task > 2017-11-06 09:25:24,564 WARN [main] output.FileOutputCommitter - Could not > delete > hdfs://hdfs-local/TEST/DATA_INDEX/SCHEMA.DATA_INDEX/_temporary/1/_temporary/attempt_1508241002000_5658_m_14_0 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
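For readers following the PHOENIX-4353 stack trace: the failure originates in the primary-key assembly step (PTableImpl.newKey), which rejects a null value for any non-nullable row-key column. The sketch below is an illustrative Python model of that check, not Phoenix source; the column name `:DATA_ID` is taken from the quoted log, everything else is hypothetical.

```python
# Illustrative model of the PK null check that produces ERROR 218 (23018).
# If the snapshot-based mapper fails to surface a data-table PK column,
# every index upsert trips this check, which matches all mappers failing.

class ConstraintViolation(Exception):
    pass

def new_index_key(pk_columns, values):
    """pk_columns: list of (name, nullable) pairs; values: column -> value."""
    key_parts = []
    for name, nullable in pk_columns:
        value = values.get(name)
        if value is None and not nullable:
            # Mirrors the log line "SCHEMA.DATA_INDEX.:DATA_ID may not be null"
            raise ConstraintViolation(
                "ERROR 218 (23018): Constraint violation. "
                f"{name} may not be null")
        key_parts.append(b"" if value is None else value.encode())
    return b"\x00".join(key_parts)
```

Under this model, the bug is upstream of the check: the snapshot scan delivers rows with a missing key component, and the constraint violation is only the symptom.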
[jira] [Assigned] (PHOENIX-4355) Snapshot based index rebuild job won't work for two index tables of the same data table in parallel
[ https://issues.apache.org/jira/browse/PHOENIX-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra reassigned PHOENIX-4355: - Assignee: Akshita Malhotra > Snapshot based index rebuild job wont work for two index table of same data > table in parallel > - > > Key: PHOENIX-4355 > URL: https://issues.apache.org/jira/browse/PHOENIX-4355 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.13.0 >Reporter: Monani Mihir > Assignee: Akshita Malhotra >Priority: Minor > > Run Index rebuild job for one index :- > {code} > bin/hbase org.apache.phoenix.mapreduce.index.IndexTool -it DATA_INDEX_1 -dt > DATA -s SCHEMA -snap -op /TEST/DATA_INDEX_1 > {code} > then run index rebuild job for another index with same source data table.:- > {code} > bin/hbase org.apache.phoenix.mapreduce.index.IndexTool -it DATA_INDEX_2 -dt > DATA -s SCHEMA -snap -op /TEST/DATA_INDEX_1 > {code} > Second command will fail without triggering MR jobs. When you delete previous > MR Job snapshot, it will be able to run. > {code} > It fails with below Error :- > 2017-11-06 06:38:25,122 DEBUG [main] security.HBaseSaslRpcClient - Will send > token of size 0 from initSASLContext. > 2017-11-06 06:38:25,122 DEBUG [main] security.HBaseSaslRpcClient - Will read > input token of size 32 for processing by initSASLContext > 2017-11-06 06:38:25,122 DEBUG [main] security.HBaseSaslRpcClient - Will send > token of size 32 from initSASLContext. > 2017-11-06 06:38:25,122 DEBUG [main] security.HBaseSaslRpcClient - SASL > client context established. 
Negotiated QoP: auth > 2017-11-06 06:38:26,819 ERROR [main] index.IndexTool - utureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1512) > at > org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1714) > at > org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1784) > at > org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.snapshot(MasterProtos.java:47487) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.snapshot(HConnectionManager.java:2146) > at org.apache.hadoop.hbase.client.HBaseAdmin$28.call(HBaseAdmin.java:2882) > at org.apache.hadoop.hbase.client.HBaseAdmin$28.call(HBaseAdmin.java:2879) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:125) > ... 13 more > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
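A plausible reading of PHOENIX-4355 is that the rebuild job derives its snapshot (or restore path) from the data table alone, so two index builds over the same data table collide until the first snapshot is deleted. The sketch below is a hypothetical Python illustration of that naming collision and one fix; none of these function names exist in Phoenix.

```python
# Hypothetical illustration: a snapshot name derived only from the data
# table collides when two index rebuild jobs run against the same table.
_existing_snapshots = set()

def create_snapshot(name):
    """Register a snapshot name, failing on a duplicate (as HBase would)."""
    if name in _existing_snapshots:
        raise RuntimeError(f"Snapshot '{name}' already exists")
    _existing_snapshots.add(name)
    return name

def snapshot_name(data_table, index_table=None):
    if index_table is None:
        # Buggy scheme: depends only on the data table.
        return f"{data_table}-snapshot"
    # Fixed scheme: one snapshot name per (data table, index table) pair.
    return f"{data_table}-{index_table}-snapshot"
```

With the per-pair scheme, the DATA_INDEX_1 and DATA_INDEX_2 jobs from the commands above would no longer contend for the same snapshot.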
[jira] [Assigned] (PHOENIX-4354) Mappers fail in Snapshot based index rebuilding job
[ https://issues.apache.org/jira/browse/PHOENIX-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra reassigned PHOENIX-4354: - Assignee: Akshita Malhotra > Mappers fails in Snapshot based index rebuilding job > > > Key: PHOENIX-4354 > URL: https://issues.apache.org/jira/browse/PHOENIX-4354 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.13.0 >Reporter: Monani Mihir >Assignee: Akshita Malhotra > > Cmd to run snapshot based index job :- > bin/hbase org.apache.phoenix.mapreduce.index.IndexTool -it DATA_INDEX -dt > DATA -s SCHEMA -snap -op /TEST/DATA_INDEX > {code} > 2017-11-06 09:25:25,054 WARN [oreSnapshot-pool6-t1] backup.HFileArchiver - > Failed to archive class > org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath, > file:hdfs://hdfs-local/index-snapshot-dir/restore-dir/ed465e0f-002e-43b3-8ec4-133e81c4e3ea/data/default/SCHEMA.DATA/0b93e3fcba18cf281cc147a08fc4656f/0/SCHEMA.DATA=0b93e3fcba18cf281cc147a08fc4656f-14aa829f6e63460fab309cd1f32b9627 > on try #2 > java.io.FileNotFoundException: File/Directory > /index-snapshot-dir/restore-dir/ed465e0f-002e-43b3-8ec4-133e81c4e3ea/data/default/SCHEMA.DATA/0b93e3fcba18cf281cc147a08fc4656f/0/SCHEMA.DATA=0b93e3fcba18cf281cc147a08fc4656f-14aa829f6e63460fab309cd1f32b9627 > does not exist. 
> at > org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:123) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1921) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1223) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setTimes(ClientNamenodeProtocolServerSideTranslatorPB.java:915) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1751) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) > at > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) > at org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:3167) > at > org.apache.hadoop.hdfs.DistributedFileSystem$31.doCall(DistributedFileSystem.java:1548) > at > org.apache.hadoop.hdfs.DistributedFileSystem$31.doCall(DistributedFileSystem.java:1544) > at > 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.setTimes(DistributedFileSystem.java:1544) > at > org.apache.hadoop.hbase.util.FSUtils.renameAndSetModifyTime(FSUtils.java:1964) > at > org.apache.hadoop.hbase.backup.HFileArchiver$File.moveAndClose(HFileArchiver.java:586) > at > org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchiveFile(HFileArchiver.java:425) > at > org.apache.hadoop.hbase.backup.HFileArchiver.archiveStoreFile(HFileArchiver.java:260) > at > org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreRegion(RestoreSnapshotHelper.java:445) > at > org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.access$300(RestoreSnapshotHelper.java:110) > at > org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper$2.editRegion(RestoreSnapshotHelper.java:393) > at > org.apache.hadoop.hbase.util.ModifyRegionUtils$2.call(ModifyRegionUtils.java:215) > at > org.apache.hadoop.hbase.util.ModifyRegionUtils$2.call(ModifyRegionUtils.java:212) > at java.util.concurrent.FutureTask.run(FutureTask.java
[jira] [Commented] (PHOENIX-4003) Document how to use snapshots for MR
[ https://issues.apache.org/jira/browse/PHOENIX-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159252#comment-16159252 ] Akshita Malhotra commented on PHOENIX-4003: --- [~pconrad] I will create the first draft outlining the api definition/use case and then follow-up. Thanks! > Document how to use snapshots for MR > > > Key: PHOENIX-4003 > URL: https://issues.apache.org/jira/browse/PHOENIX-4003 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor > Assignee: Akshita Malhotra > > Now that PHOENIX-3744 is resolved and released, we should update our website > to let users know how to take advantage of this cool new feature (i.e. new > snapshot argument to IndexTool). This could be added to a couple of > places: http://phoenix.apache.org/phoenix_mr.html and maybe here > http://phoenix.apache.org/pig_integration.html (is there a way to use > snapshots through our Pig integration? If not we should file a JIRA and do > this). > Directions to update the website are here: > http://phoenix.apache.org/building_website.html -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4161) TableSnapshotReadsMapReduceIT shouldn't need to run its own mini cluster
[ https://issues.apache.org/jira/browse/PHOENIX-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159250#comment-16159250 ] Akshita Malhotra commented on PHOENIX-4161: --- Looking into the issue. > TableSnapshotReadsMapReduceIT shouldn't need to run its own mini cluster > > > Key: PHOENIX-4161 > URL: https://issues.apache.org/jira/browse/PHOENIX-4161 > Project: Phoenix > Issue Type: Bug >Reporter: Samarth Jain > Assignee: Akshita Malhotra > > In PHOENIX-4141, I made a few attempts to get TableSnapshotReadsMapReduceIT > to pass. But finally had to resort to running the test in its own mini > cluster. I don't see any reason why we should, though. [~akshita.malhotra] - > can you please take a look. > Below are the errors I saw in logs: > {code} > java.lang.Exception: java.lang.IllegalArgumentException: Filesystems for > restore directory and HBase root directory should be the same > at > org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) > at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) > Caused by: java.lang.IllegalArgumentException: Filesystems for restore > directory and HBase root directory should be the same > at > org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:716) > at > org.apache.phoenix.iterate.TableSnapshotResultIterator.init(TableSnapshotResultIterator.java:77) > at > org.apache.phoenix.iterate.TableSnapshotResultIterator.(TableSnapshotResultIterator.java:73) > at > org.apache.phoenix.mapreduce.PhoenixRecordReader.initialize(PhoenixRecordReader.java:126) > at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at > org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) > at > 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} > {code} > Caused by: java.lang.IllegalArgumentException: Restore directory cannot be a > sub directory of HBase root directory. RootDir: > hdfs://localhost:45485/user/jenkins/test-data/3fe1b641-9d14-4053-b3e6-a811035e34b0, > restoreDir: > hdfs://localhost:45485/user/jenkins/test-data/3fe1b641-9d14-4053-b3e6-a811035e34b0/FOO/3eb31efb-b541-4b75-b98f-4558ddf5994e > at > org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:720) > at > org.apache.phoenix.iterate.TableSnapshotResultIterator.init(TableSnapshotResultIterator.java:77) > at > org.apache.phoenix.iterate.TableSnapshotResultIterator.(TableSnapshotResultIterator.java:73) > at > org.apache.phoenix.mapreduce.PhoenixRecordReader.initialize(PhoenixRecordReader.java:126) > at > org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) > at > org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
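Both failures quoted above come from validation in RestoreSnapshotHelper.copySnapshotForScanner: the restore directory must be on the same filesystem as the HBase root directory, and must not be nested under it. The Python sketch below models those two checks for illustration only; the URIs are hypothetical and this is not the HBase implementation.

```python
# Illustrative model of the two restore-directory checks that fail in the
# logs above (same filesystem, and not a subdirectory of the root dir).
from urllib.parse import urlparse
from pathlib import PurePosixPath

def validate_restore_dir(root_dir, restore_dir):
    root, restore = urlparse(root_dir), urlparse(restore_dir)
    # Check 1: same filesystem (scheme + authority).
    if (root.scheme, root.netloc) != (restore.scheme, restore.netloc):
        raise ValueError("Filesystems for restore directory and HBase root "
                         "directory should be the same")
    # Check 2: restore dir must not live under the root dir.
    root_path = PurePosixPath(root.path)
    restore_path = PurePosixPath(restore.path)
    if root_path in restore_path.parents:
        raise ValueError("Restore directory cannot be a sub directory of "
                         "HBase root directory")
    return True
```

This is why the test's restore dir under `.../test-data/.../FOO/...` was rejected: a sibling directory on the same HDFS instance satisfies both checks, a child of the root dir satisfies neither the second check, and a `file:///` path fails the first.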
[jira] [Assigned] (PHOENIX-3976) Validate Index ASYNC job complete when building off a data table snapshot
[ https://issues.apache.org/jira/browse/PHOENIX-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra reassigned PHOENIX-3976: - Assignee: Akshita Malhotra > Validate Index ASYNC job complete when building off a data table snapshot > - > > Key: PHOENIX-3976 > URL: https://issues.apache.org/jira/browse/PHOENIX-3976 > Project: Phoenix > Issue Type: Improvement >Reporter: Samarth Jain > Assignee: Akshita Malhotra > > [~akshita.malhotra] had this good idea of validating whether an async index > build job has completed successfully by comparing the > PhoenixJobCounters.INPUT_RECORDS with the number of expected rows. This would > be especially helpful when we are building the index using a data table > snapshot. Since the data table snapshot won't be taking any writes, it should > be correct and hopefully relatively easy to verify that the number of rows in > the data table snapshot is equal to the PhoenixJobCounters.INPUT_RECORDS > counter. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
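The validation proposed in PHOENIX-3976 reduces to comparing two numbers: the job's PhoenixJobCounters.INPUT_RECORDS value and the snapshot's row count (stable, since a snapshot takes no writes). The sketch below is a minimal Python illustration of that comparison; fetching the counter and the row count (e.g. from job counters and a `SELECT COUNT(*)`) is assumed to happen elsewhere.

```python
# Minimal sketch of the proposed completeness check for an ASYNC index
# build over a data-table snapshot. Inputs are assumed pre-fetched.

def validate_index_build(input_records, expected_rows):
    """Classify the build by comparing mapper input records to the
    snapshot's row count, which cannot change during the job."""
    if input_records == expected_rows:
        return "complete"
    if input_records < expected_rows:
        return "incomplete: mappers consumed fewer rows than the snapshot holds"
    return "suspect: more input records than snapshot rows"
```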
[jira] [Comment Edited] (PHOENIX-3812) Use HBase snapshots in async index building M/R job
[ https://issues.apache.org/jira/browse/PHOENIX-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045213#comment-16045213 ] Akshita Malhotra edited comment on PHOENIX-3812 at 6/9/17 11:29 PM: [~jamestaylor] Thanks for the comment. I have updated and uploaded two patches: PHOENIX-3812.patch applies cleanly to master and 1.1 branch. PHOENIX-3812-4.x-0.98.patch is for 4.x-0.98 branch. was (Author: akshita.malhotra): [~jamestaylor] Thanks for the comment. I have uploaded two patches: PHOENIX-3812.patch applies cleanly to master and 1.1 branch. PHOENIX-3812-4.x-0.98.patch is for 4.x-0.98 branch. > Use HBase snapshots in async index building M/R job > --- > > Key: PHOENIX-3812 > URL: https://issues.apache.org/jira/browse/PHOENIX-3812 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.10.0 >Reporter: Maddineni Sukumar >Assignee: Akshita Malhotra > Attachments: PHOENIX-3812-4.x-0.98.patch, PHOENIX-3812.patch > > > As per discussion with James, HBase snapshots makes it lot easier and faster > to operate on existing data. > So explore using HBase snapshots in index building M/R job for async index. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PHOENIX-3812) Use HBase snapshots in async index building M/R job
[ https://issues.apache.org/jira/browse/PHOENIX-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045213#comment-16045213 ] Akshita Malhotra commented on PHOENIX-3812: --- [~jamestaylor] Thanks for the comment. I have uploaded two patches: PHOENIX-3812.patch applies cleanly to master and 1.1 branch. PHOENIX-3812-4.x-0.98.patch is for 4.x-0.98 branch. > Use HBase snapshots in async index building M/R job > --- > > Key: PHOENIX-3812 > URL: https://issues.apache.org/jira/browse/PHOENIX-3812 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.10.0 >Reporter: Maddineni Sukumar >Assignee: Akshita Malhotra > Attachments: PHOENIX-3812-4.x-0.98.patch, PHOENIX-3812.patch > > > As per discussion with James, HBase snapshots makes it lot easier and faster > to operate on existing data. > So explore using HBase snapshots in index building M/R job for async index. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PHOENIX-3812) Use HBase snapshots in async index building M/R job
[ https://issues.apache.org/jira/browse/PHOENIX-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra updated PHOENIX-3812: -- Attachment: PHOENIX-3812-4.x-0.98.patch > Use HBase snapshots in async index building M/R job > --- > > Key: PHOENIX-3812 > URL: https://issues.apache.org/jira/browse/PHOENIX-3812 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.10.0 >Reporter: Maddineni Sukumar >Assignee: Akshita Malhotra > Attachments: PHOENIX-3812-4.x-0.98.patch, PHOENIX-3812.patch > > > As per discussion with James, HBase snapshots makes it lot easier and faster > to operate on existing data. > So explore using HBase snapshots in index building M/R job for async index. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PHOENIX-3812) Use HBase snapshots in async index building M/R job
[ https://issues.apache.org/jira/browse/PHOENIX-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra updated PHOENIX-3812: -- Attachment: PHOENIX-3812.patch > Use HBase snapshots in async index building M/R job > --- > > Key: PHOENIX-3812 > URL: https://issues.apache.org/jira/browse/PHOENIX-3812 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.10.0 >Reporter: Maddineni Sukumar >Assignee: Akshita Malhotra > Attachments: PHOENIX-3812.patch > > > As per discussion with James, HBase snapshots makes it lot easier and faster > to operate on existing data. > So explore using HBase snapshots in index building M/R job for async index. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] phoenix pull request #260: PHOENIX-3812: Use HBase snapshots in async index ...
GitHub user akshita-malhotra opened a pull request: https://github.com/apache/phoenix/pull/260 PHOENIX-3812: Use HBase snapshots in async index building M/R job - Index tool creates a snapshot and uses it as a configuration parameter to run index M/R job using HBase snapshot. - Add option to configure use of snapshots in IndexTool You can merge this pull request into a Git repository by running: $ git pull https://github.com/akshita-malhotra/phoenix PHOENIX-3812 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/phoenix/pull/260.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #260 commit 57fb264ba76e849db3bc4375f87091499cbce618 Author: Akshita <akshita.malho...@salesforce.com> Date: 2017-06-07T23:14:47Z PHOENIX-3812: Use HBase snapshots in async index building M/R job --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (PHOENIX-3744) Support snapshot scanners for MR-based queries
[ https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra updated PHOENIX-3744: -- Attachment: PHOENIX-3744-4.x-HBase-1.1.patch > Support snapshot scanners for MR-based queries > -- > > Key: PHOENIX-3744 > URL: https://issues.apache.org/jira/browse/PHOENIX-3744 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor > Assignee: Akshita Malhotra > Attachments: PHOENIX-3744-4.x-HBase-0.98.patch, > PHOENIX-3744-4.x-HBase-1.1.patch, PHOENIX-3744.patch, PHOENIX-3744.patch, > PHOENIX-3744.patch > > > HBase supports scanning over snapshots, with a SnapshotScanner that accesses > the region directly in HDFS. We should make sure that Phoenix can support > that. > Not sure how we'd want to decide when to run a query over a snapshot. Some > ideas: > - if there's an SCN set (i.e. the query is running at a point in time in the > past) > - if the memstore is empty > - if the query is being run at a timestamp earlier than any memstore data > - as a config option on the table > - as a query hint > - based on some kind of optimizer rule (i.e. based on estimated # of bytes > that will be scanned) > Phoenix typically runs a query at the timestamp at which it was compiled. Any > data committed after this time should not be seen while a query is running. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
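The PHOENIX-3744 description lists several candidate heuristics for choosing a snapshot scan over a live-table scan. As a thought experiment only, they could combine as in the Python sketch below; none of these names exist in Phoenix, and the shipped feature ultimately used an explicit option (the IndexTool snapshot argument) rather than an automatic rule.

```python
# Hypothetical combination of the heuristics listed in the issue. A query
# compiled at query_ts must not see data committed afterwards, so a
# snapshot is safe whenever the memstore cannot contribute visible rows.

def use_snapshot_scan(scn_set, memstore_empty, query_ts,
                      earliest_memstore_ts, table_opt_in=False, hinted=False):
    if hinted or table_opt_in:
        return True   # explicit opt-in (query hint or table config) wins
    if scn_set:
        return True   # point-in-time query in the past
    if memstore_empty:
        return True   # no unflushed data the snapshot could miss
    if earliest_memstore_ts is not None and query_ts < earliest_memstore_ts:
        return True   # query predates all memstore data
    return False      # fall back to a live-table scan
```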
[GitHub] phoenix pull request #256: PHOENIX-3477 patch for 4.x-HBase-1.1
GitHub user akshita-malhotra opened a pull request: https://github.com/apache/phoenix/pull/256 PHOENIX-3477 patch for 4.x-HBase-1.1 PHOENIX-3477 patch for 4.x-HBase-1.1 You can merge this pull request into a Git repository by running: $ git pull https://github.com/akshita-malhotra/phoenix PHOENIX-3744-4.x-HBase-1.1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/phoenix/pull/256.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #256 commit 43367cf81bab6e957e03845ba1387017bc7e8530 Author: Akshita <akshita.malho...@salesforce.com> Date: 2017-06-06T00:41:40Z PHOENIX-3477 patch for 4.x-HBase-1.1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (PHOENIX-3744) Support snapshot scanners for MR-based queries
[ https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra updated PHOENIX-3744: -- Attachment: PHOENIX-3744-4.x-HBase-0.98.patch > Support snapshot scanners for MR-based queries > -- > > Key: PHOENIX-3744 > URL: https://issues.apache.org/jira/browse/PHOENIX-3744 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor > Assignee: Akshita Malhotra > Attachments: PHOENIX-3744-4.x-HBase-0.98.patch, PHOENIX-3744.patch, > PHOENIX-3744.patch, PHOENIX-3744.patch > > > HBase support scanning over snapshots, with a SnapshotScanner that accesses > the region directly in HDFS. We should make sure that Phoenix can support > that. > Not sure how we'd want to decide when to run a query over a snapshot. Some > ideas: > - if there's an SCN set (i.e. the query is running at a point in time in the > past) > - if the memstore is empty > - if the query is being run at a timestamp earlier than any memstore data > - as a config option on the table > - as a query hint > - based on some kind of optimizer rule (i.e. based on estimated # of bytes > that will be scanned) > Phoenix typically runs a query at the timestamp at which it was compiled. Any > data committed after this time should not be seen while a query is running. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] phoenix pull request #255: PHOENIX-3744 for 4.x-HBase-0.98
GitHub user akshita-malhotra opened a pull request: https://github.com/apache/phoenix/pull/255 PHOENIX-3744 for 4.x-HBase-0.98 PHOENIX-3744 patch for 4.x-HBase-0.98 branch You can merge this pull request into a Git repository by running: $ git pull https://github.com/akshita-malhotra/phoenix PHOENIX-3744-4.x Alternatively you can review and apply these changes as the patch at: https://github.com/apache/phoenix/pull/255.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #255 commit 718dfb2b11c3b57ae1dc94b79d15ada516bba4a9 Author: Akshita <akshita.malho...@salesforce.com> Date: 2017-06-05T23:49:08Z PHOENIX-3744 for 4.x-HBase-0.98 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (PHOENIX-3744) Support snapshot scanners for MR-based queries
[ https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra updated PHOENIX-3744: -- Attachment: PHOENIX-3744.patch Patch wasn't applying due to recent changes to scan metrics. Resolved conflicts and uploaded the patch. > Support snapshot scanners for MR-based queries > -- > > Key: PHOENIX-3744 > URL: https://issues.apache.org/jira/browse/PHOENIX-3744 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor > Assignee: Akshita Malhotra > Attachments: PHOENIX-3744.patch, PHOENIX-3744.patch, > PHOENIX-3744.patch > > > HBase support scanning over snapshots, with a SnapshotScanner that accesses > the region directly in HDFS. We should make sure that Phoenix can support > that. > Not sure how we'd want to decide when to run a query over a snapshot. Some > ideas: > - if there's an SCN set (i.e. the query is running at a point in time in the > past) > - if the memstore is empty > - if the query is being run at a timestamp earlier than any memstore data > - as a config option on the table > - as a query hint > - based on some kind of optimizer rule (i.e. based on estimated # of bytes > that will be scanned) > Phoenix typically runs a query at the timestamp at which it was compiled. Any > data committed after this time should not be seen while a query is running. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PHOENIX-3744) Support snapshot scanners for MR-based queries
[ https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra updated PHOENIX-3744: -- Attachment: PHOENIX-3744.patch Updated patch > Support snapshot scanners for MR-based queries > -- > > Key: PHOENIX-3744 > URL: https://issues.apache.org/jira/browse/PHOENIX-3744 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor > Assignee: Akshita Malhotra > Attachments: PHOENIX-3744.patch, PHOENIX-3744.patch > > > HBase support scanning over snapshots, with a SnapshotScanner that accesses > the region directly in HDFS. We should make sure that Phoenix can support > that. > Not sure how we'd want to decide when to run a query over a snapshot. Some > ideas: > - if there's an SCN set (i.e. the query is running at a point in time in the > past) > - if the memstore is empty > - if the query is being run at a timestamp earlier than any memstore data > - as a config option on the table > - as a query hint > - based on some kind of optimizer rule (i.e. based on estimated # of bytes > that will be scanned) > Phoenix typically runs a query at the timestamp at which it was compiled. Any > data committed after this time should not be seen while a query is running. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PHOENIX-3744) Support snapshot scanners for MR-based queries
[ https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra updated PHOENIX-3744: -- Attachment: (was: PHOENIX-3744.patch) > Support snapshot scanners for MR-based queries > -- > > Key: PHOENIX-3744 > URL: https://issues.apache.org/jira/browse/PHOENIX-3744 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor > Assignee: Akshita Malhotra > Attachments: PHOENIX-3744.patch > > > HBase support scanning over snapshots, with a SnapshotScanner that accesses > the region directly in HDFS. We should make sure that Phoenix can support > that. > Not sure how we'd want to decide when to run a query over a snapshot. Some > ideas: > - if there's an SCN set (i.e. the query is running at a point in time in the > past) > - if the memstore is empty > - if the query is being run at a timestamp earlier than any memstore data > - as a config option on the table > - as a query hint > - based on some kind of optimizer rule (i.e. based on estimated # of bytes > that will be scanned) > Phoenix typically runs a query at the timestamp at which it was compiled. Any > data committed after this time should not be seen while a query is running. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] phoenix issue #239: PHOENIX-3744: Support snapshot scanners for MR-based Non...
Github user akshita-malhotra commented on the issue: https://github.com/apache/phoenix/pull/239 @JamesRTaylor Thanks a lot for the review. I have made the suggested changes and uploaded the updated patch on the jira. Regarding creating snapshot to generalize the use of snapshots for M/R jobs, I was under the impression that we are passing the snapshot name as input after our last discussion with @lhofhansl and Rahul G. If we are to follow the former approach, I will go ahead and make the changes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (PHOENIX-3744) Support snapshot scanners for MR-based queries
[ https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra updated PHOENIX-3744: -- Attachment: PHOENIX-3744.patch Updated patch > Support snapshot scanners for MR-based queries > -- > > Key: PHOENIX-3744 > URL: https://issues.apache.org/jira/browse/PHOENIX-3744 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor > Assignee: Akshita Malhotra > Attachments: PHOENIX-3744.patch, PHOENIX-3744.patch > > > HBase support scanning over snapshots, with a SnapshotScanner that accesses > the region directly in HDFS. We should make sure that Phoenix can support > that. > Not sure how we'd want to decide when to run a query over a snapshot. Some > ideas: > - if there's an SCN set (i.e. the query is running at a point in time in the > past) > - if the memstore is empty > - if the query is being run at a timestamp earlier than any memstore data > - as a config option on the table > - as a query hint > - based on some kind of optimizer rule (i.e. based on estimated # of bytes > that will be scanned) > Phoenix typically runs a query at the timestamp at which it was compiled. Any > data committed after this time should not be seen while a query is running. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[GitHub] phoenix issue #239: PHOENIX-3744: Support snapshot scanners for MR-based Non...
Github user akshita-malhotra commented on the issue: https://github.com/apache/phoenix/pull/239 @JamesRTaylor - Changed ParallelScanGrouper classes as per the review - Changes to BaseTest were to avoid the following error: "Restore directory cannot be a sub directory of HBase root directory" Therefore, was sending true to create the root dir. Changed to use a random dir instead to avoid to make these changes - Refactored the util classes to Factory Also, uploaded the patch on the jira. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (PHOENIX-3744) Support snapshot scanners for MR-based queries
[ https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra updated PHOENIX-3744: -- Attachment: PHOENIX-3744.patch PHOENIX-3744: Support snapshot scanners for MR-based Non-aggregate queries > Support snapshot scanners for MR-based queries > -- > > Key: PHOENIX-3744 > URL: https://issues.apache.org/jira/browse/PHOENIX-3744 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor > Assignee: Akshita Malhotra > Attachments: PHOENIX-3744.patch > > > HBase supports scanning over snapshots, with a SnapshotScanner that accesses > the region directly in HDFS. We should make sure that Phoenix can support > that. > Not sure how we'd want to decide when to run a query over a snapshot. Some > ideas: > - if there's an SCN set (i.e. the query is running at a point in time in the > past) > - if the memstore is empty > - if the query is being run at a timestamp earlier than any memstore data > - as a config option on the table > - as a query hint > - based on some kind of optimizer rule (i.e. based on estimated # of bytes > that will be scanned) > Phoenix typically runs a query at the timestamp at which it was compiled. Any > data committed after this time should not be seen while a query is running.
[GitHub] phoenix issue #239: Phoenix-3744: Support snapshot scanners for MR-based que...
Github user akshita-malhotra commented on the issue: https://github.com/apache/phoenix/pull/239 Thanks @JamesRTaylor. I squashed the commits and changed the prefix of the commit message. I will answer/make appropriate changes and upload the patch onto the jira.
[GitHub] phoenix issue #239: Phoenix-3744: Support snapshot scanners for MR-based que...
Github user akshita-malhotra commented on the issue: https://github.com/apache/phoenix/pull/239 Sure, I will do that. Thanks @JamesRTaylor
[jira] [Created] (PHOENIX-3852) Support snapshot scanner M/R jobs for aggregate queries
Akshita Malhotra created PHOENIX-3852: - Summary: Support snapshot scanner M/R jobs for aggregate queries Key: PHOENIX-3852 URL: https://issues.apache.org/jira/browse/PHOENIX-3852 Project: Phoenix Issue Type: New Feature Affects Versions: 4.10.0 Reporter: Akshita Malhotra Assignee: Akshita Malhotra
[GitHub] phoenix issue #239: Phoenix-3744: Support snapshot scanners for MR-based que...
Github user akshita-malhotra commented on the issue: https://github.com/apache/phoenix/pull/239 - Snapshot scanner for non-aggregate queries. - Added integration tests (simple select query, conditional and limit) - Abstracted out the ScanRegionObserver code to fetch the processed region scanner without the coprocessor environment - Making snapshots work for aggregate queries is harder: given the complexity of the aggregate region observer code, it is almost impossible to refactor without fully understanding the functionality. I will need some guidance if the aggregate query use case is required. @JamesRTaylor @lhofhansl
[jira] [Created] (PHOENIX-3820) Refactor Region Observer functionality (PHOENIX) to fetch processed region scanner without coprocessor environment
Akshita Malhotra created PHOENIX-3820: - Summary: Refactor Region Observer functionality (PHOENIX) to fetch processed region scanner without coprocessor environment Key: PHOENIX-3820 URL: https://issues.apache.org/jira/browse/PHOENIX-3820 Project: Phoenix Issue Type: Improvement Affects Versions: 4.10.0 Reporter: Akshita Malhotra Assignee: Akshita Malhotra
[jira] [Assigned] (PHOENIX-3812) Explore using HBase snapshots in async index building M/R job
[ https://issues.apache.org/jira/browse/PHOENIX-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akshita Malhotra reassigned PHOENIX-3812: - Assignee: Akshita Malhotra (was: Maddineni Sukumar) > Explore using HBase snapshots in async index building M/R job > - > > Key: PHOENIX-3812 > URL: https://issues.apache.org/jira/browse/PHOENIX-3812 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.10.0 >Reporter: Maddineni Sukumar >Assignee: Akshita Malhotra > > As per discussion with James, HBase snapshots make it a lot easier and faster > to operate on existing data. > So explore using HBase snapshots in the index building M/R job for async indexes.
[jira] [Comment Edited] (PHOENIX-3744) Support snapshot scanners for MR-based queries
[ https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981853#comment-15981853 ] Akshita Malhotra edited comment on PHOENIX-3744 at 4/24/17 8:41 PM: - ParallelScanGrouper is extended to differentiate the functionality for getting region boundaries - Added an integration test that compares the snapshot read result with the result of a select query run with the CurrentSCN value set - The configuration parameter is the snapshot name key; if it is set, a snapshot read is done - Used an existing PhoenixIndexDBWritable class for the purpose of testing; will add a new one as more tests are added - ExpressionProjector functionality is extended for snapshots, as the keyvalue format returned by TableSnapshotScanner differs from that of ClientScanner and is therefore not properly interpreted by Phoenix, returning null for projected columns. For the same table, the following shows the different keyvalue formats: ClientScanner: keyvalues={AAPL/_v:\x00\x00\x00\x01/1493061452132/Put/vlen=7/seqid=0/value=�SSDD��} TableSnapshotScanner: keyvalues={AAPL/0:\x00\x00\x00\x00/1493061673859/Put/vlen=1/seqid=4/value=x, AAPL/0:\x80\x0B/1493061673859/Put/vlen=4/seqid=4/value=SSDD} To do: add more integration tests to cover different scenarios such as where clauses fyi: [~jamestaylor] > Support snapshot scanners for MR-based queries > -- > > Key: PHOENIX-3744 > URL: https://issues.apache.org/jira/browse/PHOENIX-3744 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor > Assignee: Akshita Malhotra > > > HBase supports scanning over snapshots, with a SnapshotScanner that accesses > the region directly in HDFS. We should make sure that Phoenix can support > that. > Not sure how we'd want to decide when to run a query over a snapshot. Some > ideas: > - if there's an SCN set (i.e. the query is running at a point in time in the > past) > - if the memstore is empty > - if the query is being run at a timestamp earlier than any memstore data > - as a config option on the table > - as a query hint > - based on some kind of optimizer rule (i.e. based on estimated # of bytes > that will be scanned) > Phoenix typically runs a query at the timestamp at which it was compiled. Any > data committed after this time should not be seen while a query is running.
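The family/qualifier mismatch described in the comment above is visible directly in the two cell strings: the live-table scan returns cells under the `_v` column family, while the snapshot scan returns them under `0`. A small illustrative parser (not Phoenix code) over the `row/family:qualifier/ts/...` rendering shown in the comment makes the point:

```java
// Illustration (not Phoenix code) of the mismatch described above: a
// projector that looks cells up by the family ClientScanner returned ("_v")
// finds nothing among the cells TableSnapshotScanner returned (family "0"),
// and so yields null for projected columns.
public class KeyValueFamilies {
    // Extract the family from HBase's "row/family:qualifier/ts/Type/..." cell string.
    static String family(String keyValue) {
        int slash = keyValue.indexOf('/');
        int colon = keyValue.indexOf(':', slash);
        return keyValue.substring(slash + 1, colon);
    }

    public static void main(String[] args) {
        String clientCell = "AAPL/_v:\\x00\\x00\\x00\\x01/1493061452132/Put";
        String snapshotCell = "AAPL/0:\\x00\\x00\\x00\\x00/1493061673859/Put";
        System.out.println(family(clientCell));   // _v
        System.out.println(family(snapshotCell)); // 0
    }
}
```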
[GitHub] phoenix pull request #239: Phoenix-3744: Support snapshot scanners for MR-ba...
GitHub user akshita-malhotra opened a pull request: https://github.com/apache/phoenix/pull/239 Phoenix-3744: Support snapshot scanners for MR-based queries - ParallelScanGrouper is extended to differentiate the functionality for getting region boundaries - Added an integration test that compares the snapshot read result with the result of a select query run with the CurrentSCN value set - The configuration parameter is the snapshot name key; if it is set, a snapshot read is done - Used an existing PhoenixIndexDBWritable class for the purpose of testing; will add a new one as more tests are added - ExpressionProjector functionality is extended for snapshots, as the keyvalue format returned by TableSnapshotScanner differs from that of ClientScanner and is therefore not properly interpreted by Phoenix, returning null for projected columns. For the same table, the following shows the different keyvalue formats: 1. ClientScanner: keyvalues={AAPL/_v:\x00\x00\x00\x01/1493061452132/Put/vlen=7/seqid=0/value=SSDD } 2. TableSnapshotScanner: keyvalues={AAPL/0:\x00\x00\x00\x00/1493061673859/Put/vlen=1/seqid=4/value=x, AAPL/0:\x80\x0B/1493061673859/Put/vlen=4/seqid=4/value=SSDD} @JamesRTaylor @lhofhansl To do: add more integration tests to cover different scenarios such as where clauses You can merge this pull request into a Git repository by running: $ git pull https://github.com/akshita-malhotra/phoenix Phoenix-3744 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/phoenix/pull/239.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #239 commit 73b1ac04c45381a2a0511146c666af476e488cdf Author: Akshita <akshita.malho...@salesforce.com> Date: 2017-04-24T18:43:02Z Phoenix-3744: Support snapshot scanners for MR-based queries
[jira] [Commented] (PHOENIX-3475) MetaData #getTables() API doesn't return view indexes
[ https://issues.apache.org/jira/browse/PHOENIX-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656021#comment-15656021 ] Akshita Malhotra commented on PHOENIX-3475: --- Yes, I am looking for Phoenix metadata. As per my understanding, the data corresponding to a view on an index is stored in the base index table (_IDX_) and there is no HTable which map to a view index name (globalViewIdx in the above test scenario). Therefore, to migrate view indexes we need to copy data in the base index table similar to what we do in case of views, copying over rows in SYSTEM.CATALOG For example: When I run #getTables("","", "_IDX_MIGRATIONTEST", new String[] {"INDEX","TABLE"}), it returns empty result set. How can I get metadata corresponding to this table? > MetaData #getTables() API doesn't return view indexes > - > > Key: PHOENIX-3475 > URL: https://issues.apache.org/jira/browse/PHOENIX-3475 > Project: Phoenix > Issue Type: Bug >Reporter: Akshita Malhotra > Fix For: 4.9.0 > > > HBase migration tool uses DatabaseMetadata#getTables() API to retrieve the > tables for copying data. We have found that API doesn't return base index > tables ( _IDX_) > For testing purposes, we issue following DDL to generate the view and the > corresponding view index: > -CREATE VIEW IF NOT EXISTS MIGRATIONTEST_VIEW (OLD_VALUE_VIEW varchar) AS > SELECT * FROM MIGRATIONTEST WHERE OLD_VALUE like 'E%' > -CREATE INDEX IF NOT EXISTS MIGRATIONTEST_VIEW_IDX ON MIGRATIONTEST_VIEW > (OLD_VALUE_VIEW) > By using HBase API, we were able to confirm that base index table > (_IDX_MIGRATIONTEST) is created. > Both jdbc DatabaseMetadata API and P* getMetaDataCache API doesn't seem to > be returning view indexes. Also P*MetaData #getTableRef API return > "TableNotFoundException" when attempted to fetch PTable corresponding to the > base index table name. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PHOENIX-3475) MetaData #getTables() API doesn't return view indexes
[ https://issues.apache.org/jira/browse/PHOENIX-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655525#comment-15655525 ] Akshita Malhotra commented on PHOENIX-3475: --- Thanks [~jamestaylor]. Both solutions give us just the name of the base index table; we require additional Phoenix metadata to construct the table descriptor. Also, using DatabaseMetadata#getTables() to retrieve all index types is undesirable, as it returns views whose names don't map to an HBase table. Moreover, it still doesn't return view indexes. In all, the task is to retrieve the complete Phoenix metadata corresponding to a physical view index table (not just the name). > MetaData #getTables() API doesn't return view indexes > - > > Key: PHOENIX-3475 > URL: https://issues.apache.org/jira/browse/PHOENIX-3475 > Project: Phoenix > Issue Type: Bug > Reporter: Akshita Malhotra > Fix For: 4.9.0 > > > The HBase migration tool uses the DatabaseMetadata#getTables() API to retrieve the > tables for copying data. We have found that the API doesn't return base index > tables ( _IDX_) > For testing purposes, we issue the following DDL to generate the view and the > corresponding view index: > -CREATE VIEW IF NOT EXISTS MIGRATIONTEST_VIEW (OLD_VALUE_VIEW varchar) AS > SELECT * FROM MIGRATIONTEST WHERE OLD_VALUE like 'E%' > -CREATE INDEX IF NOT EXISTS MIGRATIONTEST_VIEW_IDX ON MIGRATIONTEST_VIEW > (OLD_VALUE_VIEW) > By using the HBase API, we were able to confirm that the base index table > (_IDX_MIGRATIONTEST) is created. > Neither the jdbc DatabaseMetadata API nor the P* getMetaDataCache API seems to > return view indexes. Also, the P*MetaData #getTableRef API returns > "TableNotFoundException" when attempting to fetch the PTable corresponding to the > base index table name.
[jira] [Created] (PHOENIX-3475) MetaData #getTables() API doesn't return view indexes
Akshita Malhotra created PHOENIX-3475: - Summary: MetaData #getTables() API doesn't return view indexes Key: PHOENIX-3475 URL: https://issues.apache.org/jira/browse/PHOENIX-3475 Project: Phoenix Issue Type: Bug Reporter: Akshita Malhotra Fix For: 4.9.0 The HBase migration tool uses the DatabaseMetadata#getTables() API to retrieve the tables for copying data. We have found that the API doesn't return base index tables ( _IDX_). For testing purposes, we issue the following DDL to generate the view and the corresponding view index: -CREATE VIEW IF NOT EXISTS MIGRATIONTEST_VIEW (OLD_VALUE_VIEW varchar) AS SELECT * FROM MIGRATIONTEST WHERE OLD_VALUE like 'E%' -CREATE INDEX IF NOT EXISTS MIGRATIONTEST_VIEW_IDX ON MIGRATIONTEST_VIEW (OLD_VALUE_VIEW) By using the HBase API, we were able to confirm that the base index table (_IDX_MIGRATIONTEST) is created. Neither the jdbc DatabaseMetadata API nor the P* getMetaDataCache API seems to return view indexes. Also, the P*MetaData #getTableRef API returns "TableNotFoundException" when attempting to fetch the PTable corresponding to the base index table name.
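The issue above shows the naming convention at play: all view indexes on a base table are stored in one physical HBase table whose name is the base table name with an "_IDX_" prefix (MIGRATIONTEST -> _IDX_MIGRATIONTEST). A minimal sketch of that mapping; Phoenix's own utility classes provide the real helper, so treat this as illustrative only:

```java
// Sketch of the view-index physical table naming convention seen in the
// issue above: the backing HBase table for all view indexes of a base table
// carries an "_IDX_" prefix. The helper below is hypothetical; Phoenix's
// utility classes define the real one.
public class ViewIndexName {
    static final String VIEW_INDEX_PREFIX = "_IDX_";

    static String viewIndexPhysicalName(String baseTableName) {
        return VIEW_INDEX_PREFIX + baseTableName;
    }

    public static void main(String[] args) {
        System.out.println(viewIndexPhysicalName("MIGRATIONTEST")); // _IDX_MIGRATIONTEST
    }
}
```

This is why a migration tool that enumerates tables via DatabaseMetadata#getTables() misses the view-index data: no Phoenix table entry maps one-to-one to the prefixed physical table.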