[jira] [Assigned] (PHOENIX-5344) MapReduce Jobs Over Salted Snapshots Give Wrong Results

2019-06-14 Thread Akshita Malhotra (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-5344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra reassigned PHOENIX-5344:
-

Assignee: Akshita Malhotra

> MapReduce Jobs Over Salted Snapshots Give Wrong Results
> ---
>
> Key: PHOENIX-5344
> URL: https://issues.apache.org/jira/browse/PHOENIX-5344
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Geoffrey Jacoby
>    Assignee: Akshita Malhotra
>Priority: Major
>
> I'm modifying an existing MapReduce job to use Phoenix's MapReduce / HBase 
> snapshot integration. When testing, I noticed that existing tests that had 
> previously worked for this job when running on salted Phoenix tables began to 
> fail when running on a snapshot of those tables. They pass when running 
> identical logic against the live table. Unsalted tables give the same, 
> correct result whether running against a live table or a snapshot. 
> The symptom on the salted snapshots is that the row count is too high by a 
> factor of roughly 7, though the exact inflation appears non-deterministic. 
> My working theory is that somewhere the snapshot MapReduce integration for 
> Phoenix sets up the scans improperly for salted tables.
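If that theory holds, one plausible failure point is the generation of per-bucket scan ranges. The following is a simplified, hypothetical sketch (not Phoenix's actual code) of what correct range generation over a salted key space looks like: each row key carries a leading bucket byte, and the reader must emit exactly one non-overlapping range per bucket, since overlapping or repeated ranges would inflate the row count in the way described above.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical sketch -- not Phoenix's actual implementation.
// A salted Phoenix table prefixes every row key with a bucket byte in
// [0, buckets). A snapshot reader must emit exactly one non-overlapping
// [start, stop) range per bucket; overlapping or repeated ranges would
// double-count rows, consistent with the ~7x symptom.
public class SaltedScanRanges {

    /** One {startKey, stopKey} pair per salt bucket, covering the key space. */
    static List<byte[][]> perBucketRanges(int buckets) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            byte[] start = new byte[] { (byte) i };
            // The last bucket runs to the end of the table (empty stop key).
            byte[] stop = (i == buckets - 1)
                    ? new byte[0]
                    : new byte[] { (byte) (i + 1) };
            ranges.add(new byte[][] { start, stop });
        }
        return ranges;
    }
}
```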



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-3817) VerifyReplication using SQL

2018-09-19 Thread Akshita Malhotra (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-3817:
--
Attachment: PHOENIX-3817-final2.patch

> VerifyReplication using SQL
> ---
>
> Key: PHOENIX-3817
> URL: https://issues.apache.org/jira/browse/PHOENIX-3817
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Alex Araujo
>    Assignee: Akshita Malhotra
>Priority: Minor
> Fix For: 4.15.0
>
> Attachments: PHOENIX-3817-final.patch, PHOENIX-3817-final2.patch, 
> PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, PHOENIX-3817.v3.patch, 
> PHOENIX-3817.v4.patch, PHOENIX-3817.v5.patch, PHOENIX-3817.v6.patch, 
> PHOENIX-3817.v7.patch
>
>
> Certain use cases may copy or replicate a subset of a table to a different 
> table or cluster. For example, application topologies may map data for 
> specific tenants to different peer clusters.
> It would be useful to have a Phoenix VerifyReplication tool that accepts an 
> SQL query, a target table, and an optional target cluster. The tool would 
> compare data returned by the query on the different tables and update various 
> result counters (similar to HBase's VerifyReplication).
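As an illustrative sketch of the core comparison such a tool might run (counter names borrowed from HBase's VerifyReplication; this is not the implementation in the attached patches), assume rows keyed by primary key have been fetched by the same query on both sides:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeSet;

// Illustrative sketch only: compare keyed rows returned by the same query
// on a source and a target, bumping counters similar to those of HBase's
// VerifyReplication (GOODROWS, BADROWS, etc.).
public class RowComparison {

    static Map<String, Integer> compare(SortedMap<String, String> source,
                                        SortedMap<String, String> target) {
        int good = 0, bad = 0, onlySource = 0, onlyTarget = 0;
        TreeSet<String> keys = new TreeSet<>(source.keySet());
        keys.addAll(target.keySet());
        for (String key : keys) {
            if (!target.containsKey(key)) {
                onlySource++;           // row missing on the target
            } else if (!source.containsKey(key)) {
                onlyTarget++;           // row missing on the source
            } else if (source.get(key).equals(target.get(key))) {
                good++;                 // identical on both sides
            } else {
                bad++;                  // present on both, contents differ
            }
        }
        Map<String, Integer> counters = new HashMap<>();
        counters.put("GOODROWS", good);
        counters.put("BADROWS", bad);
        counters.put("ONLY_IN_SOURCE", onlySource);
        counters.put("ONLY_IN_TARGET", onlyTarget);
        return counters;
    }
}
```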





[jira] [Assigned] (PHOENIX-4867) Document Verify Replication tool (PHOENIX-3817)

2018-08-24 Thread Akshita Malhotra (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra reassigned PHOENIX-4867:
-

Assignee: Akshita Malhotra

> Document Verify Replication tool (PHOENIX-3817)
> ---
>
> Key: PHOENIX-4867
> URL: https://issues.apache.org/jira/browse/PHOENIX-4867
> Project: Phoenix
>  Issue Type: Bug
>    Reporter: Akshita Malhotra
>    Assignee: Akshita Malhotra
>Priority: Minor
>
> Create a Phoenix user-level doc explaining the features and limitations 
> of the VerifyReplication tool (PHOENIX-3817).





[jira] [Created] (PHOENIX-4867) Document Verify Replication tool (PHOENIX-3817)

2018-08-24 Thread Akshita Malhotra (JIRA)
Akshita Malhotra created PHOENIX-4867:
-

 Summary: Document Verify Replication tool (PHOENIX-3817)
 Key: PHOENIX-4867
 URL: https://issues.apache.org/jira/browse/PHOENIX-4867
 Project: Phoenix
  Issue Type: Bug
Reporter: Akshita Malhotra


Create a Phoenix user-level doc explaining the features and limitations of the 
VerifyReplication tool (PHOENIX-3817).





[jira] [Updated] (PHOENIX-3817) VerifyReplication using SQL

2018-08-23 Thread Akshita Malhotra (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-3817:
--
Attachment: PHOENIX-3817-final.patch

> VerifyReplication using SQL
> ---
>
> Key: PHOENIX-3817
> URL: https://issues.apache.org/jira/browse/PHOENIX-3817
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Alex Araujo
>    Assignee: Akshita Malhotra
>Priority: Minor
> Fix For: 4.15.0
>
> Attachments: PHOENIX-3817-final.patch, PHOENIX-3817.v1.patch, 
> PHOENIX-3817.v2.patch, PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch, 
> PHOENIX-3817.v5.patch, PHOENIX-3817.v6.patch, PHOENIX-3817.v7.patch
>
>
> Certain use cases may copy or replicate a subset of a table to a different 
> table or cluster. For example, application topologies may map data for 
> specific tenants to different peer clusters.
> It would be useful to have a Phoenix VerifyReplication tool that accepts an 
> SQL query, a target table, and an optional target cluster. The tool would 
> compare data returned by the query on the different tables and update various 
> result counters (similar to HBase's VerifyReplication).





[jira] [Updated] (PHOENIX-4849) UPSERT SELECT fails with stale region boundary exception after a split

2018-08-14 Thread Akshita Malhotra (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-4849:
--
Attachment: PHOENIX-4849.patch

> UPSERT SELECT fails with stale region boundary exception after a split
> --
>
> Key: PHOENIX-4849
> URL: https://issues.apache.org/jira/browse/PHOENIX-4849
> Project: Phoenix
>  Issue Type: Bug
>    Reporter: Akshita Malhotra
>Priority: Major
> Attachments: PHOENIX-4849.patch
>
>
> UPSERT SELECT throws a StaleRegionBoundaryCacheException immediately after a 
> split. By contrast, an upsert followed by a separate select works fine.
> org.apache.phoenix.schema.StaleRegionBoundaryCacheException: ERROR 1108 
> (XCL08): Cache of region boundaries are out of date.
> at 
> org.apache.phoenix.exception.SQLExceptionCode$14.newException(SQLExceptionCode.java:365)
>  at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
>  at 
> org.apache.phoenix.util.ServerUtil.parseRemoteException(ServerUtil.java:183)
>  at 
> org.apache.phoenix.util.ServerUtil.parseServerExceptionOrNull(ServerUtil.java:167)
>  at 
> org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:134)
>  at 
> org.apache.phoenix.iterate.ScanningResultIterator.next(ScanningResultIterator.java:153)
>  at 
> org.apache.phoenix.iterate.TableResultIterator.next(TableResultIterator.java:228)
>  at 
> org.apache.phoenix.iterate.LookAheadResultIterator$1.advance(LookAheadResultIterator.java:47)
>  at 
> org.apache.phoenix.iterate.LookAheadResultIterator.init(LookAheadResultIterator.java:59)
>  at 
> org.apache.phoenix.iterate.LookAheadResultIterator.peek(LookAheadResultIterator.java:73)
>  at 
> org.apache.phoenix.iterate.SerialIterators$SerialIterator.nextIterator(SerialIterators.java:187)
>  at 
> org.apache.phoenix.iterate.SerialIterators$SerialIterator.currentIterator(SerialIterators.java:160)
>  at 
> org.apache.phoenix.iterate.SerialIterators$SerialIterator.peek(SerialIterators.java:218)
>  at 
> org.apache.phoenix.iterate.ConcatResultIterator.currentIterator(ConcatResultIterator.java:100)
>  at 
> org.apache.phoenix.iterate.ConcatResultIterator.next(ConcatResultIterator.java:117)
>  at 
> org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
>  at 
> org.apache.phoenix.iterate.LimitingResultIterator.next(LimitingResultIterator.java:47)
>  at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:805)
>  at 
> org.apache.phoenix.compile.UpsertCompiler.upsertSelect(UpsertCompiler.java:219)
>  at 
> org.apache.phoenix.compile.UpsertCompiler$ClientUpsertSelectMutationPlan.execute(UpsertCompiler.java:1292)
>  at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:408)
>  at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:391)
>  at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>  at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:390)
>  at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:378)
>  at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:173)
>  at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:183)
>  at 
> org.apache.phoenix.end2end.UpsertSelectAfterSplitTest.upsertSelectData1(UpsertSelectAfterSplitTest.java:109)
>  at 
> org.apache.phoenix.end2end.UpsertSelectAfterSplitTest.testUpsertSelect(UpsertSelectAfterSplitTest.java:59)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>  

[jira] [Created] (PHOENIX-4849) UPSERT SELECT fails with stale region boundary exception after a split

2018-08-14 Thread Akshita Malhotra (JIRA)
Akshita Malhotra created PHOENIX-4849:
-

 Summary: UPSERT SELECT fails with stale region boundary exception 
after a split
 Key: PHOENIX-4849
 URL: https://issues.apache.org/jira/browse/PHOENIX-4849
 Project: Phoenix
  Issue Type: Bug
Reporter: Akshita Malhotra


UPSERT SELECT throws a StaleRegionBoundaryCacheException immediately after a 
split. By contrast, an upsert followed by a separate select works fine.

org.apache.phoenix.schema.StaleRegionBoundaryCacheException: ERROR 1108 
(XCL08): Cache of region boundaries are out of date.

at 
org.apache.phoenix.exception.SQLExceptionCode$14.newException(SQLExceptionCode.java:365)
 at 
org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
 at org.apache.phoenix.util.ServerUtil.parseRemoteException(ServerUtil.java:183)
 at 
org.apache.phoenix.util.ServerUtil.parseServerExceptionOrNull(ServerUtil.java:167)
 at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:134)
 at 
org.apache.phoenix.iterate.ScanningResultIterator.next(ScanningResultIterator.java:153)
 at 
org.apache.phoenix.iterate.TableResultIterator.next(TableResultIterator.java:228)
 at 
org.apache.phoenix.iterate.LookAheadResultIterator$1.advance(LookAheadResultIterator.java:47)
 at 
org.apache.phoenix.iterate.LookAheadResultIterator.init(LookAheadResultIterator.java:59)
 at 
org.apache.phoenix.iterate.LookAheadResultIterator.peek(LookAheadResultIterator.java:73)
 at 
org.apache.phoenix.iterate.SerialIterators$SerialIterator.nextIterator(SerialIterators.java:187)
 at 
org.apache.phoenix.iterate.SerialIterators$SerialIterator.currentIterator(SerialIterators.java:160)
 at 
org.apache.phoenix.iterate.SerialIterators$SerialIterator.peek(SerialIterators.java:218)
 at 
org.apache.phoenix.iterate.ConcatResultIterator.currentIterator(ConcatResultIterator.java:100)
 at 
org.apache.phoenix.iterate.ConcatResultIterator.next(ConcatResultIterator.java:117)
 at 
org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
 at 
org.apache.phoenix.iterate.LimitingResultIterator.next(LimitingResultIterator.java:47)
 at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:805)
 at 
org.apache.phoenix.compile.UpsertCompiler.upsertSelect(UpsertCompiler.java:219)
 at 
org.apache.phoenix.compile.UpsertCompiler$ClientUpsertSelectMutationPlan.execute(UpsertCompiler.java:1292)
 at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:408)
 at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:391)
 at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
 at 
org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:390)
 at 
org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:378)
 at 
org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:173)
 at 
org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:183)
 at 
org.apache.phoenix.end2end.UpsertSelectAfterSplitTest.upsertSelectData1(UpsertSelectAfterSplitTest.java:109)
 at 
org.apache.phoenix.end2end.UpsertSelectAfterSplitTest.testUpsertSelect(UpsertSelectAfterSplitTest.java:59)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
 at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
 at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
 at org.junit.rules.RunRules.evaluate(RunRules.java:20)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
 at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
 at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:119
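Until the underlying bug is fixed, a common client-side workaround is to re-execute the statement: on a StaleRegionBoundaryCacheException, Phoenix refreshes its region-boundary cache, so a retry usually succeeds. The generic helper below is a hedged sketch of that pattern (the exception class in the test is a stand-in, not Phoenix's):

```java
import java.util.concurrent.Callable;

// Client-side workaround sketch, not a fix for the underlying bug: retry
// an operation when a designated "retryable" exception type is thrown.
// For this issue the retryable type would be
// StaleRegionBoundaryCacheException, whose failure also triggers a
// region-boundary cache refresh on the Phoenix client.
public class RetryOnStale {

    static <T> T withRetry(Callable<T> op,
                           Class<? extends Exception> retryable,
                           int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (!retryable.isInstance(e)) {
                    throw e;            // unrelated failure: don't mask it
                }
                last = e;               // stale cache: retry the statement
            }
        }
        throw last;                     // retries exhausted
    }
}
```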

[GitHub] phoenix pull request #309: [Do Not Merge] PHOENIX-3817 Verify Replication us...

2018-07-20 Thread akshita-malhotra
Github user akshita-malhotra commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/309#discussion_r204187343
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/VerifyReplicationTool.java
 ---
@@ -0,0 +1,477 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.mapreduce;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.Collections;
+import java.util.Map;
+
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.CommandLineParser;
+import org.apache.commons.cli.HelpFormatter;
+import org.apache.commons.cli.Option;
+import org.apache.commons.cli.Options;
+import org.apache.commons.cli.ParseException;
+import org.apache.commons.cli.PosixParser;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.HConstants;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapreduce.Job;
+import org.apache.hadoop.mapreduce.Mapper;
+import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
+import org.apache.hadoop.util.Tool;
+import org.apache.hadoop.util.ToolRunner;
+import org.apache.phoenix.compile.QueryPlan;
+import org.apache.phoenix.coprocessor.BaseScannerRegionObserver;
+import org.apache.phoenix.iterate.ResultIterator;
+import org.apache.phoenix.jdbc.PhoenixResultSet;
+import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil;
+import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;
+import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import com.google.common.base.Strings;
+
+/**
+ * Map only job that compares data across a source and target table. The 
target table can be on the
+ * same cluster or on a remote cluster. SQL conditions may be specified to 
compare only a subset of
+ * both tables.
+ */
+public class VerifyReplicationTool implements Tool {
+private static final Logger LOG = 
LoggerFactory.getLogger(VerifyReplicationTool.class);
+
+static final Option
+ZK_QUORUM_OPT =
+new Option("z", "zookeeper", true, "ZooKeeper connection 
details (optional)");
+static final Option
+TABLE_NAME_OPT =
+new Option("t", "table", true, "Phoenix table name 
(required)");
+static final Option
+TARGET_TABLE_NAME_OPT =
+new Option("tt", "target-table", true, "Target Phoenix table 
name (optional)");
+static final Option
+TARGET_ZK_QUORUM_OPT =
+new Option("tz", "target-zookeeper", true,
+"Target ZooKeeper connection details (optional)");
+static final Option
+CONDITIONS_OPT =
+new Option("c", "conditions", true,
+"Conditions for select query WHERE clause (optional)");
+static final Option TIMESTAMP =
+new Option("ts", "timestamp", true,
+"Timestamp in millis used to compare the two tables.  
Defaults to current time minus 60 seconds");
+
+static final Option HELP_OPT = new Option("h", "help", false, "Show 
this help and quit");
+
+private Configuration conf;
+
+private String zkQuorum;
+private String tableName;

[GitHub] phoenix pull request #309: [Do Not Merge] PHOENIX-3817 Verify Replication us...

2018-07-20 Thread akshita-malhotra
Github user akshita-malhotra commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/309#discussion_r204181390
  
(quoted diff identical to the one in the first review comment above)
[GitHub] phoenix pull request #309: [Do Not Merge] PHOENIX-3817 Verify Replication us...

2018-07-20 Thread akshita-malhotra
Github user akshita-malhotra commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/309#discussion_r204181281
  
(quoted diff identical to the one in the first review comment above)
[GitHub] phoenix pull request #309: [Do Not Merge] PHOENIX-3817 Verify Replication us...

2018-07-20 Thread akshita-malhotra
Github user akshita-malhotra commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/309#discussion_r204180883
  
(quoted diff identical to the one in the first review comment above)
[GitHub] phoenix pull request #309: [Do Not Merge] PHOENIX-3817 Verify Replication us...

2018-07-20 Thread akshita-malhotra
Github user akshita-malhotra commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/309#discussion_r204180658
  
--- Diff: 
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/VerifyReplicationTool.java
 ---
@@ -0,0 +1,477 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.mapreduce;
+
+import java.io.IOException;
+import java.sql.SQLException;
+import java.util.Collections;
+import java.util.Map;
+
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.CommandLineParser;
+import org.apache.commons.cli.HelpFormatter;
+import org.apache.commons.cli.Option;
+import org.apache.commons.cli.Options;
+import org.apache.commons.cli.ParseException;
+import org.apache.commons.cli.PosixParser;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.HConstants;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
+import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.mapreduce.Job;
+import org.apache.hadoop.mapreduce.Mapper;
+import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;
+import org.apache.hadoop.util.Tool;
+import org.apache.hadoop.util.ToolRunner;
+import org.apache.phoenix.compile.QueryPlan;
+import org.apache.phoenix.coprocessor.BaseScannerRegionObserver;
+import org.apache.phoenix.iterate.ResultIterator;
+import org.apache.phoenix.jdbc.PhoenixResultSet;
+import org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil;
+import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;
+import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import com.google.common.base.Strings;
+
+/**
+ * Map only job that compares data across a source and target table. The target table can be on
+ * the same cluster or on a remote cluster. SQL conditions may be specified to compare only a
+ * subset of both tables.
+ */
+public class VerifyReplicationTool implements Tool {
+    private static final Logger LOG = LoggerFactory.getLogger(VerifyReplicationTool.class);
+
+    static final Option ZK_QUORUM_OPT =
+            new Option("z", "zookeeper", true, "ZooKeeper connection details (optional)");
+    static final Option TABLE_NAME_OPT =
+            new Option("t", "table", true, "Phoenix table name (required)");
+    static final Option TARGET_TABLE_NAME_OPT =
+            new Option("tt", "target-table", true, "Target Phoenix table name (optional)");
+    static final Option TARGET_ZK_QUORUM_OPT =
+            new Option("tz", "target-zookeeper", true,
+                    "Target ZooKeeper connection details (optional)");
+    static final Option CONDITIONS_OPT =
+            new Option("c", "conditions", true,
+                    "Conditions for select query WHERE clause (optional)");
+    static final Option TIMESTAMP =
+            new Option("ts", "timestamp", true,
+                    "Timestamp in millis used to compare the two tables. "
+                            + "Defaults to current time minus 60 seconds");
+
+    static final Option HELP_OPT = new Option("h", "help", false, "Show this help and quit");
+
+    private Configuration conf;
+
+    private String zkQuorum;
+    private String tableName;

[GitHub] phoenix pull request #309: [Do Not Merge] PHOENIX-3817 Verify Replication us...

2018-07-19 Thread akshita-malhotra
Github user akshita-malhotra commented on a diff in the pull request:

https://github.com/apache/phoenix/pull/309#discussion_r203943433
  
--- Diff: 
phoenix-core/src/it/java/org/apache/phoenix/mapreduce/VerifyReplicationToolIT.java
 ---
@@ -0,0 +1,323 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.phoenix.mapreduce;
+
+import java.io.IOException;
+import java.sql.*;
+import java.util.*;
+
+import com.google.common.collect.Maps;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.HRegionInfo;
+import org.apache.hadoop.hbase.MiniHBaseCluster;
+import org.apache.hadoop.hbase.ServerName;
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.HBaseAdmin;
+import org.apache.hadoop.hbase.master.HMaster;
+import org.apache.hadoop.hbase.regionserver.HRegionServer;
+import org.apache.hadoop.hbase.util.Bytes;
+import org.apache.hadoop.mapreduce.Counters;
+import org.apache.hadoop.mapreduce.Job;
+import org.apache.phoenix.end2end.BaseUniqueNamesOwnClusterIT;
+import org.apache.phoenix.util.EnvironmentEdgeManager;
+import org.apache.phoenix.util.ReadOnlyProps;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.phoenix.util.TestUtil.TEST_PROPERTIES;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotEquals;
+
+public class VerifyReplicationToolIT extends BaseUniqueNamesOwnClusterIT {
+    private static final Logger LOG = LoggerFactory.getLogger(VerifyReplicationToolIT.class);
+    private static final String CREATE_USER_TABLE = "CREATE TABLE IF NOT EXISTS %s ( " +
+            " TENANT_ID VARCHAR NOT NULL, USER_ID VARCHAR NOT NULL, AGE INTEGER " +
+            " CONSTRAINT pk PRIMARY KEY ( TENANT_ID, USER_ID ))";
+    private static final String UPSERT_USER = "UPSERT INTO %s VALUES (?, ?, ?)";
+    private static final String UPSERT_SELECT_USERS =
+            "UPSERT INTO %s SELECT TENANT_ID, USER_ID, %d FROM %s WHERE TENANT_ID = ? LIMIT %d";
+    private static final Random RANDOM = new Random();
+
+    private static int tenantNum = 0;
+    private static int userNum = 0;
+    private static String sourceTableName;
+    private static String targetTableName;
+    private List<String> sourceTenants;
+    private String sourceOnlyTenant;
+    private String sourceAndTargetTenant;
+    private String targetOnlyTenant;
+
+    @BeforeClass
+    public static void createTables() throws Exception {
+        NUM_SLAVES_BASE = 2;
+        Map<String, String> props = Maps.newHashMapWithExpectedSize(1);
+        setUpTestDriver(new ReadOnlyProps(props.entrySet().iterator()));
+        Connection conn = DriverManager.getConnection(getUrl());
+        sourceTableName = generateUniqueName();
+        targetTableName = generateUniqueName();
+        // tables will have the same schema, but a different number of regions
+        conn.createStatement().execute(String.format(CREATE_USER_TABLE, sourceTableName));
+        conn.createStatement().execute(String.format(CREATE_USER_TABLE, targetTableName));
+        conn.commit();
+    }
+
+    @Before
+    public void setupTenants() throws Exception {
+        sourceTenants = new ArrayList<>(2);
+        sourceTenants.add("tenant" + tenantNum++);
+        sourceTenants.add("tenant" + tenantNum++);
+        sourceOnlyTenant = sourceTenants.get(0);
+        sourceAndTargetTenant = sourceTenants.get(1);
+        targetOnlyTenant = "tenant" + tenantNum++;
+        upsertData();
+        split(sourceTableName, 4);
+        split(targetTableName, 2);
+        // ensure scans

[jira] [Commented] (PHOENIX-3817) VerifyReplication using SQL

2018-07-19 Thread Akshita Malhotra (JIRA)


[ 
https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549976#comment-16549976
 ] 

Akshita Malhotra commented on PHOENIX-3817:
---

[~gjacoby] Added support for the SCN setting in the latest patch.

> VerifyReplication using SQL
> ---
>
> Key: PHOENIX-3817
> URL: https://issues.apache.org/jira/browse/PHOENIX-3817
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Alex Araujo
>    Assignee: Akshita Malhotra
>Priority: Minor
> Fix For: 4.15.0
>
> Attachments: PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, 
> PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch, PHOENIX-3817.v5.patch, 
> PHOENIX-3817.v6.patch
>
>
> Certain use cases may copy or replicate a subset of a table to a different 
> table or cluster. For example, application topologies may map data for 
> specific tenants to different peer clusters.
> It would be useful to have a Phoenix VerifyReplication tool that accepts an 
> SQL query, a target table, and an optional target cluster. The tool would 
> compare data returned by the query on the different tables and update various 
> result counters (similar to HBase's VerifyReplication).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-3817) VerifyReplication using SQL

2018-07-19 Thread Akshita Malhotra (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-3817:
--
Attachment: PHOENIX-3817.v6.patch

> VerifyReplication using SQL
> ---
>
> Key: PHOENIX-3817
> URL: https://issues.apache.org/jira/browse/PHOENIX-3817
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Alex Araujo
>    Assignee: Akshita Malhotra
>Priority: Minor
> Fix For: 4.15.0
>
> Attachments: PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, 
> PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch, PHOENIX-3817.v5.patch, 
> PHOENIX-3817.v6.patch
>
>
> Certain use cases may copy or replicate a subset of a table to a different 
> table or cluster. For example, application topologies may map data for 
> specific tenants to different peer clusters.
> It would be useful to have a Phoenix VerifyReplication tool that accepts an 
> SQL query, a target table, and an optional target cluster. The tool would 
> compare data returned by the query on the different tables and update various 
> result counters (similar to HBase's VerifyReplication).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-3817) VerifyReplication using SQL

2018-07-10 Thread Akshita Malhotra (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-3817:
--
Attachment: (was: PHOENIX-3817.v4.patch)

> VerifyReplication using SQL
> ---
>
> Key: PHOENIX-3817
> URL: https://issues.apache.org/jira/browse/PHOENIX-3817
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Alex Araujo
>    Assignee: Akshita Malhotra
>Priority: Minor
> Fix For: 4.15.0
>
> Attachments: PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, 
> PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch, PHOENIX-3817.v5.patch
>
>
> Certain use cases may copy or replicate a subset of a table to a different 
> table or cluster. For example, application topologies may map data for 
> specific tenants to different peer clusters.
> It would be useful to have a Phoenix VerifyReplication tool that accepts an 
> SQL query, a target table, and an optional target cluster. The tool would 
> compare data returned by the query on the different tables and update various 
> result counters (similar to HBase's VerifyReplication).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-3817) VerifyReplication using SQL

2018-07-10 Thread Akshita Malhotra (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-3817:
--
Attachment: PHOENIX-3817.v5.patch

> VerifyReplication using SQL
> ---
>
> Key: PHOENIX-3817
> URL: https://issues.apache.org/jira/browse/PHOENIX-3817
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Alex Araujo
>    Assignee: Akshita Malhotra
>Priority: Minor
> Fix For: 4.15.0
>
> Attachments: PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, 
> PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch, PHOENIX-3817.v5.patch
>
>
> Certain use cases may copy or replicate a subset of a table to a different 
> table or cluster. For example, application topologies may map data for 
> specific tenants to different peer clusters.
> It would be useful to have a Phoenix VerifyReplication tool that accepts an 
> SQL query, a target table, and an optional target cluster. The tool would 
> compare data returned by the query on the different tables and update various 
> result counters (similar to HBase's VerifyReplication).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-3817) VerifyReplication using SQL

2018-07-10 Thread Akshita Malhotra (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-3817:
--
Attachment: PHOENIX-3817.v4.patch

> VerifyReplication using SQL
> ---
>
> Key: PHOENIX-3817
> URL: https://issues.apache.org/jira/browse/PHOENIX-3817
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Alex Araujo
>    Assignee: Akshita Malhotra
>Priority: Minor
> Fix For: 4.15.0
>
> Attachments: PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, 
> PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch, PHOENIX-3817.v4.patch
>
>
> Certain use cases may copy or replicate a subset of a table to a different 
> table or cluster. For example, application topologies may map data for 
> specific tenants to different peer clusters.
> It would be useful to have a Phoenix VerifyReplication tool that accepts an 
> SQL query, a target table, and an optional target cluster. The tool would 
> compare data returned by the query on the different tables and update various 
> result counters (similar to HBase's VerifyReplication).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] phoenix pull request #309: [Do Not Merge] PHOENIX-3817 Verify Replication us...

2018-07-10 Thread akshita-malhotra
GitHub user akshita-malhotra opened a pull request:

https://github.com/apache/phoenix/pull/309

[Do Not Merge] PHOENIX-3817 Verify Replication using SQL conditions



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akshita-malhotra/phoenix Phoenix3817

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/phoenix/pull/309.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #309


commit abf570eac4f7678148d1498f6140b74fa61e1bd3
Author: Akshita Malhotra 
Date:   2018-06-01T17:38:43Z

Verify Replication using SQL conditions




---


[jira] [Commented] (PHOENIX-4771) Deleting tenant rows using a global connection on the base table does not work.

2018-06-04 Thread Akshita Malhotra (JIRA)


[ 
https://issues.apache.org/jira/browse/PHOENIX-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500995#comment-16500995
 ] 

Akshita Malhotra commented on PHOENIX-4771:
---

Thanks [~tdsilva]

> Deleting tenant rows using a global connection on the base table does not 
> work.
> ---
>
> Key: PHOENIX-4771
> URL: https://issues.apache.org/jira/browse/PHOENIX-4771
> Project: Phoenix
>  Issue Type: Bug
>    Reporter: Akshita Malhotra
>Priority: Major
> Attachments: deletes.diff
>
>
> Phoenix point deletes on the base table using a global connection are silently 
> not deleting data created by a tenant view.
> Ques 1: Is this the right behavior?
> Ques 2: If yes, should Phoenix validate and throw an error/exception? If 
> no, should Phoenix delete the data correctly?
>  
> The attached test fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4771) Deleting tenant rows using a global connection on the base table does not work.

2018-06-04 Thread Akshita Malhotra (JIRA)


[ 
https://issues.apache.org/jira/browse/PHOENIX-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500963#comment-16500963
 ] 

Akshita Malhotra commented on PHOENIX-4771:
---

fyi, [~tdsilva] [~jamestaylor] [~gjacoby]

> Deleting tenant rows using a global connection on the base table does not 
> work.
> ---
>
> Key: PHOENIX-4771
> URL: https://issues.apache.org/jira/browse/PHOENIX-4771
> Project: Phoenix
>  Issue Type: Bug
>    Reporter: Akshita Malhotra
>Priority: Major
> Attachments: deletes.diff
>
>
> Phoenix point deletes on the base table using a global connection are silently 
> not deleting data created by a tenant view.
> Ques 1: Is this the right behavior?
> Ques 2: If yes, should Phoenix validate and throw an error/exception? If 
> no, should Phoenix delete the data correctly?
>  
> The attached test fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4771) Deleting tenant rows using a global connection on the base table does not work.

2018-06-04 Thread Akshita Malhotra (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-4771:
--
Description: 
Phoenix point deletes on the base table using a global connection are silently 
not deleting data created by a tenant view.

Ques 1: Is this the right behavior?
Ques 2: If yes, should Phoenix validate and throw an error/exception? If no, 
should Phoenix delete the data correctly?
 
The attached test fails.

  was:
Phoenix point deletes on the base table are silently not deleting data created 
by a tenant view.

Ques 1: Is this the right behavior?
Ques 2: If yes, should Phoenix validate and throw an error/exception? If no, 
should Phoenix delete the data correctly?
 
The attached test fails.


> Deleting tenant rows using a global connection on the base table does not 
> work.
> ---
>
> Key: PHOENIX-4771
> URL: https://issues.apache.org/jira/browse/PHOENIX-4771
> Project: Phoenix
>  Issue Type: Bug
>    Reporter: Akshita Malhotra
>Priority: Major
> Attachments: deletes.diff
>
>
> Phoenix point deletes on the base table using a global connection are silently 
> not deleting data created by a tenant view.
> Ques 1: Is this the right behavior?
> Ques 2: If yes, should Phoenix validate and throw an error/exception? If 
> no, should Phoenix delete the data correctly?
>  
> The attached test fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4771) Deleting tenant rows using a global connection on the base table does not work.

2018-06-04 Thread Akshita Malhotra (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-4771:
--
Summary: Deleting tenant rows using a global connection on the base table 
does not work.  (was: Deleting tenant rows using a tenant connection on the 
base table does not work.)

> Deleting tenant rows using a global connection on the base table does not 
> work.
> ---
>
> Key: PHOENIX-4771
> URL: https://issues.apache.org/jira/browse/PHOENIX-4771
> Project: Phoenix
>  Issue Type: Bug
>    Reporter: Akshita Malhotra
>Priority: Major
> Attachments: deletes.diff
>
>
> Phoenix point deletes on the base table are silently not deleting data created 
> by a tenant view.
> Ques 1: Is this the right behavior?
> Ques 2: If yes, should Phoenix validate and throw an error/exception? If 
> no, should Phoenix delete the data correctly?
>  
> The attached test fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PHOENIX-4771) Deleting tenant rows using a tenant connection on the base table does not work.

2018-06-04 Thread Akshita Malhotra (JIRA)
Akshita Malhotra created PHOENIX-4771:
-

 Summary: Deleting tenant rows using a tenant connection on the 
base table does not work.
 Key: PHOENIX-4771
 URL: https://issues.apache.org/jira/browse/PHOENIX-4771
 Project: Phoenix
  Issue Type: Bug
Reporter: Akshita Malhotra
 Attachments: deletes.diff

Phoenix point deletes on the base table are silently not deleting data created 
by a tenant view.

Ques 1: Is this the right behavior?
Ques 2: If yes, should Phoenix validate and throw an error/exception? If no, 
should Phoenix delete the data correctly?
 
The attached test fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-3817) VerifyReplication using SQL

2018-06-04 Thread Akshita Malhotra (JIRA)


[ 
https://issues.apache.org/jira/browse/PHOENIX-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16500599#comment-16500599
 ] 

Akshita Malhotra commented on PHOENIX-3817:
---

[~alexaraujo] From the various tests I have run, it seems certain assumptions 
are being made in the Multi-Table RecordReader approach. For example, when 
setting the start row for a target region scan based on the source scan's start 
row, if the target start row is strictly greater and the target scan is smaller 
than the source scan, this approach fails to determine the correct number of 
good/bad rows (a subset scenario). Similarly, it yields incorrect results if 
there are holes in the target scan, which is a likely error scenario if a 
MapReduce job nondeterministically discards processed rows (not very likely in 
our migration scenario, but possible with M/R generally).

I was going through the HBase VerifyReplication approach; one way to resolve 
these issues would be to do something similar, i.e. for every source row 
processed, open a corresponding target scan (start row = current source row, 
end row = source split end row), thereby eliminating the need for a multi-table 
record reader.

fyi, [~gjacoby]
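The per-source-row strategy described above can be sketched with plain Java collections standing in for HBase scans. This is an illustrative model only, not Phoenix code: the class, method, and counter names (`VerifySketch`, `verify`, `GOODROWS`, `BADROWS`) are invented for the sketch.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of the per-source-row verification strategy: both tables are
// sorted key->value maps; for each source row in a split we probe the target
// directly instead of advancing two independent readers in lock-step, so
// holes or extra rows in the target cannot desynchronize the comparison.
public class VerifySketch {
    public static Map<String, Integer> verify(NavigableMap<String, String> source,
                                              NavigableMap<String, String> target,
                                              String splitStart, String splitEnd) {
        Map<String, Integer> counters = new TreeMap<>();
        counters.put("GOODROWS", 0);
        counters.put("BADROWS", 0);
        // rows of the source split [splitStart, splitEnd)
        for (Map.Entry<String, String> row
                : source.subMap(splitStart, true, splitEnd, false).entrySet()) {
            // "target scan" anchored at the current source row's key
            String targetValue = target.get(row.getKey());
            boolean match = row.getValue().equals(targetValue);
            counters.merge(match ? "GOODROWS" : "BADROWS", 1, Integer::sum);
        }
        return counters;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> src = new TreeMap<>();
        NavigableMap<String, String> tgt = new TreeMap<>();
        src.put("a", "1"); src.put("b", "2"); src.put("c", "3");
        tgt.put("a", "1"); tgt.put("c", "9"); // "b" is a hole, "c" differs
        System.out.println(verify(src, tgt, "a", "d")); // {BADROWS=2, GOODROWS=1}
    }
}
```

Because every source row probes the target within the split's bounds, a hole in the target affects only its own comparison rather than shifting all subsequent row pairings.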

> VerifyReplication using SQL
> ---
>
> Key: PHOENIX-3817
> URL: https://issues.apache.org/jira/browse/PHOENIX-3817
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Alex Araujo
>Assignee: Alex Araujo
>Priority: Minor
> Fix For: 4.15.0
>
> Attachments: PHOENIX-3817.v1.patch, PHOENIX-3817.v2.patch, 
> PHOENIX-3817.v3.patch, PHOENIX-3817.v4.patch
>
>
> Certain use cases may copy or replicate a subset of a table to a different 
> table or cluster. For example, application topologies may map data for 
> specific tenants to different peer clusters.
> It would be useful to have a Phoenix VerifyReplication tool that accepts an 
> SQL query, a target table, and an optional target cluster. The tool would 
> compare data returned by the query on the different tables and update various 
> result counters (similar to HBase's VerifyReplication).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4667) Create index on a view should return error if any of the REPLICATION_SCOPE/TTL/KEEP_DELETED_CELLS attributes are set

2018-03-21 Thread Akshita Malhotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-4667:
--
Description: As the physical view index table is shared, CREATE INDEX statements 
on a view should return an error if the user tries to set attributes that affect 
the physical table, such as REPLICATION_SCOPE, TTL, KEEP_DELETED_CELLS, etc.  
(was: As the physical view index table is shared, CREATE INDEX statements on a 
view should return an error if the user tries to set attributes that affect the 
physical table, such as SOR settings, TTL, KEEP_DELETED_CELLS, etc.)
Summary: Create index on a view should return error if any of the 
REPLICATION_SCOPE/TTL/KEEP_DELETED_CELLS attributes are set  (was: Create index 
on a view should return error if any of the SOR/TTL/KEEP_DELETED_CELLS 
attributes are set)

> Create index on a view should return error if any of the 
> REPLICATION_SCOPE/TTL/KEEP_DELETED_CELLS attributes are set
> 
>
> Key: PHOENIX-4667
> URL: https://issues.apache.org/jira/browse/PHOENIX-4667
> Project: Phoenix
>  Issue Type: Bug
>    Reporter: Akshita Malhotra
>Priority: Minor
>  Labels: index, schema
> Fix For: 4.13.0, 4.14.0
>
>
> As the physical view index table is shared, CREATE INDEX statements on a view 
> should return an error if the user tries to set attributes that affect the 
> physical table, such as REPLICATION_SCOPE, TTL, KEEP_DELETED_CELLS, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PHOENIX-4667) Create index on a view should return error if any of the SOR/TTL/KEEP_DELETED_CELLS attributes are set

2018-03-21 Thread Akshita Malhotra (JIRA)
Akshita Malhotra created PHOENIX-4667:
-

 Summary: Create index on a view should return error if any of the 
SOR/TTL/KEEP_DELETED_CELLS attributes are set
 Key: PHOENIX-4667
 URL: https://issues.apache.org/jira/browse/PHOENIX-4667
 Project: Phoenix
  Issue Type: Bug
Reporter: Akshita Malhotra
 Fix For: 4.13.0, 4.14.0


As the physical view index table is shared, CREATE INDEX statements on a view 
should return an error if the user tries to set attributes that affect the 
physical table, such as SOR settings, TTL, KEEP_DELETED_CELLS, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (PHOENIX-4623) Inconsistent physical view index name

2018-02-21 Thread Akshita Malhotra (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372211#comment-16372211
 ] 

Akshita Malhotra edited comment on PHOENIX-4623 at 2/21/18 11:31 PM:
-

[~jamestaylor] As per an offline discussion with [~tdsilva], this seems to be a 
naming bug in the creation of the physical view index table, unless it was 
intended, which doesn't seem plausible. A simple bug fix would be to modify the 
getViewIndexName API to return "_IDX_SCH.TABLE".

Might need to follow up on other implications of this.


was (Author: akshita.malhotra):
[~jamestaylor] As per an offline discussion with [~tdsilva], this seems to be a 
naming bug in the creation of the physical view index table, unless it was 
intended, which doesn't seem plausible. A simple bug fix would be to modify the 
getViewIndexName API to return "_IDX_SCH.TABLE".

Due to this, Hgrate is not correctly identifying the physical view indexes. What 
could be other implications of this?

> Inconsistent physical view index name
> -
>
> Key: PHOENIX-4623
> URL: https://issues.apache.org/jira/browse/PHOENIX-4623
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.13.0
>Reporter: Akshita Malhotra
>Priority: Major
>  Labels: easyfix
> Fix For: 4.14.0
>
>
> The physical view indexes are incorrectly named when table has a schema. For 
> instance, if a table name is "SCH.TABLE", during creation the physical index 
> table is named as "_IDX_SCH.TABLE" which doesn't look right. In case 
> namespaces are enabled, the physical index table is named as "SCH:_IDX_TABLE"
> The client APIs, on the other hand, such as 
> MetaDataUtil.getViewIndexName(String schemaName, String tableName), which 
> retrieves the physical view index name, return "SCH._IDX_TABLE", which as per 
> convention is the right name but functionally leads to wrong results, as 
> this is not how the physical indexes are named during construction.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4623) Inconsistent physical view index name

2018-02-21 Thread Akshita Malhotra (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372211#comment-16372211
 ] 

Akshita Malhotra commented on PHOENIX-4623:
---

[~jamestaylor] As per an offline discussion with [~tdsilva], this seems to be a 
naming bug in the creation of the physical view index table, unless it was 
intended, which doesn't seem plausible. A simple bug fix would be to modify the 
getViewIndexName API to return "_IDX_SCH.TABLE".

Due to this, Hgrate is not correctly identifying the physical view indexes. What 
could be other implications of this?

> Inconsistent physical view index name
> -
>
> Key: PHOENIX-4623
> URL: https://issues.apache.org/jira/browse/PHOENIX-4623
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.13.0
>    Reporter: Akshita Malhotra
>Priority: Major
>  Labels: easyfix
> Fix For: 4.14.0
>
>
> The physical view indexes are incorrectly named when table has a schema. For 
> instance, if a table name is "SCH.TABLE", during creation the physical index 
> table is named as "_IDX_SCH.TABLE" which doesn't look right. In case 
> namespaces are enabled, the physical index table is named as "SCH:_IDX_TABLE"
> The client APIs, on the other hand, such as 
> MetaDataUtil.getViewIndexName(String schemaName, String tableName), which 
> retrieves the physical view index name, return "SCH._IDX_TABLE", which as per 
> convention is the right name but functionally leads to wrong results, as 
> this is not how the physical indexes are named during construction.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PHOENIX-4623) Inconsistent physical view index name

2018-02-21 Thread Akshita Malhotra (JIRA)
Akshita Malhotra created PHOENIX-4623:
-

 Summary: Inconsistent physical view index name
 Key: PHOENIX-4623
 URL: https://issues.apache.org/jira/browse/PHOENIX-4623
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.13.0
Reporter: Akshita Malhotra
 Fix For: 4.14.0


The physical view indexes are incorrectly named when the table has a schema. For 
instance, if a table name is "SCH.TABLE", during creation the physical index 
table is named "_IDX_SCH.TABLE", which doesn't look right. In case namespaces 
are enabled, the physical index table is named "SCH:_IDX_TABLE".

The client APIs, on the other hand, such as MetaDataUtil.getViewIndexName(String 
schemaName, String tableName), which retrieves the physical view index name, 
return "SCH._IDX_TABLE", which as per convention is the right name but 
functionally leads to wrong results, as this is not how the physical indexes 
are named during construction.
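The mismatch described above reduces to two competing string constructions. The sketch below only models those two naming conventions side by side; the helper names are invented for illustration and are not the actual MetaDataUtil code.

```java
// Illustrative model of the two view-index naming schemes described in the
// report. "_IDX_" is Phoenix's view-index prefix; the helpers are made up
// for this sketch and do not reproduce the real MetaDataUtil implementation.
public class ViewIndexNames {
    static final String PREFIX = "_IDX_";

    // How the physical table is reportedly named at creation time
    static String physicalName(String schema, String table) {
        return PREFIX + schema + "." + table;   // e.g. "_IDX_SCH.TABLE"
    }

    // How the client API (getViewIndexName) reportedly builds the name
    static String clientName(String schema, String table) {
        return schema + "." + PREFIX + table;   // e.g. "SCH._IDX_TABLE"
    }

    public static void main(String[] args) {
        // The two conventions disagree, so lookups by the client-built name miss.
        System.out.println(physicalName("SCH", "TABLE")); // _IDX_SCH.TABLE
        System.out.println(clientName("SCH", "TABLE"));   // SCH._IDX_TABLE
    }
}
```

The proposed fix amounts to making the client-side helper emit the first form instead of the second.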

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4344) MapReduce Delete Support

2018-02-13 Thread Akshita Malhotra (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363286#comment-16363286
 ] 

Akshita Malhotra commented on PHOENIX-4344:
---

[~jamestaylor] Can you explain why it would do a point scan? Maybe I am 
thinking in the wrong direction, but as [~gjacoby] explained, even if the 
initial delete is over a non-PK column, when a point Phoenix delete 
query is issued, I can provide the PK information (obtained from the map 
reduce scan) along with the extra predicate that includes the non-PK 
column.
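The idea above, reusing the PK values captured by the MapReduce scan and re-applying the non-PK predicate, amounts to building one point DELETE per row. A minimal sketch follows; the helper name and shape are assumptions for illustration, not the patch's actual code.

```java
import java.util.List;

// Sketch: build a parameterized point-delete statement from the row's PK
// columns (values to be bound from the MapReduce scan) plus the original
// non-PK predicate, so each delete targets exactly one row.
public class PointDeleteSketch {
    public static String pointDelete(String table, List<String> pkCols, String nonPkPredicate) {
        StringBuilder sql = new StringBuilder("DELETE FROM ").append(table).append(" WHERE ");
        for (int i = 0; i < pkCols.size(); i++) {
            if (i > 0) sql.append(" AND ");
            sql.append(pkCols.get(i)).append(" = ?");   // one placeholder per PK column
        }
        if (nonPkPredicate != null && !nonPkPredicate.isEmpty()) {
            sql.append(" AND ").append(nonPkPredicate); // carry over the non-PK condition
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(pointDelete("MY_TABLE", List.of("TENANT_ID", "USER_ID"), "AGE > 30"));
        // DELETE FROM MY_TABLE WHERE TENANT_ID = ? AND USER_ID = ? AND AGE > 30
    }
}
```

Binding the PK values per row keeps each DELETE a point operation, while the appended predicate preserves the original non-PK filter.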

> MapReduce Delete Support
> 
>
> Key: PHOENIX-4344
> URL: https://issues.apache.org/jira/browse/PHOENIX-4344
> Project: Phoenix
>  Issue Type: New Feature
>Affects Versions: 4.12.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>Priority: Major
>
> Phoenix already has the ability to use MapReduce for asynchronous handling of 
> long-running SELECTs. It would be really useful to have this capability for 
> long-running DELETEs, particularly of tables with indexes where using HBase's 
> own MapReduce integration would be prohibitively complicated. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (PHOENIX-4353) Constraint violation error in Snapshot based index rebuild job

2017-11-07 Thread Akshita Malhotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra reassigned PHOENIX-4353:
-

Assignee: Akshita Malhotra

> Constraint violation error in Snapshot based index rebuild job
> --
>
> Key: PHOENIX-4353
> URL: https://issues.apache.org/jira/browse/PHOENIX-4353
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.13.0
>Reporter: Monani Mihir
>Assignee: Akshita Malhotra
>Priority: Critical
>
> When we try to rebuild an index with a data table snapshot, many mappers fail 
> with ERROR 218 (23018): Constraint violation. Example below.
> Command to run the snapshot-based index job:
> bin/hbase org.apache.phoenix.mapreduce.index.IndexTool -it DATA_INDEX -dt 
> DATA -s SCHEMA -snap -op /TEST/DATA_INDEX
> Mappers fail with this error:
> {code}
> 2017-11-06 09:25:24,380 INFO  [main] regionserver.HRegion - Onlined 
> eac5484a276e8d942e9eebf8275f114f; next sequenceid=18399282
> 2017-11-06 09:25:24,522 ERROR [main] index.PhoenixIndexImportMapper - Error 
> ERROR 218 (23018): Constraint violation. SCHEMA.DATA_INDEX.:DATA_ID may not 
> be null  while read/write of a record 
> 2017-11-06 09:25:24,545 INFO  [42e9eebf8275f114f.-1] regionserver.HStore - 
> Closed 0
> 2017-11-06 09:25:24,546 INFO  [main] regionserver.HRegion - Closed 
> SCHEMA.DATA_INDEX,userID1234orgid1234,1509939061852.eac5484a276e8d942e9eebf8275f114f.
> 2017-11-06 09:25:24,547 INFO  [main] mapred.MapTask - Starting flush of map 
> output
> 2017-11-06 09:25:24,557 INFO  [main] compress.CodecPool - Got brand-new 
> compressor [.snappy]
> 2017-11-06 09:25:24,560 WARN  [main] mapred.YarnChild - Exception running 
> child : java.lang.RuntimeException: java.sql.SQLException: ERROR 218 (23018): 
> Constraint violation. SCHEMA.DATA_INDEX.:DATA_ID may not be null
> at 
> org.apache.phoenix.mapreduce.index.PhoenixIndexImportMapper.map(PhoenixIndexImportMapper.java:122)
> at 
> org.apache.phoenix.mapreduce.index.PhoenixIndexImportMapper.map(PhoenixIndexImportMapper.java:48)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1751)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
> Caused by: java.sql.SQLException: ERROR 218 (23018): Constraint violation. 
> SCHEMA.DATA_INDEX.:DATA_ID may not be null
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:488)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
> at 
> org.apache.phoenix.schema.ConstraintViolationException.(ConstraintViolationException.java:39)
> at org.apache.phoenix.schema.PTableImpl.newKey(PTableImpl.java:753)
> at 
> org.apache.phoenix.compile.UpsertCompiler.setValues(UpsertCompiler.java:154)
> at 
> org.apache.phoenix.compile.UpsertCompiler.access$500(UpsertCompiler.java:116)
> at 
> org.apache.phoenix.compile.UpsertCompiler$4.execute(UpsertCompiler.java:1078)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:393)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:376)
> at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:374)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:363)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:269)
> at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:172)
> at 
> org.apache.phoenix.jdbc.PhoenixPreparedStatement.execute(PhoenixPreparedStatement.java:177)
> at 
> org.apache.phoenix.mapreduce.index.PhoenixIndexImportMapper.map(PhoenixIndexImportMapper.java:101)
> ... 9 more
> 2017-11-06 09:25:24,563 INFO  [main] mapred.Task - Runnning cleanup for the 
> task
> 2017-11-06 09:25:24,564 WARN  [main] output.FileOutputCommitter - Could not 
> delete 
> hdfs://hdfs-local/TEST/DATA_INDEX/SCHEMA.DATA_INDEX/_temporary/1/_temporary/attempt_1508241002000_5658_m_14_0
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (PHOENIX-4355) Snapshot-based index rebuild job won't work for two index tables of the same data table in parallel

2017-11-07 Thread Akshita Malhotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra reassigned PHOENIX-4355:
-

Assignee: Akshita Malhotra

> Snapshot-based index rebuild job won't work for two index tables of the same 
> data table in parallel
> -
>
> Key: PHOENIX-4355
> URL: https://issues.apache.org/jira/browse/PHOENIX-4355
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.13.0
>Reporter: Monani Mihir
>    Assignee: Akshita Malhotra
>Priority: Minor
>
> Run the index rebuild job for one index:
> {code}
> bin/hbase org.apache.phoenix.mapreduce.index.IndexTool -it DATA_INDEX_1 -dt 
> DATA -s SCHEMA -snap -op /TEST/DATA_INDEX_1
> {code}
> Then run the index rebuild job for another index with the same source data table:
> {code}
> bin/hbase org.apache.phoenix.mapreduce.index.IndexTool -it DATA_INDEX_2 -dt 
> DATA -s SCHEMA -snap -op /TEST/DATA_INDEX_1
> {code}
> The second command fails without triggering the MR job. Once you delete the 
> previous MR job's snapshot, it is able to run. 
> It fails with the below error:
> {code}
> 2017-11-06 06:38:25,122 DEBUG [main] security.HBaseSaslRpcClient - Will send 
> token of size 0 from initSASLContext.
> 2017-11-06 06:38:25,122 DEBUG [main] security.HBaseSaslRpcClient - Will read 
> input token of size 32 for processing by initSASLContext
> 2017-11-06 06:38:25,122 DEBUG [main] security.HBaseSaslRpcClient - Will send 
> token of size 32 from initSASLContext.
> 2017-11-06 06:38:25,122 DEBUG [main] security.HBaseSaslRpcClient - SASL 
> client context established. Negotiated QoP: auth
> 2017-11-06 06:38:26,819 ERROR [main] index.IndexTool - utureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1512)
> at 
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1714)
> at 
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1784)
> at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.snapshot(MasterProtos.java:47487)
> at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.snapshot(HConnectionManager.java:2146)
> at org.apache.hadoop.hbase.client.HBaseAdmin$28.call(HBaseAdmin.java:2882)
> at org.apache.hadoop.hbase.client.HBaseAdmin$28.call(HBaseAdmin.java:2879)
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:125)
> ... 13 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (PHOENIX-4354) Mappers fail in snapshot-based index rebuilding job

2017-11-07 Thread Akshita Malhotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra reassigned PHOENIX-4354:
-

Assignee: Akshita Malhotra

> Mappers fail in snapshot-based index rebuilding job
> 
>
> Key: PHOENIX-4354
> URL: https://issues.apache.org/jira/browse/PHOENIX-4354
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.13.0
>Reporter: Monani Mihir
>Assignee: Akshita Malhotra
>
> Command to run the snapshot-based index job:
> bin/hbase org.apache.phoenix.mapreduce.index.IndexTool -it DATA_INDEX -dt 
> DATA -s SCHEMA -snap -op /TEST/DATA_INDEX
> {code}
> 2017-11-06 09:25:25,054 WARN  [oreSnapshot-pool6-t1] backup.HFileArchiver - 
> Failed to archive class 
> org.apache.hadoop.hbase.backup.HFileArchiver$FileablePath, 
> file:hdfs://hdfs-local/index-snapshot-dir/restore-dir/ed465e0f-002e-43b3-8ec4-133e81c4e3ea/data/default/SCHEMA.DATA/0b93e3fcba18cf281cc147a08fc4656f/0/SCHEMA.DATA=0b93e3fcba18cf281cc147a08fc4656f-14aa829f6e63460fab309cd1f32b9627
>  on try #2
> java.io.FileNotFoundException: File/Directory 
> /index-snapshot-dir/restore-dir/ed465e0f-002e-43b3-8ec4-133e81c4e3ea/data/default/SCHEMA.DATA/0b93e3fcba18cf281cc147a08fc4656f/0/SCHEMA.DATA=0b93e3fcba18cf281cc147a08fc4656f-14aa829f6e63460fab309cd1f32b9627
>  does not exist.
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setTimes(FSDirAttrOp.java:123)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setTimes(FSNamesystem.java:1921)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setTimes(NameNodeRpcServer.java:1223)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setTimes(ClientNamenodeProtocolServerSideTranslatorPB.java:915)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1751)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
> at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
> at org.apache.hadoop.hdfs.DFSClient.setTimes(DFSClient.java:3167)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$31.doCall(DistributedFileSystem.java:1548)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$31.doCall(DistributedFileSystem.java:1544)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.setTimes(DistributedFileSystem.java:1544)
> at 
> org.apache.hadoop.hbase.util.FSUtils.renameAndSetModifyTime(FSUtils.java:1964)
> at 
> org.apache.hadoop.hbase.backup.HFileArchiver$File.moveAndClose(HFileArchiver.java:586)
> at 
> org.apache.hadoop.hbase.backup.HFileArchiver.resolveAndArchiveFile(HFileArchiver.java:425)
> at 
> org.apache.hadoop.hbase.backup.HFileArchiver.archiveStoreFile(HFileArchiver.java:260)
> at 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreRegion(RestoreSnapshotHelper.java:445)
> at 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.access$300(RestoreSnapshotHelper.java:110)
> at 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper$2.editRegion(RestoreSnapshotHelper.java:393)
> at 
> org.apache.hadoop.hbase.util.ModifyRegionUtils$2.call(ModifyRegionUtils.java:215)
> at 
> org.apache.hadoop.hbase.util.ModifyRegionUtils$2.call(ModifyRegionUtils.java:212)
> at java.util.concurrent.FutureTask.run(FutureTask.java

[jira] [Commented] (PHOENIX-4003) Document how to use snapshots for MR

2017-09-08 Thread Akshita Malhotra (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159252#comment-16159252
 ] 

Akshita Malhotra commented on PHOENIX-4003:
---

[~pconrad] I will create the first draft outlining the API definition/use case 
and then follow up. Thanks!

> Document how to use snapshots for MR
> 
>
> Key: PHOENIX-4003
> URL: https://issues.apache.org/jira/browse/PHOENIX-4003
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>    Assignee: Akshita Malhotra
>
> Now that PHOENIX-3744 is resolved and released, we should update our website 
> to let users know how to take advantage of this cool new feature (i.e. the new 
> snapshot argument to IndexTool). This could be added to a couple of places: 
> http://phoenix.apache.org/phoenix_mr.html and maybe here: 
> http://phoenix.apache.org/pig_integration.html (is there a way to use 
> snapshots through our Pig integration? If not, we should file a JIRA and do 
> this).
> Directions to update the website are here: 
> http://phoenix.apache.org/building_website.html



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4161) TableSnapshotReadsMapReduceIT shouldn't need to run its own mini cluster

2017-09-08 Thread Akshita Malhotra (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159250#comment-16159250
 ] 

Akshita Malhotra commented on PHOENIX-4161:
---

Looking into the issue.

> TableSnapshotReadsMapReduceIT shouldn't need to run its own mini cluster
> 
>
> Key: PHOENIX-4161
> URL: https://issues.apache.org/jira/browse/PHOENIX-4161
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Samarth Jain
>    Assignee: Akshita Malhotra
>
> In PHOENIX-4141, I made a few attempts to get TableSnapshotReadsMapReduceIT 
> to pass, but finally had to resort to running the test in its own mini 
> cluster. I don't see any reason why we should need to, though. 
> [~akshita.malhotra] - can you please take a look. 
> Below are the errors I saw in logs:
> {code}
> java.lang.Exception: java.lang.IllegalArgumentException: Filesystems for 
> restore directory and HBase root directory should be the same
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
> Caused by: java.lang.IllegalArgumentException: Filesystems for restore 
> directory and HBase root directory should be the same
>   at 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:716)
>   at 
> org.apache.phoenix.iterate.TableSnapshotResultIterator.init(TableSnapshotResultIterator.java:77)
>   at 
> org.apache.phoenix.iterate.TableSnapshotResultIterator.(TableSnapshotResultIterator.java:73)
>   at 
> org.apache.phoenix.mapreduce.PhoenixRecordReader.initialize(PhoenixRecordReader.java:126)
>   at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> {code}
> Caused by: java.lang.IllegalArgumentException: Restore directory cannot be a 
> sub directory of HBase root directory. RootDir: 
> hdfs://localhost:45485/user/jenkins/test-data/3fe1b641-9d14-4053-b3e6-a811035e34b0,
>  restoreDir: 
> hdfs://localhost:45485/user/jenkins/test-data/3fe1b641-9d14-4053-b3e6-a811035e34b0/FOO/3eb31efb-b541-4b75-b98f-4558ddf5994e
>   at 
> org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:720)
>   at 
> org.apache.phoenix.iterate.TableSnapshotResultIterator.init(TableSnapshotResultIterator.java:77)
>   at 
> org.apache.phoenix.iterate.TableSnapshotResultIterator.(TableSnapshotResultIterator.java:73)
>   at 
> org.apache.phoenix.mapreduce.PhoenixRecordReader.initialize(PhoenixRecordReader.java:126)
>   at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
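The two failure modes in the stack traces above amount to two constraints on the restore directory. A simplified check, using plain URI parsing as a stand-in for Hadoop's FileSystem/Path APIs, might look like:

```python
# Simplified model of the two constraints that
# RestoreSnapshotHelper.copySnapshotForScanner enforces, per the errors above.
# Real code would use Hadoop's FileSystem/Path APIs; urlparse is only a
# stand-in for illustration.
from urllib.parse import urlparse

def check_restore_dir(root_dir, restore_dir):
    root, rest = urlparse(root_dir), urlparse(restore_dir)
    # Constraint 1: both directories must be on the same filesystem.
    if (root.scheme, root.netloc) != (rest.scheme, rest.netloc):
        raise ValueError("Filesystems for restore directory and "
                         "HBase root directory should be the same")
    # Constraint 2: the restore dir must not be nested under the root dir.
    if rest.path.rstrip("/").startswith(root.path.rstrip("/") + "/"):
        raise ValueError("Restore directory cannot be a sub directory "
                         "of HBase root directory")

# Same filesystem, not nested under the root: passes.
check_restore_dir("hdfs://localhost:45485/hbase-root", "hdfs://localhost:45485/tmp/restore")
```

This suggests the mini-cluster workaround was masking a test-setup issue: the test's restore directory was either on a different filesystem (local vs. HDFS) or placed under the HBase root directory.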



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (PHOENIX-3976) Validate Index ASYNC job complete when building off a data table snapshot

2017-06-26 Thread Akshita Malhotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra reassigned PHOENIX-3976:
-

Assignee: Akshita Malhotra

> Validate Index ASYNC job complete when building off a data table snapshot
> -
>
> Key: PHOENIX-3976
> URL: https://issues.apache.org/jira/browse/PHOENIX-3976
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Samarth Jain
>    Assignee: Akshita Malhotra
>
> [~akshita.malhotra] had this good idea of validating whether an async index 
> build job has completed successfully by comparing the 
> PhoenixJobCounters.INPUT_RECORDS with the number of expected rows. This would 
> be especially helpful when we are building the index using a data table 
> snapshot. Since the data table snapshot won't be taking any writes, it should 
> be correct and hopefully relatively easy to verify that the number of rows in 
> the data table snapshot is equal to the PhoenixJobCounters.INPUT_RECORDS 
> counter.
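The check proposed above is simple to state. In the sketch below, the counters dict stands in for the MapReduce Job counters API (an assumption for illustration); the counter name mirrors PhoenixJobCounters.INPUT_RECORDS:

```python
# Sketch of the validation idea: since the snapshot takes no writes, the
# job's INPUT_RECORDS counter should equal the snapshot's row count.
def index_build_complete(job_counters, expected_rows):
    input_records = job_counters.get("PhoenixJobCounters.INPUT_RECORDS", -1)
    return input_records == expected_rows

counters = {"PhoenixJobCounters.INPUT_RECORDS": 1_000_000}
assert index_build_complete(counters, 1_000_000)
assert not index_build_complete(counters, 999_999)
```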



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (PHOENIX-3812) Use HBase snapshots in async index building M/R job

2017-06-09 Thread Akshita Malhotra (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045213#comment-16045213
 ] 

Akshita Malhotra edited comment on PHOENIX-3812 at 6/9/17 11:29 PM:


[~jamestaylor] Thanks for the comment.
I have updated and uploaded two patches:
PHOENIX-3812.patch applies cleanly to master and 1.1 branch.
PHOENIX-3812-4.x-0.98.patch is for 4.x-0.98 branch.



was (Author: akshita.malhotra):
[~jamestaylor] Thanks for the comment.
I have uploaded two patches:
PHOENIX-3812.patch applies cleanly to master and 1.1 branch.
PHOENIX-3812-4.x-0.98.patch is for 4.x-0.98 branch.


> Use HBase snapshots in async index building M/R job
> ---
>
> Key: PHOENIX-3812
> URL: https://issues.apache.org/jira/browse/PHOENIX-3812
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.10.0
>Reporter: Maddineni Sukumar
>Assignee: Akshita Malhotra
> Attachments: PHOENIX-3812-4.x-0.98.patch, PHOENIX-3812.patch
>
>
> As per discussion with James, HBase snapshots make it a lot easier and faster 
> to operate on existing data. 
> So explore using HBase snapshots in the index-building M/R job for async 
> indexes. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3812) Use HBase snapshots in async index building M/R job

2017-06-09 Thread Akshita Malhotra (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045213#comment-16045213
 ] 

Akshita Malhotra commented on PHOENIX-3812:
---

[~jamestaylor] Thanks for the comment.
I have uploaded two patches:
PHOENIX-3812.patch applies cleanly to master and 1.1 branch.
PHOENIX-3812-4.x-0.98.patch is for 4.x-0.98 branch.


> Use HBase snapshots in async index building M/R job
> ---
>
> Key: PHOENIX-3812
> URL: https://issues.apache.org/jira/browse/PHOENIX-3812
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.10.0
>Reporter: Maddineni Sukumar
>Assignee: Akshita Malhotra
> Attachments: PHOENIX-3812-4.x-0.98.patch, PHOENIX-3812.patch
>
>
> As per discussion with James, HBase snapshots make it a lot easier and faster 
> to operate on existing data. 
> So explore using HBase snapshots in the index-building M/R job for async 
> indexes. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3812) Use HBase snapshots in async index building M/R job

2017-06-09 Thread Akshita Malhotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-3812:
--
Attachment: PHOENIX-3812-4.x-0.98.patch

> Use HBase snapshots in async index building M/R job
> ---
>
> Key: PHOENIX-3812
> URL: https://issues.apache.org/jira/browse/PHOENIX-3812
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.10.0
>Reporter: Maddineni Sukumar
>Assignee: Akshita Malhotra
> Attachments: PHOENIX-3812-4.x-0.98.patch, PHOENIX-3812.patch
>
>
> As per discussion with James, HBase snapshots make it a lot easier and faster 
> to operate on existing data. 
> So explore using HBase snapshots in the index-building M/R job for async 
> indexes. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3812) Use HBase snapshots in async index building M/R job

2017-06-09 Thread Akshita Malhotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-3812:
--
Attachment: PHOENIX-3812.patch

> Use HBase snapshots in async index building M/R job
> ---
>
> Key: PHOENIX-3812
> URL: https://issues.apache.org/jira/browse/PHOENIX-3812
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.10.0
>Reporter: Maddineni Sukumar
>Assignee: Akshita Malhotra
> Attachments: PHOENIX-3812.patch
>
>
> As per discussion with James, HBase snapshots make it a lot easier and faster 
> to operate on existing data. 
> So explore using HBase snapshots in the index-building M/R job for async 
> indexes. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] phoenix pull request #260: PHOENIX-3812: Use HBase snapshots in async index ...

2017-06-09 Thread akshita-malhotra
GitHub user akshita-malhotra opened a pull request:

https://github.com/apache/phoenix/pull/260

PHOENIX-3812: Use HBase snapshots in async index building M/R job

- IndexTool creates a snapshot and passes it as a configuration parameter to 
run the index M/R job over the HBase snapshot.
- Add an option to configure the use of snapshots in IndexTool

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akshita-malhotra/phoenix PHOENIX-3812

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/phoenix/pull/260.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #260


commit 57fb264ba76e849db3bc4375f87091499cbce618
Author: Akshita <akshita.malho...@salesforce.com>
Date:   2017-06-07T23:14:47Z

PHOENIX-3812: Use HBase snapshots in async index building M/R job




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (PHOENIX-3744) Support snapshot scanners for MR-based queries

2017-06-05 Thread Akshita Malhotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-3744:
--
Attachment: PHOENIX-3744-4.x-HBase-1.1.patch

> Support snapshot scanners for MR-based queries
> --
>
> Key: PHOENIX-3744
> URL: https://issues.apache.org/jira/browse/PHOENIX-3744
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>    Assignee: Akshita Malhotra
> Attachments: PHOENIX-3744-4.x-HBase-0.98.patch, 
> PHOENIX-3744-4.x-HBase-1.1.patch, PHOENIX-3744.patch, PHOENIX-3744.patch, 
> PHOENIX-3744.patch
>
>
> HBase supports scanning over snapshots, with a SnapshotScanner that accesses 
> the region directly in HDFS. We should make sure that Phoenix can support 
> that.
> Not sure how we'd want to decide when to run a query over a snapshot. Some 
> ideas:
> - if there's an SCN set (i.e. the query is running at a point in time in the 
> past)
> - if the memstore is empty
> - if the query is being run at a timestamp earlier than any memstore data
> - as a config option on the table
> - as a query hint
> - based on some kind of optimizer rule (i.e. based on estimated # of bytes 
> that will be scanned)
> Phoenix typically runs a query at the timestamp at which it was compiled. Any 
> data committed after this time should not be seen while a query is running.
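The candidate heuristics listed above could be combined roughly as follows. This is an illustrative sketch only; Phoenix exposes no such API, and all inputs are hypothetical:

```python
# Illustrative decision function for when a query could run over a snapshot
# instead of the live table, combining the ideas listed above.
def should_use_snapshot(scn_set, memstore_empty, query_hint, table_config):
    if query_hint or table_config:   # explicit opt-in wins
        return True
    if scn_set:                      # point-in-time query in the past
        return True
    return memstore_empty            # nothing un-flushed that could be missed

assert should_use_snapshot(scn_set=False, memstore_empty=False,
                           query_hint=True, table_config=False)
assert not should_use_snapshot(scn_set=False, memstore_empty=False,
                               query_hint=False, table_config=False)
```

An optimizer rule based on estimated bytes scanned, as suggested above, would slot in as one more condition once cost estimates are available.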



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] phoenix pull request #256: PHOENIX-3477 patch for 4.x-HBase-1.1

2017-06-05 Thread akshita-malhotra
GitHub user akshita-malhotra opened a pull request:

https://github.com/apache/phoenix/pull/256

PHOENIX-3477 patch for 4.x-HBase-1.1

PHOENIX-3477 patch for 4.x-HBase-1.1

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akshita-malhotra/phoenix 
PHOENIX-3744-4.x-HBase-1.1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/phoenix/pull/256.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #256


commit 43367cf81bab6e957e03845ba1387017bc7e8530
Author: Akshita <akshita.malho...@salesforce.com>
Date:   2017-06-06T00:41:40Z

PHOENIX-3477 patch for 4.x-HBase-1.1




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (PHOENIX-3744) Support snapshot scanners for MR-based queries

2017-06-05 Thread Akshita Malhotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-3744:
--
Attachment: PHOENIX-3744-4.x-HBase-0.98.patch

> Support snapshot scanners for MR-based queries
> --
>
> Key: PHOENIX-3744
> URL: https://issues.apache.org/jira/browse/PHOENIX-3744
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>    Assignee: Akshita Malhotra
> Attachments: PHOENIX-3744-4.x-HBase-0.98.patch, PHOENIX-3744.patch, 
> PHOENIX-3744.patch, PHOENIX-3744.patch
>
>
> HBase supports scanning over snapshots, with a SnapshotScanner that accesses 
> the region directly in HDFS. We should make sure that Phoenix can support 
> that.
> Not sure how we'd want to decide when to run a query over a snapshot. Some 
> ideas:
> - if there's an SCN set (i.e. the query is running at a point in time in the 
> past)
> - if the memstore is empty
> - if the query is being run at a timestamp earlier than any memstore data
> - as a config option on the table
> - as a query hint
> - based on some kind of optimizer rule (i.e. based on estimated # of bytes 
> that will be scanned)
> Phoenix typically runs a query at the timestamp at which it was compiled. Any 
> data committed after this time should not be seen while a query is running.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] phoenix pull request #255: PHOENIX-3744 for 4.x-HBase-0.98

2017-06-05 Thread akshita-malhotra
GitHub user akshita-malhotra opened a pull request:

https://github.com/apache/phoenix/pull/255

PHOENIX-3744 for 4.x-HBase-0.98

PHOENIX-3744 patch for 4.x-HBase-0.98 branch

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akshita-malhotra/phoenix PHOENIX-3744-4.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/phoenix/pull/255.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #255


commit 718dfb2b11c3b57ae1dc94b79d15ada516bba4a9
Author: Akshita <akshita.malho...@salesforce.com>
Date:   2017-06-05T23:49:08Z

PHOENIX-3744 for 4.x-HBase-0.98




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (PHOENIX-3744) Support snapshot scanners for MR-based queries

2017-05-31 Thread Akshita Malhotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-3744:
--
Attachment: PHOENIX-3744.patch

The patch wasn't applying due to recent changes to scan metrics. Resolved the 
conflicts and uploaded an updated patch.

> Support snapshot scanners for MR-based queries
> --
>
> Key: PHOENIX-3744
> URL: https://issues.apache.org/jira/browse/PHOENIX-3744
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>    Assignee: Akshita Malhotra
> Attachments: PHOENIX-3744.patch, PHOENIX-3744.patch, 
> PHOENIX-3744.patch
>
>
> HBase supports scanning over snapshots, with a SnapshotScanner that accesses 
> the region directly in HDFS. We should make sure that Phoenix can support 
> that.
> Not sure how we'd want to decide when to run a query over a snapshot. Some 
> ideas:
> - if there's an SCN set (i.e. the query is running at a point in time in the 
> past)
> - if the memstore is empty
> - if the query is being run at a timestamp earlier than any memstore data
> - as a config option on the table
> - as a query hint
> - based on some kind of optimizer rule (i.e. based on estimated # of bytes 
> that will be scanned)
> Phoenix typically runs a query at the timestamp at which it was compiled. Any 
> data committed after this time should not be seen while a query is running.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3744) Support snapshot scanners for MR-based queries

2017-05-30 Thread Akshita Malhotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-3744:
--
Attachment: PHOENIX-3744.patch

Updated patch

> Support snapshot scanners for MR-based queries
> --
>
> Key: PHOENIX-3744
> URL: https://issues.apache.org/jira/browse/PHOENIX-3744
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>    Assignee: Akshita Malhotra
> Attachments: PHOENIX-3744.patch, PHOENIX-3744.patch
>
>
> HBase supports scanning over snapshots, with a SnapshotScanner that accesses 
> the region directly in HDFS. We should make sure that Phoenix can support 
> that.
> Not sure how we'd want to decide when to run a query over a snapshot. Some 
> ideas:
> - if there's an SCN set (i.e. the query is running at a point in time in the 
> past)
> - if the memstore is empty
> - if the query is being run at a timestamp earlier than any memstore data
> - as a config option on the table
> - as a query hint
> - based on some kind of optimizer rule (i.e. based on estimated # of bytes 
> that will be scanned)
> Phoenix typically runs a query at the timestamp at which it was compiled. Any 
> data committed after this time should not be seen while a query is running.





[jira] [Updated] (PHOENIX-3744) Support snapshot scanners for MR-based queries

2017-05-30 Thread Akshita Malhotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-3744:
--
Attachment: (was: PHOENIX-3744.patch)

> Support snapshot scanners for MR-based queries
> --
>
> Key: PHOENIX-3744
> URL: https://issues.apache.org/jira/browse/PHOENIX-3744
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>    Assignee: Akshita Malhotra
> Attachments: PHOENIX-3744.patch
>
>
> HBase supports scanning over snapshots, with a SnapshotScanner that accesses 
> the region directly in HDFS. We should make sure that Phoenix can support 
> that.
> Not sure how we'd want to decide when to run a query over a snapshot. Some 
> ideas:
> - if there's an SCN set (i.e. the query is running at a point in time in the 
> past)
> - if the memstore is empty
> - if the query is being run at a timestamp earlier than any memstore data
> - as a config option on the table
> - as a query hint
> - based on some kind of optimizer rule (i.e. based on estimated # of bytes 
> that will be scanned)
> Phoenix typically runs a query at the timestamp at which it was compiled. Any 
> data committed after this time should not be seen while a query is running.





[GitHub] phoenix issue #239: PHOENIX-3744: Support snapshot scanners for MR-based Non...

2017-05-30 Thread akshita-malhotra
Github user akshita-malhotra commented on the issue:

https://github.com/apache/phoenix/pull/239
  
@JamesRTaylor  Thanks a lot for the review. I have made the suggested 
changes and uploaded the updated patch on the jira. 
Regarding creating the snapshot to generalize the use of snapshots for M/R 
jobs: after our last discussion with @lhofhansl and Rahul G., I was under the 
impression that we are passing the snapshot name as input.
If we are to follow the former approach, I will go ahead and make the 
changes.




[jira] [Updated] (PHOENIX-3744) Support snapshot scanners for MR-based queries

2017-05-30 Thread Akshita Malhotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-3744:
--
Attachment: PHOENIX-3744.patch

Updated patch

> Support snapshot scanners for MR-based queries
> --
>
> Key: PHOENIX-3744
> URL: https://issues.apache.org/jira/browse/PHOENIX-3744
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>    Assignee: Akshita Malhotra
> Attachments: PHOENIX-3744.patch, PHOENIX-3744.patch
>
>
> HBase supports scanning over snapshots, with a SnapshotScanner that accesses 
> the region directly in HDFS. We should make sure that Phoenix can support 
> that.
> Not sure how we'd want to decide when to run a query over a snapshot. Some 
> ideas:
> - if there's an SCN set (i.e. the query is running at a point in time in the 
> past)
> - if the memstore is empty
> - if the query is being run at a timestamp earlier than any memstore data
> - as a config option on the table
> - as a query hint
> - based on some kind of optimizer rule (i.e. based on estimated # of bytes 
> that will be scanned)
> Phoenix typically runs a query at the timestamp at which it was compiled. Any 
> data committed after this time should not be seen while a query is running.





[GitHub] phoenix issue #239: PHOENIX-3744: Support snapshot scanners for MR-based Non...

2017-05-18 Thread akshita-malhotra
Github user akshita-malhotra commented on the issue:

https://github.com/apache/phoenix/pull/239
  
@JamesRTaylor 
- Changed ParallelScanGrouper classes as per the review
- Changes to BaseTest were to avoid the following error:
"Restore directory cannot be a sub directory of HBase root directory"
I was therefore passing true to create the root dir; changed to use a random 
dir instead, so these changes are no longer needed
- Refactored the util classes to Factory

Also, uploaded the patch on the jira.





[jira] [Updated] (PHOENIX-3744) Support snapshot scanners for MR-based queries

2017-05-18 Thread Akshita Malhotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra updated PHOENIX-3744:
--
Attachment: PHOENIX-3744.patch

PHOENIX-3744: Support snapshot scanners for MR-based Non-aggregate queries

> Support snapshot scanners for MR-based queries
> --
>
> Key: PHOENIX-3744
> URL: https://issues.apache.org/jira/browse/PHOENIX-3744
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>    Assignee: Akshita Malhotra
> Attachments: PHOENIX-3744.patch
>
>
> HBase supports scanning over snapshots, with a SnapshotScanner that accesses 
> the region directly in HDFS. We should make sure that Phoenix can support 
> that.
> Not sure how we'd want to decide when to run a query over a snapshot. Some 
> ideas:
> - if there's an SCN set (i.e. the query is running at a point in time in the 
> past)
> - if the memstore is empty
> - if the query is being run at a timestamp earlier than any memstore data
> - as a config option on the table
> - as a query hint
> - based on some kind of optimizer rule (i.e. based on estimated # of bytes 
> that will be scanned)
> Phoenix typically runs a query at the timestamp at which it was compiled. Any 
> data committed after this time should not be seen while a query is running.





[GitHub] phoenix issue #239: Phoenix-3744: Support snapshot scanners for MR-based que...

2017-05-15 Thread akshita-malhotra
Github user akshita-malhotra commented on the issue:

https://github.com/apache/phoenix/pull/239
  
Thanks @JamesRTaylor. I squashed the commits and changed the prefix of the 
commit message. I will answer/make appropriate changes and upload the patch 
onto the jira.




[GitHub] phoenix issue #239: Phoenix-3744: Support snapshot scanners for MR-based que...

2017-05-15 Thread akshita-malhotra
Github user akshita-malhotra commented on the issue:

https://github.com/apache/phoenix/pull/239
  
Sure, I will do that. Thanks @JamesRTaylor 




[jira] [Created] (PHOENIX-3852) Support snapshot scanner M/R jobs for aggregate queries

2017-05-15 Thread Akshita Malhotra (JIRA)
Akshita Malhotra created PHOENIX-3852:
-

 Summary: Support snapshot scanner M/R jobs for aggregate queries
 Key: PHOENIX-3852
 URL: https://issues.apache.org/jira/browse/PHOENIX-3852
 Project: Phoenix
  Issue Type: New Feature
Affects Versions: 4.10.0
Reporter: Akshita Malhotra
Assignee: Akshita Malhotra








[GitHub] phoenix issue #239: Phoenix-3744: Support snapshot scanners for MR-based que...

2017-05-04 Thread akshita-malhotra
Github user akshita-malhotra commented on the issue:

https://github.com/apache/phoenix/pull/239
  
- Snapshot scanner for non-aggregate queries.
- Added integration tests (simple select query, conditional and limit)
- Abstracted out ScanRegionObserver code to fetch the processed region 
scanner without the coprocessor environment
- Making snapshots work for aggregate queries is harder: given the complexity 
of the aggregate region observer code, it is almost impossible to refactor it 
without fully understanding the functionality. I will need some guidance if the 
aggregate query use case is required.
@JamesRTaylor @lhofhansl 




[jira] [Created] (PHOENIX-3820) Refactor Region Observer functionality (PHOENIX) to fetch processed region scanner without coprocessor environment

2017-05-01 Thread Akshita Malhotra (JIRA)
Akshita Malhotra created PHOENIX-3820:
-

 Summary: Refactor Region Observer functionality (PHOENIX) to fetch 
processed region scanner without coprocessor environment
 Key: PHOENIX-3820
 URL: https://issues.apache.org/jira/browse/PHOENIX-3820
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 4.10.0
Reporter: Akshita Malhotra
Assignee: Akshita Malhotra








[jira] [Assigned] (PHOENIX-3812) Explore using HBase snapshots in async index building M/R job

2017-05-01 Thread Akshita Malhotra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akshita Malhotra reassigned PHOENIX-3812:
-

Assignee: Akshita Malhotra  (was: Maddineni Sukumar)

> Explore using HBase snapshots in async index building M/R job
> -
>
> Key: PHOENIX-3812
> URL: https://issues.apache.org/jira/browse/PHOENIX-3812
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.10.0
>Reporter: Maddineni Sukumar
>Assignee: Akshita Malhotra
>
> As per discussion with James,  HBase snapshots makes it lot easier and faster 
> to operate on existing data. 
> So explore using HBase snapshots in index building M/R job for async index. 





[jira] [Comment Edited] (PHOENIX-3744) Support snapshot scanners for MR-based queries

2017-04-24 Thread Akshita Malhotra (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981853#comment-15981853
 ] 

Akshita Malhotra edited comment on PHOENIX-3744 at 4/24/17 8:41 PM:


Parallel Scan grouper is extended to differentiate the functionality for 
getting region boundaries

Added integration test, compares the snapshot read result with the result from 
select query by setting CurrentScn value.

The configuration parameter is the snapshot name key; if it is set, a snapshot read is done

Used an existing PhoenixIndexDBWritable class for the purpose of testing; will 
add a new one as I add more tests.

ExpressionProjector functionality is extended for snapshots as the keyvalue 
format returned from TableSnapshotScanner is different from ClientScanner and 
therefore not properly interpreted by Phoenix, thereby returning null in the 
case of projected columns.
For the same table, following shows the different format of the keyvalues:

ClientScanner:
keyvalues={AAPL/_v:\x00\x00\x00\x01/1493061452132/Put/vlen=7/seqid=0/value=�SSDD��}

TableSnapshotScanner:
keyvalues={AAPL/0:\x00\x00\x00\x00/1493061673859/Put/vlen=1/seqid=4/value=x,
AAPL/0:\x80\x0B/1493061673859/Put/vlen=4/seqid=4/value=SSDD}

To DO:
Add more integration tests to cover different scenarios such as where clause etc

fyi: [~jamestaylor]


was (Author: akshita.malhotra):
Parallel Scan grouper is extended to differentiate the functionality for 
getting region boundaries

Added integration test, compares the snapshot read result with the result from 
select query by setting CurrentScn value.

The configuration parameter is the snapshot name key; if it is set, a snapshot read is done

Used an existing PhoenixIndexDBWritable class for the purpose of testing; will 
add a new one as I add more tests.

ExpressionProjector functionality is extended for snapshots as the keyvalue 
format returned from TableSnapshotScanner is different from ClientScanner and 
therefore not properly interpreted by Phoenix, thereby returning null in the 
case of projected columns.
For the same table, following shows the different format of the keyvalues:

ClientScanner:
keyvalues={AAPL/_v:\x00\x00\x00\x01/1493061452132/Put/vlen=7/seqid=0/value=�SSDD��}

TableSnapshotScanner:
keyvalues={AAPL/0:\x00\x00\x00\x00/1493061673859/Put/vlen=1/seqid=4/value=x,
AAPL/0:\x80\x0B/1493061673859/Put/vlen=4/seqid=4/value=SSDD}

To DO:
Add more integration tests to cover different scenarios such as where clause etc

[~jamestaylor]

> Support snapshot scanners for MR-based queries
> --
>
> Key: PHOENIX-3744
> URL: https://issues.apache.org/jira/browse/PHOENIX-3744
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>    Assignee: Akshita Malhotra
>
> HBase supports scanning over snapshots, with a SnapshotScanner that accesses 
> the region directly in HDFS. We should make sure that Phoenix can support 
> that.
> Not sure how we'd want to decide when to run a query over a snapshot. Some 
> ideas:
> - if there's an SCN set (i.e. the query is running at a point in time in the 
> past)
> - if the memstore is empty
> - if the query is being run at a timestamp earlier than any memstore data
> - as a config option on the table
> - as a query hint
> - based on some kind of optimizer rule (i.e. based on estimated # of bytes 
> that will be scanned)
> Phoenix typically runs a query at the timestamp at which it was compiled. Any 
> data committed after this time should not be seen while a query is running.





[jira] [Comment Edited] (PHOENIX-3744) Support snapshot scanners for MR-based queries

2017-04-24 Thread Akshita Malhotra (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981853#comment-15981853
 ] 

Akshita Malhotra edited comment on PHOENIX-3744 at 4/24/17 8:41 PM:


Parallel Scan grouper is extended to differentiate the functionality for 
getting region boundaries

Added integration test, compares the snapshot read result with the result from 
select query by setting CurrentScn value.

The configuration parameter is the snapshot name key; if it is set, a snapshot read is done

Used an existing PhoenixIndexDBWritable class for the purpose of testing; will 
add a new one as I add more tests.

ExpressionProjector functionality is extended for snapshots as the keyvalue 
format returned from TableSnapshotScanner is different from ClientScanner and 
therefore not properly interpreted by Phoenix, thereby returning null in the 
case of projected columns.
For the same table, following shows the different format of the keyvalues:

ClientScanner:
keyvalues={AAPL/_v:\x00\x00\x00\x01/1493061452132/Put/vlen=7/seqid=0/value=�SSDD��}

TableSnapshotScanner:
keyvalues={AAPL/0:\x00\x00\x00\x00/1493061673859/Put/vlen=1/seqid=4/value=x,
AAPL/0:\x80\x0B/1493061673859/Put/vlen=4/seqid=4/value=SSDD}

To DO:
Add more integration tests to cover different scenarios such as where clause etc

[~jamestaylor]


was (Author: akshita.malhotra):
Parallel Scan grouper is extended to differentiate the functionality for 
getting region boundaries

Added integration test, compares the snapshot read result with the result from 
select query by setting CurrentScn value.

The configuration parameter is the snapshot name key; if it is set, a snapshot read is done

Used an existing PhoenixIndexDBWritable class for the purpose of testing; will 
add a new one as I add more tests.

ExpressionProjector functionality is extended for snapshots as the keyvalue 
format returned from TableSnapshotScanner is different from ClientScanner and 
therefore not properly interpreted by Phoenix, thereby returning null in the 
case of projected columns.
For the same table, following shows the different format of the keyvalues:

ClientScanner:
keyvalues={AAPL/_v:\x00\x00\x00\x01/1493061452132/Put/vlen=7/seqid=0/value=�SSDD��}

TableSnapshotScanner:
keyvalues={AAPL/0:\x00\x00\x00\x00/1493061673859/Put/vlen=1/seqid=4/value=x,
AAPL/0:\x80\x0B/1493061673859/Put/vlen=4/seqid=4/value=SSDD}

To DO:
Add more integration tests to cover different scenarios such as where clause etc

> Support snapshot scanners for MR-based queries
> --
>
> Key: PHOENIX-3744
> URL: https://issues.apache.org/jira/browse/PHOENIX-3744
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>    Assignee: Akshita Malhotra
>
> HBase supports scanning over snapshots, with a SnapshotScanner that accesses 
> the region directly in HDFS. We should make sure that Phoenix can support 
> that.
> Not sure how we'd want to decide when to run a query over a snapshot. Some 
> ideas:
> - if there's an SCN set (i.e. the query is running at a point in time in the 
> past)
> - if the memstore is empty
> - if the query is being run at a timestamp earlier than any memstore data
> - as a config option on the table
> - as a query hint
> - based on some kind of optimizer rule (i.e. based on estimated # of bytes 
> that will be scanned)
> Phoenix typically runs a query at the timestamp at which it was compiled. Any 
> data committed after this time should not be seen while a query is running.





[GitHub] phoenix pull request #239: Phoenix-3744: Support snapshot scanners for MR-ba...

2017-04-24 Thread akshita-malhotra
GitHub user akshita-malhotra opened a pull request:

https://github.com/apache/phoenix/pull/239

Phoenix-3744: Support snapshot scanners for MR-based queries

- Parallel Scan grouper is extended to differentiate the functionality for 
getting region boundaries

- Added integration test, compares the snapshot read result with the result 
from select query by setting CurrentScn value.

- The configuration parameter is the snapshot name key; if it is set, a 
snapshot read is done

- Used an existing PhoenixIndexDBWritable class for the purpose of testing; 
will add a new one as I add more tests.

- ExpressionProjector functionality is extended for snapshots as the 
keyvalue format returned from TableSnapshotScanner is different from 
ClientScanner and therefore not properly interpreted by Phoenix, thereby 
returning null in the case of projected columns.
For the same table, following shows the different format of the keyvalues:

1. ClientScanner:

keyvalues={AAPL/_v:\x00\x00\x00\x01/1493061452132/Put/vlen=7/seqid=0/value=SSDD}

2. TableSnapshotScanner:

keyvalues={AAPL/0:\x00\x00\x00\x00/1493061673859/Put/vlen=1/seqid=4/value=x, 
AAPL/0:\x80\x0B/1493061673859/Put/vlen=4/seqid=4/value=SSDD}
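[Editor's aside] The key difference between the two listings above is the column family token: the live ClientScanner cell carries what appears to be Phoenix's projected value family (`_v`), while the snapshot scanner returns the raw cells under the default `0` family. A tiny helper to pull that token out of the printed keyvalue strings:

```java
// Editor's illustration: extract the column-family token that sits between
// the first '/' and the following ':' in HBase's KeyValue#toString output.
public class KeyValueFamilies {

    static String familyOf(String keyValueString) {
        int slash = keyValueString.indexOf('/');
        int colon = keyValueString.indexOf(':', slash);
        return keyValueString.substring(slash + 1, colon);
    }

    public static void main(String[] args) {
        // Live ClientScanner cell: Phoenix's projected single-cell format.
        System.out.println(familyOf(
            "AAPL/_v:\\x00\\x00\\x00\\x01/1493061452132/Put/vlen=7/seqid=0/value=SSDD")); // _v
        // TableSnapshotScanner cell: raw storage under the default family.
        System.out.println(familyOf(
            "AAPL/0:\\x80\\x0B/1493061673859/Put/vlen=4/seqid=4/value=SSDD")); // 0
    }
}
```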

@JamesRTaylor @lhofhansl 

To DO:
Add more integration tests to cover different scenarios such as where 
clause etc
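[Editor's sketch] For readers of the archive, this is roughly how a job would be wired up against a snapshot under this change. It assumes a snapshot-aware `PhoenixMapReduceUtil.setInput` overload taking a snapshot name and restore directory — check the committed signature before relying on it; the snapshot, table, and path names are made up:

```java
// Sketch only (editor's addition): assumes the snapshot-aware
// PhoenixMapReduceUtil.setInput overload discussed in this PR.
Configuration conf = HBaseConfiguration.create();
Job job = Job.getInstance(conf, "phoenix-snapshot-mr");
PhoenixMapReduceUtil.setInput(
        job,
        PhoenixIndexDBWritable.class,      // writable used in the tests above
        "MY_SNAPSHOT",                     // hypothetical snapshot name
        "MY_TABLE",                        // table the snapshot was taken from
        new Path("/tmp/phoenix-restore"),  // restore dir outside the HBase root dir
        "SELECT * FROM MY_TABLE");         // hypothetical select query
```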


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/akshita-malhotra/phoenix Phoenix-3744

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/phoenix/pull/239.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #239


commit 73b1ac04c45381a2a0511146c666af476e488cdf
Author: Akshita <akshita.malho...@salesforce.com>
Date:   2017-04-24T18:43:02Z

Phoenix-3744: Support snapshot scanners for MR-based queries






[jira] [Commented] (PHOENIX-3475) MetaData #getTables() API doesn't return view indexes

2016-11-10 Thread Akshita Malhotra (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656021#comment-15656021
 ] 

Akshita Malhotra commented on PHOENIX-3475:
---

Yes, I am looking for Phoenix metadata. As per my understanding, the data 
corresponding to an index on a view is stored in the base index table 
(_IDX_) and there is no HTable which maps to a view index name 
(globalViewIdx in the above test scenario). Therefore, to migrate view indexes 
we need to copy data in the base index table, similar to what we do in the 
case of views by copying over rows in SYSTEM.CATALOG

For example:
When I run #getTables("","", "_IDX_MIGRATIONTEST", new String[] 
{"INDEX","TABLE"}), it returns empty result set. 
How can I get metadata corresponding to this table?

> MetaData #getTables() API doesn't return view indexes
> -
>
> Key: PHOENIX-3475
> URL: https://issues.apache.org/jira/browse/PHOENIX-3475
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Akshita Malhotra
> Fix For: 4.9.0
>
>
> HBase migration tool uses the DatabaseMetadata#getTables() API to retrieve the 
> tables for copying data. We have found that the API doesn't return base index 
> tables ( _IDX_)
> For testing purposes, we issue following DDL to generate the view and the 
> corresponding view index:
> -CREATE VIEW IF NOT EXISTS MIGRATIONTEST_VIEW (OLD_VALUE_VIEW varchar) AS 
> SELECT * FROM MIGRATIONTEST WHERE OLD_VALUE like 'E%'
> -CREATE INDEX IF NOT EXISTS MIGRATIONTEST_VIEW_IDX ON MIGRATIONTEST_VIEW 
> (OLD_VALUE_VIEW)
> By using HBase API, we were able to confirm that base index table 
> (_IDX_MIGRATIONTEST) is created. 
> Both the jdbc DatabaseMetadata API and the P* getMetaDataCache API don't seem 
> to be returning view indexes. Also, the P*MetaData #getTableRef API returns 
> "TableNotFoundException" when attempting to fetch the PTable corresponding to 
> the base index table name.





[jira] [Commented] (PHOENIX-3475) MetaData #getTables() API doesn't return view indexes

2016-11-10 Thread Akshita Malhotra (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655525#comment-15655525
 ] 

Akshita Malhotra commented on PHOENIX-3475:
---

Thanks [~jamestaylor]
Both solutions give us just the name of the base index table. We require 
additional Phoenix metadata to construct the table descriptor. Also, using 
DatabaseMetadata#getTables() to retrieve all index types is undesirable, as it 
returns views whose names don't map to an HBase table. Moreover, it still 
doesn't return view indexes.
In all, the task is to retrieve complete Phoenix metadata corresponding to a 
physical view index table (not just the name).



> MetaData #getTables() API doesn't return view indexes
> -
>
> Key: PHOENIX-3475
> URL: https://issues.apache.org/jira/browse/PHOENIX-3475
> Project: Phoenix
>  Issue Type: Bug
>    Reporter: Akshita Malhotra
> Fix For: 4.9.0
>
>
> HBase migration tool uses the DatabaseMetadata#getTables() API to retrieve the 
> tables for copying data. We have found that the API doesn't return base index 
> tables ( _IDX_)
> For testing purposes, we issue following DDL to generate the view and the 
> corresponding view index:
> -CREATE VIEW IF NOT EXISTS MIGRATIONTEST_VIEW (OLD_VALUE_VIEW varchar) AS 
> SELECT * FROM MIGRATIONTEST WHERE OLD_VALUE like 'E%'
> -CREATE INDEX IF NOT EXISTS MIGRATIONTEST_VIEW_IDX ON MIGRATIONTEST_VIEW 
> (OLD_VALUE_VIEW)
> By using HBase API, we were able to confirm that base index table 
> (_IDX_MIGRATIONTEST) is created. 
> Both the jdbc DatabaseMetadata API and the P* getMetaDataCache API don't seem 
> to be returning view indexes. Also, the P*MetaData #getTableRef API returns 
> "TableNotFoundException" when attempting to fetch the PTable corresponding to 
> the base index table name.





[jira] [Created] (PHOENIX-3475) MetaData #getTables() API doesn't return view indexes

2016-11-10 Thread Akshita Malhotra (JIRA)
Akshita Malhotra created PHOENIX-3475:
-

 Summary: MetaData #getTables() API doesn't return view indexes
 Key: PHOENIX-3475
 URL: https://issues.apache.org/jira/browse/PHOENIX-3475
 Project: Phoenix
  Issue Type: Bug
Reporter: Akshita Malhotra
 Fix For: 4.9.0


HBase migration tool uses the DatabaseMetadata#getTables() API to retrieve the 
tables for copying data. We have found that the API doesn't return base index 
tables ( _IDX_)

For testing purposes, we issue following DDL to generate the view and the 
corresponding view index:
-CREATE VIEW IF NOT EXISTS MIGRATIONTEST_VIEW (OLD_VALUE_VIEW varchar) AS 
SELECT * FROM MIGRATIONTEST WHERE OLD_VALUE like 'E%'
-CREATE INDEX IF NOT EXISTS MIGRATIONTEST_VIEW_IDX ON MIGRATIONTEST_VIEW 
(OLD_VALUE_VIEW)

By using HBase API, we were able to confirm that base index table 
(_IDX_MIGRATIONTEST) is created. 

Both the jdbc DatabaseMetadata API and the P* getMetaDataCache API don't seem 
to be returning view indexes. Also, the P*MetaData #getTableRef API returns 
"TableNotFoundException" when attempting to fetch the PTable corresponding to 
the base index table name.




