[jira] [Work logged] (HIVE-24187) Handle _files creation for HA config with same nameservice name on source and destination

2020-11-28 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-24187?focusedWorklogId=517575&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-517575 ]

ASF GitHub Bot logged work on HIVE-24187:
-

Author: ASF GitHub Bot
Created on: 29/Nov/20 00:47
Start Date: 29/Nov/20 00:47
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #1515:
URL: https://github.com/apache/hive/pull/1515


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 517575)
Time Spent: 1.5h  (was: 1h 20m)

> Handle _files creation for HA config with same nameservice name on source and 
> destination
> -
>
> Key: HIVE-24187
> URL: https://issues.apache.org/jira/browse/HIVE-24187
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24187.01.patch, HIVE-24187.02.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, HA replication is supported only when the source and destination
> use different nameservices. We need to add support for the case where both
> use the same nameservice name.
> The local nameservice is passed to the repl command as configured.
> The remote nameservice is given an arbitrary name, along with the
> corresponding configs for that name.
> Example:
> Both clusters are originally configured with the same HDFS nameservice:
> src: ns1
> target: ns1
> We can denote the remote cluster with an arbitrary name, for example
> nsRemote. Each command then sees the nameservices as follows:
> Repl Dump: src: ns1, target: nsRemote
> Repl Load: src: nsRemote, target: ns1
> Entries in _files (the managed table data locations) are written with
> nsRemote instead of ns1 for the source.
> Example: 
> hdfs://nsRemote/whLoc/dbName.db/table1:checksum:subDir:hdfs://nsRemote/cmroot
> In the same way, the list of external table data locations is rewritten to
> use nsRemote instead of ns1 for the source.
> Two new configs control this behavior:
> *hive.repl.ha.datapath.replace.remote.nameservice = *
> *hive.repl.ha.datapath.replace.remote.nameservice.name = *
> The nameservice is replaced according to these configs.
> 'hive.repl.rootdir' must also be passed accordingly during dump and load:
> Repl dump:
> ||Repl Operation||Repl Command||
> |*Staging on source cluster*|
> |Repl Dump|repl dump dbName with('hive.repl.rootdir'='hdfs://ns1/stagingLoc')|
> |Repl Load|repl load dbName into dbName with('hive.repl.rootdir'='hdfs://nsRemote/stagingLoc')|
> |*Staging on target cluster*|
> |Repl Dump|repl dump dbName with('hive.repl.rootdir'='hdfs://nsRemote/stagingLoc')|
> |Repl Load|repl load dbName into dbName with('hive.repl.rootdir'='hdfs://ns1/stagingLoc')|
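
The rewrite described in the issue can be sketched as follows, assuming the nameservice is simply the authority part of an hdfs:// URI. The class and method names (NameserviceRewrite, replaceNameservice) are hypothetical and are not taken from the patch; only java.net.URI from the JDK is used.

```java
import java.net.URI;
import java.net.URISyntaxException;

class NameserviceRewrite {

  /**
   * Returns the location with its authority (the HDFS nameservice) swapped
   * for remoteNs. Locations that are not hdfs:// URIs are returned unchanged.
   * Hypothetical helper, for illustration only.
   */
  static String replaceNameservice(String location, String remoteNs) {
    URI uri = URI.create(location);
    if (!"hdfs".equalsIgnoreCase(uri.getScheme())) {
      return location;
    }
    try {
      // Rebuild the URI with the same scheme and path but a new authority.
      return new URI(uri.getScheme(), remoteNs, uri.getPath(),
          uri.getQuery(), uri.getFragment()).toString();
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Cannot rewrite " + location, e);
    }
  }

  public static void main(String[] args) {
    // A managed table location and a staging root, as seen from the source.
    System.out.println(replaceNameservice("hdfs://ns1/whLoc/dbName.db/table1", "nsRemote"));
    System.out.println(replaceNameservice("hdfs://ns1/stagingLoc", "nsRemote"));
  }
}
```

Under these assumptions a source-side location such as hdfs://ns1/whLoc/dbName.db/table1 would be written as hdfs://nsRemote/whLoc/dbName.db/table1, and the same call would produce the 'hive.repl.rootdir' value that the other cluster needs.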



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24187) Handle _files creation for HA config with same nameservice name on source and destination

2020-11-21 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-24187?focusedWorklogId=515184&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-515184 ]

ASF GitHub Bot logged work on HIVE-24187:
-

Author: ASF GitHub Bot
Created on: 22/Nov/20 00:44
Start Date: 22/Nov/20 00:44
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on pull request #1515:
URL: https://github.com/apache/hive/pull/1515#issuecomment-731659041


   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.





Issue Time Tracking
---

Worklog Id: (was: 515184)
Time Spent: 1h 20m  (was: 1h 10m)

> Handle _files creation for HA config with same nameservice name on source and 
> destination
> -
>
> Key: HIVE-24187
> URL: https://issues.apache.org/jira/browse/HIVE-24187
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24187.01.patch, HIVE-24187.02.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-24187) Handle _files creation for HA config with same nameservice name on source and destination

2020-09-22 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-24187?focusedWorklogId=489185&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-489185 ]

ASF GitHub Bot logged work on HIVE-24187:
-

Author: ASF GitHub Bot
Created on: 23/Sep/20 04:42
Start Date: 23/Sep/20 04:42
Worklog Time Spent: 10m 
  Work Description: pkumarsinha opened a new pull request #1515:
URL: https://github.com/apache/hive/pull/1515


   …e name on source and destination
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





Issue Time Tracking
---

Worklog Id: (was: 489185)
Time Spent: 1h 10m  (was: 1h)

> Handle _files creation for HA config with same nameservice name on source and 
> destination
> -
>
> Key: HIVE-24187
> URL: https://issues.apache.org/jira/browse/HIVE-24187
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24187.01.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-24187) Handle _files creation for HA config with same nameservice name on source and destination

2020-09-22 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-24187?focusedWorklogId=43&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-43 ]

ASF GitHub Bot logged work on HIVE-24187:
-

Author: ASF GitHub Bot
Created on: 23/Sep/20 04:16
Start Date: 23/Sep/20 04:16
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1515:
URL: https://github.com/apache/hive/pull/1515#discussion_r492461348



##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
##
@@ -424,6 +424,20 @@ public String encodeFileUri(String fileUriStr, String fileChecksum, String encod
     return encodedUri;
   }
 
+  public static String encodeFileUri(String fileUriStr, String fileChecksum, String cmroot, String encodedSubDir) {
+    String encodedUri = fileUriStr;
+    if ((fileChecksum != null) && (cmroot != null)) {
+      encodedUri = encodedUri + URI_FRAGMENT_SEPARATOR + fileChecksum + URI_FRAGMENT_SEPARATOR + cmroot;
+    } else {
+      encodedUri = encodedUri + URI_FRAGMENT_SEPARATOR + URI_FRAGMENT_SEPARATOR;

Review comment:
   why do we have 2 URI_FRAGMENT_SEPARATOR
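
One possible reading of the doubled separator, not confirmed by the author in this thread: the empty checksum and cmroot slots keep the sub-directory in a fixed position, so a decoder can split on the separator and index the fields. The sketch below mirrors the hunk above; the class name, the decode helper and the "#" separator value are assumptions made for illustration.

```java
import java.util.Arrays;

class EncodedUriLayout {

  // Assumption: stands in for URI_FRAGMENT_SEPARATOR, whose value is not shown in this thread.
  static final String SEP = "#";

  /** Same branching as the hunk under review: four positional fields. */
  static String encode(String fileUri, String checksum, String cmroot, String subDir) {
    String encoded = fileUri;
    if (checksum != null && cmroot != null) {
      encoded = encoded + SEP + checksum + SEP + cmroot;
    } else {
      // Two separators leave the checksum and cmroot slots empty.
      encoded = encoded + SEP + SEP;
    }
    return encoded + SEP + (subDir != null ? subDir : "");
  }

  /** Hypothetical decoder: the limit of -1 keeps trailing empty fields. */
  static String[] decode(String encoded) {
    return encoded.split(SEP, -1);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(decode(encode("hdfs://ns1/a", null, null, "sub"))));
    System.out.println(Arrays.toString(decode(encode("hdfs://ns1/a", "abc", "hdfs://ns1/cmroot", "sub"))));
  }
}
```

With a single separator in the else branch, the sub-directory would land in the checksum slot whenever the checksum or cmroot is missing, which is the ambiguity the doubled separator would avoid under this reading.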

##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
##
@@ -424,6 +424,20 @@ public String encodeFileUri(String fileUriStr, String fileChecksum, String encod
     return encodedUri;
   }
 
+  public static String encodeFileUri(String fileUriStr, String fileChecksum, String cmroot, String encodedSubDir) {
+    String encodedUri = fileUriStr;
+    if ((fileChecksum != null) && (cmroot != null)) {
+      encodedUri = encodedUri + URI_FRAGMENT_SEPARATOR + fileChecksum + URI_FRAGMENT_SEPARATOR + cmroot;
+    } else {
+      encodedUri = encodedUri + URI_FRAGMENT_SEPARATOR + URI_FRAGMENT_SEPARATOR;
+    }
+    encodedUri = encodedUri + URI_FRAGMENT_SEPARATOR + ((encodedSubDir != null) ? encodedSubDir : "");
+    if (LOG.isDebugEnabled()) {

Review comment:
   Do we need this check?

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -522,6 +522,14 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
     REPLCMINTERVAL("hive.repl.cm.interval","3600s",
         new TimeValidator(TimeUnit.SECONDS),
         "Inteval for cmroot cleanup thread."),
+    REPL_HA_DATAPATH_REPLACE_REMOTE_NAMESERVICE("hive.repl.ha.datapath.replace.remote.nameservice", false,
+        "When HDFS is HA enabled and both source and target clusters are configured with same nameservice names," +
+        "enable this flag and provide a "),

Review comment:
   sentence is incomplete

##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
##
@@ -424,6 +424,20 @@ public String encodeFileUri(String fileUriStr, String fileChecksum, String encod
     return encodedUri;
   }
 
+  public static String encodeFileUri(String fileUriStr, String fileChecksum, String cmroot, String encodedSubDir) {
+    String encodedUri = fileUriStr;
+    if ((fileChecksum != null) && (cmroot != null)) {

Review comment:
   empty check not needed?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/Utils.java
##
@@ -72,6 +76,40 @@ public static void writeOutput(List<List<String>> listValues, Path outputFile, H
     writeOutput(listValues, outputFile, hiveConf, false);
   }
 
+  /**
+   * Given a ReplChangeManger's encoded uri, replaces the namespace and returns the modified encoded uri.
+   */
+  public static String replaceNameSpaceInEncodedURI(String cmEncodedURI, HiveConf hiveConf) throws SemanticException {

Review comment:
   replace name service?
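
A sketch of what such a helper could look like, assuming the field order from the encodeFileUri hunk quoted earlier in this message (data path, checksum, cmroot, sub-directory) and a "#" separator. Unlike the method under review, it takes the remote name directly instead of reading it from HiveConf, and every name in it is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class EncodedUriNameservice {

  // Assumption: the separator value is not shown in this thread.
  static final String SEP = "#";

  /** Swaps the authority of one location; empty slots are passed through. */
  static String swapAuthority(String location, String remoteNs) {
    if (location.isEmpty()) {
      return location;
    }
    URI uri = URI.create(location);
    try {
      return new URI(uri.getScheme(), remoteNs, uri.getPath(), null, null).toString();
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Cannot rewrite " + location, e);
    }
  }

  /** Rewrites the data path (field 0) and the cmroot (field 2), leaving the rest as is. */
  static String replaceNameserviceInEncodedUri(String encoded, String remoteNs) {
    String[] fields = encoded.split(SEP, -1);
    if (fields.length != 4) {
      throw new IllegalArgumentException("Unexpected encoded URI: " + encoded);
    }
    fields[0] = swapAuthority(fields[0], remoteNs);
    fields[2] = swapAuthority(fields[2], remoteNs);
    return String.join(SEP, fields);
  }

  public static void main(String[] args) {
    System.out.println(replaceNameserviceInEncodedUri(
        "hdfs://ns1/whLoc/dbName.db/table1#checksum#hdfs://ns1/cmroot#subDir", "nsRemote"));
  }
}
```

Both the data path and the cmroot are rewritten because, in the issue's example entry, both carry nsRemote.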

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
##
@@ -1963,4 +2062,12 @@ private void setupUDFJarOnHDFS(Path identityUdfLocalPath, Path identityUdfHdfsPa
     FileSystem fs = primary.miniDFSCluster.getFileSystem();
     fs.copyFromLocalFile(identityUdfLocalPath, identityUdfHdfsPath);
   }
+
+  private List getHdfsNamespaceClause() {

Review comment:
   replace with nameservice

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
##
@@ -1604,6 +1605,122 @@ public void testRangerReplication() throws Throwable {
         .verifyResults(new String[] {"1", "2"});
   }
 
+  @Test
+  public void testHdfsNamespaceLazyCopy() throws Throwable {
+    List clause = getHdfsNameserviceClause();
+    clause.add("'" + HiveConf.ConfVars.REPL_DUMP_METADATA_ONLY_FOR_EXTERNAL_TABLE.varname + "'='true'");
+    primary.run("use " + primaryDbName)
+        .run("create table  acid_table 

[jira] [Work logged] (HIVE-24187) Handle _files creation for HA config with same nameservice name on source and destination

2020-09-22 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-24187?focusedWorklogId=487986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-487986 ]

ASF GitHub Bot logged work on HIVE-24187:
-

Author: ASF GitHub Bot
Created on: 22/Sep/20 06:47
Start Date: 22/Sep/20 06:47
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1515:
URL: https://github.com/apache/hive/pull/1515#discussion_r492506493



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
##
@@ -1604,6 +1605,122 @@ public void testRangerReplication() throws Throwable {
         .verifyResults(new String[] {"1", "2"});
   }
 
+  @Test
+  public void testHdfsNamespaceLazyCopy() throws Throwable {
+    List clause = getHdfsNameserviceClause();
+    clause.add("'" + HiveConf.ConfVars.REPL_DUMP_METADATA_ONLY_FOR_EXTERNAL_TABLE.varname + "'='true'");
+    primary.run("use " + primaryDbName)
+        .run("create table  acid_table (key int, value int) partitioned by (load_date date) " +
+            "clustered by(key) into 2 buckets stored as orc tblproperties ('transactional'='true')")
+        .run("create table table1 (i int)")
+        .run("insert into table1 values (1)")
+        .run("insert into table1 values (2)")
+        .run("create external table ext_table1 (id int)")
+        .run("insert into ext_table1 values (3)")
+        .run("insert into ext_table1 values (4)")
+        .dump(primaryDbName, clause);
+
+    try{
+      replica.load(replicatedDbName, primaryDbName, clause);
+      Assert.fail("Expected the UnknownHostException to be thrown.");
+    } catch (IllegalArgumentException ex) {
+      assertTrue(ex.getMessage().contains("java.net.UnknownHostException: nsRemote"));
+    }
+  }
+
+  @Test
+  public void testHdfsNamespaceLazyCopyIncr() throws Throwable {
+    ArrayList clause = new ArrayList();
+    clause.add("'" + HiveConf.ConfVars.REPL_DUMP_METADATA_ONLY_FOR_EXTERNAL_TABLE.varname + "'='true'");
+    primary.run("use " + primaryDbName)
+        .run("create table  acid_table (key int, value int) partitioned by (load_date date) " +
+            "clustered by(key) into 2 buckets stored as orc tblproperties ('transactional'='true')")
+        .run("create table table1 (i String)")
+        .run("insert into table1 values (1)")
+        .run("insert into table1 values (2)")
+        .run("create external table ext_table1 (id int)")
+        .run("insert into ext_table1 values (3)")
+        .run("insert into ext_table1 values (4)")
+        .dump(primaryDbName);
+
+    replica.load(replicatedDbName, primaryDbName, clause)
+        .run("use " + replicatedDbName)
+        .run("show tables")
+        .verifyResults(new String[] {"acid_table", "table1", "ext_table1"})
+        .run("select * from table1")
+        .verifyResults(new String[] {"1", "2"})
+        .run("select * from ext_table1")
+        .verifyResults(new String[] {"3", "4"});
+
+    clause.addAll(getHdfsNameserviceClause());
+    primary.run("use " + primaryDbName)
+        .run("insert into table1 values (5)")
+        .run("insert into ext_table1 values (6)")
+        .dump(primaryDbName, clause);
+    try{
+      replica.load(replicatedDbName, primaryDbName, clause);
+      Assert.fail("Expected the UnknownHostException to be thrown.");
+    } catch (IllegalArgumentException ex) {
+      assertTrue(ex.getMessage().contains("java.net.UnknownHostException: nsRemote"));
+    }
+  }
+
+  @Test
+  public void testHdfsNamespaceWithDataCopy() throws Throwable {

Review comment:
   nameservice







Issue Time Tracking
---

Worklog Id: (was: 487986)
Time Spent: 50m  (was: 40m)

> Handle _files creation for HA config with same nameservice name on source and 
> destination
> -
>
> Key: HIVE-24187
> URL: https://issues.apache.org/jira/browse/HIVE-24187
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24187.01.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h

[jira] [Work logged] (HIVE-24187) Handle _files creation for HA config with same nameservice name on source and destination

2020-09-21 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-24187?focusedWorklogId=487930&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-487930 ]

ASF GitHub Bot logged work on HIVE-24187:
-

Author: ASF GitHub Bot
Created on: 22/Sep/20 04:25
Start Date: 22/Sep/20 04:25
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1515:
URL: https://github.com/apache/hive/pull/1515#discussion_r492467429



##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/parse/TestReplicationScenariosAcrossInstances.java
##
@@ -1963,4 +2062,12 @@ private void setupUDFJarOnHDFS(Path identityUdfLocalPath, Path identityUdfHdfsPa
     FileSystem fs = primary.miniDFSCluster.getFileSystem();
     fs.copyFromLocalFile(identityUdfLocalPath, identityUdfHdfsPath);
   }
+
+  private List getHdfsNamespaceClause() {

Review comment:
   replace with nameservice







Issue Time Tracking
---

Worklog Id: (was: 487930)
Time Spent: 40m  (was: 0.5h)

> Handle _files creation for HA config with same nameservice name on source and 
> destination
> -
>
> Key: HIVE-24187
> URL: https://issues.apache.org/jira/browse/HIVE-24187
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24187.01.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-24187) Handle _files creation for HA config with same nameservice name on source and destination

2020-09-21 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-24187?focusedWorklogId=487928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-487928 ]

ASF GitHub Bot logged work on HIVE-24187:
-

Author: ASF GitHub Bot
Created on: 22/Sep/20 04:00
Start Date: 22/Sep/20 04:00
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1515:
URL: https://github.com/apache/hive/pull/1515#discussion_r492461348



##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
##
@@ -424,6 +424,20 @@ public String encodeFileUri(String fileUriStr, String fileChecksum, String encod
     return encodedUri;
   }
 
+  public static String encodeFileUri(String fileUriStr, String fileChecksum, String cmroot, String encodedSubDir) {
+    String encodedUri = fileUriStr;
+    if ((fileChecksum != null) && (cmroot != null)) {
+      encodedUri = encodedUri + URI_FRAGMENT_SEPARATOR + fileChecksum + URI_FRAGMENT_SEPARATOR + cmroot;
+    } else {
+      encodedUri = encodedUri + URI_FRAGMENT_SEPARATOR + URI_FRAGMENT_SEPARATOR;

Review comment:
   why do we have 2 URI_FRAGMENT_SEPARATOR

##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
##
@@ -424,6 +424,20 @@ public String encodeFileUri(String fileUriStr, String fileChecksum, String encod
     return encodedUri;
   }
 
+  public static String encodeFileUri(String fileUriStr, String fileChecksum, String cmroot, String encodedSubDir) {
+    String encodedUri = fileUriStr;
+    if ((fileChecksum != null) && (cmroot != null)) {
+      encodedUri = encodedUri + URI_FRAGMENT_SEPARATOR + fileChecksum + URI_FRAGMENT_SEPARATOR + cmroot;
+    } else {
+      encodedUri = encodedUri + URI_FRAGMENT_SEPARATOR + URI_FRAGMENT_SEPARATOR;
+    }
+    encodedUri = encodedUri + URI_FRAGMENT_SEPARATOR + ((encodedSubDir != null) ? encodedSubDir : "");
+    if (LOG.isDebugEnabled()) {

Review comment:
   Do we need this check?

##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -522,6 +522,14 @@ private static void populateLlapDaemonVarsSet(Set llapDaemonVarsSetLocal
     REPLCMINTERVAL("hive.repl.cm.interval","3600s",
         new TimeValidator(TimeUnit.SECONDS),
         "Inteval for cmroot cleanup thread."),
+    REPL_HA_DATAPATH_REPLACE_REMOTE_NAMESERVICE("hive.repl.ha.datapath.replace.remote.nameservice", false,
+        "When HDFS is HA enabled and both source and target clusters are configured with same nameservice names," +
+        "enable this flag and provide a "),

Review comment:
   sentence is incomplete

##
File path: 
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/ReplChangeManager.java
##
@@ -424,6 +424,20 @@ public String encodeFileUri(String fileUriStr, String fileChecksum, String encod
     return encodedUri;
   }
 
+  public static String encodeFileUri(String fileUriStr, String fileChecksum, String cmroot, String encodedSubDir) {
+    String encodedUri = fileUriStr;
+    if ((fileChecksum != null) && (cmroot != null)) {

Review comment:
   empty check not needed?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/dump/Utils.java
##
@@ -72,6 +76,40 @@ public static void writeOutput(List<List<String>> listValues, Path outputFile, H
     writeOutput(listValues, outputFile, hiveConf, false);
   }
 
+  /**
+   * Given a ReplChangeManger's encoded uri, replaces the namespace and returns the modified encoded uri.
+   */
+  public static String replaceNameSpaceInEncodedURI(String cmEncodedURI, HiveConf hiveConf) throws SemanticException {

Review comment:
   replace name service?







Issue Time Tracking
---

Worklog Id: (was: 487928)
Time Spent: 0.5h  (was: 20m)

> Handle _files creation for HA config with same nameservice name on source and 
> destination
> -
>
> Key: HIVE-24187
> URL: https://issues.apache.org/jira/browse/HIVE-24187
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24187.01.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h

[jira] [Work logged] (HIVE-24187) Handle _files creation for HA config with same nameservice name on source and destination

2020-09-21 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-24187?focusedWorklogId=487796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-487796 ]

ASF GitHub Bot logged work on HIVE-24187:
-

Author: ASF GitHub Bot
Created on: 22/Sep/20 03:36
Start Date: 22/Sep/20 03:36
Worklog Time Spent: 10m 
  Work Description: pkumarsinha opened a new pull request #1515:
URL: https://github.com/apache/hive/pull/1515


   …e name on source and destination
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





Issue Time Tracking
---

Worklog Id: (was: 487796)
Time Spent: 20m  (was: 10m)

> Handle _files creation for HA config with same nameservice name on source and 
> destination
> -
>
> Key: HIVE-24187
> URL: https://issues.apache.org/jira/browse/HIVE-24187
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24187.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-24187) Handle _files creation for HA config with same nameservice name on source and destination

2020-09-21 Thread ASF GitHub Bot (Jira)


 [ https://issues.apache.org/jira/browse/HIVE-24187?focusedWorklogId=487337&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-487337 ]

ASF GitHub Bot logged work on HIVE-24187:
-

Author: ASF GitHub Bot
Created on: 21/Sep/20 23:42
Start Date: 21/Sep/20 23:42
Worklog Time Spent: 10m 
  Work Description: pkumarsinha opened a new pull request #1515:
URL: https://github.com/apache/hive/pull/1515


   …e name on source and destination
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





Issue Time Tracking
---

Worklog Id: (was: 487337)
Remaining Estimate: 0h
Time Spent: 10m

> Handle _files creation for HA config with same nameservice name on source and 
> destination
> -
>
> Key: HIVE-24187
> URL: https://issues.apache.org/jira/browse/HIVE-24187
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
> Attachments: HIVE-24187.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h