Re: [PR] RANGER-4761: make lazy memory allocation for family map instead … [ranger]

2024-05-06 Thread via GitHub


kumaab merged PR #307:
URL: https://github.com/apache/ranger/pull/307


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@ranger.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] RANGER-4761: make lazy memory allocation for family map instead … [ranger]

2024-04-22 Thread via GitHub


fateh288 commented on code in PR #307:
URL: https://github.com/apache/ranger/pull/307#discussion_r1575414974


##
ranger-tools/src/main/python/stress/stress-hbase-loadgenerator.py:
##
@@ -0,0 +1,125 @@
+import subprocess
+import time
+import argparse
+import os
+from datetime import datetime
+
+def increase_memory_for_loadgenerator():
+    try:
+        cmd = "export HBASE_OPTS='-Xmx10g'"
+        print(cmd)
+        op = subprocess.call(cmd, shell=True)

Review Comment:
   Fixed






Re: [PR] RANGER-4761: make lazy memory allocation for family map instead … [ranger]

2024-04-22 Thread via GitHub


fateh288 commented on code in PR #307:
URL: https://github.com/apache/ranger/pull/307#discussion_r1575414848


##
ranger-tools/src/main/python/stress/stress-hbase-loadgenerator.py:
##
@@ -0,0 +1,125 @@
+import subprocess
+import time
+import argparse
+import os
+from datetime import datetime
+
+def increase_memory_for_loadgenerator():
+    try:
+        cmd = "export HBASE_OPTS='-Xmx10g'"
+        print(cmd)
+        op = subprocess.call(cmd, shell=True)
+        print("Output:", op)
+    except subprocess.CalledProcessError as e:
+        print("Error in setting HBASE_HEAPSIZE:", e)
+        exit(1)
+def login(keytab_path, user_name):
+    try:
+        cmd = f"kinit -kt {keytab_path} {user_name}"
+        print(cmd)
+        login_op = subprocess.call(cmd, shell=True)

Review Comment:
   Fixed






Re: [PR] RANGER-4761: make lazy memory allocation for family map instead … [ranger]

2024-04-22 Thread via GitHub


fateh288 commented on code in PR #307:
URL: https://github.com/apache/ranger/pull/307#discussion_r1575408160


##
ranger-tools/src/main/python/stress/stress-hbase-loadgenerator.py:
##
@@ -0,0 +1,125 @@
+import subprocess
+import time
+import argparse
+import os
+from datetime import datetime
+
+def increase_memory_for_loadgenerator():
+    try:
+        cmd = "export HBASE_OPTS='-Xmx10g'"
+        print(cmd)
+        op = subprocess.call(cmd, shell=True)
+        print("Output:", op)
+    except subprocess.CalledProcessError as e:
+        print("Error in setting HBASE_HEAPSIZE:", e)
+        exit(1)
+def login(keytab_path, user_name):
+    try:
+        cmd = f"kinit -kt {keytab_path} {user_name}"
+        print(cmd)
+        login_op = subprocess.call(cmd, shell=True)
+        print("Login output:", login_op)
+    except subprocess.CalledProcessError as e:
+        print("Error in login:", e)
+        exit(1)
+
+def create_ltt_command_multiput(num_cols_per_cf=1000, num_threads=10, num_keys=100, table_name="multitest", avg_data_size=2, num_col_families=3, col_family_pattern="cf%d", num_regions_per_server=1):
+    def get_column_families():
+        col_families = []
+        for i in range(num_col_families):
+            col_families.append(col_family_pattern % i)
+        return ','.join(col_families)
+    # Sample: hbase ltt -tn multitest -families f1,f2,f3 -write 2:2:20 -multiput -num_keys 1000 -num_regions_per_server 1
+    cmd = f"hbase ltt -tn {table_name} -families {get_column_families()} -write {num_cols_per_cf}:{avg_data_size}:{num_threads}" \
+          f" -multiput -num_keys {num_keys} -num_regions_per_server {num_regions_per_server}"
+    return cmd
+
+
+def create_pe_command_multiget(multiget_batchsize=500, num_threads=10, num_keys=100, table_name="multitest", num_col_families=3):
+    # Sample: hbase pe --table=multitest --families=3 --columns=1 --multiGet=10 --rows=1000 --nomapred randomRead 5
+    cmd = f"hbase pe --table={table_name} --families={num_col_families} --columns={num_cols_per_cf} " \
+          f"--multiGet={multiget_batchsize} --rows={num_keys} --nomapred randomRead {num_threads}"
+    return cmd
+
+
+
+def generate_hbase_load(op_type, multiget_batchsize, num_cf, num_keys_list, num_cols_per_cf, num_threads_list, metadata, csv_outfile="/root/ltt_output.csv"):

Review Comment:
   The command-line argument -csv_output can be used in that case to provide a custom path where one has appropriate access.






Re: [PR] RANGER-4761: make lazy memory allocation for family map instead … [ranger]

2024-04-22 Thread via GitHub


kumaab commented on code in PR #307:
URL: https://github.com/apache/ranger/pull/307#discussion_r1575399634


##
ranger-tools/src/main/python/stress/stress-hbase-loadgenerator.py:
##
@@ -0,0 +1,125 @@
+import subprocess
+import time
+import argparse
+import os
+from datetime import datetime
+
+def increase_memory_for_loadgenerator():
+    try:
+        cmd = "export HBASE_OPTS='-Xmx10g'"
+        print(cmd)
+        op = subprocess.call(cmd, shell=True)

Review Comment:
   subprocess.call() -> subprocess.run() 
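For reference, a minimal sketch of what the suggested change could look like; `run_command` is an illustrative helper name, not part of the patch. With `check=True`, `subprocess.run()` raises `CalledProcessError` on a non-zero exit status, so the script's existing `try`/`except` blocks would actually fire.

```python
import subprocess

# Hedged sketch of the reviewer's suggestion (illustrative helper name).
# check=True makes a non-zero exit status raise CalledProcessError,
# which the surrounding try/except can then handle.
def run_command(cmd):
    result = subprocess.run(cmd, shell=True, check=True,
                            capture_output=True, text=True)
    return result.stdout
```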



##
ranger-tools/src/main/python/stress/stress-hbase-loadgenerator.py:
##
@@ -0,0 +1,125 @@
+import subprocess
+import time
+import argparse
+import os
+from datetime import datetime
+
+def increase_memory_for_loadgenerator():
+    try:
+        cmd = "export HBASE_OPTS='-Xmx10g'"
+        print(cmd)
+        op = subprocess.call(cmd, shell=True)
+        print("Output:", op)
+    except subprocess.CalledProcessError as e:
+        print("Error in setting HBASE_HEAPSIZE:", e)
+        exit(1)
+def login(keytab_path, user_name):
+    try:
+        cmd = f"kinit -kt {keytab_path} {user_name}"
+        print(cmd)
+        login_op = subprocess.call(cmd, shell=True)

Review Comment:
   subprocess.call() -> subprocess.run()






Re: [PR] RANGER-4761: make lazy memory allocation for family map instead … [ranger]

2024-04-22 Thread via GitHub


kumaab commented on code in PR #307:
URL: https://github.com/apache/ranger/pull/307#discussion_r1575399048


##
ranger-tools/src/main/python/stress/stress-hbase-loadgenerator.py:
##
@@ -0,0 +1,125 @@
+import subprocess
+import time
+import argparse
+import os
+from datetime import datetime
+
+def increase_memory_for_loadgenerator():
+    try:
+        cmd = "export HBASE_OPTS='-Xmx10g'"
+        print(cmd)
+        op = subprocess.call(cmd, shell=True)
+        print("Output:", op)
+    except subprocess.CalledProcessError as e:
+        print("Error in setting HBASE_HEAPSIZE:", e)
+        exit(1)
+def login(keytab_path, user_name):
+    try:
+        cmd = f"kinit -kt {keytab_path} {user_name}"
+        print(cmd)
+        login_op = subprocess.call(cmd, shell=True)
+        print("Login output:", login_op)
+    except subprocess.CalledProcessError as e:
+        print("Error in login:", e)
+        exit(1)
+
+def create_ltt_command_multiput(num_cols_per_cf=1000, num_threads=10, num_keys=100, table_name="multitest", avg_data_size=2, num_col_families=3, col_family_pattern="cf%d", num_regions_per_server=1):
+    def get_column_families():
+        col_families = []
+        for i in range(num_col_families):
+            col_families.append(col_family_pattern % i)
+        return ','.join(col_families)
+    # Sample: hbase ltt -tn multitest -families f1,f2,f3 -write 2:2:20 -multiput -num_keys 1000 -num_regions_per_server 1
+    cmd = f"hbase ltt -tn {table_name} -families {get_column_families()} -write {num_cols_per_cf}:{avg_data_size}:{num_threads}" \
+          f" -multiput -num_keys {num_keys} -num_regions_per_server {num_regions_per_server}"
+    return cmd
+
+
+def create_pe_command_multiget(multiget_batchsize=500, num_threads=10, num_keys=100, table_name="multitest", num_col_families=3):
+    # Sample: hbase pe --table=multitest --families=3 --columns=1 --multiGet=10 --rows=1000 --nomapred randomRead 5
+    cmd = f"hbase pe --table={table_name} --families={num_col_families} --columns={num_cols_per_cf} " \
+          f"--multiGet={multiget_batchsize} --rows={num_keys} --nomapred randomRead {num_threads}"
+    return cmd
+
+
+
+def generate_hbase_load(op_type, multiget_batchsize, num_cf, num_keys_list, num_cols_per_cf, num_threads_list, metadata, csv_outfile="/root/ltt_output.csv"):

Review Comment:
   The user might not have write permissions for the /root dir, please check.
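A possible way to address this, sketched here with illustrative names not taken from the patch: derive the default CSV path from a directory the current user can actually write to, instead of hardcoding /root.

```python
import os
import tempfile

# Hedged sketch: pick a writable default location for the CSV output.
# Prefer the user's home directory; fall back to the system temp dir.
def default_csv_path(filename="ltt_output.csv"):
    home = os.path.expanduser("~")
    base = home if os.access(home, os.W_OK) else tempfile.gettempdir()
    return os.path.join(base, filename)
```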






Re: [PR] RANGER-4761: make lazy memory allocation for family map instead … [ranger]

2024-04-02 Thread via GitHub


kumaab commented on code in PR #307:
URL: https://github.com/apache/ranger/pull/307#discussion_r1548609407


##
ranger-tools/src/main/python/stress/stress-hbase-loadgenerator.py:
##
@@ -0,0 +1,106 @@
+import subprocess
+import time
+import argparse
+import os
+from datetime import datetime
+
+def increase_memory_for_loadgenerator():
+    try:
+        cmd = "export HBASE_OPTS='-Xmx10g'"
+        print(cmd)
+        op = subprocess.call(cmd, shell=True)
+        print("Output:", op)
+    except subprocess.CalledProcessError as e:
+        print("Error in setting HBASE_HEAPSIZE:", e)
+        exit(1)
+def login():
+    try:
+        cmd = "kinit -kt  systest"

Review Comment:
   Instead of hardcoding the value, it is better to provide the keytab as an argument. Alternatively, doing the kinit externally and documenting the steps in a readme also works.
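One way the suggestion could be sketched (argument names are illustrative, not from the patch): accept the keytab and principal on the command line and build the kinit command from them.

```python
import argparse

# Hedged sketch: take the keytab path and principal as arguments
# instead of hardcoding them in the script.
def parse_login_args(argv=None):
    parser = argparse.ArgumentParser("HBase load generator login")
    parser.add_argument("--keytab", required=True, help="path to the keytab file")
    parser.add_argument("--principal", required=True, help="Kerberos principal")
    return parser.parse_args(argv)

def build_kinit_command(args):
    return f"kinit -kt {args.keytab} {args.principal}"
```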






Re: [PR] RANGER-4761: make lazy memory allocation for family map instead … [ranger]

2024-04-02 Thread via GitHub


kumaab commented on code in PR #307:
URL: https://github.com/apache/ranger/pull/307#discussion_r1548604094


##
ranger-tools/src/main/python/stress/stress-hbase-loadgenerator.py:
##
@@ -0,0 +1,106 @@
+import subprocess
+import time
+import argparse
+import os
+from datetime import datetime
+
+def increase_memory_for_loadgenerator():
+    try:
+        cmd = "export HBASE_OPTS='-Xmx10g'"
+        print(cmd)
+        op = subprocess.call(cmd, shell=True)
+        print("Output:", op)
+    except subprocess.CalledProcessError as e:
+        print("Error in setting HBASE_HEAPSIZE:", e)
+        exit(1)
+def login():
+    try:
+        cmd = "kinit -kt  systest"
+        print(cmd)
+        login_op = subprocess.call(cmd, shell=True)
+        print("Login output:", login_op)
+    except subprocess.CalledProcessError as e:
+        print("Error in login:", e)
+        exit(1)
+
+def create_ltt_command_multiput(num_cols_per_cf=1000, num_threads=10, num_keys=100, table_name="multitest", avg_data_size=2, num_col_families=3, col_family_pattern="cf%d", num_regions_per_server=1):
+    def get_column_families():
+        col_families = []
+        for i in range(num_col_families):
+            col_families.append(col_family_pattern % i)
+        return ','.join(col_families)
+    # Sample: hbase ltt -tn multitest -families f1,f2,f3 -write 2:2:20 -multiput -num_keys 1000 -num_regions_per_server 1
+    cmd = f"hbase ltt -tn {table_name} -families {get_column_families()} -write {num_cols_per_cf}:{avg_data_size}:{num_threads}" \
+          f" -multiput -num_keys {num_keys} -num_regions_per_server {num_regions_per_server}"
+    return cmd
+
+
+def create_pe_command_multiget(multiget_batchsize=500, num_threads=10, num_keys=100, table_name="multitest", num_col_families=3):
+    # Sample: hbase pe --table=multitest --families=3 --columns=1 --multiGet=10 --rows=1000 --nomapred randomRead 5
+    cmd = f"hbase pe --table={table_name} --families={num_col_families} --columns={num_cols_per_cf} " \
+          f"--multiGet={multiget_batchsize} --rows={num_keys} --nomapred randomRead {num_threads}"
+    return cmd
+
+
+
+def generate_hbase_load(op_type, multiget_batchsize, num_cf, num_rows_list, num_cols_per_cf, num_threads_list, metadata, csv_outfile="/root/ltt_output.csv"):
+    # if output file does not exist, only then write the header
+    if not os.path.exists(csv_outfile):
+        with open(csv_outfile, "w") as f:
+            f.write("op,num_cf,num_keys,num_cols_per_cf,num_threads,time_taken,command,metadata,date_start,time_start,date_end,time_end\n")
+    assert type(num_threads_list) == list
+    assert type(num_rows_list) == list
+    for num_keys in num_rows_list:
+        for num_threads in num_threads_list:
+            if op_type == "multiput":
+                cmd = create_ltt_command_multiput(num_cols_per_cf=num_cols_per_cf,
+                                                  num_threads=num_threads,
+                                                  num_keys=num_keys,
+                                                  num_col_families=num_cf)
+            elif op_type == "multiget":
+                cmd = create_pe_command_multiget(multiget_batchsize=multiget_batchsize,
+                                                 num_threads=num_threads,
+                                                 num_keys=num_keys,
+                                                 num_col_families=num_cf)
+            else:
+                print("Invalid op_type")
+                exit(1)
+
+            datetime_start = datetime.now()
+            date_start_str = datetime_start.date()
+            time_start_str = str(datetime_start.time()).split(".")[0]
+            time_start = time.time()
+            ltt_out = subprocess.call(cmd, shell=True)

Review Comment:
   Consider adding error handling for the hbase commands.
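A minimal sketch of such error handling, with an illustrative wrapper name: report a failing command and abort the run instead of silently recording it.

```python
import subprocess
import sys

# Hedged sketch: abort when a shelled-out hbase command fails,
# reporting the command and its exit code to stderr.
def run_or_exit(cmd):
    result = subprocess.run(cmd, shell=True)
    if result.returncode != 0:
        print(f"Command failed (exit {result.returncode}): {cmd}", file=sys.stderr)
        sys.exit(result.returncode)
    return result.returncode
```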






Re: [PR] RANGER-4761: make lazy memory allocation for family map instead … [ranger]

2024-04-02 Thread via GitHub


kumaab commented on code in PR #307:
URL: https://github.com/apache/ranger/pull/307#discussion_r1548582891


##
ranger-tools/src/main/python/stress/stress-hbase-loadgenerator.py:
##
@@ -0,0 +1,106 @@
+import subprocess
+import time
+import argparse
+import os
+from datetime import datetime
+
+def increase_memory_for_loadgenerator():
+    try:
+        cmd = "export HBASE_OPTS='-Xmx10g'"
+        print(cmd)
+        op = subprocess.call(cmd, shell=True)
+        print("Output:", op)
+    except subprocess.CalledProcessError as e:
+        print("Error in setting HBASE_HEAPSIZE:", e)
+        exit(1)
+def login():
+    try:
+        cmd = "kinit -kt  systest"
+        print(cmd)
+        login_op = subprocess.call(cmd, shell=True)
+        print("Login output:", login_op)
+    except subprocess.CalledProcessError as e:
+        print("Error in login:", e)
+        exit(1)
+
+def create_ltt_command_multiput(num_cols_per_cf=1000, num_threads=10, num_keys=100, table_name="multitest", avg_data_size=2, num_col_families=3, col_family_pattern="cf%d", num_regions_per_server=1):
+    def get_column_families():
+        col_families = []
+        for i in range(num_col_families):
+            col_families.append(col_family_pattern % i)
+        return ','.join(col_families)
+    # Sample: hbase ltt -tn multitest -families f1,f2,f3 -write 2:2:20 -multiput -num_keys 1000 -num_regions_per_server 1
+    cmd = f"hbase ltt -tn {table_name} -families {get_column_families()} -write {num_cols_per_cf}:{avg_data_size}:{num_threads}" \
+          f" -multiput -num_keys {num_keys} -num_regions_per_server {num_regions_per_server}"
+    return cmd
+
+
+def create_pe_command_multiget(multiget_batchsize=500, num_threads=10, num_keys=100, table_name="multitest", num_col_families=3):
+    # Sample: hbase pe --table=multitest --families=3 --columns=1 --multiGet=10 --rows=1000 --nomapred randomRead 5
+    cmd = f"hbase pe --table={table_name} --families={num_col_families} --columns={num_cols_per_cf} " \
+          f"--multiGet={multiget_batchsize} --rows={num_keys} --nomapred randomRead {num_threads}"
+    return cmd
+
+
+
+def generate_hbase_load(op_type, multiget_batchsize, num_cf, num_rows_list, num_cols_per_cf, num_threads_list, metadata, csv_outfile="/root/ltt_output.csv"):
+    # if output file does not exist, only then write the header
+    if not os.path.exists(csv_outfile):
+        with open(csv_outfile, "w") as f:
+            f.write("op,num_cf,num_keys,num_cols_per_cf,num_threads,time_taken,command,metadata,date_start,time_start,date_end,time_end\n")
+    assert type(num_threads_list) == list
+    assert type(num_rows_list) == list
+    for num_keys in num_rows_list:
+        for num_threads in num_threads_list:
+            if op_type == "multiput":
+                cmd = create_ltt_command_multiput(num_cols_per_cf=num_cols_per_cf,
+                                                  num_threads=num_threads,
+                                                  num_keys=num_keys,
+                                                  num_col_families=num_cf)
+            elif op_type == "multiget":
+                cmd = create_pe_command_multiget(multiget_batchsize=multiget_batchsize,
+                                                 num_threads=num_threads,
+                                                 num_keys=num_keys,
+                                                 num_col_families=num_cf)
+            else:
+                print("Invalid op_type")
+                exit(1)
+
+            datetime_start = datetime.now()
+            date_start_str = datetime_start.date()
+            time_start_str = str(datetime_start.time()).split(".")[0]
+            time_start = time.time()
+            ltt_out = subprocess.call(cmd, shell=True)

Review Comment:
   subprocess.run() is a recommended approach, please see: https://docs.python.org/3/library/subprocess.html#using-the-subprocess-module
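Applied to the timing loop above, the documented pattern could look like this sketch (the helper name is illustrative, not from the patch): run the command with `subprocess.run()`, keep the exit code for the CSV row, and measure the elapsed wall-clock time.

```python
import subprocess
import time

# Hedged sketch: subprocess.run() in place of subprocess.call(),
# returning the exit code together with the elapsed time.
def timed_run(cmd):
    start = time.time()
    result = subprocess.run(cmd, shell=True)
    return result.returncode, time.time() - start
```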






Re: [PR] RANGER-4761: make lazy memory allocation for family map instead … [ranger]

2024-04-02 Thread via GitHub


kumaab commented on code in PR #307:
URL: https://github.com/apache/ranger/pull/307#discussion_r1548578831


##
ranger-tools/src/main/python/stress/stress-hbase-loadgenerator.py:
##
@@ -0,0 +1,106 @@
+import subprocess
+import time
+import argparse
+import os
+from datetime import datetime
+
+def increase_memory_for_loadgenerator():
+    try:
+        cmd = "export HBASE_OPTS='-Xmx10g'"
+        print(cmd)
+        op = subprocess.call(cmd, shell=True)
+        print("Output:", op)
+    except subprocess.CalledProcessError as e:
+        print("Error in setting HBASE_HEAPSIZE:", e)
+        exit(1)
+def login():
+    try:
+        cmd = "kinit -kt  systest"
+        print(cmd)
+        login_op = subprocess.call(cmd, shell=True)
+        print("Login output:", login_op)
+    except subprocess.CalledProcessError as e:
+        print("Error in login:", e)
+        exit(1)
+
+def create_ltt_command_multiput(num_cols_per_cf=1000, num_threads=10, num_keys=100, table_name="multitest", avg_data_size=2, num_col_families=3, col_family_pattern="cf%d", num_regions_per_server=1):
+    def get_column_families():
+        col_families = []
+        for i in range(num_col_families):
+            col_families.append(col_family_pattern % i)
+        return ','.join(col_families)
+    # Sample: hbase ltt -tn multitest -families f1,f2,f3 -write 2:2:20 -multiput -num_keys 1000 -num_regions_per_server 1
+    cmd = f"hbase ltt -tn {table_name} -families {get_column_families()} -write {num_cols_per_cf}:{avg_data_size}:{num_threads}" \
+          f" -multiput -num_keys {num_keys} -num_regions_per_server {num_regions_per_server}"
+    return cmd
+
+
+def create_pe_command_multiget(multiget_batchsize=500, num_threads=10, num_keys=100, table_name="multitest", num_col_families=3):
+    # Sample: hbase pe --table=multitest --families=3 --columns=1 --multiGet=10 --rows=1000 --nomapred randomRead 5
+    cmd = f"hbase pe --table={table_name} --families={num_col_families} --columns={num_cols_per_cf} " \
+          f"--multiGet={multiget_batchsize} --rows={num_keys} --nomapred randomRead {num_threads}"
+    return cmd
+
+
+
+def generate_hbase_load(op_type, multiget_batchsize, num_cf, num_rows_list, num_cols_per_cf, num_threads_list, metadata, csv_outfile="/root/ltt_output.csv"):
+    # if output file does not exist, only then write the header
+    if not os.path.exists(csv_outfile):
+        with open(csv_outfile, "w") as f:
+            f.write("op,num_cf,num_keys,num_cols_per_cf,num_threads,time_taken,command,metadata,date_start,time_start,date_end,time_end\n")
+    assert type(num_threads_list) == list
+    assert type(num_rows_list) == list
+    for num_keys in num_rows_list:
+        for num_threads in num_threads_list:
+            if op_type == "multiput":
+                cmd = create_ltt_command_multiput(num_cols_per_cf=num_cols_per_cf,
+                                                  num_threads=num_threads,
+                                                  num_keys=num_keys,
+                                                  num_col_families=num_cf)
+            elif op_type == "multiget":
+                cmd = create_pe_command_multiget(multiget_batchsize=multiget_batchsize,
+                                                 num_threads=num_threads,
+                                                 num_keys=num_keys,
+                                                 num_col_families=num_cf)
+            else:
+                print("Invalid op_type")
+                exit(1)
+
+            datetime_start = datetime.now()
+            date_start_str = datetime_start.date()
+            time_start_str = str(datetime_start.time()).split(".")[0]
+            time_start = time.time()
+            ltt_out = subprocess.call(cmd, shell=True)
+            time_end = time.time()
+            datetime_end = datetime.now()
+            date_end_str = datetime_end.date()
+            time_end_str = str(datetime_end.time()).split(".")[0]
+            time_taken = time_end - time_start
+
+            print("cmd:", cmd)
+            print("LTT output:", ltt_out)
+            print("Time taken:", time_taken)
+            with open(csv_outfile, "a") as f:
+                if ltt_out != 0:
+                    time_taken = "non_zero_exit_code"
+                f.write(f'{op_type},{num_cf},{num_keys},{num_cols_per_cf},{num_threads},{time_taken},"{cmd}",{metadata},{date_start_str},{time_start_str},{date_end_str},{time_end_str}\n')
+            print(f"Written to file: {csv_outfile}")
+            # Sleep added so that the next command does not start immediately and any metric measurement such as heap usage can be captured more accurately
+            time.sleep(90)
+
+if __name__ == '__main__':
+    argparser = argparse.ArgumentParser("Generate LTT load and create report")
+    argparser.add_argument('-csv_output', '--csv_output', help='Full path to the csv output file', default="/root/ltt_outp

Re: [PR] RANGER-4761: make lazy memory allocation for family map instead … [ranger]

2024-04-02 Thread via GitHub


fateh288 commented on PR #307:
URL: https://github.com/apache/ranger/pull/307#issuecomment-2032639005

   @mneethiraj @rameeshm 
   Following up for review





Re: [PR] RANGER-4761: make lazy memory allocation for family map instead … [ranger]

2024-03-27 Thread via GitHub


fateh288 commented on PR #307:
URL: https://github.com/apache/ranger/pull/307#issuecomment-2023262368

   Requesting review for this PR.
   @kumaab @bhavikpatel9977 @mneethiraj 





[PR] RANGER-4761: make lazy memory allocation for family map instead … [ranger]

2024-03-26 Thread via GitHub


fateh288 opened a new pull request, #307:
URL: https://github.com/apache/ranger/pull/307

   …of ahead-of-time memory allocation for the family map of type Map>. Removed ColumnFamilyCache.
   
   Impact: memory and computational benefit. Cache memory is saved, and memory usage drops sharply when a large number of columns is accessed. Since the ColumnFamilyCache is always a miss, due to non-deterministic access patterns and also a bug wherein the address of the byte array is used as the key in the cache, removing the ColumnFamilyCache gives a computational benefit. The memory footprint is reduced even further when the column auth optimization supported by RANGER-4670 is enabled.
   
   ## What changes were proposed in this pull request?
   
   The Map> getColumnFamilies(Map> families) function was the original bottleneck, causing the Ranger Authz coprocessor to take 60% of the Put RPC time; it is a computationally heavy function that converts bytes to strings and casts a Collection to a set of strings.
   ColumnFamilyCache was introduced to fix this issue, but the caching does not work: the columns and column families accessed for a table vary between requests, causing many cache misses, so getColumnFamilies() still gets called. This means additional memory for the cache (which grows combinatorially with the sets of columns accessed) and double computation (the new family map is serialized to check the cache, and getColumnFamilies() is computed anyway). E.g., if the first call comes for cf1:c1,c3,c4 and the second call comes for cf1:c1,c2,c5, the second call is a cache miss because the set of columns accessed differs from the first call. Both entries get added to the cache.
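The miss pattern described above can be sketched in a few lines; this toy cache is illustrative and does not reproduce Ranger's implementation. Because the key is derived from the exact set of columns in each request, requests touching different columns of the same family never share an entry:

```python
# Toy model of the cache-miss pattern (not Ranger's actual code).
cache = {}

def lookup(family, columns):
    key = (family, frozenset(columns))  # key includes the full column set
    if key in cache:
        return "hit"
    cache[key] = set(columns)
    return "miss"

first = lookup("cf1", ["c1", "c3", "c4"])   # cache is empty: miss
second = lookup("cf1", ["c1", "c2", "c5"])  # different column set: miss again
```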
   Validation: added debug statements for cache hit and miss counts, then counted 'Cache Miss' and 'Cache Hit' occurrences in the log file; the cache size reaches the maximum default size of 1024.
   
   ```
   [root@ccycloud-2 hbase]# grep 'Cache Miss' hbase-cmf-HBASE-1-REGIONSERVER-ccycloud-2.fs2-7192.root.comops.site.log.out | wc -l
   10584
   [root@ccycloud-2 hbase]# grep 'Cache Hit' hbase-cmf-HBASE-1-REGIONSERVER-ccycloud-2.fs2-7192.root.comops.site.log.out | wc -l
   0
   2024-02-09 17:12:31,093 DEBUG org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor: evaluateAccess: Cache Size:1024
   2024-02-09 17:12:31,096 DEBUG org.apache.ranger.authorization.hbase.RangerAuthorizationCoprocessor: evaluateAccess: Cache Size:1024
   ```
   In this patch, memory for the family map of type Map> is allocated lazily instead of ahead of time. ColumnFamilyCache is removed from the implementation since it was always a miss.
   
   ## How was this patch tested?
   Unit test cases pass for the HBase plugin.
   

