[madlib] 02/02: DL: Drop leftover tables

2020-01-15 Thread domino
This is an automated email from the ASF dual-hosted git repository.

domino pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git

commit 5ddec47242415b32f8b26849c73e70f3a4a20448
Author: Nikhil Kak 
AuthorDate: Tue Jan 14 11:12:33 2020 -0800

DL: Drop leftover tables

JIRA: MADLIB-1404

While calling the function `get_accessible_gpus_for_seg`, we create an
intermediate gpu table by calling the madlib function gpu_configuration.
This table was not getting dropped at the end.

This commit drops the intermediate gpu table created in
get_accessible_gpus_for_seg.

Co-authored-by: Ekta Khanna 
---
 .../modules/deep_learning/input_data_preprocessor.py_in|  4 +++-
 .../postgres/modules/deep_learning/madlib_keras_helper.py_in   | 10 +++++-----
 .../deep_learning/test/unit_tests/test_madlib_keras.py_in  |  4 +++-
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git 
a/src/ports/postgres/modules/deep_learning/input_data_preprocessor.py_in 
b/src/ports/postgres/modules/deep_learning/input_data_preprocessor.py_in
index 3a5f118..605d439 100644
--- a/src/ports/postgres/modules/deep_learning/input_data_preprocessor.py_in
+++ b/src/ports/postgres/modules/deep_learning/input_data_preprocessor.py_in
@@ -328,6 +328,8 @@ class InputDataPreprocessorDL(object):
             all_segments = False
 
         if self.distribution_rules == 'gpu_segments':
+            #TODO can we reuse the function `get_accessible_gpus_for_seg` from
+            # madlib_keras_helper
             gpu_info_table = unique_string(desp='gpu_info')
             plpy.execute("""
                 SELECT {self.schema_madlib}.gpu_configuration('{gpu_info_table}')
@@ -339,6 +341,7 @@ class InputDataPreprocessorDL(object):
             gpu_query_result = plpy.execute(gpu_query)[0]['gpu_config']
             if not gpu_query_result:
                 plpy.error("{self.module_name}: No GPUs configured on hosts.".format(self=self))
+            plpy.execute("DROP TABLE IF EXISTS {0}".format(gpu_info_table))
 
             gpu_config_hostnames = "ARRAY{0}".format(gpu_query_result)
             # find hosts with gpus
@@ -351,7 +354,6 @@ class InputDataPreprocessorDL(object):
                 AND hostname=ANY({gpu_config_hostnames})
             """.format(**locals())
             segment_ids_result = plpy.execute(get_segment_query)[0]
-            plpy.execute("DROP TABLE IF EXISTS {0}".format(gpu_info_table))
 
             self.gpu_config = "ARRAY{0}".format(sorted(segment_ids_result['segment_ids']))
             self.distribution_rules = "ARRAY{0}".format(sorted(segment_ids_result['dbid']))
diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_helper.py_in b/src/ports/postgres/modules/deep_learning/madlib_keras_helper.py_in
index 6e006d5..5be078b 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_helper.py_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_helper.py_in
@@ -269,24 +269,24 @@ def create_summary_view(module_name, model_table, mst_key):
     """.format(**locals()))
     return tmp_view_summary
 
-def get_accessible_gpus_for_seg(schema_madlib, segments_per_host, module_name):
 
+def get_accessible_gpus_for_seg(schema_madlib, segments_per_host, module_name):
     if is_platform_pg():
         gpus = GPUInfoFunctions.get_gpu_info_from_tensorflow()
         if not gpus:
             plpy.error("{0} error: No GPUs configured on host.".format(module_name))
         return [len(gpus)]
     else:
-        gpu_table_name = unique_string(desp = 'gpu_table')
+        gpu_info_table = unique_string(desp = 'gpu_info')
         gpu_table_query = """
-            SELECT {schema_madlib}.gpu_configuration('{gpu_table_name}')
+            SELECT {schema_madlib}.gpu_configuration('{gpu_info_table}')
         """.format(**locals())
         plpy.execute(gpu_table_query)
         gpu_query = """
-            SELECT hostname, count(*) AS count FROM {gpu_table_name} GROUP BY hostname
+            SELECT hostname, count(*) AS count FROM {gpu_info_table} GROUP BY hostname
         """.format(**locals())
         gpu_query_result = plpy.execute(gpu_query)
-
+        plpy.execute("DROP TABLE IF EXISTS {0}".format(gpu_info_table))
         if not gpu_query_result:
             plpy.error("{0} error: No GPUs configured on hosts.".format(module_name))
 
diff --git a/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras.py_in b/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras.py_in
index a8bb629..3714de5 100644
--- a/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras.py_in
+++ b/src/ports/postgres/modules/deep_learning/test/unit_tests/test_madlib_keras.py_in
@@ -1336,6 +1336,7 @@ class MadlibKerasHelperTestCase(unittest.TestCase):
         self.plpy_mock_execute.side_effect = \
             [ [],
               [ {'hostname': 'mdw0', 'count' :

[madlib] branch master updated (515dc25 -> 5ddec47)

2020-01-15 Thread domino
This is an automated email from the ASF dual-hosted git repository.

domino pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git.


from 515dc25  misc user doc clarifications
 new dbae722  DL: Fix metrics_elapsed_time for fit multi model
 new 5ddec47  DL: Drop leftover tables

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../modules/deep_learning/input_data_preprocessor.py_in        |  4 +++-
 .../deep_learning/madlib_keras_fit_multiple_model.py_in        |  5 +++--
 .../postgres/modules/deep_learning/madlib_keras_helper.py_in   | 10 +++++-----
 .../deep_learning/test/madlib_keras_model_selection.sql_in     |  4 ++++
 .../deep_learning/test/unit_tests/test_madlib_keras.py_in      |  4 +++-
 5 files changed, 18 insertions(+), 9 deletions(-)



[madlib] 01/02: DL: Fix metrics_elapsed_time for fit multi model

2020-01-15 Thread domino
This is an automated email from the ASF dual-hosted git repository.

domino pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git

commit dbae72239ed3d201bf675e7aafd6f44903f6f007
Author: Ekta Khanna 
AuthorDate: Mon Jan 13 15:48:11 2020 -0800

DL: Fix metrics_elapsed_time for fit multi model

JIRA: MADLIB-1403

Prior to this commit, for madlib_keras_fit_multiple_model(), the
metrics_elapsed_time in info table was incorrectly stored as the per
iteration evaluation time instead of the elapsed time. This commit
fixes that issue and adds an assert for it too.

Co-authored-by: Nikhil Kak 
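The distinction the fix draws can be shown in a few lines of plain Python (the names below are illustrative, not MADlib's actual API): record one start timestamp before training begins, then at each evaluation log the cumulative time elapsed since that start, rather than the evaluation's own duration:

```python
# Hypothetical sketch: cumulative elapsed time vs. per-evaluation time.

import time

def train_multiple(n_iterations, evaluate):
    """Append cumulative elapsed time after each per-iteration evaluation."""
    metrics_elapsed_time = []
    metrics_elapsed_start_time = time.time()   # set once, before training
    for _ in range(n_iterations):
        evaluate()                             # per-iteration evaluation
        # elapsed since training started -- NOT this evaluation's duration,
        # which is what was stored before the fix
        metrics_elapsed_time.append(time.time() - metrics_elapsed_start_time)
    return metrics_elapsed_time

elapsed = train_multiple(3, evaluate=lambda: time.sleep(0.01))
# mirrors the new SQL assert: later entries exceed earlier ones
assert elapsed[2] - elapsed[0] > 0
assert elapsed == sorted(elapsed)
```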
---
 .../modules/deep_learning/madlib_keras_fit_multiple_model.py_in      | 5 +++--
 .../modules/deep_learning/test/madlib_keras_model_selection.sql_in   | 4 ++++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.py_in b/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.py_in
index 273321e..ae577e8 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.py_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.py_in
@@ -172,6 +172,7 @@ class FitMultipleModel():
         # WARNING: set orca off to prevent unwanted redistribution
         with OptimizerControl(False):
             self.start_training_time = datetime.datetime.now()
+            self.metrics_elapsed_start_time = time.time()
             self.train_multiple_model()
             self.end_training_time = datetime.datetime.now()
             self.insert_info_table()
@@ -219,7 +220,7 @@ class FitMultipleModel():
             weights = query_weights(self.model_output_table, self.model_weights_col,
                                     self.mst_key_col, mst[self.mst_key_col])
             model_arch, _ = get_model_arch_weights(self.model_arch_table, mst[self.model_id_col])
-            metric_eval_time, metric, loss = compute_loss_and_metrics(
+            _, metric, loss = compute_loss_and_metrics(
                 self.schema_madlib, table, "$madlib${0}$madlib$".format(
                     mst[self.compile_params_col]),
                 model_arch,
@@ -230,7 +231,7 @@ class FitMultipleModel():
                 images_per_seg,
                 [], [], epoch, True)
             mst_metric_eval_time[mst[self.mst_key_col]] \
-                .append(metric_eval_time)
+                .append(time.time() - self.metrics_elapsed_start_time)
             mst_loss[mst[self.mst_key_col]].append(loss)
             mst_metric[mst[self.mst_key_col]].append(metric)
             self.info_str += "\n\tmst_key={0}: metric={1}, loss={2}".format(mst[self.mst_key_col], metric, loss)
diff --git a/src/ports/postgres/modules/deep_learning/test/madlib_keras_model_selection.sql_in b/src/ports/postgres/modules/deep_learning/test/madlib_keras_model_selection.sql_in
index c36d48f..26c1a34 100644
--- a/src/ports/postgres/modules/deep_learning/test/madlib_keras_model_selection.sql_in
+++ b/src/ports/postgres/modules/deep_learning/test/madlib_keras_model_selection.sql_in
@@ -320,6 +320,10 @@ SELECT assert(
     'Keras Fit Multiple Output Info Validation failed. Actual:' || __to_char(info))
 FROM (SELECT * FROM iris_multiple_model_info) info;
 
+SELECT assert(metrics_elapsed_time[3] - metrics_elapsed_time[1] > 0,
+    'Keras Fit Multiple invalid elapsed time calculation.')
+FROM (SELECT * FROM iris_multiple_model_info) info;
+
 SELECT assert(
     name = 'multi_model_name' AND
     description = 'multi_model_descr' AND



[madlib] branch master updated: misc user doc clarifications

2020-01-15 Thread fmcquillan
This is an automated email from the ASF dual-hosted git repository.

fmcquillan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git


The following commit(s) were added to refs/heads/master by this push:
 new 515dc25  misc user doc clarifications
515dc25 is described below

commit 515dc2574f2800c0459ec2f0b10d17071f456186
Author: Frank McQuillan 
AuthorDate: Wed Jan 15 16:41:10 2020 -0800

misc user doc clarifications
---
 .../madlib_keras_fit_multiple_model.sql_in | 97 --
 .../madlib_keras_model_selection.sql_in            | 10 ++++++++++
 2 files changed, 64 insertions(+), 43 deletions(-)

diff --git a/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.sql_in b/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.sql_in
index 0468942..33699a4 100644
--- a/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.sql_in
+++ b/src/ports/postgres/modules/deep_learning/madlib_keras_fit_multiple_model.sql_in
@@ -130,6 +130,13 @@ madlib_keras_fit_multiple_model(
 
   num_iterations
   INTEGER.  Number of iterations to train.
+
+@note
+This parameter is different than the number of passes over the dataset,
+which is commonly referred to as the number of epochs.  Since MADlib operates
+in a distributed system, the number of
+epochs is actually equal to this parameter 'num_iterations' X 'epochs' as
+specified in the Keras fit parameter.
 
 
   use_gpus (optional)
@@ -1016,18 +1023,18 @@ SELECT * FROM iris_multi_model_info ORDER BY training_metrics_final DESC, traini
 
 mst_key | model_id | compile_params | fit_params | model_type | model_size | metrics_elapsed_time | metrics_type | training_metrics_final | training_loss_final | training_metrics | training_loss | validation_metrics_final | validation_loss_final | validation_metrics | validation_loss
 
-+--+-+---+--+--+--+--++-+-+-+--+---++-
-   9 |2 | loss='categorical_crossentropy', optimizer='Adam(lr=0.01)',metrics=['accuracy'] | batch_size=4,epochs=1 | madlib_keras | 1.2197265625 | {0.189763069152832} | {accuracy} | 0.98349228 | 0.102392569184 | {0.98349227905} | {0.102392569184303} | | | |
-   4 |1 | loss='categorical_crossentropy', optimizer='Adam(lr=0.01)',metrics=['accuracy'] | batch_size=8,epochs=1 | madlib_keras | 0.7900390625 | {0.170287847518921} | {accuracy} | 0.97523842 | 0.159002527595 | {0.97523841858} | {0.159002527594566} | | | |
-   3 |1 | loss='categorical_crossentropy', optimizer='Adam(lr=0.01)',metrics=['accuracy'] | batch_size=4,epochs=1 | madlib_keras | 0.7900390625 | {0.165465116500854} | {accuracy} | 0.96638851 | 0.10245500505 | {0.96638851166} | {0.102455005049706} | | | |
-  10 |2 | loss='categorical_crossentropy', optimizer='Adam(lr=0.01)',metrics=['accuracy'] | batch_size=8,epochs=1 | madlib_keras | 1.2197265625 | {0.199872970581055} | {accuracy} | 0.94162693 | 0.12242924422 | {0.94162693024} | {0.122429244220257} | | | |
-   5 |1 | loss='categorical_crossentropy',optimizer='Adam(lr=0.001)',metrics=['accuracy'] | batch_size=4,epochs=1 | madlib_keras | 0.7900390625 | {0.16815185546875} | {accuracy} | 0.88325386 | 0.437314987183 | {0.88325386047} | {0.437314987182617} | | | |
-  11 |2 | loss='categorical_crossentropy',optimizer='Adam(lr=0.001)',metrics=['accuracy'] | batch_size=4,epochs=1 | madlib_keras | 1.2197265625 | {0.430488109588623} | {accuracy} | 0.85849228 | 0.400548309088 | {0.85849227905} | {0.400548309087753} | | | |
-   6 |1 | loss='categorical_crossentropy',optimizer='Adam(lr=0.001)',metrics=['accuracy'] | batch_size=8,epochs=1 | madlib_keras | 0.7900390625 | {0.154508113861084} | {accuracy} | 0.68337307 | 0.634458899498 | {0.68337306976} | {0.634458899497986} | | | |
-  12 |2 |
[madlib] branch master updated (7625ae0 -> 273301e)

2020-01-15 Thread fmcquillan
This is an automated email from the ASF dual-hosted git repository.

fmcquillan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git.


from 7625ae0  DL: Fix failure on GPDB6 for preprocessor
 new 96a4424  Decrease the learning rate for transfer learning test
 new 273301e  Update Apache Copyright date

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 NOTICE                                                                 | 2 +-
 .../modules/deep_learning/test/madlib_keras_transfer_learning.sql_in  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)



[madlib] 02/02: Update Apache Copyright date

2020-01-15 Thread fmcquillan
This is an automated email from the ASF dual-hosted git repository.

fmcquillan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git

commit 273301e3c2e150b9648a886761607695a04ce236
Author: Domino Valdano 
AuthorDate: Wed Jan 15 10:42:52 2020 -0800

Update Apache Copyright date
---
 NOTICE | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/NOTICE b/NOTICE
index 7cbfa51..10b9387 100644
--- a/NOTICE
+++ b/NOTICE
@@ -1,5 +1,5 @@
 Apache MADlib
-Copyright 2016-2019 The Apache Software Foundation.
+Copyright 2016-2020 The Apache Software Foundation.
 
 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).