[GitHub] ThomasDelteil commented on a change in pull request #11626: [MXNET-651] MXNet Model Backwards Compatibility Checker

2018-07-11 Thread GitBox
ThomasDelteil commented on a change in pull request #11626: [MXNET-651] MXNet 
Model Backwards Compatibility Checker
URL: https://github.com/apache/incubator-mxnet/pull/11626#discussion_r201781970
 
 

 ##
 File path: 
tests/nightly/model_backwards_compatibility_check/train_mxnet_legacy_models.sh
 ##
 @@ -0,0 +1,57 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+#Author: Piyush Ghai
+
+run_models() {
+   echo '=='
+   echo "Running training files and preparing models"
+   echo '=='
+   python mnist_mlp_module_api_train.py
 
 Review comment:
   If one of the training scripts crashes at random, say because a dataset is 
unavailable, what are the consequences of the data stored in S3 being 
incomplete for that run?
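
A minimal sketch of one way to address this concern, assuming the training scripts are launched from a Python driver instead of the shell function above. The helper name `run_training_scripts` and the `upload` callback are hypothetical and not part of the PR; the idea is only that nothing is published unless every script exits with status 0.

```python
import subprocess


def run_training_scripts(commands, upload):
    """Run each command in order; call upload() only if all of them succeed.

    `commands` is a list of argument lists, as accepted by subprocess.run.
    `upload` is a hypothetical callback that would push the results to S3.
    """
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Stop before anything is published, so the bucket never
            # holds the output of a partially completed run.
            return False
    upload()
    return True
```

With this shape, a crash in any one script (for example `mnist_mlp_module_api_train.py`) leaves the previously uploaded data untouched instead of producing an incomplete set for that run.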


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] ThomasDelteil commented on a change in pull request #11626: [MXNET-651] MXNet Model Backwards Compatibility Checker

2018-07-11 Thread GitBox
ThomasDelteil commented on a change in pull request #11626: [MXNET-651] MXNet 
Model Backwards Compatibility Checker
URL: https://github.com/apache/incubator-mxnet/pull/11626#discussion_r201773901
 
 

 ##
 File path: 
tests/nightly/model_backwards_compatibility_check/lm_rnn_gluon_inference.py
 ##
 @@ -17,197 +17,31 @@
 # specific language governing permissions and limitations
 # under the License.
 
-import math
-import os
-import time
-import numpy as np
-import mxnet as mx
-from mxnet import gluon, autograd
-from mxnet.gluon import nn, rnn
-import logging
-import boto3
-import json
-logging.getLogger().setLevel(logging.DEBUG)
-mx.random.seed(7)
-np.random.seed(7)
+from common import *
 
-mxnet_version = mx.__version__
-bucket_name = 'mxnet-model-backwards-compatibility'
-ctx = mx.cpu()
-backslash = '/'
 model_name = 'lm_rnn_gluon_api'
-s3 = boto3.resource('s3')
-
-
-args_data = 'ptb.'
-args_model = 'rnn_relu'
-args_emsize = 100
-args_nhid = 100
-args_nlayers = 2
-args_lr = 1.0
-args_clip = 0.2
-args_epochs = 2
-args_batch_size = 32
-args_bptt = 5
-args_dropout = 0.2
-args_tied = True
-args_cuda = 'store_true'
-args_log_interval = 500
-args_save = model_name + '.params'
-
-class Dictionary(object):
-    def __init__(self):
-        self.word2idx = {}
-        self.idx2word = []
-
-    def add_word(self, word):
-        if word not in self.word2idx:
-            self.idx2word.append(word)
-            self.word2idx[word] = len(self.idx2word) - 1
-        return self.word2idx[word]
-
-    def __len__(self):
-        return len(self.idx2word)
-
-class Corpus(object):
-    def __init__(self, path):
-        self.dictionary = Dictionary()
-        self.download_data_from_s3()
-        self.train = self.tokenize(path + 'train.txt')
-        self.valid = self.tokenize(path + 'valid.txt')
-        self.test = self.tokenize(path + 'test.txt')
-
-    def download_data_from_s3(self):
-        print ('Downloading files from bucket : %s' %bucket_name)
-        bucket = s3.Bucket(bucket_name)
-        files = ['test.txt', 'train.txt', 'valid.txt']
-        for file in files:
-            if os.path.exists(args_data + file) :
-                print ('File %s'%(args_data + file), 'already exists. Skipping download')
-                continue
-            file_path = str(mxnet_version) + backslash + model_name + backslash + args_data + file
-            bucket.download_file(file_path, args_data + file)
-
-    def tokenize(self, path):
-        """Tokenizes a text file."""
-        assert os.path.exists(path)
-        # Add words to the dictionary
-        with open(path, 'r') as f:
-            tokens = 0
-            for line in f:
-                words = line.split() + ['']
-                tokens += len(words)
-                for word in words:
-                    self.dictionary.add_word(word)
-
-        # Tokenize file content
-        with open(path, 'r') as f:
-            ids = np.zeros((tokens,), dtype='int32')
-            token = 0
-            for line in f:
-                words = line.split() + ['']
-                for word in words:
-                    ids[token] = self.dictionary.word2idx[word]
-                    token += 1
-
-        return mx.nd.array(ids, dtype='int32')
-
-class RNNModel(gluon.Block):
-    """A model with an encoder, recurrent layer, and a decoder."""
-
-    def __init__(self, mode, vocab_size, num_embed, num_hidden,
-                 num_layers, dropout=0.5, tie_weights=False, **kwargs):
-        super(RNNModel, self).__init__(**kwargs)
-        with self.name_scope():
-            self.drop = nn.Dropout(dropout)
-            self.encoder = nn.Embedding(vocab_size, num_embed,
-                                        weight_initializer = mx.init.Uniform(0.1))
-            if mode == 'rnn_relu':
-                self.rnn = rnn.RNN(num_hidden, num_layers, activation='relu', dropout=dropout,
-                                   input_size=num_embed)
-            elif mode == 'rnn_tanh':
-                self.rnn = rnn.RNN(num_hidden, num_layers, dropout=dropout,
-                                   input_size=num_embed)
-            elif mode == 'lstm':
-                self.rnn = rnn.LSTM(num_hidden, num_layers, dropout=dropout,
-                                    input_size=num_embed)
-            elif mode == 'gru':
-                self.rnn = rnn.GRU(num_hidden, num_layers, dropout=dropout,
-                                   input_size=num_embed)
-            else:
-                raise ValueError("Invalid mode %s. Options are rnn_relu, "
-                                 "rnn_tanh, lstm, and gru"%mode)
-            if tie_weights:
-                self.decoder = nn.Dense(vocab_size, in_units = num_hidden,
-                                        params = self.encoder.params)
-            else:
-                self.decoder = nn.Dense(vocab_size, in_units = num_hidden)
-            self.num_hidden = num_hidden
-
-    def forward(self, 

[GitHub] ThomasDelteil commented on a change in pull request #11626: [MXNET-651] MXNet Model Backwards Compatibility Checker

2018-07-11 Thread GitBox
ThomasDelteil commented on a change in pull request #11626: [MXNET-651] MXNet 
Model Backwards Compatibility Checker
URL: https://github.com/apache/incubator-mxnet/pull/11626#discussion_r201773565
 
 

 ##
 File path: tests/nightly/model_backwards_compatibility_check/common.py
 ##
 @@ -165,3 +177,163 @@ def forward(self, x):
         x = F.tanh(self.fc1(x))
         x = F.tanh(self.fc2(x))
         return x
+
+class Dictionary(object):
+    def __init__(self):
+        self.word2idx = {}
+        self.idx2word = []
+
+    def add_word(self, word):
+        if word not in self.word2idx:
+            self.idx2word.append(word)
+            self.word2idx[word] = len(self.idx2word) - 1
+        return self.word2idx[word]
+
+    def __len__(self):
+        return len(self.idx2word)
+
+class Corpus(object):
+    def __init__(self, path):
+        self.dictionary = Dictionary()
+        self.download_data_from_s3()
+        self.train = self.tokenize(path + 'train.txt')
+        self.valid = self.tokenize(path + 'valid.txt')
+        self.test = self.tokenize(path + 'test.txt')
+
+    def download_data_from_s3(self, ):
+        print ('Downloading files from bucket : ptb-small-dataset' )
+        bucket = s3.Bucket('ptb-small-dataset')
 
 Review comment:
   Where are we downloading this from? Since we can't host the PTB dataset 
ourselves because of its license, I wonder whether it is a good idea to depend 
on it.
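
One possible way to avoid the dependency, offered only as a sketch and not something the PR does: generate a small synthetic corpus with the same three file names that `Corpus` reads (`train.txt`, `valid.txt`, `test.txt`). The function name `write_synthetic_corpus`, the vocabulary format, and the sizes are all assumptions chosen for illustration; a fixed seed keeps the generated text identical across runs.

```python
import random
from pathlib import Path


def write_synthetic_corpus(directory, vocab_size=50, lines_per_file=20, seed=7):
    """Write train.txt, valid.txt and test.txt filled with made-up tokens.

    Hypothetical helper: the tokens are "w0", "w1", ... and carry no meaning.
    They only exercise the tokenizer and the model code paths.
    """
    rng = random.Random(seed)  # fixed seed, so the files are reproducible
    vocab = ["w%d" % i for i in range(vocab_size)]
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in ("train.txt", "valid.txt", "test.txt"):
        lines = [
            " ".join(rng.choice(vocab) for _ in range(rng.randint(3, 10)))
            for _ in range(lines_per_file)
        ]
        path = out / name
        path.write_text("\n".join(lines) + "\n")
        paths.append(path)
    return paths
```

Whether a synthetic corpus is good enough depends on what the check is meant to verify: it exercises saving and loading of parameters, but it says nothing about model quality on real text.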




[GitHub] ThomasDelteil commented on a change in pull request #11626: [MXNET-651] MXNet Model Backwards Compatibility Checker

2018-07-11 Thread GitBox
ThomasDelteil commented on a change in pull request #11626: [MXNET-651] MXNet 
Model Backwards Compatibility Checker
URL: https://github.com/apache/incubator-mxnet/pull/11626#discussion_r201774694
 
 

 ##
 File path: 
tests/nightly/model_backwards_compatibility_check/train_mxnet_legacy_models.sh
 ##
 @@ -0,0 +1,57 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+#Author: Piyush Ghai
+
+run_models() {
+   echo '=='
+   echo "Running training files and preparing models"
+   echo '=='
+   python mnist_mlp_module_api_train.py
 
 Review comment:
   What are the consequences for the integrity of the generated data if one of 
these scripts crashes?
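
As an illustration of what an integrity check on the generated data could look like, here is a small sketch that reports which expected output files are missing or empty after a run. The function name `missing_artifacts` and the file names in the usage note are hypothetical; the PR does not define such a check.

```python
from pathlib import Path


def missing_artifacts(output_dir, expected_names):
    """Return the sorted names of expected files that are absent or empty."""
    out = Path(output_dir)
    missing = []
    for name in expected_names:
        path = out / name
        # A crashed script may leave no file at all, or a zero-byte file.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return sorted(missing)
```

A driver could call this with the list of files each training script is supposed to produce (for instance a hypothetical `model_a.params`) and refuse to upload when the returned list is not empty.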

