[GitHub] incubator-hivemall issue #149: [WIP][HIVEMALL-201] Evaluate, fix and documen...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/149

This kind of behavior can often happen, and LIBFFM's early stopping strategy is too aggressive.

```
7    0.43239    0.46952
8    0.42362    0.46999
9    0.41394    0.45088
```
---
[GitHub] incubator-hivemall issue #149: [WIP][HIVEMALL-201] Evaluate, fix and documen...
Github user takuti commented on the issue: https://github.com/apache/incubator-hivemall/pull/149

Makes sense as a compromise in terms of memory consumption. I'll note in the documentation that our `-early_stopping` option does not return the best of the best models; users might expect the option to return the best model (achieved at the 7th iteration), but our UDF does not behave that way, as discussed above.
---
[GitHub] incubator-hivemall issue #149: [WIP][HIVEMALL-201] Evaluate, fix and documen...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/149

```
iter   tr_logloss   va_logloss
   1      0.49738      0.48776
   2      0.47383      0.47995
   3      0.46366      0.47480
   4      0.45561      0.47231
   5      0.44810      0.47034
   6      0.44037      0.47003
   7      0.43239      0.46952
   8      0.42362      0.46999   <- ffm stops once va_logloss increases, but va_logloss might decrease in the next iteration
   9      0.41394      0.47088
```

In the 8th iteration, the trainer becomes ready to stop because va_logloss increased. If va_logloss decreases in the 9th iteration, iteration continues (the ready-to-stop state is cleared). If va_logloss increases again in the 9th iteration, the current model of the 9th iteration is emitted.
---
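The one-strike stopping rule described above can be sketched as follows (a hypothetical helper for illustration, not the actual UDTF code; `early_stop_iteration` and the list-of-losses representation are assumptions):

```python
def early_stop_iteration(va_losses):
    """Return the 1-indexed iteration at which training stops under the
    one-strike rule: the first increase in validation loss arms stopping,
    a decrease in the next iteration disarms it, and a second consecutive
    increase triggers emitting the current model."""
    armed = False
    for i in range(1, len(va_losses)):
        if va_losses[i] > va_losses[i - 1]:
            if armed:
                return i + 1  # stop and emit the model of this iteration
            armed = True
        else:
            armed = False
    return len(va_losses)  # never stopped early; all iterations ran
```

Applied to the va_logloss column above, the rule arms at iteration 8 and stops at iteration 9; a sequence where the loss recovers after a single bump runs to completion.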
[GitHub] incubator-hivemall issue #149: [WIP][HIVEMALL-201] Evaluate, fix and documen...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/149

It might be better to reconsider `eta0` when enabling `l2norm` by default and enlarging `max_init_size`. In my experience with FM, the initial random value range should be small when the average feature dimension is large (gradients will be large). I think `1.0` is too aggressive for the default; maybe `0.2` or `0.5`? Better to research other implementations.
---
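As a rough illustration of why the init range should shrink with the feature dimension, one common heuristic (an assumption for illustration, not Hivemall's actual rule) scales the range by the inverse square root of the average feature dimension so that initial interaction magnitudes, and hence the first gradients, stay bounded:

```python
import math
import random

def init_factor(max_init_value, avg_feature_dim):
    """Draw one latent-factor component uniformly from
    [0, max_init_value / sqrt(d)], where d is the average number of
    active features per instance. (A heuristic sketch only.)"""
    scale = max_init_value / math.sqrt(max(avg_feature_dim, 1))
    return random.uniform(0.0, scale)
```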
[GitHub] incubator-hivemall issue #149: [WIP][HIVEMALL-201] Evaluate, fix and documen...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/149

@takuti So then, it's better to enable L2 normalization by default and provide `-disable_l2norm` to disable it. My concern is that L2 normalization performed worse for small datasets with an adequate learning rate in `[0.1, 1.0]`. FieldAwareFactorizationMachineUDTFTest contains several tests; it's better to confirm that accuracy will not degrade with the new default options enabling L2 normalization.
---
[GitHub] incubator-hivemall issue #149: [WIP][HIVEMALL-201] Evaluate, fix and documen...
Github user takuti commented on the issue: https://github.com/apache/incubator-hivemall/pull/149

I'll change the default options and consider implementing the early stopping option as you suggested.

> What happens without `-l2norm` ?

Once we drop instance-wise L2 normalization, the model easily overfits to the training samples, and prediction accuracy becomes drastically worse.

**LIBFFM**:

```
$ ./ffm-train -k 4 -t 15 -l 0.2 -r 0.2 -s 1 --no-norm ../tr.sp model
First check if the text file has already been converted to binary format (0.0 seconds)
Binary file NOT found. Convert text file to binary file (0.0 seconds)
iter   tr_logloss      tr_time
   1      4.24374          0.0
   2      0.53960          0.1
   3      0.09525          0.2
   4      0.01288          0.2
   5      0.00215          0.3
   6      0.00133          0.3
   7      0.00112          0.3
   8      0.00098          0.4
   9      0.00089          0.4
  10      0.00082          0.5
  11      0.00076          0.5
  12      0.00072          0.6
  13      0.00068          0.6
  14      0.00064          0.6
  15      0.00061          0.7
$ ./ffm-predict ../va.sp model submission.csv
logloss = 1.75623
```

**Hivemall**:

```
Iteration #2 | average loss=0.5186307939402891, current cumulative loss=823.0670699832388, previous cumulative loss=6640.3299608989755, change rate=0.876050275388452, #trainingExamples=1587
Iteration #3 | average loss=0.06870252595245425, current cumulative loss=109.0309086865449, previous cumulative loss=823.0670699832388, change rate=0.8675309550547743, #trainingExamples=1587
Iteration #4 | average loss=0.01701292407900819, current cumulative loss=26.999510513386, previous cumulative loss=109.0309086865449, change rate=0.7523682886014696, #trainingExamples=1587
Iteration #5 | average loss=0.003132377872105223, current cumulative loss=4.971083683030989, previous cumulative loss=26.999510513386, change rate=0.8158824516256917, #trainingExamples=1587
Iteration #6 | average loss=0.001693780516846469, current cumulative loss=2.6880296802353465, previous cumulative loss=4.971083683030989, change rate=0.4592668617888987, #trainingExamples=1587
Iteration #7 | average loss=0.0013357168592237345, current cumulative loss=2.1197826555880668, previous cumulative loss=2.6880296802353465, change rate=0.21139908864307172, #trainingExamples=1587
Iteration #8 | average loss=0.0011459013923848537, current cumulative loss=1.8185455097147627, previous cumulative loss=2.1197826555880668, change rate=0.1421075623386188, #trainingExamples=1587
Iteration #9 | average loss=0.001017751388111345, current cumulative loss=1.6151714529327046, previous cumulative loss=1.8185455097147627, change rate=0.11183336116452601, #trainingExamples=1587
Iteration #10 | average loss=9.230266490923267E-4, current cumulative loss=1.4648432921095225, previous cumulative loss=1.6151714529327046, change rate=0.0930725716766649, #trainingExamples=1587
Iteration #11 | average loss=8.493080071393429E-4, current cumulative loss=1.3478518073301373, previous cumulative loss=1.4648432921095225, change rate=0.07986621190783184, #trainingExamples=1587
Iteration #12 | average loss=7.898623710141035E-4, current cumulative loss=1.2535115827993821, previous cumulative loss=1.3478518073301373, change rate=0.0699930244687856, #trainingExamples=1587
Iteration #13 | average loss=7.406521210973545E-4, current cumulative loss=1.1754149161815017, previous cumulative loss=1.2535115827993821, change rate=0.06230230951952787, #trainingExamples=1587
Iteration #14 | average loss=6.990685420175246E-4, current cumulative loss=1.1094217761818115, previous cumulative loss=1.1754149161815017, change rate=0.056144548696113294, #trainingExamples=1587
Iteration #15 | average loss=6.633493164996776E-4, current cumulative loss=1.0527353652849885, previous cumulative loss=1.1094217761818115, change rate=0.051095455410939475, #trainingExamples=1587
Performed 15 iterations of 1,587 training examples on memory (thus 23,805 training updates in total)
```

```
LogLoss: 1.8970086009757248
```
---
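For reference, instance-wise L2 normalization (which LIBFFM applies by default and the `--no-norm` run above disables) can be sketched roughly like this, with the feature layout simplified to `(index, value)` pairs (an illustration, not Hivemall's actual implementation):

```python
import math

def l2_normalize(features):
    """Divide each feature value of a sample by the sample's L2 norm,
    so every instance has unit length and per-sample gradient
    magnitudes stay comparable across sparse and dense instances."""
    norm = math.sqrt(sum(v * v for _, v in features))
    if norm == 0.0:
        return features  # nothing to scale for an all-zero instance
    return [(i, v / norm) for i, v in features]
```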
[GitHub] incubator-hivemall issue #149: [WIP][HIVEMALL-201] Evaluate, fix and documen...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/149

Also, it's better to revise the default `-iters` from 1 to 10 (i.e., run at least 10 iterations with early stopping).
---
[GitHub] incubator-hivemall issue #149: [WIP][HIVEMALL-201] Evaluate, fix and documen...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/149

BTW, it might be better to implement early stopping using validation data: https://github.com/guestwalk/libffm We can use a similar approach to the `_validationRatio` used in `FactorizationMachineUDTF` instead of preparing a separate validation dataset.
---
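A holdout split in the spirit of `_validationRatio` might be sketched as follows (a hypothetical helper; the real UDTF's bookkeeping differs):

```python
import random

def split_validation(samples, validation_ratio=0.05, seed=31):
    """Route each training example to a small validation set with
    probability validation_ratio, so early stopping can track
    validation loss without a separately prepared dataset."""
    rng = random.Random(seed)
    train, valid = [], []
    for s in samples:
        (valid if rng.random() < validation_ratio else train).append(s)
    return train, valid
```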
[GitHub] incubator-hivemall issue #149: [WIP][HIVEMALL-201] Evaluate, fix and documen...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/149

@takuti Thank you for the detailed verification. Let's disable the linear term by default: remove `-disable_wi` and introduce `-enable_wi` (alias `-linear_term`) to enable the linear term. I'm not sure `-l2norm` should be enabled by default. What happens without `-l2norm`?
---
[GitHub] incubator-hivemall issue #149: [WIP][HIVEMALL-201] Evaluate, fix and documen...
Github user takuti commented on the issue: https://github.com/apache/incubator-hivemall/pull/149

### With linear terms

Hivemall

```sql
INSERT OVERWRITE TABLE criteo.ffm_model
SELECT
  train_ffm(features, label, '-init_v random -max_init_value 0.5 -classification -iterations 15 -factors 4 -eta 0.2 -l2norm -optimizer adagrad -lambda 0.2 -cv_rate 0.0')
FROM (
  SELECT features, label
  FROM criteo.train_vectorized
  CLUSTER BY rand(1)
) t
;
```

```
Iteration #2 | average loss=0.474651712453725, current cumulative loss=753.2722676640616, previous cumulative loss=990.2550021169766, change rate=0.23931485722999737, #trainingExamples=1587
Iteration #3 | average loss=0.4499051385165006, current cumulative loss=713.9994548256865, previous cumulative loss=753.2722676640616, change rate=0.05213627863954456, #trainingExamples=1587
Iteration #4 | average loss=0.4342257595710771, current cumulative loss=689.1162804392994, previous cumulative loss=713.9994548256865, change rate=0.03485041090467212, #trainingExamples=1587
Iteration #5 | average loss=0.4225120903723549, current cumulative loss=670.5266874209271, previous cumulative loss=689.1162804392994, change rate=0.026975988735198287, #trainingExamples=1587
Iteration #6 | average loss=0.41300825971798527, current cumulative loss=655.4441081724426, previous cumulative loss=670.5266874209271, change rate=0.022493630054453533, #trainingExamples=1587
Iteration #7 | average loss=0.40491514701335013, current cumulative loss=642.6003383101867, previous cumulative loss=655.4441081724426, change rate=0.019595522641995967, #trainingExamples=1587
Iteration #8 | average loss=0.3978014571916465, current cumulative loss=631.310912563143, previous cumulative loss=642.6003383101867, change rate=0.017568347033135524, #trainingExamples=1587
Iteration #9 | average loss=0.3914067263636397, current cumulative loss=621.1624747390962, previous cumulative loss=631.310912563143, change rate=0.016075182009517044, #trainingExamples=1587
Iteration #10 | average loss=0.3855609819906249, current cumulative loss=611.8852784191217, previous cumulative loss=621.1624747390962, change rate=0.014935216947661086, #trainingExamples=1587
Iteration #11 | average loss=0.3801467153362753, current cumulative loss=603.2928372386689, previous cumulative loss=611.8852784191217, change rate=0.01404256889894858, #trainingExamples=1587
Iteration #12 | average loss=0.3750791243746283, current cumulative loss=595.2505703825351, previous cumulative loss=603.2928372386689, change rate=0.01333061883005943, #trainingExamples=1587
Iteration #13 | average loss=0.37029474458756273, current cumulative loss=587.657759660462, previous cumulative loss=595.2505703825351, change rate=0.012755654676976761, #trainingExamples=1587
Iteration #14 | average loss=0.36574472099268607, current cumulative loss=580.4368722153928, previous cumulative loss=587.657759660462, change rate=0.012287572700888608, #trainingExamples=1587
Iteration #15 | average loss=0.3613904840032808, current cumulative loss=573.5266981132066, previous cumulative loss=580.4368722153928, change rate=0.011905126005885216, #trainingExamples=1587
Performed 15 iterations of 1,587 training examples on memory (thus 23,805 training updates in total)
```

> LogLoss: 0.4771035166468042

LIBFFM

```
$ ./ffm-train -k 4 -t 15 -l 0.2 -r 0.2 -s 1 ../tr.sp model
First check if the text file has already been converted to binary format (0.0 seconds)
Binary file NOT found. Convert text file to binary file (0.0 seconds)
iter   tr_logloss      tr_time
   1      0.62043          0.0
   2      0.47533          0.1
   3      0.44968          0.1
   4      0.43548          0.2
   5      0.42261          0.2
   6      0.41322          0.3
   7      0.40489          0.3
   8      0.39687          0.4
   9      0.39085          0.4
  10      0.38530          0.4
  11      0.37965          0.5
  12      0.37450          0.5
  13      0.36937          0.6
  14      0.36444          0.6
  15      0.36031          0.7
$ ./ffm-predict ../va.sp model submission.csv
logloss = 0.47818
```

### Without linear terms (i.e., adding `-disable_wi` option)

Hivemall

```
Iteration #2 | average loss=0.539961924393562, current cumulative loss=856.919574012583, previous cumulative loss=1651.6985545424677, change rate=0.4811179934516, #trainingExamples=1587
Iteration #3 | average loss=0.5106114115327627, current cumulative loss=810.3403101024943, previous cumulative loss=856.919574012583, change rate=0.05435663430113771, #trainingExamples=1587
Iteration #4 | average loss=0.4906722901321148, current cumulative loss=778.6969244396662, previous cumulative
[GitHub] incubator-hivemall issue #149: [WIP][HIVEMALL-201] Evaluate, fix and documen...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/149

@takuti I advise checking the first 2-3 updates to investigate how the gradient updates differ.
---
[GitHub] incubator-hivemall issue #149: [WIP][HIVEMALL-201] Evaluate, fix and documen...
Github user takuti commented on the issue: https://github.com/apache/incubator-hivemall/pull/149

Note: I've extended the LIBFFM code so it uses linear terms: https://github.com/takuti/criteo-ffm/commit/9aca61d93ed8f583025729206ed0dbfd54806a44 However, I cannot observe a significant difference between the LogLoss achieved with and without linear terms.
---
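For context, the FFM decision function with an optional linear term can be sketched as below. The `(field, feature, value)` sample layout and the dict-based weight storage are assumptions for illustration, not the layout used by LIBFFM or Hivemall:

```python
def ffm_predict(x, w, V, use_linear_term=True):
    """Compute phi(x) for one FFM instance.

    x: list of (field, feature, value) triples for the active features.
    w: dict feature -> linear weight (the optional w_i term).
    V: dict (feature, field) -> latent vector; the pairwise term uses
       v_{j1,f2} . v_{j2,f1} for each feature pair, which is what makes
       the model field-aware.
    """
    phi = 0.0
    if use_linear_term:
        phi += sum(w.get(j, 0.0) * v for _, j, v in x)
    for a in range(len(x)):
        f1, j1, v1 = x[a]
        for b in range(a + 1, len(x)):
            f2, j2, v2 = x[b]
            va = V.get((j1, f2))
            vb = V.get((j2, f1))
            if va and vb:
                phi += sum(p * q for p, q in zip(va, vb)) * v1 * v2
    return phi
```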
[GitHub] incubator-hivemall issue #149: [WIP][HIVEMALL-201] Evaluate, fix and documen...
Github user takuti commented on the issue: https://github.com/apache/incubator-hivemall/pull/149

The evaluation has been conducted at [takuti/criteo-ffm](https://github.com/takuti/criteo-ffm); see the repository for details. As an example, I have used the tiny data provided at [guestwalk/kaggle-2014-criteo](https://github.com/guestwalk/kaggle-2014-criteo), which is already preprocessed and converted into the LIBFFM format:

- Split 2,000 samples in `train.tiny.csv` into:
  - 1,587 training samples `tr.sp`
  - 412 validation samples `va.sp`

As a result, the FFM models created by LIBFFM and Hivemall with the following (almost identical) configurations showed very similar training loss and accuracy.

**LIBFFM**:

```
$ ./ffm-train -k 4 -t 15 -l 0.2 -r 0.2 -s 10 ../tr.sp model
iter   tr_logloss      tr_time
   1      1.04980          0.0
   2      0.53771          0.0
   3      0.50963          0.0
   4      0.48980          0.1
   5      0.47469          0.1
   6      0.46304          0.1
   7      0.45289          0.1
   8      0.44400          0.1
   9      0.43653          0.1
  10      0.42947          0.1
  11      0.42330          0.1
  12      0.41727          0.1
  13      0.41130          0.1
  14      0.40558          0.1
  15      0.40036          0.1
```

> LogLoss on validation set `va.sp`: 0.47237

**Hivemall**:

```
$ hive --hiveconf hive.root.logger=INFO,console
hive> INSERT OVERWRITE TABLE criteo.ffm_model
    > SELECT
    >   train_ffm(features, label, '-init_v random -max_init_value 1.0 -classification -iterations 15 -factors 4 -eta 0.2 -l2norm -optimizer sgd -lambda 0.2 -cv_rate 0.0 -disable_wi')
    > FROM (
    >   SELECT
    >     features, label
    >   FROM
    >     criteo.train_vectorized
    >   CLUSTER BY rand(1)
    > ) t
    > ;
Record training examples to a file: /var/folders/rg/6mhvj7h567x_ys7brmf2bb6wgn/T/hivemall_fm6211397472147242886.sgmt
Iteration #2 | average loss=0.5316043797079182, current cumulative loss=843.6561505964662, previous cumulative loss=1214.5909560888044, change rate=0.30539895232450376, #trainingExamples=1587
Iteration #3 | average loss=0.5065999656968238, current cumulative loss=803.9741455608594, previous cumulative loss=843.6561505964662, change rate=0.04703575622313853, #trainingExamples=1587
Iteration #4 | average loss=0.49634490612175397, current cumulative loss=787.6993660152235, previous cumulative loss=803.9741455608594, change rate=0.0202429140731664, #trainingExamples=1587
Iteration #5 | average loss=0.48804954980765963, current cumulative loss=774.5346355447558, previous cumulative loss=787.6993660152235, change rate=0.0167128869698916, #trainingExamples=1587
Iteration #6 | average loss=0.48072518575956447, current cumulative loss=762.9108698004288, previous cumulative loss=774.5346355447558, change rate=0.015007418920848658, #trainingExamples=1587
Iteration #7 | average loss=0.47402279755334875, current cumulative loss=752.2741797171644, previous cumulative loss=762.9108698004288, change rate=0.01394224476803, #trainingExamples=1587
Iteration #8 | average loss=0.4677507471836629, current cumulative loss=742.320435780473, previous cumulative loss=752.2741797171644, change rate=0.013231537390308698, #trainingExamples=1587
Iteration #9 | average loss=0.4618142861358177, current cumulative loss=732.8992720975427, previous cumulative loss=742.320435780473, change rate=0.012691505216375798, #trainingExamples=1587
Iteration #10 | average loss=0.4561878517855827, current cumulative loss=723.9701207837197, previous cumulative loss=732.8992720975427, change rate=0.012183326759580433, #trainingExamples=1587
Iteration #11 | average loss=0.45087834343992406, current cumulative loss=715.5439310391595, previous cumulative loss=723.9701207837197, change rate=0.01163886395675921, #trainingExamples=1587
Iteration #12 | average loss=0.4458864402438874, current cumulative loss=707.6217806670493, previous cumulative loss=715.5439310391595, change rate=0.011071508021324606, #trainingExamples=1587
Iteration #13 | average loss=0.44118468270053807, current cumulative loss=700.1600914457539, previous cumulative loss=707.6217806670493, change rate=0.010544742156271002, #trainingExamples=1587
Iteration #14 | average loss=0.4367191822212713, current cumulative loss=693.0733421851576, previous cumulative loss=700.1600914457539, change rate=0.01012161268141256, #trainingExamples=1587
Iteration #15 | average loss=0.4324248854220929, current cumulative loss=686.2582931648615, previous cumulative loss=693.0733421851576, change rate=0.009833084906727563, #trainingExamples=1587
Performed 15 iterations of 1,587 training examples on memory (thus 23,805 training