I wrote an implementation that chooses linearly spaced divide points, c[n/k], and uses a member variable h (default 1) to ensure that the test set has h future observations from time t. With this implementation, however, the train set size can be as low as [n/k]. Should I implement another variant to increase the train set size?
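
To make the scheme above concrete, here is a minimal sketch of one way to read it. It is not Shogun code: the names `TimeSeriesFold` and `expanding_window_folds` are hypothetical, and it assumes that the divide points are t = c * floor(n/k) for c = 1..k, that the train set is everything before t and the test set everything from t, and that h is the minimum number of observations that must remain in the test set (so t is clamped to n - h).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One fold, described by half-open index ranges:
// train = [0, split), test = [split, n).
struct TimeSeriesFold
{
    std::size_t split; // first index of the test set (the time point t)
    std::size_t n;     // total number of observations

    std::size_t train_size() const { return split; }
    std::size_t test_size() const { return n - split; }
};

// Hypothetical helper (not part of Shogun's API).
// Assumption: h is the minimum number of "future" observations that every
// test set must contain, so each divide point is clamped to at most n - h.
std::vector<TimeSeriesFold> expanding_window_folds(
    std::size_t n, std::size_t k, std::size_t h = 1)
{
    assert(k >= 2 && h >= 1);
    const std::size_t block = n / k; // floor(n/k), spacing of divide points
    assert(block >= 1);              // at least one observation per block
    assert(block + h <= n);          // the first fold is never clamped

    std::vector<TimeSeriesFold> folds;
    folds.reserve(k);
    for (std::size_t c = 1; c <= k; ++c)
    {
        // Linear divide point c * floor(n/k), clamped so that at least
        // h observations remain for the test set.
        const std::size_t t = std::min(c * block, n - h);
        folds.push_back({t, n});
    }
    return folds;
}
```

Under these assumptions, `expanding_window_folds(10, 5)` gives train sizes 2, 4, 6, 8, 9, so the smallest train set is floor(n/k) = 2, which is the effect described above. When h is larger than the block size, the clamp can make the last few divide points coincide.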
PS: Heiko, can you please also elaborate on the third point or provide a link for it?

On Aug 8, 2017 10:25 PM, "Heiko Strathmann" <[email protected]> wrote:

> The splitting for the time series could be
>
> -deterministic, that is in increasing window sizes to the past: train set
> is everything up to point t, test set is everything from t.
> -there could be variants on this that are limiting the sizes of the folds
> -shuffling "blocks" that are approximately independent (i.e. the time
> series forgets its past after t observations), this should re-use existing
> code on shuffling
>
> 2017-08-08 16:34 GMT+01:00 sahil chaddha <[email protected]>:
>
>> I read the implementation of cross-validation splitting and
>> cross-validation. The build_subset() implements a random process to build
>> the subsets and thus, makes sense to run evaluate_one_run() several times
>> in cross-validation. Does the time-series split also require random
>> process? And also, generate_subset_indices() is like test set and
>> generate_subset_inverse() is like train set. So, to respect the time, the
>> subsets are bound to have a non-empty intersection. Am I right?
>>
>> *Sahil Chaddha*
>> Third Year Undergraduate Student
>> Department of Metallurgy and Materials Engineering
>> IIT Kharagpur, West Bengal - 721302
>> +91-7872705997 <+91%2078727%2005997>, LinkedIn
>> <https://www.linkedin.com/in/sahil-chaddha-a0a376b7/> | Github
>> <https://github.com/Sahil333>
>>
>> On Mon, Aug 7, 2017 at 1:56 PM, Fernando J. Iglesias García <
>> [email protected]> wrote:
>>
>>> Welcome Sahil!
>>>
>>> Great that you have already successfully set up your dev environment.
>>>
>>> For this particular task, I think it will be useful to get familiar with
>>> Shogun's cross-validation. You could start by checking the related examples
>>> (like this one
>>> <https://github.com/shogun-toolbox/shogun/blob/develop/examples/undocumented/libshogun/splitting_standard_crossvalidation.cpp>).
>>> Then, you can get into understanding how the splitting strategy is
>>> implemented internally (you can find the implementation by following the
>>> appropriate include file from the example). You will also need to
>>> understand details about the time-series splitting strategy, the links in
>>> the github issue will be useful for this.
>>>
>>> After, you should be ready to start implementing the time-series
>>> splitting. Let us know how it goes.
>>>
>>> Hope that helps!
>>>
>>> Cheers,
>>> Fernando.
>>>
>>> On 5 August 2017 at 20:29, sahil chaddha <[email protected]> wrote:
>>>
>>>> Ma'am/Sir,
>>>>
>>>> I want to work on this
>>>> https://github.com/shogun-toolbox/shogun/issues/3847. But I have no idea
>>>> where to start. I am
>>>> new to such big projects. Can anyone guide me through it? I have already
>>>> setup the environment, ran tests and examples successfully.
>>>>
>>>> *Sahil Chaddha*
>>>> Fourth Year Undergraduate Student
>>>> Department of Metallurgy and Materials Engineering
>>>> IIT Kharagpur, West Bengal - 721302
>>>> +91-7872705997 <+91%2078727%2005997>, LinkedIn
>>>> <https://www.linkedin.com/in/sahil-chaddha-a0a376b7/> | Github
>>>> <https://github.com/Sahil333>
>>>>
>>>
>>>
>>
>
