This is a follow-up to my last mail.

*Sahil Chaddha*
Third Year Undergraduate Student
Department of Metallurgy and Materials Engineering
IIT Kharagpur, West Bengal - 721302
+91-7872705997, LinkedIn
<https://www.linkedin.com/in/sahil-chaddha-a0a376b7/> | Github
<https://github.com/Sahil333>

On Fri, Aug 11, 2017 at 1:34 AM, sahil chaddha <[email protected]> wrote:

> I wrote an implementation by choosing linear divide points, c[n/k], and a
> member variable (h), to ensure the test set has h (default 1) future
> observations from time t.
> But with this implementation, the train set size becomes as low as [n/k].
> Should I implement another variant to increase the train set size?
>
> PS: Heiko, can you please also elaborate on, or provide a link for, the
> third point?
>
> On Aug 8, 2017 10:25 PM, "Heiko Strathmann" <[email protected]>
> wrote:
>
>> The splitting for the time series could be
>>
>> - deterministic, that is in increasing window sizes to the past: train set
>>   is everything up to point t, test set is everything from t.
>> - there could be variants on this that are limiting the sizes of the folds
>> - shuffling "blocks" that are approximately independent (i.e. the time
>>   series forgets its past after t observations), this should re-use
>>   existing code on shuffling
>>
>> 2017-08-08 16:34 GMT+01:00 sahil chaddha <[email protected]>:
>>
>>> I read the implementation of cross-validation splitting and
>>> cross-validation. build_subset() implements a random process to build
>>> the subsets, so it makes sense to run evaluate_one_run() several times
>>> in cross-validation. Does the time-series split also require a random
>>> process? Also, generate_subset_indices() is like the test set and
>>> generate_subset_inverse() is like the train set. So, to respect the
>>> time, the subsets are bound to have a non-empty intersection. Am I right?
>>>
>>> *Sahil Chaddha*
>>> Third Year Undergraduate Student
>>> Department of Metallurgy and Materials Engineering
>>> IIT Kharagpur, West Bengal - 721302
>>> +91-7872705997, LinkedIn
>>> <https://www.linkedin.com/in/sahil-chaddha-a0a376b7/> | Github
>>> <https://github.com/Sahil333>
>>>
>>> On Mon, Aug 7, 2017 at 1:56 PM, Fernando J. Iglesias García
>>> <[email protected]> wrote:
>>>
>>>> Welcome Sahil!
>>>>
>>>> Great that you have already successfully set up your dev environment.
>>>>
>>>> For this particular task, I think it will be useful to get familiar
>>>> with Shogun's cross-validation. You could start by checking the related
>>>> examples (like this one
>>>> <https://github.com/shogun-toolbox/shogun/blob/develop/examples/undocumented/libshogun/splitting_standard_crossvalidation.cpp>).
>>>> Then, you can get into understanding how the splitting strategy is
>>>> implemented internally (you can find the implementation by following the
>>>> appropriate include file from the example). You will also need to
>>>> understand details about the time-series splitting strategy; the links in
>>>> the github issue will be useful for this.
>>>>
>>>> After that, you should be ready to start implementing the time-series
>>>> splitting. Let us know how it goes.
>>>>
>>>> Hope that helps!
>>>>
>>>> Cheers,
>>>> Fernando.
>>>>
>>>> On 5 August 2017 at 20:29, sahil chaddha <[email protected]> wrote:
>>>>
>>>>> Ma'am/Sir,
>>>>>
>>>>> I want to work on this
>>>>> https://github.com/shogun-toolbox/shogun/issues/3847. But I have no
>>>>> idea where to start. I am new to such big projects. Can anyone guide
>>>>> me through it? I have already set up the environment, and ran tests
>>>>> and examples successfully.
>>>>>
>>>>> *Sahil Chaddha*
>>>>> Fourth Year Undergraduate Student
>>>>> Department of Metallurgy and Materials Engineering
>>>>> IIT Kharagpur, West Bengal - 721302
>>>>> +91-7872705997, LinkedIn
>>>>> <https://www.linkedin.com/in/sahil-chaddha-a0a376b7/> | Github
>>>>> <https://github.com/Sahil333>
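
For reference, the deterministic scheme discussed in the thread (train on everything before a split point t, test on everything from t onward, with split points at multiples of n/k) could be sketched roughly as below. This is only an illustration in plain Python, not Shogun code: the function name `expanding_window_splits`, its parameters, and the reading of h as the minimum number of test observations required from t onward are all assumptions made for the example.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n: int, k: int, h: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for an expanding-window split.

    Hypothetical helper, not part of Shogun.
    n: number of observations, assumed to be ordered by time.
    k: number of equal-width segments; split points are c * (n // k)
       for c = 1, ..., k - 1.
    h: assumed meaning: minimum number of observations the test set
       must contain from the split point onward.
    """
    if k < 2 or n < k:
        raise ValueError("need k >= 2 and n >= k")
    step = n // k
    for c in range(1, k):
        t = c * step
        if n - t < h:
            # Fewer than h observations remain from t onward; skip this split.
            continue
        train = list(range(t))    # everything strictly before t
        test = list(range(t, n))  # everything from t onward
        yield train, test


if __name__ == "__main__":
    for train, test in expanding_window_splits(n=10, k=5):
        print(len(train), len(test))
```

With n = 10 and k = 5 the first fold trains on only n/k = 2 points, which is the small-train-set effect raised above. Train and test never overlap here, and every train index precedes every test index, so no shuffling or repeated runs are needed for this variant.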
