Re: [shogun] ISSUE #3847

sahil chaddha Thu, 10 Aug 2017 13:07:00 -0700

I wrote an implementation that chooses linearly spaced split points, c[n/k],
and uses a member variable (h) to ensure the test set has h (default 1) future
observations from time t.
But with this implementation, the train set size becomes as low as [n/k].
Should I implement another variant to increase the train set size?
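
A rough sketch of the scheme described above, in Python rather than Shogun's C++. It is an illustration under stated assumptions, not the actual patch: the function name time_series_splits is hypothetical, c[n/k] is read as c * floor(n/k) for c = 1..k-1, and h is read as the minimum number of observations that must remain for the test set.

```python
def time_series_splits(n, k, h=1):
    """Return a list of (train, test) index lists for an expanding-window split.

    Assumptions (not taken from the Shogun code base):
    - split points are t = c * (n // k) for c = 1..k-1,
    - the train set is every index before t, the test set every index from t on,
    - a split is kept only if at least h observations remain for the test set.
    """
    if k < 2 or n < k:
        raise ValueError("need k >= 2 and n >= k")
    step = n // k
    splits = []
    for c in range(1, k):
        t = c * step
        if n - t < h:
            # Fewer than h future observations: skip this split point.
            continue
        splits.append((list(range(t)), list(range(t, n))))
    return splits


if __name__ == "__main__":
    for train, test in time_series_splits(10, 5):
        print(len(train), len(test))
```

With n = 10 and k = 5 the first split trains on only n // k = 2 observations, which is the small train set mentioned above.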



PS: Heiko, could you please also elaborate on the third point, or provide a
link for it?

On Aug 8, 2017 10:25 PM, "Heiko Strathmann" wrote:

> The splitting for the time series could be
>
> - deterministic, that is, with windows that grow into the past: the train
> set is everything up to point t, the test set is everything from t onwards.
> - variants of this that limit the sizes of the folds
> - shuffling "blocks" that are approximately independent (i.e. the time
> series forgets its past after t observations); this should re-use the
> existing shuffling code
>
> 2017-08-08 16:34 GMT+01:00 sahil chaddha:
>
>> I read the implementation of the cross-validation splitting and of
>> cross-validation itself. build_subsets() uses a random process to build
>> the subsets, so it makes sense to run evaluate_one_run() several times
>> in cross-validation. Does the time-series split also require a random
>> process? Also, generate_subset_indices() gives what acts as the test set
>> and generate_subset_inverse() what acts as the train set. So, to respect
>> the time order, the subsets are bound to have a non-empty intersection.
>> Am I right?
>>
>> *Sahil Chaddha*
>> Third Year Undergraduate Student
>> Department of Metallurgy and Materials Engineering
>> IIT Kharagpur, West Bengal - 721302
>> +91-7872705997 <+91%2078727%2005997>,  LinkedIn
>> <https://www.linkedin.com/in/sahil-chaddha-a0a376b7/> | Github
>> <https://github.com/Sahil333>
>>
>> On Mon, Aug 7, 2017 at 1:56 PM, Fernando J. Iglesias García wrote:
>>
>>> Welcome Sahil!
>>>
>>> Great that you have already successfully set up your dev environment.
>>>
>>> For this particular task, I think it will be useful to get familiar with
>>> Shogun's cross-validation. You could start by checking the related examples
>>> (like this one
>>> <https://github.com/shogun-toolbox/shogun/blob/develop/examples/undocumented/libshogun/splitting_standard_crossvalidation.cpp>).
>>> Then, you can get into understanding how the splitting strategy is
>>> implemented internally (you can find the implementation by following the
>>> appropriate include file from the example). You will also need to
>>> understand details about the time-series splitting strategy, the links in
>>> the github issue will be useful for this.
>>>
>>> After that, you should be ready to start implementing the time-series
>>> splitting. Let us know how it goes.
>>>
>>> Hope that helps!
>>>
>>> Cheers,
>>> Fernando.
>>>
>>> On 5 August 2017 at 20:29, sahil chaddha wrote:
>>>
>>>> Ma'am/Sir,
>>>>
>>>>    I want to work on this issue:
>>>> https://github.com/shogun-toolbox/shogun/issues/3847. But I have no idea
>>>> where to start. I am new to such big projects. Can anyone guide me
>>>> through it? I have already set up the environment and run the tests and
>>>> examples successfully.
>>>>
>>>> *Sahil Chaddha*
>>>> Fourth Year Undergraduate Student
>>>> Department of Metallurgy and Materials Engineering
>>>> IIT Kharagpur, West Bengal - 721302
>>>> +91-7872705997 <+91%2078727%2005997>,  LinkedIn
>>>> <https://www.linkedin.com/in/sahil-chaddha-a0a376b7/> | Github
>>>> <https://github.com/Sahil333>
>>>>
>>>
>>>
>>
>
