Re: [shogun] ISSUE #3847

sahil chaddha Sun, 13 Aug 2017 05:09:41 -0700

This is a follow-up to my last mail.

*Sahil Chaddha*
Third Year Undergraduate Student
Department of Metallurgy and Materials Engineering
IIT Kharagpur, West Bengal - 721302
+91-7872705997 | LinkedIn <https://www.linkedin.com/in/sahil-chaddha-a0a376b7/> | Github <https://github.com/Sahil333>


On Fri, Aug 11, 2017 at 1:34 AM, sahil chaddha wrote:

> I wrote an implementation that chooses linearly spaced split points, c[n/k],
> and a member variable h that ensures the test set contains h (default 1)
> future observations from time t.
> But with this implementation, the training set can be as small as [n/k].
> Should I implement another variant to increase the training-set size?
>
>
> PS: Heiko, could you please also elaborate on the third point or provide a link?
>
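A minimal Python sketch of the split scheme described in the first paragraph of the message above. It is an illustration only: Shogun itself is C++, and this is not its API. The helper name expanding_window_splits is hypothetical. The sketch assumes c runs from 1 to k - 1 and reads h as the minimum number of future observations each test set must contain.

```python
from typing import List, Tuple

Fold = Tuple[List[int], List[int]]  # (train indices, test indices)


def expanding_window_splits(n: int, k: int, h: int = 1) -> List[Fold]:
    """Hypothetical helper: split points at t = c * (n // k), c = 1, ..., k - 1.

    Each fold trains on [0, t) and tests on [t, n); a fold is kept only if its
    test set holds at least h future observations (assumed meaning of h).
    """
    if k < 2 or n < k:
        raise ValueError("need k >= 2 and n >= k")
    step = n // k  # floor(n / k)
    folds: List[Fold] = []
    for c in range(1, k):
        t = c * step
        if n - t < h:
            break  # later split points leave even fewer future points
        folds.append((list(range(t)), list(range(t, n))))
    return folds


folds = expanding_window_splits(n=10, k=5, h=1)
print([(len(train), len(test)) for train, test in folds])
# → [(2, 8), (4, 6), (6, 4), (8, 2)]
```

The first fold trains on only n // k = 2 points. This is the small-training-set concern raised in the message.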
> On Aug 8, 2017 10:25 PM, "Heiko Strathmann" wrote:
>
>> The splitting for the time series could be:
>>
>> - deterministic, i.e. with windows that grow into the past: the training
>> set is everything up to point t, and the test set is everything from t on.
>> - variants of this that limit the sizes of the folds.
>> - shuffling "blocks" that are approximately independent (i.e. the time
>> series forgets its past after t observations); this should reuse the
>> existing shuffling code.
>>
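The second and third options in the list above could be sketched as follows. These are again hypothetical Python helpers, not Shogun code.

- bounded_window_splits caps the lengths of the training and test windows. The parameter names max_train and max_test are assumptions.
- shuffled_block_splits cuts the series into contiguous blocks of block_len observations, which are assumed long enough to be approximately independent. It uses Python's random.Random in place of Shogun's existing shuffling code.

```python
import random
from typing import List, Optional, Sequence, Tuple

Fold = Tuple[List[int], List[int]]  # (train indices, test indices)


def bounded_window_splits(n: int, split_points: Sequence[int],
                          max_train: Optional[int] = None,
                          max_test: Optional[int] = None) -> List[Fold]:
    """Hypothetical helper: expanding-window splits with capped window lengths."""
    folds: List[Fold] = []
    for t in split_points:
        # Train on at most max_train observations immediately before t.
        lo = 0 if max_train is None else max(0, t - max_train)
        # Test on at most max_test observations starting at t.
        hi = n if max_test is None else min(n, t + max_test)
        folds.append((list(range(lo, t)), list(range(t, hi))))
    return folds


def shuffled_block_splits(n: int, block_len: int, k: int, seed: int = 0) -> List[Fold]:
    """Hypothetical helper: k folds built from shuffled contiguous blocks.

    Each fold's test set is a group of whole blocks (time order is kept inside
    a block); its training set is every other index.
    """
    blocks = [list(range(s, min(s + block_len, n))) for s in range(0, n, block_len)]
    if len(blocks) < k:
        raise ValueError("need at least k blocks")
    order = list(range(len(blocks)))
    random.Random(seed).shuffle(order)  # shuffle block order, not observations
    folds: List[Fold] = []
    for f in range(k):
        test = sorted(i for b in order[f::k] for i in blocks[b])
        in_test = set(test)
        train = [i for i in range(n) if i not in in_test]
        folds.append((train, test))
    return folds


print([(len(tr), len(te)) for tr, te in
       bounded_window_splits(10, [4, 6, 8], max_train=3, max_test=2)])
# → [(3, 2), (3, 2), (3, 2)]
print([(len(tr), len(te)) for tr, te in shuffled_block_splits(12, block_len=3, k=2)])
# → [(6, 6), (6, 6)]
```

In the block variant, neighbouring train and test blocks still touch at their boundaries. If the dependence does not die out within a block, dropping a gap of observations around each test block would be a natural refinement.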
>> 2017-08-08 16:34 GMT+01:00 sahil chaddha:
>>
>>> I read the implementations of the cross-validation splitting and of
>>> cross-validation itself. build_subset() builds the subsets with a random
>>> process, so it makes sense to run evaluate_one_run() several times in
>>> cross-validation. Does the time-series split also require a random
>>> process? Also, generate_subset_indices() corresponds to the test set and
>>> generate_subset_inverse() to the training set. So, to respect the time
>>> ordering, the subsets are bound to have a non-empty intersection. Am I right?
>>>
>>>
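The intersection question in the message above can be illustrated with a hypothetical Python sketch. It contrasts a random k-fold split, whose subsets are disjoint, with expanding-window test subsets, which overlap. inverse() plays the role described for generate_subset_inverse(): it returns the training indices as the complement of a test subset. The sketch models the behaviour described in the mail, not Shogun's actual implementation.

```python
import random
from itertools import combinations
from typing import List, Sequence


def inverse(subset: Sequence[int], n: int) -> List[int]:
    """Complement of a subset within 0..n-1 (training indices for a test subset)."""
    s = set(subset)
    return [i for i in range(n) if i not in s]


def random_kfold_subsets(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices round-robin into k disjoint subsets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[f::k]) for f in range(k)]


def time_series_test_subsets(n: int, split_points: Sequence[int]) -> List[List[int]]:
    """Test subset for split point t is everything from t onwards."""
    return [list(range(t, n)) for t in split_points]


def pairwise_overlap(subsets: Sequence[Sequence[int]]) -> bool:
    """True if any two subsets share at least one index."""
    return any(set(a) & set(b) for a, b in combinations(subsets, 2))


n = 10
kfold = random_kfold_subsets(n, k=5)
ts = time_series_test_subsets(n, split_points=[2, 4, 6, 8])
print(pairwise_overlap(kfold), pairwise_overlap(ts))  # → False True
print(inverse(ts[1], n))  # → [0, 1, 2, 3]
```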
>>> On Mon, Aug 7, 2017 at 1:56 PM, Fernando J. Iglesias García wrote:
>>>
>>>> Welcome Sahil!
>>>>
>>>> Great that you have already successfully set up your dev environment.
>>>>
>>>> For this particular task, I think it will be useful to get familiar
>>>> with Shogun's cross-validation. You could start by checking the related
>>>> examples (like this one
>>>> <https://github.com/shogun-toolbox/shogun/blob/develop/examples/undocumented/libshogun/splitting_standard_crossvalidation.cpp>).
>>>> Then you can move on to understanding how the splitting strategy is
>>>> implemented internally (you can find the implementation by following the
>>>> appropriate include file from the example). You will also need to
>>>> understand the details of the time-series splitting strategy; the links in
>>>> the GitHub issue will be useful for this.
>>>>
>>>> After that, you should be ready to start implementing the time-series
>>>> splitting. Let us know how it goes.
>>>>
>>>> Hope that helps!
>>>>
>>>> Cheers,
>>>> Fernando.
>>>>
>>>> On 5 August 2017 at 20:29, sahil chaddha wrote:
>>>>
>>>>> Ma'am/Sir,
>>>>>
>>>>>    I want to work on this issue:
>>>>> https://github.com/shogun-toolbox/shogun/issues/3847. But I have no idea
>>>>> where to start; I am new to projects this big. Can anyone guide me
>>>>> through it? I have already set up the environment and run the tests and
>>>>> examples successfully.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
