Re: [scikit-learn] scikit-learn Digest, Vol 6, Issue 40

Afarin Famili Mon, 26 Sep 2016 14:08:41 -0700

Hi David,

train_test_split works when there is a single row per subject. I am looking 
for a similar function that can handle a pair of rows per subject without 
biasing the accuracy estimate. We are studying memory: for each subject we 
have one row of features for successful memory encoding and a second row for 
unsuccessful memory encoding. The target is 1 for successful and 0 for 
unsuccessful encoding. 
How would you recommend splitting this data set to get a reasonable, 
unbiased accuracy?


Thanks,
Afarin
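
A subject-wise split is one way to keep both rows of a subject on the same
side of the train/test boundary, so the classifier is never evaluated on a
subject it has already seen. In scikit-learn 0.18 and later, GroupShuffleSplit
in sklearn.model_selection covers this case when given one group label per
row. The plain-Python sketch below only illustrates the idea; the function
name split_by_subject, the subject_ids list, and the 8-subject layout are
assumptions made for illustration, not part of the original data.

```python
import random


def split_by_subject(subject_ids, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with all rows of a subject on one side.

    subject_ids holds one subject label per row (hypothetical input).
    """
    # Shuffle the unique subjects, not the rows, so pairs stay together.
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)

    # Hold out a fraction of the subjects (at least one) for testing.
    n_test = max(1, round(test_fraction * len(subjects)))
    test_subjects = set(subjects[:n_test])

    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


# Assumed layout: 8 subjects, two rows each (successful, then unsuccessful).
subject_ids = [s for s in range(8) for _ in range(2)]
labels = [1, 0] * 8

train_idx, test_idx = split_by_subject(subject_ids, test_fraction=0.25, seed=0)
```

The returned index lists can be used to select rows of the feature and target
arrays. Because every subject contributes one row of each class, both sides
of the split stay balanced between 1 and 0.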



________________________________________
From: scikit-learn 
<[email protected]> on behalf of 
[email protected] <[email protected]>
Sent: Monday, September 26, 2016 2:43 PM
To: [email protected]
Subject: scikit-learn Digest, Vol 6, Issue 40

Send scikit-learn mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        https://mail.python.org/mailman/listinfo/scikit-learn
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of scikit-learn digest..."


Today's Topics:

   1. header intact (Afarin Famili)
   2. Is there a built-in function for pairs of data? (Afarin Famili)
   3. Re: Is there a built-in function for pairs of data?
      (Pedro Pazzini)
   4. Re: Is there a built-in function for pairs of data?
      (David Nicholson)
   5. Large computation time for homogeneous data with
      agglomerative clustering (Md. Khairullah)


----------------------------------------------------------------------

Message: 1
Date: Mon, 26 Sep 2016 18:03:27 +0000
From: Afarin Famili <[email protected]>
To: "[email protected]" <[email protected]>
Subject: [scikit-learn] header intact
Message-ID: <[email protected]>
Content-Type: text/plain; charset="iso-8859-1"




________________________________

UT Southwestern Medical Center
The future of medicine, today.


------------------------------

Message: 2
Date: Mon, 26 Sep 2016 18:06:49 +0000
From: Afarin Famili <[email protected]>
To: "[email protected]" <[email protected]>
Subject: [scikit-learn] Is there a built-in function for pairs of
        data?
Message-ID: <[email protected]>
Content-Type: text/plain; charset="iso-8859-1"


Dear Scikit-learn team,


We need to deal with pairs of data in our classification task. I was wondering 
if there is already a built-in function in Scikit-learn that can partition the 
pairs of data into train and test sets?


Regards,

Afarin




------------------------------

Message: 3
Date: Mon, 26 Sep 2016 15:47:26 -0300
From: Pedro Pazzini <[email protected]>
To: Scikit-learn user and developer mailing list
        <[email protected]>
Subject: Re: [scikit-learn] Is there a built-in function for pairs of
        data?
Message-ID:
        <CAAY8FkB2LjnegwFbn=gsoawlbcbq3dnya6bxdxn6-cvlt1r...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Like this?
http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html

2016-09-26 15:06 GMT-03:00 Afarin Famili <[email protected]>:

>
> Dear Scikit-learn team,
>
>
> We need to deal with pairs of data in our classification task. I was
> wondering if there is already a built-in function in Scikit-learn that can
> partition the pairs of data into train and test sets?
>
>
> Regards,
>
> Afarin

------------------------------

Message: 4
Date: Mon, 26 Sep 2016 14:53:05 -0400
From: David Nicholson <[email protected]>
To: Scikit-learn user and developer mailing list
        <[email protected]>
Subject: Re: [scikit-learn] Is there a built-in function for pairs of
        data?
Message-ID:
        <camabfbxamb5kzqy9_wu+8bfxpsecbs2fsiqqad18zi9zmoj...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Do you mean like train_test_split?
http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html

On Sep 26, 2016 14:43, "Afarin Famili" <[email protected]>
wrote:

>
> Dear Scikit-learn team,
>
>
> We need to deal with pairs of data in our classification task. I was
> wondering if there is already a built-in function in Scikit-learn that can
> partition the pairs of data into train and test sets?
>
>
> Regards,
>
> Afarin

------------------------------

Message: 5
Date: Mon, 26 Sep 2016 21:43:05 +0200
From: "Md. Khairullah" <[email protected]>
To: [email protected]
Subject: [scikit-learn] Large computation time for homogeneous data
        with agglomerative clustering
Message-ID:
        <ca+xrtckmkwsn2y7jfg12nex-ch_v5bw7elhg5uo39wn+ebb...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear Scikit-learners,
This is my first post here, and I hope you experts can help me.

We are using agglomerative clustering with Ward's linkage and a
connectivity constraint. The data set has around 205,000 points (each a single
scalar feature). The data set is dynamic (in time), and we need to apply
clustering at different times throughout the process. Initially all data are 0
and they increase gradually. In other words, in the early stage the data are
more homogeneous, and the heterogeneity among the data increases gradually.
If the clustering is applied at the final stage (the most heterogeneous data,
but of course having patterns/clusters), requesting 20 clusters takes
only 61 s of CPU time. But if clustering is run at an early stage (more
homogeneous data, though not all 0 and of course with
patterns/clusters in the data) with the same settings, the time rises to
1 h 5 min. The CPU time lies between these two if the data come from an
intermediate time stamp. I also tried the other linkage options, but
the situation does not improve. My understanding is that the homogeneity is
playing a role.

Have you experienced this too? What solution do you suggest?

Thanks in advance for your attention and help.

--
Best regards

Md. Khairullah
PhD Student, KU Leuven
Numerical Analysis and Applied Mathematics Section
Celestijnenlaan 200a - box 2402
3001 Leuven
room: 03.18
tel. +32 16 37 39 66
fax +32 16 3 27996

------------------------------

Subject: Digest Footer

_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn


------------------------------

End of scikit-learn Digest, Vol 6, Issue 40
*******************************************
