Dale T. Smith | Macy's Systems and Technology | IFS eCom CSE Data Science
5985 State Bridge Road, Johns Creek, GA 30097 | dale.t.sm...@macys.com
From: scikit-learn [mailto:scikit-learn-bounces+dale.t.smith=macys@python.org] On Behalf Of Rohan Koodli
Sent: Monday, February 27, 2017 10:43 PM
From: scikit-learn [mailto:scikit-learn-bounces+dale.t.smith=macys@python.org] On Behalf Of Graham Arthur Mackenzie
Sent: Tuesday, December 13, 2016 5:02
I think you need to look at the examples.
/pcr_part2_yaware/
http://www.win-vector.com/blog/2016/06/y-aware-scaling-in-context/
of this on
the mailing list.
Searching the mailing list would be the best way to find out this information.
It may be in the contrib packages on GitHub – have you checked?
Please define “sensibly”. I would be strongly opposed to modifying any models
to incorporate “missingness”. No model handles missing data for you. That is
for you to decide based on your individual problem domain.
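A minimal sketch of what deciding for yourself can look like in scikit-learn, assuming version 0.21 or later (for `sklearn.impute.SimpleImputer`): the missing-data choice is an explicit pipeline step, not something hidden inside the model. The toy data and the median strategy are illustrative assumptions; pick whatever strategy your domain justifies.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data: np.nan marks a missing value (illustrative only).
X = np.array([[1.0, np.nan],
              [2.0, 0.5],
              [np.nan, 1.5],
              [4.0, 2.0],
              [5.0, np.nan],
              [6.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# The missing-data decision is its own, visible step: here, each missing
# entry is replaced by the median of the observed values in its column.
model = make_pipeline(SimpleImputer(strategy="median"), LogisticRegression())
model.fit(X, y)

imputer = model.named_steps["simpleimputer"]
X_filled = imputer.transform(X)  # no NaNs remain after imputation
```

Because the imputer is fitted inside the pipeline, cross-validation refits it on each training fold, so the choice is applied consistently.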
Take a look at a talk from last winter on missing data by Nina Zumel. Nina
Search Wikipedia for “jackknife”. That will give you a quick overview, and then
you will have the background to read the papers below.
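For a quick feel for the jackknife before reading the papers, here is a minimal NumPy sketch of the leave-one-out jackknife estimates of bias and standard error. The `jackknife` helper is hypothetical, written only for illustration.

```python
import numpy as np

def jackknife(data, statistic):
    """Return (estimate, jackknife bias, jackknife standard error)."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    theta_hat = statistic(data)
    # Leave-one-out replicates: recompute the statistic with point i removed.
    loo = np.array([statistic(np.delete(data, i)) for i in range(n)])
    loo_mean = loo.mean()
    bias = (n - 1) * (loo_mean - theta_hat)
    se = np.sqrt((n - 1) / n * np.sum((loo - loo_mean) ** 2))
    return theta_hat, bias, se

theta, bias, se = jackknife([1, 2, 3, 4, 5], np.mean)
```

For the sample mean the jackknife standard error reduces to the usual s/sqrt(n), which makes it an easy sanity check.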
While you are at Wikipedia, you may want to read up on the bootstrap and on
random forests as well.
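Similarly, a minimal sketch of a percentile bootstrap confidence interval, assuming NumPy 1.17+ for `default_rng`. `bootstrap_ci` and its defaults (2000 resamples, 95% level) are hypothetical choices, and the percentile method is only one of several ways to build a bootstrap interval.

```python
import numpy as np

def bootstrap_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(data)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    boots = np.array([statistic(rng.choice(data, size=n, replace=True))
                      for _ in range(n_boot)])
    # Take the alpha/2 and 1 - alpha/2 quantiles of the bootstrap distribution.
    lo, hi = np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

data = np.arange(1, 21)
lo, hi = bootstrap_ci(data, np.mean)
```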
definition
of a confidence interval.
--
Roman
On 01/09/16 20:32, Dale T Smith wrote:
There is a scikit-learn-contrib project with confidence intervals for random
forests.
https://github.com/scikit-learn-contrib/forest-confidence-interval
There are also other approaches for comparing time series in the frequency
domain, such as the FFT and the DWT [Ref:
http://infolab.usc.edu/csci599/Fall2003/Time%20Series/Efficient%20Similarity%20Search%20In%20Sequence%20Databases.pdf].
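A minimal sketch of the frequency-domain idea, assuming equal-length real series: keep only the first `k` coefficients of an orthonormal DFT (`numpy.fft.fft` with `norm="ortho"`) and compare those. By Parseval's theorem the truncated distance never exceeds the time-domain Euclidean distance, so it can prune candidates cheaply. The helper names are hypothetical, and the DWT variant is not shown.

```python
import numpy as np

def dft_features(x, k):
    """First k coefficients of the orthonormal DFT of series x."""
    return np.fft.fft(np.asarray(x, dtype=float), norm="ortho")[:k]

def dft_distance(x, y, k):
    # Euclidean distance between truncated spectra. With the orthonormal DFT,
    # Parseval's theorem makes this a lower bound on the time-domain distance.
    return np.linalg.norm(dft_features(x, k) - dft_features(y, k))

t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
x = np.sin(t)
y = np.sin(t) + 0.1 * np.cos(3 * t)

full = np.linalg.norm(x - y)     # time-domain distance
d4 = dft_distance(x, y, 4)       # cheap lower bound from 4 coefficients
d_all = dft_distance(x, y, 64)   # all coefficients: equals full distance
```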
I hope it helps.
2016-08-05 9:26 GMT-03:00 Dale T Smith
<dale.t
I don’t think you should treat this as an outlier detection problem. Why not
try it as a classification problem? The dataset is highly unbalanced. Try
ExtraTreesClassifier:
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html
Use sample_weight to tell the fit method about the
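A minimal sketch of the classification approach with sample weights, using synthetic data from `make_classification` as a stand-in for the real dataset. The "balanced" weighting via `compute_sample_weight` is an assumed way to set the weights, not necessarily what the original message went on to suggest.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic, highly unbalanced data: roughly 95% class 0, 5% class 1.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)

# "balanced" gives each sample n_samples / (n_classes * count(its class)),
# so the rare class carries the same total weight as the common one.
w = compute_sample_weight(class_weight="balanced", y=y)

clf = ExtraTreesClassifier(n_estimators=200, random_state=0)
clf.fit(X, y, sample_weight=w)
proba = clf.predict_proba(X)[:, 1]  # scores for the rare class
```

Passing `class_weight="balanced"` to `ExtraTreesClassifier` itself is a roughly equivalent alternative.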
Use conda or a virtualenv to handle compatibility issues. Then you can control
when upgrades occur. I’ve used conda with good effect to handle version issues
such as yours.
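Environments control which version is installed; a small guard can also catch a pickled model being loaded under a different scikit-learn version. This is a minimal sketch with hypothetical helpers `save_model`/`load_model` built only on `pickle`; it is not a scikit-learn API, and requiring exactly matching version strings is an assumed policy.

```python
import os
import pickle
import tempfile

import sklearn
from sklearn.linear_model import LogisticRegression

def save_model(model, path):
    """Hypothetical helper: write the scikit-learn version, then the model."""
    with open(path, "wb") as f:
        pickle.dump(sklearn.__version__, f)  # header: producing version
        pickle.dump(model, f)                # the estimator itself

def load_model(path):
    """Hypothetical helper: check the version before unpickling the model."""
    with open(path, "rb") as f:
        saved_version = pickle.load(f)
        if saved_version != sklearn.__version__:
            raise RuntimeError(
                f"model saved with scikit-learn {saved_version}, running "
                f"{sklearn.__version__}; load it in a matching environment")
        return pickle.load(f)

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y)
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_model(clf, path)
restored = load_model(path)
```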
Otherwise, use PMML. The Data Mining Group maintains a list of PMML producers
and consumers. I think there is a Python
I agree with everyone else – conda environments are designed specifically for this
situation.
I’ve not used virtualenv myself
(http://docs.python-guide.org/en/latest/dev/virtualenvs/). I’m an Anaconda user.
months ago.
http://www.slideshare.net/rgrossman/how-to-lower-the-cost-of-deploying-analytics-an-introduction-to-the-portable-format-for-analytics
William
On Thu, Jul 14, 2016 at 8:35 AM, Dale T Smith <dale.t.sm...@macys.com> wrote:
Hello,
I investigated