Hi,
I am new to machine learning and was wondering if I could be pointed in the
right direction for further reading with respect to my particular problem.
I have a lot of data from various sensors (Electricity, Temperature,
Humidity, Gas / Water Usage etc).
This data can be corrupt in many ways. For example:
1. The data can have one-off zero values.
2. The data can have large gaps of zeros ranging from hours to weeks.
3. There can be both positive and negative large spikes due to interference.
4. The data can be stuck high for a while due to various reasons.
Currently I am cleaning the data by replacing short gaps with the average
of the two points the gap lies between.
Large gaps get replaced with the median of the data over all of its time
values.
Spikes and one-off zeros are replaced with an adjacent value.
A spike is detected using a threshold of 3dp from the median value.
This works in many cases but misses problems such as a constant high value
just below the chosen threshold, and it seems problematic for certain types
of data.
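
For concreteness, the cleaning steps described above could be sketched roughly
as follows in Python with pandas. This is only an illustration under
assumptions: the function name clean_series, the short_gap_max cutoff between
"short" and "large" gaps, and the numeric spike_threshold are placeholders, not
the actual values used, and "adjacent value" is taken to mean the previous good
reading.

```python
import pandas as pd


def clean_series(values, short_gap_max=3, spike_threshold=100.0):
    """Rule-based cleaning sketch; zeros are treated as missing readings.

    short_gap_max and spike_threshold are illustrative placeholders.
    """
    s = pd.Series(values, dtype=float)
    is_zero = s == 0
    median = s[~is_zero].median()  # median of the non-zero readings

    # Length of the run of consecutive zeros each sample belongs to.
    run_id = (is_zero != is_zero.shift()).cumsum()
    run_len = is_zero.groupby(run_id).transform("size")

    one_off = is_zero & (run_len == 1)
    short_gap = is_zero & (run_len > 1) & (run_len <= short_gap_max)
    long_gap = is_zero & (run_len > short_gap_max)
    # Spike: a non-zero reading further than the threshold from the median.
    spike = ~is_zero & ((s - median).abs() > spike_threshold)

    out = s.mask(is_zero | spike)  # blank every suspect reading
    out[long_gap] = median         # large gaps: fill with the median

    prev_good = out.ffill()        # last good value before each sample
    next_good = out.bfill()        # next good value after each sample
    # Average of the two bounding points (falls back to one side at the edges).
    between = pd.concat([prev_good, next_good], axis=1).mean(axis=1)
    adjacent = prev_good.fillna(next_good)

    out[short_gap] = between[short_gap]
    out[one_off | spike] = adjacent[one_off | spike]
    return out
```

Note that a reading stuck just below spike_threshold passes through this
function unchanged, which is exactly the weakness described above.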
I thought a machine learning approach might be more flexible / robust, but
maybe I am wrong in that assumption.
Has anyone got any advice on which area of machine learning I should
explore first?
Or maybe my problem is not suited to it?
Thanks for any advice.
Glenn
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general