Dear Mike,

Just my two cents on your inquiry, speaking strictly as a user of scikit-learn for many years.

- From your description of the application context, I would say that scikit-learn is perfectly fine. However, be aware that a single monolithic model incorporating all of the data (the image that TV wrongly projects) is not a valid strategy. Stratifying the data into contextually correct subgroups and then running scikit-learn on each, for example to estimate the extent of predictability during development, will be helpful.

- Duplicate checking should be easy to do with standard Python objects (set or list counting), once the context determines how the objects are vectorized/featurized. I don't see a need to force scikit-learn into that context.

- Missing data could be handled by context-specific object classes that you design, which could contain something like a __bool__() method that tells you whether the object has all of the required data populated and configured.

- Detection of configuration errors could either be driven explicitly by logic (of the context; again, something that returns a bool indicating that an object is configured correctly), or potentially be derived statistically as outliers from the given background data distribution, in which case scikit-learn could be of help. If there are too many variates (thousands or tens of thousands) in your data for explicit logic to be practical, then scikit-learn's Random Forest algorithms might be perfectly fine and provide verification through visualization of the Decision Tree rules.

Hope this helps,
J.B. Brown

On Sat, Oct 8, 2022 at 10:59, Mike Oliver <m...@globalsaassol.com> wrote:

> Dear Sirs,
>
> I am evaluating SciKit-Learn for a new project. I am hoping to find an AI
> Machine Learning package that can take a large dataset of objects that have
> various object types and attributes. These objects are typically related
> to other objects, such as a server to a Wifi device, or two network routers
> to each other, etc. When these objects are set up, data is gathered about
> where they are located, what settings there are, the device type, etc.
>
> With large organizations there can be thousands of these objects and tens
> of thousands of relationships, descriptions, settings, etc. My hope is
> that with machine learning we can detect when an object is missing, or
> configured in error, or duplicated.
>
> The question is, will SciKit-Learn help with this problem? I understand
> that we will have to train it to identify what to look for and then act on
> what was found and predicted to be the solution algorithm. Or instructions.
>
> Thanks for your help,
>
> Great looking product and already have the tutorial up and running and
> have installed it in my Django platform.
>
> Mike
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
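
As a rough illustration of the duplicate-checking and missing-data suggestions in the reply, here is a minimal sketch using only the Python standard library. The Device class, its field names, the sample records, and the choice of (device_type, ip_address) as the key that defines "the same object" are all hypothetical; the real featurization would have to come from the application context.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:
    """Hypothetical network object; field names are illustrative only."""
    device_type: Optional[str] = None
    location: Optional[str] = None
    ip_address: Optional[str] = None

    def __bool__(self) -> bool:
        # True only when every required field is populated.
        required = (self.device_type, self.location, self.ip_address)
        return all(value not in (None, "") for value in required)

    def key(self) -> tuple:
        # Assumed featurization for duplicate checking: two records
        # with the same type and address count as the same object.
        return (self.device_type, self.ip_address)


def find_duplicates(devices):
    """Return {key: count} for every key that occurs more than once."""
    counts = Counter(device.key() for device in devices)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up sample records for the sketch.
devices = [
    Device("router", "HQ-1F", "10.0.0.1"),
    Device("router", "HQ-2F", "10.0.0.1"),  # same type and address as above
    Device("wifi-ap", "HQ-1F", "10.0.0.7"),
    Device("server", None, "10.0.0.9"),     # location is missing
]

duplicates = find_duplicates(devices)
incomplete = [device for device in devices if not device]

print("duplicates:", duplicates)
print("incomplete:", incomplete)
```

Because the completeness check lives in __bool__(), incomplete records can be filtered with a plain "if not device", and the duplicate check needs nothing beyond Counter. Neither step requires scikit-learn.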