Re: [scikit-learn] Questions about partial_fit and the Incremental library in scikit-learn
On 2019-09-09 12:12, Daniel Sullivan wrote:

Hi Farzana,

If I understand your question correctly, you're asking how the SGD classifier works incrementally? The SGD algorithm maintains a single set of weights and iterates through the data points of a batch one at a time, adjusting its weights on each iteration. So to answer your question: it trains on each instance, not on the batch as a whole. However, the algorithm can iterate multiple times over a single batch. Let me know if that answers your question.

Best,
Danny

On Mon, Sep 9, 2019 at 11:56 AM Farzana Anowar wrote:

Hello Sir/Madam,

I subscribed to the link you sent me and am posting my question again. This is Farzana Anowar, a Ph.D. candidate at the University of Regina. Currently, I'm working on a model that learns incrementally from non-stationary data. I have come across the Incremental library, which works with scikit-learn and allows exactly that through partial_fit. I have searched a lot for detailed information about this Incremental library and partial_fit, but couldn't find any. It would be great if you could provide some detail on how the two actually work. For example, if we take SGD as the classifier, the Incremental library lets me pass chunks/batches of data. My question is: does the Incremental library train (using partial_fit) on the whole batch at once and then produce a classification performance, or does it take a batch and train on one instance of the batch at a time?

Thanks in advance!

--
Regards,
Farzana Anowar

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn

Hello Daniel,

Thank you so much! Your clarification makes sense: whatever batches I pass to the classifier, it will train on each instance within a single batch.
I was just wondering if you could give me some more information about partial_fit. For your reference, I was looking at this code: https://dask-ml.readthedocs.io/en/latest/incremental.html

Thanks!

--
Regards,
Farzana Anowar
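Danny's description of how SGDClassifier consumes a batch can be sketched as follows. This is a minimal illustration, not the Incremental wrapper itself: the stream is synthetic and all variable names are made up. Each partial_fit call updates the same weight vector, instance by instance, so state accumulates across batches.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic "stream": 5 chunks of 200 samples, 10 features, 2 classes.
rng = np.random.RandomState(0)
clf = SGDClassifier(random_state=0)

all_classes = np.array([0, 1])  # must be declared on the first partial_fit call
for chunk in range(5):
    X = rng.randn(200, 10)
    y = (X[:, 0] + 0.1 * rng.randn(200) > 0).astype(int)
    # One call per chunk: the classifier iterates over the chunk's
    # instances one at a time, updating the same weight vector.
    clf.partial_fit(X, y, classes=all_classes)

X_test = rng.randn(100, 10)
y_test = (X_test[:, 0] > 0).astype(int)
print("accuracy:", clf.score(X_test, y_test))
```

Because the weights persist between calls, evaluating after each chunk would show the model improving as more of the stream is seen.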
[scikit-learn] Incremental learning in scikit-learn
Hello Sir/Madam,

I am going through the incremental learning algorithms in scikit-learn. SGD in scikit-learn is the kind of algorithm that allows learning incrementally by passing chunks/batches. Now my question is: does scikit-learn keep all the training batches in memory? Does it keep chunks/batches in memory up to a certain size? Or does it keep only the chunk/batch currently being trained on in memory and discard the already-trained chunks/batches? Does that mean it suffers from catastrophic forgetting?

Thanks!

--
Regards,
Farzana Anowar
Re: [scikit-learn] Incremental learning in scikit-learn
On 2019-09-09 17:53, Daniel Sullivan wrote:

Hey Farzana,

The algorithm only keeps one batch in memory at a time. Across batches, SGD keeps a set of weights that it updates with each data point (instance) within a batch. This set of weights is the state persisted between calls of partial_fit. That means you will get the same results with SGD regardless of your batch size, and you can choose your batch size according to your memory constraints. Hope that helps.

- Danny

On Mon, Sep 9, 2019 at 5:53 PM Farzana Anowar wrote:

Does scikit-learn keep all the training batches in memory, keep them up to a certain size, or keep only the chunk currently being trained on?

Thanks a lot!

--
Regards,
Farzana Anowar
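The "one batch in memory at a time" behavior Danny describes can be sketched with a Python generator: only the chunk currently being consumed is alive, and the only state the model carries between calls is its weight vector. The data here is synthetic and the helper name chunk_stream is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def chunk_stream(n_chunks, chunk_size, n_features, seed=0):
    """Yield one chunk at a time; earlier chunks can be garbage-collected."""
    rng = np.random.RandomState(seed)
    for _ in range(n_chunks):
        X = rng.randn(chunk_size, n_features)
        y = (X[:, 0] > 0).astype(int)
        yield X, y

clf = SGDClassifier(random_state=0)
for X, y in chunk_stream(10, 100, 5):
    # Only the current (X, y) is held here; the persisted model state
    # between calls is just coef_ and intercept_, not the data.
    clf.partial_fit(X, y, classes=np.array([0, 1]))

print(clf.coef_.shape, clf.intercept_.shape)
```

Note that "only the weights persist" also explains the catastrophic-forgetting concern: nothing about old chunks is retained except their accumulated effect on the weights.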
Re: [scikit-learn] Attribute Incremental learning
On 2020-01-16 08:36, Max Halford wrote:

Hello Farzana,

You might want to check out scikit-multiflow [1] and creme [2] (I'm the author).

Kind regards.

On Tue, 14 Jan 2020 at 16:59, Farzana Anowar wrote:

Hello,

This is Farzana. I am trying to understand attribute-incremental learning (or virtual concept drift): every time a new feature becomes available in a real-time dataset (e.g. an online auction dataset), the classifier should add that new feature to the existing features and classify the updated dataset (previous features plus new features) incrementally. I know that we can convert a static classifier into an incremental classifier in scikit-learn. However, I could not find any library or function for attribute-incremental learning, or any detailed information about it. It would be great if anyone could give me some insight on this.

Thanks!

--
Best Regards,
Farzana Anowar, PhD Candidate
Department of Computer Science
University of Regina

--
Max Halford
+336 28 25 13 38

Links:
[1] https://scikit-multiflow.github.io/
[2] https://creme-ml.github.io/

Hello Max,

Thanks a lot.

--
Best Regards,
Farzana Anowar, PhD Candidate
Department of Computer Science
University of Regina
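For context on why a dedicated library helps here: scikit-learn estimators expect the same number of features on every partial_fit call, so a common hand-rolled workaround is to train in a zero-padded feature space sized to an assumed maximum. This is a hedged sketch, not an API of scikit-multiflow or creme; MAX_FEATURES, the pad helper, and the synthetic batches are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

MAX_FEATURES = 20  # hypothetical upper bound on the eventual feature count

def pad(X, width=MAX_FEATURES):
    """Zero-pad a batch so every batch has the same width."""
    out = np.zeros((X.shape[0], width))
    out[:, : X.shape[1]] = X
    return out

rng = np.random.RandomState(0)
clf = SGDClassifier(random_state=0)

# Early batches arrive with 5 features; later ones carry 8.
for n_feat in (5, 5, 8, 8):
    X = rng.randn(100, n_feat)
    y = (X[:, 0] > 0).astype(int)
    clf.partial_fit(pad(X), y, classes=np.array([0, 1]))

print(clf.coef_.shape)  # weights live in the padded space
```

The unused padded columns simply keep zero-valued inputs until a feature starts arriving, at which point its weight begins to be learned; libraries built for streaming data handle this bookkeeping natively.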
Re: [scikit-learn] transfer learning doubt
On 2020-03-19 00:11, Praneet Singh wrote:

I am training an SGD classifier on a training dataset that is temporary and will be lost after some time. So I am planning to save the model in a pickle file, then reload it and train it again when another dataset arrives. But it forgets the previously learned data. As far as I have researched on Google, TensorFlow models allow transfer learning without forgetting the previous learning, but is there any other way to achieve this with an sklearn model? Any help would be appreciated.

Did you use an incremental estimator and partial_fit? If not, try them; that should work. Another option is to use deep learning: store the weights of the first model, initialize the second model with those weights, and keep doing this for the rest of the models.

--
Best Regards,
Farzana Anowar, PhD Candidate
Department of Computer Science
University of Regina
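A minimal sketch of the pickle-plus-partial_fit workflow suggested above, with made-up synthetic data: pickling a fitted SGDClassifier preserves its weights, and calling partial_fit on the restored model continues from those weights instead of re-initializing. (Calling fit on the restored model would discard them, which is the "forgetting" described in the question.)

```python
import pickle
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)

# Train on the first (temporary) dataset.
X1 = rng.randn(200, 4)
y1 = (X1[:, 0] > 0).astype(int)
clf = SGDClassifier(random_state=0)
clf.partial_fit(X1, y1, classes=np.array([0, 1]))

# Persist the fitted model, weights included ...
blob = pickle.dumps(clf)

# ... later, restore it and continue training on the new data
# without re-initializing the weights.
clf2 = pickle.loads(blob)
X2 = rng.randn(200, 4)
y2 = (X2[:, 0] > 0).astype(int)
clf2.partial_fit(X2, y2)  # classes only needed on the very first call

print(clf2.score(X2, y2))
```

In practice the blob would be written to a file with pickle.dump and read back with pickle.load; the in-memory round trip here just keeps the sketch self-contained.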
[scikit-learn] Incremental learning in scikit-learn
Hello everyone,

Currently, I am working with incremental learning. I know that scikit-learn allows incremental learning for some classifiers, e.g. SGD. In incremental learning, the data is not available all at once; rather, it becomes available chunk by chunk over time. Now, my question is: does scikit-learn allow the data chunks to have different sizes, or do all the chunks have to be the same size?

Thanks!

--
Best Regards,
Farzana Anowar, PhD Candidate
Department of Computer Science
University of Regina
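One quick way to check this empirically, sketched here with synthetic data and SGDClassifier only: partial_fit accepts chunks of any size, as long as the number of features (and the declared set of classes) stays fixed across calls.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
clf = SGDClassifier(random_state=0)

# Chunks of very different sizes are fine; only the feature
# count (3 here) must stay the same between calls.
for size in (50, 200, 10, 500):
    X = rng.randn(size, 3)
    y = (X[:, 0] > 0).astype(int)
    clf.partial_fit(X, y, classes=np.array([0, 1]))

print(clf.t_)  # running count of weight updates across all chunks
```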
[scikit-learn] Issue in BIRCH clustering algo
Hello everyone,

I was trying to run the BIRCH clustering algorithm. However, after fitting the model I am facing the following error:

AttributeError: '_CFSubcluster' object has no attribute 'sq_norm_'

The error occurs only after fitting the model, and I couldn't find any proper explanation for it. Could anyone give me any suggestions? It would be really helpful. Here is my code:

from sklearn.cluster import Birch

# Creating the BIRCH clustering model
model = Birch(n_clusters=None)

# Fit the data (training)
model.fit(df)

# Predict on the same data
pred = model.predict(df)

--
Best Regards,
Farzana Anowar, PhD Candidate
Department of Computer Science
University of Regina
[scikit-learn] Batch Incremental Learning from Scikit-Multiflow
Hello scikit-learn community,

I hope you all are doing well! I am currently working with BatchIncrementalClassifier from the scikit-multiflow package, for which the following example is given:

# Setup a data stream
stream = SEAGenerator(random_state=1)

# Pre-training the classifier with 200 samples
X, y = stream.next_sample(200)
batch_incremental_cfier = BatchIncrementalClassifier()
batch_incremental_cfier.partial_fit(X, y)

# Preparing the processing of 5000 samples and correct prediction count
n_samples = 0
correct_cnt = 0
while n_samples < 5000 and stream.has_more_samples():
    X, y = stream.next_sample()
    y_pred = batch_incremental_cfier.predict(X)
    if y[0] == y_pred[0]:
        correct_cnt += 1
    batch_incremental_cfier.partial_fit(X, y)
    n_samples += 1

# Display results
print('Batch Incremental ensemble classifier example')
print('{} samples analyzed'.format(n_samples))
print('Performance: {}'.format(correct_cnt / n_samples))

Now my questions are:

1. For pre-training, the classifier uses 200 samples from the stream and then runs a prequential (test-then-train) evaluation on 5000 samples. Are those 200 samples considered the first batch from the stream, used only for pre-training, so that when the next 5000 samples become available they are evaluated against the pre-trained model? (This makes sense to me, since this way later evaluation is influenced by the pre-trained model.) Or:

2. Is this one batch (200 + 5000) from the stream, where the first 200 samples are used for pre-training and the rest for evaluation? And when the next batch arrives from the stream, will it do the same thing (200 for pre-training, the rest for evaluation)? If that is the case, aren't we training from scratch each time, which would mean BatchIncrementalClassifier is no longer an incremental classifier?

Thanks!
--
Best Regards,
Farzana Anowar, PhD Candidate
Department of Computer Science
University of Regina