Re: [scikit-learn] merging the predicted labels with original dataframe

Julio Antonio Soto de Vicente Thu, 20 Jul 2017 08:40:17 -0700

Hi Ruchika,

The predictions returned by all sklearn models are just 1-d NumPy arrays, so 
it should be trivial to add them to any existing DataFrame that has one row 
per row of X_test, in the same order:


your_df["prediction"] = clf.predict(X_test)
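
To keep the predictions tied to the right rows of the original frame M, one
option is to pass the row index through the split together with X and Y. The
following is only a minimal sketch under assumptions: the toy frame, its
values, and the names idx_train, idx_test and test_rows are made up for
illustration, and dense numeric features stand in for the sparse TF-IDF
matrix from the question.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy stand-in for the original frame M (values are invented for illustration).
M = pd.DataFrame({
    "rt": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "out": [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.0],
    "act": ["OBSERVED", "BLOCKED"] * 5,
})
X = M[["rt", "out"]].values
Y = M["act"].values

# Split the row labels together with X and Y so that each test row
# can still be traced back to its row in M.
X_train, X_test, y_train, y_test, idx_train, idx_test = train_test_split(
    X, Y, M.index.values, test_size=0.2, random_state=0
)

clf = MLPClassifier(activation="logistic", solver="lbfgs", random_state=0)
clf.fit(X_train, y_train)

# Build a Series indexed by the test row labels; assigning it aligns on the
# index, so rows that were used for training get NaN in the new column.
M["prediction"] = pd.Series(clf.predict(X_test), index=idx_test)

# The held-out rows with all original columns plus the prediction.
test_rows = M.loc[idx_test]
```

With this, test_rows can be inspected like any other DataFrame, for example
by comparing test_rows["act"] with test_rows["prediction"].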

--
Julio

> On 20 Jul 2017, at 17:23, Ruchika Nayyar <[email protected]> wrote:
> 
> Hi Scikit-learn Users, 
> 
> I am analyzing some proxy logs to use Machine learning to classify the events 
> recorded as either "OBSERVED" or "BLOCKED". This is a little snippet of my 
> code: 
> The input file is a csv with tokenized string fields. 
> 
> **************
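> # imports used by the snippet below
> import pandas as pd
> from scipy.sparse import csr_matrix, hstack
> from sklearn.feature_extraction.text import TfidfVectorizer
> from sklearn.model_selection import train_test_split as tts
> from sklearn.neural_network import MLPClassifier
> from sklearn.preprocessing import StandardScaler
> 
> # load the file 
> M = pd.read_csv("output100k.csv").fillna('')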
> 
> # define the fields to use 
> min_df = 0.001
> max_df = .7
> TxtCols = ['request__tokens', 'requestClientApplication__tokens',
>            'destinationZoneURI__tokens','cs-categories__tokens', 
>            'fileType__tokens', 'requestMethod__tokens','tcp_status1',
>            'app','tcp_status2','dhost'
>           ]
> NumCols = ['rt', 'out', 'in', 'time-taken','rt_length', 'dt_length']
> 
> # vectorize the fields 
> TfidfModels = [TfidfVectorizer(min_df=min_df, max_df=max_df).fit(M[t])
>                for t in TxtCols]
> 
> # define the columns of sparse matrix 
> X = hstack([m.transform(M[n].fillna('')) for m, n in zip(TfidfModels, TxtCols)] +
>            [csr_matrix(pd.to_numeric(M[n]).fillna(-1).values).T for n in NumCols])
>            
> # target variable 
> Y = M.act.values 
> 
> ## Define train/test parts and scale them 
> X_train, X_test, y_train, y_test = tts(X, Y, test_size=0.2)
> scaler = StandardScaler(with_mean=False, with_std=True)
> scaler.fit(X_train)
> X_train=scaler.transform(X_train)
> X_test=scaler.transform(X_test)
> 
> 
> # define the model and train 
> clf = MLPClassifier(activation='logistic', solver='lbfgs').fit(X_train, y_train)
> # use the model to predict on X_test and convert into a data frame 
> df=pd.DataFrame(clf.predict(X_test))
> 
> **
> 199845  OBSERVED
> 199846  OBSERVED
> [199847 rows x 1 columns]>
> **
> Now at the end I have a DataFrame with 20K entries with just one column 
> "Label". How do I connect it to the main dataframe M, since I want to do some 
> investigations on this outcome?
> 
> Any help? 
> 
> Thanks,
> Ruchika
> 
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn

