It turned out my elderly conda environment was using sklearn < 1.2. After
much time wasted on piecemeal-upgrade conda solve attempts, this sequence
got sklearn to 1.6.1:
$ conda update -n base -c conda-forge conda
$ conda install conda=25.1.1
# reinstalls to fix packages broken by the upgrade
$ conda install tensorflow-gpu
$ conda install conda-forge::imbalanced-learn
Mantra:
$ python -c "import sklearn; sklearn.show_versions()"
I still don't get the original column order back, but the trailing columns
match (as below), and now I can actually see that, since
.set_output(transform="pandas") keeps the column labels.
Thanks,
Bill
--
Phobrain.com
On 2025-02-11 01:31, Bill Ross wrote:
> I applied ColumnTransformer, but the results are unexpected. It could be my
> lack of python skill, but it seems like the value of p1_1 in the original
> should persist at position (0, 0) in the transformed output?
>
> ------- pre-scale
>        p1_1      p1_2      p1_3      p1_4      p2_1  ...   resp1_4   resp2_1   resp2_2   resp2_3   resp2_4
> 760  1.382658  1.440719  1.555705  1.120171  1.717319  ...  0.598736  0.659797  0.376331  0.403887  0.390283
>
> ------- scaled
> [[0.17045455 0.04680535 0.04372197 ... 0.37633118 0.40388673 0.39028345]
>
> Thanks,
>
> Bill
>
> Fingers crossed on the formatting.
>
> from sklearn.compose import make_column_transformer
> from sklearn.preprocessing import MinMaxScaler
>
> # scale the mouse/timing features; pass the p*/resp* columns through
> column_trans = make_column_transformer(
>     (MinMaxScaler(),
>      ['order_in_session', 'big_stime', 'big_time', 'load_time',
>       'user_time', 'user_time2', 'mouse_down_time', 'mouse_time',
>       'mouse_dist', 'mouse_dist2', 'dot_count', 'mouse_dx', 'mouse_dy',
>       'mouse_vecx', 'mouse_vecy', 'dot_vec_len', 'mouse_maxv',
>       'mouse_maxa', 'mouse_mina', 'mouse_maxj', 'dot_max_vel',
>       'dot_max_acc', 'dot_max_jerk', 'dot_start_scrn', 'dot_end_scrn',
>       'dot_vec_ang']),
>     remainder='passthrough')
>
> print('------- pre-scale')
> print( str(X_train) )
>
> X_train = column_trans.fit_transform(X_train)
>
> print('------- scaled')
> print( str(X_train) )
> print('------- /scaled')
>
> split 414 414
> ------- pre-scale
>        p1_1      p1_2      p1_3      p1_4      p2_1  ...   resp1_4   resp2_1   resp2_2   resp2_3   resp2_4
> 760  1.382658  1.440719  1.555705  1.120171  1.717319  ...  0.598736  0.659797  0.376331  0.403887  0.390283
> 218  0.985645  0.532462  0.780601  0.687588  0.781293  ...  0.890886  1.072392  0.536962  0.715136  0.792722
> 603  0.783806  0.437074  0.694766  0.371121  0.995891  ...  1.055465  1.518875  1.129209  1.201864  1.476702
> 0    0.501352  0.253304  0.427804  0.283380  0.571035  ...  1.035323  1.621431  0.838613  1.031724  1.131344
> 604  1.442482  1.019641  0.798387  1.055465  1.518875  ...  2.779447  1.636363  1.212313  1.274595  1.723697
>
> ...
>
> ------- scaled
> [[0.17045455 0.04680535 0.04372197 ... 0.37633118 0.40388673 0.39028345]
> [0.27272727 0.04502229 0.04204036 ... 0.53696203 0.7151355 0.7927222 ]
> [0.30681818 0.04517088 0.04456278 ... 1.1292094 1.201864 1.4767016 ]
> ...
> [0.02272727 0.04457652 0.1680213 ... 1.796316 1.939811 2.1776829 ]
> [0.55681818 0.04546805 0.04176009 ... 0.48330075 0.37375322 0.29931256]
> [0.5 0.04457652 0.04091928 ... 0.6759416 0.7517819 0.8801653 ]]
>
> ---
> --
>
> Phobrain.com
>
> On 2025-01-23 01:21, Bill Ross wrote:
>
>>> ColumnTransformer
>>
>> Thanks!
>>
>> I was also thinking of trying TabPFN, not researched yet, in case you can
>> comment. <peeks/> Their attribution requirement seems overboard for what I
>> want, unless it's flat-out miraculous for the flat-footed. :-)
>>
>>> Some of us are working on a related package, skrub (https://skrub-data.org),
>>> which is more focused on heterogeneous dataframes. It does not currently
>>> have something that would help you much, but we are heavily brainstorming a
>>> variety of APIs to do flexible transformations of dataframes, including
>>> easily doing what you want. The challenge is to address the variety of
>>> cases.
>>
>> Those are the storms we want. I'd love to know if/how/which ML tools are
>> helping with that work, if appropriate here.
>>
>> Regards,
>> Bill
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn