2014-08-04 17:39 GMT+02:00 Philipp Singer <[email protected]>:
> Apart from that, does anyone know a solution of how I can efficiently
> calculate the resulting matrix Y = X * X.T? I am currently thinking about
> using PyTables with some sort of chunked calculation algorithm.
> Unfortunately, this is not the most efficient way of doing it in terms of
> speed but solves the memory bottleneck. I need the raw similarity scores
> between all documents in the end.
Just decompose it:
    for i in range(0, X.shape[0], K):
        # n x K block holding columns i..i+K of Y (* is the matrix product here)
        Y_K = X * X[i:i+K].T
        store_on_a_big_disk(Y_K)
(You can also use batches of rows instead of batches of columns, just
make sure you have a 1TB disk available.)
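
For reference, here is a minimal runnable sketch of the same loop. It rests on a few assumptions that are not in the original message: X is a dense NumPy array (so the product is written with @, since * would be elementwise there), the hypothetical helper blockwise_gram stands in for store_on_a_big_disk, and the result goes into a np.memmap file. With a scipy.sparse X, each block would probably need .toarray() before being written.

```python
import os
import tempfile

import numpy as np


def blockwise_gram(X, K, path):
    """Write Y = X @ X.T to a disk-backed array, K columns at a time.

    `blockwise_gram` and `path` are illustrative names, not part of any library.
    """
    n = X.shape[0]
    # Disk-backed n x n array; only the current block has to fit in RAM.
    Y = np.memmap(path, dtype=X.dtype, mode="w+", shape=(n, n))
    for i in range(0, n, K):
        # (n x d) @ (d x K) gives the n x K block of columns i..i+K of Y.
        Y[:, i:i + K] = X @ X[i:i + K].T
    Y.flush()
    return Y


# Tiny demo on random data, written to a temporary file.
rng = np.random.default_rng(0)
X_demo = rng.random((7, 5))
demo_path = os.path.join(tempfile.mkdtemp(), "Y.dat")
Y_demo = blockwise_gram(X_demo, K=3, path=demo_path)
```

K trades memory for the number of passes: each step holds one n x K block in RAM, so pick the largest K that fits. Since Y is symmetric, the row-batch variant (Y[i:i+K] = X[i:i+K] @ X.T) produces the same matrix and writes contiguous rows instead of strided columns.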
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general