On 04.08.2014 at 22:14, Philipp Singer <[email protected]> wrote:
>
> On 04.08.2014 at 20:54, Lars Buitinck <[email protected]> wrote:
>
>> 2014-08-04 17:39 GMT+02:00 Philipp Singer <[email protected]>:
>>> Apart from that, does anyone know a solution of how I can efficiently
>>> calculate the resulting matrix Y = X * X.T? I am currently thinking about
>>> using PyTables with some sort of chunked calculation algorithm.
>>> Unfortunately, this is not the most efficient way of doing it in terms of
>>> speed but solves the memory bottleneck. I need the raw similarity scores
>>> between all documents in the end.
>>
>> Just decompose it:
>>
>> for i in range(0, X.shape[0], K):
>>     Y_K = X * X[i:i+K].T
>>     store_on_a_big_disk(Y_K)
>>
>
> This may work. Interesting that scipy can handle this „dimension mismatch“.
> Do you know how to do this with numpy arrays?
>
> Would you suggest to store the result in a PyTable or memmap or maybe
> something else?
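
For reference, here is a minimal sketch of the decomposition quoted above, with the result written to a disk-backed NumPy array rather than PyTables. The names `chunked_gram`, `out_path` and `chunk` are made up for illustration, and using `np.lib.format.open_memmap` as the store is only one possible choice, not something settled in this thread.

```python
import numpy as np
import scipy.sparse as sp


def chunked_gram(X, out_path, chunk=1000, dtype=np.float32):
    """Write Y = X * X.T to a disk-backed .npy file, `chunk` columns at a time.

    Hypothetical helper: only one (n, chunk) block is held in memory per step.
    """
    X = sp.csr_matrix(X)  # accepts dense or sparse input; CSR slices rows cheaply
    n = X.shape[0]
    # Creates a .npy file of shape (n, n) that is memory-mapped, not loaded.
    Y = np.lib.format.open_memmap(out_path, mode="w+", dtype=dtype, shape=(n, n))
    for i in range(0, n, chunk):
        # (n, d) times (d, k) gives the (n, k) block of columns i..i+k of Y.
        block = X @ X[i:i + chunk].T
        Y[:, i:i + chunk] = block.toarray()
    Y.flush()
    return Y
```

The file can later be reopened without loading it into RAM via `np.load(out_path, mmap_mode="r")`. With plain dense NumPy arrays the same loop works without the sparse conversion, since `X @ X[i:i + chunk].T` is then already an `ndarray`. The full result still needs n * n * itemsize bytes on disk.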

Please forget my comment about the dimension mismatch.

>
>> (You can also use batches of rows instead of batches of columns, just
>> make sure you have a 1TB disk available.)

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
