On 04.08.2014 at 22:14, Philipp Singer <[email protected]> wrote:

> 
> On 04.08.2014 at 20:54, Lars Buitinck <[email protected]> wrote:
> 
>> 2014-08-04 17:39 GMT+02:00 Philipp Singer <[email protected]>:
>>> Apart from that, does anyone know how I can efficiently calculate the
>>> resulting matrix Y = X * X.T? I am currently thinking about using PyTables
>>> with some sort of chunked calculation algorithm. Unfortunately, this is
>>> not the fastest approach, but it solves the memory bottleneck. In the end
>>> I need the raw similarity scores between all documents.
>> 
>> Just decompose it:
>> 
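>> for i in range(0, X.shape[0], K):  # K = number of rows per batch
>>   # for scipy.sparse matrices, * is the matrix product: (n, d) * (d, K) -> (n, K)
>>   Y_K = X * X[i:i+K].T
>>   store_on_a_big_disk(Y_K)  # placeholder: persist this block of columns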
>> 
> 
> This may work. Interesting that scipy can handle this "dimension mismatch".
> Do you know how to do this with numpy arrays?
> 
> Would you suggest storing the result in PyTables or a memmap, or maybe
> something else?

Please forget my comment about the dimension mismatch.
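
For dense numpy arrays, the same decomposition could look roughly like the
sketch below. Note that on numpy arrays * is element-wise, so the matrix
product has to be written with @ (or np.dot). The function name chunked_gram,
the chunk_rows parameter, and the choice of np.memmap as the on-disk store are
assumptions made for illustration, not something settled in this thread.

```python
import os
import tempfile

import numpy as np


def chunked_gram(X, out_path, chunk_rows=1000):
    """Write Y = X @ X.T to a disk-backed array, one block of columns at a time.

    chunked_gram, out_path and chunk_rows are hypothetical names for this sketch.
    """
    n = X.shape[0]
    # mode="w+" creates (or overwrites) the backing file at out_path.
    Y = np.memmap(out_path, dtype=X.dtype, mode="w+", shape=(n, n))
    for i in range(0, n, chunk_rows):
        # (n, d) @ (d, k) -> (n, k): these are columns i .. i+k-1 of Y.
        Y[:, i:i + chunk_rows] = X @ X[i:i + chunk_rows].T
    Y.flush()  # push pending writes to disk
    return Y


# Small demonstration on random data written to a temporary file.
rng = np.random.default_rng(0)
X = rng.random((50, 8))
path = os.path.join(tempfile.mkdtemp(), "gram.dat")
Y = chunked_gram(X, path, chunk_rows=16)
```

Only one (n, chunk_rows) block is held in memory at a time; the full n x n
result still needs the corresponding amount of disk space.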

> 
>> (You can also use batches of rows instead of batches of columns, just
>> make sure you have a 1TB disk available.)
>> 
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> 

