I like Orhan's suggestion; it is less work. One slight correction to my comment above. Where I wrote:

"For each of the n chunks, if there is no non-zero value in the 100th column, you will get an error that looks like this..."

I meant:

"For each of the n chunks, if there is no value of any kind (0 or otherwise) in the 100th column, you will get an error that looks like this..."
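To make the failure mode concrete, here is a small, hypothetical reproduction (untested; the table name is made up, and it assumes MADlib infers sparse dimensions from the largest row/column indexes present, per the docs link further down this thread):

-- A 10-column chunk whose 10th column has no stored entry of any kind
-- (0 or otherwise). The shape is inferred from max(row_id) and
-- max(col_id), so this chunk looks like a 1 x 9 matrix, not 1 x 10.
CREATE TABLE mat_a_chunk (row_id INTEGER, col_id INTEGER, value DOUBLE PRECISION);
INSERT INTO mat_a_chunk VALUES (1, 1, 9), (1, 5, 6), (1, 9, 4);

-- Fails: "Dimension mismatch between matrix (1 x 9) and vector (10 x 1)"
SELECT madlib.matrix_vec_mult('mat_a_chunk', NULL, array[1,2,3,4,5,6,7,8,9,10]);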
"For each of the n chunks, if there is no non-zero value in the 100th column, you will get an error that looks like this..." I meant For each of the n chunks, if there is no value of any kind (0 or otherwise) in the 100th column, you will get an error that looks like this..." Frank On Thu, Jan 4, 2018 at 5:26 PM, Orhan Kislal <okis...@pivotal.io> wrote: > Hello Anthony, > > I agree with Frank's suggestion, operating on chunks of the matrix should > work. An alternate workaround for the 100th column issue you might > encounter could be this: > > Check if there exists a value for the the first (or last or any other) > row, last column. If there is one, then you can use the chunk as is. If > not, put 0 as the value of that particular row/column. This will ensure the > matrix size is calculated correctly, will not affect the output and will > not require any additional operation for the assembly of the final vector. > > Please let us know if you have any questions. > > Thanks, > > Orhan Kislal > > On Thu, Jan 4, 2018 at 12:12 PM, Frank McQuillan <fmcquil...@pivotal.io> > wrote: > >> Anthony, >> >> In that case, I think you are hitting the 1GB PostgreSQL limit. >> >> Operations on sparse matrix format requires loading into memory 2 >> INTEGERS for row/col plus the value (INTEGER, DOUBLE PRECISION, whatever >> size it is). >> >> It means for your matrix the 2 INTEGERS alone are ~1.00E+09 bytes which >> is already on the limit without even considering the value yet. >> >> So I would suggest you do the computation in blocks. One approach to >> this: >> >> * chunk your long matrix into n smaller VIEWS, say n=10 (i.e., MADlib >> matrix operations do work on VIEWS) >> * call matrix*vector for each chunk >> * reassemble the n result vectors into the final vector >> >> You could do this in a PL/pgSQL or PL/Python function. >> >> There is one subtlety to be aware of though because you are working with >> sparse matrices. For each of the n chunks, if there is no non-zero value in >> the 100th column, you will get an error that looks like this: >> >> madlib=# SELECT madlib.matrix_vec_mult('mat_a_view', >> NULL, >> array[1,2,3,4,5,6,7,8,9,10] >> ); >> ERROR: plpy.Error: Matrix error: Dimension mismatch between matrix (1 x >> 9) and vector (10 x 1) >> CONTEXT: Traceback (most recent call last): >> PL/Python function "matrix_vec_mult", line 24, in <module> >> matrix_in, in_args, vector) >> PL/Python function "matrix_vec_mult", line 2031, in matrix_vec_mult >> PL/Python function "matrix_vec_mult", line 77, in _assert >> PL/Python function "matrix_vec_mult" >> >> See the explanation at the top of >> http://madlib.apache.org/docs/latest/group__grp__matrix.html >> regarding dimensionality of sparse matrices. >> >> One way around this is to add a (fake) row to the bottom of your VIEW >> with a 0 in the 100th column. But if you do this, be sure to drop the last >> (fake) entry of each of the n intermediate vectors before you assemble into >> the final vector. >> >> Frank >> >> >> >> >> >> On Wed, Jan 3, 2018 at 8:15 PM, Anthony Thomas <ahtho...@eng.ucsd.edu> >> wrote: >> >>> Thanks Frank - the answer to both your questions is "yes" >>> >>> Best, >>> >>> Anthony >>> >>> On Wed, Jan 3, 2018 at 3:13 PM, Frank McQuillan <fmcquil...@pivotal.io> >>> wrote: >>> >>>> >>>> Anthony, >>>> >>>> Correct the install check error you are seeing is not related. 
> Please let us know if you have any questions.
>
> Thanks,
>
> Orhan Kislal
>
> On Thu, Jan 4, 2018 at 12:12 PM, Frank McQuillan <fmcquil...@pivotal.io>
> wrote:
>
>> Anthony,
>>
>> In that case, I think you are hitting the 1GB PostgreSQL limit.
>>
>> Operations on the sparse matrix format require loading into memory 2
>> INTEGERs for row/col plus the value (INTEGER, DOUBLE PRECISION, whatever
>> size it is).
>>
>> For your matrix that means ~1.25e8 stored entries (1.25e8 rows x 100
>> columns x ~1% nonzero), so the 2 INTEGERs alone are ~1.25e8 x 8 =
>> ~1.00E+09 bytes, which is already at the limit without even considering
>> the value yet.
>>
>> So I would suggest you do the computation in blocks. One approach to
>> this:
>>
>> * chunk your long matrix into n smaller VIEWs, say n=10 (i.e., MADlib
>> matrix operations do work on VIEWs)
>> * call matrix_vec_mult() for each chunk
>> * reassemble the n result vectors into the final vector
>>
>> You could do this in a PL/pgSQL or PL/Python function; a toy sketch
>> follows at the end of this message.
>>
>> There is one subtlety to be aware of, though, because you are working
>> with sparse matrices. For each of the n chunks, if there is no non-zero
>> value in the 100th column, you will get an error that looks like this:
>>
>> madlib=# SELECT madlib.matrix_vec_mult('mat_a_view',
>>                                        NULL,
>>                                        array[1,2,3,4,5,6,7,8,9,10]
>>                                       );
>> ERROR: plpy.Error: Matrix error: Dimension mismatch between matrix (1 x 9) and vector (10 x 1)
>> CONTEXT: Traceback (most recent call last):
>>   PL/Python function "matrix_vec_mult", line 24, in <module>
>>     matrix_in, in_args, vector)
>>   PL/Python function "matrix_vec_mult", line 2031, in matrix_vec_mult
>>   PL/Python function "matrix_vec_mult", line 77, in _assert
>> PL/Python function "matrix_vec_mult"
>>
>> See the explanation at the top of
>> http://madlib.apache.org/docs/latest/group__grp__matrix.html
>> regarding dimensionality of sparse matrices.
>>
>> One way around this is to add a (fake) row to the bottom of your VIEW
>> with a 0 in the 100th column. But if you do this, be sure to drop the
>> last (fake) entry of each of the n intermediate vectors before you
>> assemble them into the final vector.
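>> For concreteness, a toy, untested sketch of those three steps in plain
>> SQL with n=2 (all names are illustrative, and it assumes
>> matrix_vec_mult() returns the result vector as an array, as the SELECT
>> above suggests). For the real 1.25e8 x 100 matrix you would generate the
>> n views and the final concatenation in a PL/pgSQL or PL/Python loop:
>>
>> -- A 6 x 10 sparse matrix; every chunk below happens to have an entry
>> -- in column 10, so the dimension subtlety does not arise here.
>> CREATE TABLE mat_a (row_id INTEGER, col_id INTEGER, value DOUBLE PRECISION);
>> INSERT INTO mat_a VALUES
>>     (1, 1, 9), (2, 4, 6), (3, 10, 2),
>>     (4, 2, 8), (5, 7, 3), (6, 10, 5);
>>
>> -- 1) Chunk into two 3-row views, renumbering rows to start at 1.
>> CREATE VIEW mat_a_1 AS
>>     SELECT row_id, col_id, value FROM mat_a WHERE row_id <= 3;
>> CREATE VIEW mat_a_2 AS
>>     SELECT row_id - 3 AS row_id, col_id, value FROM mat_a WHERE row_id > 3;
>>
>> -- 2) Multiply each chunk by the dense vector, then 3) reassemble the
>> --    partial result vectors by concatenating them in chunk order.
>> SELECT array_cat(
>>     madlib.matrix_vec_mult('mat_a_1', NULL, array[1,2,3,4,5,6,7,8,9,10]),
>>     madlib.matrix_vec_mult('mat_a_2', NULL, array[1,2,3,4,5,6,7,8,9,10])
>> ) AS final_vector;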
>> Frank
>>
>> On Wed, Jan 3, 2018 at 8:15 PM, Anthony Thomas <ahtho...@eng.ucsd.edu>
>> wrote:
>>
>>> Thanks Frank - the answer to both your questions is "yes".
>>>
>>> Best,
>>>
>>> Anthony
>>>
>>> On Wed, Jan 3, 2018 at 3:13 PM, Frank McQuillan <fmcquil...@pivotal.io>
>>> wrote:
>>>
>>>> Anthony,
>>>>
>>>> Correct, the install check error you are seeing is not related.
>>>>
>>>> Couple of questions:
>>>>
>>>> (1)
>>>> Are you using:
>>>>
>>>> -- Multiply matrix with vector
>>>> matrix_vec_mult( matrix_in, in_args, vector)
>>>>
>>>> (2)
>>>> Is matrix_in encoded in sparse format like at the top of
>>>> http://madlib.apache.org/docs/latest/group__grp__matrix.html
>>>>
>>>> e.g., like this?
>>>>
>>>>  row_id | col_id | value
>>>> --------+--------+-------
>>>>       1 |      1 |     9
>>>>       1 |      5 |     6
>>>>       1 |      6 |     6
>>>>       2 |      1 |     8
>>>>       3 |      1 |     3
>>>>       3 |      2 |     9
>>>>       4 |      7 |     0
>>>>
>>>> Frank
>>>>
>>>> On Wed, Jan 3, 2018 at 2:52 PM, Anthony Thomas <ahtho...@eng.ucsd.edu>
>>>> wrote:
>>>>
>>>>> Okay - thanks Ivan, and good to know about support for Ubuntu from
>>>>> Greenplum!
>>>>>
>>>>> Best,
>>>>>
>>>>> Anthony
>>>>>
>>>>> On Wed, Jan 3, 2018 at 2:38 PM, Ivan Novick <inov...@pivotal.io>
>>>>> wrote:
>>>>>
>>>>>> Hi Anthony, this does NOT look like an Ubuntu problem; in fact,
>>>>>> there is OSS Greenplum officially on Ubuntu, which you can see here:
>>>>>> http://greenplum.org/install-greenplum-oss-on-ubuntu/
>>>>>>
>>>>>> Greenplum and PostgreSQL do limit each field (row/col combination)
>>>>>> to 1 GB, but there are techniques to manage data sets working within
>>>>>> these constraints. I will let someone else who has more experience
>>>>>> than me working with matrices answer what the best way to do so is
>>>>>> in a case like the one you have provided.
>>>>>>
>>>>>> Cheers,
>>>>>> Ivan
>>>>>>
>>>>>> On Wed, Jan 3, 2018 at 2:22 PM, Anthony Thomas <ahtho...@eng.ucsd.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi MADlib folks,
>>>>>>>
>>>>>>> I have a large tall-and-skinny sparse matrix which I'm trying to
>>>>>>> multiply by a dense vector. The matrix is 1.25e8 by 100 with
>>>>>>> approximately 1% nonzero values. This operation always triggers an
>>>>>>> error from Greenplum:
>>>>>>>
>>>>>>> plpy.SPIError: invalid memory alloc request size 1073741824 (context
>>>>>>> 'accumArrayResult') (mcxt.c:1254) (plpython.c:4957)
>>>>>>> CONTEXT: Traceback (most recent call last):
>>>>>>>   PL/Python function "matrix_vec_mult", line 24, in <module>
>>>>>>>     matrix_in, in_args, vector)
>>>>>>>   PL/Python function "matrix_vec_mult", line 2044, in matrix_vec_mult
>>>>>>>   PL/Python function "matrix_vec_mult", line 2001, in _matrix_vec_mult_dense
>>>>>>>   PL/Python function "matrix_vec_mult"
>>>>>>>
>>>>>>> Some Googling suggests this error is caused by a hard limit in
>>>>>>> Postgres which restricts the maximum size of an array to 1 GB. If
>>>>>>> this is indeed the cause of the error I'm seeing, does anyone have
>>>>>>> any suggestions about how to circumvent this issue? This comes up
>>>>>>> in other cases as well, like transposing a tall-and-skinny matrix.
>>>>>>> MVM with smaller matrices works fine.
>>>>>>>
>>>>>>> Here is the relevant version information:
>>>>>>>
>>>>>>> SELECT VERSION();
>>>>>>> PostgreSQL 8.3.23 (Greenplum Database 5.1.0 build dev) on
>>>>>>> x86_64-pc-linux-gnu, compiled by GCC gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5)
>>>>>>> 5.4.0 20160609 compiled on Dec 21 2017 09:09:46
>>>>>>>
>>>>>>> SELECT madlib.version();
>>>>>>> MADlib version: 1.12, git revision: unknown, cmake configuration
>>>>>>> time: Thu Dec 21 18:04:47 UTC 2017, build type: RelWithDebInfo,
>>>>>>> build system: Linux-4.4.0-103-generic, C compiler: gcc 4.9.3,
>>>>>>> C++ compiler: g++ 4.9.3
>>>>>>>
>>>>>>> MADlib install-check reported one error in the "convex" module
>>>>>>> related to "loss too high", which seems unrelated to the issue
>>>>>>> described above. I know Ubuntu isn't officially supported by
>>>>>>> Greenplum, so I'd like to be confident this issue isn't just the
>>>>>>> result of using an unsupported OS. Please let me know if any other
>>>>>>> information would be helpful.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Anthony
>>>>>>
>>>>>> --
>>>>>> Ivan Novick, Product Manager Pivotal Greenplum
>>>>>> inov...@pivotal.io -- (Mobile) 408-230-6491
>>>>>> https://www.youtube.com/GreenplumDatabase