Thanks Bo, Louis Louis - I will study your solution. J reports 'out of memory' on a million rows. It is 6x faster than the larger data set too. Good solution!
On Thu, Apr 13, 2017 at 3:31 AM, Louis de Forcrand <[email protected]> wrote: > Try this. On my phone, it runs almost 6 times faster than your version on > your small example data, but uses more space (maybe sparse matrices would > be faster?), with: > 1e3 timespacex 'applycredits''''' > 0.000630182 4608 > vs > 1e3 timespacex 'ac t' > 0.000131279 6400 > > Of course this extra space could be a big slowdown on larger data. At > least it's more J-ish: > > ac=: +/"1@s@p > s=: [: u&.(,&0) d&.(0&,) > u=: [: +/\.^:_1 (0<.+)/\. NB. u -: d&.(-@|.) > d=: [: +/\ ^:_1 (0>.+)/\.&.|. > p=: |:@=@:| * [: * {:"1 > > t=: (}."1 /: {."1) ".;._2 data > ac t > > It works on a minimal form of your data where records are sorted by their > index, and results are in the form: > 0: ok > 1: cancelled > _1: credit > > Louis > > > On 12 Apr 2017, at 21:57, 'Bo Jacoby' via Programming < > [email protected]> wrote: > > > > Hi Joe! > > My favorite datastructure is ORDINAL FRACTIONS - the algebra of data > > > > | > > | > > | > > | | | > > > > | > > > > | > > | > > | | > > ORDINAL FRACTIONS - the algebra of data > > This paper was submitted to the 10th World Computer Congress, IFIP 1986 > conference, but rejected by the referee.... | | > > > > | > > > > | > > > > > > Your data are coded like this > > 10 Joe > > 20 Bob > > 30 Jane > > 01 blue > > 02 red > > 03 purple > > 11 1 > > 11 -1 > > 11 1 > > 22 1 > > 22 1 > > 22 3 > > 22 -1 > > 22 -1 > > 33 5 > > 33 -2 > > 33 2 > > (Written with double CRs because the mail program has a history of > deleting my CRs). > > Summation gives the result > > 10 Joe > > 20 Bob > > 30 Jane > > 01 blue > > 02 red > > 03 purple > > 11 1 > > 22 3 > > > > 33 5 > > I have not done the summation in J, but I'd like to do it. > > Perhaps this helps you. > > Bo. > > > > > > > > Den 0:04 torsdag den 13. april 2017 skrev chris burke < > [email protected]>: > > > > > > Incidentally, for production code, I suggest starting by removing any > sales > > not matched in returns and vice versa, so that the matching algorithm is > > applied only to potential matches. > > > >> On Wed, Apr 12, 2017 at 2:53 PM, chris burke <[email protected]> > wrote: > >> > >> Great. > >> > >> In case you need more complicated handling of the "gray area" > >> transactions, I believe they would be relatively few in number, so most > of > >> the time you could do the matching efficiently, then check for any keys > >> with returns preceding sales. For those, setting aside the first such > >> return and repeating should clear them quickly. > >> > >> Timing should be well under 1 second for a million records. > >> > >>> On Wed, Apr 12, 2017 at 1:57 PM, Joe Bogner <[email protected]> > wrote: > >>> > >>> Just for completeness, I added a line that incorporates the sequence > check > >>> into the cancel logic. Works great > >>> > >>> NB. hui progressive index > >>> NB. http://code.jsoftware.com/wiki/Essays/Progressive_Index-Of > >>> oc=: i.~ (] - {) /:@/: > >>> pi=: #@[ ({. i.&(,.oc) }.) [ i. , > >>> > >>> NB. argument is 3-col table of seq,key,qty > >>> NB. result is the unmatched transactions > >>> matchtrans=: 3 : 0 > >>> msk=. 0<{:"1 y > >>> sales=. msk#y > >>> returns=. (-.msk)#y > >>> ndx=. (}."1 sales) pi | }."1 returns > >>> cancels=. ndx<#sales > >>> NB. ensure cancel is after sale > >>> cancels =. cancels *. (({."1 (<<(cancels)#ndx){sales) < ({."1 > >>> (cancels#returns))) > >>> ((<<<cancels#ndx){sales),(-.cancels)#returns > >>> ) > >>> > >>> > >>>> On Wed, Apr 12, 2017 at 4:14 PM, Joe Bogner <[email protected]> > wrote: > >>>> > >>>> Chris, this looks promising. Thanks for sharing. It's nearly instant > on > >>> a > >>>> million rows. > >>>> > >>>> Which row had a return before a transaction? seq 10 was an example of > a > >>>> partial return. The hypothetical customer returned 2 out of the 5 > >>> purchased > >>>> prior. I added that example since technically per the original spec it > >>>> wouldn't be cancelled out in this pass. It's a gray area so I may be > >>> able > >>>> to use this approach, especially since I don't see how to incorporate > >>> the > >>>> time element into the progressive index. > >>>> > >>>> Thanks again > >>>> > >>>> > >>>> On Wed, Apr 12, 2017 at 3:52 PM, chris burke <[email protected]> > >>> wrote: > >>>> > >>>>> This might be done by comparing matrices of sales and returns. The > >>>>> function > >>>>> below seems to be close to what you want. It doesn't exactly match > your > >>>>> example, but your example has cases where returns are made before the > >>>>> transactions. Was this intentional? > >>>>> > >>>>> The code should run faster than a looping solution. > >>>>> > >>>>> Code: > >>>>> > >>>>> NB. hui progressive index > >>>>> NB. http://code.jsoftware.com/wiki/Essays/Progressive_Index-Of > >>>>> oc=: i.~ (] - {) /:@/: > >>>>> pi=: #@[ ({. i.&(,.oc) }.) [ i. , > >>>>> > >>>>> NB. argument is 3-col table of seq,key,qty > >>>>> NB. result is the unmatched transactions > >>>>> matchtrans=: 3 : 0 > >>>>> msk=. 0<{:"1 y > >>>>> sales=. msk#y > >>>>> returns=. (-.msk)#y > >>>>> ndx=. (}."1 sales) pi | }."1 returns > >>>>> cancels=. ndx<#sales > >>>>> ((<<<cancels#ndx){sales),(-.cancels)#returns > >>>>> ) > >>>>> > >>>>> Example: > >>>>> > >>>>> dat=: ".;._2 (0 : 0) > >>>>> 1 1 1 > >>>>> 2 1 _1 > >>>>> 3 1 1 > >>>>> 4 2 1 > >>>>> 5 2 1 > >>>>> 6 2 3 > >>>>> 7 2 _1 > >>>>> 8 2 _1 > >>>>> 9 3 5 > >>>>> 10 3 _2 > >>>>> 11 3 2 > >>>>> ) > >>>>> > >>>>> matchtrans dat > >>>>> 3 1 1 > >>>>> 6 2 3 > >>>>> 9 3 5 > >>>>> > >>>>> > >>>>> On Wed, Apr 12, 2017 at 9:35 AM, Joe Bogner <[email protected]> > >>> wrote: > >>>>> > >>>>>> I have a problem I'm trying to solve in different languages. I have > a > >>>>>> solution in SQL and also in kdb which largely resembles the SQL > >>>>> solution. > >>>>>> I'm curious what a J solution would look like. More specifically, > I'm > >>>>>> interested in picking the brains of others here to see if this type > >>> of > >>>>>> problem can be solved without looping (some form of scan?). > >>>>>> > >>>>>> EDIT: Initially I wrote this up thinking the J solution would > >>> difficult, > >>>>>> but it was actually fairly straightforward -- about 15 minutes, but > >>>>> still > >>>>>> would like to see if there are alternatives. If nothing else, maybe > >>> an > >>>>>> interesting problem to share. > >>>>>> > >>>>>> Example data: > >>>>>> > >>>>>> A store has a transaction log with a sequence for each transaction. > >>> The > >>>>>> transaction log records a key for a unique customer/item > combination. > >>>>> The > >>>>>> transaction log records how many units were purchased or returned. > >>>>>> > >>>>>> Goal: > >>>>>> Attempt to match up related transactions and cancel out instances > >>> when > >>>>> the > >>>>>> customer/item combination is returned at the same quantity as a > >>> previous > >>>>>> transaction > >>>>>> > >>>>>> Examples: > >>>>>> > >>>>>> Joe buys 1 blue pen, which is defective, then returns the 1 > defective > >>>>> blue > >>>>>> pen, then buys another blue pen. EXPECTED: cancel out first two > >>>>>> transactions and leave the the last one for 1 pen > >>>>>> > >>>>>> Bob buys 2 red pens in two separate transactions. He then buys 3 > >>> more. > >>>>> He > >>>>>> returns the first two purchases as two separate return transactions. > >>>>>> EXPECTED: cancel out all transactions except the one for qty 3 > >>>>>> > >>>>>> Jane buys 5 purple pens and subsequently returns two of them. She > >>> buys > >>>>> two > >>>>>> more. EXPECTED: No transactions match exactly, so nothing is > >>> cancelled > >>>>> out > >>>>>> > >>>>>> > >>>>>> Data: > >>>>>> > >>>>>> data=: 0 : 0 > >>>>>> seq key qty > >>>>>> 1 1 1 > >>>>>> 2 1 _1 > >>>>>> 3 1 1 > >>>>>> 4 2 1 > >>>>>> 5 2 1 > >>>>>> 6 2 3 > >>>>>> 7 2 _1 > >>>>>> 8 2 _1 > >>>>>> 9 3 5 > >>>>>> 10 3 _2 > >>>>>> 11 3 2 > >>>>>> ) > >>>>>> tbl =: ,. ' ' cut every cutLF data > >>>>>> 'seqs keys qtys' =: |: ". every }. tbl > >>>>>> > >>>>>> > >>>>>> Goal: > >>>>>> > >>>>>> goals =: 0 : 0 > >>>>>> > >>>>>> goal > >>>>>> > >>>>>> cancelled > >>>>>> > >>>>>> credit > >>>>>> > >>>>>> ok > >>>>>> > >>>>>> cancelled > >>>>>> > >>>>>> cancelled > >>>>>> > >>>>>> ok > >>>>>> > >>>>>> credit > >>>>>> > >>>>>> credit > >>>>>> > >>>>>> ok > >>>>>> > >>>>>> ok > >>>>>> > >>>>>> ok > >>>>>> > >>>>>> ) > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> tbl,.(cutLF goals) > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |seq|key|qty|goal | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |1 |1 |1 |cancelled| > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |2 |1 |_1 |credit | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |3 |1 |1 |ok | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |4 |2 |1 |cancelled| > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |5 |2 |1 |cancelled| > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |6 |2 |3 |ok | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |7 |2 |_1 |credit | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |8 |2 |_1 |credit | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |9 |3 |5 |ok | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |10 |3 |_2 |ok | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |11 |3 |2 |ok | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> > >>>>>> > >>>>>> One approach: > >>>>>> > >>>>>> applycredits =: 3 : 0 > >>>>>> > >>>>>> goals=.(<'goal') > >>>>>> > >>>>>> creditids=.0 > >>>>>> > >>>>>> for_i. (i. # seqs) do. > >>>>>> > >>>>>> key=.i{keys > >>>>>> > >>>>>> seq=.i{seqs > >>>>>> > >>>>>> qty=.i{qtys > >>>>>> > >>>>>> nextcredit =.| {. qtys #~ ((key=keys)*(seqs>seq)*(qtys<0)) > >>>>>> > >>>>>> if. nextcredit = qty do. > >>>>>> > >>>>>> goals=.goals,<'cancelled' > >>>>>> > >>>>>> creditids =. creditids, seqs #~ ((key=keys)*(seqs>seq)*(qtys<0)) > >>>>>> > >>>>>> elseif. creditids e.~ seq do. > >>>>>> > >>>>>> goals=.goals,<'credit' > >>>>>> > >>>>>> elseif. do. > >>>>>> > >>>>>> goals=.goals,<'ok' > >>>>>> > >>>>>> end. > >>>>>> > >>>>>> end. > >>>>>> > >>>>>> goals > >>>>>> > >>>>>> ) > >>>>>> > >>>>>> tbl ,. ( applycredits 0 ) > >>>>>> > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |seq|key|qty|goal | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |1 |1 |1 |cancelled| > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |2 |1 |_1 |credit | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |3 |1 |1 |ok | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |4 |2 |1 |cancelled| > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |5 |2 |1 |cancelled| > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |6 |2 |3 |ok | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |7 |2 |_1 |credit | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |8 |2 |_1 |credit | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |9 |3 |5 |ok | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |10 |3 |_2 |ok | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> |11 |3 |2 |ok | > >>>>>> > >>>>>> +---+---+---+---------+ > >>>>>> > >>>>>> > >>>>>> > >>>>>> (cutLF goals) -: ( applycredits 0 ) > >>>>>> > >>>>>> 1 > >>>>>> > >>>>>> > >>>>>> thanks for any input > >>>>>> ------------------------------------------------------------ > >>> ---------- > >>>>>> For information about J forums see http://www.jsoftware.com/forum > >>> s.htm > >>>>> ------------------------------------------------------------ > ---------- > >>>>> For information about J forums see http://www.jsoftware.com/ > forums.htm > >>>>> > >>>> > >>>> > >>> ---------------------------------------------------------------------- > >>> For information about J forums see http://www.jsoftware.com/forums.htm > >>> > >> > >> > > ---------------------------------------------------------------------- > > For information about J forums see http://www.jsoftware.com/forums.htm > > > > > > ---------------------------------------------------------------------- > > For information about J forums see http://www.jsoftware.com/forums.htm > > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
