Thanks Bo, Louis

Louis -  I will study your solution. J reports 'out of memory' on a million
rows. It is 6x faster than the larger data set too. Good solution!

On Thu, Apr 13, 2017 at 3:31 AM, Louis de Forcrand <[email protected]> wrote:

> Try this. On my phone, it runs almost 6 times faster than your version on
> your small example data, but uses more space (maybe sparse matrices would
> be faster?), with:
>    1e3 timespacex 'applycredits'''''
> 0.000630182 4608
> vs
>    1e3 timespacex 'ac t'
> 0.000131279 6400
>
> Of course this extra space could be a big slowdown on larger data. At
> least it's more J-ish:
>
> ac=: +/"1@s@p
> s=: [: u&.(,&0) d&.(0&,)
> u=: [: +/\.^:_1 (0<.+)/\.   NB. u -: d&.(-@|.)
> d=: [: +/\ ^:_1 (0>.+)/\.&.|.
> p=: |:@=@:| * [: * {:"1
>
> t=: (}."1 /: {."1) ".;._2 data
> ac t
>
> It works on a minimal form of your data where records are sorted by their
> index, and results are in the form:
>  0: ok
>  1: cancelled
> _1: credit
>
> Louis
>
> > On 12 Apr 2017, at 21:57, 'Bo Jacoby' via Programming <
> [email protected]> wrote:
> >
> > Hi Joe!
> > My favorite datastructure is ORDINAL FRACTIONS - the algebra of data
> >
> > |
> > |
> > |
> > |   |    |
> >
> >   |
> >
> >  |
> > |
> > |    |
> > ORDINAL FRACTIONS - the algebra of data
> > This paper was submitted to the 10th World Computer Congress, IFIP 1986
> conference, but rejected by the referee....  |   |
> >
> >  |
> >
> >  |
> >
> >
> > Your data are coded like this
> > 10 Joe
> > 20 Bob
> > 30 Jane
> > 01 blue
> > 02 red
> > 03 purple
> > 11 1
> > 11 -1
> > 11 1
> > 22 1
> > 22 1
> > 22 3
> > 22 -1
> > 22 -1
> > 33 5
> > 33 -2
> > 33 2
> > (Written with double CRs because the mail program has a history of
> deleting my CRs).
> > Summation gives the result
> > 10 Joe
> > 20 Bob
> > 30 Jane
> > 01 blue
> > 02 red
> > 03 purple
> > 11 1
> > 22 3
> >
> > 33 5
> > I have not done the summation in J, but I'd like to do it.
> > Perhaps this helps you.
> > Bo.
> >
> >
> >
> >    Den 0:04 torsdag den 13. april 2017 skrev chris burke <
> [email protected]>:
> >
> >
> > Incidentally, for production code, I suggest starting by removing any
> sales
> > not matched in returns and vice versa, so that the matching algorithm is
> > applied only to potential matches.
> >
> >> On Wed, Apr 12, 2017 at 2:53 PM, chris burke <[email protected]>
> wrote:
> >>
> >> Great.
> >>
> >> In case you need more complicated handling of the "gray area"
> >> transactions, I believe they would be relatively few in number, so most
> of
> >> the time you could do the matching efficiently, then check for any keys
> >> with returns preceding sales. For those, setting aside the first such
> >> return and repeating should clear them quickly.
> >>
> >> Timing should be well under 1 second for a million records.
> >>
> >>> On Wed, Apr 12, 2017 at 1:57 PM, Joe Bogner <[email protected]>
> wrote:
> >>>
> >>> Just for completeness, I added a line that incorporates the sequence
> check
> >>> into the cancel logic. Works great
> >>>
> >>> NB. hui progressive index
> >>> NB. http://code.jsoftware.com/wiki/Essays/Progressive_Index-Of
> >>> oc=: i.~ (] - {) /:@/:
> >>> pi=: #@[ ({. i.&(,.oc) }.) [ i. ,
> >>>
> >>> NB. argument is 3-col table of seq,key,qty
> >>> NB. result is the unmatched transactions
> >>> matchtrans=: 3 : 0
> >>> msk=. 0<{:"1 y
> >>> sales=. msk#y
> >>> returns=. (-.msk)#y
> >>> ndx=. (}."1 sales) pi | }."1 returns
> >>> cancels=. ndx<#sales
> >>> NB. ensure cancel is after sale
> >>> cancels =. cancels *. (({."1 (<<(cancels)#ndx){sales) < ({."1
> >>> (cancels#returns)))
> >>> ((<<<cancels#ndx){sales),(-.cancels)#returns
> >>> )
> >>>
> >>>
> >>>> On Wed, Apr 12, 2017 at 4:14 PM, Joe Bogner <[email protected]>
> wrote:
> >>>>
> >>>> Chris, this looks promising. Thanks for sharing. It's nearly instant
> on
> >>> a
> >>>> million rows.
> >>>>
> >>>> Which row had a return before a transaction? seq 10 was an example of
> a
> >>>> partial return. The hypothetical customer returned 2 out of the 5
> >>> purchased
> >>>> prior. I added that example since technically per the original spec it
> >>>> wouldn't be cancelled out in this pass.  It's a gray area so I may be
> >>> able
> >>>> to use this approach, especially since I don't see how to incorporate
> >>> the
> >>>> time element into the progressive index.
> >>>>
> >>>> Thanks again
> >>>>
> >>>>
> >>>> On Wed, Apr 12, 2017 at 3:52 PM, chris burke <[email protected]>
> >>> wrote:
> >>>>
> >>>>> This might be done by comparing matrices of sales and returns. The
> >>>>> function
> >>>>> below seems to be close to what you want. It doesn't exactly match
> your
> >>>>> example, but your example has cases where returns are made before the
> >>>>> transactions. Was this intentional?
> >>>>>
> >>>>> The code should run faster than a looping solution.
> >>>>>
> >>>>> Code:
> >>>>>
> >>>>> NB. hui progressive index
> >>>>> NB. http://code.jsoftware.com/wiki/Essays/Progressive_Index-Of
> >>>>> oc=: i.~ (] - {) /:@/:
> >>>>> pi=: #@[ ({. i.&(,.oc) }.) [ i. ,
> >>>>>
> >>>>> NB. argument is 3-col table of seq,key,qty
> >>>>> NB. result is the unmatched transactions
> >>>>> matchtrans=: 3 : 0
> >>>>> msk=. 0<{:"1 y
> >>>>> sales=. msk#y
> >>>>> returns=. (-.msk)#y
> >>>>> ndx=. (}."1 sales) pi | }."1 returns
> >>>>> cancels=. ndx<#sales
> >>>>> ((<<<cancels#ndx){sales),(-.cancels)#returns
> >>>>> )
> >>>>>
> >>>>> Example:
> >>>>>
> >>>>>     dat=: ".;._2 (0 : 0)
> >>>>> 1 1 1
> >>>>> 2 1 _1
> >>>>> 3 1 1
> >>>>> 4 2 1
> >>>>> 5 2 1
> >>>>> 6 2 3
> >>>>> 7 2 _1
> >>>>> 8 2 _1
> >>>>> 9 3 5
> >>>>> 10 3 _2
> >>>>> 11 3 2
> >>>>> )
> >>>>>
> >>>>>     matchtrans dat
> >>>>> 3 1 1
> >>>>> 6 2 3
> >>>>> 9 3 5
> >>>>>
> >>>>>
> >>>>> On Wed, Apr 12, 2017 at 9:35 AM, Joe Bogner <[email protected]>
> >>> wrote:
> >>>>>
> >>>>>> I have a problem I'm trying to solve in different languages. I have
> a
> >>>>>> solution in SQL and also in kdb which largely resembles the SQL
> >>>>> solution.
> >>>>>> I'm curious what a J solution would look like. More specifically,
> I'm
> >>>>>> interested in picking the brains of others here to see if this type
> >>> of
> >>>>>> problem can be solved without looping (some form of scan?).
> >>>>>>
> >>>>>> EDIT: Initially I wrote this up thinking the J solution would
> >>> difficult,
> >>>>>> but it was actually fairly straightforward -- about 15 minutes, but
> >>>>> still
> >>>>>> would like to see if there are alternatives. If nothing else, maybe
> >>> an
> >>>>>> interesting problem to share.
> >>>>>>
> >>>>>> Example data:
> >>>>>>
> >>>>>> A store has a transaction log with a sequence for each transaction.
> >>> The
> >>>>>> transaction log records a key for a unique customer/item
> combination.
> >>>>> The
> >>>>>> transaction log records how many units were purchased or returned.
> >>>>>>
> >>>>>> Goal:
> >>>>>> Attempt to match up related transactions and cancel out instances
> >>> when
> >>>>> the
> >>>>>> customer/item combination is returned at the same quantity as a
> >>> previous
> >>>>>> transaction
> >>>>>>
> >>>>>> Examples:
> >>>>>>
> >>>>>> Joe buys 1 blue pen, which is defective, then returns the 1
> defective
> >>>>> blue
> >>>>>> pen, then buys another blue pen. EXPECTED: cancel out first two
> >>>>>> transactions and leave the the last one for 1 pen
> >>>>>>
> >>>>>> Bob buys 2 red pens in two separate transactions. He then buys 3
> >>> more.
> >>>>> He
> >>>>>> returns the first two purchases as two separate return transactions.
> >>>>>> EXPECTED: cancel out all transactions except the one for qty 3
> >>>>>>
> >>>>>> Jane buys 5 purple pens and subsequently returns two of them. She
> >>> buys
> >>>>> two
> >>>>>> more. EXPECTED: No transactions match exactly, so nothing is
> >>> cancelled
> >>>>> out
> >>>>>>
> >>>>>>
> >>>>>> Data:
> >>>>>>
> >>>>>> data=: 0 : 0
> >>>>>> seq key qty
> >>>>>> 1 1 1
> >>>>>> 2 1 _1
> >>>>>> 3 1 1
> >>>>>> 4 2 1
> >>>>>> 5 2 1
> >>>>>> 6 2 3
> >>>>>> 7 2 _1
> >>>>>> 8 2 _1
> >>>>>> 9 3 5
> >>>>>> 10 3 _2
> >>>>>> 11 3 2
> >>>>>> )
> >>>>>> tbl =: ,. ' ' cut every cutLF data
> >>>>>> 'seqs keys qtys' =: |: ". every }. tbl
> >>>>>>
> >>>>>>
> >>>>>> Goal:
> >>>>>>
> >>>>>> goals =: 0 : 0
> >>>>>>
> >>>>>> goal
> >>>>>>
> >>>>>> cancelled
> >>>>>>
> >>>>>> credit
> >>>>>>
> >>>>>> ok
> >>>>>>
> >>>>>> cancelled
> >>>>>>
> >>>>>> cancelled
> >>>>>>
> >>>>>> ok
> >>>>>>
> >>>>>> credit
> >>>>>>
> >>>>>> credit
> >>>>>>
> >>>>>> ok
> >>>>>>
> >>>>>> ok
> >>>>>>
> >>>>>> ok
> >>>>>>
> >>>>>> )
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> tbl,.(cutLF goals)
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |seq|key|qty|goal |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |1 |1 |1 |cancelled|
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |2 |1 |_1 |credit |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |3 |1 |1 |ok |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |4 |2 |1 |cancelled|
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |5 |2 |1 |cancelled|
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |6 |2 |3 |ok |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |7 |2 |_1 |credit |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |8 |2 |_1 |credit |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |9 |3 |5 |ok |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |10 |3 |_2 |ok |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |11 |3 |2 |ok |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> One approach:
> >>>>>>
> >>>>>> applycredits =: 3 : 0
> >>>>>>
> >>>>>> goals=.(<'goal')
> >>>>>>
> >>>>>> creditids=.0
> >>>>>>
> >>>>>> for_i. (i. # seqs) do.
> >>>>>>
> >>>>>>   key=.i{keys
> >>>>>>
> >>>>>>   seq=.i{seqs
> >>>>>>
> >>>>>>   qty=.i{qtys
> >>>>>>
> >>>>>>   nextcredit =.| {. qtys #~ ((key=keys)*(seqs>seq)*(qtys<0))
> >>>>>>
> >>>>>>   if. nextcredit = qty do.
> >>>>>>
> >>>>>>   goals=.goals,<'cancelled'
> >>>>>>
> >>>>>>   creditids =. creditids, seqs #~ ((key=keys)*(seqs>seq)*(qtys<0))
> >>>>>>
> >>>>>>   elseif. creditids e.~ seq do.
> >>>>>>
> >>>>>>     goals=.goals,<'credit'
> >>>>>>
> >>>>>>   elseif. do.
> >>>>>>
> >>>>>>     goals=.goals,<'ok'
> >>>>>>
> >>>>>> end.
> >>>>>>
> >>>>>> end.
> >>>>>>
> >>>>>> goals
> >>>>>>
> >>>>>> )
> >>>>>>
> >>>>>> tbl ,. ( applycredits 0 )
> >>>>>>
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |seq|key|qty|goal |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |1 |1 |1 |cancelled|
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |2 |1 |_1 |credit |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |3 |1 |1 |ok |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |4 |2 |1 |cancelled|
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |5 |2 |1 |cancelled|
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |6 |2 |3 |ok |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |7 |2 |_1 |credit |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |8 |2 |_1 |credit |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |9 |3 |5 |ok |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |10 |3 |_2 |ok |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>> |11 |3 |2 |ok |
> >>>>>>
> >>>>>> +---+---+---+---------+
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> (cutLF goals) -: ( applycredits 0 )
> >>>>>>
> >>>>>> 1
> >>>>>>
> >>>>>>
> >>>>>> thanks for any input
> >>>>>> ------------------------------------------------------------
> >>> ----------
> >>>>>> For information about J forums see http://www.jsoftware.com/forum
> >>> s.htm
> >>>>> ------------------------------------------------------------
> ----------
> >>>>> For information about J forums see http://www.jsoftware.com/
> forums.htm
> >>>>>
> >>>>
> >>>>
> >>> ----------------------------------------------------------------------
> >>> For information about J forums see http://www.jsoftware.com/forums.htm
> >>>
> >>
> >>
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> >
> >
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to