Try this. On my phone, it runs almost 6 times faster than your version on your
small example data, but uses more space (maybe sparse matrices would be
faster?), with:
1e3 timespacex 'applycredits'''''
0.000630182 4608
vs
1e3 timespacex 'ac t'
0.000131279 6400
Of course this extra space could be a big slowdown on larger data. At least
it's more J-ish:
ac=: +/"1@s@p
s=: [: u&.(,&0) d&.(0&,)
u=: [: +/\.^:_1 (0<.+)/\. NB. u -: d&.(-@|.)
d=: [: +/\ ^:_1 (0>.+)/\.&.|.
p=: |:@=@:| * [: * {:"1
t=: (}."1 /: {."1) ".;._2 data
ac t
It works on a minimal form of your data where records are sorted by their
index, and results are in the form:
0: ok
1: cancelled
_1: credit
Louis
> On 12 Apr 2017, at 21:57, 'Bo Jacoby' via Programming
> <[email protected]> wrote:
>
> Hi Joe!
> My favorite datastructure is ORDINAL FRACTIONS - the algebra of data
>
> |
> |
> |
> | | |
>
> |
>
> |
> |
> | |
> ORDINAL FRACTIONS - the algebra of data
> This paper was submitted to the 10th World Computer Congress, IFIP 1986
> conference, but rejected by the referee.... | |
>
> |
>
> |
>
>
> Your data are coded like this
> 10 Joe
> 20 Bob
> 30 Jane
> 01 blue
> 02 red
> 03 purple
> 11 1
> 11 -1
> 11 1
> 22 1
> 22 1
> 22 3
> 22 -1
> 22 -1
> 33 5
> 33 -2
> 33 2
> (Written with double CRs because the mail program has a history of deleting
> my CRs).
> Summation gives the result
> 10 Joe
> 20 Bob
> 30 Jane
> 01 blue
> 02 red
> 03 purple
> 11 1
> 22 3
>
> 33 5
> I have not done the summation in J, but I'd like to do it.
> Perhaps this helps you.
> Bo.
>
>
>
> Den 0:04 torsdag den 13. april 2017 skrev chris burke
> <[email protected]>:
>
>
> Incidentally, for production code, I suggest starting by removing any sales
> not matched in returns and vice versa, so that the matching algorithm is
> applied only to potential matches.
>
>> On Wed, Apr 12, 2017 at 2:53 PM, chris burke <[email protected]> wrote:
>>
>> Great.
>>
>> In case you need more complicated handling of the "gray area"
>> transactions, I believe they would be relatively few in number, so most of
>> the time you could do the matching efficiently, then check for any keys
>> with returns preceding sales. For those, setting aside the first such
>> return and repeating should clear them quickly.
>>
>> Timing should be well under 1 second for a million records.
>>
>>> On Wed, Apr 12, 2017 at 1:57 PM, Joe Bogner <[email protected]> wrote:
>>>
>>> Just for completeness, I added a line that incorporates the sequence check
>>> into the cancel logic. Works great
>>>
>>> NB. hui progressive index
>>> NB. http://code.jsoftware.com/wiki/Essays/Progressive_Index-Of
>>> oc=: i.~ (] - {) /:@/:
>>> pi=: #@[ ({. i.&(,.oc) }.) [ i. ,
>>>
>>> NB. argument is 3-col table of seq,key,qty
>>> NB. result is the unmatched transactions
>>> matchtrans=: 3 : 0
>>> msk=. 0<{:"1 y
>>> sales=. msk#y
>>> returns=. (-.msk)#y
>>> ndx=. (}."1 sales) pi | }."1 returns
>>> cancels=. ndx<#sales
>>> NB. ensure cancel is after sale
>>> cancels =. cancels *. (({."1 (<<(cancels)#ndx){sales) < ({."1
>>> (cancels#returns)))
>>> ((<<<cancels#ndx){sales),(-.cancels)#returns
>>> )
>>>
>>>
>>>> On Wed, Apr 12, 2017 at 4:14 PM, Joe Bogner <[email protected]> wrote:
>>>>
>>>> Chris, this looks promising. Thanks for sharing. It's nearly instant on
>>> a
>>>> million rows.
>>>>
>>>> Which row had a return before a transaction? seq 10 was an example of a
>>>> partial return. The hypothetical customer returned 2 out of the 5
>>> purchased
>>>> prior. I added that example since technically per the original spec it
>>>> wouldn't be cancelled out in this pass. It's a gray area so I may be
>>> able
>>>> to use this approach, especially since I don't see how to incorporate
>>> the
>>>> time element into the progressive index.
>>>>
>>>> Thanks again
>>>>
>>>>
>>>> On Wed, Apr 12, 2017 at 3:52 PM, chris burke <[email protected]>
>>> wrote:
>>>>
>>>>> This might be done by comparing matrices of sales and returns. The
>>>>> function
>>>>> below seems to be close to what you want. It doesn't exactly match your
>>>>> example, but your example has cases where returns are made before the
>>>>> transactions. Was this intentional?
>>>>>
>>>>> The code should run faster than a looping solution.
>>>>>
>>>>> Code:
>>>>>
>>>>> NB. hui progressive index
>>>>> NB. http://code.jsoftware.com/wiki/Essays/Progressive_Index-Of
>>>>> oc=: i.~ (] - {) /:@/:
>>>>> pi=: #@[ ({. i.&(,.oc) }.) [ i. ,
>>>>>
>>>>> NB. argument is 3-col table of seq,key,qty
>>>>> NB. result is the unmatched transactions
>>>>> matchtrans=: 3 : 0
>>>>> msk=. 0<{:"1 y
>>>>> sales=. msk#y
>>>>> returns=. (-.msk)#y
>>>>> ndx=. (}."1 sales) pi | }."1 returns
>>>>> cancels=. ndx<#sales
>>>>> ((<<<cancels#ndx){sales),(-.cancels)#returns
>>>>> )
>>>>>
>>>>> Example:
>>>>>
>>>>> dat=: ".;._2 (0 : 0)
>>>>> 1 1 1
>>>>> 2 1 _1
>>>>> 3 1 1
>>>>> 4 2 1
>>>>> 5 2 1
>>>>> 6 2 3
>>>>> 7 2 _1
>>>>> 8 2 _1
>>>>> 9 3 5
>>>>> 10 3 _2
>>>>> 11 3 2
>>>>> )
>>>>>
>>>>> matchtrans dat
>>>>> 3 1 1
>>>>> 6 2 3
>>>>> 9 3 5
>>>>>
>>>>>
>>>>> On Wed, Apr 12, 2017 at 9:35 AM, Joe Bogner <[email protected]>
>>> wrote:
>>>>>
>>>>>> I have a problem I'm trying to solve in different languages. I have a
>>>>>> solution in SQL and also in kdb which largely resembles the SQL
>>>>> solution.
>>>>>> I'm curious what a J solution would look like. More specifically, I'm
>>>>>> interested in picking the brains of others here to see if this type
>>> of
>>>>>> problem can be solved without looping (some form of scan?).
>>>>>>
>>>>>> EDIT: Initially I wrote this up thinking the J solution would
>>> difficult,
>>>>>> but it was actually fairly straightforward -- about 15 minutes, but
>>>>> still
>>>>>> would like to see if there are alternatives. If nothing else, maybe
>>> an
>>>>>> interesting problem to share.
>>>>>>
>>>>>> Example data:
>>>>>>
>>>>>> A store has a transaction log with a sequence for each transaction.
>>> The
>>>>>> transaction log records a key for a unique customer/item combination.
>>>>> The
>>>>>> transaction log records how many units were purchased or returned.
>>>>>>
>>>>>> Goal:
>>>>>> Attempt to match up related transactions and cancel out instances
>>> when
>>>>> the
>>>>>> customer/item combination is returned at the same quantity as a
>>> previous
>>>>>> transaction
>>>>>>
>>>>>> Examples:
>>>>>>
>>>>>> Joe buys 1 blue pen, which is defective, then returns the 1 defective
>>>>> blue
>>>>>> pen, then buys another blue pen. EXPECTED: cancel out first two
>>>>>> transactions and leave the the last one for 1 pen
>>>>>>
>>>>>> Bob buys 2 red pens in two separate transactions. He then buys 3
>>> more.
>>>>> He
>>>>>> returns the first two purchases as two separate return transactions.
>>>>>> EXPECTED: cancel out all transactions except the one for qty 3
>>>>>>
>>>>>> Jane buys 5 purple pens and subsequently returns two of them. She
>>> buys
>>>>> two
>>>>>> more. EXPECTED: No transactions match exactly, so nothing is
>>> cancelled
>>>>> out
>>>>>>
>>>>>>
>>>>>> Data:
>>>>>>
>>>>>> data=: 0 : 0
>>>>>> seq key qty
>>>>>> 1 1 1
>>>>>> 2 1 _1
>>>>>> 3 1 1
>>>>>> 4 2 1
>>>>>> 5 2 1
>>>>>> 6 2 3
>>>>>> 7 2 _1
>>>>>> 8 2 _1
>>>>>> 9 3 5
>>>>>> 10 3 _2
>>>>>> 11 3 2
>>>>>> )
>>>>>> tbl =: ,. ' ' cut every cutLF data
>>>>>> 'seqs keys qtys' =: |: ". every }. tbl
>>>>>>
>>>>>>
>>>>>> Goal:
>>>>>>
>>>>>> goals =: 0 : 0
>>>>>>
>>>>>> goal
>>>>>>
>>>>>> cancelled
>>>>>>
>>>>>> credit
>>>>>>
>>>>>> ok
>>>>>>
>>>>>> cancelled
>>>>>>
>>>>>> cancelled
>>>>>>
>>>>>> ok
>>>>>>
>>>>>> credit
>>>>>>
>>>>>> credit
>>>>>>
>>>>>> ok
>>>>>>
>>>>>> ok
>>>>>>
>>>>>> ok
>>>>>>
>>>>>> )
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> tbl,.(cutLF goals)
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |seq|key|qty|goal |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |1 |1 |1 |cancelled|
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |2 |1 |_1 |credit |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |3 |1 |1 |ok |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |4 |2 |1 |cancelled|
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |5 |2 |1 |cancelled|
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |6 |2 |3 |ok |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |7 |2 |_1 |credit |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |8 |2 |_1 |credit |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |9 |3 |5 |ok |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |10 |3 |_2 |ok |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |11 |3 |2 |ok |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>>
>>>>>>
>>>>>> One approach:
>>>>>>
>>>>>> applycredits =: 3 : 0
>>>>>>
>>>>>> goals=.(<'goal')
>>>>>>
>>>>>> creditids=.0
>>>>>>
>>>>>> for_i. (i. # seqs) do.
>>>>>>
>>>>>> key=.i{keys
>>>>>>
>>>>>> seq=.i{seqs
>>>>>>
>>>>>> qty=.i{qtys
>>>>>>
>>>>>> nextcredit =.| {. qtys #~ ((key=keys)*(seqs>seq)*(qtys<0))
>>>>>>
>>>>>> if. nextcredit = qty do.
>>>>>>
>>>>>> goals=.goals,<'cancelled'
>>>>>>
>>>>>> creditids =. creditids, seqs #~ ((key=keys)*(seqs>seq)*(qtys<0))
>>>>>>
>>>>>> elseif. creditids e.~ seq do.
>>>>>>
>>>>>> goals=.goals,<'credit'
>>>>>>
>>>>>> elseif. do.
>>>>>>
>>>>>> goals=.goals,<'ok'
>>>>>>
>>>>>> end.
>>>>>>
>>>>>> end.
>>>>>>
>>>>>> goals
>>>>>>
>>>>>> )
>>>>>>
>>>>>> tbl ,. ( applycredits 0 )
>>>>>>
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |seq|key|qty|goal |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |1 |1 |1 |cancelled|
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |2 |1 |_1 |credit |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |3 |1 |1 |ok |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |4 |2 |1 |cancelled|
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |5 |2 |1 |cancelled|
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |6 |2 |3 |ok |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |7 |2 |_1 |credit |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |8 |2 |_1 |credit |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |9 |3 |5 |ok |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |10 |3 |_2 |ok |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>> |11 |3 |2 |ok |
>>>>>>
>>>>>> +---+---+---+---------+
>>>>>>
>>>>>>
>>>>>>
>>>>>> (cutLF goals) -: ( applycredits 0 )
>>>>>>
>>>>>> 1
>>>>>>
>>>>>>
>>>>>> thanks for any input
>>>>>> ------------------------------------------------------------
>>> ----------
>>>>>> For information about J forums see http://www.jsoftware.com/forum
>>> s.htm
>>>>> ----------------------------------------------------------------------
>>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>>>
>>>>
>>>>
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>
>>
>>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm