Here is a more space-efficient version. Effectively the same, except it works
on different customers seperately. Note that ac and ac2 work on records where
orders from different clients may appear in any order, while ac1 expects all
orders from a single client to be grouped together:
ac1=: [: ; <@c/./@|:
ac2=: (/:@/:@(i.~)@[ { [: ; <@c/.)/@|:
c=: +/"1@s@p1 NB. s is the same
p1=: * * |:@=@:|
Not quite as fast, but still about twice as fast as applycredits, and they use
much less space than ac.
Cheers,
Louis
> On 13 Apr 2017, at 07:12, Joe Bogner <[email protected]> wrote:
>
> Thanks Bo, Louis
>
> Louis - I will study your solution. J reports 'out of memory' on a million
> rows. It is 6x faster than the larger data set too. Good solution!
>
>> On Thu, Apr 13, 2017 at 3:31 AM, Louis de Forcrand <[email protected]> wrote:
>>
>> Try this. On my phone, it runs almost 6 times faster than your version on
>> your small example data, but uses more space (maybe sparse matrices would
>> be faster?), with:
>> 1e3 timespacex 'applycredits'''''
>> 0.000630182 4608
>> vs
>> 1e3 timespacex 'ac t'
>> 0.000131279 6400
>>
>> Of course this extra space could be a big slowdown on larger data. At
>> least it's more J-ish:
>>
>> ac=: +/"1@s@p
>> s=: [: u&.(,&0) d&.(0&,)
>> u=: [: +/\.^:_1 (0<.+)/\. NB. u -: d&.(-@|.)
>> d=: [: +/\ ^:_1 (0>.+)/\.&.|.
>> p=: |:@=@:| * [: * {:"1
>>
>> t=: (}."1 /: {."1) ".;._2 data
>> ac t
>>
>> It works on a minimal form of your data where records are sorted by their
>> index, and results are in the form:
>> 0: ok
>> 1: cancelled
>> _1: credit
>>
>> Louis
>>
>>> On 12 Apr 2017, at 21:57, 'Bo Jacoby' via Programming <
>> [email protected]> wrote:
>>>
>>> Hi Joe!
>>> My favorite datastructure is ORDINAL FRACTIONS - the algebra of data
>>>
>>> |
>>> |
>>> |
>>> | | |
>>>
>>> |
>>>
>>> |
>>> |
>>> | |
>>> ORDINAL FRACTIONS - the algebra of data
>>> This paper was submitted to the 10th World Computer Congress, IFIP 1986
>> conference, but rejected by the referee.... | |
>>>
>>> |
>>>
>>> |
>>>
>>>
>>> Your data are coded like this
>>> 10 Joe
>>> 20 Bob
>>> 30 Jane
>>> 01 blue
>>> 02 red
>>> 03 purple
>>> 11 1
>>> 11 -1
>>> 11 1
>>> 22 1
>>> 22 1
>>> 22 3
>>> 22 -1
>>> 22 -1
>>> 33 5
>>> 33 -2
>>> 33 2
>>> (Written with double CRs because the mail program has a history of
>> deleting my CRs).
>>> Summation gives the result
>>> 10 Joe
>>> 20 Bob
>>> 30 Jane
>>> 01 blue
>>> 02 red
>>> 03 purple
>>> 11 1
>>> 22 3
>>>
>>> 33 5
>>> I have not done the summation in J, but I'd like to do it.
>>> Perhaps this helps you.
>>> Bo.
>>>
>>>
>>>
>>> Den 0:04 torsdag den 13. april 2017 skrev chris burke <
>> [email protected]>:
>>>
>>>
>>> Incidentally, for production code, I suggest starting by removing any
>> sales
>>> not matched in returns and vice versa, so that the matching algorithm is
>>> applied only to potential matches.
>>>
>>>> On Wed, Apr 12, 2017 at 2:53 PM, chris burke <[email protected]>
>> wrote:
>>>>
>>>> Great.
>>>>
>>>> In case you need more complicated handling of the "gray area"
>>>> transactions, I believe they would be relatively few in number, so most
>> of
>>>> the time you could do the matching efficiently, then check for any keys
>>>> with returns preceding sales. For those, setting aside the first such
>>>> return and repeating should clear them quickly.
>>>>
>>>> Timing should be well under 1 second for a million records.
>>>>
>>>>> On Wed, Apr 12, 2017 at 1:57 PM, Joe Bogner <[email protected]>
>> wrote:
>>>>>
>>>>> Just for completeness, I added a line that incorporates the sequence
>> check
>>>>> into the cancel logic. Works great
>>>>>
>>>>> NB. hui progressive index
>>>>> NB. http://code.jsoftware.com/wiki/Essays/Progressive_Index-Of
>>>>> oc=: i.~ (] - {) /:@/:
>>>>> pi=: #@[ ({. i.&(,.oc) }.) [ i. ,
>>>>>
>>>>> NB. argument is 3-col table of seq,key,qty
>>>>> NB. result is the unmatched transactions
>>>>> matchtrans=: 3 : 0
>>>>> msk=. 0<{:"1 y
>>>>> sales=. msk#y
>>>>> returns=. (-.msk)#y
>>>>> ndx=. (}."1 sales) pi | }."1 returns
>>>>> cancels=. ndx<#sales
>>>>> NB. ensure cancel is after sale
>>>>> cancels =. cancels *. (({."1 (<<(cancels)#ndx){sales) < ({."1
>>>>> (cancels#returns)))
>>>>> ((<<<cancels#ndx){sales),(-.cancels)#returns
>>>>> )
>>>>>
>>>>>
>>>>>> On Wed, Apr 12, 2017 at 4:14 PM, Joe Bogner <[email protected]>
>> wrote:
>>>>>>
>>>>>> Chris, this looks promising. Thanks for sharing. It's nearly instant
>> on
>>>>> a
>>>>>> million rows.
>>>>>>
>>>>>> Which row had a return before a transaction? seq 10 was an example of
>> a
>>>>>> partial return. The hypothetical customer returned 2 out of the 5
>>>>> purchased
>>>>>> prior. I added that example since technically per the original spec it
>>>>>> wouldn't be cancelled out in this pass. It's a gray area so I may be
>>>>> able
>>>>>> to use this approach, especially since I don't see how to incorporate
>>>>> the
>>>>>> time element into the progressive index.
>>>>>>
>>>>>> Thanks again
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 12, 2017 at 3:52 PM, chris burke <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>>> This might be done by comparing matrices of sales and returns. The
>>>>>>> function
>>>>>>> below seems to be close to what you want. It doesn't exactly match
>> your
>>>>>>> example, but your example has cases where returns are made before the
>>>>>>> transactions. Was this intentional?
>>>>>>>
>>>>>>> The code should run faster than a looping solution.
>>>>>>>
>>>>>>> Code:
>>>>>>>
>>>>>>> NB. hui progressive index
>>>>>>> NB. http://code.jsoftware.com/wiki/Essays/Progressive_Index-Of
>>>>>>> oc=: i.~ (] - {) /:@/:
>>>>>>> pi=: #@[ ({. i.&(,.oc) }.) [ i. ,
>>>>>>>
>>>>>>> NB. argument is 3-col table of seq,key,qty
>>>>>>> NB. result is the unmatched transactions
>>>>>>> matchtrans=: 3 : 0
>>>>>>> msk=. 0<{:"1 y
>>>>>>> sales=. msk#y
>>>>>>> returns=. (-.msk)#y
>>>>>>> ndx=. (}."1 sales) pi | }."1 returns
>>>>>>> cancels=. ndx<#sales
>>>>>>> ((<<<cancels#ndx){sales),(-.cancels)#returns
>>>>>>> )
>>>>>>>
>>>>>>> Example:
>>>>>>>
>>>>>>> dat=: ".;._2 (0 : 0)
>>>>>>> 1 1 1
>>>>>>> 2 1 _1
>>>>>>> 3 1 1
>>>>>>> 4 2 1
>>>>>>> 5 2 1
>>>>>>> 6 2 3
>>>>>>> 7 2 _1
>>>>>>> 8 2 _1
>>>>>>> 9 3 5
>>>>>>> 10 3 _2
>>>>>>> 11 3 2
>>>>>>> )
>>>>>>>
>>>>>>> matchtrans dat
>>>>>>> 3 1 1
>>>>>>> 6 2 3
>>>>>>> 9 3 5
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Apr 12, 2017 at 9:35 AM, Joe Bogner <[email protected]>
>>>>> wrote:
>>>>>>>
>>>>>>>> I have a problem I'm trying to solve in different languages. I have
>> a
>>>>>>>> solution in SQL and also in kdb which largely resembles the SQL
>>>>>>> solution.
>>>>>>>> I'm curious what a J solution would look like. More specifically,
>> I'm
>>>>>>>> interested in picking the brains of others here to see if this type
>>>>> of
>>>>>>>> problem can be solved without looping (some form of scan?).
>>>>>>>>
>>>>>>>> EDIT: Initially I wrote this up thinking the J solution would
>>>>> difficult,
>>>>>>>> but it was actually fairly straightforward -- about 15 minutes, but
>>>>>>> still
>>>>>>>> would like to see if there are alternatives. If nothing else, maybe
>>>>> an
>>>>>>>> interesting problem to share.
>>>>>>>>
>>>>>>>> Example data:
>>>>>>>>
>>>>>>>> A store has a transaction log with a sequence for each transaction.
>>>>> The
>>>>>>>> transaction log records a key for a unique customer/item
>> combination.
>>>>>>> The
>>>>>>>> transaction log records how many units were purchased or returned.
>>>>>>>>
>>>>>>>> Goal:
>>>>>>>> Attempt to match up related transactions and cancel out instances
>>>>> when
>>>>>>> the
>>>>>>>> customer/item combination is returned at the same quantity as a
>>>>> previous
>>>>>>>> transaction
>>>>>>>>
>>>>>>>> Examples:
>>>>>>>>
>>>>>>>> Joe buys 1 blue pen, which is defective, then returns the 1
>> defective
>>>>>>> blue
>>>>>>>> pen, then buys another blue pen. EXPECTED: cancel out first two
>>>>>>>> transactions and leave the the last one for 1 pen
>>>>>>>>
>>>>>>>> Bob buys 2 red pens in two separate transactions. He then buys 3
>>>>> more.
>>>>>>> He
>>>>>>>> returns the first two purchases as two separate return transactions.
>>>>>>>> EXPECTED: cancel out all transactions except the one for qty 3
>>>>>>>>
>>>>>>>> Jane buys 5 purple pens and subsequently returns two of them. She
>>>>> buys
>>>>>>> two
>>>>>>>> more. EXPECTED: No transactions match exactly, so nothing is
>>>>> cancelled
>>>>>>> out
>>>>>>>>
>>>>>>>>
>>>>>>>> Data:
>>>>>>>>
>>>>>>>> data=: 0 : 0
>>>>>>>> seq key qty
>>>>>>>> 1 1 1
>>>>>>>> 2 1 _1
>>>>>>>> 3 1 1
>>>>>>>> 4 2 1
>>>>>>>> 5 2 1
>>>>>>>> 6 2 3
>>>>>>>> 7 2 _1
>>>>>>>> 8 2 _1
>>>>>>>> 9 3 5
>>>>>>>> 10 3 _2
>>>>>>>> 11 3 2
>>>>>>>> )
>>>>>>>> tbl =: ,. ' ' cut every cutLF data
>>>>>>>> 'seqs keys qtys' =: |: ". every }. tbl
>>>>>>>>
>>>>>>>>
>>>>>>>> Goal:
>>>>>>>>
>>>>>>>> goals =: 0 : 0
>>>>>>>>
>>>>>>>> goal
>>>>>>>>
>>>>>>>> cancelled
>>>>>>>>
>>>>>>>> credit
>>>>>>>>
>>>>>>>> ok
>>>>>>>>
>>>>>>>> cancelled
>>>>>>>>
>>>>>>>> cancelled
>>>>>>>>
>>>>>>>> ok
>>>>>>>>
>>>>>>>> credit
>>>>>>>>
>>>>>>>> credit
>>>>>>>>
>>>>>>>> ok
>>>>>>>>
>>>>>>>> ok
>>>>>>>>
>>>>>>>> ok
>>>>>>>>
>>>>>>>> )
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> tbl,.(cutLF goals)
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |seq|key|qty|goal |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |1 |1 |1 |cancelled|
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |2 |1 |_1 |credit |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |3 |1 |1 |ok |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |4 |2 |1 |cancelled|
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |5 |2 |1 |cancelled|
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |6 |2 |3 |ok |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |7 |2 |_1 |credit |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |8 |2 |_1 |credit |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |9 |3 |5 |ok |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |10 |3 |_2 |ok |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |11 |3 |2 |ok |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> One approach:
>>>>>>>>
>>>>>>>> applycredits =: 3 : 0
>>>>>>>>
>>>>>>>> goals=.(<'goal')
>>>>>>>>
>>>>>>>> creditids=.0
>>>>>>>>
>>>>>>>> for_i. (i. # seqs) do.
>>>>>>>>
>>>>>>>> key=.i{keys
>>>>>>>>
>>>>>>>> seq=.i{seqs
>>>>>>>>
>>>>>>>> qty=.i{qtys
>>>>>>>>
>>>>>>>> nextcredit =.| {. qtys #~ ((key=keys)*(seqs>seq)*(qtys<0))
>>>>>>>>
>>>>>>>> if. nextcredit = qty do.
>>>>>>>>
>>>>>>>> goals=.goals,<'cancelled'
>>>>>>>>
>>>>>>>> creditids =. creditids, seqs #~ ((key=keys)*(seqs>seq)*(qtys<0))
>>>>>>>>
>>>>>>>> elseif. creditids e.~ seq do.
>>>>>>>>
>>>>>>>> goals=.goals,<'credit'
>>>>>>>>
>>>>>>>> elseif. do.
>>>>>>>>
>>>>>>>> goals=.goals,<'ok'
>>>>>>>>
>>>>>>>> end.
>>>>>>>>
>>>>>>>> end.
>>>>>>>>
>>>>>>>> goals
>>>>>>>>
>>>>>>>> )
>>>>>>>>
>>>>>>>> tbl ,. ( applycredits 0 )
>>>>>>>>
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |seq|key|qty|goal |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |1 |1 |1 |cancelled|
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |2 |1 |_1 |credit |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |3 |1 |1 |ok |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |4 |2 |1 |cancelled|
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |5 |2 |1 |cancelled|
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |6 |2 |3 |ok |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |7 |2 |_1 |credit |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |8 |2 |_1 |credit |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |9 |3 |5 |ok |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |10 |3 |_2 |ok |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>> |11 |3 |2 |ok |
>>>>>>>>
>>>>>>>> +---+---+---+---------+
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> (cutLF goals) -: ( applycredits 0 )
>>>>>>>>
>>>>>>>> 1
>>>>>>>>
>>>>>>>>
>>>>>>>> thanks for any input
>>>>>>>> ------------------------------------------------------------
>>>>> ----------
>>>>>>>> For information about J forums see http://www.jsoftware.com/forum
>>>>> s.htm
>>>>>>> ------------------------------------------------------------
>> ----------
>>>>>>> For information about J forums see http://www.jsoftware.com/
>> forums.htm
>>>>>>>
>>>>>>
>>>>>>
>>>>> ----------------------------------------------------------------------
>>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>>>
>>>>
>>>>
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>
>>>
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm