Henrik - Thank you for posting the code. I enjoyed tinkering around with
it. The inserts took a long time: I stopped after about 30 minutes and then
added some timing info. I think it was taking about 20 seconds per day, and
that time will grow, if I recall correctly. I am guessing it would take 2-3
hours to insert the 31M rows (on an SSD in a Xen environment) and a fair
amount of disk space. I think I was up to about 2 GB with 50 days.
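
As a rough sanity check on those guesses, the arithmetic can be sketched
in PicoLisp. The inputs are only the approximate figures above (20 seconds
per day, 2 GB after 50 days), and the sketch assumes the per-day cost stays
constant, which it probably does not, so the results are best read as
lower bounds:

```picolisp
# Rough estimate only; the inputs are the approximate figures quoted above
(setq
   Days 365
   SecPerDay 20       # approximate insert time per simulated day
   GbPer50Days 2 )    # approximate disk usage after 50 days

(println (* Days 86400))               # total rows: 31536000
(println (/ (* Days SecPerDay) 60))    # total minutes (integer division): 121
(println (/ (* Days GbPer50Days) 50))  # total GB (integer division): 14
```

So about two hours and roughly 14-15 GB if nothing slows down, which is
in line with the 2-3 hour guess once growth is taken into account.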

I may look further into experimenting with different block sizes:
https://www.mail-archive.com/picolisp@software-lab.de/msg03304.html
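
For reference, a minimal sketch of what such an experiment might look
like. As far as I understand, the number in each 'dbs' entry is the block
size scale factor (block size = 64 << N, so the 4 in the original code
means 1024 bytes). The values 2 and 6 below are arbitrary picks for
illustration, not a recommendation, and I have not measured them. The path
is hypothetical, and a fresh database is assumed, since I believe the
block size is fixed when a file is created:

```picolisp
(class +Transaction +Entity)
(rel amount (+Number))
(rel createdAt (+Ref +String))

# Assumed values: small blocks for the objects, larger ones for the index
(dbs
   (2 +Transaction)                 # objects: 256-byte blocks
   (6 (+Transaction createdAt)) )   # index tree: 4096-byte blocks

# Hypothetical path prefix; use a new, empty location for each experiment
(pool "blocktest/db" *Dbs)
```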

If you end up speeding it up, please share. I know it's just a mock example,
so it may not be worth the time. It's nice to have small reproducible examples.

It's neat to hear that the queries are sub-second.

Thanks
Joe




On Sun, Feb 9, 2014 at 6:24 AM, Henrik Sarvell <hsarv...@gmail.com> wrote:

> "Yes, a bit perhaps."
>
> I tested it, and it is of no consequence (at least for my applications).
> Given one transaction per second for a full year, fetching a random
> +Ref +String day takes a fraction of a second on my PC equipped with an
> SSD. Here is the code:
>
> Note that it's only the collect at the end that takes a fraction of a
> second; the insertions do NOT.
>
> (class +Transaction +Entity)
> (rel amount (+Number))
> (rel createdAt (+Ref +String))
>
> (dbs
>    (4 +Transaction)
>    (4 (+Transaction createdAt)) )
>
> (pool "/opt/picolisp/projects/test/db/db" *Dbs)
>
> (setq Sday (date 2013 01 01))
> (setq Eday (+ Sday 364))
> (setq F (db: +Transaction))
>
> (for (D Sday (>= Eday D) (inc D))
>    # Seconds 0..86399, so the stamps run from 00:00:00 to 23:59:59
>    (for (S 0 (> 86400 S) (inc S))
>       (let Stamp (stamp D S)
>          (println Stamp)
>          (new F '(+Transaction) 'amount 100 'createdAt Stamp) ) )
>    (commit)
>    (prune) )
>
> (commit)
> (prune T)
>
> (println (collect 'createdAt '+Transaction "2013-10-05 00:00:00"
> "2013-10-05 23:59:59"))
>
> (bye)
>
>
>
>
> On Sat, Feb 8, 2014 at 5:44 PM, Alexander Burger <a...@software-lab.de> wrote:
>
>> Hi Henrik,
>>
>> On Fri, Feb 07, 2014 at 08:29:07PM +0700, Henrik Sarvell wrote:
>> > Given a very large amount of external objects, representing for instance
>> > transactions, what would be the quickest way of handling the creation
>> > stamp with regard to future lookups by way of start stamp and end stamp?
>> >
>> > It seems to me that using two relations might be optimal, one +Ref +Date
>> > and an extra +Ref +Time. Then a lookup could first use the +Date relation
>> > to filter out all transactions that weren't created during the specified
>> > days followed by (optionally) a filter by +Time.
>>
>> You could use two separate relations, but then I would definitely
>> combine them with '+Aux'
>>
>>    (rel d (+Aux +Ref +Date) (t)) # Date
>>    (rel t (+Time))               # Time
>>
>> In this way a single B-Tree access is sufficient to find any time range.
>> For example, to find all entities between today noon and tomorrow noon:
>>
>>    (collect 'd '+Mup
>>       (list (date) (time 12 0 0))
>>       (list (inc (date)) (time 11 59 59)) )
>>
>>
>> Another possibility is using not two separate relations, but a single
>> bag relation
>>
>>    (rel ts (+Ref +Bag) ((+Date)) ((+Time)))  # Timestamp
>>
>> This saves a little space in the objects, but results in the same index
>> entry format.
>>
>>
>> But anyway, in both cases a single index tree is used. In the first case
>> you also have the option to define the time as
>>
>>    (rel t (+Ref +Time))          # Time
>>
>> with an additional separate index, so that you can search also for
>> certain time ranges only (no matter what the date is).
>>
>>
>> > Or am I over-thinking it, is a simple +Ref +Number with a UNIX timestamp
>> > an easier approach that is just as fast?
>>
>> I think this would not make any difference in speed (regarding index
>> access), but would have some disadvantages, like having to convert this
>> format to/from PicoLisp date and time values, and being limited in range
>> (the Unix timestamp cannot represent dates before 1970).
>>
>>
>> > A +Ref +String storing the result of a call to stamp would be ideal as the
>> > information is human readable without conversions. However, I suspect that
>> > a start-end lookup on it would be much slower than the above, or?
>>
>> Yes, a bit perhaps. Parsing and printing human readable date and time
>> values is simple in PicoLisp (e.g. with 'date', 'stamp', 'datStr' and
>> related functions, see http://software-lab.de/doc/refD.html#date).
>>
>> ♪♫ Alex
