Re: [GENERAL] Updated RUM-index and support for bigint as part of index

2016-08-25 Thread Andreas Joseph Krogh
På torsdag 25. august 2016 kl. 18:12:34, skrev Oleg Bartunov <
obartu...@gmail.com >:
Andreas,
  
sorry for delay,
it looks like a bug to me, could you please, share your dataset with me, so I 
could reproduce the behaviour.

 
 
I'll send you a Google Drive link on your email
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com 
www.visena.com 
 


 


Re: [GENERAL] Updated RUM-index and support for bigint as part of index

2016-08-25 Thread Oleg Bartunov
Andreas,

sorry for delay,
it looks like a bug to me, could you please, share your dataset with me, so
I could reproduce the behaviour.

Regards,
Oleg

On Sun, Aug 7, 2016 at 11:05 AM, Andreas Joseph Krogh 
wrote:

> På søndag 07. august 2016 kl. 08:27:06, skrev Oleg Bartunov <
> obartu...@gmail.com>:
>
> [snip]
> have you considered <=| and |=> operators ? <=> in ORDER BY works like KNN.
>
>
> I don't get how these operators should work. Neither give me the expected
> results.
>
> Using <=>
>
>
> SELECTdel.entity_id,
> del.folder_id,
> del.received_timestampFROM origo_email_delivery delWHERE del.fts_all @@ 
> to_tsquery('simple', 'andreas:*:*')ORDER BY '2000-01-01' :: TIMESTAMP 
> <=> del.received_timestampLIMIT 10;
>
>  entity_id | folder_id |   received_timestamp
> ---+---+-
>   1224278 |   1068087 | 2015-08-17 23:53:26
>   1224382 |   1068087 | 2015-08-18 03:07:55
>   1224404 |   1068087 | 2015-08-18 03:49:02
>   1505713 | 48496 | 2015-10-27 14:51:45
>142132 | 66658 | 2012-12-03 14:14:05.488
>122565 | 90115 | 2012-11-20 15:41:04.936
>200744 | 66655 | 2013-01-28 21:47:44.561
>   1445927 |888665 | 2015-09-29 00:26:56
>123671 | 83509 | 2012-11-21 14:16:26.448
>   1129928 | 66658 | 2015-05-09 08:39:14.128
> (10 rows)
>
>
> Using <=|
>
> SELECTdel.entity_id,
> del.folder_id,
> del.received_timestampFROM origo_email_delivery delWHERE del.fts_all @@ 
> to_tsquery('simple', 'andreas:*:*')ORDER BY '2000-01-01' :: TIMESTAMP 
> <=| del.received_timestampLIMIT 10;
>
>
>  entity_id | folder_id |   received_timestamp
> ---+---+-
>   1224278 |   1068087 | 2015-08-17 23:53:26
>   1224382 |   1068087 | 2015-08-18 03:07:55
>   1224404 |   1068087 | 2015-08-18 03:49:02
>   1505713 | 48496 | 2015-10-27 14:51:45
>142132 | 66658 | 2012-12-03 14:14:05.488
>122565 | 90115 | 2012-11-20 15:41:04.936
>200744 | 66655 | 2013-01-28 21:47:44.561
>   1445927 |888665 | 2015-09-29 00:26:56
>123671 | 83509 | 2012-11-21 14:16:26.448
>   1129928 | 66658 | 2015-05-09 08:39:14.128
> (10 rows)
>
>
> Neither are ordered by received_timestamp
>
> Can you explain how to get ORDER BY received_timestamp DESC?
>
> Thanks.
>
> --
> *Andreas Joseph Krogh*
> CTO / Partner - Visena AS
> Mobile: +47 909 56 963
> andr...@visena.com
> www.visena.com
> 
>
>


Re: [GENERAL] Updated RUM-index and support for bigint as part of index

2016-08-11 Thread Andreas Joseph Krogh
På torsdag 11. august 2016 kl. 19:13:10, skrev Artur Zakirov <
a.zaki...@postgrespro.ru >:
On 07.08.2016 11:05, Andreas Joseph Krogh wrote:
 > På søndag 07. august 2016 kl. 08:27:06, skrev Oleg Bartunov
 > >:
 >
 >     [snip]
 >     have you considered <=| and |=> operators ? <=> in ORDER BY works
 >     like KNN.
 >
 >
 > I don't get how these operators should work. Neither give me the
 > expected results.
 >
 > Using <=>
 >
 >
 > SELECT del.entity_id, del.folder_id, del.received_timestamp FROM
 > origo_email_delivery del WHERE del.fts_all @@ to_tsquery('simple',
 > 'andreas:*:*') ORDER BY '2000-01-01' :: TIMESTAMP <=>
 > del.received_timestamp LIMIT 10;
 >
 >  entity_id | folder_id |   received_timestamp
 > ---+---+-
 >   1224278 |   1068087 | 2015-08-17 23:53:26
 >   1224382 |   1068087 | 2015-08-18 03:07:55
 >   1224404 |   1068087 | 2015-08-18 03:49:02
 >   1505713 |     48496 | 2015-10-27 14:51:45
 >    142132 |     66658 | 2012-12-03 14:14:05.488
 >    122565 |     90115 | 2012-11-20 15:41:04.936
 >    200744 |     66655 | 2013-01-28 21:47:44.561
 >   1445927 |    888665 | 2015-09-29 00:26:56
 >    123671 |     83509 | 2012-11-21 14:16:26.448
 >   1129928 |     66658 | 2015-05-09 08:39:14.128
 > (10 rows)
 >
 >
 > Using <=|
 >
 > SELECT del.entity_id, del.folder_id, del.received_timestamp FROM
 > origo_email_delivery del WHERE del.fts_all @@ to_tsquery('simple',
 > 'andreas:*:*') ORDER BY '2000-01-01' :: TIMESTAMP <=|
 > del.received_timestamp LIMIT 10;
 >
 >
 >  entity_id | folder_id |   received_timestamp
 > ---+---+-
 >   1224278 |   1068087 | 2015-08-17 23:53:26
 >   1224382 |   1068087 | 2015-08-18 03:07:55
 >   1224404 |   1068087 | 2015-08-18 03:49:02
 >   1505713 |     48496 | 2015-10-27 14:51:45
 >    142132 |     66658 | 2012-12-03 14:14:05.488
 >    122565 |     90115 | 2012-11-20 15:41:04.936
 >    200744 |     66655 | 2013-01-28 21:47:44.561
 >   1445927 |    888665 | 2015-09-29 00:26:56
 >    123671 |     83509 | 2012-11-21 14:16:26.448
 >   1129928 |     66658 | 2015-05-09 08:39:14.128
 > (10 rows)
 >
 >
 > Neither are ordered by received_timestamp
 >
 > Can you explain how to get ORDER BY received_timestamp DESC?
 >
 > Thanks.
 >
 > --
 > *Andreas Joseph Krogh*
 > CTO / Partner - Visena AS
 > Mobile: +47 909 56 963
 > andr...@visena.com 
 > www.visena.com 
 > 
 >

 Do you need simple ordering by received_timestamp column? Not ordering
 by distance between received_timestamp and some date?

 Then you can use simple "ORDER BY received_timestamp". For example, we
 have data:

 =# SELECT * FROM test;
   id |     fts     |        received
 +-+-
    1 | 'andreas':1 | 2015-08-17 23:53:26
    2 | 'andreas':1 | 2015-08-18 03:07:55
    3 | 'andreas':1 | 2015-08-18 03:49:02
    4 | 'andreas':1 | 2012-12-03 14:14:05.488
    5 | 'andreas':1 | 2012-11-20 15:41:04.936
    6 | 'andreas':1 | 2013-01-28 21:47:44.561
    6 | 'andreas':1 | 2015-09-29 00:26:56
    7 | 'andreas':1 | 2012-11-21 14:16:26.448
    8 | 'andreas':1 | 2015-05-09 08:39:14.128
 (9 rows)

 I created index:

 CREATE INDEX rumidx ON test USING rum (fts rum_tsvector_timestamp_ops,
 received) WITH (attach = 'received', to = 'fts');

 Then we can execute queries:

 =# SELECT id, received FROM test WHERE fts @@ to_tsquery('simple',
 'andreas') ORDER BY received LIMIT 8;
   id |        received
 +-
    5 | 2012-11-20 15:41:04.936
    7 | 2012-11-21 14:16:26.448
    4 | 2012-12-03 14:14:05.488
    6 | 2013-01-28 21:47:44.561
    8 | 2015-05-09 08:39:14.128
    1 | 2015-08-17 23:53:26
    2 | 2015-08-18 03:07:55
    3 | 2015-08-18 03:49:02
 (8 rows)

 =# SELECT id, received FROM test WHERE fts @@ to_tsquery('simple',
 'andreas') ORDER BY received DESC LIMIT 8;
   id |        received
 +-
    6 | 2015-09-29 00:26:56
    3 | 2015-08-18 03:49:02
    2 | 2015-08-18 03:07:55
    1 | 2015-08-17 23:53:26
    8 | 2015-05-09 08:39:14.128
    6 | 2013-01-28 21:47:44.561
    4 | 2012-12-03 14:14:05.488
    7 | 2012-11-21 14:16:26.448
 (8 rows)
 
Yes, this gives the correct result, but the whole motivation for using 
RUM-index is for the query to use the same index for ORDER BY, as it seems to 
do using the <=> operator.
 
The query you gave above does not the index for sorting AFAIU.
 
Operators <=>, |=>, <=| you can use to order by nearest date to specific
 date:
 [snip]
 
I hope this is what you want.
 
I still don't understand how my query which had ORDER BY '2000-01-01' :: 
TIMESTAMP <=> del.received_timestamp
can produce the following ordering:
 
 entity_id | folder_id |   received_timestamp 
---+---+-
  1224278 |   1068087 | 2015-08-17 23:53:26
  1224382 |   1068087 | 

Re: [GENERAL] Updated RUM-index and support for bigint as part of index

2016-08-11 Thread Artur Zakirov

On 07.08.2016 11:05, Andreas Joseph Krogh wrote:

På søndag 07. august 2016 kl. 08:27:06, skrev Oleg Bartunov
>:

[snip]
have you considered <=| and |=> operators ? <=> in ORDER BY works
like KNN.


I don't get how these operators should work. Neither give me the
expected results.

Using <=>


SELECT del.entity_id, del.folder_id, del.received_timestamp FROM
origo_email_delivery del WHERE del.fts_all @@ to_tsquery('simple',
'andreas:*:*') ORDER BY '2000-01-01' :: TIMESTAMP <=>
del.received_timestamp LIMIT 10;

 entity_id | folder_id |   received_timestamp
---+---+-
  1224278 |   1068087 | 2015-08-17 23:53:26
  1224382 |   1068087 | 2015-08-18 03:07:55
  1224404 |   1068087 | 2015-08-18 03:49:02
  1505713 | 48496 | 2015-10-27 14:51:45
   142132 | 66658 | 2012-12-03 14:14:05.488
   122565 | 90115 | 2012-11-20 15:41:04.936
   200744 | 66655 | 2013-01-28 21:47:44.561
  1445927 |888665 | 2015-09-29 00:26:56
   123671 | 83509 | 2012-11-21 14:16:26.448
  1129928 | 66658 | 2015-05-09 08:39:14.128
(10 rows)


Using <=|

SELECT del.entity_id, del.folder_id, del.received_timestamp FROM
origo_email_delivery del WHERE del.fts_all @@ to_tsquery('simple',
'andreas:*:*') ORDER BY '2000-01-01' :: TIMESTAMP <=|
del.received_timestamp LIMIT 10;


 entity_id | folder_id |   received_timestamp
---+---+-
  1224278 |   1068087 | 2015-08-17 23:53:26
  1224382 |   1068087 | 2015-08-18 03:07:55
  1224404 |   1068087 | 2015-08-18 03:49:02
  1505713 | 48496 | 2015-10-27 14:51:45
   142132 | 66658 | 2012-12-03 14:14:05.488
   122565 | 90115 | 2012-11-20 15:41:04.936
   200744 | 66655 | 2013-01-28 21:47:44.561
  1445927 |888665 | 2015-09-29 00:26:56
   123671 | 83509 | 2012-11-21 14:16:26.448
  1129928 | 66658 | 2015-05-09 08:39:14.128
(10 rows)


Neither are ordered by received_timestamp

Can you explain how to get ORDER BY received_timestamp DESC?

Thanks.

--
*Andreas Joseph Krogh*
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com 
www.visena.com 




Do you need simple ordering by received_timestamp column? Not ordering 
by distance between received_timestamp and some date?


Then you can use simple "ORDER BY received_timestamp". For example, we 
have data:


=# SELECT * FROM test;
 id | fts |received
+-+-
  1 | 'andreas':1 | 2015-08-17 23:53:26
  2 | 'andreas':1 | 2015-08-18 03:07:55
  3 | 'andreas':1 | 2015-08-18 03:49:02
  4 | 'andreas':1 | 2012-12-03 14:14:05.488
  5 | 'andreas':1 | 2012-11-20 15:41:04.936
  6 | 'andreas':1 | 2013-01-28 21:47:44.561
  6 | 'andreas':1 | 2015-09-29 00:26:56
  7 | 'andreas':1 | 2012-11-21 14:16:26.448
  8 | 'andreas':1 | 2015-05-09 08:39:14.128
(9 rows)

I created index:

CREATE INDEX rumidx ON test USING rum (fts rum_tsvector_timestamp_ops, 
received) WITH (attach = 'received', to = 'fts');


Then we can execute queries:

=# SELECT id, received FROM test WHERE fts @@ to_tsquery('simple', 
'andreas') ORDER BY received LIMIT 8;

 id |received
+-
  5 | 2012-11-20 15:41:04.936
  7 | 2012-11-21 14:16:26.448
  4 | 2012-12-03 14:14:05.488
  6 | 2013-01-28 21:47:44.561
  8 | 2015-05-09 08:39:14.128
  1 | 2015-08-17 23:53:26
  2 | 2015-08-18 03:07:55
  3 | 2015-08-18 03:49:02
(8 rows)

=# SELECT id, received FROM test WHERE fts @@ to_tsquery('simple', 
'andreas') ORDER BY received DESC LIMIT 8;

 id |received
+-
  6 | 2015-09-29 00:26:56
  3 | 2015-08-18 03:49:02
  2 | 2015-08-18 03:07:55
  1 | 2015-08-17 23:53:26
  8 | 2015-05-09 08:39:14.128
  6 | 2013-01-28 21:47:44.561
  4 | 2012-12-03 14:14:05.488
  7 | 2012-11-21 14:16:26.448
(8 rows)

Operators <=>, |=>, <=| you can use to order by nearest date to specific 
date:


=# SELECT id, received, received <=> '2013-01-01' AS rank FROM test 
WHERE fts @@ to_tsquery('simple', 'andreas') ORDER BY received <=> 
'2013-01-01' LIMIT 8;

 id |received | rank
+-+--
  6 | 2013-01-28 21:47:44.561 |  2411264.561
  4 | 2012-12-03 14:14:05.488 |  2454354.512
  7 | 2012-11-21 14:16:26.448 |  3491013.552
  5 | 2012-11-20 15:41:04.936 |  3572335.064
  8 | 2015-05-09 08:39:14.128 | 74162354.128
  1 | 2015-08-17 23:53:26 | 82857206
  2 | 2015-08-18 03:07:55 | 82868875
  3 | 2015-08-18 03:49:02 | 82871342
(8 rows)

=# SELECT id, received, received <=> '2013-01-01' AS rank FROM test 
WHERE fts @@ to_tsquery('simple', 'andreas') ORDER BY received <=> 
'2013-01-01' DESC LIMIT 8;

 id |received | rank
+-+--
  6 | 2015-09-29 00:26:56 | 86488016
  3 | 2015-08-18 03:49:02 | 82871342
  2 | 2015-08-18 03:07:55 |  

Re: [GENERAL] Updated RUM-index and support for bigint as part of index

2016-08-07 Thread Andreas Joseph Krogh
På søndag 07. august 2016 kl. 08:27:06, skrev Oleg Bartunov >:
[snip]
have you considered <=| and |=> operators ? <=> in ORDER BY works like KNN.



 
I don't get how these operators should work. Neither give me the expected 
results.
 
Using <=>
 
SELECT del.entity_id, del.folder_id, del.received_timestamp FROM 
origo_email_delivery delWHERE del.fts_all @@ to_tsquery('simple', 
'andreas:*:*') ORDER BY '2000-01-01' :: TIMESTAMP <=> 
del.received_timestampLIMIT 10; 
 entity_id | folder_id |   received_timestamp 
 ---+---+-
   1224278 |   1068087 | 2015-08-17 23:53:26
   1224382 |   1068087 | 2015-08-18 03:07:55
   1224404 |   1068087 | 2015-08-18 03:49:02
   1505713 | 48496 | 2015-10-27 14:51:45
    142132 | 66658 | 2012-12-03 14:14:05.488
    122565 | 90115 | 2012-11-20 15:41:04.936
    200744 | 66655 | 2013-01-28 21:47:44.561
   1445927 |    888665 | 2015-09-29 00:26:56
    123671 | 83509 | 2012-11-21 14:16:26.448
   1129928 | 66658 | 2015-05-09 08:39:14.128
 (10 rows)

  
Using <=|
SELECT del.entity_id, del.folder_id, del.received_timestamp FROM 
origo_email_delivery delWHERE del.fts_all @@ to_tsquery('simple', 
'andreas:*:*') ORDER BY '2000-01-01' :: TIMESTAMP <=| 
del.received_timestampLIMIT 10; 
 
 entity_id | folder_id |   received_timestamp 
 ---+---+-
   1224278 |   1068087 | 2015-08-17 23:53:26
   1224382 |   1068087 | 2015-08-18 03:07:55
   1224404 |   1068087 | 2015-08-18 03:49:02
   1505713 | 48496 | 2015-10-27 14:51:45
    142132 | 66658 | 2012-12-03 14:14:05.488
    122565 | 90115 | 2012-11-20 15:41:04.936
    200744 | 66655 | 2013-01-28 21:47:44.561
   1445927 |    888665 | 2015-09-29 00:26:56
    123671 | 83509 | 2012-11-21 14:16:26.448
   1129928 | 66658 | 2015-05-09 08:39:14.128
 (10 rows)

  
Neither are ordered by received_timestamp
 
Can you explain how to get ORDER BY received_timestamp DESC?
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com 
www.visena.com 
 


 


Re: [GENERAL] Updated RUM-index and support for bigint as part of index

2016-08-07 Thread Andreas Joseph Krogh
På lørdag 06. august 2016 kl. 20:54:32, skrev Artur Zakirov <
a.zaki...@postgrespro.ru >:
Hello,   2016-08-02 21:08 GMT+03:00 Andreas Joseph Krogh >:  The ORDER BY part seems strange; It seems one 
has to find a value "lower than any other value" to use as a kind of base, why 
is this necessary? It also seems that in order to be able to sort DESC one has 
to provide a timestamp value "higher than any other value", is this correct? 
 
It would be great if the docs explained this.
 
We will write more detailed documentation for RUM.



 
Great!
 
 
I really miss the opportunity to include a BIGINT as part of the index, so 
that the WHERE-clause could be like this:
 
WHERE del.fts_all @@ to_tsquery('simple', 'andreas') AND del.folder_id IN (1
,2,3)
 
Having this would be perfect for my use-case searching in email in folders, 
sorted by received_date, and having it use ONE index.
 
Will this be supported?


 We have a plan to use generic types to able to include bigint, timestamp and 
other types as part of index.

 
Does this eliminate the need for a btree_rum equivalent of btree_gin, being 
that the RUM-index will handle all "btree-able" datatypes?
 
 
But I cant tell date of it.


 
I understand.
Do you think it will be done by the time 9.6 is released?
 
Thanks.
 

-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com 
www.visena.com 
 


 


Re: [GENERAL] Updated RUM-index and support for bigint as part of index

2016-08-07 Thread Oleg Bartunov
On Tue, Aug 2, 2016 at 9:08 PM, Andreas Joseph Krogh 
wrote:

> Hi.
>
> I see the RUM-index is updated, which is great!
>
> I wonder, to be able to sort by timestamp one has to create the index like
> this:
>
>
> CREATE INDEX rumidx ON origo_email_delivery USING rum (fts_all 
> rum_tsvector_timestamp_ops, received_timestamp)
> WITH (attach = 'received_timestamp', TO = 'fts_all', order_by_attach = 
> TRUE );
>
> Then, to be able to use the index for sorting by the
> "received_timestamp"-column one has to issue a query like this:
>
> EXPLAIN ANALYZE SELECTdel.entity_id,
> del.subject,
> del.received_timestamp,
> fts_all <=> to_tsquery('simple', 'andreas') AS rank
> FROM origo_email_delivery del
> WHERE del.fts_all @@ to_tsquery('simple', 'andreas')
> ORDER BY '2000-01-01' :: TIMESTAMP <=> del.received_timestamp
> LIMIT 10;
>
> QUERY 
> PLAN
> --
>  Limit  (cost=14.40..26.47 rows=10 width=89) (actual time=10.908..10.952 
> rows=10 loops=1)
>->  Index Scan using rumidx on origo_email_delivery del  
> (cost=14.40..3221.22 rows=2657 width=89) (actual time=10.906..10.947 rows=10 
> loops=1)
>  Index Cond: (fts_all @@ '''andreas'' & ''kr'''::tsquery)
>  Order By: (received_timestamp <=> '2000-01-01 00:00:00'::timestamp 
> without time zone)
>  Planning time: 0.491 ms
>  Execution time: 11.010 ms
> (6 rows)
>
>
> The ORDER BY part seems strange; It seems one has to find a value
> "lower than any other value" to use as a kind of base, why is this
> necessary? It also seems that in order to be able to sort DESC one has to
> provide a timestamp value "higher than any other value", is this correct?
>

have you considered <=| and |=> operators ? <=> in ORDER BY works like KNN.


>
> It would be great if the docs explained this.
>
> I really miss the opportunity to include a BIGINT as part of the index, so
> that the WHERE-clause could be like this:
>
> WHERE del.fts_all @@ to_tsquery('simple', 'andreas') AND del.folder_id IN
> (1,2,3)
>
> Having this would be perfect for my use-case searching in email in
> folders, sorted by received_date, and having it use ONE index.
>
> Will this be supported?
>
> Thanks.
>
> --
> *Andreas Joseph Krogh*
> CTO / Partner - Visena AS
> Mobile: +47 909 56 963
> andr...@visena.com
> www.visena.com
> 
>


Re: [GENERAL] Updated RUM-index and support for bigint as part of index

2016-08-06 Thread Artur Zakirov
Hello,

2016-08-02 21:08 GMT+03:00 Andreas Joseph Krogh :
>
> The ORDER BY part seems strange; It seems one has to find a value "lower than 
> any other value" to use as a kind of base, why is this necessary? It also 
> seems that in order to be able to sort DESC one has to provide a timestamp 
> value "higher than any other value", is this correct?
>
>
> It would be great if the docs explained this.
>

We will write more detailed documentation for RUM.


>
> I really miss the opportunity to include a BIGINT as part of the index, so
> that the WHERE-clause could be like this:
>
> WHERE del.fts_all @@ to_tsquery('simple', 'andreas') AND del.folder_id IN
> (1,2,3)
>
> Having this would be perfect for my use-case searching in email in
> folders, sorted by received_date, and having it use ONE index.
>
> Will this be supported?
>

We have a plan to use generic types to able to include bigint, timestamp
and other types as part of index. But I cant tell date of it.

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company