date:20160318

Re: [PERFORM] Searching GIN-index (FTS) and sort by timestamp-column

2016-03-18 Thread Andreas Joseph Krogh

På onsdag 16. mars 2016 kl. 14:53:04, skrev Andreas Joseph Krogh <
andr...@visena.com >:
På onsdag 16. mars 2016 kl. 14:37:27, skrev Tom Lane >:
Andreas Joseph Krogh  writes:
 > 1. Why isnt' folder_id part of the index-cond?

 Because a GIN index is useless for sorting.

 > 2. Is there a way to make it use the (same) index to sort by
 > received_timestamp?

 No.

 > 3. Using a GIN-index, is there a way to use the index at all for sorting?

 No.

 > 4. It doesn't seem like ts_rank uses the index for sorting either.

 Same reason.

 regards, tom lane
 
So it's basically impossible to use FTS/GIN with sorting on large datasets?
Are there any plans to improve this situation?
 
Thanks.
 
This paper talks about ORDER BY optimizations for FTS (starting at slide 6 and 
7):
http://www.sai.msu.su/~megera/postgres/talks/Next%20generation%20of%20GIN.pdf
 
This indicates some work is being done in this area.
 
Oleg, if you're listening, do you guys have any exiting news regarding this?
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com 
www.visena.com

Re: [PERFORM] Searching GIN-index (FTS) and sort by timestamp-column

2016-03-18 Thread Evgeniy Shishkin


> On 16 Mar 2016, at 18:04, Evgeniy Shishkin  wrote:
> 
>> 
>> On 16 Mar 2016, at 17:52, Evgeniy Shishkin  wrote:
>> 
>> 
>>> On 16 Mar 2016, at 16:37, Tom Lane  wrote:
>>> 
>>> Andreas Joseph Krogh  writes:
 1. Why isnt' folder_id part of the index-cond?
>>> 
>>> Because a GIN index is useless for sorting.
>> 
>> I don't see how gin inability to return sorted data relates to index 
>> condition.
>> In fact i tried to reproduce the example,
>> and if i change folder_id to int from bigint, then index condition with 
>> folder_id is used
>> 
>>Index Cond: ((fts_all @@ '''hi'''::tsquery) AND (folder_id = 1))
>> 
> 
> Looks like documentation 
> http://www.postgresql.org/docs/9.5/static/btree-gin.html
> is lying about supporting int8 type
> 

Uh, it works if i cast to bigint explicitly
  WHERE  del.fts_all @@ to_tsquery('simple', 'hi')
  AND del.folder_id = 1::bigint;
results in 
 Index Cond: ((folder_id = '1'::bigint) AND (fts_all @@ 
'''hi'''::tsquery))

>> 
>>> 
 2. Is there a way to make it use the (same) index to sort by 
 received_timestamp?
>>> 
>>> No.
>>> 
 3. Using a GIN-index, is there a way to use the index at all for sorting?
>>> 
>>> No.
>>> 
 4. It doesn't seem like ts_rank uses the index for sorting either.
>>> 
>>> Same reason.
>>> 
>>> regards, tom lane
>>> 
>>> 
>>> -- 
>>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-performance



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Searching GIN-index (FTS) and sort by timestamp-column

2016-03-18 Thread Andreas Joseph Krogh

På torsdag 17. mars 2016 kl. 18:20:23, skrev Tom Lane >:
Evgeniy Shishkin  writes:
 > Uh, it works if i cast to bigint explicitly

 FWIW, the reason for that is that the int8_ops operator class that
 btree_gin creates doesn't contain any cross-type operators.  Probably
 wouldn't be that hard to fix if somebody wanted to put in the work.

 regards, tom lane
 
Thanks for info.
 
Can you explain why it works when using prepared statement without casting? 
Does the machinary then know the type so the "setParameter"-call uses the 
correct type?
 
Thanks.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com 
www.visena.com

Re: [PERFORM] Disk Benchmarking Question

2016-03-18 Thread Mike Sofen

Sorry for the delay, long work day!

 

Ok, I THINK I understand where you’re going.  Do it this way:

4 drives in Raid10 = 2 pairs of mirrored drives, aka still 2 active drives (2 
are failover).  They are sharing the 12gbps SAS interface, but that speed is 
quite irrelevant…it’s just a giant pipe for filling lots of drives.  

 

Each of your 2 drives has a max seq read/write spec 200 MBPs (WAY max).  When I 
say max, I mean, under totally edge laboratory conditions, writing to the outer 
few tracks with purely sequential data (never happens in the real world).  With 
2 drives running perfectly in raid 10, the theoretical max would be 400mbps.  
Real world, less than half, on sequential.

 

But random writes are the rulers of most activity in the data world (think of 
writing a single row to a table – a few thousand bytes that might be plopped 
anywhere on the disk and then randomly retrieved.  So the MBPs throughput 
number becomes mostly meaningless (because the data chunks are small and 
random), and IOPs and drive seek times become king (thus my earlier comments).

 

So – if you’re having disk performance issues with a database, you either add 
more spinning disks (to increase IOPs/distribute them) or switch to SSDs and 
forget about almost everything…

 

Mike

 

--

From: Dave Stibrany [mailto:dstibr...@gmail.com] 
Sent: Friday, March 18, 2016 7:48 AM



Hey Mike,

 

Thanks for the response. I think where I'm confused is that I thought vendor 
specified MBps was an estimate of sequential read/write speed. Therefore if 
you're in RAID10, you'd have 4x the sequential read speed and 2x the sequential 
write speed. Am I misunderstanding something?

 

Also, when you mention that MBPs is the capacity of the interface, what do you 
mean exactly. I've been taking interface speed to be the electronic transfer 
speed, not the speed from the actual physical medium, and more in the 6-12 
gigabit range.

 

Please let me know if I'm way off on any of this, I'm hoping to have my mental 
model updated.

 

Thanks!

 

Dave

 

On Thu, Mar 17, 2016 at 5:11 PM, Mike Sofen  > wrote:

Hi Dave,

 

Database disk performance has to take into account IOPs, and IMO, over MBPs, 
since it’s the ability of the disk subsystem to write lots of little bits 
(usually) versus writing giant globs, especially in direct attached storage 
(like yours, versus a SAN).  Most db disk benchmarks revolve around IOPs…and 
this is where SSDs utterly crush spinning disks.

 

You can get maybe 200 IOPs out of each disk, you have 4 in raid  10 so you get 
a whopping 400 IOPs.  A single quality SSD (like the Samsung 850 pro) will 
support a minimum of 40k IOPs on reads and 80k IOPs on writes.  That’s why SSDs 
are eliminating spinning disks when performance is critical and budget allows.

 

Back to your question – the MBPs is the capacity of interface, so it makes 
sense that it’s the same for both reads and writes.  The perc raid controller 
will be saving your bacon on writes, with 2gb cache (assuming it’s caching 
writes), so it becomes the equivalent of an SSD up to the capacity limit of the 
write cache.  With only 400 iops of write speed, with a busy server you can 
easily saturate the cache and then your system will drop to a crawl.

 

If I didn’t answer the intent of your question, feel free to clarify for me.

 

Mike

 

From: pgsql-performance-ow...@postgresql.org 
  
[mailto:pgsql-performance-ow...@postgresql.org 
 ] On Behalf Of Dave Stibrany
Sent: Thursday, March 17, 2016 1:45 PM
To: pgsql-performance@postgresql.org  
Subject: [PERFORM] Disk Benchmarking Question

 

I'm pretty new to benchmarking hard disks and I'm looking for some advice on 
interpreting the results of some basic tests.

 

The server is:

- Dell PowerEdge R430

- 1 x Intel Xeon E5-2620 2.4GHz

- 32 GB RAM

- 4 x 600GB 10k SAS Seagate ST600MM0088 in RAID 10

- PERC H730P Raid Controller with 2GB cache in write back mode.

 

The OS is Ubuntu 14.04, I'm using LVM and I have an ext4 volume for /, and an 
xfs volume for PGDATA.

 

I ran some dd and bonnie++ tests and I'm a bit confused by the numbers. I ran 
'bonnie++ -n0 -f' on the root volume.

 

Here's a link to the bonnie test results

https://www.dropbox.com/s/pwe2g5ht9fpjl2j/bonnie.today.html?dl=0

 

The vendor stats say sustained throughput of 215 to 108 MBps, so I guess I'd 
expect around 400-800 MBps read and 200-400 MBps write. In any case, I'm pretty 
confused as to why the read and write sequential speeds are almost identical. 
Does this look wrong?

 

Thanks,

 

Dave

 

 

 





 

-- 

THIS IS A TEST

Re: [PERFORM] Searching GIN-index (FTS) and sort by timestamp-column

2016-03-18 Thread Jeff Janes

On Wed, Mar 16, 2016 at 6:53 AM, Andreas Joseph Krogh 
wrote:

> På onsdag 16. mars 2016 kl. 14:37:27, skrev Tom Lane :
>
> Andreas Joseph Krogh  writes:
> > 1. Why isnt' folder_id part of the index-cond?
>
> Because a GIN index is useless for sorting.
>
> > 2. Is there a way to make it use the (same) index to sort by
> > received_timestamp?
>
> No.
>
> > 3. Using a GIN-index, is there a way to use the index at all for sorting?
>
> No.
>
> > 4. It doesn't seem like ts_rank uses the index for sorting either.
>
> Same reason.
>
> regards, tom lane
>
>
> So it's basically impossible to use FTS/GIN with sorting on large datasets?
> Are there any plans to improve this situation?
>

I don't see why it would not be possible to create a new execution node
type that does an index scan to obtain order (or just to satisfy an
equality or range expression), and takes a bitmap (as produced by the
FTS/GIN) to apply as a filter.  But, I don't know of anyone planning on
doing that.

Cheers,

Jeff

Re: [PERFORM] Searching GIN-index (FTS) and sort by timestamp-column

Re: [PERFORM] Searching GIN-index (FTS) and sort by timestamp-column

Re: [PERFORM] Searching GIN-index (FTS) and sort by timestamp-column

Re: [PERFORM] Disk Benchmarking Question

Re: [PERFORM] Searching GIN-index (FTS) and sort by timestamp-column

5 matches

Site Navigation

Mail list logo

Footer information