I checked out master and put together a test case using a small percentage
of production data for a known problem we have with Pg 9.2 and text search
scans.

A small percentage in this case means 10 million randomly selected records;
the full production data set has a few billion records.


The tests ran successfully on master, and I recorded timings.



Applied the patch included here to master, along with
gin-packed-postinglists-14.patch.
Ran make clean; ./configure; make; make install.
Ran make check (All 141 tests passed.)

Ran initdb and imported the dump.
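
For anyone trying to reproduce this, the steps above could be scripted
roughly as follows. This is only a dry-run sketch under stated assumptions:
the patch file name for the patch under test, the data directory, the dump
file and the database name are hypothetical placeholders, the order in which
the two patches are applied is my assumption, and the pg_ctl/createdb/psql
steps are the usual way to load a plain-SQL dump rather than something
spelled out in this report.

```shell
#!/bin/sh
# Dry-run sketch of the build-and-load steps from this report.
# Hypothetical placeholders (not from the report): PATCH_UNDER_TEST,
# PGDATA_DIR, DUMP_FILE, DBNAME.
PATCH_UNDER_TEST="${PATCH_UNDER_TEST:-patch-under-test.patch}"
PACKED_PATCH="gin-packed-postinglists-14.patch"
PGDATA_DIR="${PGDATA_DIR:-/tmp/pgdata-gin-test}"
DUMP_FILE="${DUMP_FILE:-sample-dump.sql}"
DBNAME="${DBNAME:-gintest}"

# Print the command in dry-run mode; run it (stopping on failure) when RUN=1.
step() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@" || exit 1
    else
        echo "+ $*"
    fi
}

build_and_load() {
    # Assumed order: packed posting lists first, then the patch rebased on it.
    step patch -p1 -i "$PACKED_PATCH"
    step patch -p1 -i "$PATCH_UNDER_TEST"
    step make clean
    step ./configure
    step make
    step make install
    step make check
    # Assumed: a fresh cluster, started before loading a plain-SQL dump.
    step initdb -D "$PGDATA_DIR"
    step pg_ctl -D "$PGDATA_DIR" -w start
    step createdb "$DBNAME"
    step psql -d "$DBNAME" -f "$DUMP_FILE"
}

build_and_load
```

By default the script only prints the commands; setting RUN=1 in the
environment would execute them from the top of a PostgreSQL source tree.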


The GIN index fails to build with a segfault.

DETAIL:  Failed process was running: CREATE INDEX textsearch_gin_idx ON kp
USING gin (to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT
NULL);


#0  XLogCheckBuffer (holdsExclusiveLock=1 '\001', lsn=lsn@entry=0x7fffcf341920,
bkpb=bkpb@entry=0x7fffcf341960, rdata=0x468f11 <ginFindLeafPage+529>,
    rdata=0x468f11 <ginFindLeafPage+529>) at xlog.c:2339
#1  0x00000000004b9ddd in XLogInsert (rmid=rmid@entry=13 '\r',
info=info@entry=16 '\020', rdata=rdata@entry=0x7fffcf341bf0) at xlog.c:936
#2  0x0000000000468a9e in createPostingTree (index=0x7fa4e8d31030,
items=items@entry=0xfb55680, nitems=nitems@entry=762,
    buildStats=buildStats@entry=0x7fffcf343dd0) at gindatapage.c:1324
#3  0x00000000004630c0 in buildFreshLeafTuple (buildStats=0x7fffcf343dd0,
nitem=762, items=0xfb55680, category=<optimized out>, key=34078256,
    attnum=<optimized out>, ginstate=0x7fffcf341df0) at gininsert.c:281
#4  ginEntryInsert (ginstate=ginstate@entry=0x7fffcf341df0,
attnum=<optimized out>, key=34078256, category=<optimized out>,
items=0xfb55680, nitem=762,
    buildStats=buildStats@entry=0x7fffcf343dd0) at gininsert.c:351
#5  0x00000000004635b0 in ginbuild (fcinfo=<optimized out>) at
gininsert.c:531
#6  0x0000000000718637 in OidFunctionCall3Coll
(functionId=functionId@entry=2738,
collation=collation@entry=0, arg1=arg1@entry=140346257507968,
    arg2=arg2@entry=140346257510448, arg3=arg3@entry=32826432) at
fmgr.c:1649
#7  0x00000000004ce1da in index_build
(heapRelation=heapRelation@entry=0x7fa4e8d30680,
indexRelation=indexRelation@entry=0x7fa4e8d31030,
    indexInfo=indexInfo@entry=0x1f4e440, isprimary=isprimary@entry=0
'\000', isreindex=isreindex@entry=0 '\000') at index.c:1963
#8  0x00000000004ceeaa in index_create
(heapRelation=heapRelation@entry=0x7fa4e8d30680,
    indexRelationName=indexRelationName@entry=0x1f4e660
"textsearch_gin_knn_idx", indexRelationId=16395, indexRelationId@entry=0,
    relFileNode=<optimized out>, indexInfo=indexInfo@entry=0x1f4e440,
indexColNames=indexColNames@entry=0x1f4f728,
    accessMethodObjectId=accessMethodObjectId@entry=2742,
tableSpaceId=tableSpaceId@entry=0,
collationObjectId=collationObjectId@entry=0x1f4fcc8,
    classObjectId=classObjectId@entry=0x1f4fce0,
coloptions=coloptions@entry=0x1f4fcf8,
reloptions=reloptions@entry=0, isprimary=0 '\000',
    isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000',
allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000',
    is_internal=0 '\000') at index.c:1082
#9  0x0000000000546a78 in DefineIndex (stmt=<optimized out>,
indexRelationId=indexRelationId@entry=0, is_alter_table=is_alter_table@entry=0
'\000',
    check_rights=check_rights@entry=1 '\001', skip_build=skip_build@entry=0
'\000', quiet=quiet@entry=0 '\000') at indexcmds.c:594
#10 0x000000000065147e in ProcessUtilitySlow
(parsetree=parsetree@entry=0x1f7fb68,
    queryString=0x1f7eb10 "CREATE INDEX textsearch_gin_idx ON kp USING gin
(to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL);",
context=<optimized out>, params=params@entry=0x0,
completionTag=completionTag@entry=0x7fffcf344c10 "", dest=<optimized out>)
at utility.c:1163
#11 0x000000000065079e in standard_ProcessUtility (parsetree=0x1f7fb68,
queryString=<optimized out>, context=<optimized out>, params=0x0,
    dest=<optimized out>, completionTag=0x7fffcf344c10 "") at utility.c:873
#12 0x000000000064de61 in PortalRunUtility (portal=portal@entry=0x1f4c350,
utilityStmt=utilityStmt@entry=0x1f7fb68, isTopLevel=isTopLevel@entry=1
'\001',
    dest=dest@entry=0x1f7ff08, completionTag=completionTag@entry=0x7fffcf344c10
"") at pquery.c:1187
#13 0x000000000064e9e5 in PortalRunMulti (portal=portal@entry=0x1f4c350,
isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x1f7ff08,
    altdest=altdest@entry=0x1f7ff08,
completionTag=completionTag@entry=0x7fffcf344c10
"") at pquery.c:1318
#14 0x000000000064f459 in PortalRun (portal=portal@entry=0x1f4c350,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1
'\001',
    dest=dest@entry=0x1f7ff08, altdest=altdest@entry=0x1f7ff08,
completionTag=completionTag@entry=0x7fffcf344c10 "") at pquery.c:816
#15 0x000000000064d2d5 in exec_simple_query (
    query_string=0x1f7eb10 "CREATE INDEX textsearch_gin_idx ON kp USING gin
(to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL);") at
postgres.c:1048
#16 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1f2ad40,
dbname=0x1f2abf8 "rbt", username=<optimized out>) at postgres.c:3992
#17 0x000000000045b1b4 in BackendRun (port=0x1f47280) at postmaster.c:4085
#18 BackendStartup (port=0x1f47280) at postmaster.c:3774
#19 ServerLoop () at postmaster.c:1585
#20 0x000000000060d031 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x1f28b20)
at postmaster.c:1240
#21 0x000000000045bb25 in main (argc=3, argv=0x1f28b20) at main.c:196



On Thu, Nov 14, 2013 at 12:26 PM, Alexander Korotkov
<aekorot...@gmail.com> wrote:

> On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas <
> hlinnakan...@vmware.com> wrote:
>
>> On 28.06.2013 22:31, Alexander Korotkov wrote:
>>
>>> Now, I got the point of three state consistent: we can keep only one
>>> consistent in opclasses that support new interface. exact true and exact
>>> false values will be passed in the case of current patch consistent; exact
>>> false and unknown will be passed in the case of current patch
>>> preConsistent. That's reasonable.
>>>
>>
>> I'm going to mark this as "returned with feedback". For the next version,
>> I'd like to see the API changed per above. Also, I'd like us to do
>> something about the tidbitmap overhead, as a separate patch before this, so
>> that we can assess the actual benefit of this patch. And a new test case
>> that demonstrates the I/O benefits.
>
>
> A revised version of the patch is attached.
> The changes are:
> 1) The patch is rebased against packed posting lists and no longer depends
> on additional information.
> 2) A new API with tri-state logic is introduced.
>
> ------
> With best regards,
> Alexander Korotkov.
