On Wed, Aug 17, 2011 at 11:11 AM, Alexander Korotkov
wrote:
> On Tue, Aug 16, 2011 at 11:15 PM, Heikki Linnakangas <
> heikki.linnakan...@enterprisedb.com> wrote:
>
>> On 16.08.2011 22:10, Heikki Linnakangas wrote:
>>
>>> Here's an version of the patch wit
an attachment...
Thanks a lot. I'm going to start rerunning the tests now.
--
With best regards,
Alexander Korotkov.
u might as well pull tuples directly from the
> heap.
>
Yes, seems reasonable. Buffer on the root node was in the paper. But now I
don't see the need of it too.
--
With best regards,
Alexander Korotkov.
Thus, we can calculate:
levelstep = min(log_r(1 + effective_cache_size_in_pages*(r - 1)) - 1,
log_r(maintenance_work_mem_in_pages - 1))
and get more precise result. But also we need at least very rough estimate
of memory occupied by node buffers hash tab and path items.
--
With best regards,
Alexander Korotkov.
I found that I forgot to remove levelstep and buffersize from reloptions.c.
Updated patch is attached.
--
With best regards,
Alexander Korotkov.
gist_fast_build-0.14.1.patch.gz
Description: GNU Zip compressed data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To
Hi!
Thank you for your notes.
On Fri, Aug 12, 2011 at 7:04 PM, Robert Haas wrote:
> On Thu, Aug 11, 2011 at 6:21 AM, Alexander Korotkov
> wrote:
> > [ new patch ]
>
> Some random comments:
>
> - It appears that the "noFollowFight" flag is really supp
-level
> buffer. Wouldn't it be fairly simple to detect that case explicitly, and
> stop the emptying process only if one of the lower-level buffers really
> fills up? That should be more efficient, as you would have "swap" between
> different subtrees less often.
Yes, it seems reasonable to me.
--
With best regards,
Alexander Korotkov.
On Thu, Aug 11, 2011 at 2:28 PM, Heikki Linnakangas <
heikki.linnakan...@enterprisedb.com> wrote:
> On 10.08.2011 22:44, Alexander Korotkov wrote:
>
>> Manual and readme updates.
>>
>
> Thanks, I'm reviewing these now.
>
> Do we want to expose the leve
build(). That's why I avoided deletion from lists.
Now I've added additional check for root node in loop over list.
> The calculations in initBuffering() desperately need comments. As does the
> rest of the code too, but the heuristics in that function are particularly
> hard t
ptional. We can try it as add-on
patch.
--
With best regards,
Alexander Korotkov.
Manual and readme updates.
--
With best regards,
Alexander Korotkov.
gist_fast_build-0.12.0.patch.gz
Description: GNU Zip compressed data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql
effective_cache_size.
4) Some renames. In particular GISTLoadedPartItem
to GISTBufferingInsertStack.
5) Some comments were corrected and some were added.
6) pgindent
7) rebased with head
Readme update and user documentation coming soon.
--
With best regards,
Alexander Korotkov.
gist_fast_build-0.11.0
it also seems that both were added by your commit
of table-based parser:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ba748f7a11ef884277b61d1708a17a44acfd1736
--
With best regards,
Alexander Korotkov.
is possible?
--
With best regards,
Alexander Korotkov.
pts[0].default_val’**)
It is caused by definition of default field of relopt_string structure as
1-length character array. This seems to be a design flaw in the reloptions.c
code. Any thoughts?
--
With best regards,
Alexander Korotkov.
t; * Any user-visible options need to be documented in the user manual.
> * And of course, make sure comments and the readme are up-to-date.
> * Compiler warning:
>
> reloptions.c:259: warning: initializer-string for array of chars is too
> long
> reloptions.c:259: warning: (near initialization for
> ‘stringRelOpts[0].default_val’**)
>
> I don't think there's a way to add an entry to stringRelOpts in a way that
> works correctly. That's a design flaw in the reloptions.c code that has
> never come up before, as there hasn't been any string-formatted relopts
> before (actually buffering option might be better served by an enum
> reloption too, if we had that). Please start a new thread on that on
> pgsql-hackers.
Ok.
--
With best regards,
Alexander Korotkov.
items. If we
imagine extremely large tree with levelstep = 1 size of this datastructures
will be singnificant. And it's hard to predict that size without knowing of
tree size.
--
With best regards,
Alexander Korotkov.
gist_fast_build-0.10.0.patch.gz
Description: GNU Zip compressed data
Uhh, my bad, really stupid bug. Many thanks.
--
With best regards,
Alexander Korotkov.
On Wed, Aug 3, 2011 at 8:31 PM, Heikki Linnakangas <
heikki.linnakan...@enterprisedb.com> wrote:
> On 03.08.2011 11:18, Alexander Korotkov wrote:
>
>> I found that in previous version
only to 44/9C518750" appears. Seems that there is some totally wrong
use of WAL if even optimization level does matter...
--
With best regards,
Alexander Korotkov.
gist_fast_build-0.9.1.patch.gz
Description: GNU Zip compressed data
--
Sent via pgsql-hackers mailing list (pgs
ffset
-
(0001004000D2,10982288)
(1 row)
May be you have any ideas about it?
--
With best regards,
Alexander Korotkov.
gist_fast_build-0.9.0.patch.gz
Description: GNU Zip compressed data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.or
quot;Concurrency and Recovery for GiSTs".
http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/access-methods-for-next-generation.pdf.gz
--
With best regards,
Alexander Korotkov.
Also I need to
verify final emptying, because IO guarantees of original paper is based on
strict descending final emptying.
--
With best regards,
Alexander Korotkov.
buffers. But those paths can
become invalid on page splits. It seems to me that approximately same volume
of code for maintaining parent links should be added to this version of
patch in order to get it working correctly.
--
With best regards,
Alexander Korotkov.
2) blk: 564 numTuple: 109 free: 3352b(58.92%)
rightlink:4294967295 (InvalidBlockNumber)
> 7(l:2) blk: 165 numTuple: 108 free: 3396b(58.38%)
rightlink:4294967295 (InvalidBlockNumber)
.
Isn't it a bug?
--
With best regards,
Alexander Korotkov.
think modification of
WAL record structure is possible or I have to insert downlink one by one in
buffering build too?
--
With best regards,
Alexander Korotkov.
ry to evade keeping loaded-tree hash. This might help
me a lot. Thanks.
--
With best regards,
Alexander Korotkov.
ome researches
about distribution law for such cases (while I didn't find anything relevant
in google scholar)?
As an alternative I can propose take into account actual average IO
operations per tuple rather then an estimate.
--
With best regards,
Alexander Korotkov.
On Mon, Jul 18, 20
tomatic switching.
2) Take care about concurrent backends in automatic switching.
--
With best regards,
Alexander Korotkov.
gist_fast_build-0.7.0.patch.gz
Description: GNU Zip compressed data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Fri, Jul 15, 2011 at 12:53 AM, Heikki Linnakangas <
heikki.linnakan...@enterprisedb.com> wrote:
> On 14.07.2011 23:41, Alexander Korotkov wrote:
>
>> Do you think using "rightlink" as pointer to parent page is possible
>> during
>> index build? It wo
g subtly different in
> different versions, if we need to backpatch bug fixes that use that field.
Yes, it seems very reasonable.
--
With best regards,
Alexander Korotkov.
standard use of histogram slot
should be avoided.
--
With best regards,
Alexander Korotkov.
tree and frequency/length
statistics for arrays?
> 1) In calc_distr you go to some lengths to avoid round off errors. Since it
> is
> certainly just the order of the estimate that matters, why not just
> perform the calculation in log space?
>
It seems to me that I didn't anything to avoid round off errors there...
--
With best regards,
Alexander Korotkov.
fd.c directly, ie.
> OpenTemporaryFile/FileRead/**FileWrite.
BufFile is distributing temporary data through several files. AFAICS
postgres avoids working with files larger than 1GB. Size of tree buffers can
easily be greater. Without BufFile I need to maintain set of files manually.
--
With best regards,
Alexander Korotkov.
Do you think using "rightlink" as pointer to parent page is possible during
index build? It would allow to simplify code significantly, because of no
more need to maintain in-memory structures for parents memorizing.
--
With best regards,
Alexander Korotkov.
lidOffsetNumber;
>ptr->parent = top->parent;
>ptr->next = top->next;
> top->next = ptr;
>if (tail == top);
>tail = ptr;
}
>
--
With best regards,
Alexander Korotkov.
od. But concurent backends makes this method unsafe.
There are other difficulties with concurrent backends: it would be nice
estimate usage of effective cache by other backeds before switching to
buffering method. If don't take care about it then we can don't switch to
buffering method wh
Thank you very much for detail explanation. But this line of modified patch
seems strange for me:
*newchildoffnum = blkno;
I believe it should be:
*newchildoffnum = i;
--
With best regards,
Alexander Korotkov.
On Wed, Jul 13, 2011 at 12:40 PM, Alexander Korotkov
wrote:
> On Wed, Jul 13, 2011 at 12:33 PM, Heikki Linnakangas <
> heikki.linnakan...@enterprisedb.com> wrote:
>
>> Is it possible to switch to the new buffering method in the middle of an
>> index build? We could use
tive_cache_size?), and switch to the buffering
> method after that.
Yes, it seems to be possible. It also would be great to somehow detect case
of ordered data when regular index build performs well.
--
With best regards,
Alexander Korotkov.
New version of patch with a little more refactoring and comments.
--
With best regards,
Alexander Korotkov.
gist_fast_build-0.6.0.patch.gz
Description: GNU Zip compressed data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http
On Fri, Jul 8, 2011 at 6:18 PM, Tom Lane wrote:
> For test purposes, you could turn off synchronize_seqscans to prevent
> that.
Thanks, it helps. I'm rerunning tests now.
--
With best regards,
Alexander Korotkov.
tab1. But actually, it isn't always so. In aggregate with only few used
test cases it can cause significant error.
I'm going to make use some more thought-out testing method. Probably, some
more precise index quality measure exists (even for R-tree).
--
With best regards,
Alexander Korotkov.
much (in my expectation decrease of buffer size should make
all build parameters closer to regular build). I'm going to recheck my
experiments, probably I'm missing something.
--
With best regards,
Alexander Korotkov.
gist_fast_build-0.5.0.patch.gz
Description: GNU Zip compressed dat
uld allow me to undertand tradeoffs better.
--
With best regards,
Alexander Korotkov.
gist_fast_build-0.4.0.patch.gz
Description: GNU Zip compressed data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
situation
> which I can't induce at will.
>
Yes, it also seems pretty hard to get this code section executed for me. I'm
going to ask Teodor and Oleg about it.
--
With best regards,
Alexander Korotkov.
New version of patch. Bug which caused falldown on trees with high number of
levels was fixed. Also some more comments and refactoring.
--
With best regards,
Alexander Korotkov.
gist_fast_build-0.3.0.patch.gz
Description: GNU Zip compressed data
--
Sent via pgsql-hackers mailing list
t and fastbuild insert.
--
With best regards,
Alexander Korotkov.
On Mon, Jun 27, 2011 at 10:32 PM, Alexander Korotkov
wrote:
> I didn't have an estimate yet, but I'm working on it.
>
Now, it seems that I have an estimate.
N - total number of itups
B - avg. number of itups in page
H - height of tree
K - avg. number of itups fitting in node buf
depend
on concurrent load. When there are no concurrent load (as in my experiments)
fraction of tree which fits to effective cache is reasonable for estimating
benefit of IO economy. But with high concurrent load part of cache occupied
by tree should be considerable smaller than whole effective cache.
--
With best regards,
Alexander Korotkov.
of this dataset. In this case
results was similar to random generated data.
--
With best regards,
Alexander Korotkov.
#x27;t give much benefit. But anyway, the value of this
work can be in producing better index in some cases and SSD drive lifetime
economy due to less IO operations.
--
With best regards,
Alexander Korotkov.
On Fri, Jun 24, 2011 at 12:40 PM, Heikki Linnakangas <
heikki.linnakan...@enterprisedb.com> wrote:
> On 21.06.2011 13:08, Alexander Korotkov wrote:
>
>> I've created section about testing in project wiki page:
>> http://wiki.postgresql.org/**wiki/Fast_GiS
?
>
I believe it's an architecture problem. You actually need not '->' operator
to be supported by GiST but "column -> 'field_name' = value" expression.
Probably, I'm missing something, but I think supporting of this require
significant catalog changes.
--
With best regards,
Alexander Korotkov.
relatively expensive penalty
method in that opclass. But, probably index build can be still faster when
index doesn't fit cache even for gist_trgm_ops. Also with that opclass index
quality is slightly worse but the difference is not dramatic.
--
With best regards,
Alexander Korotkov.
-
With best regards,
Alexander Korotkov.
On Thu, Jun 16, 2011 at 10:35 PM, Alexander Korotkov
wrote:
> Oh, actually it's so easy. Thanks.
>
> --
> With best regards,
> Alexander Korotkov.
>
> On Thu, Jun 16, 2011 at 10:26 PM, Heikki Linnakangas <
> heikki.li
Oh, actually it's so easy. Thanks.
--
With best regards,
Alexander Korotkov.
On Thu, Jun 16, 2011 at 10:26 PM, Heikki Linnakangas <
heikki.linnakan...@enterprisedb.com> wrote:
> On 16.06.2011 21:13, Alexander Korotkov wrote:
>
>> My current idea is to measure n
My current idea is to measure number of IO accesses by pg_stat_statements
and measure CPU usage by /proc/PID/stat. Any thoughts?
--
With best regards,
Alexander Korotkov.
On Thu, Jun 16, 2011 at 1:33 PM, Alexander Korotkov wrote:
> Actually, I would like to measure CPU and IO l
Actually, I would like to measure CPU and IO load independently for more
comprehensive benchmarks. Can you advice me some appropriate tools for it?
--
With best regards,
Alexander Korotkov.
don't know now to
do that on Linux.
--
With best regards,
Alexander Korotkov.
On Wed, Jun 15, 2011 at 11:21 AM, Alexander Korotkov
wrote:
> I've tried index tuples sorting on penalty function before buffer
> relocation on split. But it was without any success. Index quality becomes
> even worse than without sorting.
> The next thing I've tried is buf
16 rows=855 loops=1)
Index Cond: (v <@ '(0.503,0.503),(0.5,0.5)'::box)
Buffers: shared hit=240
Total runtime: 3.043 ms
(7 rows)
--
With best regards,
Alexander Korotkov.
gist_fast_build-0.1.0.patch.gz
Description: GNU Zip compressed data
--
Version of patch with few more comments and some fixes.
--
With best regards,
Alexander Korotkov.
arrayanalyze-0.4.patch.gz
Description: GNU Zip compressed data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http
/
> number-of-elements-in-const) and consider at most that many MCVs.
> When this limit kicks in you'll get a less-accurate selectivity
> estimate, but that's a reasonable price to pay for not blowing out
> planning time.
Good option. I'm going to add such condition to my patch.
--
With best regards,
Alexander Korotkov.
be clarified more, please, specify them.
> And of course, a few lines for the docs also.
>
I found that in statistics patch for tsvector only article about pg_stats
view was corrected. I've corrected this article a little bit too.
--
With best regards,
Alexander Korotkov.
ar
On Mon, Jun 6, 2011 at 4:14 PM, Alexander Korotkov wrote:
> If I succeed then it will invoke even more such calls.
>
I meant here that if I succeed in enhancements which improve index quality
then fast build algorithm will invoke even more such calls.
--
With best regards,
Alexander Korotkov.
now gist fast
build algorithm invokes more penalty calls than repeatable insert algorithm.
If I succeed then it will invoke even more such calls. So, if penalty
function is very slow then gist fast build will be slover then
repeatable insert.
--
With best regards,
Alexander Korotkov.
the wiki page, just
> to see what kind of workloads have been taken into consideration and tested
> already. Do you think there's some worst-case data distributions where this
> algorithm would perform particularly badly?
I don't expect some bad cases in terms in IO. My most worrying is about
index quality which is strongly related to data distribution.
--
With best regards,
Alexander Korotkov.
-
With best regards,
Alexander Korotkov.
gist_fast_build-0.0.3.patch.gz
Description: GNU Zip compressed data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
m each time entry with maximal
difference of inserion cost is inserted. Quadratic algorithm runs slowly
than sorting one, but on my tests it shows slightly better results.
--
With best regards,
Alexander Korotkov.
adratic one. Picksplit will take
more time, but it should give more stable and predictable result.
3) I had some experiments with my own picksplit algorithm, which showed
pretty good results on tests which I've run. But current implementation is
dirty and it's require more testing.
---
(A,B) = 3
penalty(B,A) = 0
--
With best regards,
Alexander Korotkov.
48 bits virtual
power management:
------
With best regards,
Alexander Korotkov.
*** a/doc/src/sgml/gist.sgml
--- b/doc/src/sgml/gist.sgml
***
*** 377,383 my_decompress(PG_FUNCTION_ARGS)
Returns a value indicating the cost of inserting the new
entry into a particular
(point <@ '(0.505,0.505),(0.5,0.5)'::box)
Buffers: shared hit=44
-> Bitmap Index Scan on test_idx (cost=0.00..44.10 rows=1000 width=0)
(actual time=0.966..0.966 rows=24 loops=1)
Index Cond: (point <@ '(0.505,0.505),(0.5,0.5)'::box)
Buffers: sh
ifies existing code a bit.
Heikki advice me that since this change simplifies existing code it can be
considered as a separate patch.
--
With best regards,
Alexander Korotkov.
gist_childoffnum.path
Description: Binary data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.or
I forgot to commit before diff. Here is correct version.
--
With best regards,
Alexander Korotkov.
arrayanalyze-0.2.1.patch.gz
Description: GNU Zip compressed data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http
x27;m warrying that there could be too
expensibe calculations for estimation. Also, likely current comments don't
clearify anything...
--
With best regards,
Alexander Korotkov.
arrayanalyze-0.2.patch.gz
Description: GNU Zip compressed data
--
Sent via pgsql-hackers mailing list (p
After root split path will be 0-new_page-234.
>
> If algorithm could be able to change root then new path could be looked as
> new_root-new_page-234 because old root could be splitted to old_root_page
> and new_page.
Ok. Thank you for explanation.
With best regards,
Alexander Korotkov.
* rare situation when root splited
> */
>break;
As I understood it's because we can't move root to another page.
With best regards,
Alexander Korotkov.
?
With best regards,
Alexander Korotkov.
ldn't be
many inserts into such pages in future then they will be stay bloat.
With best regards,
Alexander Korotkov.
hreshold achived array is converted to bitmap). If this problem is urgent,
I can write a patch with opclass that would seem more suitable to be default
to me, when I'll have a time for it.
With best regards,
Alexander Korotkov.
like this?
What opclass is used for GiST index: gist__int_ops or gist__intbig_ops?
Do you take into account that gist__int_ops is very inefficient for large
datasets?
With best regards,
Alexander Korotkov.
On Tue, Apr 26, 2011 at 1:10 PM, Alexander Korotkov wrote:
> Since algorithm is focused to reduce I/O, we should expect best
> acceleration in the case when index doesn't fitting to memory. Size of
> buffers is comparable to size of whole index. It means that if we can hold
>
y of R-tree bulk loading algorithms in the literature, it should be
> possible to adapt some of them to GiST.
>
> [1] http://dx.doi.org/10.1007/s00453-001-0107-6
Yes, these priority seems very reasonable. We should have first
effectiveness confirmation as soon as possible. I'll hold on this priority.
With best regards,
Alexander Korotkov.
an use it in initial calculations. I'm going to hold on this
assumption in first implementation.
With best regards,
Alexander Korotkov.
unction.
With best regards,
Alexander Korotkov.
g and 3-grams 'aaa' and 'aab' from
second string (for simplicity, there is no padding here). GIN index will
contain structure, which can be represented so:
'aaa' => 1, 2
'aab' => 2
We can see, that there are 2 unique 3-grams, but 3 links to the rows.
With best regards,
Alexander Korotkov.
(about 120 millions of
links), while size of q-grams itself will be almost ignorable in comparison
with it.
With best regards,
Alexander Korotkov.
ins btree on gin keys, i.e. q-grams), while data pages
number (which contains links to rows in lists or btrees) will be similar. In
dependence on data volume index size can be 10x larger (on small datasets)
or few percents larger (on large datasets).
With best regards,
Alexander Korotkov.
On Mon, Apr 4, 2011 at 9:01 PM, Robert Haas wrote:
> On Mon, Apr 4, 2011 at 12:38 PM, Alexander Korotkov
> wrote:
> > relatively small when q <= 5. Accordingly, I think we should expect
> indexes
> > to be usable with at least with q = 5.
>
> I defer to your opinio
On Mon, Apr 4, 2011 at 7:04 PM, Robert Haas wrote:
> On Mon, Apr 4, 2011 at 7:16 AM, Alexander Korotkov
> wrote:
> > Project name
> > Fast GiST index build
>
> Would/could/should this be implemented in a manner similar to the
> existing "GIN fast update" fe
On Mon, Apr 4, 2011 at 6:56 PM, Robert Haas wrote:
> On Mon, Apr 4, 2011 at 7:35 AM, Alexander Korotkov
> wrote:
> > I would like to propose a q-gram module which would have following
> > differences in comparison with pg_trgm:
> > 1) Focus on acceleration of edit di
vastava. Approximate String Joins in a Database
(Almost) for Free. VLDB '01 Proceedings of the 27th International Conference
on Very Large Data Bases
With best regards,
Alexander Korotkov.
ers
from the International Workshop on Algorithm Engineering and Experimentation
With best regards,
Alexander Korotkov.
nt on
> such stuff. (It sounds reasonable to me, but I wouldn't know if there
> are problems in the idea.) They may be too busy right at the moment.
>
Thank you for reply. I'm going to ask Oleg and Teodor for their feedback.
With best regards,
Alexander Korotkov.
On Fri, Mar 25, 2011 at 8:32 PM, Alexander Korotkov
wrote:
> I would like to ask you about currency of the work above.
This seems to be a mess of words. Sorry for my bad english. Actually,
I meant that I need a appraisal of my proposal.
With best regards,
Alexander Korotkov.
--
Sent
, where do you like to see it: separate project, contrib module,
core (of course, in the case when code have sufficient quality)?
I have stong confidence level about implementability of this project
in few month. That's why I could propose this as an GSoC project.
With best regards,
Alex
ect related to indexing.
With best regards,
Alexander Korotkov.
On Tue, Mar 8, 2011 at 9:44 AM, Selena Deckelmann wrote:
> Hi!
>
> PostgreSQL is applying for GSoC again this year. We're looking for:
>
> * Mentors
> * Project ideas
>
> Would you like to mentor?
gm indexes
is not mentioned here.
With best regards,
Alexander Korotkov.
ume
something about data distribution in tsvector. But we don't know almost
nothing about data in arrays because it can hold any data (tsvector also can
holds any data, but it using for non nutural language data is out of
purpose).
--
With best regards,
Alexander Korotkov.
arrayanalyze-0
901 - 1000 of 1051 matches
Mail list logo