Re: [HACKERS] Schedule for 8.1 feature freeze

2005-06-21 Thread Victor Y. Yegorov
I'll do my best to submit bitmap index AM patch next week for your review.


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 8: explain analyze is your friend


[HACKERS] make distclean keeps some files

2005-06-20 Thread Victor Y. Yegorov
Playing with diffs, I've noticed, that after `make distclean` command some
files are still present in the source tree. They're were not there before
./configure  make.

They are:
src/backend/bootstrap/bootparse.c
src/backend/bootstrap/bootscanner.c
src/backend/bootstrap/bootstrap_tokens.h
src/backend/parser/gram.c
src/backend/parser/parse.h
src/backend/parser/scan.c
src/backend/utils/misc/guc-file.c
src/bin/psql/psqlscan.c
src/bin/psql/sql_help.h
src/interfaces/ecpg/preproc/pgc.c
src/interfaces/ecpg/preproc/preproc.c
src/interfaces/ecpg/preproc/preproc.h
src/interfaces/libpq/blibpqdll.def
src/interfaces/libpq/libpqddll.def
src/interfaces/libpq/libpqdll.def
src/interfaces/libpq/libpq.rc
src/pl/plpgsql/src/pl_gram.c
src/pl/plpgsql/src/pl_scan.c
src/pl/plpgsql/src/pl.tab.h

Are they kept intentionally?


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] adding new pages bulky way

2005-06-07 Thread Victor Y. Yegorov
* Tom Lane [EMAIL PROTECTED] [07.06.2005 07:59]:
 Why bother?  Just write each page when you need to --- there's no law
 that says you must use P_NEW.

This means 2 things:
1) I cannot mix P_NEW and exact-number ReadBuffer() calls;
2) thus, I have to track next-block-number myself.

Is it so?


BTW, are there any differences in buffer seeking speed, if buffer
block-numbers are mixed and if they're not (i.e. P_NEW is used)?


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] adding new pages bulky way

2005-06-07 Thread Victor Y. Yegorov
* Alvaro Herrera [EMAIL PROTECTED] [08.06.2005 00:39]:
 Huh, why?  You need to grab the relation extension block
 (LockRelationForExtension in CVS tip).

Really? Didn't knew that. 

Consider:
1) I add 2 pages to the newly-created relation
   using P_NEW as BlockNumber;
2) then I do LockRelationForExtension; ReadBuffer(135) and
   UnockRelationForExtension.

What BlockNumber will be assigned to the buffer, if I call
ReadBuffer(P_NEW) now? 136?


BTW, is it OK to say BlockNumber is assigned to buffer?


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[HACKERS] adding new pages bulky way

2005-06-06 Thread Victor Y. Yegorov
I need your advice.

For on-disk bitmap I run a list of TIDs.

TIDs are stored in pages as an array, page's opaque data holds an array of
bits, indicating whether corresponding TID has been deleted and should be
skipped during the scan.

Pages, that contain TIDs list, are organized in extents, each extent has 2^N
pages, where N is extent's number (i.e. 2nd extent will occupy 4 pages).
Given that I know number of TIDs, that fit into one page, and the TID's
sequential number, I can easily calculate:
-  extent number TID belongs to;
-  page offset inside that extent, and;
-  TID place in the page.

At the moment, I store BlockNumber of the extent's first page in the
metapage and allocate all pages that belongs to that extent sequentially. I
need to do so to minimize number of page reads when searching for the TID in
the list; I'll need to read 1 page at most to find out TID at given position
during the scan. I hope you understood the idea.

This also means, that while extent's pages are being added this way, no other
pages can be added to the index. And the higher is extent's number, the more
time it'll take to allocate all pages.

The question is: allocating pages this way is really ugly, I understand. Is
there some API that would allow allocating N pages in the bulk way?
Maybe this is a know problem, that has been already solved before?
Any other ideas?


Thanks in advance!


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[HACKERS] Best way to scan on-disk bitmaps

2005-05-12 Thread Victor Y. Yegorov
Greetings.

I have questions on how to implement on-disk bitmap scan.


I've been working on on-disk bitmaps as an ordinary Index Access Method, but
now it's clear to me, that it'll lose all it's strengths this way. One on-disk
bitmap has exactly one list of indexed table's TIDs and potentially unlimited
number of bitmaps (number of index attributes multiplied by attribute's
cardinality, to be precise).

So, for better performance, one should first retrieve all the needed bitmaps
from the index, then join all bitmaps according to the query clauses and, as
the last phase, retrieve TIDs from index, that matches final bitmap.

According to the docs the index access method is responsible for
regurgitating the TIDs ..., but for on-disk bitmaps index scan is devided
into 3 phases. So, to perform the scan in phases, to my mind, executor
should be involved. (I'd like to mention again, that this is the first time
I got so deep inside postgres code).

I wanted to use Tom's nodeBitmap* stuff, but it's problematic a bit. Bitmaps
in nodeBitmap* are built upon list of TIDs retrieved during relation scan.
For on-disk bitmap indexes, there's no need for that, as all bitmaps are
already inside the index.

The question is: Is it possible to extend nodeBitmap functionality in such a
way, that Executor can either build bitmap after list of TIDs, obtained from
RelationScan, or ask index access method to give bitmaps it contain (and TIDs
at given position in the map later)?
This will, probably, require more functions in the pg_am catalog.

Or should I create a completely new node for on-disk bitmaps?


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Best way to scan on-disk bitmaps

2005-05-12 Thread Victor Y. Yegorov
* Tom Lane [EMAIL PROTECTED] [12.05.2005 23:09]:
 1. Be sure that all the indexable WHERE conditions are passed to the
indexscan as indexquals.  This might be, say,
   WHERE a = 42 and b = 'foo'

If I have on-disk bitmap
ON (a, b, c)
will the planner pick an index scan for
WHERE a = 42 AND b = 'foo'
(i.e. only part of the index attributes are involved)? Any modifications
needed to achieve this functionality?

To my mind, bitmap scan even for 1 attribute of a multi-column index would be
a win, though I haven't tested this yet.


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Comparing Datum's at aminsert() stage

2005-04-19 Thread Victor Y. Yegorov
* Tom Lane [EMAIL PROTECTED] [19.04.2005 19:48]:
 That's probably what you *have* to use, since the normal deconstructors
 assume they are working with heap tuples, which are different.  But I
 don't understand why you are waiting till after the index tuple is
 formed.  The aminsert function gets an array of Datums to start with.
 Why not do it there?

Well, I need that exactly in aminsert.

Each value is stored only once in the index (along with it's own
series-of-bits). Thus, I need to compare each Datum from aminsert()'s
array with the existing ones.

Also, I cannot form tuple the ordinary way (I need all values separated),
so I copy each TuplDesc-attrs[i] into temporary TupleDesc (1 attribute big)
and call heap_fill_tuple(). Actually, I'm not sure this is the right way...

I think, storing some kind of hash-value from the Datum is a good idea, but
it's need to be unique. Is it possible with any existing API?


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[HACKERS] Comparing Datum's at aminsert() stage

2005-04-18 Thread Victor Y. Yegorov
Hi!

For bitmaps, I need to search each value being inserted (for each column of
the index) in the list of already existing values (stored in index's header
area). To do that I need:

1) create Datum from PageItem's value (I store each value in it's own
   PageItem);
2) compare newly inserted Datum with on-disk existing one.

For hash access method (maybe others too, haven't checked), this is done
via index_keytest() function. But it uses ScanKey, there's no such
at aminsert() stage.


So, I'd like to ask -- what is the reverse function for heap_fill_tuple(),
is it OK to use index_getattr()?
And how do I compare 2 Datums? I need FmgrInfo pointer for the equality
operator of the corresponding data type. Are there any API calls to obtain one?

Thank you.


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] bitmap AM design

2005-03-04 Thread Victor Y. Yegorov
Some more thoughts and questions.

The main idea above bitmaps is narrowing some data sets' possible values to
yes and no (i.e. 1 and 0) and then organize the data in the series of
bits, where bit's position determines values to consider. In the cases, where
several indexed attributes are used in WHERE clause, it's possible to do
logical AND/OR on bitmaps before returning any results to the caller. For
large tables with high number of low-cardinality attributes using bitmaps can
result in certain speed-up.

For on-disk bitmaps I'm working on, each value of each indexed attribute has
it's own bitmap (i.e. series of bits, with bits set to 1 for rows with
corresponding fields having value of that bitmap). Scanning the bitmap, we end
up with an array of 1-bits' positions and need to convert those positions to
CTIDs, as executor is expecting. So, index should also keep a CTID table, that
corresponds to the bitmap's data. This CTID table will be the same for all
bitmap indexes, created for one table, thus having 2 bitmap indexes will mean
you're wasting some amount of disk space, storing absolutely identical data.
So, to save space, we have 2 possibilities:
1) create a new relkind for the CTID table (maybe used not only for on-disk
   bitmaps);
2) make all create index ... using bitmap statements actually create/extend
   existing bitmap index. This also implies, that planner/executor should
   try using multicolumn bitmap index when at least one indexed field is
   present in the WHERE clause.

I'm working on the 2nd case, because 1st one requires more work not only in
the access method + executor + planner area. It is also possible to keep
things as is and make a note in the documentation, that it is better to have 1
multicolumn bitmap index, then several single column ones, and that planner
will still use multicolumn index even if not all columns are involved.

Any comments/ideas here?


After implementing bitmap index access method, it'll be necessary to teach
planner and executor to use multicolumn bitmaps for any number of
scan-attributes. Also, I cannot say in what circumstances planner should
prefer bitmap scan to seqscan; I thought of cases, when it estimates return
set being about 60% of the relation. What community has to say here?


Also, as Tom is planning to work on in-memory bitmaps (maybe something is
done already, don't know), I thought that it can be possible to cooperate.
As I have no clue at the moment how in-memory bitmaps are going to work, is it
possible to hear from you some draft of the forthcoming implementation?
And what prerequisites would be required to join both bitmaps somehow?

Waiting for your thoughts/comments.


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] bitmap AM design

2005-03-01 Thread Victor Y. Yegorov
* Tom Lane [EMAIL PROTECTED] [01.03.2005 09:37]:
 The other stuff you mentioned implies that an insertion therefore
 requires exclusive lock on the index (because you may have to relocate
 everything in sight to add one more CTID slot).

No, exclusive lock on index is worst thing to do.

All lists (list of ctids, bitmaps) will only grow, no data will be deleted, as
deletes will require relocation and possibly exclusive lock on the index.

Extending lists will need only a short-term exclusive locks on the pages in
the tails of each list.


 I can't believe you are seriously suggesting that it's OK to force every
 VACUUM to rebuild the index from scratch.  We already get far too many
 complaints about the time needed for VACUUM.
 
 I don't think we really need any more fundamentally nonconcurrent index
 types :-(

Well, I misunderstood the purpose of ambulkdelete function, my fault.

Of course, no index rebuild will take place, instead, only flags in the
list of CTIDs will be updated for deleted tuples.


I have counter question for you: you've mentioned, that you want bitmaps in
the 8.1. What kind of bitmaps you were speaking about? On-disk bitmaps (this
is how I call new index access method I'm working on) or in-memory bitmaps,
as in here http://archives.postgresql.org/pgsql-hackers/2005-01/msg01001.php


Thanks for your reply.


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] bitmap AM design

2005-03-01 Thread Victor Y. Yegorov
* Tom Lane [EMAIL PROTECTED] [01.03.2005 21:07]:
 Hmm, you seem to be envisioning that these are actually lists, not
 arrays --- that is, to find the N'th page in a list requires traversing
 list links starting at the first page.  That doesn't sound workable.
 If you can't access all the entries in roughly constant time then the
 thing is going to have problems with big tables.

Bitmaps are arays, you're right. But they are only accessed either when data
is inserted and gets added to the end (there're direct pointers to the last
page in each bitmap in the list of values), or during index scan. Scanning
index implies visiting all pages that forms the bitmap.

After scanning the bitmaps (and performing all logical operations as
requested), we end up with bit positions and need to return CTID from the
list, that resides in the given position in the list-of-CTIDs (yes, it's
actually one huge array, that occupies many pages).

List-of-CTIDs is organized into extents, each extent contains a known number
of pages and all pages for the extent are allocated sequentially. ID of the
first page of each extent is stored in the metapage. Also, it is known at
compile time how many CTIDs can be stored into one page. Given all that, it is
possible to compute ID of the page and CTID offset inside that page by CTID
offset, obtained at bitmap scan step.
Each extent has 2,4,8,16,32,etc. pages, so the metapage has enough space to
store an array of ID's for the first page of each extent.

Updating list-of-CTIDs is also cheap operation, as direct reference to the
last page in the list-of-CTIDs is stored in metapage.

The only list, that have drawbacks you've mentioned, is list-of-values. But,
as it contains only attributes' related data and pointers to start page of
corresponding bitmap, there won't be many pages in this list.


Maybe, there are some more insufficiencies I've missed?


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] bitmap AM design

2005-02-28 Thread Victor Y. Yegorov
Here's the design of bitmap AM I'm planning to implement. I've discussed it
with Neil, but I'm willing to get more feedback on it.


There are going to be 1 metapage, 1 list of CTIDs (LOC), one list
of attribute values (LOV, including attributes for multi-column indexes) and a
bitmap for each entry in the LOV. Metapage will contain pointers to the LOC,
LOV will always start at page-1 and is organized as a 1-way chained list.

Neil suggested a very good way how to handle updates. Of course, it's not
necessary to strictly follow tuple layout in the table's heap, as I wanted
to do initially. All that's needed, is that bit positions in bitmaps would
be tied with CTID positions in LOC.
So, we can do simple insert (generally, that's how MVCC works for tuple
updates): when some tuple is updated, new CTID is added to the LOC and each
bitmap is extended by 1 bit.
On the other side (as Neil pointed), after VACUUM, we need to do some
maintenance of bitmap index to avoid situations when index contains duplicate
entries (in LOC and bitmaps) for the same CTID (value before marking tuple for
reuse and after actually reusing it). So far, the fastest way would be
recreating index, because the whole index designed for faster search/insert
operations and lacks effective reverse mapping (CTID-bit position-bit value)
functionality.

List of CTIDs is organized into an array of extents. Each extent has 2**i
pages ('i' is extent number in array). All pages for extent are allocated at
once, ensuring their IDs are sequential. So, we need to store only
BufferNumber of the first page in extent. LOC pages contain array of
ItemPointerData, it's size is detected at compile time. So, CTID for given bit
position can be obtained via only one ReadBuffer() call.

LOV pages store arrays of value-descriptors, each descriptor has a pointer to
the head of value's bitmap. Bitmaps are organized as 1-way chained list,
bitmap contents is compressed using Word-Aligned Hybryd method (similar to
RLE).
Neil suggested the other way of organizing bitmap storage: all bits for
given position are stored in one page (if it lacks space, new page is added
at the bootom), so we'll have a 2-dimensional bitmap storage.
This reduces amount of page reads during index scan, but for low-cardinality
indexes, we'll waste a lot of space per page, if each CTIDs slice is stored in
one page. On the other hand, it'll be hard to extend bitmap if we'll store
several CTID slices per page and some new value will be inserted (i.e. CTID
slice needs to be increased).

At the moment, I would implement the easiest method -- equality encoding (as
it's called in papers, bitmap per value). Neil's suggested techniques are
called range encoding. Josh Berkus on the irc suggested implementing several
encoding schemes as an option to the create index sql command.
What do you think?

I haven't looked at planner/executor yet.


If you have some questions, please, ask. Also, I'd like to tell that this is
my first time coding for PostgreSQL, thus I may use incorrect terminology.
Also, it takes much time to write anything, for the same  reason. And yes,
I would really appreciate all kind of criticism on this design.

I've started to implement AM, I need to register am* functions, what OIDs
should use to register them in include/catalog/pg_proc.h?


Waiting for your feedback.


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[HACKERS] Implementing Bitmap Indexes

2005-01-29 Thread Victor Y. Yegorov
Hello.
I'd like to implement bitmap indexes and want your comments. Here is
an essence of what I've found regarding bitmaps for the last month.
Consider the following table   So, the bitmap for attribute A will be the
with 1 attribute A(int2):  following:
 # | A Val | Bitmap(s)
+---   -+---
 1 | 1   1 | 11011001 0111
 2 | 1   2 | 00100100 1000
 3 | 2   3 | 0010 
 4 | 1
 5 | 1
 6 | 2
 7 | 3
 8 | 1
 9 | 2
10 | 1
11 | 1
12 | 1
Some points:
1) If some new value will be inserted (say, 4) at some point of time, a new
  bitmap for it will be added. Same for NULLs (if atrribute has no NOT NULL
  contraint) --- one more bitmap. Or should we restrict NOT NULL for
  bitmap'ed attributes?;
2) Queries, like where A = 1 or where A != 2 will require only 1 scan of
  the index, while where A  3 will require 2 stages: 1st create a 
list of
  values lesser then 3, 2nd --- do OR of all bitmaps for that values.
  For high cardinality attributes, this can take a lot of time;

3) Each bitmap is only a bitmap, so there should be an array of 
corresponding
  ctids pointers. Maybe, some more arrays (pages, don't know).

For 2)nd --- there are techniques, allowing better performance for A  3
queries via increased storage space (see here for details:
http://delab.csd.auth.gr/papers/ADBIS03mmnm.pdf) and increased reaction time
for simple queries. I don't know, if they should be implemented, may later.
The most tricky part will be combinig multiple index scans on several
attributes --- as Neil Conway said on #postrgesql, this will be tricky, 
as some
modifications will be needed in the index scan api. I remember, Tom Lane
suggested on-disk bitmaps --- implementing bitmap index access method
would be of much use not only for bitmap indexes, I think.

WAH compressing method should be used for bitmaps (to my mind). Also, 
there is
a method of reordering heap tuples for better compression of bitmaps, I 
thought
it may be possible to implement it as some option to the existing CLUSTER
command, papers:
WAH: http://www-library.lbl.gov/docs/LBNL/496/26/PDF/LBNL-49626.pdf
CLUSTER: http://www.cse.ohio-state.edu/~hakan/publications/reordering.pdf

I'd like to hear from you, before starting to do something.
--
Victor
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Implementing Bitmap Indexes

2005-01-29 Thread Victor Y. Yegorov
* Tom Lane [EMAIL PROTECTED] [29.01.2005 18:24]:
 Victor Y. Yegorov [EMAIL PROTECTED] writes:
  I remember, Tom Lane suggested on-disk bitmaps
 
 I have suggested no such thing, and in fact believe that the sort of
 index structure you are proposing would be of very little use.

Why? I thought they would be useful for data warehouse databases.

Maybe I said something the wrong way, but what I'm trying to implement
is exactly what is said about in the first link you've posted below:
http://archives.postgresql.org/pgsql-hackers/2004-10/msg00439.php

Or am I misunderstanding the point?


 What I've been hoping to look into is *in memory* bitmaps used as an
 interface between index scans and the subsequent heap lookups.

Sorry, that was what I've been speaking of.

Anyway, bitmap indexes API could be used for in-memory bitmaps you're speaking
of.


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Implementing Bitmap Indexes

2005-01-29 Thread Victor Y. Yegorov
* Pawe Niewiadomski [EMAIL PROTECTED] [29.01.2005 17:45]:
  I'd like to implement bitmap indexes and want your comments. Here is
  an essence of what I've found regarding bitmaps for the last month.
 
 Do you think it would be possible to work on it as a team?

Yes, why not.

But everything depends on the community, may bitmaps will be realized as a
contrib or pgfoundry module. The only thing --- I don't know, if that is
possible for indexes.


-- 

Victor Y. Yegorov

---(end of broadcast)---
TIP 8: explain analyze is your friend