Re: Read data from Postgres table pages

2024-03-19 Thread Sushrut Shivaswamy
lol, thanks for the inputs Alexander :)!


Re: Read data from Postgres table pages

2024-03-19 Thread Alexander Korotkov
On Tue, Mar 19, 2024 at 4:48 PM Sushrut Shivaswamy
 wrote:
>
> If we query the DB directly, is it possible to know which new rows have been 
> added since the last query?
> Is there a change pump that can be latched onto?

Please check this:
https://www.postgresql.org/docs/current/logicaldecoding.html
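For the "which rows changed since the last query" question, logical decoding exposes committed changes through a replication slot. The sketch below is a minimal example. It assumes the `test_decoding` example plugin, which is a demonstration plugin rather than a stable interchange format; real consumers typically use `pgoutput` or another output plugin. The plugin's text output looks like `table public.data: INSERT: id[integer]:1 data[text]:'1'`, and the sketch parses each such change line into a dict. The function names are hypothetical. The sketch does not handle `old-key:`/`new-tuple:` sections, `(no-tuple-data)`, quoted column names, or unchanged TOAST values.

```python
import re
from typing import Optional

# "table <schema>.<table>: <ACTION>: <column list>" as printed by test_decoding.
_CHANGE_RE = re.compile(
    r"^table (?P<rel>[^:]+): (?P<action>INSERT|UPDATE|DELETE):\s?(?P<cols>.*)$")
# "<name>[<type>]:" prefix of each column; quoted column names are not handled.
_COL_RE = re.compile(r"(?P<name>[^\[\s]+)\[(?P<type>[^\]]+)\]:")


def parse_columns(text: str) -> dict[str, Optional[str]]:
    """Parse 'name[type]:value' pairs; an unquoted null becomes None."""
    cols: dict[str, Optional[str]] = {}
    i = 0
    while i < len(text):
        m = _COL_RE.match(text, i)
        if m is None:
            raise ValueError(f"cannot parse column text: {text[i:]!r}")
        i = m.end()
        if text.startswith("'", i):
            # Quoted literal: '' stands for one embedded single quote.
            j, buf = i + 1, []
            while True:
                if j >= len(text):
                    raise ValueError("unterminated quoted value")
                if text[j] == "'":
                    if text.startswith("''", j):
                        buf.append("'")
                        j += 2
                        continue
                    break
                buf.append(text[j])
                j += 1
            value = "".join(buf)
            i = j + 1
        else:
            # Unquoted value (numbers, booleans, null) runs to the next space.
            j = text.find(" ", i)
            j = len(text) if j == -1 else j
            raw = text[i:j]
            value = None if raw == "null" else raw
            i = j
        cols[m.group("name")] = value
        while i < len(text) and text[i] == " ":
            i += 1
    return cols


def parse_change(line: str) -> Optional[dict]:
    """Return {'relation', 'action', 'columns'} for a change line, None otherwise."""
    m = _CHANGE_RE.match(line)
    if m is None:
        return None  # BEGIN/COMMIT lines, or formats this sketch does not cover
    return {"relation": m.group("rel"),
            "action": m.group("action"),
            "columns": parse_columns(m.group("cols"))}
```

The input lines would come from something like `SELECT data FROM pg_logical_slot_get_changes('my_slot', NULL, NULL);`, run after `SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');` on a server with `wal_level = logical`. The name `my_slot` is a placeholder.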

> I’m assuming the page data structs are encapsulated in specific headers which 
> can be used to list / read pages.
> Why would Postgres need to be stopped to read the data? The read / query path 
> in Postgres would also be reading these pages when the instance is running?

I think this would be a good place to start studying.
https://www.interdb.jp/
The information there should be more than enough to forget this idea forever :)

--
Regards,
Alexander Korotkov




Re: Read data from Postgres table pages

2024-03-19 Thread Sushrut Shivaswamy
If we query the DB directly, is it possible to know which new rows have been 
added since the last query?
Is there a change pump that can be latched onto?

I’m assuming the page data structs are encapsulated in specific headers which 
can be used to list / read pages.
Why would Postgres need to be stopped to read the data? The read / query path 
in Postgres would also be reading these pages when the instance is running?



Re: Read data from Postgres table pages

2024-03-19 Thread Alexander Korotkov
On Tue, Mar 19, 2024 at 4:35 PM Sushrut Shivaswamy
 wrote:
> The binary I'm trying to create should automatically be able to read data
> from a postgres instance without users having to
> run commands for backup / pg_dump etc.
> Having access to the appropriate source headers would allow me to read the 
> data.

Please avoid top-posting.
https://en.wikipedia.org/wiki/Posting_style#Top-posting

If you're looking to have a separate binary, why can't your binary
just *connect* to the postgres database and query the data?  This is
what pg_dump does; you can just do the same directly.  pg_dump doesn't
access the raw data.
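As a sketch of the "just connect and query" route, the statement below is the kind of thing an export tool could send over an ordinary connection. `COPY ... TO STDOUT` streams a table as CSV without touching the data files. The helper names are hypothetical. Identifiers are always double-quoted, with embedded quotes doubled, which is safe as long as the caller passes names exactly as they are stored in the catalog.

```python
from typing import Optional, Sequence


def quote_ident(name: str) -> str:
    """Always double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def copy_table_sql(schema: str, table: str,
                   columns: Optional[Sequence[str]] = None) -> str:
    """Build a COPY ... TO STDOUT statement that exports a table as CSV with a header."""
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    if columns:
        target += " (" + ", ".join(quote_ident(c) for c in columns) + ")"
    return f"COPY {target} TO STDOUT WITH (FORMAT csv, HEADER true)"
```

With psycopg2, for example, `cursor.copy_expert(sql, fileobj)` streams the output into any writable file object, which could then be uploaded to S3. The choice of driver is an assumption; any client that supports the COPY protocol would do.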

Trying to read raw postgres data from a separate binary looks flat-out
wrong for your purposes.  First, you would have to replicate pretty
much all of the postgres internals inside it.  Second, you can read
consistent data only when postgres is stopped or has made no
modifications since the last checkpoint.
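The checkpoint point can be made concrete with LSNs. Every page header carries `pd_lsn`, the WAL position just past the last record that changed the page. `pg_controldata` reports the checkpoint's `Latest checkpoint's REDO location`. The minimal sketch below compares the two. Its helper names are hypothetical, and it assumes the textual `X/Y` hex LSN format. Note the asymmetry. A page LSN past the redo point shows that the on-disk image was modified after the checkpoint. The reverse check cannot prove a page is current, because the server may hold newer versions in shared buffers that have not been flushed yet.

```python
def parse_lsn(text: str) -> int:
    """Convert the textual 'X/Y' LSN form (two hex halves) to a 64-bit integer."""
    hi, lo = text.strip().split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(value: int) -> str:
    """Inverse of parse_lsn, using the uppercase-hex 'X/Y' style."""
    return f"{value >> 32:X}/{value & 0xFFFFFFFF:X}"


def redo_lsn_from_controldata(output: str) -> int:
    """Pick the checkpoint REDO location out of pg_controldata's text output."""
    prefix = "Latest checkpoint's REDO location:"
    for line in output.splitlines():
        if line.startswith(prefix):
            return parse_lsn(line[len(prefix):])
    raise ValueError("REDO location not found in pg_controldata output")


def page_changed_after_checkpoint(page_lsn: int, redo_lsn: int) -> bool:
    """True if the on-disk page was last WAL-logged after the checkpoint's redo point.

    A False result proves nothing: newer versions of the page may still be
    sitting in shared buffers and not yet written to disk.
    """
    return page_lsn > redo_lsn
```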

--
Regards,
Alexander Korotkov




Re: Read data from Postgres table pages

2024-03-19 Thread Sushrut Shivaswamy
The binary I'm trying to create should automatically be able to read data
from a postgres instance without users having to
run commands for backup / pg_dump etc.
Having access to the appropriate source headers would allow me to read the
data.



Re: Read data from Postgres table pages

2024-03-19 Thread Sushrut Shivaswamy
I'd like to read individual rows from the pages as they are updated and
stream them to a server to create a copy of the data.
The data will be rewritten to columnar format for analytics queries.
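The row-to-columnar rewrite does not depend on how the rows are obtained. The minimal sketch below pivots a batch of rows into one list per column, which is the shape columnar writers consume. It assumes rows arrive as Python dicts, for example from a parsed change stream or a `COPY` export, and its function name is hypothetical.

```python
from typing import Any, Iterable, Mapping, Optional


def rows_to_columns(rows: Iterable[Mapping[str, Any]],
                    column_names: list[str]) -> dict[str, list[Optional[Any]]]:
    """Pivot row-oriented records into one list per column.

    Columns missing from a row become None so every list stays the same length.
    """
    columns: dict[str, list[Optional[Any]]] = {name: [] for name in column_names}
    for row in rows:
        for name in column_names:
            columns[name].append(row.get(name))
    return columns
```

A library such as pyarrow could then turn that dict into a table with `pyarrow.Table.from_pydict()` and write it with `pyarrow.parquet.write_table()`. That choice is an assumption; the thread does not name a columnar format.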

On Tue, Mar 19, 2024 at 7:58 PM Alexander Korotkov 
wrote:

> Hi
>
> On Tue, Mar 19, 2024 at 4:23 PM Sushrut Shivaswamy
>  wrote:
> > I'm trying to build a postgres export tool that reads data from table
> pages and exports it to an S3 bucket. I'd like to avoid manual commands
> like pg_dump; I need access to the raw data.
> >
> > Can you please point me to the postgres source header / cc files that
> encapsulate this functionality?
> > - List all pages for a table
> > - Read a given page for a table
> >
> > Any pointers to the relevant source code would be appreciated.
>
> Why do you need to work on the source code level?
> Please check this about having a binary copy of the database at the
> filesystem level.
> https://www.postgresql.org/docs/current/backup-file.html
>
> --
> Regards,
> Alexander Korotkov
>


Re: Read data from Postgres table pages

2024-03-19 Thread Alexander Korotkov
Hi

On Tue, Mar 19, 2024 at 4:23 PM Sushrut Shivaswamy
 wrote:
> I'm trying to build a postgres export tool that reads data from table pages
> and exports it to an S3 bucket. I'd like to avoid manual commands like
> pg_dump; I need access to the raw data.
>
> Can you please point me to the postgres source header / cc files that 
> encapsulate this functionality?
> - List all pages for a table
> - Read a given page for a table
>
> Any pointers to the relevant source code would be appreciated.

Why do you need to work on the source code level?
Please check this about having a binary copy of the database at the
filesystem level.
https://www.postgresql.org/docs/current/backup-file.html

--
Regards,
Alexander Korotkov




Read data from Postgres table pages

2024-03-19 Thread Sushrut Shivaswamy
Hey,

I'm trying to build a postgres export tool that reads data from table pages
and exports it to an S3 bucket. I'd like to avoid manual commands like
pg_dump; I need access to the raw data.

Can you please point me to the postgres source header / cc files that
encapsulate this functionality?
- List all pages for a table
- Read a given page for a table

Any pointers to the relevant source code would be appreciated.
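For the two operations asked about, the on-disk mapping is simple even though interpreting the pages is not. The minimal sketch below locates and reads one page of a table's main fork. It assumes a default build (8 KB pages, 1 GB segment files) and a relation path as returned by `SELECT pg_relation_filepath('public.data');`, where the table name is a placeholder. The free space map and visibility map live in separate `_fsm` and `_vm` files and are ignored here. The helper names are hypothetical. As the replies point out, pages read this way from a running server are not guaranteed to be consistent.

```python
import os

BLCKSZ = 8192         # assumption: default block size
RELSEG_SIZE = 131072  # assumption: default blocks per segment file (1 GB at 8 KB pages)


def block_location(relpath: str, blkno: int) -> tuple[str, int]:
    """Map a block number to (segment file path, byte offset) for a main fork.

    Segment 0 is the bare relfilenode path; later segments get a ".N" suffix.
    """
    if blkno < 0:
        raise ValueError("block number must be non-negative")
    segno, within = divmod(blkno, RELSEG_SIZE)
    path = relpath if segno == 0 else f"{relpath}.{segno}"
    return path, within * BLCKSZ


def block_count(segment_sizes: list[int]) -> int:
    """Number of whole pages across all segment files, given their sizes in bytes."""
    return sum(size // BLCKSZ for size in segment_sizes)


def read_block(datadir: str, relpath: str, blkno: int) -> bytes:
    """Read one raw page; on a running server the result may be stale or torn."""
    path, offset = block_location(relpath, blkno)
    with open(os.path.join(datadir, path), "rb") as f:
        f.seek(offset)
        page = f.read(BLCKSZ)
    if len(page) != BLCKSZ:
        raise ValueError(f"short read at block {blkno}")
    return page
```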

Thanks,
Sushrut