Re: Code of Conduct plan

2018-06-03 Thread Berend Tober

Tom Lane wrote:

Two years ago, there was considerable discussion about creating a
Code of Conduct for the Postgres community...

We are now asking for a final round of community comments...


I really like that this was included: "Any allegations that prove not to be substantiated...will be 
viewed as a serious community offense and a violation of this Code of Conduct."


Good attempt to prevent the CoC being used as vindictive weaponry.

I also like that you kept it short.

-- B




Per-document document statistics

2018-05-22 Thread Berend Tober


I'm trying to figure out how to compile text search statistics on a per-document basis.
While I can successfully compute text search statistics for the entire corpus by calling
ts_stat after inserting all documents, what I also want is to run ts_stat on the
tsvector of each row, so as to get the term frequency per document.

Sample code and comments follow.


-- Dumped from database version 9.5.7

CREATE DATABASE nlp;

\connect nlp

CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;

SET search_path = public, pg_catalog;

-- This table stores one document per row

CREATE TABLE document (
    document_id serial primary key,
    content text NOT NULL,
    document_vector tsvector
);

-- This is the table I need help with how to to populate
-- with term frequency per document

CREATE TABLE document_statistics (
    document_id integer,
    word text,
    ndoc bigint,   /* this will be one, since there is only one document */
    nentry bigint, /* this is the number of interest */
    primary key (document_id, word) /* one row per word per document */
);


ALTER TABLE ONLY document_statistics
ADD CONSTRAINT document_statistics_document_id_fkey
FOREIGN KEY (document_id)
REFERENCES document(document_id);


CREATE FUNCTION document_bit() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    -- Compute the document's tsvector upon insert or update
    new.document_vector := to_tsvector('simple', new.content);
    RETURN new;
END;
$$;

CREATE TRIGGER document_bit
BEFORE INSERT OR UPDATE ON document
FOR EACH ROW EXECUTE PROCEDURE document_bit();


-- Sample data

INSERT INTO document (content) VALUES ('Hello World!');
INSERT INTO document (content) VALUES ('The quick brown dog jumped over the lazy dog');
INSERT INTO document (content) VALUES ('One flew over the coo coo''s nest');

-- Once all the individual documents are inserted, then
-- calculate overall corpus statistics
-- (the corpus_statistics table must match the ts_stat() result columns)

CREATE TABLE corpus_statistics (
    word text,
    ndoc integer,
    nentry integer
);

INSERT INTO corpus_statistics
SELECT * FROM ts_stat('SELECT document_vector FROM document');


-- I'm thinking something like this proposed after insert trigger
-- is where I want to compute document statistics, but can't
-- figure out how to make it work

CREATE FUNCTION document_ait() RETURNS trigger
LANGUAGE plpgsql
AS $$
DECLARE
word_stat record;
BEGIN
/*
--Here's one pathetic try

FOR word_stat IN
select * from ts_stat('select * from ' || (new.document_vector))
LOOP
RAISE NOTICE '%' , word_stat;
INSERT INTO public.document_statistics(
document_id, word, ndoc, nentry)
VALUES (new.document_id, word_stat.word, word_stat.ndoc, 
word_stat.nentry);
END LOOP;
*/
RETURN new;
END;
$$;

CREATE TRIGGER document_ait AFTER INSERT ON document
FOR EACH ROW EXECUTE PROCEDURE document_ait();
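Not part of the original thread, but one way to make the per-row call work on 9.5: ts_stat() takes a 
query string, so build one that simply returns this row's tsvector as a quoted literal. A sketch, 
assuming the schema above:

```sql
-- Hypothetical fix (not from the thread): embed the row's tsvector
-- in ts_stat()'s query string as a quoted literal, then insert the
-- resulting per-word counts for this document.
CREATE OR REPLACE FUNCTION document_ait() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO document_statistics (document_id, word, ndoc, nentry)
    SELECT new.document_id, s.word, s.ndoc, s.nentry
    FROM ts_stat('SELECT ' || quote_literal(new.document_vector)
                 || '::tsvector') AS s;
    RETURN new;
END;
$$;
```

With this body in place, the AFTER INSERT trigger defined above populates document_statistics for 
each new row. On PostgreSQL 9.6 and later, unnest() on a tsvector would be a simpler per-row 
alternative.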



Re: When use triggers?

2018-05-18 Thread Berend Tober

Michael Stephenson wrote:

On Wed, May 16, 2018 at 6:36 PM, Adrian Klaver wrote:

On 05/16/2018 03:19 PM, hmidi slim wrote:

HI,

I'm working on a microservice application and I avoid using triggers
because they will not be easy to maintain and need an experienced person in
database administration to manage them. ...

Two benefits triggers and their associated functions offer, with the
understanding these are general statements:

1) They are in the database, so tasks that always need to happen do not need to be
replicated across the various interfaces that touch the database.

2) Also, since they run on the server, the data does not necessarily cross a
network, so less network bandwidth is used and performance is better.

An example of a use case is table auditing. If you need to track what is
being INSERTed/UPDATEd/DELETEd in a table stick an audit trigger/function on
the table and push the information to an audit table.

As to managing, they are like any other code. I keep my schema code in
scripts under version control and deploy them from there. I use
Sqitch(http://sqitch.org/) which allows me to deploy and revert changes. I
use a dev database to test and troubleshoot triggers and functions.


The only appropriate use for triggers that I've ever found was for
auditing changes to tables. ...
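The audit pattern described above can be sketched roughly like this (table and column names are 
illustrative, not from the thread):

```sql
-- Hypothetical audit trigger: record every change to "account" in an
-- append-only history table, capturing the whole row as jsonb.
CREATE TABLE account_audit (
    audited_at timestamptz NOT NULL DEFAULT now(),
    operation  text        NOT NULL,
    row_data   jsonb       NOT NULL
);

CREATE FUNCTION account_audit_trg() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    IF tg_op = 'DELETE' THEN
        INSERT INTO account_audit (operation, row_data)
        VALUES (tg_op, to_jsonb(old));
        RETURN old;
    ELSE
        INSERT INTO account_audit (operation, row_data)
        VALUES (tg_op, to_jsonb(new));
        RETURN new;
    END IF;
END;
$$;

CREATE TRIGGER account_audit_trg
AFTER INSERT OR UPDATE OR DELETE ON account
FOR EACH ROW EXECUTE PROCEDURE account_audit_trg();
```

Because the history is written server-side, every client that touches the table is audited the same 
way, with no application cooperation required.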



A great use case for triggers and stored functions is data validation or cleaning. Sure, this could 
be done in the end-user application ... or, that is, in *all* end-user applications, if you don't 
mind duplicating algorithmic implementations, and if you trust every application to do it properly, 
the same way every time, and non-maliciously.


Triggering a before-insert function allows for a single implementation to be used consistently for 
all inserts.
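As a small illustration of that single-point-of-validation idea (the table and column here are 
hypothetical, not from the thread):

```sql
-- Hypothetical cleaning trigger: normalize an email column once, in
-- the database, instead of in every client application.
CREATE FUNCTION person_clean_bit() RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    new.email := lower(btrim(new.email));
    IF position('@' in new.email) = 0 THEN
        RAISE EXCEPTION 'invalid email address: %', new.email;
    END IF;
    RETURN new;
END;
$$;

CREATE TRIGGER person_clean_bit
BEFORE INSERT OR UPDATE ON person
FOR EACH ROW EXECUTE PROCEDURE person_clean_bit();
```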


Also, while appropriately limiting access privileges for end-users, triggers can safely invoke 
functions that run at a higher privilege level, such as, for example, creating database roles for 
new users.


Here's a complicated but cool example:

First, there is a publicly-exposed but non-data-leaking view that allows unprivileged users to 
initiate inserts for account creation (and by unprivileged I mean really unprivileged, that is, end 
users that don't even have an associated database login role when they do the insert):



CREATE OR REPLACE VIEW public.fairian AS
 SELECT NULL::name AS fairian_name,
NULL::name AS passwd,
NULL::name AS email_address;

GRANT USAGE ON SCHEMA public TO public;
GRANT SELECT, INSERT ON TABLE public.fairian TO public;


A rule redirects inserts on the public dummy view to a protected intermediate view (i.e., a view 
contained in a non-publicly-visible schema) on which new and unprivileged users have no read, 
write, or update privileges. Since relations referenced by a rule are checked against the 
privileges of the rule owner, not the user invoking the rule, this actually works:


REVOKE ALL ON SCHEMA protected FROM public;

CREATE OR REPLACE RULE fairian_iir AS
ON INSERT TO public.fairian DO INSTEAD
INSERT INTO protected.fairian (fairian_name, passwd, email_address)
VALUES (new.fairian_name, new.passwd, new.email_address);


The redirected insert fires a trigger function that drills the insert down one level further, to the 
actual table that is visible only to the database owner:



CREATE OR REPLACE FUNCTION protected.fairian_iit()
  RETURNS trigger AS
$BODY$
BEGIN
  -- Note, the password is not actually stored, but there is some
  -- validation ... look at the private trigger function

  INSERT INTO private.fairian (fairian_name, passwd, email_address)
  VALUES (new.fairian_name, new.passwd, new.email_address);

  RETURN NEW;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE SECURITY DEFINER;

-- The function must exist before the trigger that calls it:

CREATE TRIGGER fairian_iit
  INSTEAD OF INSERT
  ON protected.fairian
  FOR EACH ROW
  EXECUTE PROCEDURE protected.fairian_iit();


When the insert is finally pushed all the way down to the actual table in the (non-publicly-visible) 
private schema, a trigger function prevents account name collisions by serializing inserts, ensures 
the credentials meet certain requirements, such as containing no white space and starting with a 
letter, and (although omitted below) sanitizes against SQL injection:



REVOKE ALL ON SCHEMA private FROM public;

CREATE TRIGGER fairian_bit
  BEFORE INSERT
  ON private.fairian
  FOR EACH ROW
  EXECUTE PROCEDURE private.fairian_bit();


CREATE OR REPLACE FUNCTION private.fairian_bit()
  RETURNS trigger AS
$BODY$
DECLARE
  _fairwinds private.fairwinds;
BEGIN
  -- Serializes account creation
  select * into _fairwinds from private.fairwinds for update;

  -- Deny white space characters

  if position(' ' in new.fairian_name) > 0 then
    raise exception 'Fairian name may not contain spaces.';
  end if;
  if position(' ' in new.passwd) > 0 then
    raise exception 'Password may not contain spaces.';
  end if;