Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API

2014-09-14 Thread Björn Harrtell
FYI, got an initial implementation of
http://wiki.postgresql.org/wiki/HTTP_API done in Java (intended to run as a
servlet) at https://github.com/bjornharrtell/jdbc-http-server. Feedback is
welcome :)

Regards,

Björn

2014-09-03 1:19 GMT+02:00 Álvaro Hernández Tortosa :

>
> On 02/09/14 04:47, Dobes Vandermeer wrote:
>
>> Same idea as PgBouncer or PgPool. The advantage over hacking
>> PgBouncer/PgPool for the job is that Tomcat can already do a lot of what
>> you want using built-in, pre-existing functionality. Connection pool
>> management, low-level REST-style HTTP processing, JSON handling, etc. are
>> all done for you.
>>
>
>  Yeah, those are nice conveniences but I still think installing Java and
> getting something to run on startup is a bit more of a hurdle. Better to make
> life easier up front by having a simple standalone proxy you can compile
> and run with just whatever is already available on a typical AWS ubuntu
> environment.
>
>
> If instead of Tomcat you use Jetty, you can embed the whole
> app+Jetty+dependencies in a single executable JAR, which eases deployment
> a lot. Installing a JVM on Ubuntu is just one apt-get, and it is
> even easier if you use CloudFormation for automation. I don't think it is a
> bad choice at all... you get most of the functionality you want already
> there, as Craig said, and it's lightweight.
>
> Hope it helps,
>
> Álvaro


Re: [HACKERS] Patch: regexp_matches variant returning an array of matching positions

2014-01-29 Thread Björn Harrtell
I'll elaborate on the use case. I have OCR-scanned text for a large number
of images, one row per image. I want to match it against words in another
table. I need two result sets: one with all matched words, and one with
only the first matched word within the first 50 characters of the
OCR-scanned text. Having the matched position in the first result set makes
it easy to produce the second.
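To illustrate why positions help here, a minimal C sketch, assuming the first result set is available for one image as hypothetical (word, position) rows with 1-based positions: the second result set is then just the earliest row whose match starts within the first 50 characters. The `word_match` type and `first_match_within()` helper are hypothetical names, and "within the first 50 chars" is read here as "starts at position 50 or earlier".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row of the first result set for one image: a dictionary
   word that matched the OCR text, and its 1-based match position. */
typedef struct
{
    const char *word;
    int         position;
} word_match;

/* Second result set: the earliest match that starts within the first
   `limit` characters of the text (50 in the use case), or NULL if none. */
static const word_match *
first_match_within(const word_match *rows, size_t n, int limit)
{
    const word_match *best = NULL;

    assert(rows != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (rows[i].position < 1 || rows[i].position > limit)
            continue;           /* no usable position, or too far in */
        if (best == NULL || rows[i].position < best->position)
            best = &rows[i];
    }
    return best;
}
```

With rows such as ("harbour", 72), ("ship", 12), ("quay", 40) and a limit of 50, the sketch picks "ship"; no second pass over the text is needed.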

I cannot find the position by searching for the matched substring, because
I use word boundaries in my regexp.
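The word-boundary problem can be seen in a small C sketch using POSIX `<regex.h>` (not PostgreSQL's regex engine, and byte offsets rather than character positions, so it assumes single-byte text): a plain substring search for "cat" lands inside "concatenate", while a whole-word match finds the later, standalone occurrence. Because POSIX ERE has no portable `\b`, the boundary is emulated here with `(^|[^[:alnum:]_])` and `([^[:alnum:]_]|$)`; both helper names are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* 1-based position of the first occurrence of `word` as a plain substring,
   or 0 if absent. This is what locating the matched text "by substring"
   would give. */
static int substring_position(const char *text, const char *word)
{
    const char *hit = strstr(text, word);

    return hit ? (int) (hit - text) + 1 : 0;
}

/* 1-based position of the first occurrence of `word` as a whole word, or 0.
   The word is capture group 2; groups 1 and 3 emulate word boundaries.
   Assumes `word` contains no regex metacharacters. */
static int whole_word_position(const char *text, const char *word)
{
    char        pattern[256];
    regex_t     re;
    regmatch_t  m[3];
    int         pos = 0;

    assert(strlen(word) < 200);     /* keeps the pattern within the buffer */
    snprintf(pattern, sizeof pattern,
             "(^|[^[:alnum:]_])(%s)([^[:alnum:]_]|$)", word);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, text, 3, m, 0) == 0)
        pos = (int) m[2].rm_so + 1; /* 0-based offset -> 1-based position */
    regfree(&re);
    return pos;
}
```

For "concatenate the cat", `substring_position()` returns 4 (inside "concatenate") while `whole_word_position()` returns 17, which is why the position has to come from the regexp match itself.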

Returning a SETOF named composite makes sense, so I could try to make such
a function instead if there is interest. Perhaps a good name for such a
function would be simply regexp_match or regexp_search (as in Python).
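A minimal C sketch of the "one pass" idea, using POSIX `<regex.h>` rather than PostgreSQL's engine: a single `regexec()` call yields the match text, its position and its length together, which is roughly what one row of the proposed named composite would carry. `match_row` and `regexp_match_row()` are hypothetical names; only the first match is shown, and offsets are bytes, so they equal character positions only for single-byte text.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C analogue of one row of the proposed SETOF composite. */
typedef struct
{
    int   position;   /* 1-based start of the match, 0 if no match */
    int   length;     /* length of the match in bytes */
    char *text;       /* malloc'd copy of the matched text, or NULL */
} match_row;

/* One regexec() call provides text, position and length together, so the
   regexp is never evaluated twice. Returns 0 on success (including "no
   match", signalled by position == 0) and -1 on a bad pattern or OOM. */
static int regexp_match_row(const char *text, const char *pattern,
                            match_row *row)
{
    regex_t     re;
    regmatch_t  m;
    int         rc = 0;

    assert(text != NULL && pattern != NULL && row != NULL);
    row->position = 0;
    row->length = 0;
    row->text = NULL;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0)
    {
        size_t  len = (size_t) (m.rm_eo - m.rm_so);

        row->text = malloc(len + 1);
        if (row->text == NULL)
            rc = -1;
        else
        {
            memcpy(row->text, text + m.rm_so, len);
            row->text[len] = '\0';
            row->position = (int) m.rm_so + 1;
            row->length = (int) len;
        }
    }
    regfree(&re);
    return rc;
}
```

The caller owns `row->text` and releases it with `free()`. A SETOF version would repeat the search from the end of each match, emitting one such row per match.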

/Björn


2014-01-29 David Johnston 

> Alvaro Herrera-9 wrote
> > Björn Harrtell wrote:
> >> I've written a variant of regexp_matches called regexp_matches_positions
> >> which instead of returning matching substrings will return matching
> >> positions. I found use of this when processing OCR scanned text and
> >> wanted
> >> to prioritize matches based on their position.
> >
> > Interesting.  I didn't read the patch but I wonder if it would be of
> > more general applicability to return more info in one fell swoop: a
> > function returning a set (position, length, text of match) rather than
> > an array.  So instead of first calling one function to get the match and
> > then their positions, do it all in one pass.
> >
> > (See pg_event_trigger_dropped_objects for a simple example of a function
> > that returns in that fashion.  There are several others but AFAIR that's
> > the simplest one.)
>
> Confused as to your thinking. Like regexp_matches, this returns "SETOF
> type[]": integer here, where regexp_matches returns text for the matches. I
> could see adding a generic function that returns a SETOF named composite
> (match varchar[], position int[], length int[]) and the corresponding type.
> I can't imagine a situation where you'd want the position but not the text,
> and so having to evaluate the regexp twice seems wasteful. The length is
> probably a waste, though, since it can readily be gotten from the text and
> is less often needed. But if it's pre-calculated anyway...
>
> My question is: what position is returned in a multiple-match situation?
> The supplied test only covers the simple, non-global situation. It needs to
> exercise empty sub-matches and global searches. One theory is that the
> first array slot should cover the global position of match zero (i.e., the
> entire pattern) within the larger document, while sub-matches would be
> relative offsets within that single match. This conflicts, though, with the
> fact that _matches only returns array elements for () items and never for
> the full match (the goal in this function being parallel un-nesting). But
> as nesting is allowed, that case can still occur.
>
> How does this resolve in the patch?
>
> SELECT regexp_matches('abcabc','((a)(b)(c))','g');
>
> David J.
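Reading the patch, `build_regexp_matches_positions_result()` adds 1 to each start offset stored in `match_locs`, which as far as I can tell are offsets from the start of the whole string, and returns NULL for a group that did not participate. A hypothetical C sketch of that same rule with POSIX `<regex.h>` (byte offsets, single-byte text assumed, 0 standing in for NULL) suggests an answer to the question above: under that reading, the query would yield `{1,1,2,3}` and `{4,4,5,6}`, i.e. absolute positions of each parenthesized group rather than offsets relative to the match. `group_positions()` and `MAX_GROUPS` are illustrative names, not part of the patch.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#define MAX_GROUPS 9

/* For each global match of `pattern` in `text`, store the 1-based start of
   every capture group, measured from the start of the whole string, with 0
   for a group that did not participate (SQL NULL in the patch). With no
   groups, the whole match is reported, as regexp_matches does. Returns the
   number of rows written (at most max_rows), or -1 on error. */
static int group_positions(const char *text, const char *pattern,
                           int out[][MAX_GROUPS], int max_rows, int *ncols)
{
    regex_t     re;
    regmatch_t  m[MAX_GROUPS + 1];
    const char *p = text;
    int         eflags = 0, rows = 0, first, ngroups;

    assert(text != NULL && pattern != NULL && ncols != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (re.re_nsub > MAX_GROUPS)
    {
        regfree(&re);
        return -1;
    }
    ngroups = re.re_nsub > 0 ? (int) re.re_nsub : 1;
    first = re.re_nsub > 0 ? 1 : 0;     /* skip match 0 when groups exist */
    *ncols = ngroups;

    while (rows < max_rows && regexec(&re, p, MAX_GROUPS + 1, m, eflags) == 0)
    {
        ptrdiff_t base = p - text;      /* offset of this search in text */

        for (int g = 0; g < ngroups; g++)
        {
            const regmatch_t *r = &m[first + g];

            out[rows][g] = (r->rm_so < 0) ? 0 : (int) (base + r->rm_so) + 1;
        }
        rows++;
        if (m[0].rm_eo > m[0].rm_so)
            p += m[0].rm_eo;            /* resume after this match */
        else if (p[m[0].rm_so] != '\0')
            p += m[0].rm_so + 1;        /* empty match: step one byte */
        else
            break;                      /* empty match at end of text */
        eflags = REG_NOTBOL;            /* p is no longer start of line */
    }
    regfree(&re);
    return rows;
}
```

A regression test along these lines would also cover the empty-group case David asks about, e.g. `a(x)?b` against `ab`, where the group position comes back as NULL.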


[HACKERS] Patch: regexp_matches variant returning an array of matching positions

2014-01-28 Thread Björn Harrtell
I've written a variant of regexp_matches called regexp_matches_positions
which instead of returning matching substrings will return matching
positions. I found use of this when processing OCR scanned text and wanted
to prioritize matches based on their position.

The patch is for discussion. I'd also appreciate general suggestions, as
this is my first experience with the PostgreSQL code base.

The patch is against the master branch and includes a simple regression
test.
*** /tmp/DQoMjJ_regexp.c	2014-01-28 19:59:37.470271459 +0100
--- src/backend/utils/adt/regexp.c	2014-01-28 19:44:47.298288383 +0100
***************
*** 113,118 ****
--- 113,119 ----
                       bool ignore_degenerate);
  static void cleanup_regexp_matches(regexp_matches_ctx *matchctx);
  static ArrayType *build_regexp_matches_result(regexp_matches_ctx *matchctx);
+ static ArrayType *build_regexp_matches_positions_result(regexp_matches_ctx *matchctx);
  static Datum build_regexp_split_result(regexp_matches_ctx *splitctx);
  
  
***************
*** 833,838 ****
--- 834,898 ----
      return regexp_matches(fcinfo);
  }
  
+ 
+ /*
+  * regexp_matches_positions()
+  *        Return a table of matched locations of a pattern within a string.
+  */
+ Datum
+ regexp_matches_positions(PG_FUNCTION_ARGS)
+ {
+     FuncCallContext *funcctx;
+     regexp_matches_ctx *matchctx;
+ 
+     if (SRF_IS_FIRSTCALL())
+     {
+         text       *pattern = PG_GETARG_TEXT_PP(1);
+         text       *flags = PG_GETARG_TEXT_PP_IF_EXISTS(2);
+         MemoryContext oldcontext;
+ 
+         funcctx = SRF_FIRSTCALL_INIT();
+         oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+ 
+         /* be sure to copy the input string into the multi-call ctx */
+         matchctx = setup_regexp_matches(PG_GETARG_TEXT_P_COPY(0), pattern,
+                                         flags,
+                                         PG_GET_COLLATION(),
+                                         false, true, false);
+ 
+         /* Pre-create workspace that build_regexp_matches_positions_result needs */
+         matchctx->elems = (Datum *) palloc(sizeof(Datum) * matchctx->npatterns);
+         matchctx->nulls = (bool *) palloc(sizeof(bool) * matchctx->npatterns);
+ 
+         MemoryContextSwitchTo(oldcontext);
+         funcctx->user_fctx = (void *) matchctx;
+     }
+ 
+     funcctx = SRF_PERCALL_SETUP();
+     matchctx = (regexp_matches_ctx *) funcctx->user_fctx;
+ 
+     if (matchctx->next_match < matchctx->nmatches)
+     {
+         ArrayType  *result_ary;
+ 
+         result_ary = build_regexp_matches_positions_result(matchctx);
+         matchctx->next_match++;
+         SRF_RETURN_NEXT(funcctx, PointerGetDatum(result_ary));
+     }
+ 
+     /* release space in multi-call ctx to avoid intraquery memory leak */
+     cleanup_regexp_matches(matchctx);
+ 
+     SRF_RETURN_DONE(funcctx);
+ }
+ 
+ /* This is separate to keep the opr_sanity regression test from complaining */
+ Datum
+ regexp_matches_positions_no_flags(PG_FUNCTION_ARGS)
+ {
+     return regexp_matches_positions(fcinfo);
+ }
+ 
  /*
   * setup_regexp_matches --- do the initial matching for regexp_matches()
   *        or regexp_split()
***************
*** 1035,1040 ****
--- 1095,1140 ----
  }
  
  /*
+  * build_regexp_matches_positions_result - build output array for current match
+  */
+ static ArrayType *
+ build_regexp_matches_positions_result(regexp_matches_ctx *matchctx)
+ {
+     Datum      *elems = matchctx->elems;
+     bool       *nulls = matchctx->nulls;
+     int         dims[1];
+     int         lbs[1];
+     int         loc;
+     int         i;
+ 
+     /* Convert 0-based match offsets into 1-based positions */
+     loc = matchctx->next_match * matchctx->npatterns * 2;
+     for (i = 0; i < matchctx->npatterns; i++)
+     {
+         int         so = matchctx->match_locs[loc++];
+         int         eo = matchctx->match_locs[loc++];
+ 
+         if (so < 0 || eo < 0)
+         {
+             elems[i] = (Datum) 0;
+             nulls[i] = true;
+         }
+         else
+         {
+             elems[i] = Int32GetDatum(so + 1);
+             nulls[i] = false;
+         }
+     }
+ 
+     /* And form an array */
+     dims[0] = matchctx->npatterns;
+     lbs[0] = 1;
+     /* XXX: this hardcodes assumptions about the int4 type */
+     return construct_md_array(elems, nulls, 1, dims, lbs,
+                               INT4OID, 4, true, 'i');
+ }
+ 
+ /*
   * regexp_split

[HACKERS] HTTP API experimental implementation

2012-07-10 Thread Björn Harrtell
Hey all,

I've begun an implementation of the proposed HTTP API [1] (with some
changes) using Node.js.

The project lives at
https://github.com/bjornharrtell/postgresql-http-server and
basic functionality is in place.

Feedback appreciated!

[1] http://wiki.postgresql.org/wiki/HTTP_API

Regards

/Björn Harrtell