Re: [GENERAL] Longest Common Subsequence in Postgres - Algorithm Challenge

2013-07-09 Thread Misa Simic
On Monday, July 8, 2013, Robert James wrote:

> On 7/8/13, hubert depesz lubaczewski >
> wrote:
> > On Mon, Jul 08, 2013 at 09:09:26AM -0400, Robert James wrote:
> >> I have two relations, where each relation has two fields, one
> >> indicating a name and one indicating a position.  That is, each
> >> relation defines a sequence.
> >>
> >> I need to determine their longest common subsequence.  Yes, I can do
> >> this by fetching all the data into Java (or any other language) and
> >> computing it using the standard LCS dynamic programming language.  But
> >> I'd like to stay within Postgres.  Is there any way to do this?
> >
> > I'm not entirely sure I understand. Can you show us some sample data and
> > expected output?
>
> Sure.  Borrowing a good example from
> http://wordaligned.org/articles/longest-common-subsequence :
>
> Table A (val varchar primary key, pos integer):
> 1, "C"
> 2, "H"
> 3, "I"
> 4, "M"
> 5, "P"
> 6, "A"
> 7, "N"
> 8, "Z"
> 9, "E"
> 10, "E"
>
> Table B (val varchar primary key, pos integer):
> 1, "H"
> 2, "U"
> 3, "M"
> 4, "A"
> 5, "N"
>
> SELECT LongestCommonSubsequence(A,B):
> 1, "H"
> 2, "M"
> 3, "A"
> 4, "N"
>
> (Common chars are in upper case:
> cHiMpANzee
> HuMAN
> )
>
> The std dynamic programming algorithm is described at
> http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
>
>
> --
> Sent via pgsql-general mailing list 
> (pgsql-general@postgresql.org
> )
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


To me it looks like:

Selecte distinct val from tablea join tableb using (val)


Re: [GENERAL] Longest Common Subsequence in Postgres - Algorithm Challenge

2013-07-09 Thread Darren Duncan
Though people talk about doing this in other languages, I think you can solve it 
in plain SQL if you wanted to.


For one thing, you could start off using unordered set operations to make the 
problem space smaller, such as using set intersection to see what the common 
subSETs of values there are from each of the sequences, which SQL does quickly 
and easily, and then you can eliminate any subsequences from the original whose 
set of values don't match one of these common subsets, and only those would you 
then have to compare for order.


-- Darren Duncan

On 2013.07.08 1:04 PM, Robert James wrote:

On 7/8/13, hubert depesz lubaczewski  wrote:

On Mon, Jul 08, 2013 at 09:09:26AM -0400, Robert James wrote:

I have two relations, where each relation has two fields, one
indicating a name and one indicating a position.  That is, each
relation defines a sequence.

I need to determine their longest common subsequence.  Yes, I can do
this by fetching all the data into Java (or any other language) and
computing it using the standard LCS dynamic programming language.  But
I'd like to stay within Postgres.  Is there any way to do this?


I'm not entirely sure I understand. Can you show us some sample data and
expected output?


Sure.  Borrowing a good example from
http://wordaligned.org/articles/longest-common-subsequence :

Table A (val varchar primary key, pos integer):
1, "C"
2, "H"
3, "I"
4, "M"
5, "P"
6, "A"
7, "N"
8, "Z"
9, "E"
10, "E"

Table B (val varchar primary key, pos integer):
1, "H"
2, "U"
3, "M"
4, "A"
5, "N"

SELECT LongestCommonSubsequence(A,B):
1, "H"
2, "M"
3, "A"
4, "N"

(Common chars are in upper case:
cHiMpANzee
HuMAN
)

The std dynamic programming algorithm is described at
http://en.wikipedia.org/wiki/Longest_common_subsequence_problem






--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Longest Common Subsequence in Postgres - Algorithm Challenge

2013-07-08 Thread Marcin Mańk


Dnia 9 lip 2013 o godz. 00:46 Michael Paquier  
napisał(a):

> On Tue, Jul 9, 2013 at 5:04 AM, Robert James  wrote:
>> On 7/8/13, hubert depesz lubaczewski  wrote:
>>> On Mon, Jul 08, 2013 at 09:09:26AM -0400, Robert James wrote:
 I have two relations, where each relation has two fields, one
 indicating a name and one indicating a position.  That is, each
 relation defines a sequence.
 
 I need to determine their longest common subsequence.  Yes, I can do
 this by fetching all the data into Java (or any other language) and
 computing it using the standard LCS dynamic programming language.  But
 I'd like to stay within Postgres.  Is there any way to do this?
>>> 
>>> I'm not entirely sure I understand. Can you show us some sample data and
>>> expected output?
>> 
>> Sure.  Borrowing a good example from
>> http://wordaligned.org/articles/longest-common-subsequence :
>> 
>> Table A (val varchar primary key, pos integer):
>> 1, "C"
>> 2, "H"
>> 3, "I"
>> 4, "M"
>> 5, "P"
>> 6, "A"
>> 7, "N"
>> 8, "Z"
>> 9, "E"
>> 10, "E"
>> 
>> Table B (val varchar primary key, pos integer):
>> 1, "H"
>> 2, "U"
>> 3, "M"
>> 4, "A"
>> 5, "N"
>> 
>> SELECT LongestCommonSubsequence(A,B):
>> 1, "H"
>> 2, "M"
>> 3, "A"
>> 4, "N"
>> 
>> (Common chars are in upper case:
>> cHiMpANzee
>> HuMAN
>> )
>> 
>> The std dynamic programming algorithm is described at
>> http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
> Your best bet on that would be to implement using either a procedural
> language (plpython, plperl that are in core, or even another one), or
> a C function and install it on server side with an extension. I'll do
> the latter if I were you. Roughly such a function would take in
> arguments the OIDs of the functions to compare (or their relation
> name), and return a set of rows describing the longest sequence found.
> 
Also if you only need the length of lcs, you can derive it using the built in 
levenshtein with costs function 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Longest Common Subsequence in Postgres - Algorithm Challenge

2013-07-08 Thread Michael Paquier
On Tue, Jul 9, 2013 at 5:04 AM, Robert James  wrote:
> On 7/8/13, hubert depesz lubaczewski  wrote:
>> On Mon, Jul 08, 2013 at 09:09:26AM -0400, Robert James wrote:
>>> I have two relations, where each relation has two fields, one
>>> indicating a name and one indicating a position.  That is, each
>>> relation defines a sequence.
>>>
>>> I need to determine their longest common subsequence.  Yes, I can do
>>> this by fetching all the data into Java (or any other language) and
>>> computing it using the standard LCS dynamic programming language.  But
>>> I'd like to stay within Postgres.  Is there any way to do this?
>>
>> I'm not entirely sure I understand. Can you show us some sample data and
>> expected output?
>
> Sure.  Borrowing a good example from
> http://wordaligned.org/articles/longest-common-subsequence :
>
> Table A (val varchar primary key, pos integer):
> 1, "C"
> 2, "H"
> 3, "I"
> 4, "M"
> 5, "P"
> 6, "A"
> 7, "N"
> 8, "Z"
> 9, "E"
> 10, "E"
>
> Table B (val varchar primary key, pos integer):
> 1, "H"
> 2, "U"
> 3, "M"
> 4, "A"
> 5, "N"
>
> SELECT LongestCommonSubsequence(A,B):
> 1, "H"
> 2, "M"
> 3, "A"
> 4, "N"
>
> (Common chars are in upper case:
> cHiMpANzee
> HuMAN
> )
>
> The std dynamic programming algorithm is described at
> http://en.wikipedia.org/wiki/Longest_common_subsequence_problem
Your best bet on that would be to implement using either a procedural
language (plpython, plperl that are in core, or even another one), or
a C function and install it on server side with an extension. I'll do
the latter if I were you. Roughly such a function would take in
arguments the OIDs of the functions to compare (or their relation
name), and return a set of rows describing the longest sequence found.
--
Michael


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Longest Common Subsequence in Postgres - Algorithm Challenge

2013-07-08 Thread Robert James
On 7/8/13, hubert depesz lubaczewski  wrote:
> On Mon, Jul 08, 2013 at 09:09:26AM -0400, Robert James wrote:
>> I have two relations, where each relation has two fields, one
>> indicating a name and one indicating a position.  That is, each
>> relation defines a sequence.
>>
>> I need to determine their longest common subsequence.  Yes, I can do
>> this by fetching all the data into Java (or any other language) and
>> computing it using the standard LCS dynamic programming language.  But
>> I'd like to stay within Postgres.  Is there any way to do this?
>
> I'm not entirely sure I understand. Can you show us some sample data and
> expected output?

Sure.  Borrowing a good example from
http://wordaligned.org/articles/longest-common-subsequence :

Table A (val varchar primary key, pos integer):
1, "C"
2, "H"
3, "I"
4, "M"
5, "P"
6, "A"
7, "N"
8, "Z"
9, "E"
10, "E"

Table B (val varchar primary key, pos integer):
1, "H"
2, "U"
3, "M"
4, "A"
5, "N"

SELECT LongestCommonSubsequence(A,B):
1, "H"
2, "M"
3, "A"
4, "N"

(Common chars are in upper case:
cHiMpANzee
HuMAN
)

The std dynamic programming algorithm is described at
http://en.wikipedia.org/wiki/Longest_common_subsequence_problem


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Longest Common Subsequence in Postgres - Algorithm Challenge

2013-07-08 Thread hubert depesz lubaczewski
On Mon, Jul 08, 2013 at 09:09:26AM -0400, Robert James wrote:
> I have two relations, where each relation has two fields, one
> indicating a name and one indicating a position.  That is, each
> relation defines a sequence.
> 
> I need to determine their longest common subsequence.  Yes, I can do
> this by fetching all the data into Java (or any other language) and
> computing it using the standard LCS dynamic programming language.  But
> I'd like to stay within Postgres.  Is there any way to do this?

I'm not entirely sure I understand. Can you show us some sample data and
expected output?

Best regards,

depesz



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Longest Common Subsequence in Postgres - Algorithm Challenge

2013-07-08 Thread Atri Sharma
On Mon, Jul 8, 2013 at 6:39 PM, Robert James  wrote:
> I have two relations, where each relation has two fields, one
> indicating a name and one indicating a position.  That is, each
> relation defines a sequence.
>
> I need to determine their longest common subsequence.  Yes, I can do
> this by fetching all the data into Java (or any other language) and
> computing it using the standard LCS dynamic programming language.  But
> I'd like to stay within Postgres.  Is there any way to do this?
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Pl/R is the way to go,IMO.

Or any supported language.

Regards,

Atri

--
Regards,

Atri
l'apprenant


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Longest Common Subsequence in Postgres - Algorithm Challenge

2013-07-08 Thread Robert James
I have two relations, where each relation has two fields, one
indicating a name and one indicating a position.  That is, each
relation defines a sequence.

I need to determine their longest common subsequence.  Yes, I can do
this by fetching all the data into Java (or any other language) and
computing it using the standard LCS dynamic programming language.  But
I'd like to stay within Postgres.  Is there any way to do this?


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general