答复: how to reference a column from a TMUDF?

Liu, Ming (Ming) Wed, 01 Jun 2016 18:16:52 -0700

Thanks Hans, Suresh,

The double quote works!
The error message is a little confusing though ☺  I was confused by the JOIN 
words.
And please Hans help to update the javaDoc, I can also try to update the 
document too.

Trafodion doesn’t has special index for CHAR/VARCHAR columns, so to do ‘WHERE 
col like ‘%xxx’ ‘ , it will cause a full table scan I believe. The idea is to 
index the string col into Solr and then do a join between SolrUDF and the 
source table , so that one can get the matching row faster, in theory, when the 
table is very huge.

A little more about this TMUDF:
One thing I am thinking but not start to do is to have Solr schema including 
two fields: hbase-rowkey and the index-column value. Then the UDF need to 
decode the rowkey into the PK list and return, plus the indexed column. It 
needs more work to do the decoding. But the benefit is: I can write a Hbase 
observer coprocessor, which implement postPut()/postDelete method, and in those 
hook, do Solr indexing. So when user insert a row in a Trafodion table, it will 
get indexed into Solr transparently and consistently, but at that level, I only 
know the rowkey.
But a much simpler scheme is just do a regular incremental Solr index update, 
solr fields include all PK and the index column. But that way trafodion/solr 
cannot be consistent during two updates. But in write-once, read-often 
scenario, this is a much easier approach. So I feel still have some usage.

One difficulty is how to make this UDF run in parallel, my best idea is to get 
a fake table input and force the compiler to generate a parallel plan, so each 
UDF should get part of the table input, it then can check what it get and 
decide which part of the Solr rows it needs to search and return. But still 
need work it out.

This will not modify the SQL kernel…

Thanks,
Ming

发件人: Hans Zeller [mailto:hans.zel...@esgyn.com]
发送时间: 2016年6月1日 23:58
收件人: user@trafodion.incubator.apache.org
主题: Re: how to reference a column from a TMUDF?

Yes, I agree with Suresh. The reason is likely that you defined the output 
column name as a lower-case string "id", and that is treated as a 
case-sensitive, delimited identifier. If you define the column as "ID" in the 
UDF, that is treated as a regular identifier, which is not case-sensitive. The 
calls in the UDF interface that deal with column names, such as 
TupleInfo.addCharColumn, use an "internal format" with no quotes. ALL CAPS mean 
a regular identifier, otherwise it's interpreted as a delimited identifier. 
This is mentioned in the addCharColumn 
Javadoc<https://57e4a67d4ac0d80d65fbe17f54e5cf57ab4e0478.googledrive.com/host/0BztdUbYgvYe0N0hSRG9HME85MlE/1.3.0/apidocs/index.html?org/trafodion/sql/udr/TupleInfo.html>,
 but the explanation is missing from the ColumnInfo 
constructor<https://57e4a67d4ac0d80d65fbe17f54e5cf57ab4e0478.googledrive.com/host/0BztdUbYgvYe0N0hSRG9HME85MlE/1.3.0/apidocs/index.html?org/trafodion/sql/udr/ColumnInfo.html>
 (same is true for C++ interface). I'll update the Javadoc/Doxygen.

Hans

On Wed, Jun 1, 2016 at 8:24 AM, Suresh Subbiah 
<suresh.subbia...@gmail.com<mailto:suresh.subbia...@gmail.com>> wrote:
Hi Ming,

That is an interesting UDF.

The syntax you have shown should work. Please see the modified example below 
from regression test udr/TEST001.
I think either we have a bug or it is an issue with delmited names. You could 
try these two steps to understand the issue better.

a)  select * from  udf(solrUDF('db','iphone')) u where u."id<http://u.id/>" = 
1; OR
b) cqd UDR_DEBUG_FLAGS '64'; select * from  udf(solrUDF('db','iphone')) u ;
Approach b) will print out the output column names which could be helpful. 
Debug flags are documented by Hans at <http://goog_1873561815> 
https://cwiki.apache.org/confluence/display/TRAFODION/Tutorial%3A+The+object-oriented+UDF+interface

Thanks
Suresh

Example from udr/test001 showing that syntax given above should work. The bug I 
mentioned could be due to the fact that solrUDF does not have a table input. We 
have had issues with 0 table valued input UDFs in the past.

SELECT *
FROM UDF(sessionize_dynamic(TABLE(SELECT userid,
                                         JULIANTIMESTAMP(ts) as TS,
                                         ipAddr
                                  FROM clicks
                                  PARTITION BY ipaddr ORDER BY ts),
                            'IPADDR',
                            cast('TS' as char(2)),
                            60000000)) u
where u.session_id < 10;

SESSION_ID            SEQUENCE_NO           USERID                            
TS                    IPADDR
--------------------  --------------------  --------------------------------  
--------------------  ---------------

                   1                     1  super-user                          
212331578399500000  12.345.567.345
                   2                     1  super-user                          
212331585599500000  12.345.567.345
                   2                     2  super-services                      
212331585599500000  12.345.567.345
                   2                     3  super-services                      
212331585599550000  12.345.567.345

--- 4 row(s) selected.
>>

Notice how column names are in upper case in the output.

On Wed, Jun 1, 2016 at 4:18 AM, Liu, Ming (Ming) 
<ming....@esgyn.cn<mailto:ming....@esgyn.cn>> wrote:
Hi, all,

I wrote a simple TMUDF which will perform a solr query and get result as a 
table value.

It can work like this:

>>select * from  udf(solrUDF('db','iphone'));
id       description
------ ----------------------------------------
1       iphone 5
2      iphone 5s
--- 2 row(s) selected.

As you can see, it will return two columns: ‘id’ and ‘description’. Now I want 
to do a filter on id, so I try this:

>>select * from  udf(solrUDF('db','iphone')) u where u.id<http://u.id> = 1;

It failed and report this error:

*** ERROR[4003] Column U.ID<http://U.ID> is not a column in table U, or, after 
a NATURAL JOIN or JOIN USING, is no longer allowed to be specified with a table 
correlation name.

*** ERROR[8822] The statement was not prepared.

Because I want to join the udf result from the source Trafodion table, so I 
have to reference the columns in the UDF.

Please help, how can I reference to the column returned from a UDF?

Thanks,
Ming

答复: how to reference a column from a TMUDF?

Reply via email to