FWIW, if you develop any standard SQL built-ins that are missing, we're happy to pull these into Phoenix. Thanks, James
On Tuesday, February 3, 2015, Jaime Solano <[email protected]> wrote:

Thanks, guys, for your insights. We'll try to stay away from custom functions for now and use plain SQL to do the transformation. For custom functionality, we'll go with Pig.

On Jan 28, 2015 1:34 PM, "Ravi Kiran" <[email protected]> wrote:

I second James's opinion of going with Pig. You can use the PhoenixHBaseLoader to load the data, apply the various transformations with custom Pig UDFs, and then finally upsert the tuples back to Phoenix tables using our PhoenixHBaseStorage.

My two cents :)

Regards,
Ravi

On Wed, Jan 28, 2015 at 9:35 AM, James Taylor <[email protected]> wrote:

If you could share some examples of the "calculated values" case, that'd be helpful. When you say you need to "transform the whole data set", I'm not sure I understand. If you find yourself having to write 50 built-in functions, each with specific business logic, I'd definitely gravitate toward using our Pig integration.

Thanks,
James

On Wed, Jan 28, 2015 at 6:51 AM, Jaime Solano <[email protected]> wrote:

Hi James,

First of all, thank you for your quick response. Let me give you more details of our scenario:

We want to load daily bulks of data into a Phoenix table. This table contains normalized data, with around 60 columns. Then we need to enrich (transform) the data. Enriching means adding more columns, of basically two types:
- Values obtained from joins with other tables (reference data). This covers a small part of the process.
- Calculated values, with particular business logic that is hard to implement in SQL. This is the case where we're thinking of building our own functions. However, I'm concerned about this approach, since (1) we need to transform the whole data set and (2) we might end up creating a function per transformed/added column (we expect around 50 additional columns to be added after the enrichment process).

Thank you for your time, and I'd appreciate your thoughts about this.

-Jaime

On Jan 27, 2015 11:51 PM, "James Taylor" <[email protected]> wrote:

Hi Jaime,

Would it be possible to see a few examples of the kind of transformations you're doing? I think the tool you use depends on whether you're transforming all of your data or a smaller subset. It also depends on the complexity of the transformation. If you're transforming every row of data in your HBase table to create a new row in a different HBase table, Phoenix is not going to be an ideal choice. Also, if you're transforming data such that you need a new built-in function for each kind of transformation, Phoenix would not be the right choice.

Have you seen our map-reduce [1] and Pig integration [2] support? Pig is very good at ETL. You may be able to leverage Pig to do the transformation such that the resulting table is queryable through Phoenix as well.

HTH. Thanks,
James

[1] http://phoenix.apache.org/phoenix_mr.html
[2] http://phoenix.apache.org/pig_integration.html
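[A minimal sketch of the load / transform / store flow Ravi and James outline above, following the Pig integration page linked at [2]. The table names (DETAILS, DETAILS_ENRICHED), the UDF (myudfs.Enrich), the jar, the columns, and the ZooKeeper quorum are placeholders, and the exact location-string and option syntax should be verified against the docs for your Phoenix version.]

    REGISTER my-enrichment-udfs.jar;  -- placeholder jar holding the custom business-logic UDFs

    -- Load the normalized rows from the Phoenix DETAILS table.
    raw = LOAD 'hbase://table/DETAILS'
          USING org.apache.phoenix.pig.PhoenixHBaseLoader('zk-host:2181');

    -- Apply the custom business logic in a Pig UDF instead of a Phoenix built-in function.
    enriched = FOREACH raw GENERATE *, myudfs.Enrich(COL_A, COL_B) AS RISK_SCORE;

    -- Upsert the enriched tuples back into a Phoenix table with matching columns.
    STORE enriched INTO 'hbase://DETAILS_ENRICHED'
          USING org.apache.phoenix.pig.PhoenixHBaseStorage('zk-host:2181', '-batchSize 1000');

This keeps the per-column business logic in ordinary Pig UDFs while the resulting table remains queryable through Phoenix.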
On Tue, Jan 27, 2015 at 11:50 AM, Jaime Solano <[email protected]> wrote:

Hi guys,

The company I work for wants to use Phoenix for data transformation. Basically, the idea is to denormalize a details table and add calculated data to it using UPSERT SELECT statements (joins with other tables and specific functions). This has proven challenging, since plain SQL is sometimes not enough, which has led us to try implementing our own built-in Phoenix functions (following this post: http://phoenix-hbase.blogspot.com/2013/04/how-to-add-your-own-built-in-function.html).

I feel this is not the right direction, and maybe we should be using other tools like Pig, MR or Storm (for near-real-time).

What are your thoughts about this? Would you recommend Phoenix for complex data transformation? What are the drawbacks you see in this approach?

Thanks in advance,
-Jaime
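[For the join-based "reference data" half of the enrichment Jaime describes, a plain UPSERT SELECT is usually enough. Below is a minimal sketch with hypothetical table and column names (DETAILS, REGION_REF, REGION_CODE, REGION_NAME), not a tested statement.]

    -- Enrich DETAILS with a column looked up from a reference table (all names hypothetical).
    UPSERT INTO DETAILS (ID, REGION_NAME)
    SELECT d.ID, r.REGION_NAME
    FROM DETAILS d
    JOIN REGION_REF r ON d.REGION_CODE = r.REGION_CODE;

The calculated-value half, where each of the ~50 added columns would otherwise need its own built-in function, is the part the replies above steer toward Pig UDFs instead.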
