+1

FYI: We did it (Phoenix 4.2.2) by copy-pasting the whole "CsvBulkLoadTool" and changing the pieces we wanted: a custom parser, getting back the job counters to take downstream decisions, etc.

+1 for pluggability, but we don't know how stable the interface would be (should we even publish it?). A wilder idea: instead of inventing a proper interface, we could refactor the logic out of org.apache.phoenix.mapreduce.Csv* (3 classes) to make the current implementation independent of "CSV" and "MapReduce". That way CsvBulkLoadTool becomes a lightweight default reference, and people can just extend/copy it to customize most of the behaviour.
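The refactoring idea above could look something like this. This is a sketch only: the interface and class names here (RecordParser, CsvRecordParser) are invented for illustration and are not part of the Phoenix API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// A format-agnostic parser contract that the bulk-load logic could be
// refactored against, so "CSV" stops being baked into the tool.
interface RecordParser {
    // Turns one raw input line into column values, in table-column order.
    List<String> parse(String line);
}

// The lightweight default implementation: delimiter-separated values.
class CsvRecordParser implements RecordParser {
    private final char delimiter;

    CsvRecordParser(char delimiter) {
        this.delimiter = delimiter;
    }

    @Override
    public List<String> parse(String line) {
        // -1 keeps trailing empty fields, so column counts stay stable.
        return Arrays.asList(
                line.split(Pattern.quote(String.valueOf(delimiter)), -1));
    }
}

public class ParserDemo {
    public static void main(String[] args) {
        RecordParser parser = new CsvRecordParser(',');
        System.out.println(parser.parse("1,foo,2015-10-29"));
    }
}
```

A JSON or Kafka-record parser would then be another RecordParser implementation, with the rest of the tool left unchanged.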

P.S.: We are going to try picking up records directly from Kafka instead of a CSV file soon.

On Thursday 29 October 2015 03:38 PM, Bulvik, Noam wrote:

This is exactly what I need, i.e. to be able to change the content of the row, rather than a different input format.

The use case is loading a large amount of data from files where each row needs to be handled before it is processed by the CSV parser. Examples: changing the date format, fixing encoding, escaping delimiters, and more. Of course this can be done in a separate MapReduce job, but since we are already processing each row, it would be nice if we could do it there.
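A per-row fix-up of the kind described above could be as small as a single hook that rewrites the raw line before the CSV parser sees it. The hook name "preprocess" is hypothetical, not a Phoenix API; the example below rewrites dd/MM/yyyy dates to ISO yyyy-MM-dd.

```java
// Sketch of a pre-parse row transformation (hypothetical hook, not Phoenix API).
public class RowPreprocessDemo {

    // Example transformation: input files carry dd/MM/yyyy dates;
    // rewrite them to ISO yyyy-MM-dd before CSV parsing.
    static String preprocess(String rawLine) {
        return rawLine.replaceAll("(\\d{2})/(\\d{2})/(\\d{4})", "$3-$2-$1");
    }

    public static void main(String[] args) {
        System.out.println(preprocess("42,29/10/2015,hello"));
        // -> 42,2015-10-29,hello
    }
}
```

Encoding fixes or delimiter escaping would be further one-line transformations in the same hook.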

*From:* James Taylor [mailto:[email protected]]
*Sent:* Thursday, October 29, 2015 7:33 PM
*To:* user <[email protected]>
*Subject:* Re: replace CsvToKeyValueMapper with my implementation

I seem to remember you starting down that path, Gabriel - a kind of pluggable transformation for each row. It wasn't pluggable on the input format, but that's a nice idea too, Ravi. I'm not sure if this is what Noam needs or if it's something else.

Probably good to discuss a bit more at the use case level to understand the specifics a bit more.

On Thu, Oct 29, 2015 at 9:17 AM, Ravi Kiran <[email protected] <mailto:[email protected]>> wrote:

    It would be great if we could provide an API and have end users
    provide an implementation of how to parse each record. That way,
    we could move beyond bulk loading only CSV and have JSON and other
    input formats bulk loaded onto Phoenix tables.

    I can take that one up. Would the community like this
    as a feature?

    On Thu, Oct 29, 2015 at 8:10 AM, Gabriel Reid
    <[email protected] <mailto:[email protected]>> wrote:

        Hi Noam,

        That specific piece of code in CsvBulkLoadTool that you
        referred to
        allows packaging the CsvBulkLoadTool within a different job
        jar file,
        but won't allow setting a different mapper class. The actual
        setting
        of the mapper class is done further down in the submitJob method,
        specifically the following piece:

         job.setMapperClass(CsvToKeyValueMapper.class);

        There isn't currently a way to load a custom mapper in the
        CsvBulkLoadTool, so the only (current) option is to create a
        fully new
        custom implementation of the bulk load tool (probably copying or
        reusing most of the existing tool). However, I can certainly
        imagine
        this being a useful feature to have in some situations.
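The mapper wiring Gabriel describes could be made pluggable roughly as follows. This is a sketch: the property key "phoenix.loader.mapper.class" is invented here, and plain reflection stands in for what would be Hadoop's conf.getClass(...) followed by job.setMapperClass(...).

```java
// Sketch of a -D-configurable mapper class (property key is hypothetical).
public class MapperPluggabilityDemo {
    static class DefaultMapper {}                       // stands in for CsvToKeyValueMapper
    static class MyKeyValueMapper extends DefaultMapper {}

    // Resolve the mapper class from an optional configured name,
    // falling back to the default when no override is supplied.
    static Class<?> resolveMapper(String configuredName) {
        if (configuredName == null) {
            return DefaultMapper.class;
        }
        try {
            return Class.forName(configuredName);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown mapper class", e);
        }
    }

    public static void main(String[] args) {
        // Simulates launching with: -D phoenix.loader.mapper.class=<fqcn>
        String override = System.getProperty("phoenix.loader.mapper.class");
        System.out.println(resolveMapper(override).getSimpleName());
    }
}
```

In the real tool, submitJob would pass the resolved class to job.setMapperClass(...) instead of hard-coding CsvToKeyValueMapper.class.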

        Could you log this request in jira? It would also be really
        good to
        have some more detail on your specific use case. And even
        better is a
        patch that implements it :-)

        - Gabriel



        On Thu, Oct 29, 2015 at 3:22 PM, Bulvik, Noam
        <[email protected] <mailto:[email protected]>> wrote:
        > Hi,
        >
        >
        >
        > We have private logic to be executed when parsing each line
        before it is
        > uploaded to phoenix. I saw the following in the code of the
        CsvBulkLoadTool
        >
        > // Allow overriding the job jar setting by using a -D system
        > // property at startup
        >
        > if (job.getJar() == null) {
        >     job.setJarByClass(CsvToKeyValueMapper.class);
        > }
        >
        >
        >
        > Assuming I have the implementation for MyKeyValueMapper, how
        > can I make sure it will be loaded instead of the standard one?
        >
        >
        >
        > Also, in the CsvToKeyValueMapper class there are some private
        > members, e.g.:
        >
        > private PhoenixConnection conn;
        >
        > private byte[] tableName;
        >
        > Can you add an option to access these members, or make them
        > protected, so that a class extending CsvToKeyValueMapper can
        > use them without duplicating them and the code that
        > initializes them?
        >
        >
        >
        > We are using Phoenix 4.5.2 over CDH.
        >
        >
        >
        > thanks
        >
        > Noam
        >
        >
        >
        > Noam Bulvik
        >
        > R&D Manager
        >
        >
        >
        > TEOCO CORPORATION
        >
        > c: +972 54 5507984
        >
        > p: +972 3 9269145
        >
        > [email protected]
        >
        > www.teoco.com
        >
        >
        >
        >
        > ________________________________
        >
        > PRIVILEGED AND CONFIDENTIAL
        > PLEASE NOTE: The information contained in this message is
        privileged and
        > confidential, and is intended only for the use of the
        individual to whom it
        > is addressed and others who have been specifically
        authorized to receive it.
        > If you are not the intended recipient, you are hereby
        notified that any
        > dissemination, distribution or copying of this communication
        is strictly
        > prohibited. If you have received this communication in
        error, or if any
        > problems occur with transmission, please contact sender.
        Thank you.


