@Alexis: I now understand your point clearly. Let me first digest the
challenge properly, then make a recommendation.
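
In the meantime, here is a minimal sketch of the wrapping idea Aaron suggested, assuming you only need plain setters on top of the generated struct. The names CreationWrapper and encodeCreation are hypothetical and are not part of Thrift; the Creation class here is only a stand-in for the generated one quoted below, so that the snippet runs without the Thrift library. The hand-written encoder is there purely to show what the hex dump in your mail contains (one type byte, a two-byte field id and a four-byte value per field, then a stop byte); in real code you would pass the unwrapped struct to the Thrift serializer instead.

```php
<?php
// Stand-in for the generated class quoted below (only its two public fields).
// In real code you would load the generated Types.php instead.
class Creation
{
    public $a_iso = null;
    public $date = null;
}

// Hypothetical wrapper (not generated by Thrift) that adds fluent setters.
class CreationWrapper
{
    private $struct;

    public function __construct()
    {
        $this->struct = new Creation();
    }

    public function setAIso(int $value): self
    {
        $this->struct->a_iso = $value;
        return $this;
    }

    public function setDate(int $value): self
    {
        $this->struct->date = $value;
        return $this;
    }

    // Returns the underlying struct, ready to hand to a Thrift serializer.
    public function unwrap(): Creation
    {
        return $this->struct;
    }
}

// Hypothetical helper: hand-encodes the two i32 fields in the Thrift binary
// protocol layout. Fields left at null are skipped, as in the generated write().
function encodeCreation(Creation $c): string
{
    $out = '';
    foreach ([1 => $c->a_iso, 2 => $c->date] as $fieldId => $value) {
        if ($value !== null) {
            // 1-byte type (8 = i32), 2-byte big-endian field id,
            // 4-byte big-endian value
            $out .= pack('CnN', 8, $fieldId, $value);
        }
    }
    return $out . "\x00"; // 0 = STOP marker that ends the struct
}

$creation = (new CreationWrapper())
    ->setAIso(846545458)
    ->setDate(27052014)
    ->unwrap();

echo bin2hex(encodeCreation($creation)), PHP_EOL;
// prints 08000132754232080002019cc7ee00
```

The bytes match the hex dump you posted, which suggests your serializer call produced the binary protocol rather than JSON.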


On Wed, May 28, 2014 at 10:42 AM, Alexis Gryta wrote:
> Good morning/afternoon/night !
> Thank you for your replies !!!
>
> *M. Geyer :* I don't know ... to be safe that it's a right value.
> *M. Simbwa :* OK, let me explain:
>
> I work for a firm that has produced a great many XML files over the past
> 10 years.
> Now they want to change everything and use Big Data tools to store the
> data and compute statistics on the server, no longer on the client.
> They want to use Hadoop, Impala, HDFS, etc. But first they want to store
> all these files in a durable way, with the structure validated and able to
> adapt, because the structure of the XML files may change.
> I have to do this; it's my statement of work (remit).
>
> So my boss wants to use Thrift to convert every XML file to a Thrift
> binary file.
>
> For now I am trying Thrift with a very small structure (Creation), but the
> final structure will be very big. (There was no XML schema; I had to write
> one.)
>
> So I think I have to do what I said: [ XML -> object (automatically, in
> PHP or Ruby) -> Thrift object (I have to code this myself) -> Thrift binary
> file -> HDFS ]
>
> Have I answered your question?
>
>
>
> *M. Farrell :* Yes, I have looked at Parquet (I did a state-of-the-art
> survey of Hadoop components at the beginning of my internship).
> Once I have used Thrift, I'll use Parquet. (I saw that Impala+Parquet was
> amazing!!)
> I think there is less documentation for it than for Thrift... (but if
> there is a way to use Parquet directly without Thrift, I'll take it,
> although my boss will require me to do the Thrift serialization too)
>
> Regards.
>
>
>
>
>
> 2014-05-28 9:16 GMT+02:00 Phillip Simbwa:
>
>> Hi Alexis,
>>
>> Now why would you want to serialize XML and then save to HDFS?
>>
>> I would propose you use the Thrift structs to define objects that are
>> lighter to carry over the wire and easily read in any other language
>> without a big processing overhead.
>>
>> What do you think?
>>
>> On Wed, May 28, 2014 at 10:10 AM, Alexis Gryta wrote:
>> > Thank you. I'm a French engineering student doing an internship.
>> > I chose PHP so I could run many tests quickly, but afterwards I would
>> > like to use Ruby.
>> >
>> > I spent a week reading many websites and docs about Thrift but found
>> > nothing on simply serializing an object... but I think I have found what
>> > I wanted to do.
>> >
>> > *Actually, I think these setters/getters were not created to be used
>> > (by the user).*
>> >
>> > I found an example in Java and tried to find the equivalent classes in
>> > PHP...
>> > ###
>> > $bins = new TBinarySerializer(new TJSONProtocol());
>> > $seria=  $bins->serialize($creaticket);
>> > echo $seria;
>> > echo bin2hex($seria);
>> > ###
>> > shows:
>> > ###
>> >
>> > Creation Object
>> > (
>> >     [a_iso] => 846545458
>> >     [date] => 27052014
>> > )
>> >
>> >   2uB2   oeÇî
>> > 08000132754232080002019cc7ee00
>> > ###
>> >
>> > But I have to write:
>> > $creaticket = new Creation();
>> > $creaticket->a_iso = 846545458;
>> > $creaticket->date = 27052014;
>> >
>> > Do I have to code my own setters to write a_iso and date? (e.g.
>> > $creaticket->setA_iso(846545458) )
>> >
>> >
>> > I would like to serialize my (many) XML files to Thrift binary files.
>> > So I parse each XML file into an object (automatically, in PHP), convert
>> > that object into a Thrift object (with Types.php, the file generated by
>> > Thrift from my Thrift structure), serialize it to binary, and write it
>> > to a file.
>> > I don't know exactly how to do it yet, but I am trying and making slow
>> > progress.
>> >
>> > I want to do this to store them in HDFS and use Impala.
>> >
>> > Sorry for telling my life story, but I wanted to be clear.
>> >
>> >
>> >
>> >
>> >
>> > 2014-05-27 17:59 GMT+02:00 Aaron Mefford:
>> >
>> >> If you do not like the interface of the generated Thrift code, consider
>> >> subclassing or otherwise wrapping the generated stub in a class that
>> >> provides the interface that you want or need to use. Generated Thrift
>> >> code, as I understand it, is designed to provide minimally functional
>> >> interfaces in your language of choice. The conventions are intended to
>> >> be similar where possible across the gamut of languages supported by
>> >> Thrift. As such, you may see influences from C++ in your PHP stub. Wrap
>> >> that stub and make it work the way you are comfortable with. There is no
>> >> way the developers of Thrift could write a library that would generate
>> >> stubs that are comfortable for everyone.
>> >>
>> >> I would provide you a sample in PHP, but PHP is not my cup of tea.
>> >>
>> >> Aaron
>> >>
>> >>
>> >>
>> >> On 5/27/14, 3:32 AM, Alexis Gryta wrote:
>> >>
>> >>> Hi !
>> >>>
>> >>> I'm not able to use the getters and setters of the Types.php generated
>> >>> by Thrift.
>> >>>
>> >>> My Thrift structure :
>> >>>
>> >>> typedef i32 int
>> >>>
>> >>> struct Creation {
>> >>>     1: int a_iso,
>> >>>     2: int date
>> >>> }
>> >>>
>> >>> I did:
>> >>> $objetcree = new Creation();
>> >>> $objetcree->a_iso = 45;
>> >>>
>> >>> That works, but I don't want to use it like that. I would rather call:
>> >>> $objetcree->read($input);
>> >>>
>> >>> What does $input have to be if I want to write just the a_iso field?
>> >>>
>> >>> Thank you very much !!!
>> >>>
>> >>>
>> >>> For context: I want to convert my XML files to Thrift binary to
>> >>> store them in HDFS.
>> >>>
>> >>> I parse my XML files into SimpleXMLElement objects and convert each
>> >>> object into a Thrift object.
>> >>>
>> >>> Afterwards, I serialize it and store it in a file. (I could then
>> >>> process it with HBase, Impala, etc.)
>> >>> I'm doing an internship, and my boss wants to store a great many XML
>> >>> files in a way that validates their structure and adapts to changes.
>> >>>
>> >>> Is this the right approach?
>> >>>
>> >>>
>> >>> My generated Thrift file:
>> >>>
>> >>> ########################################
>> >>> class Creation {
>> >>>   static $_TSPEC;
>> >>>
>> >>>   public $a_iso = null;
>> >>>   public $date = null;
>> >>>
>> >>>   public function __construct($vals=null) {
>> >>>     if (!isset(self::$_TSPEC)) {
>> >>>       self::$_TSPEC = array(
>> >>>         1 => array(
>> >>>           'var' => 'a_iso',
>> >>>           'type' => TType::I32,
>> >>>           ),
>> >>>         2 => array(
>> >>>           'var' => 'date',
>> >>>           'type' => TType::I32,
>> >>>           ),
>> >>>         );
>> >>>     }
>> >>>     if (is_array($vals)) {
>> >>>       if (isset($vals['a_iso'])) {
>> >>>         $this->a_iso = $vals['a_iso'];
>> >>>       }
>> >>>       if (isset($vals['date'])) {
>> >>>         $this->date = $vals['date'];
>> >>>       }
>> >>>     }
>> >>>   }
>> >>>
>> >>>   public function read($input)
>> >>>   {
>> >>>     $xfer = 0;
>> >>>     $fname = null;
>> >>>     $ftype = 0;
>> >>>     $fid = 0;
>> >>>     $xfer += $input->readStructBegin($fname);
>> >>>     while (true)
>> >>>     {
>> >>>       $xfer += $input->readFieldBegin($fname, $ftype, $fid);
>> >>>       if ($ftype == TType::STOP) {
>> >>>         break;
>> >>>       }
>> >>>       switch ($fid)
>> >>>       {
>> >>>         case 1:
>> >>>           if ($ftype == TType::I32) {
>> >>>             $xfer += $input->readI32($this->a_iso);
>> >>>           } else {
>> >>>             $xfer += $input->skip($ftype);
>> >>>           }
>> >>>           break;
>> >>>         case 2:
>> >>>           if ($ftype == TType::I32) {
>> >>>             $xfer += $input->readI32($this->date);
>> >>>           } else {
>> >>>             $xfer += $input->skip($ftype);
>> >>>           }
>> >>>           break;
>> >>>         default:
>> >>>           $xfer += $input->skip($ftype);
>> >>>           break;
>> >>>       }
>> >>>       $xfer += $input->readFieldEnd();
>> >>>     }
>> >>>     $xfer += $input->readStructEnd();
>> >>>     return $xfer;
>> >>>   }
>> >>>
>> >>>   public function write($output) {
>> >>>     $xfer = 0;
>> >>>     $xfer += $output->writeStructBegin('Creation');
>> >>>     if ($this->a_iso !== null) {
>> >>>       $xfer += $output->writeFieldBegin('a_iso', TType::I32, 1);
>> >>>       $xfer += $output->writeI32($this->a_iso);
>> >>>       $xfer += $output->writeFieldEnd();
>> >>>     }
>> >>>     if ($this->date !== null) {
>> >>>       $xfer += $output->writeFieldBegin('date', TType::I32, 2);
>> >>>       $xfer += $output->writeI32($this->date);
>> >>>       $xfer += $output->writeFieldEnd();
>> >>>     }
>> >>>     $xfer += $output->writeFieldStop();
>> >>>     $xfer += $output->writeStructEnd();
>> >>>     return $xfer;
>> >>>   }
>> >>> }
>> >>> ########################################
>> >>>
>> >>>
>> >>
>>
>>
>>
>> --
>> - Phillip.
>>
>> "Aoccdrnig to rscheearch at an Elingsh uinervtisy, it deosn't mttaer in
>> waht
>> oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist
>> and lsat ltteer are in the rghit pclae.
>>  The rset can be a toatl mses  and
>> you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed
>> ervey lteter by it slef but the wrod as a wlohe and the biran fguiers it
>> out aynawy."
>>



-- 
- Phillip.
