Of course :)
Thank you.

2014-05-28 9:53 GMT+02:00 Phillip Simbwa <[email protected]>:

> @Alexis: I now get your point clearly... Let me first digest the
> challenge properly then make a recommendation.
>
>
> On Wed, May 28, 2014 at 10:42 AM, Alexis Gryta
> <[email protected]> wrote:
> > Good morning/afternoon/night !
> > Thank you for your replies !!!
> >
> > *Mr. Geyer:* I don't know ... to be sure that it's a valid value.
> > *Mr. Simbwa:* OK, let me explain:
> >
> > I work for a firm that has been producing a great many XML files for
> > 10 years.
> > Now they want to change everything and use Big Data tools to store the
> > data and compute statistics on the server rather than on the client.
> > They want to use Hadoop, Impala, HDFS, etc. But first, they want to store
> > all these files in a durable way, with structure validation and the
> > ability to adapt the structure, because the structure of the XML files
> > may change.
> > I have to use this; it is my assignment.
> >
> > So my boss wants to use Thrift to convert every XML file to a Thrift
> > binary file.
> >
> > For now, I am trying Thrift with a very small structure (Creation), but
> > the final structure will be very big. (There was no XML schema; I had to
> > write it myself.)
> >
> > So I think that I have to do what I said: [ XML -> object (automatically,
> > in PHP or Ruby) -> Thrift object (I have to code this myself) -> Thrift
> > binary file -> HDFS ]
> >
> > Have I answered your question?
> >
> >
> >
> > *Mr. Farrell:* Yes, I have looked at Parquet (I did a state-of-the-art
> > survey of Hadoop components at the beginning of my internship).
> > Once I have used Thrift, I'll use Parquet. (I saw that Impala+Parquet
> > was amazing!!)
> > I think there is less documentation for it than for Thrift... (But if
> > there is a way to use Parquet directly without Thrift, I'm interested,
> > although my boss will ask me to do the Thrift serialization too.)
> >
> > Regards.
> >
> >
> >
> >
> >
> > 2014-05-28 9:16 GMT+02:00 Phillip Simbwa <[email protected]>:
> >
> >> Hi Alexis,
> >>
> >> Now why would you want to serialize XML and then save to HDFS?
> >>
> >> I would propose you use Thrift structs to define objects that are
> >> lighter to carry over the wire and easily read in any other language
> >> without a big processing overhead.
> >>
> >> What do you think?
> >>
> >> On Wed, May 28, 2014 at 10:10 AM, Alexis Gryta
> >> <[email protected]> wrote:
> >> > Thank you. I'm a French engineering student doing an internship.
> >> > I chose PHP to run many tests quickly, but afterwards I would like to
> >> > use Ruby.
> >> >
> >> > I have spent a week reading many websites and docs about Thrift, but
> >> > found nothing on simply serializing, etc. I think I have now found what
> >> > I wanted to do.
> >> >
> >> > *Actually, I think these setters/getters were not created to be used
> >> > by the user.*
> >> >
> >> > I have found an example in Java and I tried to find the equivalent
> >> > classes in PHP ...
> >> > ###
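> >> > // TBinarySerializer::serialize() is static and writes the binary
> >> > // protocol, so no protocol object needs to be passed in.
> >> > $seria = TBinarySerializer::serialize($creaticket);
> >> > echo $seria;          // raw bytes
> >> > echo bin2hex($seria); // the same bytes as hex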
> >> > ###
> >> > shows:
> >> > ###
> >> >
> >> > Creation Object
> >> > (
> >> >     [a_iso] => 846545458
> >> >     [date] => 27052014
> >> > )
> >> >
> >> >   2uB2   oeÇî
> >> > 08000132754232080002019cc7ee00
> >> > ###
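For reference, the hex dump above can be read field by field. The following is a minimal sketch in plain PHP, not the Thrift library: it assumes the bytes follow the Thrift binary protocol layout for a struct (a 1-byte field type, a 2-byte big-endian field id, then the value, closed by a 0x00 stop byte) and it handles only i32 fields (type 8). The function name `decodeI32Struct` is made up for this illustration.

```php
<?php
// Hypothetical helper (not part of Thrift): decode a struct whose fields
// are all i32, assuming the Thrift binary protocol layout.
function decodeI32Struct(string $bytes): array
{
    $fields = [];
    $offset = 0;
    $length = strlen($bytes);
    while ($offset < $length) {
        $type = ord($bytes[$offset]);       // 1 byte: field type
        $offset += 1;
        if ($type === 0) {                  // 0 = STOP, end of the struct
            return $fields;
        }
        if ($type !== 8) {                  // 8 = I32, the only type handled here
            throw new RuntimeException("unsupported field type $type");
        }
        if ($offset + 6 > $length) {
            throw new RuntimeException('truncated field');
        }
        // 2 bytes: field id, big-endian unsigned
        $id = unpack('n', substr($bytes, $offset, 2))[1];
        // 4 bytes: value, big-endian; unpack('N') yields an unsigned number
        $value = unpack('N', substr($bytes, $offset + 2, 4))[1];
        if ($value >= 0x80000000) {
            $value -= 0x100000000;          // reinterpret as signed (64-bit PHP)
        }
        $offset += 6;
        $fields[$id] = $value;
    }
    throw new RuntimeException('missing STOP byte');
}

$fields = decodeI32Struct(hex2bin('08000132754232080002019cc7ee00'));
print_r($fields); // keys are Thrift field ids: 1 is a_iso, 2 is date
```

Read this way, the dump holds field 1 = 846545458 and field 2 = 27052014, the two values assigned to the object, so the serializer did write both fields.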
> >> >
> >> > But I have to write:
> >> > $creaticket = new Creation();
> >> > $creaticket->a_iso = 846545458;
> >> > $creaticket->date = 27052014;
> >> >
> >> > Do I have to code my own setters to write a_iso and date? (e.g.
> >> > $creaticket->setA_iso(846545458) )
> >> >
> >> >
> >> > I would like to serialize my (many) XML files to Thrift binary files.
> >> > So I parse each XML file and convert it into an object (automatically
> >> > in PHP), convert this object into a Thrift object (with Types.php, the
> >> > file generated by Thrift from my Thrift structure), serialize it to
> >> > binary and write it to a file.
> >> > I don't know exactly how to do it, but I am trying and making slow
> >> > progress.
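The first two steps of that pipeline (XML to object, object to Thrift object) could look roughly like the sketch below. Several things here are assumptions made only for illustration: the XML element names `creation`, `a_iso` and `date`, the function name `creationFromXml`, and the small `Creation` class, which merely stands in for the one in the generated Types.php so that the snippet runs on its own.

```php
<?php
// Stand-in for the generated struct; in real use, load the Types.php
// produced by the Thrift compiler instead of declaring this class.
class Creation
{
    public $a_iso = null;
    public $date = null;
}

// Hypothetical mapping function: the XML layout is assumed, not taken
// from the real files.
function creationFromXml($xml)
{
    $node = simplexml_load_string($xml);
    if ($node === false) {
        throw new InvalidArgumentException('invalid XML');
    }
    $creation = new Creation();
    // SimpleXML returns element objects, so cast explicitly to int
    // before assigning to the i32 fields.
    $creation->a_iso = (int) $node->a_iso;
    $creation->date = (int) $node->date;
    return $creation;
}

$xml = '<creation><a_iso>846545458</a_iso><date>27052014</date></creation>';
$creation = creationFromXml($xml);
print_r($creation);
```

The resulting object is an ordinary instance with its public fields filled in, which is the form the serializer call quoted earlier in this message expects; writing the returned string to a file would then be a separate step.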
> >> >
> >> > I want to do this to store them in HDFS and use Impala.
> >> >
> >> > Sorry for telling my life story, but I hope it makes things clearer.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > 2014-05-27 17:59 GMT+02:00 Aaron Mefford <[email protected]>:
> >> >
> >> >> If you do not like the interface of the generated Thrift code, consider
> >> >> subclassing or otherwise wrapping the generated stub in a class that
> >> >> provides the interface that you want or need to use. Generated Thrift
> >> >> code, as I understand it, is designed to provide minimally functional
> >> >> interfaces in your language of choice. The conventions are intended to
> >> >> be similar where possible across the gamut of languages supported by
> >> >> Thrift. As such, you may see influences from C++ in your PHP stub. Wrap
> >> >> that stub and make it work the way you are comfortable with. There is
> >> >> no way the developers of Thrift could write a library that would
> >> >> generate stubs that are comfortable for everyone.
> >> >>
> >> >> I would provide you a sample in PHP, but PHP is not my cup of tea.
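A minimal sketch of such a wrapper in PHP, under stated assumptions: `CreationBuilder` and its method names are invented for this example, and the small `Creation` class only stands in for the generated one so that the snippet is self-contained. The generated `read($input)` method is not a setter; as the generated code shows, it pulls every field from a protocol object, so plain setters have to live in a wrapper like this one or be replaced by direct property assignment.

```php
<?php
// Stand-in for the generated struct; in real use, load the Types.php
// produced by the Thrift compiler instead of declaring this class.
class Creation
{
    public $a_iso = null;
    public $date = null;
}

// Hypothetical wrapper offering validated, chainable setters.
class CreationBuilder
{
    private $struct;

    public function __construct()
    {
        $this->struct = new Creation();
    }

    public function setAIso($value)
    {
        $this->struct->a_iso = self::toI32($value, 'a_iso');
        return $this; // returning $this allows chaining
    }

    public function setDate($value)
    {
        $this->struct->date = self::toI32($value, 'date');
        return $this;
    }

    // Hand back the plain struct, ready to be given to a serializer.
    public function build()
    {
        return $this->struct;
    }

    // Accept ints or numeric strings that fit a signed 32-bit integer.
    private static function toI32($value, $field)
    {
        $int = filter_var($value, FILTER_VALIDATE_INT, [
            'options' => ['min_range' => -2147483648, 'max_range' => 2147483647],
        ]);
        if ($int === false) {
            throw new InvalidArgumentException("$field must be a 32-bit integer");
        }
        return $int;
    }
}

$creaticket = (new CreationBuilder())
    ->setAIso(846545458)
    ->setDate('27052014') // e.g. a string taken from parsed XML
    ->build();
print_r($creaticket);
```

Keeping the wrapper separate from the generated file means Types.php can be regenerated whenever the Thrift structure changes without losing the hand-written setters.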
> >> >>
> >> >> Aaron
> >> >>
> >> >>
> >> >>
> >> >> On 5/27/14, 3:32 AM, Alexis Gryta wrote:
> >> >>
> >> >>> Hi !
> >> >>>
> >> >>> I can't work out how to use the getters and setters of the Types.php
> >> >>> generated by Thrift.
> >> >>>
> >> >>> My Thrift structure :
> >> >>>
> >> >>> typedef i32 int
> >> >>>
> >> >>> struct Creation {
> >> >>>     1: int a_iso,
> >> >>>     2: int date
> >> >>> }
> >> >>>
> >> >>> I did:
> >> >>> $objetcree = new Creation();
> >> >>> $objetcree->a_iso = 45;
> >> >>>
> >> >>> OK, but I don't want to use it like that.
> >> >>> $objetcree->read($input);
> >> >>>
> >> >>> What does $input have to be if I want to write just the a_iso field?
> >> >>>
> >> >>> Thank you very much !!!
> >> >>>
> >> >>>
> >> >>> For context: I want to convert my XML files to Thrift binary in order
> >> >>> to store them in HDFS.
> >> >>>
> >> >>> I parse my XML files into SimpleXMLElement objects and convert each
> >> >>> object into a Thrift object.
> >> >>>
> >> >>> Afterwards, I serialize it and store it in a file. (I could then
> >> >>> process it with HBase, Impala, etc.)
> >> >>> I'm doing an internship and my boss wants to store a great many XML
> >> >>> files while validating their structure and adapting to changes in it.
> >> >>>
> >> >>> Is this the right approach?
> >> >>>
> >> >>>
> >> >>> The file generated by Thrift:
> >> >>>
> >> >>> ########################################
> >> >>> class Creation {
> >> >>>   static $_TSPEC;
> >> >>>
> >> >>>   public $a_iso = null;
> >> >>>   public $date = null;
> >> >>>
> >> >>>   public function __construct($vals=null) {
> >> >>>     if (!isset(self::$_TSPEC)) {
> >> >>>       self::$_TSPEC = array(
> >> >>>         1 => array(
> >> >>>           'var' => 'a_iso',
> >> >>>           'type' => TType::I32,
> >> >>>           ),
> >> >>>         2 => array(
> >> >>>           'var' => 'date',
> >> >>>           'type' => TType::I32,
> >> >>>           ),
> >> >>>         );
> >> >>>     }
> >> >>>     if (is_array($vals)) {
> >> >>>       if (isset($vals['a_iso'])) {
> >> >>>         $this->a_iso = $vals['a_iso'];
> >> >>>       }
> >> >>>       if (isset($vals['date'])) {
> >> >>>         $this->date = $vals['date'];
> >> >>>       }
> >> >>>     }
> >> >>>   }
> >> >>>
> >> >>>   public function read($input)
> >> >>>   {
> >> >>>     $xfer = 0;
> >> >>>     $fname = null;
> >> >>>     $ftype = 0;
> >> >>>     $fid = 0;
> >> >>>     $xfer += $input->readStructBegin($fname);
> >> >>>     while (true)
> >> >>>     {
> >> >>>       $xfer += $input->readFieldBegin($fname, $ftype, $fid);
> >> >>>       if ($ftype == TType::STOP) {
> >> >>>         break;
> >> >>>       }
> >> >>>       switch ($fid)
> >> >>>       {
> >> >>>         case 1:
> >> >>>           if ($ftype == TType::I32) {
> >> >>>             $xfer += $input->readI32($this->a_iso);
> >> >>>           } else {
> >> >>>             $xfer += $input->skip($ftype);
> >> >>>           }
> >> >>>           break;
> >> >>>         case 2:
> >> >>>           if ($ftype == TType::I32) {
> >> >>>             $xfer += $input->readI32($this->date);
> >> >>>           } else {
> >> >>>             $xfer += $input->skip($ftype);
> >> >>>           }
> >> >>>           break;
> >> >>>         default:
> >> >>>           $xfer += $input->skip($ftype);
> >> >>>           break;
> >> >>>       }
> >> >>>       $xfer += $input->readFieldEnd();
> >> >>>     }
> >> >>>     $xfer += $input->readStructEnd();
> >> >>>     return $xfer;
> >> >>>   }
> >> >>>
> >> >>>   public function write($output) {
> >> >>>     $xfer = 0;
> >> >>>     $xfer += $output->writeStructBegin('Creation');
> >> >>>     if ($this->a_iso !== null) {
> >> >>>       $xfer += $output->writeFieldBegin('a_iso', TType::I32, 1);
> >> >>>       $xfer += $output->writeI32($this->a_iso);
> >> >>>       $xfer += $output->writeFieldEnd();
> >> >>>     }
> >> >>>     if ($this->date !== null) {
> >> >>>       $xfer += $output->writeFieldBegin('date', TType::I32, 2);
> >> >>>       $xfer += $output->writeI32($this->date);
> >> >>>       $xfer += $output->writeFieldEnd();
> >> >>>     }
> >> >>>     $xfer += $output->writeFieldStop();
> >> >>>     $xfer += $output->writeStructEnd();
> >> >>>     return $xfer;
> >> >>>   }
> >> >>> }
> >> >>> ########################################
> >> >>>
> >> >>>
> >> >>
> >>
> >>
> >>
> >> --
> >> - Phillip.
> >>
> >> "Aoccdrnig to rscheearch at an Elingsh uinervtisy, it deosn't mttaer in
> >> waht
> >> oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the
> frist
> >> and lsat ltteer are in the rghit pclae.
> >>  The rset can be a toatl mses  and
> >> you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed
> >> ervey lteter by it slef but the wrod as a wlohe and the biran fguiers it
> >> out aynawy."
> >>
>
>
>
> --
> - Phillip.
>
