Good morning/afternoon/evening!
Thank you for your replies!

*Mr. Geyer:* I don't know... I just wanted to be sure that it's a valid value.
*Mr. Simbwa:* OK, let me explain:

I work for a company that has been producing a great many XML files for
10 years.
But now they want to change everything and use Big Data tools to store the
data and compute statistics on the server instead of on the client.
They want to use Hadoop, Impala, HDFS, etc. But first, they want to store
all these files durably, with their structure validated and able to adapt,
because the structure of the XML files may change.
I have to use these tools; it's part of my statement of work.

That is why my boss wants to use Thrift to convert every XML file to a
Thrift binary file.

For now, I am trying Thrift with a very small structure (Creation), but the
final structure will be very big. (There was no XML schema, so I had to
write one.)

So I think I have to do what I described: [ XML -> object (automatically,
in PHP or Ruby) -> Thrift object (I have to code this myself) -> Thrift
binary file -> HDFS ]
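The first two steps of that pipeline (XML -> object -> Thrift object) can be sketched in plain PHP. This is a minimal sketch under assumptions, not the final converter: the XML layout `<creation><a_iso>846545458</a_iso><date>27052014</date></creation>` is hypothetical because the real schema is not shown here, `creationFromXml` is a hypothetical helper name, and the `Creation` class is a cut-down stand-in for the class Thrift generates in Types.php (same public fields, same `$vals` array constructor) so the snippet runs on its own. With the real generated code, drop the stand-in and include Types.php instead.

```php
<?php
// Stand-in for the Thrift-generated Creation class (Types.php).
// Only the public fields and the array-accepting constructor are reproduced.
class Creation
{
    public $a_iso = null;
    public $date = null;

    public function __construct($vals = null)
    {
        if (is_array($vals)) {
            if (isset($vals['a_iso'])) {
                $this->a_iso = $vals['a_iso'];
            }
            if (isset($vals['date'])) {
                $this->date = $vals['date'];
            }
        }
    }
}

// Hypothetical helper: converts one XML document into a Creation object.
// Assumes a root element with optional <a_iso> and <date> integer children.
function creationFromXml(string $xml): Creation
{
    $root = simplexml_load_string($xml);
    if ($root === false) {
        throw new InvalidArgumentException('Malformed XML');
    }

    $vals = [];
    foreach (['a_iso', 'date'] as $field) {
        if (isset($root->$field)) {
            $text = trim((string) $root->$field);
            // Accept only integers that fit in a Thrift i32.
            $value = filter_var($text, FILTER_VALIDATE_INT, [
                'options' => ['min_range' => -2147483648, 'max_range' => 2147483647],
            ]);
            if ($value === false) {
                throw new InvalidArgumentException("$field is not a valid i32: $text");
            }
            $vals[$field] = $value;
        }
    }
    // Missing elements stay null, so the generated write() would skip them.
    return new Creation($vals);
}
```

Checking the i32 range while parsing means a bad value in one XML file fails at conversion time, with the field name in the error message.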

Have I answered your question?



*Mr. Farrell:* Yes, I have looked at Parquet (I did a state-of-the-art
review of Hadoop components at the beginning of my internship).
After using Thrift, I'll use Parquet. (I saw that Impala + Parquet is
amazing!)
I think there is less documentation than for Thrift... (but if there is a
way to use Parquet directly without Thrift, I'm interested, although my boss
will ask me to do the Thrift serialization too)

Regards.





2014-05-28 9:16 GMT+02:00 Phillip Simbwa:

> Hi Alexis,
>
> Now why would you want to serialize XML and then save to HDFS?
>
> I would propose that you use Thrift structs to define objects that are
> lighter to carry over the wire and easy to read in any other language
> without a big processing overhead.
>
> What do you think?
>
> On Wed, May 28, 2014 at 10:10 AM, Alexis Gryta wrote:
> > Thank you. I'm a French engineering student doing an internship.
> > I chose PHP to run a lot of tests quickly, but afterwards I would like to
> > switch to Ruby.
> >
> > I have spent a week reading many websites and docs about Thrift, but found
> > nothing about simply serializing, etc. Still, I think I have found what I
> > wanted to do.
> >
> > *Actually, I think these setters/getters are not meant to be used directly
> > (by the user)...*
> >
> > I found an example in Java and tried to find the equivalent classes in
> > PHP:
> > ###
> > $bins = new TBinarySerializer(new TJSONProtocol());
> > $seria=  $bins->serialize($creaticket);
> > echo $seria;
> > echo bin2hex($seria);
> > ###
> > shows:
> > ###
> >
> > Creation Object
> > (
> >     [a_iso] => 846545458
> >     [date] => 27052014
> > )
> >
> >   2uB2   oeÇî
> > 08000132754232080002019cc7ee00
> > ###
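The hex dump above is the Thrift binary protocol, not JSON: each set field is a one-byte type (0x08 = I32), a big-endian 16-bit field id and a big-endian 32-bit value, and a final 0x00 byte marks the end of the struct (0x32754232 = 846545458, 0x019cc7ee = 27052014). The first output line is just those raw bytes echoed as text (0x32 0x75 0x42 0x32 print as "2uB2"), which is why `bin2hex()` is the readable form. If I read the PHP library correctly, `TBinarySerializer::serialize()` is a static method that builds its own binary protocol internally, so the `TJSONProtocol` passed to the constructor has no effect. The sketch below writes the same bytes by hand, following the generated `write()` method (null fields are skipped). `encodeCreation` is a hypothetical name meant only to show the layout; real code should keep using the generated `write()`.

```php
<?php
// Thrift type ids used by the binary protocol (TType::STOP and TType::I32).
const TTYPE_STOP = 0;
const TTYPE_I32 = 8;

// Hypothetical helper: encodes a Creation-like record the way the generated
// write() method does with a binary protocol. Null fields are omitted.
function encodeCreation(?int $aIso, ?int $date): string
{
    $fields = [1 => $aIso, 2 => $date]; // field id => value, as in the IDL
    $bytes = '';
    foreach ($fields as $id => $value) {
        if ($value === null) {
            continue; // same as "if ($this->a_iso !== null)" in write()
        }
        // 'C' = 1 unsigned byte, 'n' = 16-bit big-endian, 'N' = 32-bit big-endian
        $bytes .= pack('CnN', TTYPE_I32, $id, $value);
    }
    return $bytes . pack('C', TTYPE_STOP); // writeFieldStop()
}

echo bin2hex(encodeCreation(846545458, 27052014)), "\n";
// 08000132754232080002019cc7ee00
```

The whole record is 15 bytes: 7 per I32 field plus the STOP byte.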
> >
> > But I have to set the fields like this:
> > $creaticket = new Creation();
> > $creaticket->a_iso = 846545458;
> > $creaticket->date = 27052014;
> >
> > Do I have to code my own setters to write a_iso and date (e.g.
> > $creaticket->setA_iso(846545458))?
> >
> >
> > I would like to serialize my (many) XML files to Thrift binary files.
> > So I parse each XML file into an object (automatically, in PHP), convert
> > that object into a Thrift object (using Types.php, the file Thrift
> > generated from my Thrift structure), serialize it to binary, and write it
> > to a file.
> > I don't really know how to do it yet, but I'm trying and making slow
> > progress.
> >
> > I want to do this so I can store the files in HDFS and use Impala.
> >
> > Sorry for telling my whole story, but I wanted to be clear.
> >
> >
> >
> >
> >
> > 2014-05-27 17:59 GMT+02:00 Aaron Mefford:
> >
> >> If you do not like the interface of the generated Thrift code, consider
> >> subclassing or otherwise wrapping the generated stub in a class that
> >> provides the interface that you want or need to use. Generated Thrift code,
> >> as I understand it, is designed to provide minimally functional interfaces
> >> in your language of choice. The conventions are intended to be similar,
> >> where possible, across the gamut of languages supported by Thrift. As such,
> >> you may see influences from C++ in your PHP stub. Wrap that stub and make
> >> it work the way you are comfortable with. There is no way the developers
> >> of Thrift could write a library that would generate stubs that are
> >> comfortable for everyone.
> >>
> >> I would provide you a sample in PHP, but PHP is not my cup of tea.
> >>
> >> Aaron
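Following Aaron's suggestion, here is one possible wrapper in PHP. Note first that the generated constructor shown further down in this thread already accepts an associative array, so `new Creation(['a_iso' => 846545458, 'date' => 27052014])` sets both fields without any setters. If setters are still wanted, a thin wrapper keeps Types.php untouched, so it can be regenerated when the IDL changes. `CreationBuilder`, `setAIso` and `setDate` are hypothetical names, and the `Creation` class is a stand-in for the generated one so the sketch runs by itself.

```php
<?php
// Stand-in for the Thrift-generated Creation class (public fields plus the
// constructor that accepts an associative array of field values).
class Creation
{
    public $a_iso = null;
    public $date = null;

    public function __construct($vals = null)
    {
        if (is_array($vals)) {
            if (isset($vals['a_iso'])) {
                $this->a_iso = $vals['a_iso'];
            }
            if (isset($vals['date'])) {
                $this->date = $vals['date'];
            }
        }
    }
}

// Hypothetical wrapper that adds fluent setters on top of the generated stub
// and checks that each value fits in a Thrift i32 before storing it.
class CreationBuilder
{
    private $creation;

    public function __construct()
    {
        $this->creation = new Creation();
    }

    public function setAIso(int $value): self
    {
        $this->creation->a_iso = self::checkI32('a_iso', $value);
        return $this;
    }

    public function setDate(int $value): self
    {
        $this->creation->date = self::checkI32('date', $value);
        return $this;
    }

    // Returns a copy so later setter calls do not modify built objects.
    public function build(): Creation
    {
        return clone $this->creation;
    }

    private static function checkI32(string $field, int $value): int
    {
        if ($value < -2147483648 || $value > 2147483647) {
            throw new RangeException("$field does not fit in an i32: $value");
        }
        return $value;
    }
}

// Both objects end up with the same field values.
$viaArray = new Creation(['a_iso' => 846545458, 'date' => 27052014]);
$viaSetters = (new CreationBuilder())->setAIso(846545458)->setDate(27052014)->build();
```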
> >>
> >>
> >>
> >> On 5/27/14, 3:32 AM, Alexis Gryta wrote:
> >>
> >>> Hi!
> >>>
> >>> I'm not able to use the getters and setters in the Types.php file
> >>> generated by Thrift.
> >>>
> >>> My Thrift structure:
> >>>
> >>> typedef i32 int
> >>> struct Creation {
> >>>     1: int a_iso,
> >>>     2: int date
> >>> }
> >>>
> >>> I did:
> >>> $objetcree = new Creation();
> >>> $objetcree->a_iso = 45;
> >>>
> >>> OK, but I don't want to use it like that. Instead:
> >>> $objetcree->read($input);
> >>>
> >>> What does $input have to be if I want to write just the a_iso field?
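On the `read($input)` question: `read()` is the deserializing direction. `$input` is a protocol object (for example a binary protocol over a memory buffer) that already holds serialized bytes, and `read()` fills the fields from them; it is not a way to set a single field. To produce a struct with only `a_iso`, set that property (or pass `['a_iso' => 45]` to the constructor) and leave `date` as `null`, because the generated `write()` skips null fields. The sketch below mirrors the loop in the generated `read()` on raw binary-protocol bytes, to show what `$input` has to carry. `decodeCreation` is a hypothetical name, and only I32 fields are handled; the real `read()` can skip fields of any type.

```php
<?php
// Hypothetical decoder mirroring the generated Creation::read() loop for the
// binary protocol: read a field header, dispatch on the field id, stop at
// TType::STOP (0). Only TType::I32 (8) fields are supported in this sketch.
function decodeCreation(string $bytes): array
{
    $names = [1 => 'a_iso', 2 => 'date']; // field ids from the IDL
    $result = ['a_iso' => null, 'date' => null];
    $pos = 0;

    while (true) {
        if ($pos >= strlen($bytes)) {
            throw new UnexpectedValueException('Missing STOP byte');
        }
        $type = ord($bytes[$pos]);
        $pos += 1;
        if ($type === 0) {           // TType::STOP: end of struct
            break;
        }
        if ($type !== 8) {           // 8 = TType::I32
            throw new UnexpectedValueException("Unsupported field type $type");
        }
        if (strlen($bytes) < $pos + 6) {
            throw new UnexpectedValueException('Truncated field');
        }
        $id = unpack('n', substr($bytes, $pos, 2))[1];    // 16-bit field id
        $pos += 2;
        $value = unpack('N', substr($bytes, $pos, 4))[1]; // 32-bit value
        $pos += 4;
        if ($value >= 0x80000000) {  // 'N' is unsigned; convert to signed i32
            $value -= 0x100000000;
        }
        if (isset($names[$id])) {    // unknown ids are ignored, like skip()
            $result[$names[$id]] = $value;
        }
    }
    return $result;
}

// Bytes of a struct where only a_iso = 45 was set.
print_r(decodeCreation(hex2bin('0800010000002d00')));
```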
> >>>
> >>> Thank you very much!
> >>>
> >>>
> >>> Just for context: I want to convert my XML files to Thrift binary to
> >>> store them in HDFS.
> >>>
> >>> I parse my XML files with SimpleXML and convert the resulting object into
> >>> a Thrift object.
> >>>
> >>> Afterwards, I serialize it and store it in a file (which I could then
> >>> process with HBase, Impala, etc.).
> >>> I'm an intern, and my boss wants to store a great many XML files in a
> >>> way that validates their structure and adapts to changes.
> >>>
> >>> Is that the right approach?
> >>>
> >>>
> >>> The generated Thrift file:
> >>>
> >>> ########################################
> >>> class Creation {
> >>>   static $_TSPEC;
> >>>
> >>>   public $a_iso = null;
> >>>   public $date = null;
> >>>
> >>>   public function __construct($vals=null) {
> >>>     if (!isset(self::$_TSPEC)) {
> >>>       self::$_TSPEC = array(
> >>>         1 => array(
> >>>           'var' => 'a_iso',
> >>>           'type' => TType::I32,
> >>>           ),
> >>>         2 => array(
> >>>           'var' => 'date',
> >>>           'type' => TType::I32,
> >>>           ),
> >>>         );
> >>>     }
> >>>     if (is_array($vals)) {
> >>>       if (isset($vals['a_iso'])) {
> >>>         $this->a_iso = $vals['a_iso'];
> >>>       }
> >>>       if (isset($vals['date'])) {
> >>>         $this->date = $vals['date'];
> >>>       }
> >>>     }
> >>>   }
> >>>
> >>>   public function read($input)
> >>>   {
> >>>     $xfer = 0;
> >>>     $fname = null;
> >>>     $ftype = 0;
> >>>     $fid = 0;
> >>>     $xfer += $input->readStructBegin($fname);
> >>>     while (true)
> >>>     {
> >>>       $xfer += $input->readFieldBegin($fname, $ftype, $fid);
> >>>       if ($ftype == TType::STOP) {
> >>>         break;
> >>>       }
> >>>       switch ($fid)
> >>>       {
> >>>         case 1:
> >>>           if ($ftype == TType::I32) {
> >>>             $xfer += $input->readI32($this->a_iso);
> >>>           } else {
> >>>             $xfer += $input->skip($ftype);
> >>>           }
> >>>           break;
> >>>         case 2:
> >>>           if ($ftype == TType::I32) {
> >>>             $xfer += $input->readI32($this->date);
> >>>           } else {
> >>>             $xfer += $input->skip($ftype);
> >>>           }
> >>>           break;
> >>>         default:
> >>>           $xfer += $input->skip($ftype);
> >>>           break;
> >>>       }
> >>>       $xfer += $input->readFieldEnd();
> >>>     }
> >>>     $xfer += $input->readStructEnd();
> >>>     return $xfer;
> >>>   }
> >>>
> >>>   public function write($output) {
> >>>     $xfer = 0;
> >>>     $xfer += $output->writeStructBegin('Creation');
> >>>     if ($this->a_iso !== null) {
> >>>       $xfer += $output->writeFieldBegin('a_iso', TType::I32, 1);
> >>>       $xfer += $output->writeI32($this->a_iso);
> >>>       $xfer += $output->writeFieldEnd();
> >>>     }
> >>>     if ($this->date !== null) {
> >>>       $xfer += $output->writeFieldBegin('date', TType::I32, 2);
> >>>       $xfer += $output->writeI32($this->date);
> >>>       $xfer += $output->writeFieldEnd();
> >>>     }
> >>>     $xfer += $output->writeFieldStop();
> >>>     $xfer += $output->writeStructEnd();
> >>>     return $xfer;
> >>>   }
> >>> }
> >>> ########################################
> >>>
> >>>
> >>
>
>
>
> --
> - Phillip.
>
