Of course :) Thank you.
2014-05-28 9:53 GMT+02:00 Phillip Simbwa <[email protected]>:
> @Alexis: I now get your point clearly... Let me first digest the
> challenge properly, then make a recommendation.
>
> On Wed, May 28, 2014 at 10:42 AM, Alexis Gryta
> <[email protected]> wrote:
>> Good morning/afternoon/night!
>> Thank you for your replies!
>>
>> *M. Geyer:* I don't know... just to be sure that it's a correct value.
>> *M. Simbwa:* OK, let me explain:
>>
>> I work for a firm that has been producing a huge number of XML files
>> (for 10 years).
>> But now they want to change everything and use Big Data tools to store
>> them and to compute statistics on the server rather than on the client.
>> They want to use Hadoop, Impala, HDFS, etc. But first, they want to store
>> all these files durably, with validation of their structure and a way to
>> adapt to structure changes, because the structure of the XML files may
>> change.
>> I have to use this; it's my statement of work (remit?).
>>
>> So my boss wants to use Thrift to convert every XML file to a Thrift
>> binary file.
>>
>> For now, I'm trying Thrift with a very small structure (Creation), but the
>> final structure will be very big. (There was no XML schema; I had to write
>> one.)
>>
>> So I think I have to do what I described:
>> [ XML -> object (automatically, in PHP or Ruby) -> Thrift object (I have to
>> code this myself...) -> Thrift binary file -> HDFS ]
>>
>> Have I answered your question?
>>
>> *M. Farrell:* Yes, I have looked at Parquet (I did a state-of-the-art
>> review of Hadoop components at the beginning of my internship).
>> After using Thrift, I'll use Parquet. (I saw that Impala + Parquet was
>> amazing!)
>> I think there is less documentation than for Thrift... (but if there is a
>> way to use Parquet directly without Thrift, I'm interested, although my
>> boss will ask me to do the Thrift serialization too).
>>
>> Regards.
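The steps Alexis lists (XML -> object -> Thrift object -> Thrift binary file) can be sketched end to end in plain PHP. This is a minimal sketch under stated assumptions, not the thread's implementation: the XML element names (`<creation>`, `<a_iso>`, `<date>`) are hypothetical, since no sample file appears in the thread; `Creation` is a trimmed stand-in that reproduces only the public fields and array constructor of the generated class quoted further down; and the binary encoding is written by hand with `pack()` so the example runs without the Thrift PHP library. In real code the generated `write()` (for example through the `TBinarySerializer` used in Alexis's snippet) would produce these bytes. The HDFS upload is not shown.

```php
<?php
// Thrift binary-protocol type ids used below (STOP ends a struct, I32 = 8).
const TTYPE_STOP = 0;
const TTYPE_I32 = 8;

// Trimmed stand-in for the generated Creation class: only the public
// fields and the $vals-array constructor are reproduced here.
class Creation {
    public $a_iso = null;
    public $date = null;

    public function __construct($vals = null) {
        if (is_array($vals)) {
            if (isset($vals['a_iso'])) {
                $this->a_iso = $vals['a_iso'];
            }
            if (isset($vals['date'])) {
                $this->date = $vals['date'];
            }
        }
    }
}

// Step 1: XML -> object. The element names are hypothetical.
function creationFromXml(string $xml): Creation {
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        throw new InvalidArgumentException('invalid XML');
    }
    $vals = array();
    foreach (array('a_iso', 'date') as $name) {
        // isset() on a SimpleXMLElement child tests whether the element exists.
        if (isset($doc->$name)) {
            // Casting a SimpleXMLElement to int reads its text content.
            $vals[$name] = (int) $doc->$name;
        }
    }
    return new Creation($vals);
}

// Step 2: object -> binary-protocol bytes, mirroring the generated write():
// each non-null field is a 1-byte type, a 2-byte big-endian field id and a
// 4-byte big-endian i32; the struct ends with a STOP byte.
function encodeCreation(Creation $c): string {
    $out = '';
    if ($c->a_iso !== null) {
        $out .= pack('CnN', TTYPE_I32, 1, $c->a_iso);
    }
    if ($c->date !== null) {
        $out .= pack('CnN', TTYPE_I32, 2, $c->date);
    }
    return $out . pack('C', TTYPE_STOP);
}

// Step 3: bytes -> local file (uploading it to HDFS is a separate step).
$obj = creationFromXml('<creation><a_iso>846545458</a_iso><date>27052014</date></creation>');
$bytes = encodeCreation($obj);
$path = tempnam(sys_get_temp_dir(), 'creation');
file_put_contents($path, $bytes);
echo bin2hex($bytes), PHP_EOL; // prints 08000132754232080002019cc7ee00
```

Encoding 846545458 and 27052014 this way yields the same 15 bytes as the hex dump in Alexis's `TBinarySerializer` test, which suggests that snippet produced binary-protocol output even though a `TJSONProtocol` was passed to the constructor.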
>>
>> 2014-05-28 9:16 GMT+02:00 Phillip Simbwa <[email protected]>:
>>> Hi Alexis,
>>>
>>> Now why would you want to serialize XML and then save it to HDFS?
>>>
>>> I would propose you use the Thrift structs to define objects that are
>>> lighter to carry over the wire and easy to read from any other language
>>> without a big processing overhead.
>>>
>>> What do you think?
>>>
>>> On Wed, May 28, 2014 at 10:10 AM, Alexis Gryta
>>> <[email protected]> wrote:
>>>> Thank you. I'm a French engineering student on an internship.
>>>> I chose PHP to run many tests quickly, but afterwards I would like to
>>>> do it in Ruby.
>>>>
>>>> I spent a week reading many websites and docs about Thrift, but found
>>>> nothing about simply serializing... but I think I've found what I
>>>> wanted to do.
>>>>
>>>> *Actually, I think these setters/getters weren't created to be used
>>>> (by the user)...*
>>>>
>>>> I found an example in Java and tried to find the equivalent classes in
>>>> PHP...
>>>> ###
>>>> $bins = new TBinarySerializer(new TJSONProtocol());
>>>> $seria = $bins->serialize($creaticket);
>>>> echo $seria;
>>>> echo bin2hex($seria);
>>>> ###
>>>> shows:
>>>> ###
>>>>
>>>> Creation Object
>>>> (
>>>>     [a_iso] => 846545458
>>>>     [date] => 27052014
>>>> )
>>>>
>>>> 2uB2 oeÇî
>>>> 08000132754232080002019cc7ee00
>>>> ###
>>>>
>>>> But I have to do:
>>>> $creaticket = new Creation();
>>>> $creaticket->a_iso = 846545458;
>>>> $creaticket->date = 27052014;
>>>>
>>>> Do I have to code my own setters to write a_iso and date (e.g.
>>>> $creaticket->setA_iso(846545458))?
>>>>
>>>>
>>>> I would like to serialize my XML files (many of them) to Thrift binary files.
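Following Aaron's suggestion later in the thread (subclass or wrap the generated stub), a `setA_iso()`-style setter can be added without editing the generated code. A minimal sketch: `CreationWithSetters` and its methods are hypothetical names, and `Creation` is a trimmed stand-in for the generated class (public fields plus its array constructor only). The range check reflects that both fields are declared as i32 through `typedef i32 int`.

```php
<?php
// Trimmed stand-in for the generated class (public fields + $vals constructor).
class Creation {
    public $a_iso = null;
    public $date = null;

    public function __construct($vals = null) {
        if (is_array($vals)) {
            if (isset($vals['a_iso'])) {
                $this->a_iso = $vals['a_iso'];
            }
            if (isset($vals['date'])) {
                $this->date = $vals['date'];
            }
        }
    }
}

// Hypothetical subclass adding fluent setters that reject values outside
// the signed 32-bit range a Thrift i32 can hold.
class CreationWithSetters extends Creation {
    const I32_MIN = -2147483648;
    const I32_MAX = 2147483647;

    private static function checkI32(string $field, int $value): int {
        if ($value < self::I32_MIN || $value > self::I32_MAX) {
            throw new RangeException("$field does not fit in a Thrift i32: $value");
        }
        return $value;
    }

    public function setA_iso(int $value): self {
        $this->a_iso = self::checkI32('a_iso', $value);
        return $this;
    }

    public function setDate(int $value): self {
        $this->date = self::checkI32('date', $value);
        return $this;
    }
}

// Usage: chain the setters instead of assigning public fields directly.
$creaticket = (new CreationWithSetters())
    ->setA_iso(846545458)
    ->setDate(27052014);
```

Subclassing keeps the object a `Creation`, so the generated `write()` and the serializer should accept it unchanged; wrapping by composition would instead need to forward `read()` and `write()`. Note that the generated constructor already accepts an array, e.g. `new Creation(array('a_iso' => 846545458, 'date' => 27052014))`, which may be enough without any setters.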
>>>> So I parse my XML file and convert it to an object (automatically, in
>>>> PHP), then I convert this object into a Thrift object (using Types.php,
>>>> the file generated by Thrift from my Thrift structure), serialize it to
>>>> binary, and write it to a file.
>>>> I don't know exactly how to do it, but I'm trying and making progress
>>>> slowly.
>>>>
>>>> I want to do this to store them in HDFS and use Impala.
>>>>
>>>> Sorry for telling my life story, but it's to be clearer.
>>>>
>>>> 2014-05-27 17:59 GMT+02:00 Aaron Mefford <[email protected]>:
>>>>> If you do not like the interface of the generated Thrift code, consider
>>>>> subclassing or otherwise wrapping the generated stub in a class that
>>>>> provides the interface that you want or need to use. Generated Thrift
>>>>> code, as I understand it, is designed to provide minimally functional
>>>>> interfaces in your language of choice. The conventions are intended to
>>>>> be similar, where possible, across the gamut of languages supported by
>>>>> Thrift. As such, you may see C++ influences in your PHP stub. Wrap that
>>>>> stub and make it work the way you are comfortable with. There is no way
>>>>> the developers of Thrift could write a library that would generate
>>>>> stubs that are comfortable for everyone.
>>>>>
>>>>> I would provide you a sample in PHP, but PHP is not my cup of tea.
>>>>>
>>>>> Aaron
>>>>>
>>>>> On 5/27/14, 3:32 AM, Alexis Gryta wrote:
>>>>>> Hi!
>>>>>>
>>>>>> I'm not able to use the getters and setters of the Types.php generated
>>>>>> by Thrift.
>>>>>>
>>>>>> My Thrift structure:
>>>>>>
>>>>>> typedef i32 int
>>>>>> struct Creation {
>>>>>>   1: int a_iso,
>>>>>>   2: int date
>>>>>> }
>>>>>>
>>>>>> I did:
>>>>>> $objetcree = new Creation();
>>>>>> $objetcree->a_iso = 45;
>>>>>>
>>>>>> OK, but I don't want to use it like that.
>>>>>> $objetcree->read($input);
>>>>>>
>>>>>> What does $input have to be if I want to write just the a_iso field?
>>>>>>
>>>>>> Thank you very much!
>>>>>>
>>>>>>
>>>>>> Just so you know: I want to convert my XML files to Thrift binary to
>>>>>> store them in HDFS.
>>>>>>
>>>>>> I parse my XML files into a SimpleXMLElement object and convert this
>>>>>> object into a Thrift object.
>>>>>>
>>>>>> Afterwards, I serialize it and store it in a file. (Then I could
>>>>>> process it with HBase, Impala, etc.)
>>>>>> I'm on an internship, and my boss wants to store many, many XML files
>>>>>> (validating their structure and adapting to changes).
>>>>>>
>>>>>> Right?
>>>>>>
>>>>>>
>>>>>> My generated Thrift file:
>>>>>>
>>>>>> ########################################
>>>>>> class Creation {
>>>>>>   static $_TSPEC;
>>>>>>
>>>>>>   public $a_iso = null;
>>>>>>   public $date = null;
>>>>>>
>>>>>>   public function __construct($vals=null) {
>>>>>>     if (!isset(self::$_TSPEC)) {
>>>>>>       self::$_TSPEC = array(
>>>>>>         1 => array(
>>>>>>           'var' => 'a_iso',
>>>>>>           'type' => TType::I32,
>>>>>>           ),
>>>>>>         2 => array(
>>>>>>           'var' => 'date',
>>>>>>           'type' => TType::I32,
>>>>>>           ),
>>>>>>         );
>>>>>>     }
>>>>>>     if (is_array($vals)) {
>>>>>>       if (isset($vals['a_iso'])) {
>>>>>>         $this->a_iso = $vals['a_iso'];
>>>>>>       }
>>>>>>       if (isset($vals['date'])) {
>>>>>>         $this->date = $vals['date'];
>>>>>>       }
>>>>>>     }
>>>>>>   }
>>>>>>
>>>>>>   public function read($input)
>>>>>>   {
>>>>>>     $xfer = 0;
>>>>>>     $fname = null;
>>>>>>     $ftype = 0;
>>>>>>     $fid = 0;
>>>>>>     $xfer += $input->readStructBegin($fname);
>>>>>>     while (true)
>>>>>>     {
>>>>>>       $xfer += $input->readFieldBegin($fname, $ftype, $fid);
>>>>>>       if ($ftype == TType::STOP) {
>>>>>>         break;
>>>>>>       }
>>>>>>       switch ($fid)
>>>>>>       {
>>>>>>         case 1:
>>>>>>           if ($ftype == TType::I32) {
>>>>>>             $xfer += $input->readI32($this->a_iso);
>>>>>>           } else {
>>>>>>             $xfer += $input->skip($ftype);
>>>>>>           }
>>>>>>           break;
>>>>>>         case 2:
>>>>>>           if ($ftype == TType::I32) {
>>>>>>             $xfer += $input->readI32($this->date);
>>>>>>           } else {
>>>>>>             $xfer += $input->skip($ftype);
>>>>>>           }
>>>>>>           break;
>>>>>>         default:
>>>>>>           $xfer += $input->skip($ftype);
>>>>>>           break;
>>>>>>       }
>>>>>>       $xfer += $input->readFieldEnd();
>>>>>>     }
>>>>>>     $xfer += $input->readStructEnd();
>>>>>>     return $xfer;
>>>>>>   }
>>>>>>
>>>>>>   public function write($output) {
>>>>>>     $xfer = 0;
>>>>>>     $xfer += $output->writeStructBegin('Creation');
>>>>>>     if ($this->a_iso !== null) {
>>>>>>       $xfer += $output->writeFieldBegin('a_iso', TType::I32, 1);
>>>>>>       $xfer += $output->writeI32($this->a_iso);
>>>>>>       $xfer += $output->writeFieldEnd();
>>>>>>     }
>>>>>>     if ($this->date !== null) {
>>>>>>       $xfer += $output->writeFieldBegin('date', TType::I32, 2);
>>>>>>       $xfer += $output->writeI32($this->date);
>>>>>>       $xfer += $output->writeFieldEnd();
>>>>>>     }
>>>>>>     $xfer += $output->writeFieldStop();
>>>>>>     $xfer += $output->writeStructEnd();
>>>>>>     return $xfer;
>>>>>>   }
>>>>>> }
>>>>>> ########################################
>>>
>>> --
>>> - Phillip.
>>>
>>> "Aoccdrnig to rscheearch at an Elingsh uinervtisy, it deosn't mttaer in
>>> waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the
>>> frist and lsat ltteer are in the rghit pclae. The rset can be a toatl mses
>>> and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed
>>> ervey lteter by it slef but the wrod as a wlohe and the biran fguiers it
>>> out aynawy."
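On the `read($input)` question: the generated `read()` deserializes, so `$input` has to be a Thrift protocol object positioned over bytes that were already serialized (in the Thrift PHP library, typically a `TBinaryProtocol` wrapping a `TMemoryBuffer`), not a plain value. To "write just a_iso", those bytes would contain field 1 followed by a STOP byte. The sketch below mirrors the generated `read()` loop over raw binary-protocol bytes with `unpack()`, so it runs without the library; `decodeCreation` is a hypothetical name, and unlike the generated code it only handles I32 fields (it rejects other types instead of skipping them).

```php
<?php
// Thrift binary-protocol type ids (subset): STOP ends a struct, I32 = 8.
const TTYPE_STOP = 0;
const TTYPE_I32 = 8;

// Hypothetical decoder mirroring the generated Creation::read() loop:
// read a field header, stop on STOP, dispatch on the field id.
function decodeCreation(string $bytes): array {
    $fields = array('a_iso' => null, 'date' => null);
    $names = array(1 => 'a_iso', 2 => 'date');
    $pos = 0;
    $len = strlen($bytes);
    while (true) {
        if ($pos >= $len) {
            throw new UnexpectedValueException('truncated input: missing STOP byte');
        }
        $type = ord($bytes[$pos]);
        $pos += 1;
        if ($type === TTYPE_STOP) {
            break;
        }
        if ($type !== TTYPE_I32 || $pos + 6 > $len) {
            throw new UnexpectedValueException("unsupported or truncated field (type $type)");
        }
        // 2-byte big-endian field id, then 4-byte big-endian value.
        $fid = unpack('n', substr($bytes, $pos, 2))[1];
        $value = unpack('N', substr($bytes, $pos + 2, 4))[1];
        $pos += 6;
        // unpack('N') is unsigned; convert to a signed i32.
        if ($value >= 0x80000000) {
            $value -= 0x100000000;
        }
        // Unknown field ids are ignored, like the default branch of read().
        if (isset($names[$fid])) {
            $fields[$names[$fid]] = $value;
        }
    }
    return $fields;
}

// Bytes carrying only field 1 (a_iso = 846545458), then STOP.
$onlyIso = decodeCreation(hex2bin('0800013275423200'));
// $onlyIso['a_iso'] is 846545458 and $onlyIso['date'] stays null.
```

Fed through a `TBinaryProtocol` to the real generated `read()`, the same bytes should fill `a_iso` and leave `date` null, since `read()` only assigns the fields it encounters.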
