@Alexis: I now get your point clearly... Let me first digest the challenge properly then make a recommendation.
On Wed, May 28, 2014 at 10:42 AM, Alexis Gryta wrote:

> Good morning/afternoon/night!
> Thank you for your replies!
>
> *M. Geyer:* I don't know ... to be safe that it's a right value.
> *M. Simbwa:* OK, let me explain:
>
> I work for a firm which produces (and has produced for 10 years) a great
> many XML files.
> Now they want to change everything and use Big Data to store the files and
> to compute statistics on the server, no longer on the client.
> They want to use Hadoop, Impala, HDFS, etc. But first they want to store
> all these files with validation of the structure, in a durable way that can
> adapt, because the structure of the XML files may change.
> I have to do this; it's my statement of work.
>
> So my boss wants to use Thrift to convert every XML file to a Thrift
> binary file.
>
> For now I am trying Thrift with a very tiny structure (Creation), but the
> final structure will be very big. (There was no XML schema; I had to write it.)
>
> So I think that I have to do what I said: [ XML -> object (automatically, in
> PHP or Ruby) -> Thrift object (I have to code this myself) -> Thrift binary
> file -> HDFS ]
>
> Have I answered your question?
>
> *M. Farrell:* Yes, I have looked at Parquet (I did a state-of-the-art survey
> of Hadoop components at the beginning of my internship).
> After having used Thrift, I'll use Parquet. (I saw that Impala+Parquet was
> amazing!)
> I think there is less documentation than for Thrift... (but if there is a way
> to use Parquet directly without Thrift, I'll take it, although my boss will
> require me to do the Thrift serialization too)
>
> Regards.
>
> 2014-05-28 9:16 GMT+02:00 Phillip Simbwa:
>
>> Hi Alexis,
>>
>> Now why would you want to serialize XML and then save to HDFS?
>>
>> I would propose you use the Thrift structs to define objects that are
>> lighter to carry over the wire and easily read in any other language
>> without a big processing overhead.
>>
>> What do you think?
>>
>> On Wed, May 28, 2014 at 10:10 AM, Alexis Gryta wrote:
>> > Thank you. I'm a French engineering student doing an internship.
>> > I chose PHP to run many tests quickly, but afterwards I would like to
>> > use Ruby.
>> >
>> > I have spent one week reading many websites and docs about Thrift but
>> > found nothing on how to just serialize, etc. But I think I have found
>> > what I wanted to do.
>> >
>> > *Actually, I think these setters/getters were not created to be used
>> > (by the user).*
>> >
>> > I found an example in Java and tried to find the equivalent classes in
>> > PHP ...
>> > ###
>> > $bins = new TBinarySerializer(new TJSONProtocol());
>> > $seria= $bins->serialize($creaticket);
>> > echo $seria;
>> > echo bin2hex($seria);
>> > ###
>> > shows:
>> > ###
>> >
>> > Creation Object
>> > (
>> >     [a_iso] => 846545458
>> >     [date] => 27052014
>> > )
>> >
>> > 2uB2 oeÇî
>> > 08000132754232080002019cc7ee00
>> > ###
>> >
>> > But I have to do:
>> > $creaticket = new Creation();
>> > $creaticket->a_iso = 846545458;
>> > $creaticket->date= 27052014;
>> >
>> > Do I have to code my own setter to write a_iso and date?
>> > ( $creaticket->setA_iso(846545458) )
>> >
>> > I would like to serialize my (many) XML files to Thrift binary files.
>> > So I parse each XML file into an object (automatically, in PHP), convert
>> > that object into a Thrift object (with Types.php, the file generated by
>> > Thrift from my Thrift structure), serialize it to binary and write it to
>> > a file.
>> > I don't know exactly how to do it, but I am trying and progressing slowly.
>> >
>> > I want to do this to store them in HDFS and use Impala.
>> >
>> > Sorry for telling my life story, but it's to be clearer.
>> >
>> > 2014-05-27 17:59 GMT+02:00 Aaron Mefford:
>> >
>> >> If you do not like the interface of the generated Thrift code, consider
>> >> subclassing or otherwise wrapping the generated stub in a class that
>> >> provides the interface that you want or need to use. Generated Thrift
>> >> code, as I understand it, is designed to provide minimally functional
>> >> interfaces in your language of choice. The conventions are intended to be
>> >> similar where possible across the gamut of languages supported by Thrift.
>> >> As such you may see influences from C++ in your PHP stub. Wrap that stub
>> >> and make it work the way you are comfortable with. There is no way the
>> >> developers of Thrift could write a library that would generate stubs that
>> >> are comfortable for everyone.
>> >>
>> >> I would provide you a sample in PHP, but PHP is not my cup of tea.
>> >>
>> >> Aaron
>> >>
>> >> On 5/27/14, 3:32 AM, Alexis Gryta wrote:
>> >>
>> >>> Hi!
>> >>>
>> >>> I'm not able to use the getters and setters of the Types.php generated
>> >>> by Thrift.
>> >>>
>> >>> My Thrift structure:
>> >>>
>> >>> typedef i32 int
>> >>>
>> >>> struct Creation {
>> >>>   1: int a_iso,
>> >>>   2: int date
>> >>> }
>> >>>
>> >>> I did:
>> >>> $objetcree = new Creation();
>> >>> $objetcree->a_iso = 45;
>> >>>
>> >>> OK, but I don't want to use it like that.
>> >>> $objetcree->read($input);
>> >>>
>> >>> What does $input have to be if I want to write just the a_iso field?
>> >>>
>> >>> Thank you very much!
>> >>>
>> >>> Just so you know: I want to convert my XML files to Thrift binary to
>> >>> store them in HDFS.
>> >>>
>> >>> I parse my XML files into SimpleXMLObject and convert this object into a
>> >>> Thrift object.
>> >>>
>> >>> Afterwards, I serialize it and store it in a file. (I could then process
>> >>> it with HBase, Impala, etc.)
>> >>> I'm doing an internship and my boss wants to store a great many XML
>> >>> files (validating the structure and adapting to changes).
>> >>>
>> >>> Right?
>> >>>
>> >>> My generated Thrift file:
>> >>>
>> >>> ########################################
>> >>> class Creation {
>> >>>   static $_TSPEC;
>> >>>
>> >>>   public $a_iso = null;
>> >>>   public $date = null;
>> >>>
>> >>>   public function __construct($vals=null) {
>> >>>     if (!isset(self::$_TSPEC)) {
>> >>>       self::$_TSPEC = array(
>> >>>         1 => array(
>> >>>           'var' => 'a_iso',
>> >>>           'type' => TType::I32,
>> >>>         ),
>> >>>         2 => array(
>> >>>           'var' => 'date',
>> >>>           'type' => TType::I32,
>> >>>         ),
>> >>>       );
>> >>>     }
>> >>>     if (is_array($vals)) {
>> >>>       if (isset($vals['a_iso'])) {
>> >>>         $this->a_iso = $vals['a_iso'];
>> >>>       }
>> >>>       if (isset($vals['date'])) {
>> >>>         $this->date = $vals['date'];
>> >>>       }
>> >>>     }
>> >>>   }
>> >>>
>> >>>   public function read($input)
>> >>>   {
>> >>>     $xfer = 0;
>> >>>     $fname = null;
>> >>>     $ftype = 0;
>> >>>     $fid = 0;
>> >>>     $xfer += $input->readStructBegin($fname);
>> >>>     while (true)
>> >>>     {
>> >>>       $xfer += $input->readFieldBegin($fname, $ftype, $fid);
>> >>>       if ($ftype == TType::STOP) {
>> >>>         break;
>> >>>       }
>> >>>       switch ($fid)
>> >>>       {
>> >>>         case 1:
>> >>>           if ($ftype == TType::I32) {
>> >>>             $xfer += $input->readI32($this->a_iso);
>> >>>           } else {
>> >>>             $xfer += $input->skip($ftype);
>> >>>           }
>> >>>           break;
>> >>>         case 2:
>> >>>           if ($ftype == TType::I32) {
>> >>>             $xfer += $input->readI32($this->date);
>> >>>           } else {
>> >>>             $xfer += $input->skip($ftype);
>> >>>           }
>> >>>           break;
>> >>>         default:
>> >>>           $xfer += $input->skip($ftype);
>> >>>           break;
>> >>>       }
>> >>>       $xfer += $input->readFieldEnd();
>> >>>     }
>> >>>     $xfer += $input->readStructEnd();
>> >>>     return $xfer;
>> >>>   }
>> >>>
>> >>>   public function write($output) {
>> >>>     $xfer = 0;
>> >>>     $xfer += $output->writeStructBegin('Creation');
>> >>>     if ($this->a_iso !== null) {
>> >>>       $xfer += $output->writeFieldBegin('a_iso', TType::I32, 1);
>> >>>       $xfer += $output->writeI32($this->a_iso);
>> >>>       $xfer += $output->writeFieldEnd();
>> >>>     }
>> >>>     if ($this->date !== null) {
>> >>>       $xfer += $output->writeFieldBegin('date', TType::I32, 2);
>> >>>       $xfer += $output->writeI32($this->date);
>> >>>       $xfer += $output->writeFieldEnd();
>> >>>     }
>> >>>     $xfer += $output->writeFieldStop();
>> >>>     $xfer += $output->writeStructEnd();
>> >>>     return $xfer;
>> >>>   }
>> >>> }
>> >>> ########################################
>> >>>
>> >>
>>
>> --
>> - Phillip.
>>
>> "Aoccdrnig to rscheearch at an Elingsh uinervtisy, it deosn't mttaer in waht
>> oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist
>> and lsat ltteer are in the rghit pclae. The rset can be a toatl mses and
>> you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed
>> ervey lteter by it slef but the wrod as a wlohe and the biran fguiers it
>> out aynawy."
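The wrapping approach suggested earlier in the thread (put the generated stub behind a class that offers the interface you want) could look roughly like the sketch below. It is only an illustration under stated assumptions: `CreationBuilder`, `toI32`, `creationFromXml` and the `<creation>` XML layout are hypothetical names that do not appear in the thread or in Thrift, and the small `Creation` class is a stand-in for the generated one in Types.php so that the example runs without the Thrift library.

```php
<?php
// Stand-in for the Thrift-generated Creation class (public fields only, as in
// the generated code quoted in the thread). In a real project, require the
// generated Types.php instead of declaring this class.
class Creation
{
    public $a_iso = null;
    public $date = null;
}

// Hypothetical wrapper that adds the setters the generated stub lacks.
class CreationBuilder
{
    private $struct;

    public function __construct()
    {
        $this->struct = new Creation();
    }

    public function setAIso($value)
    {
        $this->struct->a_iso = self::toI32($value, 'a_iso');
        return $this; // returned so that calls can be chained
    }

    public function setDate($value)
    {
        $this->struct->date = self::toI32($value, 'date');
        return $this;
    }

    // Hands back the plain struct, ready to be passed to a serializer.
    public function build()
    {
        return $this->struct;
    }

    // Rejects anything that is not a whole number in the signed 32-bit range,
    // because both fields are declared as i32 in the Thrift structure.
    private static function toI32($value, $field)
    {
        if (!is_numeric($value) || (int)$value != $value) {
            throw new InvalidArgumentException("$field must be an integer");
        }
        $int = (int)$value;
        if ($int < -2147483648 || $int > 2147483647) {
            throw new InvalidArgumentException("$field does not fit in i32");
        }
        return $int;
    }
}

// Maps one XML document onto the struct. The element names are an assumed
// layout; the real XML files are not shown in the thread.
function creationFromXml($xml)
{
    $node = simplexml_load_string($xml);
    if ($node === false) {
        throw new InvalidArgumentException('not well-formed XML');
    }
    return (new CreationBuilder())
        ->setAIso((string)$node->a_iso)
        ->setDate((string)$node->date)
        ->build();
}

$creation = creationFromXml(
    '<creation><a_iso>846545458</a_iso><date>27052014</date></creation>'
);
print_r($creation);
```

Keeping the validation in the wrapper means the generated file can be regenerated whenever the Thrift structure changes without losing hand-written code; the object returned by `build()` is still the plain struct that the serializer expects.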
