Hi, the problem we are trying to solve with protocol buffers is serializing a large relational database into files of protobuf messages. We have hundreds of millions of rows (between 100 and 500 million, or even more) in the database, which we serialize in chunks of 10,000 (or 5,000) rows per file. A 10,000-row protobuf message comes to around 1.2 MB, whereas a 5,000-row message is about 650 KB.

The slowness starts once we have processed roughly 1,000 to 1,100 files; from that point on, serialization keeps slowing down as we progress. For example, serializing the 100th file takes around 100 ms, the 500th file around 300 ms, the 1000th file around 1000 ms, and so on.

Most of the time is spent inserting rows and packing the message:

TableData.Builder tableDataBuilder = TableData.newBuilder().setName(tableName);  // message creation (only once)

while (loop) {

    // each row
    DataRow.Builder dataRowBuilder = DataRow.newBuilder();
    // ... some row processing ...
    tableDataBuilder.addDataRows(dataRowBuilder);

    // every 10,000-row chunk:
    tableDataBuilder.build().writeTo(output);   // write to stream

    // assign a new message builder for the next file
    tableDataBuilder = TableData.newBuilder().setName(tableName);
}


Can we have some suggestions to improve the proto processing here? Most of the time is spent in the call tableDataBuilder.addDataRows(dataRowBuilder), which happens once for each row.
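
For reference, here is roughly the variant we have been considering: reuse a single builder (clearing its dataRows between chunks) and give every chunk file its own buffered stream that is closed right after the write. This is only a sketch, not our real code; TableExporter, CHUNK_SIZE, outputDir, rs (a JDBC ResultSet), writeChunk and populateRow are illustrative names, and the import of the generated TableData class is omitted since it depends on the .proto's java_package.

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.sql.ResultSet;

public class TableExporter {

    private static final int CHUNK_SIZE = 10_000;   // rows per output file

    public void export(ResultSet rs, String tableName, String outputDir) throws Exception {
        TableData.Builder tableDataBuilder = TableData.newBuilder().setName(tableName);
        int fileIndex = 0;

        while (rs.next()) {                               // each row
            TableData.DataRow.Builder dataRowBuilder = TableData.DataRow.newBuilder();
            populateRow(dataRowBuilder, rs);              // the existing per-row processing
            tableDataBuilder.addDataRows(dataRowBuilder);

            if (tableDataBuilder.getDataRowsCount() == CHUNK_SIZE) {
                writeChunk(tableDataBuilder, outputDir, fileIndex++);
                tableDataBuilder.clearDataRows();         // reuse the builder instead of reallocating it
            }
        }
        if (tableDataBuilder.getDataRowsCount() > 0) {    // flush the last, partial chunk
            writeChunk(tableDataBuilder, outputDir, fileIndex);
        }
    }

    private void writeChunk(TableData.Builder builder, String outputDir, int fileIndex)
            throws IOException {
        // A fresh, buffered stream per chunk file, closed immediately after writing,
        // so no single stream keeps growing as more files are produced.
        try (OutputStream out = new BufferedOutputStream(
                new FileOutputStream(outputDir + "/chunk-" + fileIndex + ".pb"))) {
            builder.build().writeTo(out);
        }
    }

    private void populateRow(TableData.DataRow.Builder row, ResultSet rs) {
        // placeholder for the row processing in the original loop
    }
}

The differences from the loop above are that the builder is cleared and reused rather than replaced, and that each chunk gets its own short-lived output stream. Whether that removes the slowdown depends on where the time is really going, which is probably worth confirming with a profiler or GC logs.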


Here is the proto message:


message TableData {
   required string name = 1;                    // name of the database table
   repeated ColNameDbType colNameDbType = 2;    // column name and column DB type mapping
   repeated DataRow dataRows = 3;               // table data rows

   message DataRow {
      repeated ColNameRowData colNameRowData = 1;

      message ColNameRowData {
         required string colName = 1;     // column name
         required DbType colDbType = 2;   // column DB type
         optional string data = 3;        // used for all types except bool
         optional bool boolData = 4;      // populated if the column DB datatype is bool
         optional bytes blobData = 5;
      }
   }

   message ColNameDbType {
      required string name = 1;
      required DbType type = 2;
   }

   enum DbType {
      BIGINT = 0;
      BIT = 1;
      INT = 2;
      VARCHAR = 3;
      DATE = 4;
      SMALLINT = 5;
      SMALLINT_UNSIGNED = 6;
      TIMESTAMP = 7;
      BLOB = 8;
      DATETIME = 9;
      TINYINT = 10;
      TINYINT_UNSIGNED = 11;
      CHAR = 12;
      INTEGER = 13;
      LONGVARCHAR = 14;
      DECIMAL = 15;
      BIGINT_UNSIGNED = 16;
      DOUBLE = 17;
      LONGBLOB = 18;
      VARBINARY = 19;
      VARCHAR2 = 20;     // Oracle specific
      NUMBER = 21;       // Oracle specific
      CLOB = 22;         // Oracle specific
      IMAGE = 23;        // a blob (SQL Server)
      NUMERIC = 24;      // SQL Server specific
      DATETIME2 = 25;    // SQL Server specific
      FLOAT = 26;        // SQL Server specific
      NVARCHAR = 27;     // SQL Server specific
      INT2 = 28;         // Postgres specific
      INT8 = 29;         // Postgres specific
      INT4 = 30;         // Postgres specific
      BOOL = 31;         // Postgres specific
      BYTEA = 32;        // Postgres specific
      TEXT = 33;         // Postgres specific
      FLOAT8 = 34;       // Postgres specific
      BPCHAR = 35;       // Postgres specific
      RAW = 36;          // Oracle equivalent of VARBINARY in MySQL
      BINARY = 37;       // MSSQL equivalent of VARBINARY in MySQL
      UNKNOWN = 38;
   }
}
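
One schema change we have been thinking about (a sketch only; TableDataCompact and CellValue are just illustrative names): since the column names and DB types are already listed once in colNameDbType, each row could carry bare values positionally aligned with that list, instead of repeating colName and colDbType in every cell. That should shrink each message and reduce the number of objects built per row.

message TableDataCompact {
   required string name = 1;
   repeated TableData.ColNameDbType colNameDbType = 2;  // column schema, written once per file
   repeated DataRow dataRows = 3;

   message DataRow {
      repeated CellValue values = 1;   // values[i] corresponds to colNameDbType[i]
   }

   message CellValue {
      optional string data = 1;        // all types except bool/blob, as before
      optional bool boolData = 2;
      optional bytes blobData = 3;
   }
}

A NULL column would just be a CellValue with no field set, which still keeps the positions aligned and costs only a couple of bytes on the wire.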


Thank you,
