Hi,

The problem we are trying to solve with Protocol Buffers is serializing a large relational database into protobuf message files. The database has hundreds of millions of rows (between 100 and 500 million, or even more), and we serialize them in chunks of 10,000 (or 5,000) rows per file. A 10,000-row protobuf message comes to around 1.2 MB, whereas a 5,000-row message comes to about 650 KB.
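To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Java. The names (ExportScale, fileCount, bytesPerRow) are made up for illustration and are not part of our export code. It assumes decimal units (1 MB = 1,000,000 bytes, 1 KB = 1,000 bytes), which may not match how the sizes were measured.

```java
// Hypothetical helper for illustration only; not part of the export code.
final class ExportScale {
    // Number of files needed for totalRows at rowsPerFile rows each, rounded up.
    static long fileCount(long totalRows, long rowsPerFile) {
        return (totalRows + rowsPerFile - 1) / rowsPerFile;
    }

    // Average serialized size of one row in bytes, given the size of one file.
    static double bytesPerRow(long fileBytes, long rowsPerFile) {
        return (double) fileBytes / rowsPerFile;
    }

    public static void main(String[] args) {
        // Assumed decimal units: 1.2 MB = 1,200,000 bytes, 650 KB = 650,000 bytes.
        System.out.println("files, 100M rows @ 10,000: " + fileCount(100_000_000L, 10_000L));
        System.out.println("files, 500M rows @ 10,000: " + fileCount(500_000_000L, 10_000L));
        System.out.println("files, 500M rows @ 5,000:  " + fileCount(500_000_000L, 5_000L));
        System.out.println("bytes/row @ 10,000: " + bytesPerRow(1_200_000L, 10_000L));
        System.out.println("bytes/row @ 5,000:  " + bytesPerRow(650_000L, 5_000L));
    }
}
```

Under these assumptions a full export is somewhere between 10,000 and 100,000 files, at roughly 120 to 130 bytes per row.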
The slowness starts once we have processed more than about 1,000 to 1,100 files; after that, serialization keeps slowing down as we progress. For example, serializing the 100th file takes around 100 ms, the 500th file around 300 ms, the 1000th file around 1000 ms, and so on. Most of the time is spent inserting and packing rows:

    // Message creation (only once)
    TableData.Builder tableDataBuilder = TableData.newBuilder().setName(tableName);

    while (loop) { // each row
        DataRow.Builder dataRowBuilder = DataRow.newBuilder();
        // some row processing
        tableDataBuilder.addDataRows(dataRowBuilder);

        // each 10000 chunk here
        tableDataBuilder.build().writeTo(output); // write to stream
        // assign a new message for next file
        tableDataBuilder = TableData.newBuilder().setName(tableName);
    }

Can you suggest anything to improve the protobuf processing here? Most of the time is spent in the call tableDataBuilder.addDataRows(dataRowBuilder), which happens for each row.

Here is the proto message:

    message TableData {
      required string name = 1;                 // Name of the database table
      repeated ColNameDbType colNameDbType = 2; // Column name and column DB type mapping
      repeated DataRow dataRows = 3;            // Table data rows

      message DataRow {
        repeated ColNameRowData colNameRowData = 1;

        message ColNameRowData {
          required string colName = 1;   // column name
          required DbType colDbType = 2; // column DB type
          optional string data = 3;      // using string for all types except bool
          optional bool boolData = 4;    // this field is populated if the column DB datatype is bool
          optional bytes blobData = 5;
        }
      }

      message ColNameDbType {
        required string name = 1;
        required DbType type = 2;
      }

      enum DbType {
        BIGINT = 0;
        BIT = 1;
        INT = 2;
        VARCHAR = 3;
        DATE = 4;
        SMALLINT = 5;
        SMALLINT_UNSIGNED = 6;
        TIMESTAMP = 7;
        BLOB = 8;
        DATETIME = 9;
        TINYINT = 10;
        TINYINT_UNSIGNED = 11;
        CHAR = 12;
        INTEGER = 13;
        LONGVARCHAR = 14;
        DECIMAL = 15;
        BIGINT_UNSIGNED = 16;
        DOUBLE = 17;
        LONGBLOB = 18;
        VARBINARY = 19;
        VARCHAR2 = 20;  // Oracle specific
        NUMBER = 21;    // Oracle specific
        CLOB = 22;      // Oracle specific
        IMAGE = 23;     // It's a blob (SQL Server)
        NUMERIC = 24;   // SQL Server specific
        DATETIME2 = 25; // SQL Server specific
        FLOAT = 26;     // SQL Server specific
        NVARCHAR = 27;  // SQL Server specific
        INT2 = 28;      // Postgres specific
        INT8 = 29;      // Postgres specific
        INT4 = 30;      // Postgres specific
        BOOL = 31;      // Postgres specific
        BYTEA = 32;     // Postgres specific
        TEXT = 33;      // Postgres specific
        FLOAT8 = 34;    // Postgres specific
        BPCHAR = 35;    // Postgres specific
        RAW = 36;       // Oracle equivalent of VARBINARY in MySQL
        BINARY = 37;    // MSSQL equivalent of VARBINARY in MySQL
        UNKNOWN = 38;
      }
    }

Thank you,
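
P.S. In case it helps narrow this down, below is a minimal sketch of how the time per chunk could be split into an "add" phase and a "write" phase. It is not our actual code: ChunkTimer, ChunkTiming and run are hypothetical names, rows are plain strings, and an ArrayList stands in for TableData.Builder so that the example runs without the generated protobuf classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical timing harness; row and chunk types are stand-ins for the protobuf classes.
final class ChunkTimer {
    // Timing for one chunk, split by phase.
    static final class ChunkTiming {
        final int chunkIndex;
        final int rows;
        final long addNanos;
        final long writeNanos;

        ChunkTiming(int chunkIndex, int rows, long addNanos, long writeNanos) {
            this.chunkIndex = chunkIndex;
            this.rows = rows;
            this.addNanos = addNanos;
            this.writeNanos = writeNanos;
        }
    }

    // Splits rows into chunks of chunkSize and times the add and write phases of each chunk.
    static List<ChunkTiming> run(Iterable<String> rows, int chunkSize, OutputStream out) {
        List<ChunkTiming> timings = new ArrayList<>();
        List<String> chunk = new ArrayList<>(chunkSize); // stand-in for TableData.Builder
        long addNanos = 0;
        for (String row : rows) {
            long start = System.nanoTime();
            chunk.add(row);                              // stand-in for addDataRows(...)
            addNanos += System.nanoTime() - start;
            if (chunk.size() == chunkSize) {
                timings.add(flush(timings.size(), chunk, addNanos, out));
                chunk = new ArrayList<>(chunkSize);      // fresh buffer for the next file
                addNanos = 0;
            }
        }
        if (!chunk.isEmpty()) {                          // last, partially filled chunk
            timings.add(flush(timings.size(), chunk, addNanos, out));
        }
        return timings;
    }

    // Writes one chunk (stand-in for build().writeTo(output)) and records the write time.
    private static ChunkTiming flush(int index, List<String> chunk, long addNanos, OutputStream out) {
        long start = System.nanoTime();
        try {
            for (String row : chunk) {
                out.write(row.getBytes(StandardCharsets.UTF_8));
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ChunkTiming(index, chunk.size(), addNanos, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            rows.add("row-" + i);
        }
        for (ChunkTiming t : run(rows, 10, new ByteArrayOutputStream())) {
            System.out.println("chunk " + t.chunkIndex + ": rows=" + t.rows
                    + " addNanos=" + t.addNanos + " writeNanos=" + t.writeNanos);
        }
    }
}
```

With the real classes, the same two timers would wrap tableDataBuilder.addDataRows(...) and tableDataBuilder.build().writeTo(output). Calling System.nanoTime() for every row adds some overhead of its own, so the numbers are only good for comparing one chunk against another.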