Re: multi threading support

2019-10-23 Thread Driesprong, Fokko
Manik, what I've done previously with Apache Avro is shard the workload based on the fingerprint of the schema. You compute the fingerprint of the schema and get a Long that represents the canonical form of the schema. I was using Flink, but you can implement this with any other
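
The sharding step described above could look roughly like the following. This is a minimal sketch, not the code from the original setup: the class name `FingerprintSharding` and the method `shardFor` are hypothetical, and the fingerprint is passed in as a plain `long` so the example runs on the JDK alone. With Avro on the classpath, the value could presumably be obtained from `SchemaNormalization.parsingFingerprint64(schema)`.

```java
final class FingerprintSharding {
    private FingerprintSharding() {}

    // Maps a 64-bit schema fingerprint to a shard index in [0, numShards).
    // Assumption: the caller computed the fingerprint elsewhere, for example
    // with org.apache.avro.SchemaNormalization.parsingFingerprint64(schema).
    static int shardFor(long fingerprint, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        // floorMod keeps the result non-negative even for negative fingerprints.
        return (int) Math.floorMod(fingerprint, (long) numShards);
    }
}
```

Records with the same schema always produce the same fingerprint, so they always land on the same shard, and each shard can then be served by a single writer thread.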

Re: multi threading support

2019-10-21 Thread Manik Singla
In our case, all 1500 writers have different schemas, so we need to increase throughput per writer. Currently, though, writer throughput is not the application bottleneck. As suggested, we will look at application-level fixes if we come to this. Regards, Manik Singla

Re: multi threading support

2019-10-21 Thread Ryan Blue
I agree with Fokko. Multi-threading is not the responsibility of Parquet. You can parallelize by writing more Parquet files in separate threads. Adding locks to Parquet doesn't make much sense and is unlikely to speed up your application without huge changes to Parquet. On Mon, Oct 21, 2019 at
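
One way to read "writing more Parquet files in separate threads" is sketched below. It assumes nothing about the Parquet API: `FileSink` is a hypothetical stand-in for a single-threaded file writer, and `FilePerThread.writeAll` is an illustrative name. The point is only that each task owns its sink outright, so no locking is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FilePerThread {
    // Hypothetical stand-in for a writer that is only safe on one thread.
    static final class FileSink {
        final String name;
        final List<String> rows = new ArrayList<>();
        FileSink(String name) { this.name = name; }
        void write(String row) { rows.add(row); }
    }

    // Writes each batch to its own sink inside its own task.
    // No sink is ever shared between tasks, so no locks are required.
    static List<FileSink> writeAll(List<List<String>> batches, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<FileSink>> futures = new ArrayList<>();
            for (int i = 0; i < batches.size(); i++) {
                final int id = i;
                final List<String> batch = batches.get(i);
                futures.add(pool.submit(() -> {
                    FileSink sink = new FileSink("part-" + id);
                    for (String row : batch) {
                        sink.write(row);
                    }
                    return sink;
                }));
            }
            List<FileSink> done = new ArrayList<>();
            for (Future<FileSink> f : futures) {
                done.add(f.get()); // get() also makes the task's writes visible here
            }
            return done;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a write task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Throughput then scales with the number of files being written at once rather than with any concurrency inside a single writer.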

Re: multi threading support

2019-10-21 Thread Driesprong, Fokko
I don't think multi-threading should be handled at the level of Parquet, but you could write on a different thread. For example, when one of the 1500 writers is ready to write, you could do the write on a different thread. Cheers, Fokko On Sat, 19 Oct 2019 at 12:16, Manik Singla wrote: > Thanks Fokko
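
Handing a ready writer off to another thread might be sketched as below. This is an assumption-laden illustration rather than anything from the thread: `WriterLanes` and its methods are hypothetical names, and the `Runnable` stands in for whatever flush call the application makes on its writer. Each writer id is pinned to one single-threaded executor, so flushes for the same writer never overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class WriterLanes {
    private final List<ExecutorService> lanes = new ArrayList<>();

    WriterLanes(int laneCount) {
        for (int i = 0; i < laneCount; i++) {
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    // A writer id always maps to the same single-threaded lane, so flushes
    // for one writer run in submission order and never run concurrently.
    Future<?> submit(int writerId, Runnable flush) {
        int lane = Math.floorMod(writerId, lanes.size());
        return lanes.get(lane).submit(flush);
    }

    // Stops accepting work and waits for the queued flushes to finish.
    // Returns true only if every lane terminated within the timeout.
    boolean shutdownAndWait(long timeoutSeconds) {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        boolean allDone = true;
        try {
            for (ExecutorService lane : lanes) {
                allDone &= lane.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return allDone;
    }
}
```

The application thread keeps buffering while the lanes do the writing, and the writers themselves stay single-threaded.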

Re: multi threading support

2019-10-19 Thread Manik Singla
Thanks, Fokko, for the response and for correcting me on the way I addressed the list. We use Parquet through our internal framework, where we usually have dynamic schemas. Because of the dynamic schemas, we do some buffering to figure out the schema for the current writer. We open around 1500 writers at a time but are not able to

Re: multi threading support

2019-10-17 Thread Driesprong, Fokko
Thank you for your question, Manik. First of all, I think most of the people working on this project are guys, but I would not exclude any other gender. Secondly, Parquet is widely used in different open source projects such as Hive, Presto and Spark. These frameworks scale out by design. For

multi threading support

2019-10-15 Thread Manik Singla
Hi Guys, I was looking for a task list or blockers that need to be resolved to support a multi-threaded writer (Java specifically). I did not find anything in JIRA or the forums. Could someone point me to a doc or link, if one exists? Regards, Manik Singla