Manik,
What I've done previously with Apache Avro is shard the workload based
on the fingerprint of the schema. You compute the fingerprint of the
schema and get a Long that represents the canonical form of the
schema. I was using Flink, but you can implement this with any other
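A minimal sketch of the fingerprint-based sharding described above, in plain Java. It is an illustration under assumptions, not the setup used in the original Flink job: the class name, the shard count, and the sample schema strings are hypothetical, and FNV-1a is swapped in as a stand-in hash so the example runs without the Avro library. With Avro on the classpath, SchemaNormalization.parsingFingerprint64(schema) would supply the fingerprint instead.

```java
import java.nio.charset.StandardCharsets;

public class FingerprintSharding {
    // Stand-in 64-bit fingerprint (FNV-1a) over the schema's canonical text.
    // Assumption: a real Avro job would use
    // SchemaNormalization.parsingFingerprint64(schema) here instead.
    public static long fingerprint64(String canonicalSchema) {
        long hash = 0xcbf29ce484222325L;      // FNV-1a 64-bit offset basis
        for (byte b : canonicalSchema.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x100000001b3L;           // FNV-1a 64-bit prime
        }
        return hash;
    }

    // Maps a fingerprint to a shard index in [0, numShards).
    // floorMod keeps the index non-negative for negative fingerprints.
    public static int shardFor(long fingerprint, int numShards) {
        return (int) Math.floorMod(fingerprint, (long) numShards);
    }

    public static void main(String[] args) {
        // Hypothetical canonical schema strings, for illustration only.
        String[] schemas = {
            "{\"name\":\"A\",\"type\":\"record\",\"fields\":[]}",
            "{\"name\":\"B\",\"type\":\"record\",\"fields\":[]}"
        };
        int numShards = 8;
        for (String schema : schemas) {
            long fp = fingerprint64(schema);
            System.out.println(fp + " -> shard " + shardFor(fp, numShards));
        }
    }
}
```

Records with the same schema always get the same fingerprint, so they always land on the same shard, and each shard can then own its writers without sharing them.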
In our case, all 1500 writers have different schemas, so we need to
increase throughput per writer.
Currently, though, writer throughput is not the application's bottleneck.
As suggested, we will look at application-level fixes if we come to
this.
Regards
Manik Singla
+91-9996008893
+91-9665639677
I agree with Fokko. Multi-threading is not the responsibility of Parquet.
You can parallelize by writing more Parquet files in separate threads.
Adding locks to Parquet doesn't make much sense and is unlikely to speed up
your application without huge changes to Parquet.
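
A rough sketch of "one file per task, several tasks in parallel", using only the Java standard library. Everything named here is hypothetical: writeOneFile is a stand-in that writes text lines, and in a real application that method is where a single ParquetWriter would be created, used and closed by one task. The point is only that no writer is ever shared between threads, so no locking is needed.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelFileWriters {
    // Hypothetical stand-in for writing one Parquet file: one task, one file.
    static Path writeOneFile(Path dir, String name, List<String> records) {
        Path file = dir.resolve(name + ".txt");
        try (BufferedWriter out = Files.newBufferedWriter(file)) {
            for (String record : records) {
                out.write(record);
                out.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return file;
    }

    // Submits one task per output file and waits for all of them to finish.
    public static List<Path> writeAll(Path dir, Map<String, List<String>> batches, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Path>> pending = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : batches.entrySet()) {
                Callable<Path> task = () -> writeOneFile(dir, e.getKey(), e.getValue());
                pending.add(pool.submit(task));
            }
            List<Path> written = new ArrayList<>();
            for (Future<Path> f : pending) {
                try {
                    written.add(f.get());
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(ex);
                } catch (ExecutionException ex) {
                    throw new IllegalStateException(ex.getCause());
                }
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }

    // Writes three small demo batches into a fresh temporary directory.
    public static List<Path> writeDemoBatches() {
        Map<String, List<String>> batches = new TreeMap<>();
        batches.put("writer-1", Arrays.asList("a", "b"));
        batches.put("writer-2", Arrays.asList("c"));
        batches.put("writer-3", Arrays.asList("d", "e", "f"));
        try {
            Path dir = Files.createTempDirectory("parallel-writers");
            return writeAll(dir, batches, 2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (Path p : writeDemoBatches()) {
            System.out.println("wrote " + p);
        }
    }
}
```

The pool size caps how many files are written at once, so it can be tuned independently of how many writers the application has open.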
On Mon, Oct 21, 2019 at
I don't think the multi-threading should be on the level of Parquet. But
you could write on a different thread. For example, when one of the 1500
writers is ready to write, you could do this on a different thread.
Cheers, Fokko
On Sat, 19 Oct 2019 at 12:16, Manik Singla wrote:
Thanks, Fokko, for the response and for correcting me on the way I addressed
We use Parquet through our internal framework, where we usually have
dynamic schemas. Because of the dynamic schemas, we do some buffering to
figure out the schema for the current writer.
We open around 1500 writers at a time but are not able to
Thank you for your question Manik,
First of all, I think most of the people working on this project are guys,
but I would not exclude any other gender.
Secondly, Parquet is widely used in various open source projects such as
Hive, Presto and Spark. These frameworks scale out by design. For
Hi Guys
I was looking for a task list or the blockers that need to be resolved to
support a multi-threaded writer (Java specifically).
I did not find anything in JIRA or the forums.
Could someone point me to a doc or link, if one exists?
Regards
Manik Singla
"Life doesn't consist