I agree this is quite useful…
You should be able to use logical types for this purpose, by implementing your
own and using it in your IDL like:
@logicalType("internedString") string myStringField;
It might even be possible to create a logical type that works with any
other type, e.g. @logicalType("interned"), to deduplicate values of any type.
--Z
From: Bernardo Bennett [mailto:[email protected]]
Sent: Thursday, March 31, 2016 12:41 PM
To: [email protected]
Subject: String Pooling on reader side
Are there plans to introduce such a feature? Depending on the nature of the data,
memory savings can be quite substantial.
So far I've experimented with modifying the Java-generated IndexedRecord.put()
methods to perform lookups on concurrent hash maps when the field type is
String. The overhead seems insignificant compared to the savings in GC time and
disk spills (Spark) for applications which read and cache Avro records in memory.
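
To make the put() experiment above concrete, here is a minimal sketch of the lookup, not the actual patch. StringPool and its methods are hypothetical names and are not part of Avro. It assumes the pool is unbounded and shared across records, and that values may arrive as any CharSequence (Avro's default string representation is Utf8, not java.lang.String).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper (not an Avro class): keeps one canonical String per distinct value. */
final class StringPool {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the pooled instance equal to value, adding it to the pool if absent. */
    String intern(CharSequence value) {
        if (value == null) {
            return null; // nullable fields pass through unchanged
        }
        String s = value.toString();
        // putIfAbsent returns the previously pooled instance, or null if s was just added
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    /** Number of distinct strings currently held. */
    int size() {
        return pool.size();
    }
}
```

In a generated record, the String case of put(int, Object) would then assign something like POOL.intern((CharSequence) value) instead of the raw value, where POOL is an assumed static StringPool. Because nothing is ever evicted, this only pays off when the set of distinct values is small relative to the number of records.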