Thanks all, I decided to convert to UTF-8 only.
Regards
Manik Singla
+91-9996008893
+91-9665639677
"Life doesn't consist in holding good cards but playing those you hold
well."
On Thu, Feb 7, 2019 at 5:41 AM Ryan Blue wrote:
Ok, thanks. Sorry I misread what you meant!
On Wed, Feb 6, 2019 at 3:02 PM Wes McKinney wrote:
> I think the right thing is to transcode your data to UTF-8.
I agree -- I wasn't recommending the approach I described, just to say
that it is not impossible.
On Wed, Feb 6, 2019 at 12:10 PM Ryan Blue wrote:
I disagree with Wes. He's right that you *could* just use binary and keep
extra metadata somewhere, but it is very unlikely that Parquet would ever
support such a scheme. And it is bad for the community when people attempt
to go around the format spec, as we see with the INT96 timestamp mess.
I think
Hi Manik -- you are free to store the data as opaque binary in a
BYTE_ARRAY column and add some metadata to the schema so your readers
can recognize that it's UTF-16 stored as binary.
On Wed, Feb 6, 2019 at 12:24 AM Manik Singla wrote:
I am not the producer of the data, so I cannot control the encoding. I
receive a ByteBuffer together with its encoding.
I can decode the data with the given encoding and convert it to UTF-8 for
storing with Parquet.
I was hoping to avoid that overhead if possible.
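
As a rough sketch of the transcoding step described above (decode with the
encoding that arrives alongside the bytes, then re-encode as UTF-8 before
handing the value to the Parquet writer): the function name
transcode_to_utf8 and its parameters are made up for illustration, and the
example uses Python's standard library rather than the Java ByteBuffer API
mentioned in this thread.

```python
import codecs


def transcode_to_utf8(payload: bytes, source_encoding: str) -> bytes:
    """Decode payload with the producer-supplied encoding, re-encode as UTF-8.

    Hypothetical helper for illustration only; not part of any Parquet API.
    """
    # Raise LookupError early if the encoding name is unknown.
    codecs.lookup(source_encoding)
    # errors="strict" raises UnicodeDecodeError on malformed input
    # instead of silently replacing bytes.
    text = payload.decode(source_encoding, errors="strict")
    return text.encode("utf-8")


if __name__ == "__main__":
    raw = "caf\u00e9".encode("utf-16-le")  # stand-in for the received bytes
    print(transcode_to_utf8(raw, "utf-16-le"))
```

The cost is one decode and one encode per value, which is the overhead
in question here.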
Regards
Manik Singla
Hello Manik,
This is not possible at the moment. As Parquet is a portable on-disk format, we
focus on having a single representation for each data type. Readers and writers
only have to implement these representations, which keeps them simpler.
Especially as you are the producer but not the con
Hi
I am new to Parquet. I am trying to save strings in UTF-16 or some encoding
other than UTF-8.
I am also trying to use an encoding hint when saving a ByteBuffer.
I can't find a way to use anything other than UTF-8.
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md says
we can extend primitive