Yes, it is. You can use java.util.zip.GZIPInputStream in your case.
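
A minimal sketch of that approach, assuming the gzipped bytes have already
been extracted into an Array[Byte] (the helper name gunzip is just for
illustration):

import java.io.ByteArrayInputStream
import java.util.zip.GZIPInputStream
import scala.io.Source

// Decompress a gzipped byte array into a String (UTF-8 assumed).
def gunzip(bytes: Array[Byte]): String = {
  val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
  try Source.fromInputStream(in, "UTF-8").mkString
  finally in.close()
}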

Thanks
Best Regards

On Sun, Dec 20, 2015 at 3:23 AM, Eran Witkon <eranwit...@gmail.com> wrote:

> Thanks. Since it is just a snippet, do you mean that Inflater is coming
> from ZLIB?
> Eran
>
> On Fri, Dec 18, 2015 at 11:37 AM Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> Something like this? This one uses ZLIB compression; you can replace the
>> decompression logic with a GZIP one in your case (a GZIP version is
>> sketched below the snippet).
>>
>> import java.util.zip.Inflater
>>
>> compressedStream.map(x => {
>>       val inflater = new Inflater()
>>       inflater.setInput(x.getPayload)
>>       // Output buffer sized at a guess (2x the compressed payload);
>>       // inflate() is called repeatedly until it yields no more bytes.
>>       val decompressedData = new Array[Byte](x.getPayload.size * 2)
>>       var count = inflater.inflate(decompressedData)
>>       var finalData = decompressedData.take(count)
>>       while (count > 0) {
>>         count = inflater.inflate(decompressedData)
>>         finalData = finalData ++ decompressedData.take(count)
>>       }
>>       inflater.end() // release the native zlib resources
>>       new String(finalData)
>>     })
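>>
>> For the GZIP case, a rough sketch of the same map (assuming x.getPayload
>> returns the gzipped bytes; the 4 KB buffer size is arbitrary):
>>
>> import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
>> import java.util.zip.GZIPInputStream
>>
>> compressedStream.map(x => {
>>       val in = new GZIPInputStream(new ByteArrayInputStream(x.getPayload))
>>       val out = new ByteArrayOutputStream()
>>       val buffer = new Array[Byte](4096)
>>       // Copy until the end of the gzip stream is reached.
>>       var n = in.read(buffer)
>>       while (n != -1) {
>>         out.write(buffer, 0, n)
>>         n = in.read(buffer)
>>       }
>>       in.close()
>>       new String(out.toByteArray)
>>     })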
>>
>> Thanks
>> Best Regards
>>
>> On Wed, Dec 16, 2015 at 10:02 PM, Eran Witkon <eranwit...@gmail.com>
>> wrote:
>>
>>> Hi,
>>> I have a few JSON files in which one of the fields is binary: the field
>>> holds the output of running GZIP on a JSON stream and storing the
>>> compressed bytes in that field.
>>>
>>> Now I want to decompress the field and get the output JSON.
>>> I was thinking of running a map operation and passing a function to the
>>> map operation which will decompress each JSON file.
>>> The above function will find the right field in the outer JSON and then
>>> run gunzip on it.
>>>
>>> 1) Is this a valid practice for a Spark map job?
>>> 2) Any pointers on how to do that?
>>>
>>> Eran
>>>
>>
>>
