Re: Using Spark to process JSON with gzip field

2015-12-20 Thread Akhil Das
Yes, it is. Since your data is GZIP-compressed rather than raw ZLIB, you can
use java.util.zip.GZIPInputStream in your case.
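
For example, a minimal sketch, assuming the compressed bytes of each record
have already been pulled out into an RDD[Array[Byte]] (the RDD name and the
gunzip helper are illustrative, not from your code):

import java.io.ByteArrayInputStream
import java.util.zip.GZIPInputStream
import scala.io.Source

// Hypothetical helper: gunzip one compressed byte array into a UTF-8 String.
def gunzip(bytes: Array[Byte]): String = {
  val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
  try Source.fromInputStream(in, "UTF-8").mkString
  finally in.close()
}

// compressedFields: RDD[Array[Byte]] holding the binary field values.
val innerJson = compressedFields.map(gunzip)

Unlike Inflater, GZIPInputStream takes care of the GZIP header and CRC
trailer for you.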

Thanks
Best Regards

On Sun, Dec 20, 2015 at 3:23 AM, Eran Witkon  wrote:

> Thanks. Since it is just a snippet, do you mean that Inflater comes
> from ZLIB?
> Eran
>
> On Fri, Dec 18, 2015 at 11:37 AM Akhil Das 
> wrote:
>
>> Something like this? This one uses ZLIB compression; you can replace
>> the decompression logic with a GZIP one in your case.
>>
>> import java.util.zip.Inflater
>>
>> compressedStream.map(x => {
>>   // Decompress one ZLIB-compressed payload into a String.
>>   val inflater = new Inflater()
>>   inflater.setInput(x.getPayload)
>>   val decompressedData = new Array[Byte](x.getPayload.size * 2)
>>   var count = inflater.inflate(decompressedData)
>>   var finalData = decompressedData.take(count)
>>   // Keep inflating until the stream produces no more bytes.
>>   while (count > 0) {
>>     count = inflater.inflate(decompressedData)
>>     finalData = finalData ++ decompressedData.take(count)
>>   }
>>   inflater.end() // release the native zlib resources
>>   new String(finalData)
>> })
>>
>> Thanks
>> Best Regards
>>
>> On Wed, Dec 16, 2015 at 10:02 PM, Eran Witkon 
>> wrote:
>>
>>> Hi,
>>> I have a few JSON files in which one of the fields is binary - this
>>> field is the output of running GZIP on a JSON stream and storing the
>>> compressed result in that binary field.
>>>
>>> Now I want to decompress the field and get the output JSON.
>>> I was thinking of running a map operation and passing it a function
>>> that will decompress each JSON file. The function will find the right
>>> field in the outer JSON and then run GUNZIP on it.
>>>
>>> 1) Is this a valid practice for a Spark map job?
>>> 2) Any pointers on how to do that?
>>>
>>> Eran
>>>
>>
>>


Re: Using Spark to process JSON with gzip field

2015-12-19 Thread Eran Witkon
Thanks. Since it is just a snippet, do you mean that Inflater comes from
ZLIB?
Eran

On Fri, Dec 18, 2015 at 11:37 AM Akhil Das 
wrote:

> Something like this? This one uses ZLIB compression; you can replace
> the decompression logic with a GZIP one in your case.
>
> import java.util.zip.Inflater
>
> compressedStream.map(x => {
>   // Decompress one ZLIB-compressed payload into a String.
>   val inflater = new Inflater()
>   inflater.setInput(x.getPayload)
>   val decompressedData = new Array[Byte](x.getPayload.size * 2)
>   var count = inflater.inflate(decompressedData)
>   var finalData = decompressedData.take(count)
>   // Keep inflating until the stream produces no more bytes.
>   while (count > 0) {
>     count = inflater.inflate(decompressedData)
>     finalData = finalData ++ decompressedData.take(count)
>   }
>   inflater.end() // release the native zlib resources
>   new String(finalData)
> })
>
> Thanks
> Best Regards
>
> On Wed, Dec 16, 2015 at 10:02 PM, Eran Witkon 
> wrote:
>
>> Hi,
>> I have a few JSON files in which one of the fields is binary - this
>> field is the output of running GZIP on a JSON stream and storing the
>> compressed result in that binary field.
>>
>> Now I want to decompress the field and get the output JSON.
>> I was thinking of running a map operation and passing it a function
>> that will decompress each JSON file. The function will find the right
>> field in the outer JSON and then run GUNZIP on it.
>>
>> 1) Is this a valid practice for a Spark map job?
>> 2) Any pointers on how to do that?
>>
>> Eran
>>
>
>


Re: Using Spark to process JSON with gzip field

2015-12-18 Thread Akhil Das
Something like this? This one uses ZLIB compression; you can replace
the decompression logic with a GZIP one in your case.

import java.util.zip.Inflater

compressedStream.map(x => {
  // Decompress one ZLIB-compressed payload into a String.
  val inflater = new Inflater()
  inflater.setInput(x.getPayload)
  val decompressedData = new Array[Byte](x.getPayload.size * 2)
  var count = inflater.inflate(decompressedData)
  var finalData = decompressedData.take(count)
  // Keep inflating until the stream produces no more bytes.
  while (count > 0) {
    count = inflater.inflate(decompressedData)
    finalData = finalData ++ decompressedData.take(count)
  }
  inflater.end() // release the native zlib resources
  new String(finalData)
})
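
If the compressed bytes sit inside the outer JSON, here is a rough sketch of
the GZIP variant end to end. It assumes the binary field is stored
base64-encoded under a key named "payload" (both the encoding and the key
name are guesses about your data, since JSON has no raw binary type), and it
uses json4s, which ships with Spark, for the field lookup:

import java.io.ByteArrayInputStream
import java.util.Base64
import java.util.zip.GZIPInputStream
import org.json4s._
import org.json4s.jackson.JsonMethods._
import scala.io.Source

// Assumed layout: each record is one outer JSON document whose "payload"
// key holds the base64-encoded, GZIP-compressed inner JSON.
def decompressRecord(outerJson: String): String = {
  implicit val formats = DefaultFormats
  val encoded = (parse(outerJson) \ "payload").extract[String]
  val bytes = Base64.getDecoder.decode(encoded)
  val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
  try Source.fromInputStream(in, "UTF-8").mkString
  finally in.close()
}

// jsonRecords: RDD[String] with one outer JSON document per element.
val innerJson = jsonRecords.map(decompressRecord)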

Thanks
Best Regards

On Wed, Dec 16, 2015 at 10:02 PM, Eran Witkon  wrote:

> Hi,
> I have a few JSON files in which one of the fields is binary - this
> field is the output of running GZIP on a JSON stream and storing the
> compressed result in that binary field.
>
> Now I want to decompress the field and get the output JSON.
> I was thinking of running a map operation and passing it a function that
> will decompress each JSON file. The function will find the right field
> in the outer JSON and then run GUNZIP on it.
>
> 1) Is this a valid practice for a Spark map job?
> 2) Any pointers on how to do that?
>
> Eran
>


Using Spark to process JSON with gzip field

2015-12-16 Thread Eran Witkon
Hi,
I have a few JSON files in which one of the fields is binary - this
field is the output of running GZIP on a JSON stream and storing the
compressed result in that binary field.

Now I want to decompress the field and get the output JSON.
I was thinking of running a map operation and passing it a function that
will decompress each JSON file. The function will find the right field
in the outer JSON and then run GUNZIP on it.

1) Is this a valid practice for a Spark map job?
2) Any pointers on how to do that?

Eran