Re: Snappy Compression with red-parquet Ruby Gem

2020-04-23 Thread Sutou Kouhei
Hi,

Oh, we forgot to integrate the saver interface with the Parquet
compression option.

With 0.17.0, you can use the feature with the following code:

--
require "parquet"

table = Arrow::Table.new({"count" => [1, 2, 3]})
Arrow::FileOutputStream.open("test.parquet", false) do |output|
  properties = Parquet::WriterProperties.new
  properties.set_compression(:snappy)
  Parquet::ArrowFileWriter.open(table.schema, output, properties) do |writer|
    chunk_size = 1024
    writer.write_table(table, chunk_size)
  end
end
--

You'll be able to write the following code with the next release:

--
require "parquet"

table = Arrow::Table.new({"count" => [1, 2, 3]})
table.save("test.parquet", compression: :snappy)
--


Thanks,
--
kou

In <78b1b196-4217-4526-b848-fe126edb2...@contoso.com>
  "Snappy Compression with red-parquet Ruby Gem" on Thu, 23 Apr 2020 20:13:25 
+,
  David Lahn wrote:

> Hi,
> 
> Does anyone have any examples of how to output a Parquet file with Snappy 
> compression using the Ruby gem?
> 
> We have tested trying to set compression to “snappy” on the TableSaver, but 
> we get the following:
> 
> [compressed-output-stream][new]: NotImplemented: Streaming compression 
> unsupported with Snappy (Arrow::Error::NotImplemented)
> 
> Example:
> 
> Arrow::TableSaver.new(table, 'test.parquet', {compression: 'snappy'}).save
> 
> Or are we completely turned around on how to accomplish this?
> 
> Dave
> 
> David Lahn
> DevOps Lead
> Development
>
> ForwardPMX 
> Privacy Policy
> 
>  
>   
> 
> This e-mail is confidential to ForwardPMX intended for use by the recipient. 
> If you received this in error or are not the intended recipient, you are 
> hereby notified that any review, retransmission, copying or other use of, or 
> taking of any action in reliance upon this information is strictly prohibited.
> 


Re: Arrow Format vs Feather v2

2020-04-23 Thread Wes McKinney
hi Dan

See
https://lists.apache.org/thread.html/r0be397a5f901b9dc8787a7dbcb0a34c9ed60ad07ff1e3f064d418a98%40%3Cdev.arrow.apache.org%3E
There is an experimental implementation in C++, which is being used for
"Feather V2".

The Arrow specification does include a "file" format -- this is
exactly what "Feather V2" is using:

https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#ipc-file-format

- Wes
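
As a rough illustration of the "file" format mentioned above: the IPC file
format section of the specification describes a file that starts and ends with
the magic string "ARROW1". A minimal sketch of checking for that framing in
plain Ruby follows. The method name arrow_ipc_file? and the sample bytes are
hypothetical, and the check looks only at the magic bytes; it does not validate
the footer or the record batches.

```ruby
ARROW_MAGIC = "ARROW1".b

# Hypothetical helper: true when the bytes begin and end with the
# "ARROW1" magic string used by the Arrow IPC file format.
def arrow_ipc_file?(bytes)
  data = bytes.b
  return false if data.bytesize < 2 * ARROW_MAGIC.bytesize

  data.start_with?(ARROW_MAGIC) && data.end_with?(ARROW_MAGIC)
end

# Synthetic stand-ins for file contents (not real Arrow or Parquet data).
fake_arrow_file = "ARROW1\0\0".b + "payload".b + "ARROW1".b
fake_other_file = "PAR1".b + "payload".b + "PAR1".b

puts arrow_ipc_file?(fake_arrow_file)
puts arrow_ipc_file?(fake_other_file)

# With a real file, the bytes could come from File.binread(path).
```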

On Thu, Apr 23, 2020 at 4:20 PM Daniel Nugent wrote:
>
> Was just reading the 0.17 release notes (congratulations to the maintainers, 
> btw), and was wondering if there could be some clarification on the language 
> about file formats.
>
> The notes mention that the compression support available for Feather 2 will 
> be formalized in the Arrow format at a later time.
>
> Does that mean that they will be formalized for in-memory and on the wire 
> Arrow messages? Or that there will be another, separate from Feather 2, 
> on-disk representation for Arrow called “Arrow file format” or something 
> along those lines?
>
> Thanks,
>
> -Dan Nugent


Arrow Format vs Feather v2

2020-04-23 Thread Daniel Nugent
Was just reading the 0.17 release notes (congratulations to the maintainers, 
btw), and was wondering if there could be some clarification on the language 
about file formats.

The notes mention that the compression support available for Feather 2 will be 
formalized in the Arrow format at a later time.

Does that mean that they will be formalized for in-memory and on the wire Arrow 
messages? Or that there will be another, separate from Feather 2, on-disk 
representation for Arrow called “Arrow file format” or something along those 
lines?

Thanks,

-Dan Nugent


Re: Snappy Compression with red-parquet Ruby Gem

2020-04-23 Thread Wes McKinney
hi David,

You don't want to pass the compression option to TableSaver.new --
compression is something that's configured in the Parquet writer. This
would need to be an option on save_as_parquet, but it doesn't look
like it is exposed right now:

https://github.com/apache/arrow/blob/master/ruby/red-parquet/lib/parquet/arrow-table-savable.rb#L21

It's available in GLib, though, so this could be added to the Ruby library:

https://github.com/apache/arrow/blob/master/c_glib/parquet-glib/arrow-file-writer.h

- Wes
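
The difference between compressing the output stream and compressing inside the
Parquet writer can be sketched with a toy container in plain Ruby. This is not
how Parquet is implemented: Zlib stands in for Snappy, the length-prefixed
framing is invented for illustration, and both method names are hypothetical.
The only detail borrowed from Parquet is the "PAR1" magic string at the start
and end of the file. The error message quoted below suggests that the
TableSaver compression option takes the stream-level route.

```ruby
require "zlib"

MAGIC = "PAR1".b # Parquet files begin and end with these four bytes

# Toy "writer-level" compression: each chunk is compressed on its own and
# framed inside a container whose magic bytes stay readable.
def write_with_internal_compression(chunks)
  body = chunks.map do |chunk|
    compressed = Zlib::Deflate.deflate(chunk)
    [compressed.bytesize].pack("N") + compressed
  end.join
  MAGIC + body + MAGIC
end

# Toy "stream-level" compression: the finished container is compressed as a
# whole, so a reader no longer sees the magic bytes at the start.
def write_with_stream_compression(chunks)
  body = chunks.map { |chunk| [chunk.bytesize].pack("N") + chunk }.join
  Zlib::Deflate.deflate(MAGIC + body + MAGIC)
end

chunks = ["1,2,3" * 100, "a,b,c" * 100]
internal = write_with_internal_compression(chunks)
stream = write_with_stream_compression(chunks)

puts internal.start_with?(MAGIC) # magic bytes still visible
puts stream.start_with?(MAGIC)   # magic bytes hidden by the outer compression
```

In the first form a reader can still recognise the container and decompress
individual chunks; in the second it has to decompress everything before it can
even identify the format.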

On Thu, Apr 23, 2020 at 3:13 PM David Lahn wrote:
>
> Hi,
>
> Does anyone have any examples of how to output a Parquet file with Snappy 
> compression using the Ruby gem?
>
> We have tested trying to set compression to “snappy” on the TableSaver, but 
> we get the following:
>
> [compressed-output-stream][new]: NotImplemented: Streaming compression 
> unsupported with Snappy (Arrow::Error::NotImplemented)
>
> Example:
>
> Arrow::TableSaver.new(table, 'test.parquet', {compression: 'snappy'}).save
>
> Or are we completely turned around on how to accomplish this?
>
> Dave


Snappy Compression with red-parquet Ruby Gem

2020-04-23 Thread David Lahn
Hi,

Does anyone have any examples of how to output a Parquet file with Snappy 
compression using the Ruby gem?

We have tested trying to set compression to “snappy” on the TableSaver, but we 
get the following:

[compressed-output-stream][new]: NotImplemented: Streaming compression 
unsupported with Snappy (Arrow::Error::NotImplemented)

Example:

Arrow::TableSaver.new(table, 'test.parquet', {compression: 'snappy'}).save

Or are we completely turned around on how to accomplish this?

Dave
