RE: [julia-users] [ANN] DataStreams v0.1: Blog post + Package Release Notes

2016-10-28 Thread David Anthoff
Thanks, this is really super cool work!

 

Let me also point out that Query.jl works great with DataStreams sources and 
sinks. For example, say you want to load some data from a SQLite database, 
apply some filtering and transformations, and write the result out as a CSV 
file; you can do that like this:

 

using Query, SQLite, CSV

sqlite_db = SQLite.DB(joinpath(Pkg.dir("SQLite"), "test", "Chinook_Sqlite.sqlite"))

q = @from i in SQLite.Source(sqlite_db, "SELECT * FROM Employee") begin
    @where i.ReportsTo == 2
    @select {Name=i.LastName, Adr=i.Address}
    @collect CSV.Sink("test-output.csv")
end

Data.close!(q)

 

Note that this will never actually materialize the data into a DataFrame or 
anything like that; instead everything is streamed through, from the 
DataStreams source to the sink, including the whole query part in the middle.

 

Best,

David

 

From: julia-users@googlegroups.com [mailto:julia-users@googlegroups.com] On 
Behalf Of Jacob Quinn
Sent: Thursday, October 27, 2016 11:33 PM
To: julia-users <julia-users@googlegroups.com>
Subject: [julia-users] [ANN] DataStreams v0.1: Blog post + Package Release Notes

 

Hey everyone, 

 

Just wanted to put out the announcement of the release of DataStreams v0.1. 
(It was actually tagged a few weeks ago, but I've been letting a few last 
things shake out before announcing.)

 

I've written up a blog post on the updates and release here: 
http://quinnj.github.io/datastreams-jl-v0-1/

 

The TL;DR is DataStreams.jl now defines concrete interfaces for Data.Sources 
and Data.Sinks, with each being completely decoupled from the other. This has 
also allowed some cool new features like appending to Data.Sinks and allowing 
simple transform functions to be applied to data "in-transit".
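 

As a rough sketch of what the decoupled interface enables, streaming a source to a sink looks something like the following (the file names are made up, and the exact keyword names for appending and transforms should be checked against the DataStreams/CSV docs; this is illustrative, not the definitive API):

using DataStreams, CSV

# Open a CSV file as a Data.Source and an output file as a Data.Sink;
# neither side needs to know anything about the other's implementation.
source = CSV.Source("input.csv")
sink = CSV.Sink("output.csv")

# Stream all rows from source to sink without materializing a DataFrame
# in between; appending and per-column transform functions are applied
# at this step in the new release.
Data.stream!(source, sink)
Data.close!(sink)

Because the source and sink only talk through the Data.Source/Data.Sink interface, any source type can be paired with any sink type.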

 

I included release notes for the existing packages in the blog post, but I'll 
copy-paste them below for easier access:

 

Do note that the DataStreams.jl framework is now Julia 0.5-only.

 

 

* CSV.jl

  - Docs: <http://juliadata.github.io/CSV.jl/stable/>

  - Supports a wide variety of delimited-file options, such as delim, 
    quotechar, escapechar, and custom null strings; a header can be provided 
    manually or on a specified row or range of rows; types can be provided 
    manually, and results can be requested as nullable or not (nullable=true 
    by default); and the # of rows can be provided manually (if known) for 
    efficiency.

  - CSV.parsefield(io::IO, ::Type{T}) can be called directly on any IO type 
    to tap into the delimited-parsing functionality manually.

* SQLite.jl

  - Docs: <http://juliadb.github.io/SQLite.jl/stable/>

  - Query results will now use the declared table column type by default, 
    which can help resultset column typing in some cases.

  - Parameterized SQL statements are fully supported, with the ability to 
    bind Julia values to be sent to the DB.

  - Full serialization/deserialization of native and custom Julia types is 
    supported, so a Complex{Int128} can be stored in its own SQLite table 
    column and retrieved without any issue.

  - Pure Julia scalar and aggregation functions can be registered with an 
    SQLite database and then called from within SQL statements; full docs 
    here: <http://juliadb.github.io/SQLite.jl/stable/#User-Defined-Functions-1>

* Feather.jl

  - Docs: <http://juliastats.github.io/Feather.jl/stable/>

  - Full support for feather release v0.3.0 to ensure compatibility.

  - Full support for returning "factor" or "category" type columns as native 
    CategoricalArray and NullableCategoricalArray types in Julia, thanks to 
    the new CategoricalArrays.jl package 
    (<https://github.com/JuliaData/CategoricalArrays.jl>).

  - nullable::Bool=true keyword argument; if false, columns without null 
    values will be returned as Vector{T} instead of NullableVector{T}.

  - Feather.Sink now supports appending, so multiple DataFrames, CSV.Sources, 
    or any Data.Source can all be streamed to a single feather file.

* ODBC.jl

  - Docs: <http://juliadb.github.io/ODBC.jl/stable/>

  - A new ODBC.DSN type that represents a valid, open connection to a 
    database and is used in all subsequent API calls; it can be constructed 
    from a previously configured system/user DSN with username and password, 
    or from a full custom connection string.

  - Full support for the DataStreams.jl framework through the ODBC.Source 
    and ODBC.Sink types, along with their high-level convenience methods 
    ODBC.query and ODBC.load.

  - A new ODBC.prepare(dsn, sql) => ODBC.Statement method, which sends an 
    SQL statement to the database to be compiled and planned before being 
    executed one or more times. Prepared statements can include parameters 
    whose values are bound before each execution.
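

For illustration, binding Julia values into a parameterized SQLite statement looks roughly like this (the database file and table are made up, and the exact query signature should be confirmed against the SQLite.jl docs):

using SQLite

db = SQLite.DB("mydb.sqlite")  # hypothetical database file

# `?` placeholders in the SQL are bound to the supplied Julia values,
# avoiding string interpolation of data into the statement
SQLite.query(db, "SELECT * FROM Employee WHERE ReportsTo = ?"; values=[2])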
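

A prepared ODBC statement with bound parameters might look roughly like this (the DSN, table, and values are made up; see the ODBC.jl docs for the exact API):

using ODBC

dsn = ODBC.DSN("mydsn", "username", "password")  # hypothetical configured DSN

# Compile/plan the statement once on the database side...
stmt = ODBC.prepare(dsn, "INSERT INTO cool_table VALUES(?, ?, ?)")

# ...then bind fresh values and execute it as many times as needed
for row in [(1, 2.0, "hey"), (2, 3.0, "there")]
    ODBC.execute!(stmt, collect(row))
end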


