from:"Jacob Quinn"

Re: [julia-users] Question: Forcing readtable to create string type on import

2016-11-03 Thread Jacob Quinn

LeAnthony,

I'm wondering if you're on an old version of DataFrames? There haven't been
any issues "show"-ing DataFrames with NullableArray columns for quite some
time. You can check (and post back here) your current package versions by
doing:

Pkg.installed()

You can also ensure you're on the latest valid release by doing:

Pkg.update()


-Jacob

On Thu, Nov 3, 2016 at 3:15 PM, Milan Bouchet-Valat 
wrote:

> Le jeudi 03 novembre 2016 à 13:35 -0700, LeAnthony Mathews a écrit :
> > Thanks Michael,
> >   I've been thinking about this all day.  Yes, basically I am going to
> > have to create a macro CSVreadtable that mimics the readtable
> > command, but in the expansion uses CSV.read.  The macro will manually
> > construct a similar readtable-sized DataFrame, but use the
> > column types I specify or inherit from the original readtable
> > command.  The macro can use the current CSV.read parameters.
> >
> > So this would work.
> > df1_CSVreadtable = CSVreadtable("$df1_path"; types=Dict(1=>String))
> >
> > so a:
> > eltypes(df1_CSVreadtable)
> > 3-element Array{Type,1}:
> >  Int32
> >  String
> >  String
> >
> >
> >   Anyway, I was looking for a quick fix, but at least I will learn
> > some Julia.
> If you don't have missing values and just want a Vector{String}, you
> can pass nullable=false to CSV.read().
>
>
> Regards
>
> >
> >
> > > DataFrames is currently undergoing a very major change. Looks like
> > > CSV creates the new type of DataFrames. I hope someone can help you
> > > with using that. As a workaround, on the normal DataFrames version,
> > > I have generally just replaced with a string representation:
> > > ```
> > > df[:account_numbers] = ["$account_number" for account_number in
> > > df[:account_numbers]]
> > >
> > > On Thu, Nov 3, 2016 at 3:05 PM, LeAnthony Mathews  > > om> wrote:
> > > > Sure, so I need col #1 in my CSV to be a string in my data frame.
> > > >
> > > >
> > > > So as a test  I tried to load the file 3 different ways:
> > > >
> > > > df1_CSV = CSV.read("$df1_path"; types=Dict(1=>String))  #forcing
> > > > the column to stay a string
> > > > df1_readtable = readtable("$df1_path")  #Do not know how to force
> > > > the column to stay a string
> > > > df1_convertDF = convert(DataFrame, df1_CSV)
> > > >
> > > > > Here is the output.  If they are all DataFrames, then showcols
> > > > > should work on all three:
> > > >
> > > > julia> names(df1_CSV)
> > > > 3-element Array{Symbol,1}:
> > > >  :account_number
> > > >  Symbol("Discharge Date")
> > > >  :site
> > > >
> > > > julia> names(df1_readtable)
> > > > 3-element Array{Symbol,1}:
> > > >  :account_number
> > > >  :Discharge_Date
> > > >  :site
> > > >
> > > > julia> names(df1_convertDF)
> > > > 3-element Array{Symbol,1}:
> > > >  :account_number
> > > >  Symbol("Discharge Date")
> > > >  :site
> > > >
> > > >
> > > > julia> eltypes(df1_CSV)
> > > > 3-element Array{Type,1}:
> > > >  Nullable{String}
> > > >  Nullable{WeakRefString{UInt8}}
> > > >  Nullable{WeakRefString{UInt8}}
> > > >
> > > > julia> eltypes(df1_readtable)
> > > > 3-element Array{Type,1}:
> > > >  Int32   #Do not know how to force the column to stay a string
> > > >  String
> > > >  String
> > > >
> > > > julia> eltypes(df1_convertDF)
> > > > 3-element Array{Type,1}:
> > > >  Nullable{String}
> > > >  Nullable{WeakRefString{UInt8}}
> > > >  Nullable{WeakRefString{UInt8}}
> > > >
> > > > julia> showcols(df1_convertDF)
> > > > 1565x3 DataFrames.DataFrame
> > > > > ERROR: MethodError: no method matching countna(::NullableArrays.NullableArray{String,1})
> > > > > Closest candidates are:
> > > > >   countna(::Array{T,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:115
> > > > >   countna(::DataArrays.DataArray{T,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:128
> > > > >   countna(::DataArrays.PooledDataArray{T,R<:Integer,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:143
> > > > >  in colmissing(::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\abstractdataframe.jl:657
> > > > >  in showcols(::Base.TTY, ::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\show.jl:574
> > > > >  in showcols(::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\show.jl:581
> > > >
> > > > julia> showcols(df1_readtable)
> > > > 1565x3 DataFrames.DataFrame
> > > > > │ Col # │ Name           │ Eltype │ Missing │
> > > > > ├───────┼────────────────┼────────┼─────────┤
> > > > > │ 1     │ account_number │ Int32  │ 0       │
> > > > > │ 2     │ Discharge_Date │ String │ 0       │
> > > > > │ 3     │ site           │ String │ 0       │
> > > >
> > > > julia> showcols(df1_CSV)
> > > > 1565x3 DataFrames.DataFrame
> > > > ERROR: MethodError: no method matching
> > > >

Re: [julia-users] Question: Forcing readtable to create string type on import

2016-10-31 Thread Jacob Quinn

You could use CSV.jl: http://juliadata.github.io/CSV.jl/stable/

In this case, you'd do:

df1 = CSV.read(file1; types=Dict(1=>String)) # assuming your account number is column 1
df2 = CSV.read(file2; types=Dict(1=>String))
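
As a side note on why the readtable result in the question shows negative numbers: ten-digit account numbers do not fit in an Int32, so the values presumably wrapped around. Here is a minimal Base-only sketch (written against current Julia; the variable names are illustrative, and no CSV or DataFrames call is involved) that reproduces the figures from the question and shows that a String key survives intact:

```julia
# Account numbers copied from the original question.
raw = ["8018884596", "8018893530", "8018909633"]

# Parse as 64-bit integers, then truncate to 32 bits, which is
# presumably what happened to the Int32 column in the question.
as_int64 = parse.(Int64, raw)
wrapped  = [x % Int32 for x in as_int64]

println(wrapped)            # the wrapped values seen in the question
println(typemax(Int32))     # largest value an Int32 can hold

# Keeping the key as a String avoids the problem entirely.
keys_as_strings = string.(as_int64)
println(keys_as_strings)
```

This is only an illustration of the overflow; forcing the column type at read time, as above, is still the actual fix.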

-Jacob


On Mon, Oct 31, 2016 at 12:50 PM, LeAnthony Mathews 
wrote:

> Using v0.5.0
> I have two different 10,000 line CSV files that I am reading into two
> different dataframe variables using the readtable function.
> Each table has in common a ten digit account_number that I would like to
> use as an index and join into one master file.
>
> Here is the account number example in the original CSV from file1:
> 8018884596
> 8018893530
> 8018909633
>
> When I do a readtable of this CSV into file1 and then do a
> typeof(file1[:account_number]) I get:
> DataArrays.DataArray(Int32,1)
>  -571049996
>  -571041062
>  -571024959
>
> when I do a
> typeof(file2[:account_number])
> DataArrays.DataArray(String,1)
>
>
> *Question:  *
> My CSV files give no guidance that account_number should be Int32 or
> string type.  How do I force it to make both account_number elements type
> String?
>
> I would like this join command to work:
> new_account_join = join(file1, file2, on = :account_number, kind = :left)
>
> But I am getting this error:
> ERROR: TypeError: typeassert: expected Union{Array{Symbol,1},Symbol}, got Array{Array{Symbol,1},1}
>  in (::Base.#kw##join)(::Array{Any,1}, ::Base.#join, ::DataFrames.DataFrame, ::DataFrames.DataFrame) at .\:0
>
>
> Any help would be appreciated.
>
>
>

[julia-users] [ANN] DataStreams v0.1: Blog post + Package Release Notes

2016-10-28 Thread Jacob Quinn

Hey everyone,

Just wanted to put out the announcement of the release of DataStreams v0.1. 
(it was actually tagged a few weeks ago, but I've been letting a few last 
things shake out before announcing).

I've written up a blog post on the updates and release 
here: http://quinnj.github.io/datastreams-jl-v0-1/

The TL;DR is DataStreams.jl now defines concrete interfaces for 
Data.Sources and Data.Sinks, with each being completely decoupled from the 
other. This has also allowed some cool new features like appending to 
Data.Sinks and allowing simple transform functions to be applied to data 
"in-transit".

I included release notes of existing packages in the blog post, but I'll 
copy-paste here below for easier access:

Do note that the DataStreams.jl framework is now Julia 0.5-only.



   - *CSV.jl*
      - Supports a wide variety of delimited file options such as delim,
        quotechar, escapechar, custom null strings; a header can be provided
        manually or on a specified row or range of rows; types can be
        provided manually, and results can be requested as nullable or not
        (nullable=true by default); and the # of rows can be provided
        manually (if known) for efficiency.
      - CSV.parsefield(io::IO, ::Type{T}) can be called directly on any IO
        type to tap into the delimited-parsing functionality manually
   - *SQLite.jl*
      - Query results will now use the declared table column type by
        default, which can help resultset column typing in some cases
      - Parameterized SQL statements are fully supported, with the ability
        to bind julia values to be sent to the DB
      - Full serialization/deserialization of native and custom Julia types
        is supported; so Complex{Int128} can be stored in its own SQLite
        table column and retrieved without any issue
      - Pure Julia scalar and aggregation functions can be registered with
        an SQLite database and then called from within SQL statements (see
        the package docs)
   - *Feather.jl*
      - Full support for feather release v0.3.0 to ensure compatibility
      - Full support for returning "factor" or "category" type columns as
        native CategoricalArray and NullableCategoricalArray types in Julia,
        thanks to the new CategoricalArrays.jl package
      - nullable::Bool=true keyword argument; if false, columns without
        null values will be returned as Vector{T} instead of NullableVector{T}
      - Feather.Sink now supports appending, so multiple DataFrames or
        CSV.Source or any Data.Source can all be streamed to a single feather
        file
   - *ODBC.jl*
      - A new ODBC.DSN type that represents a valid, open connection to a
        database; used in all subsequent api calls; it can be constructed
        using a previously configured system/user dsn w/ username and
        password, or as a full custom connection string
      - Full support for the DataStreams.jl framework through the
        ODBC.Source and ODBC.Sink types, along with their high-level
        convenience methods ODBC.query and ODBC.load
      - A new ODBC.prepare(dsn, sql) => ODBC.Statement method which can
        send an sql statement to the database to be compiled and planned
        before being executed 1 or more times. SQL statements can include
        parameters to be prepared that can have dynamic values bound before
        each execution.

Re: [julia-users] Using SQLBulkOperations in ODBC.jl

2016-10-26 Thread Jacob Quinn

As long as your DB table is created correctly (i.e. correct types), you can
do

ODBC.load(dsn, "table_name", df)

More docs here: http://juliadb.github.io/ODBC.jl/stable/#ODBC.load-1



On Wed, Oct 26, 2016 at 9:49 AM, Terry Seaward 
wrote:

> Hi,
>
> How could one use the SQLBulkOperations function in ODBC.jl to insert a
> DataFrame into a table?
>
> Additional ref: https://msdn.microsoft.com/en-us/library/ms712471(v=
> vs.85).aspx
>
> - TS
>

Re: [julia-users] Filtering DataFrame with a function

2016-10-13 Thread Jacob Quinn

I think the Julia ecosystem is evolving tremendously in this respect. I
think originally, there were a lot of these "mammoth" packages that tried
to provide everything and the kitchen sink. Unfortunately, this has led to
package bloat, package inefficiencies in terms of load times and
installation, and unmaintainability. DataFrames and Gadfly are great
examples.

The trend more recently has been a rededication to small, modular packages
that interoperate nicely with others. This means moving things **out** of
packages that aren't totally essential: or in the case of DataFrames, that
can include things like IO (CSV.jl), data manipulation (Query.jl and
StructuredQuery.jl), and others.

Ultimately, with the help of core language features like
https://github.com/JuliaLang/julia/issues/15705, I think we'll continue to
see packages slim down. This, of course, opens up more possibilities in the
future for so-called "meta" packages that could bundle several packages
together. These "meta" packages are then essentially tasked with tracking
versions, dependencies, and so forth while individual packages can focus on
simple, solid code.

-Jacob

On Wed, Oct 12, 2016 at 11:20 PM, Júlio Hoffimann  wrote:

> Thank you very much David, these queries you showed are really nice. I
> meant that ideally I wouldn't need to install another package for a simple
> filter operation on the rows.
>
> -Júlio
>
> 2016-10-12 22:14 GMT-07:00 :
>
>> Were you worried about Query being not lightweight enough in terms of
>> overhead, or in terms of syntax?
>>
>> I just added a more lightweight syntax for this scenario to Query. You
>> can now do the following two things:
>>
>> q = @where(df, i->i.price > 30.)
>>
>> that will return a filtered iterator. You can materialize that into a
>> DataFrame with collect(q, DataFrame).
>>
>> I also added a counting option. Turns out that is actually a LINQ query
>> operator, and the goal is to implement all of those in Query. The syntax is
>> simple:
>>
>> @count(df, i->i.price > 30.)
>>
>> returns the number of rows for which the filter condition is true.
>>
>> Under the hood both of these new syntax options use the normal Query
>> machinery, this just provides a simpler syntax relative to the more
>> elaborate things I've posted earlier. In terms of LINQ, this corresponds to
>> the method invocation API that LINQ has. I'm still figuring out how to
>> surface something like @count in the query expression syntax, but for now
>> one can use it via this macro.
>>
>> All of this is on master right now, so you would have to do
>> Pkg.checkout("Query") to get these macros.
>>
>> Best,
>> David
>>
>> On Wednesday, October 12, 2016 at 6:47:15 PM UTC-7, Júlio Hoffimann wrote:
>>>
>>> Hi David,
>>>
>>> Thank you for your elaborated answer and for writing a package for
>>> general queries, that is great! I will keep the package in mind if I need
>>> something more complex.
>>>
>>> I am currently looking for a lightweight solution within DataFrames,
>>> filtering is a very common operation. Right now, I am considering
>>> converting the DataFrame to an array and looping over the rows. I wonder if
>>> there is a syntactic sugar for this loop.
>>>
>>> -Júlio
>>>
>>> 2016-10-12 17:48 GMT-07:00 David Anthoff :
>>>
>>>> Hi Julio,
>>>>
>>>> you can use the Query package for the first part. To filter a DataFrame
>>>> using some arbitrary julia expression, use something like this:
>>>>
>>>> using DataFrames, Query, NamedTuples
>>>>
>>>> q = @from i in df begin
>>>>     @where <condition>
>>>>     @select i
>>>> end
>>>>
>>>> You can use any julia code in <condition>. Say your DataFrame
>>>> has a column called price, then you could filter like this:
>>>>
>>>> @where i.price > 30.
>>>>
>>>> The i will be a NamedTuple type, so you can access the columns either
>>>> by their name, or also by their index, e.g.
>>>>
>>>> @where i[1] > 30.
>>>>
>>>> if you want to filter by the first column. You can also just call some
>>>> function that you have defined somewhere else:
>>>>
>>>> @where foo(i)
>>>>
>>>> As long as the <condition> returns a Bool, you should be good.
>>>>
>>>> If you run a query like this, q will be a standard julia iterator.
>>>> Right now you can’t just say length(q), although that is something I should
>>>> probably enable at some point (I’m also looking into the VB LINQ syntax
>>>> that supports things like counting in the query expression itself).
>>>>
>>>> But you could materialize the query as an array and then look at the
>>>> length of that:
>>>>
>>>> q = @from i in df begin
>>>>     @where <condition>
>>>>     @select i
>>>>     @collect
>>>> end
>>>>
>>>> count = length(q)
>>>>
>>>> The @collect statement means that the query will return an array of a
>>>> NamedTuple type (you can also materialize it

Re: [julia-users] Very Odd Enum Behavior

2016-10-03 Thread Jacob Quinn

Because Suit is the **first** field to Card? i.e. you need

push!(deck, Card(Suit(s), n))
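
For reference, a complete version with the arguments in field order might look like the sketch below. It is written against current Julia, where the thread's `type` keyword is spelled `mutable struct`; an immutable `struct` is enough here. The name `new_deck` is just an illustrative rename of the original `newDeck`.

```julia
@enum Suit hearts=1 diamonds=2 clubs=3 spades=4

struct Card
    suit::Suit      # first field, so it must be the first constructor argument
    number::Int64
end

function new_deck()
    deck = Card[]
    for s in 1:4, n in 1:14
        # Suit first, number second, matching the field order above.
        push!(deck, Card(Suit(s), n))
    end
    return deck
end

deck = new_deck()
println(length(deck))
println(deck[1])
```

With the arguments swapped, the loop counter `n` is what ends up being converted to a `Suit`, which is why the error in the original post mentions the value 5.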

-Jacob

On Mon, Oct 3, 2016 at 9:55 PM,  wrote:

> Ran into this while writing a simple, contrived example for a tutorial.
> Still working on it but I am baffled. Can anyone tell me why this is
> happening?
>
> julia> @enum Suit hearts=1 diamonds=2 clubs=3 spades=4
>
> julia> type Card
>            suit::Suit
>            number::Int64
>        end
>
> julia> function newDeck()
>            deck = Card[]
>            for s = 1:4
>                for n = 1:14
>                    push!(deck, Card(n, Suit(s)))
>                end
>            end
>            deck
>        end
> newDeck (generic function with 1 method)
>
> julia> newDeck()
> ERROR: ArgumentError: invalid value for Enum Suit: 5
>  in enum_argument_error at Enums.jl:27
>  in convert at Enums.jl:79
>  in newDeck at none:6
>
>
>

Re: [julia-users] ls()?

2016-09-14 Thread Jacob Quinn

readdir()
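
A small sketch of how that can be used, assuming a scratch directory created only for the demonstration (the file names are made up):

```julia
# Create a scratch directory with two files so the listing is predictable.
dir = mktempdir()
touch(joinpath(dir, "b.txt"))
touch(joinpath(dir, "a.txt"))

# readdir(path) returns the entry names only, relative to `path`.
entries = sort(readdir(dir))
println(entries)

# readdir() with no argument lists the current working directory.
println(readdir())
```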

On Wed, Sep 14, 2016 at 8:34 AM, Adrian Lewis 
wrote:

> In the filesystem package, if we have pwd() and cd(), why do we not have
> ls()?
>
> Aidy
>

Re: [julia-users] Proposed solution for writing Enums

2016-08-23 Thread Jacob Quinn

Julia indeed has built-in enums: Just below here in the docs:
http://docs.julialang.org/en/latest/stdlib/base/#Base.Val{c}

On Tue, Aug 23, 2016 at 10:51 AM, Evan Fields  wrote:

> @enum doesn't do what you want for enums?
>

Re: [julia-users] How to Manipulate each character in of a string using a for loop in Julia ?

2016-08-17 Thread Jacob Quinn

Strings are immutable (similar to other languages). There are several
different ways to get what you want, but I tend to utilize IOBuffer a lot:

a = "abcd"
io = IOBuffer()

for char in a
    write(io, char + 1)  # shift each character, not the whole string
end

println(takebuf_string(io))
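
For anyone reading this on a newer Julia: `takebuf_string` was later replaced by `String(take!(io))`. A sketch of the same idea against current Julia, plus a `map`-based one-liner as an alternative (the variable names `shifted` and `shifted_map` are only illustrative):

```julia
a = "abcd"

# IOBuffer approach: write each shifted character, then take the result.
io = IOBuffer()
for char in a
    write(io, char + 1)      # Char + Int gives the next code point
end
shifted = String(take!(io))

# The same transformation as a one-liner; map over a String
# returns a String when the function returns a Char.
shifted_map = map(c -> c + 1, a)

println(shifted)
println(shifted_map)
```

Either way a new string is built; the original `a` is left unchanged.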

-Jacob

On Wed, Aug 17, 2016 at 12:30 AM, Rishabh Raghunath  wrote:

>
> Hello fellow Julia Users!!
>
> How do you manipulate the individual characters comprising a string in
> Julia using a for loop ?
> For example:
> ###
>
> a = "abcd"
>
>   for i in length(a)
>a[i]+=1
>  end
>
> print(a)
>
> 
>  I am expecting to get my EXPECTED OUTPUT as" bcde  "
>
>  BUT I get the following error:
> ##
>
>  ERROR: MethodError: `setindex!` has no method matching
> setindex!(::ASCIIString, ::Char, ::Int64)
>  [inlined code] from ./none:2
>  in anonymous at ./no file:4294967295
>
> ##
> I also tried using:
>
> for i in eachindex(a) instead of the for loop in the above program .. And
> I get the same error..
>
> Please tell me what i should do to get my desired output ..
> Please respond ASAP..
> Thanks..
>

Re: [julia-users] How to debug segmentation fault?

2016-08-09 Thread Jacob Quinn

There are many much more knowledgeable than me on this, but I know there's
a good section in the manual to help you get started:
http://docs.julialang.org/en/latest/devdocs/C/

-Jacob

On Tue, Aug 9, 2016 at 9:53 AM, Adrian Salceanu 
wrote:

> I ran into an issue where apparently at random I get segmentation faults -
> how can I find out what exactly is causing the problem?
>
> Here is the dump:
> signal (11): Segmentation fault: 11
> julia_call_23669 at  (unknown line)
> disposable_instance at /Users/adrian/Dropbox/Projects/jinnie/lib/Genie/src/Model.jl:647
> to_select_part at /Users/adrian/Dropbox/Projects/jinnie/lib/Genie/src/Model.jl:262
> to_fetch_sql at /Users/adrian/Dropbox/Projects/jinnie/lib/Genie/src/Model.jl:542
> find at /Users/adrian/Dropbox/Projects/jinnie/lib/Genie/src/Model.jl:40
> jl_apply at /private/tmp/julia-20160615-15177-tdcnou/src/./julia.h:1331
> find_one_by at /Users/adrian/Dropbox/Projects/jinnie/lib/Genie/src/Model.jl:55
> jl_apply at /private/tmp/julia-20160615-15177-tdcnou/src/./julia.h:1331
> current_user at /Users/adrian/Dropbox/Projects/jinnie/app/resources/users/model.jl:64
> jl_apply at /private/tmp/julia-20160615-15177-tdcnou/src/gf.c:1691
> with_authorization at /Users/adrian/Dropbox/Projects/jinnie/app/resources/users/model.jl:82
> articles at /Users/adrian/Dropbox/Projects/jinnie/app/resources/articles/./modules/AdminController.jl:6
> jl_apply at /private/tmp/julia-20160615-15177-tdcnou/src/./julia.h:1331
> jl_f_invoke at /private/tmp/julia-20160615-15177-tdcnou/src/builtins.c:1114
> invoke_controller at /Users/adrian/Dropbox/Projects/jinnie/lib/Genie/src/Router.jl:187
> jl_apply at /private/tmp/julia-20160615-15177-tdcnou/src/./julia.h:1331
> match_routes at /Users/adrian/Dropbox/Projects/jinnie/lib/Genie/src/Router.jl:73
> route_request at /Users/adrian/Dropbox/Projects/jinnie/lib/Genie/src/Router.jl:44
> jl_apply at /private/tmp/julia-20160615-15177-tdcnou/src/./julia.h:1331
> anonymous at /Users/adrian/Dropbox/Projects/jinnie/lib/Genie/src/AppServer.jl:18
> on_message_complete at /Users/adrian/.julia/v0.4/HttpServer/src/HttpServer.jl:400
> jl_apply at /private/tmp/julia-20160615-15177-tdcnou/src/./julia.h:1331
> on_message_complete at /Users/adrian/.julia/v0.4/HttpServer/src/RequestParser.jl:104
> jlcapi_on_message_complete_21686 at  (unknown line)
> http_parser_execute at /Users/adrian/.julia/v0.4/HttpParser/deps/usr/lib/libhttp_parser.dylib (unknown line)
> http_parser_execute at /Users/adrian/.julia/v0.4/HttpParser/src/HttpParser.jl:92
> process_client at /Users/adrian/.julia/v0.4/HttpServer/src/HttpServer.jl:365
> jlcall_process_client_23170 at  (unknown line)
> jl_apply at /private/tmp/julia-20160615-15177-tdcnou/src/./julia.h:1331
> anonymous at task.jl:447
> jl_apply at /private/tmp/julia-20160615-15177-tdcnou/src/./julia.h:1331
> [1]50892 segmentation fault  ./genie.jl s
>
> and this is the last known function (which works ok in different
> circumstances):
>
> function disposable_instance{T<:AbstractModel}(m::Type{T})
>   if m <: AbstractModel
>     return m()
>   else
>     error("$m is not a concrete subtype of AbstractModel")
>   end
> end
>
>

Re: [julia-users] How do I write `import Base.!` in julia 0.5?

2016-08-08 Thread Jacob Quinn

There's also a Compat.jl entry for this, see the first bullet point in the
documentation on the README. That way, you can just do

@compat Base.:!

and it will be valid for both 0.4/0.5.

https://github.com/JuliaLang/Compat.jl
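
To make the use case concrete, here is a minimal sketch of extending `!` for a user-defined type, using the `import Base: !` form mentioned in the quoted reply below. It is written against current Julia, and `Flag` is a hypothetical type invented for the example:

```julia
import Base: !

# Hypothetical wrapper type, only here to have something to negate.
struct Flag
    on::Bool
end

# Extend Base's `!` with a method for the new type.
!(f::Flag) = Flag(!f.on)

println(!Flag(true))
```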

-Jacob


On Mon, Aug 8, 2016 at 9:04 AM, Scott T  wrote:

> Great, thanks. I found that
> import Base: !
> also works and is also compatible with 0.4/0.5.
>
> On Monday, 8 August 2016 15:26:46 UTC+1, Kevin Squire wrote:
>>
>> Try
>>
>>   import Base.(!)
>>
>> Cheers,
>>   Kevin
>>
>> On Monday, August 8, 2016, Scott T  wrote:
>>
>>> In 0.4 I would write: import Base.!
>>> The syntax Base.:! is not yet supported.
>>>
>>> In 0.5:
>>>
>>> julia> import Base.:!
>>> ERROR: syntax: invalid "import" statement: expected identifier
>>>
>>> julia> import Base.!
>>> ERROR: syntax: invalid operator ".!"
>>>
>>> What am I missing here?
>>>
>>

Re: [julia-users] Re: Working with DataFrame columns types that are Nullable

2016-07-22 Thread Jacob Quinn

You are correct. There are properties of NullableArrays required for proper
data transfer/handling, but I still wanted users to get a familiar type
back. I'm definitely helping out with
https://github.com/JuliaStats/DataFrames.jl/pull/1008 to ensure DataFrames
gets ported over as quickly as possible.

-Jacob

On Fri, Jul 22, 2016 at 1:10 PM, John Best  wrote:

> Yeah, that's why I was surprised to get the Nullables. It's probably
> ODBC.jl anticipating the DataFrames.jl switch.
>
>
>> As far as I know, DataFrames are only backed by DataArrays. (I believe
>> there's current work being done to upgrade the speed and type stability of
>> DataFrames, in part by using Nullables.)
>>
>> It might be helpful to write a little convenience function like
>>
>> f(n) = isnull(n) ? NA : get(n)
>>
>> I will say I've found DataArrays to be super finicky when inferring
>> types. YMMV.
>>
>

Re: [julia-users] Does the Julia has goto?

2016-07-12 Thread Jacob Quinn

It's basically:

@label goto_label

then later.

@goto goto_label

Should probably have some docs though.
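
In the meantime, a small sketch of one common use, jumping out of a nested loop from inside a function. The function name and the search problem are made up for illustration:

```julia
# Find the first pair of indices (i, j), i < j, whose elements sum to `target`.
function first_pair_sum(xs, target)
    result = nothing
    for i in eachindex(xs), j in eachindex(xs)
        if i < j && xs[i] + xs[j] == target
            result = (i, j)
            @goto done          # leave both loops at once
        end
    end
    @label done
    return result
end

println(first_pair_sum([1, 4, 6, 9], 10))
println(first_pair_sum([1, 2, 3], 100))
```

A plain `return` inside the loop would do the same job here; the point is only to show where the two macros go.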

On Tue, Jul 12, 2016 at 6:16 AM, vasili111  wrote:

> I search docs and I did not find any information how to use it but there
> is some example code with goto. So, does the Julia has goto?
>

Re: [julia-users] I would like to display links to new Julia Questions from StackOverflow in the Gitter Sidebar

2016-07-09 Thread Jacob Quinn

Just sent it to you.

On Sat, Jul 9, 2016 at 10:11 PM, Lyndon White  wrote:

>
> Hi all,
> We were discussing this in the gitter chat.
> That it would be cool if everytime someone asked a Julia question on Stack
> Overflow, it would appear in the Activity sidebar.
>
>
> That way, when questions are asked on Stack Overflow, people hanging
> around on gitter would see them and could answer them, thus making the
> community more welcoming by getting people who are stuck help sooner, and so
> improving the adoption of the language.
>
>
> So I threw together some code to make that happen:
> https://github.com/oxinabox/GitterBots.jl
> You can see it in action on my own gitter channel:
> https://gitter.im/oxinabox/JuliaStackOverflowWatcher
>
> 
>
>
> It's just a script running in a loop on my computer;
> each minute it checks the Stack Overflow JuliaLang RSS feed
> and then posts new questions to a Gitter Custom Integration activity notifier.
>
> I would like to set it up to run in the main channel.
> The Activity bar in the main channel is currently empty -- unused.
>
> I'm happy to host it, more or less forever, on the same server I use to
> host the bot that links the IRC to gitter.
> But unlike that bot, I cannot do this without permission from a channel
> admin.
> Which by default for gitter are people with commit access to the Julia
> repository.
>
> I need a channel admin to give me a webhook URL.
>
> This can be gotten by clicking:
> Room Settings -> Integrations -> *Custom*
> Then copying the URL (it should look like
> https://webhooks.gitter.im/e/adb87a00ca31a22272dc) and clicking done,
> and sending it to me in a private message on gitter
> or an email (though that is unencrypted plain text, so could I guess be
> snooped.  Certainly do not send it to this mailing list, as that would leave
> it open for anyone to hook to).
>
> Alternatively, the repo for the bot that checks stack overflow could be
> cloned by a channel admin, and then they could run it themselves.
> And so I wouldn't need to be given the webhook URL.
>
>
> What do people think?
> People who were on gitter at the time I brought it up, and showed the
> demonstration were in favor, I think.
>
> Regards
> Lyndon White
>
>
>
>

Re: [julia-users] Parsing ASCIIString to Bool

2016-06-30 Thread Jacob Quinn

This was just fixed on master: https://github.com/JuliaLang/julia/pull/17078
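
On a Julia version that includes that change (any current release, as far as I know), the typed form behaves as one would expect. A short sketch, with `tryparse` shown as the non-throwing variant; the input strings are arbitrary examples:

```julia
flag = parse(Bool, "true")

# tryparse returns `nothing` instead of throwing on bad input
# (this is the behaviour in current Julia releases).
maybe = tryparse(Bool, "not-a-bool")

println(flag)
println(maybe === nothing)
```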

On Thu, Jun 30, 2016 at 1:43 AM,  wrote:

> Hi all,
>
> I was just wondering if the following is expected behaviour:
>
> julia> parse("true")
> true
>
> julia> typeof(parse("true"))
> Bool
>
> julia> parse(Bool, "true")
> ERROR: InexactError()
>  in convert at ./bool.jl:6
>  in tryparse_internal at parse.jl:84
>  in tryparse_internal at parse.jl:136
>  in parse at parse.jl:146
>
>
> It seems odd that the last line results in an error. Anyone know why?
>
> Cheers,
>
> Colin
>

Re: [julia-users] Oracle ODBC

2016-06-27 Thread Jacob Quinn

It's probably the case that Oracle has built the drivers themselves, so a
programming language would just need a wrapper library around the direct
driver (similar things exist for MySQL, Postgres, etc.).

But what Stefan said still applies, someone would have to take the
initiative to build the wrapper library around the Oracle driver and
provide a Julia package. Probably not a terribly hard project (basically
lots of ccalls and some julia-level interface design), but where ODBC
provides a connection more-or-less out of the box, it cuts down a little on
the pressing need.

-Jacob

On Mon, Jun 27, 2016 at 10:23 AM, Stefan Karpinski 
wrote:

> Given how expensive Oracle databases are, creating a direct driver isn't
> really a plausible open source endeavor, so this is unlikely to happen
> unless someone who actually has an Oracle database builds it themselves or
> pays for it to be built (e.g. through Julia Computing).
>
> On Sun, Jun 26, 2016 at 8:51 PM, John Kim  wrote:
>
>> In R, there are direct Oracle OCI drivers.  According to the oracle
>> benchmarks, they are 3x faster than the ODBC versions.  Any idea if direct
>> OCI will be supported for Julia?
>>
>>
>> On Monday, May 9, 2016 at 1:03:25 AM UTC-7, Stefan Karpinski wrote:
>>>
>>> You have to install ODBC drivers yourself – the Julia package just
>>> provides an interface to them.
>>>
>>> On Mon, May 9, 2016 at 5:27 AM, John Kim  wrote:
>>>
 Hello

 I'm new to Julia and would like to start using it for various
 projects.  One such project requires me to access an Oracle database.  when
 using the ODBC package, the listdrivers() command only shows PostgreSQL and
 MySQL drivers installed by default.  Are Oracle drivers available for
 Julia?

>>>
>>>
>

Re: [julia-users] indexing over `zip(collection1, collection2)`

2016-06-23 Thread Jacob Quinn

Sorry, to clarify a little:

The things you're zipping are not necessarily indexable (i.e. other
iterators), so it's not safe to assume you can always index a Zip.

On Thu, Jun 23, 2016 at 2:21 PM, Jacob Quinn <quinn.jac...@gmail.com> wrote:

> Most "iterator" types are not indexable, AFAIK. The typical
> recommendation/idiom is to just call `collect(itr)` if you need to
> specifically index.
>
> -Jacob
>
> On Thu, Jun 23, 2016 at 2:18 PM, Davide Lasagna <lasagnadav...@gmail.com>
> wrote:
>
>> Is there any particular reason why `Zip` objects are iterable but not
>> indexable? Python allows that.
>>
>> From previous discussion on the topic (2014 topic at
>> https://groups.google.com/forum/#!topic/julia-dev/5bgMvzJveWA) it seems
>> that it has not been implemented yet.
>>
>> Thanks,
>>
>>
>>
>

Re: [julia-users] indexing over `zip(collection1, collection2)`

2016-06-23 Thread Jacob Quinn

Most "iterator" types are not indexable, AFAIK. The typical
recommendation/idiom is to just call `collect(itr)` if you need to
specifically index.
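
A minimal sketch of that idiom, where one of the zipped inputs is a generator and so could not be indexed directly (the data is made up):

```julia
letters = ["a", "b", "c"]
squares = (x^2 for x in 1:3)      # a generator: iterable, not indexable

zipped = zip(letters, squares)

# Materialize the iterator into a Vector of tuples, which can be indexed.
pairs_vec = collect(zipped)

println(pairs_vec[2])
println(length(pairs_vec))
```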

-Jacob

On Thu, Jun 23, 2016 at 2:18 PM, Davide Lasagna 
wrote:

> Is there any particular reason why `Zip` objects are iterable but not
> indexable? Python allows that.
>
> From previous discussion on the topic (2014 topic at
> https://groups.google.com/forum/#!topic/julia-dev/5bgMvzJveWA) it seems
> that it has not been implemented yet.
>
> Thanks,
>
>
>

Re: [julia-users] Syntax for composite type constructors

2016-06-17 Thread Jacob Quinn

There's been a long-open issue for this here:
https://github.com/JuliaLang/julia/pull/6122

Not sure why it's never been implemented as I don't think anyone was really
opposed. I imagine if people make a big push for it, we could convince
someone to get it across the finish line.
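
Until something like that exists, one workaround is to write the keyword constructor by hand. The sketch below is not the feature from the linked PR, just an ordinary outer constructor; it assumes a Julia version that supports required keyword arguments (current releases do) and reuses the `Goo` type from the question:

```julia
struct Goo
    vertical::Float64
    horizontal::Float64
    qux::Float64
end

# Hand-written keyword constructor: every field is a required keyword,
# so the call site no longer depends on field order.
Goo(; vertical, horizontal, qux) = Goo(vertical, horizontal, qux)

goo = Goo(horizontal=186.0, vertical=0.33, qux=3.14)
println(goo)
```

Leaving out a keyword or misspelling one raises an error at the call site, which covers the "incorrect, non-existing, or missing fields" case from the question.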

-Jacob

On Fri, Jun 17, 2016 at 11:17 AM, Stephan Buchert 
wrote:

> When making instances of my composite Julia types, such as
>
> foo = Foo(1, "bartxt", 3.14)
>
> I often have to look up the definition, because I don't remember whether,
> in this case, "bar" was the first or the second field:
>
> type Foo
>   bar::AbstractString
>   baz::Int
>   qux::Float64
> end
>
> "bar" was the first field, so the correct construction is
>
> foo = Foo("bartxt", 1, 3.14)
>
> In this case the first, incorrect code would not compile because of no
> matching method.
>
> But for
>
> type Goo
>   vertical::Float64
>   horizontal::Float64
>   qux::Float64
> end
>
> it is easy to write incorrect code when the order of the fields gets mixed
> up, and such bugs are potentially difficult to find.
>
> Wouldn't it be possible to allow (optionally) for a syntax like
>
> foo = Foo(;bar=bartxt, baz=1, qux=3.14);
> goo = Goo(;horizontal=186.,vertical=0.33, qux=3.14)
>
> where the keyword arguments are the fieldnames of the type?
>
> Often I remember the fields of a type, but not their order.
> With this syntax the argument order would be arbitrary.
> The syntax would imply, that incorrect, non-existing, or missing
> fields/keywords result in an error.
>
> Thanks for your consideration.
>

Re: [julia-users] Re: Git submodules vs Julia submodules

2016-06-15 Thread Jacob Quinn

You're correct Eric; libgit2 isn't doing anything related to what you're
doing here. You're really just telling Julia one more location to look in
order to find a module definition. There's an open issue I created recently
about how to do this more generally (i.e. without having to explicitly
touch LOAD_PATH)

Ref: https://github.com/JuliaLang/julia/issues/16830
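
For readers who have not touched LOAD_PATH before, a self-contained sketch of the explicit approach follows. The scratch directory stands in for the checked-out git submodule, and the module name `Hello` is made up; this assumes current Julia, run as a top-level script:

```julia
# Scratch directory standing in for a checked-out git submodule.
dir = mktempdir()
write(joinpath(dir, "Hello.jl"), """
module Hello
greet() = "hi from Hello"
end
""")

# Tell Julia one more place to look for module definitions.
push!(LOAD_PATH, dir)

using Hello
println(Hello.greet())
```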

-Jacob

On Thu, Jun 16, 2016 at 12:29 AM, Eric Forgy  wrote:

> Hi Tony,
>
> I'm not sure I understand. I am using Julia 0.4.5 (Windows), but not using
> anything explicit from package manager to handle the Git submodules. I just
> do a `git submodule add ...` from the command line (not REPL). I can't
> imagine package manager breaking this in 0.5. Will it? This libgit2 seems
> to be causing all kinds of problems for Windows users, e.g. I still cannot
> even play with 0.5 until this
>  is resolved.
>
> Eric
>
> On Thursday, June 16, 2016 at 3:10:03 AM UTC+8, Tony Kelman wrote:
>>
>> Assuming you're using Julia 0.4 here? Since Julia 0.5 has transitioned to
>> using libgit2 for the package manager, I don't think you'll have explicit
>> support for git submodules right away.
>>
>

Re: [julia-users] Problems with MySQL sample code

2016-06-14 Thread Jacob Quinn

Yep, it's just like I said above. In the "mysql_execute(con" line, you'll
even notice that the code highlighting in your email shows John as a value
instead of as text. That's because the double quote right before John ends
the string that starts with "INSERT...".

You need to escape the double quotes there or, as I said, use single
quotes, so that line would be:

mysql_execute(con, "INSERT INTO Employee (Name, Salary, JoinDate)
values ('John', 25000.00, '2015-12-12'), ('Sam', 35000.00,
'2012-18-17'), ('Tom', 5.00, '2013-12-14');")
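
As for why the error mentions `@John_str`: Julia parses an identifier
immediately followed by a string literal as a call to a string macro. A toy
sketch (the `John` macro here is hypothetical, defined only to show the
parsing rule):

```julia
# Hypothetical string macro: John"..." is parsed as @John_str "...".
macro John_str(s)
    return "John says: " * s
end

# With the macro defined, the same syntax that failed before now works.
greeting = John"hello"
println(greeting)
```

Without such a macro defined, the same syntax fails with "@John_str not
defined", which is exactly what you saw.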

Also feel free to check out https://github.com/JuliaDB/ODBC.jl, it's a
great package for interacting with a variety of databases through the
common ODBC framework, most heavily tested against MySQL.

-Jacob

On Tue, Jun 14, 2016 at 1:53 PM, Ingemar Skarpås  wrote:

> Well, it is this sample code, straight from GitHub for MySQL.jl. The only
> modification is that I address username etc. in con=mysql_connect(). And the
> same works for the MySQL test suite. And the code works up until row 14 in
> this case. In my code I have added a few more comments, but the
> executable code is the same!
>
> using MySQL
> con = mysql_connect("192.168.23.24", "username", "password", "db_name")
> command = """CREATE TABLE Employee
>  (
>  ID INT NOT NULL AUTO_INCREMENT,
>  Name VARCHAR(255),
>  Salary FLOAT,
>  JoinDate DATE,
>  PRIMARY KEY (ID)
>  );"""
> mysql_execute(con, command)
>
> # Insert some values
> mysql_execute(con, "INSERT INTO Employee (Name, Salary, JoinDate) values 
> ("John", 25000.00, '2015-12-12'), ("Sam", 35000.00, '2012-18-17), ("Tom", 
> 5.00, '2013-12-14');")
>
> # Get SELECT results
> command = "SELECT * FROM Employee;"
> dframe = mysql_execute(con, command)
>
> # Close connection
> mysql_disconnect(con)
>
>
>
>
> On Tuesday, June 14, 2016 at 19:01:49 UTC+2, Stefan Karpinski wrote:
>>
>> On Tue, Jun 14, 2016 at 11:50 AM, Ingemar Skarpås 
>> wrote:
>>
>>> LoadError: UndefVarError: @John_str not defined
>>
>>
>> This bit indicates that you've written something like this: John"...".
>> This is translated to a macro call to a macro named @John_str.
>>
>>

Re: [julia-users] Problems with MySQL sample code

2016-06-14 Thread Jacob Quinn

My guess is that you have an SQL query that is something like:

query("select * from table where name = "John" and dept = 7")

The problem then is that the double quotation mark right before John is
actually *ending* the double quote that started your query string. What you
want is to escape the quotation mark around John, (or better yet, just use
a single quote), so something like:

query("select * from table where name = 'John' and dept = 7")

or

query("select * from table where name = \"John\" and dept = 7")

should work.
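
To make the options concrete, here is a small sketch using plain Julia
strings only (no database involved; the variable names are just for
illustration):

```julia
# Single quotes inside a double-quoted string need no escaping.
q1 = "select * from table where name = 'John' and dept = 7"

# Double quotes inside a double-quoted string must be backslash-escaped...
q2 = "select * from table where name = \"John\" and dept = 7"

# ...unless the whole string is triple-quoted.
q3 = """select * from table where name = "John" and dept = 7"""

println(q1)
println(q2)
println(q2 == q3)
```

The escaped and triple-quoted forms produce the same string, so it comes
down to which one is easier to read.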

-Jacob

On Tue, Jun 14, 2016 at 9:50 AM, Ingemar Skarpås  wrote:

>
>
> Last week I started to evaluate Julia to see if it fits my purposes (I
> also made some tests a year and a half ago), so I have tested a few things,
> written some tests of my own, and modified some sample code. So bear with
> me if this is an easy one...
>
> I have Julia 0.4.5, with MySQL 0.2.2+ on W7U and MariaDB 10.0.17. I have
> created a database which is set to be the default and then try to execute
> the sample code for MySQL.jl from GitHub. I used Atom 1.8.0.
> Everything is updated!
>
> When running the sample code, the table is correctly created, but when
> executing the line with the INSERT it gives an error on the file
> name referring to the line number after that line. In my case an empty line.
> (So, I found out that Atom's error message window can't be copied from!)
> So manually typed...
>
> LoadError: UndefVarError: @John_str not defined
>  in include_string at loading.jl:282
>  in include_string at C:\Users\Myname\.julia\v0.4\CodeTools\src\eval.jl:32
>  in anonymous at C:\Users\Myname\.julia\v0.4\Atom\src\eval.jl:84
>  in withpath at C:\Users\Myname\.julia\v0.4\Requires\src\require.jl:37
>  in withpath at C:\Users\Myname\.julia\v0.4\Atom\src\eval.jl:53
>  [inlined code] from C:\Users\Myname\.julia\v0.4\Atom\src\eval.jl:83
>  in anonymous at task.jl:58
> while loading C:\Julia_Code\test_Mysql_localhost.jl, in expression
> starting on line 16
>
> The odd thing - running the test suite that is located from my local MySQL
> test directory with Pkg.test("MySQL") is ok! And, that test suite (3 files
> are run) also uses SELECT INTO, but in a much more complicated way!
>
> So, what is the problem? I have been staring myself blind so far
>

Re: [julia-users] Initialization via ntuple()

2016-06-08 Thread Jacob Quinn

See the open issue here: https://github.com/JuliaLang/julia/issues/11902
And Keno's PR here: https://github.com/JuliaLang/julia/pull/12113

TL;DR nothing like fill! yet for tuples, but planned/being worked on
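
Until something like that lands, here are a few equivalent ways to spell it;
a small sketch, assuming a current Julia (1.x):

```julia
# The closure form from the question; `_` marks the index as unused.
a = ntuple(_ -> 0.0, 10)

# Passing the length as Val(10) makes it a compile-time constant.
b = ntuple(_ -> 0.0, Val(10))

# Going through an array: fill it first, then convert to a tuple.
c = Tuple(fill(0.0, 10))

println(a)
println(a == b == c)
```

All three give an NTuple{10,Float64}; the array version allocates a
temporary, so the ntuple forms are usually the better default.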

-Jacob

On Wed, Jun 8, 2016 at 11:04 AM, Islam Badreldin <islam.badrel...@gmail.com>
wrote:

> Thanks! I was looking for something similar to fill() but for NTuple ..
> But I couldn't find any ..
>
>   -Islam
>
> On Wednesday, June 8, 2016 at 11:20:06 AM UTC-4, Jacob Quinn wrote:
>
>> That looks right as far as I understand.
>>
>> -Jacob
>>
>> On Wed, Jun 8, 2016 at 8:42 AM, Islam Badreldin <islam.b...@gmail.com>
>> wrote:
>>
>>> Hi
>>>
>>> What is the easiest way to initialize NTuple with some constant value?
>>> Currently I am using
>>>
>>> julia> y=ntuple(x->0.0,10)
>>> (0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0)
>>>
>>>
>>> Is this the only initialization option?
>>>
>>> Thanks,
>>> Islam
>>>
>>
>>

Re: [julia-users] Initialization via ntuple()

2016-06-08 Thread Jacob Quinn

That looks right as far as I understand.

-Jacob

On Wed, Jun 8, 2016 at 8:42 AM, Islam Badreldin 
wrote:

> Hi
>
> What is the easiest way to initialize NTuple with some constant value?
> Currently I am using
>
> julia> y=ntuple(x->0.0,10)
> (0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0)
>
>
> Is this the only initialization option?
>
> Thanks,
> Islam
>

Re: [julia-users] Signal / slot (or publish-subscribe) libraries in Julia

2016-05-29 Thread Jacob Quinn

ZMQ.jl? Or maybe you don't need something socket based.
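
If in-process is enough, the emit/connect pattern is small enough to write
by hand. A minimal sketch in current Julia (1.x) syntax; the `Signal` type
and the `connect!`/`emit` names are hypothetical, not taken from any package:

```julia
# Hypothetical minimal signal: just a list of callbacks (slots).
struct Signal
    slots::Vector{Function}
end
Signal() = Signal(Function[])

# Register a callback (slot) on the signal.
connect!(sig::Signal, slot::Function) = (push!(sig.slots, slot); sig)

# Call every connected slot, in connection order, with the given arguments.
function emit(sig::Signal, args...)
    for slot in sig.slots
        slot(args...)
    end
end

received = Int[]
sig = Signal()
connect!(sig, x -> push!(received, x))
connect!(sig, x -> push!(received, 2x))
emit(sig, 21)
println(received)
```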

-Jacob

On Sun, May 29, 2016 at 1:25 PM, Femto Trader 
wrote:

> Hello,
>
> I'm looking for a Julia library which implements one of these design
> pattern:
>
> - signal / slot
> - publish / subscribe
>
> Python have many libraries for this purpose
>
> for signal / slot
> signalslot https://github.com/Numergy/signalslot
> smokesignal https://github.com/shaunduncan/smokesignal
> circuits https://github.com/circuits/circuits
> blinker https://pythonhosted.org/blinker/
> for pub / sub
> pypubsub http://pubsub.sourceforge.net/
>
> What Julia alternatives should I consider ?
> I noticed Reactive.jl ... but it seems quite complex
>
> What I like in signal / slot pattern is these very simple (and easily
> understandable)
> emit and connect methods.
>
> Kind regards
>

Re: [julia-users] Re: Questions regarding to SQLite

2016-05-25 Thread Jacob Quinn

Yep,

like I said, just switch your "using" statements around to make sure "using
DataFrames" comes before "using SQLite"

-Jacob

On Wed, May 25, 2016 at 11:15 PM, SHORE SHEN <shore.horizonl...@gmail.com>
wrote:

> Hi:
>
> Here's my code:
>
> using SQLite
> using DataFrames
>
> db=SQLite.DB("test3.db")
> query(db,"create table emp (id integer, name text)")
> query(db,"insert into emp (id, name) values(1,'jay')")
> query(db,"insert into emp (id, name) values(2,'kay')")
> query(db,"insert into emp (id, name) values(4,'vay')")
>
> y=SQLite.query(db,"select * from emp")
> z=Data.stream!(SQLite.Source(db,"select * from emp"), Data.Table)
>
> DataFrame(z)
>
> and the error reported:
>
> MethodError: `convert` has no method matching
> convert(::Type{DataFrames.DataFrame},
> ::DataStreams.Data.Table{Array{NullableArrays.NullableArray{T,1},1}})
> This may have arisen from a call to the constructor
> DataFrames.DataFrame(...),
> since type constructors fall back to convert methods.WARNING: Error
> showing method candidates, aborted
>
>  in call at essentials.jl:56
>  in include_string at C:\Users\shoren\.julia\v0.4\CodeTools\src\eval.jl:28
>  in include_string at C:\Users\shoren\.julia\v0.4\CodeTools\src\eval.jl:32
>  [inlined code] from C:\Users\shoren\.julia\v0.4\Atom\src\eval.jl:39
>  in anonymous at C:\Users\shoren\.julia\v0.4\Atom\src\eval.jl:62
>  in withpath at C:\Users\shoren\.julia\v0.4\Requires\src\require.jl:37
>  in withpath at C:\Users\shoren\.julia\v0.4\Atom\src\eval.jl:53
>  [inlined code] from C:\Users\shoren\.julia\v0.4\Atom\src\eval.jl:61
>  in anonymous at task.jl:58
>
> On Thursday, May 26, 2016 at 12:45:00 PM UTC+8, Jacob Quinn wrote:
>>
>> Make sure you do
>>
>> using DataFrames
>> using SQLite
>>
>> in that order in a fresh Julia session to ensure the conversion methods
>> get defined (when you load the SQLite package, it checks if DataFrames has
>> already been loaded to define the conversion routines).
>>
>> -Jacob
>>
>>
>> On Wed, May 25, 2016 at 10:43 PM, SHORE SHEN <shore.ho...@gmail.com>
>> wrote:
>>
>>> Hi Alex
>>>
>>> Thanks so much for your reply. I tried out the method, but it fails when
>>> using the DataFrame(Data.stream!(source, Data.Table)) command. It reported
>>> a no-such-method error message as follows:
>>>
>>> MethodError: `convert` has no method matching
>>> convert(::Type{DataFrames.DataFrame},
>>> ::DataStreams.Data.Table{Array{NullableArrays.NullableArray{T,1},1}})
>>> This may have arisen from a call to the constructor
>>> DataFrames.DataFrame(...),
>>> since type constructors fall back to convert methods.WARNING: Error
>>> showing method candidates, aborted
>>>
>>>  in call at essentials.jl:56
>>>  in include_string at
>>> C:\Users\shoren\.julia\v0.4\CodeTools\src\eval.jl:28
>>>  in include_string at
>>> C:\Users\shoren\.julia\v0.4\CodeTools\src\eval.jl:32
>>>  [inlined code] from C:\Users\shoren\.julia\v0.4\Atom\src\eval.jl:39
>>>  in anonymous at C:\Users\shoren\.julia\v0.4\Atom\src\eval.jl:62
>>>  in withpath at C:\Users\shoren\.julia\v0.4\Requires\src\require.jl:37
>>>  in withpath at C:\Users\shoren\.julia\v0.4\Atom\src\eval.jl:53
>>>  [inlined code] from C:\Users\shoren\.julia\v0.4\Atom\src\eval.jl:61
>>>  in anonymous at task.jl:58
>>>
>>> On Thursday, May 26, 2016 at 12:30:52 AM UTC+8, Alex Mellnik wrote:
>>>
>>>>
>>>> Yes to both!
>>>>
>>>> For the first one, you want to use Data.stream! to get a Data.Table
>>>> from the result set, and then convert it to a DataFrame.  For the second
>>>> you need to do the opposite.  I generally work with DataFrames so I wrote
>>>> two convenience functions for doing this, which should give you an idea of
>>>> how to go about it:
>>>>
>>>> function queryToDF(db, query)
>>>>     source = SQLite.Source(db, query)
>>>>     return DataFrame(Data.stream!(source, Data.Table))
>>>> end
>>>>
>>>> function dfToDB(db, df, table)
>>>>     sink = SQLite.Sink(Data.Table(df), db, table)
>>>>     Data.stream!(Data.Table(df), sink)
>>>> end
>>>>
>>>> Cheers -A
>>>>
>>>> On Wednesday, May 25, 2016 at 2:54:26 AM UTC-7, SHORE SHEN wrote:
>>>>>
>>>>> Hello
>>>>>
>>>>> I'm trying out the SQLite package in Julia, and I have the following 2
>>>>> questions:
>>>>>
>>>>> 1. The query results in a type of SQLite.ResultSet; can I
>>>>> output a DataFrame or DataArray type instead?
>>>>>
>>>>> 2. If I have a DataFrame or DataArray, would I be able to put it
>>>>> into a database table?
>>>>>
>>>>> thanks a lot!
>>>>>
>>>>
>>

Re: [julia-users] Re: Questions regarding to SQLite

2016-05-25 Thread Jacob Quinn

Make sure you do

using DataFrames
using SQLite

in that order in a fresh Julia session to ensure the conversion methods get
defined (when you load the SQLite package, it checks if DataFrames has
already been loaded to define the conversion routines).

-Jacob


On Wed, May 25, 2016 at 10:43 PM, SHORE SHEN 
wrote:

> Hi Alex
>
> Thanks so much for your reply. I tried out the method, but it fails when
> using the DataFrame(Data.stream!(source, Data.Table)) command. It reported a
> no-such-method error message as follows:
>
> MethodError: `convert` has no method matching
> convert(::Type{DataFrames.DataFrame},
> ::DataStreams.Data.Table{Array{NullableArrays.NullableArray{T,1},1}})
> This may have arisen from a call to the constructor
> DataFrames.DataFrame(...),
> since type constructors fall back to convert methods.WARNING: Error
> showing method candidates, aborted
>
>  in call at essentials.jl:56
>  in include_string at C:\Users\shoren\.julia\v0.4\CodeTools\src\eval.jl:28
>  in include_string at C:\Users\shoren\.julia\v0.4\CodeTools\src\eval.jl:32
>  [inlined code] from C:\Users\shoren\.julia\v0.4\Atom\src\eval.jl:39
>  in anonymous at C:\Users\shoren\.julia\v0.4\Atom\src\eval.jl:62
>  in withpath at C:\Users\shoren\.julia\v0.4\Requires\src\require.jl:37
>  in withpath at C:\Users\shoren\.julia\v0.4\Atom\src\eval.jl:53
>  [inlined code] from C:\Users\shoren\.julia\v0.4\Atom\src\eval.jl:61
>  in anonymous at task.jl:58
>
> On Thursday, May 26, 2016 at 12:30:52 AM UTC+8, Alex Mellnik wrote:
>
>>
>> Yes to both!
>>
>> For the first one, you want to use Data.stream! to get a Data.Table from
>> the result set, and then convert it to a DataFrame.  For the second you
>> need to do the opposite.  I generally work with DataFrames so I wrote two
>> convenience functions for doing this, which should give you an idea of how
>> to go about it:
>>
>> function queryToDF(db, query)
>>     source = SQLite.Source(db, query)
>>     return DataFrame(Data.stream!(source, Data.Table))
>> end
>>
>> function dfToDB(db, df, table)
>>     sink = SQLite.Sink(Data.Table(df), db, table)
>>     Data.stream!(Data.Table(df), sink)
>> end
>>
>> Cheers -A
>>
>> On Wednesday, May 25, 2016 at 2:54:26 AM UTC-7, SHORE SHEN wrote:
>>>
>>> Hello
>>>
>>> I'm trying out the SQLite package in Julia, and I have the following 2
>>> questions:
>>>
>>> 1. The query results in a type of SQLite.ResultSet; can I output a
>>> DataFrame or DataArray type instead?
>>>
>>> 2. If I have a DataFrame or DataArray, would I be able to put it
>>> into a database table?
>>>
>>> thanks a lot!
>>>
>>

Re: [julia-users] Capture output from a shell command

2016-05-18 Thread Jacob Quinn

`countlines(file)` in Base should actually be just as fast as `wc`.

But otherwise, you can do

readall(`wc -l ratings.dat`) to capture all the output of a command (note
that this is `readstring(`wc -l ratings.dat`)` on 0.5)
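
On current Julia (1.x) both spellings have been folded into
`read(cmd, String)`. A sketch of the whole round trip under that assumption,
using a throwaway temp file in place of ratings.dat and assuming a Unix-like
system with `wc` on the PATH:

```julia
# Write a small throwaway file so the sketch is self-contained.
path, io = mktemp()
write(io, "a\nb\nc\n")
close(io)

# Pure-Julia line count.
n_julia = countlines(path)

# Shell out to `wc -l` and capture its stdout as a String.
out = read(`wc -l $path`, String)

# `wc` prints "<count> <filename>"; take the first whitespace-separated field.
n_wc = parse(Int, first(split(out)))

println((n_julia, n_wc))
```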

-Jacob


On Wed, May 18, 2016 at 11:43 AM, Douglas Bates  wrote:

> I have forgotten how to capture the output from a shell command.  The
> background is that the `wc` program on Linux (and, I assume, OS X) is an
> incredibly fast way to count the number of lines in a data file, as in
>
> julia> run(`wc -l ratings.dat`)
> 1000209 ratings.dat
>
> I want to capture that string and parse the number.
>

Re: [julia-users] Slow reading of file

2016-05-14 Thread Jacob Quinn

I'm actually one of the people trying to speed up things like integer
parsing in general. Currently in the CSV.jl package, we have some beta-mode
fast parsing functions. I generated a file with:

open("test_integers","w") do f
    for i = 1:7_068_650
        println(f, rand(Int16))
    end
end

And reading it with the CSV.jl package gives me:

julia> @time csv = CSV.csv("test_integers";types=[Int])
  1.156055 seconds (7.07 M allocations: 168.536 MB, 6.68% gc time)
Data.Table:
7068649x1 Data.Schema:
 19591
 Int64

Column Data:
[-2,-23575,-6091,-1421,-4229,27266,-15925,20891,19254,4630  …
 25060,-20681,2218,-16672,14473,-14427,4868,14841,7874,6445]
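
For comparison, a dependency-free baseline that parses one integer per line
with Base only; a sketch, assuming Julia 1.x (where `eachline` accepts a
file name) and a hypothetical helper name `read_ints`. It makes no claim
about matching the timing above:

```julia
# Hypothetical helper: read a file with one integer per line into a Vector{Int}.
function read_ints(path::AbstractString)
    out = Int[]
    for line in eachline(path)   # lazily yields lines without the newline
        push!(out, parse(Int, line))
    end
    return out
end

# Small throwaway input so the sketch runs on its own.
path, io = mktemp()
for v in (-2, 19591, 4630)
    println(io, v)
end
close(io)

ints = read_ints(path)
println(ints)
```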


On Sat, May 14, 2016 at 7:22 AM, Yichao Yu  wrote:

> On Sat, May 14, 2016 at 8:31 AM, Ford Ox  wrote:
> > type Tokenizer
> >     tokens::Array{ASCIIString, 1}
> >     index::Int
> >     Tokenizer(s::ASCIIString) = new(split(strip(s)), 0)
> > end
> >
> > Julia still runs 11 seconds...
>
> The main cost is not coming from the dynamic dispatch, but from the
> allocation of strings and arrays.
>
> The `new(split(strip(s)), 0)` above allocates a new string (`strip`),
> allocates an array of `SubString`s (`split`), and then converts
> it to (and therefore allocates) an array of `ASCIIString`. I believe
> the Java tokenizer is likely much more efficient than this.
>
> Changing it to something like
>
> ```
> using Compat
>
> ##Tokenizer ##
>
> type Tokenizer
>     string::Compat.ASCIIString
>     index::Int
>     len::Int
>     function Tokenizer(s::Compat.ASCIIString)
>         i = 1
>         len = length(s)
>         while i <= len && isspace(s[i])
>             i += 1
>         end
>         new(s, i - 1, len)
>     end
> end
>
> isempty(t::Tokenizer) = t.len == t.index
>
> function next!(t::Tokenizer)
>     i = j = t.index + 1
>     len = t.len
>     s = t.string
>     while i <= len && !isspace(s[i])
>         i += 1
>     end
>     subs = SubString(s, j, i - 1)
>     while i <= len && isspace(s[i])
>         i += 1
>     end
>     t.index = i - 1
>     subs
> end
> ```
>
> reduces the time from 17s to 3s for me.
>
> Changing the `i += 1` to the more general version `i = nextind(s, i)`
> increases the runtime to ~4s, and I think improving this is one of the
> reasons Stefan is working on the String stuff.
>
> The integer parsing also needs some work; removing it reduces the
> runtime to 1.7s, and I believe at least @simonbyrne (and maybe many
> others) are working on that.
>
>
> >
> > On Saturday, May 14, 2016 at 14:08:48 UTC+2, Milan Bouchet-Valat wrote:
> >>
> >> On Saturday, May 14, 2016 at 05:01 -0700, Ford Ox wrote:
> >> > Fixed. Julia now takes 11 seconds to finish
> >> > type Tokenizer
> >> >     tokens::Array{AbstractString, 1}
> >> >     index::Int
> >> >     Tokenizer(s::AbstractString) = new(split(strip(s)), 0)
> >> > end
> >> >
> >> > type Buffer
> >> >     stream::IOStream
> >> >     tokenizer::Tokenizer
> >> >     Buffer(stream) = new(stream, Tokenizer(""))
> >> > end
> >> AbstractString is still not a concrete type. Use
> >> UTF8String/ASCIIString, or do this instead:
> >>
> >> type Tokenizer{T<:AbstractString}
> >>     tokens::Array{T, 1}
> >>     index::Int
> >>     Tokenizer(s::AbstractString) = new(split(strip(s)), 0)
> >> end
> >>
> >> type Buffer{T<:AbstractString}
> >>     stream::IOStream
> >>     tokenizer::Tokenizer{T}
> >>     Buffer(stream) = new(stream, Tokenizer(""))
> >> end
> >>
> >> (Note that "" will create an ASCIIString, use UTF8String("") if you need
> >> to support non-ASCII chars.)
> >>
> >>
> >> Regards
> >>
> >> >
> >> >
> >> > > Your types have totally untyped fields – the compiler has to emit
> >> > > very pessimistic code about this. Rule of thumb: locations (fields,
> >> > > collections) should be as concretely typed as possible; parameters
> >> > > don't need to be.
> >> > >
> >> > > On Sat, May 14, 2016 at 1:36 PM, Ford Ox  wrote:
> >> > > > I have written exact same code in java and julia for reading
> >> > > > integers from file.
> >> > > > Julia code was A LOT slower. (12 seconds vs 1.16 seconds)
> >> > > >
> >> > > > import Base.isempty, Base.close
> >> > > >
> >> > > > ##Tokenizer ##
> >> > > >
> >> > > > type Tokenizer
> >> > > >     tokens
> >> > > >     index
> >> > > >     Tokenizer(s::AbstractString) = new(split(strip(s)), 0)
> >> > > > end
> >> > > >
> >> > > > isempty(t::Tokenizer) = length(t.tokens) == t.index
> >> > > >
> >> > > > function next!(t::Tokenizer)
> >> > > >     t.index += 1
> >> > > >     t.tokens[t.index]
> >> > > > end
> >> > > >
> >> > > > ## Buffer ##
> >> > > >
> >> > > > type Buffer
> >> > > >     stream
> >> > > >     tokenizer
> >> > > >     Buffer(stream) = new(stream, [])
> >> > > > end
> >> > > >
> >> > > > function next!(b::Buffer)
> >> > > >     if isempty(b.tokenizer)
> >> > > >         b.tokenizer = Tokenizer(readline(b.stream))
> >> > > >     end
> >> > > >     next!(b.tokenizer)
> >> > > > end
> >> > > >
> >> > > >

Re: [julia-users] Julia Utopia: Share your tips and tricks to efficient coding in Julia

2016-05-12 Thread Jacob Quinn

I'll take a stab here. For context, I've been coding in Julia since 2012,
contributed to Base and some data processing packages.

* I currently use Atom; it's come a long way since it started, both the IDE
itself and Julia support. It seems to have way more momentum in both
respects as well (vs. say, sublime). I don't use the inline evaluation very
much; not sure why, but I just haven't ever been a huge fan of that
workflow. I typically just copy paste into the terminal (iTerm2 on mac),
and enjoy using the *very* rich REPL features Julia provides (tab
completion, history search, symbols, etc.).

* For debugging code, I typically rely on good ole `println` or `@show`,
plus building things up incrementally (validating things as you build up).
I recently gave Gallium.jl a spin; for context, I've never coded in a
language with a "real debugger", so I'm not sure I'm the ideal candidate
here, but I found it somewhat difficult to navigate the right granularity
of stepping through code (should I go to the next line? next call? step
into it?). I'm 100% positive this is due to me needing to spend more time
with it and better learn "debugging" practices; note it's currently being
developed on master, so 0.5/nightlies.

* I don't do a lot of visualizing of my work, so no great advice/opinions
here.
* For day-to-day, here's a smattering of things I find myself reaching for
all the time:

  * modules: I rarely get very far developing new code without throwing it
in a module; this makes it much easier to develop interactively and
iteratively because you can just `reload("MyModule")` and you're ensured
all the types/methods get redefined.
  * @which: One of the handiest macros in Base to help you navigate Julia's
powerful dispatch system; use to ensure that the right methods are being
called. It also comes in very handy for just taking a quick peek at
somebody else's function and where it was defined if I want to take a
closer look.
  * @time and @profile: also very handy macros when you get into
performance tuning. Check out the Profiling section in the manual to learn
some of the ins and outs, but it becomes extremely helpful to get a feel
for where time is being spent in your code
  * @code_lowered, @code_typed, @code_llvm, @code_warntype: (there's also
@code_native if you're into the real nitty gritty). These are your best
friends! For greater context, search YouTube for Jeff Bezanson's JuliaCon
talk where he talks through the various levels of "compilation" in Julia;
it goes a long way to understanding exactly what's going on with your
code. @code_warntype is particularly helpful for spotting type
instabilities which can kill performance. It's also helpful to get a feel
for the actual machinery of your code
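
To make a few of those concrete, here's a toy sketch, assuming current Julia
(1.x), where these macros live in the InteractiveUtils standard library when
used outside the REPL; the `unstable`/`stable` functions are made up for
illustration:

```julia
using InteractiveUtils  # Julia 1.x: provides @which and @code_warntype in scripts

# Type-unstable: returns an Int or a Float64 depending on a runtime value.
unstable(x) = x > 0 ? 1 : 1.0

# Type-stable: always returns a Float64.
stable(x) = x > 0 ? 1.0 : -1.0

# Which method does this call dispatch to, and where is it defined?
println(@which stable(1))

# Elapsed time and allocations for one call.
@time sum(rand(10^6))

# Inferred types; the unstable version shows a Union return type.
@code_warntype unstable(1)
@code_warntype stable(1)
```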

Anyway, hopefully that sparks a few ideas.

-Jacob

On Thu, May 12, 2016 at 11:01 AM, David Parks 
wrote:

> I'm a few weeks into Julia and excited and motivated to learn and be as
> efficient as possible. I'm sure I'm not alone. I know my way around now,
> but am I as efficient as I can be?
>
> What haven't I tried? What haven't I seen? What haven't I asked?
>
> For those of you who have been around longer, could you share your advice
> on efficient day-to-day development style?
>
> For example:
>
>    - What IDE do you use? Are you using Atom? A combination of Atom and
>      the REPL? Something else?
>    - How do you debug complex code efficiently? How do you debug other
>      people's code efficiently?
>    - Do you have a favorite way of visualizing your work?
>    - Are there must-have tools? Packages? Utilities?
>    - Any simple day-to-day efficiency advice you could share with others
>      who didn't yet know to ask.
>
>
>

Re: [julia-users] Julia SQL

2016-05-08 Thread Jacob Quinn

Also check out the SQLite.jl package. It provides methods for reading CSV
files into an SQLite table and then running SQLite SQL commands on those
tables. You can then export the SQLite table to a CSV or Data.Table/DataFrame.

-Jacob
On May 8, 2016 4:32 AM, "Tero Frondelius"  wrote:

> Maybe this thread is relevant:
> https://groups.google.com/forum/m/#!topic/julia-users/QjxiCO-Lv-0

Re: [julia-users] Can't add PostgreSQL

2016-05-04 Thread Jacob Quinn

Hey Ross,

It actually sounds like you're still on the older release of ODBC (you can
check by running Pkg.installed()). To get the latest version (that matches
the current documentation), you'll have to run:

Pkg.update()

Then restart Julia to make sure you have the latest code.

-Jacob


On Tue, May 3, 2016 at 9:58 PM, Boylan, Ross <ross.boy...@ucsf.edu> wrote:

> I got it working before your update.  Yay!  Thank you!
>
> There were a couple of issues, mostly documentation, leaving aside getting
> ODBC setup outside of julia.
>
> The docs (the help at https://github.com/JuliaDB/ODBC.jl) say to use DSN
> to get things going.  As far as I can tell, it doesn't exist.  I used
> connect.
>
> The docs mention a connection string, but provide no pointers to its
> content.  It would be nice to have one; the odbc packages I installed were
> almost documentation free.
>
> The docs imply the results of a query need to be turned into a DataFrame,
> but it seems to come out as one.
>
> The source code suggested one could get a dataframe with the output
> argument.  I'm not really sure what that's supposed to be or do, but when I
> tried
> julia> raw = ODBC.query("SELECT isim, id, x, t, y, regular, censor, wait
> from obs2619 where isim=1", conn, output=df
> )
>
>  ERROR: TypeError: typeassert: expected Union{AbstractString,DataType},
> got Function
> I had not set df to anything before the call, AFAIK.
>
> Ross
> ------
> *From:* karbar...@gmail.com [karbar...@gmail.com] on behalf of Jacob
> Quinn [quinn.jac...@gmail.com]
> *Sent:* Tuesday, May 03, 2016 7:57 PM
> *To:* julia-users@googlegroups.com
> *Subject:* Re: [julia-users] Can't add PostgreSQL
>
> Hey Ross,
>
> Just confirmed that connecting to Postgres does indeed work; haven't seen
> any issues yet. I'm using the PostgreSQL ODBC Driver from their website
> with unixODBC on a mac. I'm using a connection string I built by following
> the guidelines here:
> http://www.connectionstrings.com/postgresql-odbc-driver-psqlodbc/
>
> Here's a little sample code (I have more testing all the various data
> types in Postgres, but I won't post the full snippet here):
>
> using ODBC
>
> dsn =
> ODBC.DSN("Driver={PostgreSQL};Server=[server];Port=5432;Database=testdb;Uid=username;Pwd=password")
>
> # Check some basic queries
> dbs = ODBC.query(dsn, "SELECT datname FROM pg_database WHERE datistemplate
> = false;")
> data = ODBC.query(dsn, "SELECT table_schema,table_name FROM
> information_schema.tables ORDER BY table_schema,table_name;")
>
>
> The latest ODBC release was just merged in METADATA.jl, so if you do
> Pkg.add("ODBC") you should get the latest release/code. If you already had
> it installed, just run Pkg.update() to get the latest release.
>
> Good luck!
>
> -Jacob
>
> On Tue, May 3, 2016 at 7:11 PM, Boylan, Ross <ross.boy...@ucsf.edu> wrote:
>
>> Attempting to post comments inline, even though my mailer is allergic.
>> Look for >>
>>
>>
>> From: karbar...@gmail.com [karbar...@gmail.com] on behalf of Jacob Quinn
>> [quinn.jac...@gmail.com]
>> Sent: Tuesday, May 03, 2016 5:47 PM
>>
>> I'm not sure I understand: I don't see DBI.jl nor PostgreSQL.jl at
>> pkg.julialang.org, which tells me they aren't officially registered as
>> packages. The difficulties you're seeing then aren't that surprising
>> because they haven't felt comfortable enough to officially register.
>>
>> >> You're right.  I got to them through https://github.com/JuliaDB
>>
>> ODBC actually works fine on Linux/OSX through the unixODBC library. It
>> can be installed through a variety of means, but works quite well from what
>> I've seen. Postgres is actually next on my list to test with ODBC (so far
>> the testing has focused
>>  on Teradata, SQL Server, and MySQL), so if I find time in the next few
>> days, I'll try to put together an example and share it here.
>> >>  The status page indicates errors with julia 0.4 and Linux.  Does it
>> matter?  Clicking on the build logs it appears the most recent build for
>> 0.4, 124.4 (https://travis-ci.org/JuliaDB/ODBC.jl) succeeded, so I'm not
>> sure why the problem is showing (also the previous error looks like a
>> problem with the configuration of the build machine rather than a problem
>> with the package).
>>
>> >> No joy with the alternate Postgres package;
>> https://github.com/NCarson/Postgres.jl/issues/3 has the details.
>>
>> >> So I guess I'll give ODBC a whirl.  Thanks  for your help.
>>
>
>

Re: [julia-users] Can't add PostgreSQL

2016-05-03 Thread Jacob Quinn

Hey Ross,

Just confirmed that connecting to Postgres does indeed work; haven't seen
any issues yet. I'm using the PostgreSQL ODBC Driver from their website
with unixODBC on a mac. I'm using a connection string I built by following
the guidelines here:
http://www.connectionstrings.com/postgresql-odbc-driver-psqlodbc/

Here's a little sample code (I have more testing all the various data types
in Postgres, but I won't post the full snippet here):

using ODBC

dsn =
ODBC.DSN("Driver={PostgreSQL};Server=[server];Port=5432;Database=testdb;Uid=username;Pwd=password")

# Check some basic queries
dbs = ODBC.query(dsn, "SELECT datname FROM pg_database WHERE datistemplate
= false;")
data = ODBC.query(dsn, "SELECT table_schema,table_name FROM
information_schema.tables ORDER BY table_schema,table_name;")


The latest ODBC release was just merged in METADATA.jl, so if you do
Pkg.add("ODBC") you should get the latest release/code. If you already had
it installed, just run Pkg.update() to get the latest release.

Good luck!

-Jacob

On Tue, May 3, 2016 at 7:11 PM, Boylan, Ross <ross.boy...@ucsf.edu> wrote:

> Attempting to post comments inline, even though my mailer is allergic.
> Look for >>
>
>
> From: karbar...@gmail.com [karbar...@gmail.com] on behalf of Jacob Quinn [
> quinn.jac...@gmail.com]
> Sent: Tuesday, May 03, 2016 5:47 PM
>
> I'm not sure I understand: I don't see DBI.jl nor PostgreSQL.jl at
> pkg.julialang.org, which tells me they aren't officially registered as
> packages. The difficulties you're seeing then aren't that surprising
> because they haven't felt comfortable enough to officially register.
>
> >> You're right.  I got to them through https://github.com/JuliaDB
>
> ODBC actually works fine on Linux/OSX through the unixODBC library. It can
> be installed through a variety of means, but works quite well from what
> I've seen. Postgres is actually next on my list to test with ODBC (so far
> the testing has focused
>  on Teradata, SQL Server, and MySQL), so if I find time in the next few
> days, I'll try to put together an example and share it here.
> >>  The status page indicates errors with julia 0.4 and Linux.  Does it
> matter?  Clicking on the build logs it appears the most recent build for
> 0.4, 124.4 (https://travis-ci.org/JuliaDB/ODBC.jl) succeeded, so I'm not
> sure why the problem is showing (also the previous error looks like a
> problem with the configuration of the build machine rather than a problem
> with the package).
>
> >> No joy with the alternate Postgres package;
> https://github.com/NCarson/Postgres.jl/issues/3 has the details.
>
> >> So I guess I'll give ODBC a whirl.  Thanks  for your help.
>

RE: [julia-users] Can't add PostgreSQL

2016-05-03 Thread Jacob Quinn

I'm not sure I understand: I don't see DBI.jl nor PostgreSQL.jl at
pkg.julialang.org, which tells me they aren't officially registered as
packages. The difficulties you're seeing then aren't that surprising
because they haven't felt comfortable enough to officially register.

ODBC actually works fine on Linux/OSX through the unixODBC library. It can
be installed through a variety of means, but works quite well from what
I've seen. Postgres is actually next on my list to test with ODBC (so far
the testing has focused on Teradata, SQL Server, and MySQL), so if I find
time in the next few days, I'll try to put together an example and share it
here.

-Jacob
On May 3, 2016 5:16 PM, "Boylan, Ross" <ross.boy...@ucsf.edu> wrote:

> That's odd.  After cloning the Postgres package, and then executing the
> script from the command line, it gets further:
> $ julia trouble.jl
> ERROR: LoadError: DBI API not fully implemented
>  in fetchdf at /home/ross/.julia/v0.4/DBI/src/DBI.jl:97
>  in include at ./boot.jl:261
>  in include_from_node1 at ./loading.jl:320
>  in process_options at ./client.jl:280
>  in _start at ./client.jl:378
> while loading /home/ross/PCORI/trouble.jl, in expression starting on line 6
>
> Here are the 3 additional lines:
> stmt = prepare(conn, "SELECT isim::int, id::int, x::int, t::double
> precision, y::int, regular::int, censor::int, wait::double precision from
> obs2619 where isim=1")
> execute(stmt)
> obs = fetchdf(stmt)
>
> So the fetchdf fails because it's not implemented.
> --
> *From:* Boylan, Ross
> *Sent:* Tuesday, May 03, 2016 4:06 PM
> *To:* julia-users@googlegroups.com
> *Subject:* RE: [julia-users] Can't add PostgreSQL
>
> Thank you, Jacob, for the pointer.  I'm further along but still not
> there.  I did the clone and then build.  I'm not sure if the build was
> necessary, but it took awhile.
>
> This process did not install DBI although it's listed as a requirement,
> and  the installer gave no warning about it.  When that didn't work I
> cloned DBI.
>
> I'm surprised that packages shown on http://pkg.julialang.org/ are not
> necessarily in https://github.com/JuliaLang/METADATA.jl.  Neither DBI nor
> PostgreSQL are there.  Does it mean anything if a  package is not in
> METADATA?
>
> After all that I got
>  julia> Pkg.clone("https://github.com/JuliaDB/DBI.jl.git")
>  INFO: Cloning DBI from https://github.com/JuliaDB/DBI.jl.git
>  INFO: Computing changes...
>
> !julia> include("/home/ross/PCORI/trouble.jl")
>
>  INFO: Precompiling module DataArrays...
>  INFO: Precompiling module DataFrames...
>  ERROR: LoadError: UndefVarError: Postgres not defined
>   in include at ./boot.jl:261
>   in include_from_node1 at ./loading.jl:320
>  while loading /home/ross/PCORI/trouble.jl, in expression starting on line
> 3
>
> trouble.jl begins
>
> using DBI
> using PostgreSQL
> conn = connect(Postgres, "localhost", "user", "word", "table")
>
> Ideas?
>
> Isn't ODBC just for MS Windows?
>
> There's also https://github.com/NCarson/Postgres.jl; an announcement said
> it didn't implement the full DBI spec.
>
> Ross
>
> --
> *From:* karbar...@gmail.com [karbar...@gmail.com] on behalf of Jacob
> Quinn [quinn.jac...@gmail.com]
> *Sent:* Tuesday, May 03, 2016 3:23 PM
> *To:* julia-users@googlegroups.com
> *Subject:* Re: [julia-users] Can't add PostgreSQL
>
> I don't think the PostgreSQL.jl package was ever officially registered.
> You could try Pkg.clone("https://github.com/JuliaDB/PostgreSQL.jl") to
> manually download/install it; you may need to do Pkg.build("PostgreSQL") as
> well after cloning. Alternatively, you might try the latest master of
> https://github.com/JuliaDB/ODBC.jl for interacting with databases.
>
> -Jacob
>
> On Tue, May 3, 2016 at 4:19 PM, Boylan, Ross <ross.boy...@ucsf.edu> wrote:
>
>> Am I misunderstanding how things work?  Is the package only available for
>> julia 0.3?
>> I have a julia obtained via git a few days ago, on the release-0.4 branch
>> and built locally.
>> Seeing https://github.com/JuliaDB/PostgreSQL.jl I tried, from within an
>> ESS session,
>> julia> Pkg.add("PostgreSQL")
>>  ERROR: unknown package PostgreSQL
>>   in error at ./error.jl:21
>>   [inlined code] from pkg/entry.jl:49
>>   in anonymous at task.jl:447
>>   in sync_end at ./task.jl:413
>>   [inlined code] from task.jl:422
>>   in add at pkg/entry.jl:64
>>   in add at pkg/entry.jl:73
>>   in anonymous at pkg/dir.jl:31
>>   in cd at file.jl:22
>>   in cd at pkg/dir.jl:31
>>   in add at pkg.jl:23
>>
>> To see if this was a spelling mistake:
>> julia> Pkg.add("PostgresSQL")
>>  ERROR: unknown package PostgresSQL
>> ...
>>
>> I was able to add the DataFrames package and dependencies.
>>
>> This is on a current Debian system, amd64 architecture.
>
>
>

Re: [julia-users] Can't add PostgreSQL

2016-05-03 Thread Jacob Quinn

I don't think the PostgreSQL.jl package was ever officially registered. You
could try Pkg.clone("https://github.com/JuliaDB/PostgreSQL.jl") to manually
download/install it; you may need to do Pkg.build("PostgreSQL") as well
after cloning. Alternatively, you might try the latest master of
https://github.com/JuliaDB/ODBC.jl for interacting with databases.

-Jacob

On Tue, May 3, 2016 at 4:19 PM, Boylan, Ross  wrote:

> Am I misunderstanding how things work?  Is the package only available for
> julia 0.3?
> I have a julia obtained via git a few days ago, on the release-0.4 branch
> and built locally.
> Seeing https://github.com/JuliaDB/PostgreSQL.jl I tried, from within an
> ESS session,
> julia> Pkg.add("PostgreSQL")
>  ERROR: unknown package PostgreSQL
>   in error at ./error.jl:21
>   [inlined code] from pkg/entry.jl:49
>   in anonymous at task.jl:447
>   in sync_end at ./task.jl:413
>   [inlined code] from task.jl:422
>   in add at pkg/entry.jl:64
>   in add at pkg/entry.jl:73
>   in anonymous at pkg/dir.jl:31
>   in cd at file.jl:22
>   in cd at pkg/dir.jl:31
>   in add at pkg.jl:23
>
> To see if this was a spelling mistake:
> julia> Pkg.add("PostgresSQL")
>  ERROR: unknown package PostgresSQL
> ...
>
> I was able to add the DataFrames package and dependencies.
>
> This is on a current Debian system, amd64 architecture.

[julia-users] [ANN] Major Breaking Release/Overhaul for ODBC.jl

2016-05-03 Thread Jacob Quinn

Hey everyone,

A new release is imminent for the ODBC.jl package (will be released later
today). As a quick history:


   - ODBC.jl was the first package I ever worked on with Julia (indeed, the 
   first commit is from January 2013)
   - A lot of refinements went into the package to make it basically 
   functional for the Julia 0.2/0.3 releases
   - Since an initial push on 0.3, much has languished as I've worked on 
   other projects, Julia-related and otherwise


Which brings us to today; a quick summary of the new release, which has 
been a long time coming:


   - Major overhaul of key types/methods to align with more common Julia
   idioms and, more foundationally, the DataStreams.jl framework
      - No more "global" ODBC connections that are saved under the hood; you
      now explicitly construct an `ODBC.DSN` type and pass that to all other
      functions
      - Querying is done through constructing an `ODBC.Source` and then
      `Data.stream!`ing it to any one of the currently supported `Data.Sink`
      types (CSV.Sink, SQLite.Sink, Data.Table)
      - Additionally, `ODBC.query` still exists for now; it constructs the
      ODBC.Source and streams it to a Sink in one step (a Data.Table by
      default)
      - Execute a query without returning results via `ODBC.execute(dsn,
      querystring)`
   - 0.4 and 0.5 compatibility (currently not tested on 0.3, but I could be
   persuaded to add compatibility if someone desperately needs it)
   - Better support across platforms, DBMS types, and DBMS data types
      - Currently tested against SQL Server, MySQL, Teradata, and SQLite for
      all data types they support
      - Tested on Windows, OSX, and Linux (where DBMS drivers are available)
      - Tested on 0.4 and 0.5
      - Will soon be additionally tested against PostgreSQL and Oracle DBMS
      as well
      - Native support for SQL_NUMERIC and SQL_DECIMAL types through the
      DecFP.jl package on OSX/Linux, with Windows support coming soon
      (pending the resolution of https://github.com/stevengj/DecFP.jl/issues/10)
   - Summary of Breaking Changes:
      - ODBC.query now requires a valid DSN as a 1st argument
      - ODBC.query no longer takes an "output" keyword argument; instead, a
      CSV.Sink can be passed as the 3rd argument to stream the results of a
      query out to a CSV file
      - The ODBC.Connection and ODBC.Metadata types no longer exist
      - query is no longer exported; all types/methods are internal to the
      module and usage now always includes the `ODBC.` prefix
   


So what should you do?

* Dive in! Feel free to report any bugs on the ODBC.jl GitHub repository
(https://github.com/JuliaDB/ODBC.jl)

* Port code over from older versions of ODBC (shouldn't be too bad really; 
it should all just work better after the port!)
* Alternatively, if you're currently relying on older versions of ODBC, you 
can run `Pkg.pin("ODBC", v"0.3.11")` to keep your code continuing to run on 
the latest stable ODBC release


Happy DBMS-ing!

-Jacob

Re: [julia-users] Cloning private package with 0.5

2016-04-24 Thread Jacob Quinn

Note that it's currently possible to clone private GitHub repos using
"personal access tokens" from Github. If you go to your profile/settings,
you can generate private access tokens that allow access to private Github
repos and then from Julia, you can clone with the following format:

Pkg.clone("https://username:to...@github.com/username/private_repo.jl")

On Fri, Apr 22, 2016 at 5:57 PM, Eric Forgy  wrote:

> I have the same problem (on Windows) and really hoping this gets fixed
> before 0.5 is released. I have a ton of private repos.
>
> I'd prefer a solution that lets me keep my remotes at 
> g...@github.com:EricForgy/MyPrivateRepo.jl.
> It would be more than a little annoying if I have to change every one to
> https. Hopefully, Pkg can do some magic and make the changes for me if
> needed.

Re: [julia-users] promotion vs. specialization

2016-04-21 Thread Jacob Quinn

There's also been discussion around doing this specifically for keyword
arguments, which if I remember has generally had pretty good reception. It
would make the following pattern much more friendly:

f(x; parameter_arg::Nullable{Int}=Nullable{Int}()) = ...

f(x; parameter_arg = 1)

(since the 1 would be auto-converted to Nullable{Int}(1))

-Jacob


On Thu, Apr 21, 2016 at 9:20 AM, Milan Bouchet-Valat 
wrote:

> Le jeudi 21 avril 2016 à 11:00 -0400, Yichao Yu a écrit :
> > On Thu, Apr 21, 2016 at 10:55 AM, Stefan Karpinski  wrote:
> > >
> > > This is probably more of a julia-dev topic, but my gut reaction is
> that the
> > > combination of multiple dispatch and implicit conversion would be
> chaos.
> > > Following method calls can be tricky enough (much easier with Gallium,
> > > however) with just dispatch in the mix. With implicit conversion too,
> it
> > > seems like it would be nearly impossible to know what might or might
> not be
> > > called. I think it would be too easy to accidentally invoke a method
> that
> > > wasn't intended.
> > I think the proposal was to add an automatic conversion on top of the
> > dispatch. so
> >
> > f(a::Integer as Int) = ... will be effectively translated to
> > f(_a::Integer) = (a = convert(Int, _a)::Int; ...)
> For reference, this recently came up in a PR regarding 'as':
> https://github.com/JuliaLang/julia/pull/15818#issuecomment-207922230
>
>
> Regards
>
>
> > > On Thu, Apr 21, 2016 at 9:05 AM, Didier Verna
> > > wrote:
> > > >
> > > >
> > > >
> > > >   This is just an idea from the top of my head, probably wild and
> maybe
> > > >   silly. I haven't given it any serious thought.
> > > >
> > > > Given the existence of the general promotion system (which I like a
> lot,
> > > > along with other things in Julia, such as the functor capabilities),
> I'm
> > > > wondering about automatic specialization.
> > > >
> > > > What I mean is this: suppose you have a type Foo which can be
> converted
> > > > to an Int. Suppose as well that you have a function bar that only
> works
> > > > on Ints. You cannot currently call bar with a Foo, but since Foo is
> > > > convertible to an Int, it could make sense that bar() suddenly
> becomes
> > > > an applicable method, with implicit conversion...
> > > >
> > > > --
> > > > ELS'16 registration open! http://www.european-lisp-symposium.org
> > > >
> > > > Lisp, Jazz, Aïkido: http://www.didierverna.info
> > >
>

Re: [julia-users] MethodError: '+' has no method matching +(::DateTime, ::Int64)

2016-04-04 Thread Jacob Quinn

Sorry, I should have been more clear.

I was trying to express that perhaps we should document these
previously-internal methods so that they are actually a part of the
official interface/exported. They're not unsafe or anything and people may
actually have a use for these, so maybe we should just document them to
make everything clearer.


On Mon, Apr 4, 2016 at 10:49 AM, Milan Bouchet-Valat <nalimi...@club.fr>
wrote:

> Le lundi 04 avril 2016 à 10:27 -0600, Jacob Quinn a écrit :
> > Hmmm... yeah, it's not ideal, I guess. Dates.day(::Integer) is
> > indeed an internal method that takes the # of Rata Die days (i.e. the
> > value of Int(Date(2015,1,1))) and returns the day of the month for
> > that Rata Die. It might be worth documenting so that it's more clear
> > what's going on when people search/help it.
> If these methods are internal, they shouldn't be added to exported
> functions. A useful convention is to prefix these with an underscore.
>
>
> Regards
>
>
> > -Jacob
> >
> > On Mon, Apr 4, 2016 at 10:02 AM, Josh Langsfeld <jdla...@gmail.com>
> > wrote:
> > > Shouldn’t Dates.day(1) be the MethodError here? It calls what
> > > appears to be an internal calculation method that happens to have
> > > the same name as the exported and documented method.
> > >
> > > On Monday, April 4, 2016 at 11:21:47 AM UTC-4, Jacob Quinn wrote:
> > >
> > > > Dates.day is the accessor function, returning the day of month
> > > > for a Date/DateTime.
> > > >
> > > > Dates.Day is the Period type representing the span of one day.
> > > >
> > > > So you'll want something like:
> > > >
> > > > now() + Dates.Day(1)
> > > >
> > > > -Jacob
> > > >
> > > >
> > > > On Mon, Apr 4, 2016 at 5:48 AM, Josh <josh...@gmail.com> wrote:
> > > > > When trying to increment and decrement dates I get the method
> > > > > error stated above.
> > > > >
> > > > > For example with: now() + Dates.day(1)
> > > > >
> > > > > I get the error:
> > > > >
> > > > > ERROR: MethodError: `+` has no method matching +(::DateTime,
> > > > > ::Int64)
> > > > > Closest candidates are:
> > > > >   +(::Any, ::Any, ::Any, ::Any...)
> > > > >   +(::Int64, ::Int64)
> > > > >   +(::Complex{Bool}, ::Real)
> > > > >   ...
> > > > >
> > > > > But with doing something like this: Date(2015,12,25) - today()
> > > > >
> > > > > I get the correct result with no error.
> > > > >
> > > > > Any ideas?
> > > > > Thanks
> > > > >
> > > > >
> > > >
>

Re: [julia-users] MethodError: '+' has no method matching +(::DateTime, ::Int64)

2016-04-04 Thread Jacob Quinn

Hmmm... yeah, it's not ideal, I guess. Dates.day(::Integer) is indeed an
internal method that takes the # of Rata Die days (i.e. the value of
Int(Date(2015,1,1))) and returns the day of the month for that Rata Die. It
might be worth documenting so that it's more clear what's going on when
people search/help it.

-Jacob

On Mon, Apr 4, 2016 at 10:02 AM, Josh Langsfeld <jdla...@gmail.com> wrote:

> Shouldn’t Dates.day(1) be the MethodError here? It calls what appears to
> be an internal calculation method that happens to have the same name as the
> exported and documented method.
>
> On Monday, April 4, 2016 at 11:21:47 AM UTC-4, Jacob Quinn wrote:
>
> Dates.day is the accessor function, returning the day of month for a
>> Date/DateTime.
>>
>> Dates.Day is the Period type representing the span of one day.
>>
>> So you'll want something like:
>>
>> now() + Dates.Day(1)
>>
>> -Jacob
>>
>>
>> On Mon, Apr 4, 2016 at 5:48 AM, Josh <josh...@gmail.com> wrote:
>>
>>> When trying to increment and decrement dates I get the method error
>>> stated above.
>>>
>>> For example with: now() + Dates.day(1)
>>>
>>> I get the error:
>>>
>>> ERROR: MethodError: `+` has no method matching +(::DateTime, ::Int64)
>>> Closest candidates are:
>>>   +(::Any, ::Any, ::Any, ::Any...)
>>>   +(::Int64, ::Int64)
>>>   +(::Complex{Bool}, ::Real)
>>>   ...
>>>
>>> But with doing something like this: Date(2015,12,25) - today()
>>>
>>> I get the correct result with no error.
>>>
>>> Any ideas?
>>> Thanks
>>>
>>>
>> 
>

Re: [julia-users] MethodError: '+' has no method matching +(::DateTime, ::Int64)

2016-04-04 Thread Jacob Quinn

Dates.day is the accessor function, returning the day of month for a
Date/DateTime.

Dates.Day is the Period type representing the span of one day.

So you'll want something like:

now() + Dates.Day(1)
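
A minimal sketch of the distinction, assuming a current Julia release where
Dates is a standard library loaded with `using Dates` (on 0.4 it lived in
Base). The specific dates are only illustrative:

```julia
using Dates

d = Date(2016, 4, 4)

# Dates.Day (capital D) is a Period; adding it shifts the date.
tomorrow = d + Dates.Day(1)

# Dates.day (lowercase d) is an accessor; it returns the day of month.
dom = Dates.day(d)

# Subtracting two Dates yields a Period, which is why Date - Date works.
span = Date(2015, 12, 25) - Date(2015, 12, 20)
```

The same pattern applies to the other periods (Dates.Hour, Dates.Month, and
so on): the capitalized name constructs a span, the lowercase name reads a
field.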

-Jacob


On Mon, Apr 4, 2016 at 5:48 AM, Josh  wrote:

> When trying to increment and decrement dates I get the method error stated
> above.
>
> For example with: now() + Dates.day(1)
>
> I get the error:
>
> ERROR: MethodError: `+` has no method matching +(::DateTime, ::Int64)
> Closest candidates are:
>   +(::Any, ::Any, ::Any, ::Any...)
>   +(::Int64, ::Int64)
>   +(::Complex{Bool}, ::Real)
>   ...
>
> But with doing something like this: Date(2015,12,25) - today()
>
> I get the correct result with no error.
>
> Any ideas?
> Thanks
>
>

Re: [julia-users] DataFrame from string

2016-03-21 Thread Jacob Quinn

You should be able to wrap the string in an IOBuffer, which satisfies the
general IO interface.

e.g.

io = IOBuffer(csv)
readtable(io)
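
To show that the wrapped string really behaves like any other IO stream,
here is a small sketch that uses only Base (no DataFrames, so `readtable`
itself is not called). The variable names and the manual parsing are just
for illustration:

```julia
csv = """
1, 7.6
2, 45.6
3, 12.1
"""

# Wrap the string so it can be consumed through the generic IO interface.
io = IOBuffer(csv)

# Read it line by line, exactly as one would read from an open file.
rows = [parse.(Float64, strip.(split(line, ','))) for line in eachline(io)]
```

Any function written against IO (rather than against a file path) can be
fed the same `io` object.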

-Jacob

On Mon, Mar 21, 2016 at 9:28 AM, jw3126  wrote:

> I have a string which is secretly a csv like
> """
> 1, 7.6
> 2, 45.6
> 3, 12.1
> ...
> """
>
> I want to turn it into a data frame. I guess I have to use readtable
> ,
> however readtable accepts only IOStreams (or strings that are filepaths)
> and I don't know how to feed the string in a sane way to it.
>

Re: [julia-users] Inverse of `bits`?

2016-02-16 Thread Jacob Quinn

There is the parse(*type*, *str*[, *base*]) function that will parse
*integers* of a given base (link to doc:
http://docs.julialang.org/en/release-0.4/stdlib/numbers/?highlight=binary#Base.parse),
but floats are always parsed as decimal. But using reinterpret, you could do

julia> str = bits(1.0f0)
"00111111100000000000000000000000"

julia> int = parse(Int32,str,2)
1065353216

julia> reinterpret(Float32,int)
1.0f0
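
Wrapped up as the `frombits` helper from the question, this could look
roughly like the sketch below. Two assumptions: it targets a current Julia,
where `bits` has been renamed to `bitstring` and the base is passed as a
keyword, and it parses into UInt32 rather than Int32 so that bit patterns
with the sign bit set (negative floats) don't overflow. `frombits` is not a
Base function, just the name used in the question:

```julia
# Hypothetical helper: rebuild a Float32 from its 32-character bit string.
function frombits(::Type{Float32}, s::AbstractString)
    raw = parse(UInt32, s; base = 2)   # bit pattern as an unsigned integer
    return reinterpret(Float32, raw)   # same bits, viewed as a Float32
end

str = bitstring(1.0f0)
x = frombits(Float32, str)

# Round trip for a negative value, where the leading (sign) bit is 1.
y = frombits(Float32, bitstring(-2.5f0))
```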

-Jacob

On Tue, Feb 16, 2016 at 2:07 PM, Sheehan Olver 
wrote:

>
>
> I'm lecturing a course that will include basic machine arithmetic, and the
> bits function looks very helpful.  But is there a way to go back?  I.e., I
> want a frombits to do:
>
>
> str = bits(1.0f0)       # returns "00111111100000000000000000000000"
> frombits(Float32, str)  # should return 1.0f0
>
>
>

Re: [julia-users] Re: Nullable{Date}

2016-02-08 Thread Jacob Quinn

The problem with that proposition is that it introduces type instability.
i.e. the user would be tempted to write code like Michael's original
example like

LeapDay(yr) = isLeapYr(yr) ? Date(yr,2,29) : Nullable{Date}()

where the function `LeapDay` can actually return two different, distinct
types: `Date` or `Nullable{Date}`. Those familiar with efficient codegen
know that these kinds of type instabilities kill code performance.

-Jacob

On Mon, Feb 8, 2016 at 8:55 PM, Christopher Alexander 
wrote:

> I really like that construction!
>
>
> On Monday, February 8, 2016 at 10:49:41 PM UTC-5, Greg Plowman wrote:
>>
>> If only Nullables can be null, could we formally define this?
>>
>> isnull(x::Nullable) = x.isnull # already defined in nullable.jl
>> isnull(x) = false  # extra definition for everything else
>>
>>
>>
>>> isnull( lp15 ) --> true
>>> isnull( lp16 ) -->  MethodError: `isnull` has no method matching isnull(
>>> ::Date )
>>>
>>
>>

Re: [julia-users] load a Julia dataframe from Microsoft SQL Server table

2016-02-04 Thread Jacob Quinn

That's a big part of the "remodel" I've been working on to always call the
"W" version of the functions and use UTF8 consistently (see here:
https://github.com/JuliaDB/ODBC.jl/blob/jq/remodel/src/API.jl). I would
certainly welcome those willing to test various configurations/setups.

-Jacob

On Thu, Feb 4, 2016 at 2:28 PM, Stefan Karpinski 
wrote:

> On Thu, Feb 4, 2016 at 1:50 PM, Scott Jones 
> wrote:
>
>>
>> This still doesn't explain why some drivers are accepting UCS-2/UTF-16
>>> when called with the non-Unicode API.
>>>
>>
>> When you do so, are you actually calling the functions with the A, or
>> just the macro without either A or W?
>> The macro will compile to either the A or the W form, depending on how
>> your application is built.
>>
>> This is a better page in MSDN:
>> https://msdn.microsoft.com/en-us/library/ms712612(v=vs.85).aspx describing
>> what is going on.
>>
>
> The ODBC package calls the functions without A or W. What it's calling
> can't be a macro since macros aren't callable via ccall. But changing ODBC
> to call the W version of everything may be the fix here.
>

Re: [julia-users] How to convert Int64 to Date

2016-01-23 Thread Jacob Quinn

The easiest way right now is to do

Date(Dates.UTD(735685))
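
For context, the integer here is a Rata Die day count (day 1 is
0001-01-01). A sketch of the round trip, assuming a current Julia with
`using Dates`; it relies only on `Dates.value` and `Dates.rata2datetime`,
and the example date is the one from the question:

```julia
using Dates

# Date -> integer: the underlying Rata Die day number.
rd = Dates.value(Date(2016, 1, 22))

# Integer -> Date, via the Rata Die conversion helper.
d1 = Date(Dates.rata2datetime(rd))

# The same thing spelled out with plain date arithmetic.
d2 = Date(1, 1, 1) + Dates.Day(rd - 1)
```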

It's a bit awkward, and I think we should have a better default option.
Perhaps we should define `convert(::Type{Date}, x::Int64)` for this.

On Sat, Jan 23, 2016 at 9:36 AM, Min-Woong Sohn  wrote:

> I want to convert an Int64 value 735685 to Date type (2016-1-22).
> convert(Date,735685) does not work.  Does anyone know how to do this?
>
>

Re: [julia-users] How to convert Int64 to Date

2016-01-23 Thread Jacob Quinn

I've created a pull request to clarify the usage here and add documentation.

https://github.com/JuliaLang/julia/pull/14775

-Jacob

On Sat, Jan 23, 2016 at 9:36 AM, Min-Woong Sohn  wrote:

> I want to convert an Int64 value 735685 to Date type (2016-1-22).
> convert(Date,735685) does not work.  Does anyone know how to do this?
>
>

Re: [julia-users] Get the OS

2015-12-08 Thread Jacob Quinn

See here:
http://docs.julialang.org/en/latest/stdlib/base/?highlight=windows#Base.@windows

There are also the

@windows_only ...code...
@unix_only ...code
@osx_only ...code
@linux_only code...

Or the variable name `OS_NAME` that will return the OS name as a symbol.

Plenty of options :)
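
For the /dev/null case specifically, a sketch along these lines should do.
Note the assumption: it uses `Sys.iswindows()`, which is how current Julia
releases spell this check, rather than the 0.4-era macros above; `nullpath`
is just an illustrative name:

```julia
# Pick the platform's null device path.
nullpath = Sys.iswindows() ? "NUL" : "/dev/null"

# Writing to it discards the output.
open(nullpath, "w") do io
    println(io, "this text goes nowhere")
end
```

Current Julia also ships a ready-made `devnull` stream in Base, which avoids
the path check entirely when all that's needed is a sink for output.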

On Tue, Dec 8, 2015 at 1:46 PM, Ben Ward  wrote:

> Hi,
>
> How can I check the OS in Julia? I thought it may be in the Getting Around
> section of the reference manual but it's not. Basically I'd like to check
> the OS to see if I should write to /dev/null or just NUL in the case of
> windows.
>
> Thanks,
> Ben.
>

Re: [julia-users] using a JSON REST API from Julia

2015-11-14 Thread Jacob Quinn

You got it. Requests.jl is the current standard, unless you felt more
comfortable calling a python library with PyCall. Requests.jl has gotten a
lot of love lately from Jon Malmaud and is 0.4 ready.

-Jacob

On Sat, Nov 14, 2015 at 7:10 AM, Christof Stocker <
stocker.chris...@gmail.com> wrote:

> What is the current best practice to handle JSON and to use a REST API
> from Julia?
>
> From preliminary search I would assume JuliaLang/JSON.jl and
> JuliaWeb/Requests.jl, but I am unsure
>
> Does anyone have any suggestions on what to look at?
>

Re: [julia-users] Enums in Julia

2015-10-30 Thread Jacob Quinn

Eric,

Currently in Julia, officially supported Enums can only take on integer
values. This excludes the ability to use an arbitrary type as values for
enum members. You could, however, use enums as a part of a solution:

@enum Country Brazil China Canada USA etc.

immutable CountryData
    field1::Int
    field2::ASCIIString
end

const COUNTRYDATA = [CountryData(1,"hey"), CountryData(2,"there"), ...]

Then usage would be along the lines of

function getfield1(x::Country)
    # enum values start at 0, arrays are 1-indexed
    return COUNTRYDATA[Int(x) + 1].field1
end
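
A variation on the same idea, sketched for a current Julia release (so
`struct` and `String` instead of `immutable` and `ASCIIString`): keying a
Dict by the enum members avoids any dependence on the integer values. The
`countrydata` helper and the placeholder field values are made up for
illustration:

```julia
@enum Country Brazil China Canada USA

struct CountryData
    field1::Int
    field2::String
end

# One record per enum member; only two are filled in here.
const COUNTRYDATA = Dict(
    Brazil => CountryData(1, "hey"),
    China  => CountryData(2, "there"),
)

# Look up the record for a country, then read whichever field is needed.
countrydata(c::Country) = COUNTRYDATA[c]
getfield1(c::Country) = countrydata(c).field1
```

Usage is then `getfield1(China)` or `countrydata(Brazil).field2`; the enum
stays a plain integer-backed type and the data lives alongside it.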



On Fri, Oct 30, 2015 at 4:41 PM, Eric Forgy  wrote:

> Hi Mauro,
>
> Thank you for your response and sorry I did not ask very clearly. Let me
> try again.
>
> I am considering creating a composite immutable type. I know there will
> always only be a finite number of them, e.g. Country, and I'd like to just
> create them in the beginning of my code. Each Country will have a set of
> fields I need.
>
> I've never used an enumerable type before, but they seem handy when you
> know there is only going to be a finite number of them like Weekdays or
> something.
>
> Chronologically, I created my composite immutable type, Country, with
> fields I need for each country and then had the idea to make Country an
> Enum.
>
> So my question is: Can I make Country an Enum and still allow each country
> to have its own data like a composite immutable type?
>
> For example,
>
> @enum Country countryA countryB
>
> And then be able to access data like
>
> countryA.population
>
> ?
>
> Sorry again for not asking a clear question. As often happens, if I
> understood well enough the language and what I'm try to do with it so that
> I could ask a clear question, then, ironically, I probably would not need
> to ask the question :)
>
> Sent from my iPhone
>
> > On 30 Oct 2015, at 9:13 PM, Mauro  wrote:
> >
> > I don't quite understand what you fail to achieve.  Using other values
> > than integers for the enum does not work, if that is the question.
> >
> > If you just try to make your custom enum, then it cannot be used with
> > the @enum macro.
> >
> > immutable MyEnum
> >     field1::Type1
> >     field2::Type2
> > end
> >
> > # no need/use for @enum:
> > enum1 = MyEnum(f11,f12)
> > enum2 = MyEnum(f21,f22)
> >
> > But it seems a bit strange to have an enumeration which uses two values,
> > as an enumeration suggests that there is a mapping to the integers!
> >
> > Mauro
> >
> >> On Fri, 2015-10-30 at 09:37, Eric Forgy  wrote:
> >> I am thinking about making an Enum type MyEnum, but each MyEnum is a
> >> composite immutable type. Is that possible (recommended) and how could
> I do
> >> that?
> >>
> >> I've looked at
> >>
> >>   - https://github.com/JuliaLang/julia/pull/10168
> >>   - And the @enum section of Docs
> >>
> >> but it still isn't obvious to me yet how to do it or if it is even
> >> possible/recommended.
> >>
> >> In psuedo-code, I'd like something like this:
> >>
> >> @enum MyEnum enum1 enum2 enum3
> >>
> >> but each enum[i] is a composite immutable type. How would I construct
> them?
> >>
> >> I thought of something like:
> >>
> >> immutable MyEnum
> >>     field1::Type1
> >>     field2::Type2
> >> end
> >>
> >> @enum MyEnum enum1 = MyEnum(f11,f12) enum2 = MyEnum(f21,f22)
> >>
> >> but I don't think that will work.
> >>
> >> Any ideas?
> >>
> >> Thank you.
> >>
> >>
> >>> On Tuesday, January 27, 2015 at 4:55:18 AM UTC+8, Reid Atcheson wrote:
> >>>
> >>> Ok now that you have put it there, the comments in the documentation
> make
> >>> more sense to me. It looks like both yours and mine are essentially
> >>> equivalent, but yours is simpler. I was aiming for the following
> behavior
> >>> with my implementation:
> >>>
> >>> - different enum types won't typecheck (can't do "if e1::EnumType1 ==
> >>> e2::EnumType2" without error).
> >>> - Array of enum type values contiguous in memory, for easy passing to
> C.
> >>> (immutable should do this)
> >>> - referring to enum fields by name, not by their numbering.
> >>>
> >>> I will switch to what you have written, it looks like it hits all of my
> >>> points while not simultaneously abusing the type system.
> >>>
> >>>> On Monday, January 26, 2015 at 2:16:33 PM UTC-6, Mauro wrote:
> >>>>
> >>>> There is a FAQ entry on this which suggests not to use types for each
> >>>> of the elements of the enum (if I recall correctly).
> >>>>
> >>>> I recently did an enum like this:
> >>>>
> >>>> export nonstiff, mildlystiff, stiff
> >>>> abstract Enum
> >>>> immutable Stiff <: Enum
> >>>>     val::Int
> >>>>     function Stiff(i::Integer)
> >>>>         @assert 0<=i<=2
> >>>>         new(i)
> >>>>     end
> >>>> end
> >>>> const nonstiff = Stiff(0)
> >>>> const mildlystiff = Stiff(1)
> >>>> const stiff = Stiff(2)
> >>>>
> >>>> Then you can just do
> >>>> if someflag==nonstiff
> >>>>     do_something
> >>>> end
> >>>>
> >>>> So no need either to refer to the numeral value.  But I'm not sure
> >>>> whether this is better or not.
> 
> >

Re: [julia-users] Re: [ANN] DataStreams.jl, CSV.jl, SQLite.jl New Releases

2015-10-27 Thread Jacob Quinn

\libjulia.dll
> (unknown line)
> anonymous at none:2
> jl_eval_with_compiler_p at C:\Program Files\Juno\resources\app\julia\bin\libjulia.dll (unknown line)
> jl_f_tuple at C:\Program Files\Juno\resources\app\julia\bin\libjulia.dll (unknown line)
> process_options at client.jl:284
> _start at client.jl:411
> jlcall__start_1845 at  (unknown line)
> jl_apply_generic at C:\Program Files\Juno\resources\app\julia\bin\libjulia.dll (unknown line)
> unknown function (ip: 004018D0)
> unknown function (ip: 0040282B)
> unknown function (ip: 0040140C)
> unknown function (ip: 0040153B)
> BaseThreadInitThunk at C:\Windows\system32\kernel32.dll (unknown line)
> RtlUserThreadStart at C:\Windows\SYSTEM32\ntdll.dll (unknown line)
> ERROR: LoadError: Failed to precompile DataStreams to C:\Users\workstation\.julia\lib\v0.4\DataStreams.ji
>  in error at error.jl:21
> while loading C:\Users\workstation\.julia\SQLite\src\SQLite.jl, in expression starting on line 1
> ERROR: Failed to precompile SQLite to C:\Users\workstation\.julia\lib\v0.4\SQLite.ji
>  in error at error.jl:21
>
> julia>
>
>
> Am Dienstag, 27. Oktober 2015 06:25:50 UTC+1 schrieb Jacob Quinn:
>>
>> Hey everyone,
>>
>> I know it's been mentioned here and there, but now it's official: two new
>> packages have been officially released for 0.4, DataStreams.jl and CSV.jl.
>> SQLite.jl has also gone through a big overhaul to modernize the code and
>> rework the data processing interface.
>>
>> DataStreams.jl is a new package with a lofty goal and not a lot of code.
>> It aims to put forth a data ingestion/processing framework that can be used
>> by all types of data-reader/ingestion/source/sink/writer type packages. The
>> basic idea is that for a type of data source, defining a `Source` and
>> `Sink` types, and then implementing the various combinations of
>> `Data.stream!(::Source, ::Sink)` methods that make sense. For example,
>> CSV.jl and SQLite.jl now both have `Source` and `Sink` types, and I've
>> simply defined the following methods between the two packages:
>>
>> Data.stream!(source::CSV.Source, sink::SQLite.Sink)  =>  parse a CSV file
>> represented by `source` directly into the SQLite table represented by `sink`
>> Data.stream!(source::SQLite.Source, sink::CSV.Sink)  =>  fetch the SQLite
>> table represented by `source` directly out to a CSV file represented by
>> `sink`
>>
>> The DataStreams.jl package also defines a `Data.Table` type which is
>> simply:
>>
>> type Table{T}
>>     schema::Data.Schema
>>     data::T
>> end
>>
>> this is meant as a "backend-agnostic" kind of type that represents an
>> in-memory Julia structure. Currently the default constructors put a
>> `Vector{NullableVector}` as the `.data` field, but it could really be
>> anything you wanted (e.g. DataFrame, Matrix, etc.). The aim of `Data.Table`
>> certainly isn't to replace something like DataFrames, but rather to act as
>> a default "pure julia type" with the DataStreams.jl framework. Indeed, to
>> do a non-copying convert of a `Data.Table` to a `DataFrame` is just:
>> `DataFrame(dt::Data.Table)`.
>>
>> You can see more details in the blog post I wrote up here:
>> http://julialang.org/blog/2015/10/datastreams/
>>
>> A big thanks to a number of people as well who have helped encourage and
>> develop these packages with me. I truly love the community and caliber of
>> people around here and just want to say thanks.
>>
>> DataStreams.jl: https://github.com/JuliaDB/DataStreams.jl
>> CSV.jl: https://github.com/JuliaDB/CSV.jl
>> SQLite.jl: https://github.com/JuliaDB/SQLite.jl
>>
>> -Jacob
>>
>

Re: [julia-users] 900mb csv loading in Julia failed: memory comparison vs python pandas and R

2015-10-26 Thread Jacob Quinn

Just a quick follow-up here: after some benchmarking of my own on a windows
machine, the culprit ended up being a deathly slow `strtod` system library
function on windows. It takes jumping through a few hoops to get the
performance right, which I discovered is already done in Base Julia; it just
wasn't exported.

My PR to Base Julia <https://github.com/JuliaLang/julia/pull/13641> has
been accepted and is backport pending, so once Julia 0.4.1 is released,
CSV.jl will be updated to use the new code and will require that version of
Julia to enable similar great performance cross-platform.

-Jacob

On Wed, Oct 14, 2015 at 3:51 AM, bernhard <kafis...@gmail.com> wrote:

> with readtable the julia process goes up to 6.3 GB and stays there. It
> takes 95 seconds. (@time shows "373M, allocations: 13GB, 7% GC time")
> I will try Jacob's approach again.
>
>
> Am Mittwoch, 14. Oktober 2015 10:59:06 UTC+2 schrieb Milan Bouchet-Valat:
>>
>> Le mercredi 14 octobre 2015 à 00:15 -0700, Grey Marsh a écrit :
>> > Done with the testing in the cloud instance.
>> > It works and the timings in my case
>> >
>> > 58.346345 seconds (694.00 M allocations: 12.775 GB, 2.63% gc time)
>> >
>> > result of "top" command:  VIRT: 11.651g RES: 3.579g
>> >
>> > ~13gb memory for a 900mb file!
>> > Thanks to Jacob, at least I was able to check that the process works.
>> As Yichao noted, at no point in the import did Julia use 13GB of RAM.
>> That's the total amount of memory that was allocated and freed by
>> pieces (694M of them). You'd need to watch the Julia process while
>> working to see what's the maximum value of RES when importing.
>>
>>
>> Regards
>>
>> > On Wednesday, October 14, 2015 at 12:10:02 PM UTC+5:30, bernhard
>> > wrote:
>> > > Jacob
>> > >
>> > > I do run into the same issue as Grey. the step
>> > > ds = DataStreams.DataTable(f);
>> > > gets stuck.
>> > > I also tried this with a smaller file (150MB) which I have. This
>> > > file is read by readtable in 15s. But the DataTable function
>> > > freezes. I use 0.4 on Windows 7.
>> > >
>> > > I note that your code did work on a tiny file though (40 lines or
>> > > so).
>> > > I do get a dataframe, but when I show it (by simply typing df, or
>> > > dump(df)) Julia crashes...
>> > >
>> > > Bernhard
>> > >
>> > >
>> > > Am Mittwoch, 14. Oktober 2015 06:54:16 UTC+2 schrieb Grey Marsh:
>> > > > I am using Julia 0.4 for this purpose, if that's what is meant by
>> > > > "0.4 only".
>> > > >
>> > > > On Wednesday, October 14, 2015 at 9:53:09 AM UTC+5:30, Jacob
>> > > > Quinn wrote:
>> > > > > Oh yes, I forgot to mention that the CSV/DataStreams code is
>> > > > > 0.4 only. Definitely interested to hear about any
>> > > > > results/experiences though.
>> > > > >
>> > > > > -Jacob
>> > > > >
>> > > > > On Tue, Oct 13, 2015 at 10:11 PM, Yichao Yu <yyc...@gmail.com>
>> > > > > wrote:
>> > > > > > On Wed, Oct 14, 2015 at 12:02 AM, Grey Marsh <
>> > > > > > kd.k...@gmail.com> wrote:
>> > > > > > > @Jacob, I tried your approach. Somehow it got stuck in the
>> > > > > > "@time ds =
>> > > > > > > DataStreams.DataTable(f)" line. After 15 minutes running,
>> > > > > > julia is using
>> > > > > > > ~500mb and 1 cpu core with no sign of end. The memory use
>> > > > > > has been almost
>> > > > > > > same for the whole duration of 15 minutes. I'm letting it
>> > > > > > run, hoping that
>> > > > > > > it finishes after some time.
>> > > > > > >
>> > > > > > > From your run, I can see it needs 12gb memory which is
>> > > > > > higher than my
>> > > > > > > machine memory of 8gb. could it be the problem?
>> > > > > >
>> > > > > > 12GB is the total amount of memory ever allocated during the
>> > > > > > timing. A lot of it might be intermediate results that are
>> > > > > > freed by the GC.
>> > > > > > Also, from the output of @time, it looks like 0.4.
>> > > > > >

[julia-users] [ANN] DataStreams.jl, CSV.jl, SQLite.jl New Releases

2015-10-26 Thread Jacob Quinn

Hey everyone,

I know it's been mentioned here and there, but now it's official: two new 
packages have been officially released for 0.4, DataStreams.jl and CSV.jl. 
SQLite.jl has also gone through a big overhaul to modernize the code and 
rework the data processing interface.

DataStreams.jl is a new package with a lofty goal and not a lot of code. It 
aims to put forth a data ingestion/processing framework that can be used by 
all kinds of data reader/ingestion/source/sink/writer packages. The basic 
idea is that, for a given kind of data source, a package defines `Source` 
and `Sink` types and then implements whichever combinations of 
`Data.stream!(::Source, ::Sink)` methods make sense. For example, CSV.jl 
and SQLite.jl now both have `Source` and `Sink` types, and I've simply 
defined the following methods between the two packages:

Data.stream!(source::CSV.Source, sink::SQLite.Sink)  =>  parse a CSV file 
represented by `source` directly into the SQLite table represented by `sink`
Data.stream!(source::SQLite.Source, sink::CSV.Sink)  =>  fetch the SQLite 
table represented by `source` directly out to a CSV file represented by 
`sink`
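
To make the dispatch pattern concrete, here is a toy sketch in current Julia (1.x) syntax. The names `ToySource`, `CopySink`, `CountSink`, and this `stream!` function are hypothetical and invented for illustration; they are not part of DataStreams.jl, CSV.jl, or SQLite.jl. The only point is that one method per (source, sink) pair gets selected by multiple dispatch.

```julia
# Hypothetical stand-ins for a package's Source and Sink types.
struct ToySource
    rows::Vector{Vector{Int}}
end

struct CopySink
    rows::Vector{Vector{Int}}
end
CopySink() = CopySink(Vector{Vector{Int}}())

mutable struct CountSink
    n::Int
end
CountSink() = CountSink(0)

# One method per (source, sink) combination that makes sense.
function stream!(source::ToySource, sink::CopySink)
    for row in source.rows
        push!(sink.rows, copy(row))   # copy each row into the sink
    end
    return sink
end

function stream!(source::ToySource, sink::CountSink)
    sink.n += length(source.rows)     # only count rows, keep no data
    return sink
end

src = ToySource([[1, 2], [3, 4]])
copied = stream!(src, CopySink())
counted = stream!(src, CountSink())
```

The caller writes the same `stream!(src, sink)` call in both cases; the sink's type alone decides what happens to the data.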

The DataStreams.jl package also defines a `Data.Table` type which is simply:

type Table{T}
    schema::Data.Schema
    data::T
end

This is meant as a "backend-agnostic" kind of type that represents an 
in-memory Julia structure. Currently the default constructors put a 
`Vector{NullableVector}` in the `.data` field, but it could really be 
anything you wanted (e.g. DataFrame, Matrix, etc.). The aim of `Data.Table` 
certainly isn't to replace something like DataFrames, but rather to act as 
a default "pure Julia type" within the DataStreams.jl framework. Indeed, a 
non-copying conversion of a `Data.Table` to a `DataFrame` is just: 
`DataFrame(dt::Data.Table)`.
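
As a rough illustration of what "backend-agnostic" means here, the sketch below uses current Julia (1.x) `struct` syntax. `ToySchema` and `ToyTable` are hypothetical names and are not the DataStreams.jl definitions; they only mirror the shape of a schema plus a type parameter for the data.

```julia
# Hypothetical sketch; not the DataStreams.jl definitions.
struct ToySchema
    header::Vector{String}
    types::Vector{DataType}
end

struct ToyTable{T}
    schema::ToySchema
    data::T          # any backend: vector of columns, matrix, ...
end

schema = ToySchema(["id", "score"], [Int, Float64])

# Same wrapper, two different in-memory backends.
as_columns = ToyTable(schema, Any[[1, 2, 3], [0.5, 1.5, 2.5]])
as_matrix  = ToyTable(schema, [1.0 0.5; 2.0 1.5; 3.0 2.5])
```

Because the backend is a type parameter, code that only needs the schema can accept any `ToyTable`, while code that needs a particular layout can dispatch on `ToyTable{<:Matrix}` and so on.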

You can see more details in the blog post I wrote up 
here: http://julialang.org/blog/2015/10/datastreams/

A big thanks to a number of people as well who have helped encourage and 
develop these packages with me. I truly love the community and caliber of 
people around here and just want to say thanks.

DataStreams.jl: https://github.com/JuliaDB/DataStreams.jl
CSV.jl: https://github.com/JuliaDB/CSV.jl
SQLite.jl: https://github.com/JuliaDB/SQLite.jl

-Jacob

Re: [julia-users] What does sizehint! do, exactly?

2015-10-21 Thread Jacob Quinn

The way I came to understand it was to just take a peek at the [source code](
https://github.com/JuliaLang/julia/blob/ae154d076a6ae75bfdb9a0a377a6a5f9b0e1096f/src/array.c#L670);
I find it pretty legible. The basic idea is that the underlying "storage"
of a Julia Array{T,N} can actually be (and often is) different than the
size(A) in Julia. sizehint! modifies that underlying storage without
changing the size(A) in Julia.
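
A small sketch of that distinction, written for current Julia; the variable names and the size 1_000 are arbitrary choices for illustration:

```julia
v = Int[]
sizehint!(v, 1_000)          # ask for storage for about 1_000 elements
len_after_hint = length(v)   # the visible length is unchanged by the hint

for i in 1:1_000
    push!(v, i)              # pushes can reuse the storage reserved above
end
```

The hint only affects the storage behind the array; indexing, `length`, and `size` behave exactly as they would without it.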

-Jacob

On Wed, Oct 21, 2015 at 12:46 PM, Seth  wrote:

> I know it's good to use sizehint! with an estimate of the sizes of
> (variable-length) containers such as vectors, but I have a couple of
> questions I'm hoping someone could answer:
>
> 1) what are the benefits of using sizehint!? (How does it work, and under
> what circumstances is it beneficial?)
> 2) what are the implications (positive/negative, if any) of overestimating
> the size of a container?
>
> Thanks.
>

Re: [julia-users] What does sizehint! do, exactly?

2015-10-21 Thread Jacob Quinn

I believe it stays allocated until the object is removed. There's an old
issue about providing a way to "shrink" the underlying storage:
https://github.com/JuliaLang/julia/issues/2879

-Jacob

On Wed, Oct 21, 2015 at 1:26 PM, Seth <catch...@bromberger.com> wrote:

> Thanks, Jacob and Stefan. What happens if you overestimate? Is the
> allocated-but-not-used memory eventually freed, or is it tied up until the
> object gets removed?
>
> On Wednesday, October 21, 2015 at 12:18:28 PM UTC-7, Stefan Karpinski
> wrote:
>>
>> If you expect that you're going to have to push a lot of values onto a
>> vector, you can avoid the cost of incremental reallocation by doing it once
>> up front.
>>
>> On Wednesday, October 21, 2015, Jacob Quinn <quinn@gmail.com> wrote:
>>
>>> The way I came to understand it was to just take a peek at the [source
>>> code](
>>> https://github.com/JuliaLang/julia/blob/ae154d076a6ae75bfdb9a0a377a6a5f9b0e1096f/src/array.c#L670);
>>> I find it pretty legible. The basic idea is that the underlying "storage"
>>> of a Julia Array{T,N} can actually be (and often is) different than the
>>> size(A) in Julia. sizehint! modifies that underlying storage without
>>> changing the size(A) in Julia.
>>>
>>> -Jacob
>>>
>>> On Wed, Oct 21, 2015 at 12:46 PM, Seth <c...@bromberger.com> wrote:
>>>
>>>> I know it's good to use sizehint! with an estimate of the sizes of
>>>> (variable-length) containers such as vectors, but I have a couple of
>>>> questions I'm hoping someone could answer:
>>>>
>>>> 1) what are the benefits of using sizehint!? (How does it work, and
>>>> under what circumstances is it beneficial?)
>>>> 2) what are the implications (positive/negative, if any) of
>>>> overestimating the size of a container?
>>>>
>>>> Thanks.
>>>>
>>>
>>>

Re: [julia-users] What does sizehint! do, exactly?

2015-10-21 Thread Jacob Quinn

I think that's why there's an open issue :)

On Wed, Oct 21, 2015 at 3:29 PM, Seth <catch...@bromberger.com> wrote:

> Is there a reason Julia doesn't use jl_array_del_end as a means to allow
> users to shrink the allocation for an array (either within sizehint! or as
> a separate function)?
>
> On Wednesday, October 21, 2015 at 12:32:04 PM UTC-7, Jacob Quinn wrote:
>>
>> I believe it stays allocated until the object is removed. There's an old
>> issue about providing a way to "shrink" the underlying storage:
>> https://github.com/JuliaLang/julia/issues/2879
>>
>> -Jacob
>>
>> On Wed, Oct 21, 2015 at 1:26 PM, Seth <catc...@bromberger.com> wrote:
>>
>>> Thanks, Jacob and Stefan. What happens if you overestimate? Is the
>>> allocated-but-not-used memory eventually freed, or is it tied up until the
>>> object gets removed?
>>>
>>> On Wednesday, October 21, 2015 at 12:18:28 PM UTC-7, Stefan Karpinski
>>> wrote:
>>>>
>>>> If you expect that you're going to have to push a lot of values onto a
>>>> vector, you can avoid the cost of incremental reallocation by doing it once
>>>> up front.
>>>>
>>>> On Wednesday, October 21, 2015, Jacob Quinn <quinn@gmail.com>
>>>> wrote:
>>>>
>>>>> The way I came to understand it was to just take a peek at the [source
>>>>> code](
>>>>> https://github.com/JuliaLang/julia/blob/ae154d076a6ae75bfdb9a0a377a6a5f9b0e1096f/src/array.c#L670);
>>>>> I find it pretty legible. The basic idea is that the underlying "storage"
>>>>> of a Julia Array{T,N} can actually be (and often is) different than the
>>>>> size(A) in Julia. sizehint! modifies that underlying storage without
>>>>> changing the size(A) in Julia.
>>>>>
>>>>> -Jacob
>>>>>
>>>>> On Wed, Oct 21, 2015 at 12:46 PM, Seth <c...@bromberger.com> wrote:
>>>>>
>>>>>> I know it's good to use sizehint! with an estimate of the sizes of
>>>>>> (variable-length) containers such as vectors, but I have a couple of
>>>>>> questions I'm hoping someone could answer:
>>>>>>
>>>>>> 1) what are the benefits of using sizehint!? (How does it work, and
>>>>>> under what circumstances is it beneficial?)
>>>>>> 2) what are the implications (positive/negative, if any) of
>>>>>> overestimating the size of a container?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>
>>>>>
>>

Re: [julia-users] 900mb csv loading in Julia failed: memory comparison vs python pandas and R

2015-10-13 Thread Jacob Quinn

Oh yes, I forgot to mention that the CSV/DataStreams code is 0.4 only.
Definitely interested to hear about any results/experiences though.

-Jacob

On Tue, Oct 13, 2015 at 10:11 PM, Yichao Yu <yyc1...@gmail.com> wrote:

> On Wed, Oct 14, 2015 at 12:02 AM, Grey Marsh <kd.kg...@gmail.com> wrote:
> > @Jacob, I tried your approach. Somehow it got stuck in the "@time ds =
> > DataStreams.DataTable(f)" line. After 15 minutes running, julia is using
> > ~500mb and 1 cpu core with no sign of end. The memory use has been almost
> > same for the whole duration of 15 minutes. I'm letting it run, hoping
> that
> > it finishes after some time.
> >
> > From your run, I can see it needs 12gb memory which is higher than my
> > machine memory of 8gb. could it be the problem?
>
> 12GB is the total amount of memory ever allocated during the timing. A
> lot of it might be intermediate results that are freed by the GC.
> Also, from the output of @time, it looks like 0.4.
>
> >
> > On Wednesday, October 14, 2015 at 2:28:09 AM UTC+5:30, Jacob Quinn wrote:
> >>
> >> I'm hesitant to suggest, but if you're in a bind, I have an experimental
> >> package for fast CSV reading. The API has stabilized somewhat over the
> last
> >> week and I'm planning a more broad release soon, but I'd still consider
> it
> >> alpha mode. That said, if anyone's willing to give it a drive, you just
> need
> >> to
> >>
> >> Pkg.add("Libz")
> >> Pkg.add("NullableArrays")
> >> Pkg.clone("https://github.com/quinnj/DataStreams.jl")
> >> Pkg.clone("https://github.com/quinnj/CSV.jl")
> >>
> >> With the original file referenced here I get:
> >>
> >> julia> reload("CSV")
> >>
> >> julia> f = CSV.Source("/Users/jacobquinn/Downloads/train.csv";null="NA")
> >> CSV.Source: "/Users/jacobquinn/Downloads/train.csv"
> >> delim: ','
> >> quotechar: '"'
> >> escapechar: '\\'
> >> null: "NA"
> >> schema:
> >>
> DataStreams.Schema(UTF8String["ID","VAR_0001","VAR_0002","VAR_0003","VAR_0004","VAR_0005","VAR_0006","VAR_0007","VAR_0008","VAR_0009"
> >> …
> >>
> "VAR_1926","VAR_1927","VAR_1928","VAR_1929","VAR_1930","VAR_1931","VAR_1932","VAR_1933","VAR_1934","target"],[Int64,DataStreams.PointerString,Int64,Int64,Int64,DataStreams.PointerString,Int64,Int64,DataStreams.PointerString,DataStreams.PointerString
> >> …
> >>
> Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,DataStreams.PointerString,Int64],145231,1934)
> >> dateformat: Base.Dates.DateFormat(Base.Dates.Slot[],"","english")
> >>
> >>
> >> julia> @time ds = DataStreams.DataTable(f)
> >>  43.513800 seconds (694.00 M allocations: 12.775 GB, 2.55% gc time)
> >>
> >>
> >> You can convert the result to a DataFrame with:
> >>
> >> function DataFrames.DataFrame(dt::DataStreams.DataTable)
> >>     cols = dt.schema.cols
> >>     data = Array(Any,cols)
> >>     types = DataStreams.types(dt)
> >>     for i = 1:cols
> >>         data[i] = DataStreams.column(dt,i,types[i])
> >>     end
> >>     return DataFrame(data,Symbol[symbol(x) for x in dt.schema.header])
> >> end
> >>
> >>
> >> -Jacob
> >>
> >> On Tue, Oct 13, 2015 at 2:40 PM, feza <moham...@gmail.com> wrote:
> >>>
> >>> Finally was able to load it, but the process   consumes a ton of
> memory.
> >>> julia> @time train = readtable("./test.csv");
> >>> 124.575362 seconds (376.11 M allocations: 13.438 GB, 10.77% gc time)
> >>>
> >>>
> >>>
> >>> On Tuesday, October 13, 2015 at 4:34:05 PM UTC-4, feza wrote:
> >>>>
> >>>> Same here on a 12gb ram machine
> >>>>
> >>>>_
> >>>>_   _ _(_)_ |  A fresh approach to technical computing
> >>>>   (_) | (_) (_)|  Documentation: http://docs.julialang.org
> >>>>_ _   _| |_  __ _   |  Type "?help" for help.
> >>>>   | | | | | | |/ _` |  |
> >>>>   | | |_| | | | (_| |  |  Version 0.5.0-dev+429 (2015-09-29 09:47 UTC)
> >>>>  _/ |\__'_|_|_|\__'_|  |  C

Re: [julia-users] 900mb csv loading in Julia failed: memory comparison vs python pandas and R

2015-10-13 Thread Jacob Quinn

I'm hesitant to suggest it, but if you're in a bind, I have an experimental
package for fast CSV reading. The API has stabilized somewhat over the last
week and I'm planning a broader release soon, but I'd still consider it
alpha quality. That said, if anyone's willing to give it a test drive, you
just need to run:

Pkg.add("Libz")
Pkg.add("NullableArrays")
Pkg.clone("https://github.com/quinnj/DataStreams.jl")
Pkg.clone("https://github.com/quinnj/CSV.jl")

With the original file referenced here I get:

julia> reload("CSV")

julia> f = CSV.Source("/Users/jacobquinn/Downloads/train.csv";null="NA")
CSV.Source: "/Users/jacobquinn/Downloads/train.csv"
delim: ','
quotechar: '"'
escapechar: '\\'
null: "NA"
schema:
DataStreams.Schema(UTF8String["ID","VAR_0001","VAR_0002","VAR_0003","VAR_0004","VAR_0005","VAR_0006","VAR_0007","VAR_0008","VAR_0009"
 …
 
"VAR_1926","VAR_1927","VAR_1928","VAR_1929","VAR_1930","VAR_1931","VAR_1932","VAR_1933","VAR_1934","target"],[Int64,DataStreams.PointerString,Int64,Int64,Int64,DataStreams.PointerString,Int64,Int64,DataStreams.PointerString,DataStreams.PointerString
 …
 
Int64,Int64,Int64,Int64,Int64,Int64,Int64,Int64,DataStreams.PointerString,Int64],145231,1934)
dateformat: Base.Dates.DateFormat(Base.Dates.Slot[],"","english")


julia> @time ds = DataStreams.DataTable(f)
 43.513800 seconds (694.00 M allocations: 12.775 GB, 2.55% gc time)


You can convert the result to a DataFrame with:

function DataFrames.DataFrame(dt::DataStreams.DataTable)
    cols = dt.schema.cols
    data = Array(Any,cols)
    types = DataStreams.types(dt)
    for i = 1:cols
        data[i] = DataStreams.column(dt,i,types[i])
    end
    return DataFrame(data,Symbol[symbol(x) for x in dt.schema.header])
end


-Jacob

On Tue, Oct 13, 2015 at 2:40 PM, feza  wrote:

> Finally was able to load it, but the process   consumes a ton of memory.
> julia> @time train = readtable("./test.csv");
> 124.575362 seconds (376.11 M allocations: 13.438 GB, 10.77% gc time)
>
>
>
> On Tuesday, October 13, 2015 at 4:34:05 PM UTC-4, feza wrote:
>>
>> Same here on a 12gb ram machine
>>
>>                _
>>    _       _ _(_)_     |  A fresh approach to technical computing
>>   (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
>>    _ _   _| |_  __ _   |  Type "?help" for help.
>>   | | | | | | |/ _` |  |
>>   | | |_| | | | (_| |  |  Version 0.5.0-dev+429 (2015-09-29 09:47 UTC)
>>  _/ |\__'_|_|_|\__'_|  |  Commit f71e449 (14 days old master)
>> |__/                   |  x86_64-w64-mingw32
>>
>> julia> using DataFrames
>>
>>
>>
>> julia> train = readtable("./test.csv");
>>
>> ERROR: OutOfMemoryError()
>>
>>  in resize! at array.jl:452
>>
>>  in readnrows! at
>> C:\Users\Mustafa\.julia\v0.5\DataFrames\src\dataframe\io.jl:164
>>  in readtable! at
>> C:\Users\Mustafa\.julia\v0.5\DataFrames\src\dataframe\io.jl:767
>>  in readtable at
>> C:\Users\Mustafa\.julia\v0.5\DataFrames\src\dataframe\io.jl:847
>>  in readtable at
>> C:\Users\Mustafa\.julia\v0.5\DataFrames\src\dataframe\io.jl:893
>>
>>
>>
>>
>>
>> On Tuesday, October 13, 2015 at 3:47:58 PM UTC-4, Yichao Yu wrote:
>>>
>>>
>>> On Oct 13, 2015 2:47 PM, "Grey Marsh"  wrote:
>>>
>>> Which Julia version are you using? There's some gc tweak on 0.4 for that.
>>>
>>> >
>>> > I was trying to load the training dataset from the Springleaf Marketing
>>> Response competition on Kaggle. The csv is 921 mb and has 145321 rows and
>>> 1934 columns. My machine has 8 gb ram and julia ate 5.8gb+ of memory,
>>> after which I stopped julia as there was barely any memory left for the
>>> OS to function properly. The incomplete operation took about 5-6
>>> minutes. I'm on Windows 8 64bit. I used the following code to read the
>>> csv into Julia:
>>> >
>>> > using DataFrames
>>> > train = readtable("C:\\train.csv")
>>> >
>>> > Next I tried to load the same file in python:
>>> >
>>> > import pandas as pd
>>> > train = pd.read_csv("C:\\train.csv")
>>> >
>>> > This took ~2.4gb memory, about a minute time
>>> >
>>> > Checking the same in R again:
>>> > df = read.csv('E:/Libraries/train.csv', as.is = T)
>>> >
>>> > This took 2-3 minutes and consumes 3.5gb mem on the same machine.
>>> >
>>> > Why is there such a discrepancy, and why does Julia fail to load the
>>> csv before running out of memory? Is there any better way to get the file
>>> loaded in Julia?
>>> >
>>> >
>>>
>>

Re: [julia-users] convert a list of ASCIIString into Any

2015-10-12 Thread Jacob Quinn

I'd probably do something like:

julia> x = Any["abc", "10"]
2-element Array{Any,1}:
 "abc"
 "10"

julia> for (i,v) in enumerate(x)
           parsed = tryparse(Int,v)
           x[i] = isnull(parsed) ? v : get(parsed)
       end

julia> x
2-element Array{Any,1}:
   "abc"
 10
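
For readers on later Julia versions: from 0.7 onward `tryparse` returns `nothing` on failure instead of a `Nullable`, so `isnull` and `get` no longer apply. A sketch of the same loop under that assumption:

```julia
x = Any["abc", "10"]

for (i, v) in enumerate(x)
    parsed = tryparse(Int, v)                 # an Int, or `nothing` on failure
    x[i] = parsed === nothing ? v : parsed    # keep the string if it is not a number
end
```

The array has to be an `Array{Any,1}` from the start, as above; assigning an `Int` into an array whose element type is a string type is what triggers the `convert` error in the original question.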


On Mon, Oct 12, 2015 at 8:31 AM, masa charlie 
wrote:

> I have a question:
> I want to know how to convert a number represented as an ASCIIString in an
> Array{ASCIIString, 1} into an Int, and put it back into the original list
> as an Array{Any, 1} in Julia.
>
> For example:
>
> When I have a array variable x as follows;
>
> julia> x = ["abc", "10"]
> 2-element Array{ASCIIString,1}:
>  "abc"
>  "10"
>
> and I want to get it converted as the following.
>
> julia> y = ["abc", 10 ]
> 2-element Array{Any,1}:
>"abc"
>  10
>
> When I tried applying the parse function and putting the result back into
> the original Array, I ended up getting the error below.
>
> x[2] = parse(Int, x[2])
> ERROR: MethodError: `parse` has no method matching parse(::Type{Int64},
> ::Int64)
> Closest candidates are:
>   parse{T<:Integer}(::Type{T<:Integer}, !Matched::Char)
>   parse{T<:Integer}(::Type{T<:Integer}, !Matched::Char, !Matched::Integer)
>   parse{T<:Integer}(::Type{T<:Integer}, !Matched::AbstractString,
> !Matched::Integer)
>   ...
>  in convert at none:1
>  in setindex! at array.jl:313
>
> Does anyone have a suggestion on how to get the expected result?
>
>
> Thanks in advance!!!
>
> Masa
>
>
>
>

Re: [julia-users] Combining arrays in an R enlist like manner

2015-10-12 Thread Jacob Quinn

Yeah, you're not going to do much better than that:

julia> function unlist{T}(vec_of_vecs::Vector{Vector{T}})
           i = 0
           for vec in vec_of_vecs
               i += length(vec)
           end
           final = Array(T, i)
           i = 1
           for vec in vec_of_vecs
               for element in vec
                   final[i] = element
                   i += 1
               end
           end
           return final
       end
unlist (generic function with 1 method)

julia> vec = Vector{Int}[[1,2,3,4,5], [6,7,8,9,10]]
2-element Array{Array{Int64,1},1}:
 [1,2,3,4,5]
 [6,7,8,9,10]

julia> unlist(vec)
10-element Array{Int64,1}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10

julia> vec = Vector{Float64}[[1.0,2.0,3.0,4.0,5.0], [6.0,7.0,8.0,9.0,10.0]]
2-element Array{Array{Float64,1},1}:
 [1.0,2.0,3.0,4.0,5.0]
 [6.0,7.0,8.0,9.0,10.0]

julia> unlist(vec)
10-element Array{Float64,1}:
  1.0
  2.0
  3.0
  4.0
  5.0
  6.0
  7.0
  8.0
  9.0
 10.0
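
In current Julia (1.x) the same flattening can also be spelled with Base functions alone. This is a sketch under that assumption, not a claim about relative performance; the variable names are arbitrary:

```julia
vecs = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]

# Concatenate all inner vectors into one.
flat_reduce = reduce(vcat, vecs)

# Alternatively, iterate lazily over all elements and collect them.
flat_iter = collect(Iterators.flatten(vecs))
```

Both forms keep the element type of the inner vectors, just like the hand-written `unlist` above.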


On Mon, Oct 12, 2015 at 3:33 PM, Ben Ward  wrote:

> Hi,
>
> In R, if you have a list, containing two vectors that are - say - numeric,
> you can unlist them into one vector:
>
> > a <- list(c(1,2,3,4,5), c(7,8,9,10,11))
>
> > a
>
> [[1]]
>
> [1] 1 2 3 4 5
>
>
>
> [[2]]
>
> [1]  7  8  9 10 11
>
>
>
> > unlist(a)
>
>  [1]  1  2  3  4  5  7  8  9 10 11
>
> >
>
>
> Is there a convenient way to do this with vectors in Julia? Say I have a
> Vector{Vector{Int}}:
>
> *vec = Vector{Int}[[1,2,3,4,5], [6,7,8,9,10]]*
>
> I can only think of creating a new vector and through some
> loopiness, filling it in.
>
> Thanks,
> Ben.
>

Re: [julia-users] How to define function f(x::(Int...)) in Julia 0.4?

2015-10-11 Thread Jacob Quinn

I believe the current way to do this is:

julia> t(x::Tuple{Vararg{Int}}) = sum(x)
t (generic function with 1 method)

julia> t((1,1))
2

julia> t((1,1,1))
3

julia> t((1,1,1,1))
4

Though I recall there being an open issue or two on syntax to make this
nicer. (something like `t(x::{Int...})` )
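
For comparison, a sketch in current Julia (1.x) syntax: `Tuple{Vararg{Int}}` still works, and `NTuple{N,Int}` with a `where` clause additionally exposes the tuple length. The function name `tlen` is made up for this example.

```julia
# Accepts an Int tuple of any length.
t(x::Tuple{Vararg{Int}}) = sum(x)

# Same constraint, but the length N is available as a type parameter.
tlen(x::NTuple{N,Int}) where {N} = N

total = t((1, 1, 1))
n = tlen((1, 2, 3, 4))
mixed_ok = applicable(t, (1, 2.0))   # a tuple with a Float64 does not match
```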

-Jacob

On Sun, Oct 11, 2015 at 12:08 PM, Jerry Xiong  wrote:

> For example, in Julia 0.3, I can use below function definition:
>
> julia> f(::(Int...))="This is an Int tuple."
> julia> f((1,2))
> "This is an Int tuple."
> julia> f((1,2,3))
> "This is an Int tuple."
>
> How to define a function with unlimited tuple length in Julia 0.4?
>

Re: [julia-users] DataFrames' readtable very slow compared to R's read.csv when loading ~7.6M csv rows

2015-10-08 Thread Jacob Quinn

Pushed some fixes. Thanks for trying it out.

-Jacob

On Wed, Oct 7, 2015 at 11:54 PM, bernhard <kafis...@gmail.com> wrote:

> Thank you Quinn
>
> Things do not work (for me) though.
>
> is it possible you are missing a comma after "col" in lines 24 and 33 of
> Sink.jl
> function writefield(io::Sink, val::AbstractString, col N)
>
>
>
> Am Mittwoch, 7. Oktober 2015 16:36:52 UTC+2 schrieb David Gold:
>>
>> Yaas. Very excited to see this.
>>
>> On Wednesday, October 7, 2015 at 6:07:44 AM UTC-7, Jacob Quinn wrote:
>>>
>>> Haha, nice timing. I just pushed a big CSV.jl overhaul for 0.4 yesterday
>>> afternoon. I just pushed the DataStreams.jl package, so you can find that
>>> at https://github.com/quinnj/DataStreams.jl, and you'll have to
>>> Pkg.clone it. Everything should work at that point.
>>>
>>> I'm still cleaning up some other related packages, so that's why things
>>> aren't documented/registered/tagged quite yet as the interface may evolve
>>> slightly, probably more the low-level machinery. So `stream!(::CSV.Source,
>>> ::DataStream)` should stay the same.
>>>
>>> I've already got a bit writeup started once everything's done, so if
>>> you'd rather wait another couple days or a week, I should have something
>>> ready by then.
>>>
>>> -Jacob
>>>
>>> On Wed, Oct 7, 2015 at 12:33 AM, bernhard <kafi...@gmail.com> wrote:
>>>
>>>> Is there any update on this? Or maybe a timeline/roadmap?
>>>> I would love to see a faster CSV reader.
>>>>
>>>> I tried to take a look at Jacob's CSV.jl.
>>>> But I seem to be missing https://github.com/lindahua/DataStreams.jl
>>>> I have no idea where to find DataStreams package
>>>> Does it still exist?
>>>>
>>>> Is there any (experimental) way to make CSV.jl work?
>>>>
>>>>
>>>>
>>>>> Am Samstag, 6. Juni 2015 14:41:36 UTC+2 schrieb David Gold:
>>>>>
>>>>> @Jacob,
>>>>>
>>>>> Thank you very much for your explanation! I expect having such a
>>>>> blueprint will make delving into the actual code more tractable for me.
>>>>> I'll be curious to see how your solution here and your proposal for string
>>>>> handling end up playing with the current Julia data ecosystem.
>>>>>
>>>>> On Saturday, June 6, 2015 at 1:17:34 AM UTC-4, Jacob Quinn wrote:
>>>>>>
>>>>>> @David,
>>>>>>
>>>>>> Sorry for the slow response. It's been a busy week :)
>>>>>>
>>>>>> Here's a quick rundown of the approach:
>>>>>>
>>>>>> - In the still-yet-to-be-officially-published
>>>>>> https://github.com/quinnj/CSV.jl package, the bulk of the code goes
>>>>>> into creating a `CSV.File` type where the structure/metadata of the file 
>>>>>> is
>>>>>> parsed/detected/saved in a type (e.g. header, delimiter, newline, # of
>>>>>> columns, detected column types, etc.)
>>>>>> - `SQLite.create` and now `CSV.read` both take a `CSV.File` as input
>>>>>> and follow a similar process in parsing:
>>>>>>   - The actual file contents are mmapped; i.e. the entire file is
>>>>>> loaded into memory at once
>>>>>>   - There are currently three `readfield` methods
>>>>>> (Int,Float64,String) that take an open `CSV.Stream` type (which holds the
>>>>>> mmapped data and the current "position" of parsing), and read a single
>>>>>> field according to what the type of that column is supposed to be
>>>>>>   - for example, readfield(io::CSV.Stream, ::Type{Float64}, row,
>>>>>> col), will start reading at the current position of the `CSV.Stream` 
>>>>>> until
>>>>>> it hits the next delimiter, newline, or end of the file and then 
>>>>>> interpret
>>>>>> the contents as a Float64, returning `val, isnull`
>>>>>>
>>>>>> That's pretty much it. One of the most critical performance keys for
>>>>>> both SQLite and CSV.read is non-copying strings once the file has been
>>>>>> mmapped. For SQLite, the sqlite3_bind_text library method actually has a
>>>>>> flag to indicate whether the text should be copied or not, so we're able

Re: [julia-users] DataFrames' readtable very slow compared to R's read.csv when loading ~7.6M csv rows

2015-10-07 Thread Jacob Quinn

Haha, nice timing. I just pushed a big CSV.jl overhaul for 0.4 yesterday
afternoon. I just pushed the DataStreams.jl package, so you can find that
at https://github.com/quinnj/DataStreams.jl, and you'll have to Pkg.clone
it. Everything should work at that point.

I'm still cleaning up some other related packages, which is why things
aren't documented/registered/tagged quite yet. The interface may still
evolve slightly, though probably more in the low-level machinery, so
`stream!(::CSV.Source, ::DataStream)` should stay the same.

I've already got a bit of a writeup started for once everything's done, so
if you'd rather wait another couple of days or a week, I should have
something ready by then.

-Jacob

On Wed, Oct 7, 2015 at 12:33 AM, bernhard <kafis...@gmail.com> wrote:

> Is there any update on this? Or maybe a timeline/roadmap?
> I would love to see a faster CSV reader.
>
> I tried to take a look at Jacob's CSV.jl.
> But I seem to be missing https://github.com/lindahua/DataStreams.jl
> I have no idea where to find DataStreams package
> Does it still exist?
>
> Is there any (experimental) way to make CSV.jl work?
>
>
>
>> Am Samstag, 6. Juni 2015 14:41:36 UTC+2 schrieb David Gold:
>>
>> @Jacob,
>>
>> Thank you very much for your explanation! I expect having such a
>> blueprint will make delving into the actual code more tractable for me.
>> I'll be curious to see how your solution here and your proposal for string
>> handling end up playing with the current Julia data ecosystem.
>>
>> On Saturday, June 6, 2015 at 1:17:34 AM UTC-4, Jacob Quinn wrote:
>>>
>>> @David,
>>>
>>> Sorry for the slow response. It's been a busy week :)
>>>
>>> Here's a quick rundown of the approach:
>>>
>>> - In the still-yet-to-be-officially-published
>>> https://github.com/quinnj/CSV.jl package, the bulk of the code goes
>>> into creating a `CSV.File` type where the structure/metadata of the file is
>>> parsed/detected/saved in a type (e.g. header, delimiter, newline, # of
>>> columns, detected column types, etc.)
>>> - `SQLite.create` and now `CSV.read` both take a `CSV.File` as input and
>>> follow a similar process in parsing:
>>>   - The actual file contents are mmapped; i.e. the entire file is loaded
>>> into memory at once
>>>   - There are currently three `readfield` methods (Int,Float64,String)
>>> that take an open `CSV.Stream` type (which holds the mmapped data and the
>>> current "position" of parsing), and read a single field according to what
>>> the type of that column is supposed to be
>>>   - for example, readfield(io::CSV.Stream, ::Type{Float64}, row,
>>> col), will start reading at the current position of the `CSV.Stream` until
>>> it hits the next delimiter, newline, or end of the file and then interpret
>>> the contents as a Float64, returning `val, isnull`
>>>
>>> That's pretty much it. One of the most critical performance keys for
>>> both SQLite and CSV.read is non-copying strings once the file has been
>>> mmapped. For SQLite, the sqlite3_bind_text library method actually has a
>>> flag to indicate whether the text should be copied or not, so we're able to
>>> pass the pointer to the position in the mmapped array directly. For the
>>> CSV.read method, which returns a Vector of the columns (as typed arrays),
>>> I've actually rolled a quick and dirty CString type that looks like
>>>
>>> immutable CString
>>>   ptr::Ptr{UInt8}
>>>   len::Int
>>> end
>>>
>>> With a few extra method definitions, this type looks very close to a
>>> real string type, but we can construct it by pointing directly to the
>>> mmapped region (which currently isn't possible for native Julia string
>>> types). See https://github.com/quinnj/Strings.jl for more brainstorming
>>> around this alternative string implementation. You can convert a CString to
>>> a Julia string by calling string(x::CString) or map(string,column) for an
>>> Array of CSV.CStrings.
>>>
>>> As an update on the performance on the Facebook Kaggle competition
>>> bids.csv file:
>>>
>>> -readcsv: 45 seconds, 33% gc time
>>> -CSV.read: 19 seconds, 3% gc time
>>> -SQLite.create: 25 seconds, 3.25% gc time
>>>
>>> Anyway, hopefully I'll get around to cleaning up CSV.jl to be released
>>> officially, but it's that last 10-20% that's always the hardest to finish
>>> up :)
>>>
>>> -Jacob
>>>
>>>
>>>
>>> On Mon, Jun 1, 2015 at 4:

Re: [julia-users] Re: The status of Julia's Webstack

2015-10-07 Thread Jacob Quinn

Jon,

I think you mean Morsel and Meddle are deprecated? While Mux is actually
maintained?

On Wed, Oct 7, 2015 at 7:04 AM, Jonathan Malmaud  wrote:

> Mux and Morsel are formally deprecated at this point and have no
> maintainers. They were designed early on in Julia's life and don't have a
> design that is particularly suited for modern Julia, so the maintainers of
> JuliaWeb made a decision to not invest time in keeping them operational.
>
> That said, Randy is right that I would still merge PRs for them.
>
> On Tuesday, October 6, 2015 at 5:05:10 PM UTC-4, Mohammed El-Beltagy wrote:
>>
>> It seems that Morsel.jl and Meddle.jl are quietly dying away. I noticed
>> that after I did Pkg.update and found that my previously working code is
>> now failing. Looking at the repositories on github, I noticed that both
>> have been failing their tests. I had to pin down Morsel, Meddle,
>> HttpCommon, HttpServer, and HttpParser to earlier versions to keep my
>> server running.
>>
>> There claimed replacement "Mux.jl" is morel like Meddle and can to be
>> regarded as a micro framework. This leaves a significant hole in Julia's
>> package echo system. I wonder if there are any attempts to fill that hole.
>>
>

Re: [julia-users] Re: META: What are the chances of moving this forum?

2015-09-09 Thread Jacob Quinn

I know I've heard chatter around about moving google groups to Discourse. I
think people would generally be favorable, the problem is just having
someone to own it, set it up, maintain, etc. I'm not even sure what the
overhead for something like Discourse is vs. google groups. If someone
feels strongly and has the bandwidth, it'd certainly be worth putting forth
a solid proposal of how to migrate/setup/maintain etc.

-Jacob

On Wed, Sep 9, 2015 at 10:31 AM, Nils Gudat  wrote:

> That's a good point, I hadn't thought of that. Discourse (which I think is
> what's behind the Juno forum, and the Atom discussion group) seems to do:
> Discourse Mailing list mode
> 
> Discourse reply via email
> 
>
> But I've never used mailing lists properly, so I can't say whether this
> would be satisfactory to those who do.
>

Re: [julia-users] Re: Convert datetime string that has AM/PM to datetime, without using Calendar.jl

2015-09-08 Thread Jacob Quinn

I'm not seeing the error on the latest master

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.4.0-pre+7400 (2015-09-08 22:42 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit bffe239* (0 days old master)
|__/                   |  x86_64-apple-darwin14.4.0

julia> using Base.Dates

julia> ds = "2015-08-12 12:01:23 PM"
"2015-08-12 12:01:23 PM"

julia> DateTime(ds, DateFormat("yyyy-mm-dd HH:MM:SS")) - (contains(ds, "AM") ? Hour(12) : Hour(0))
2015-08-12T12:01:23

julia>
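
For strings that carry an AM/PM marker, a minimal sketch of a more explicit approach follows. It assumes a recent Julia where `Dates` is a standard library, and `parse_ampm` is a hypothetical helper name, not something in Base. It splits off the marker, parses the rest with a 24-hour format, and shifts the hour by hand:

```julia
using Dates

# Hypothetical helper (not part of Base): parse "yyyy-mm-dd HH:MM:SS AM/PM".
function parse_ampm(ds::AbstractString)
    # Split the trailing marker from the timestamp itself.
    stamp, meridiem = rsplit(strip(ds), ' '; limit=2)
    dt = DateTime(stamp, dateformat"yyyy-mm-dd HH:MM:SS")
    marker = uppercase(meridiem)
    if marker == "AM"
        # 12:xx AM is just after midnight; other AM hours are unchanged.
        return hour(dt) == 12 ? dt - Hour(12) : dt
    elseif marker == "PM"
        # 12:xx PM is just after noon; other PM hours move to the afternoon.
        return hour(dt) == 12 ? dt : dt + Hour(12)
    else
        throw(ArgumentError("expected AM or PM, got $meridiem"))
    end
end

noon = parse_ampm("2015-08-12 12:01:23 PM")
```

Unlike the one-liner above, this also moves afternoon hours such as "01:05:00 PM" forward by twelve hours.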

On Tue, Sep 8, 2015 at 5:23 PM, Ian Butterworth 
wrote:

> Thanks for the tip, although I'm getting an error with your code in
> 0.4.0-dev
> Any idea why?
>
> using Base.Dates
> ds = "2015-08-12 12:01:23 PM"
> DateTime(ds, DateFormat("yyyy-mm-dd HH:MM:SS")) - (contains(ds, "AM") ?
> Hour(12) : Hour(0))
>
>
> LoadError: ArgumentError: Non-digit character encountered
> while loading In[51], in expression starting on line 3
>
>  in getslot at dates/io.jl:110
>  in parse at dates/io.jl:122
>  in call at dates/io.jl:164
>
>

Re: [julia-users] Re: Recent developments in JuliaWeb

2015-09-06 Thread Jacob Quinn

The switch will happen automatically; no need to change any user code. We'll 
make the switch and tag a new release and next time you run Pkg.update() it'll 
download and install MbedTLS.jl.

Sent from my iPhone

> On Sep 6, 2015, at 5:45 PM, jock.law...@gmail.com wrote:
> 
> Iain these are positive steps and seem to be an optimal use of the labour 
> available.
> I look forward to working with a lighter, tighter, 
> even-more-easy-to-understand stack.
> 
> Any thoughts on when users of GnuTLS.jl should migrate to MbedTLS.jl?
> (I'm a 0.3 user and won't be switching to 0.4 until it is stable, which I 
> guess won't be far off anyway)
> 
> 
> 
>> On Monday, September 7, 2015 at 1:53:29 AM UTC+10, Iain Dunning wrote:
>> Hi all,
>> 
>> JuliaWeb started off as a really mixed collection of packages, many of them 
>> made by people who had moved on (Hacker School). Most were essentially 
>> unmaintained.
>> 
>> By putting them all these packages in one place, we've basically managed to 
>> tread water for a while, trying to merge PRs if we understand them and doing 
>> basic maintenance.
>> It was clear it wasn't long term viable though - no one maintaining really 
>> had a strong need or desire to 'own' these packages.
>> 
>> Recently, we've taken some steps to rein things in and focus limited 
>> resources:
>> - Deprecate Meddle and Morsel.jl. These are "web framework" type packages 
>> that were too complex to be maintained, but were seemingly somewhat popular 
>> (by github stars, at least). 
>> I've added a max version cap of 0.5 on all releases and master (so they'll 
>> install on 0.4, but not 0.5), and made it clear in the README that they are 
>> abandoned.
>> I doubt someone will take over maintenance, so they are effectively out of 
>> sight and mind now.
>> 
>> - Move away from GnuTLS.jl. Apart from concerns about GnuTLS itself, this 
>> package was also unmaintained and poorly understood, while being critical to 
>> pretty much everything.
>> @malmaud has created a new wrapper for MbedTLS, which should be simpler and 
>> more importantly, it works. We'll be moving e.g. Requests, etc. to MbedTLS.jl
>> 
>> - We no longer support Julia 0.3. A branch was made on the last 
>> 0.3-supporting commit, so people can submit PRs if they wish to backport 
>> fixes to that line, but the limited volunteer
>> power available will not be touching it. This is yielding benefits already: 
>> last night I systematically went through HttpCommon.jl and was able to get 
>> 100% coverage, no Compat.jl cruft,
>> 0.4-style docstring on everything, and removal of redundant or unsupported 
>> functionality. Contributions along these lines welcome for the other 
>> JuliaWeb packages too.
>> 
>> Thanks,
>> Iain

Re: [julia-users] Re: Dates and typemax error

2015-08-31 Thread Jacob Quinn

Date/DateTime are documented as following the proleptic Gregorian calendar,
so negative values for the year imply BC/BCE era values.
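
As a small illustration of what that means in practice (a sketch assuming a recent Julia with the `Dates` standard library; the variable names are only for the example), year 0 and negative years are ordinary values that support arithmetic and comparison:

```julia
using Dates

# Year 0 exists under the ISO 8601 numbering that Dates uses; it
# corresponds to 1 BC, so the day before 0001-01-01 is 0000-12-31.
day_before_year_one = Date(1, 1, 1) - Day(1)

# Negative years are ordinary values: ISO year -43 is 44 BC.
ides_of_march = Date(-43, 3, 15)
is_before_year_zero = ides_of_march < Date(0, 1, 1)
```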

On Mon, Aug 31, 2015 at 4:06 PM, Michael Francis <mdcfran...@gmail.com>
wrote:

> it does seem that typemin and typemax of Date are rather large. It also
> implies that a DateTime can not contain all possible Dates. Given the
> following
>
>
>
> julia> typemin( DateTime )
> -146138511-01-01T00:00:00
>
>
> julia> typemax( DateTime )
> 146138512-12-31T23:59:59
>
>
> I was surprised to see negative values rather than the type being a
> UInt64. But it somewhat makes sense as the types are wrappers on periods
> for period math. So perhaps the issue is that typemin should be floored at
> zero ?
>
>
> On Monday, August 31, 2015 at 5:37:16 PM UTC-4, Jeffrey Sarnoff wrote:
>>
>>
>> What is the utility of a date that predates the advent of time?
>>
>> On Friday, August 28, 2015 at 12:23:28 PM UTC-4, Michael Francis wrote:
>>>
>>> I agree except that people may expect yy-mm-dd to truncate, likely one
>>> of the reasons for the ccyy-mm-dd strict form. Where yy is defined as the
>>> two digit year.
>>>
>>> On Friday, August 28, 2015 at 11:51:14 AM UTC-4, Stefan Karpinski wrote:
>>>>
>>>> The safest option is probably to raise an error.
>>>>
>>>> On Fri, Aug 28, 2015 at 11:47 AM, Jacob Quinn <karb...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hm...it's not entirely clear to me what we should do here.
>>>>>
>>>>> On the one hand, when you ask to have the typemax(Date) formatted,
>>>>> it's currently doing what you asked, "formatting the year with 4 digits".
>>>>> Because your year in this case is greater than 4 digits, that results in
>>>>> truncation, which probably isn't what you want. But is it ok to give you
>>>>> all the digits even though you only asked for 4? I'd appreciate any other
>>>>> thoughts/input on this.
>>>>>
>>>>> I do think the Date/DateTime parsing/formatting code needs another
>>>>> once over to polish it up, so any ideas on allowing more
>>>>> flexibility/functionality would be appreciated.
>>>>>
>>>>> -Jacob
>>>>>
>>>>>
>>>>>
>>>>> On Friday, August 28, 2015 at 9:33:25 AM UTC-6, Michael Francis wrote:
>>>>>>
>>>>>> It seems that there is an issue with typemax of dates and string
>>>>>> representation
>>>>>>
>>>>>> julia> using Dates
>>>>>>
>>>>>>
>>>>>> julia> Dates.format(typemax( Date ),"yyyy-mm-dd" )
>>>>>> "1149-12-31"
>>>>>>
>>>>>>
>>>>>> julia> typemax( Date )
>>>>>> 252522163911149-12-31
>>>>>>
>>>>>>
>>>>>> julia> Dates.format(typemax( Date ),"yyy-mm-dd" )
>>>>>> "252522163911149-12-31"
>>>>>>
>>>>>> This hidden truncation seems dangerous. Has anybody else seen this ?
>>>>>>
>>>>>
>>>>

[julia-users] Re: Dates and typemax error

2015-08-28 Thread Jacob Quinn

Hm...it's not entirely clear to me what we should do here.

On the one hand, when you ask to have the typemax(Date) formatted, it's 
currently doing what you asked, formatting the year with 4 digits. 
Because your year in this case is greater than 4 digits, that results in 
truncation, which probably isn't what you want. But is it ok to give you 
all the digits even though you only asked for 4? I'd appreciate any other 
thoughts/input on this.

I do think the Date/DateTime parsing/formatting code needs another once 
over to polish it up, so any ideas on allowing more 
flexibility/functionality would be appreciated.

-Jacob



On Friday, August 28, 2015 at 9:33:25 AM UTC-6, Michael Francis wrote:

> It seems that there is an issue with typemax of dates and string
> representation
>
> julia> using Dates
>
>
> julia> Dates.format(typemax( Date ),"yyyy-mm-dd" )
> "1149-12-31"
>
>
> julia> typemax( Date )
> 252522163911149-12-31
>
>
> julia> Dates.format(typemax( Date ),"yyy-mm-dd" )
> "252522163911149-12-31"
>
> This hidden truncation seems dangerous. Has anybody else seen this ?

Re: [julia-users] Inheriting from Real works, from AbstractFloat does not?

2015-08-24 Thread Jacob Quinn

When you subtype AbstractFloat, it's going to try to use the `grisu.jl`
code to do the showing. The grisu code has all sorts of requirements to
work, most of it semi-hard-coded for Float16, Float32, and Float64. Your
best bet would probably be to define

Base.show(io::IO, x::SubtypeAbsFloat) = show(io, x.val)
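
In current Julia syntax (`struct` replaced `immutable`), the same idea can be sketched as below. `WrappedFloat` is a hypothetical type name used only for this illustration:

```julia
# Hypothetical wrapper type that subtypes AbstractFloat.
struct WrappedFloat <: AbstractFloat
    val::Float64
end

# Delegate printing to the wrapped Float64 instead of relying on the
# generic floating-point printing code.
Base.show(io::IO, x::WrappedFloat) = show(io, x.val)

# sprint captures what show would print.
shown = sprint(show, WrappedFloat(5.0))
```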

On Mon, Aug 24, 2015 at 8:56 PM, Jeffrey Sarnoff jeffrey.sarn...@gmail.com
wrote:

> julia> immutable SubtypeReal <: Real
>            val::Float64
>        end
>
> julia> a=SubtypeReal(5.0)
> SubtypeReal(5.0)
>
> julia> immutable SubtypeAbsFloat <: AbstractFloat
>            val::Float64
>        end
>
> julia> a=SubtypeAbsFloat(5.0)
> Error showing value of type SubtypeAbsFloat:
> ERROR: - not defined for SubtypeAbsFloat
>  in _show at grisu.jl:64
>  in show at grisu.jl:119
>  ...
>  in run_repl at ./REPL.jl:166
>  in _start at ./client.jl:453
>
> Defining show for it does not help.

Re: [julia-users] Re: Get time between two DateTime values in hours (and minutes if simple)

2015-08-15 Thread Jacob Quinn

Yeah, it's forthcoming. I left it out originally just to be conservative in
code and function, but it's come up enough that we should add it in for
TimePeriods. A good up for grabs kind of PR if anyone's feeling up for it.
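
In the meantime, a minimal sketch of doing the conversion by hand, assuming a recent Julia with the `Dates` standard library. The two timestamps are the ones from the question below, built directly rather than parsed:

```julia
using Dates

t1 = DateTime(2015, 8, 13, 10, 19, 50)
t2 = DateTime(2015, 8, 14, 13, 12, 34)

# Subtracting two DateTimes gives a Millisecond period.
elapsed = t2 - t1

# Work on the raw millisecond count: whole minutes first, then split
# those into hours and leftover minutes.
total_minutes = div(Dates.value(elapsed), 60_000)
hours, minutes = divrem(total_minutes, 60)
```

Here `hours` is 26 and `minutes` is 52; the remaining 44 seconds are dropped by the integer division.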

-Jacob

On Sat, Aug 15, 2015 at 10:59 AM, Ian Butterworth i.r.butterwo...@gmail.com
 wrote:

> Thanks guys. I ended up using Kaj's approach. Functionality like we
> discussed would be good if possible.
>
>
> On Saturday, 15 August 2015 06:16:09 UTC-4, Jeffrey Sarnoff wrote:
>>
>> What would you like two lines of code to do with durations?
>>
>> On Friday, August 14, 2015 at 7:41:04 PM UTC-4, Ian Butterworth wrote:
>>>
>>> Trying to get the number of hours between these two dates (ideally x
>>> hours and y minutes), but can't figure out how to convert the duration
>>> variable into hours. The bottom line currently errors
>>>
>>> timein = "2015/8/13 10:19:50"
>>> timein2 = "2015/8/14 13:12:34"
>>>
>>> time_series[1] = DateTime(timein,"yyyy/mm/dd HH:MM:SS")
>>> time_series[2] = DateTime(timein2,"yyyy/mm/dd HH:MM:SS")
>>>
>>> duration = time_series[2]-time_series[1]
>>> Dates.Hour(duration)

Re: [julia-users] Pass an array of values to SQLite query

2015-08-04 Thread Jacob Quinn

You probably want something like

query(db,"insert into tbl values ($(join(vals,',')))")

to do a single row.

Also note that the `create` and `append` methods are supplied to handle
uploading table-like datastructures (i.e. anything that supports size(A)
and getindex(A, i, j)).
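
To make the string-building step concrete, here is a sketch in plain Julia that only constructs the SQL text and does not touch a database. `sql_literal` is a hypothetical helper, and the sample values are made up for the illustration:

```julia
# Hypothetical helper: render one value as a quoted SQL string literal,
# doubling any embedded single quotes so the statement stays well formed.
sql_literal(v) = "'" * replace(string(v), "'" => "''") * "'"

# One row of example values (made up for this sketch).
vals = ["acct-1", "O'Brien", 42, 3.5]

# Join the quoted values with commas inside the VALUES clause.
sql = "INSERT INTO tbl VALUES (" * join(map(sql_literal, vals), ",") * ")"
```

Each element becomes its own quoted value, which avoids the "4 columns but only 1 value" error that comes from interpolating the whole array inside a single pair of quotes. For untrusted input, bound parameters are generally a safer choice than building the statement text by hand.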

-Jacob

On Tue, Aug 4, 2015 at 10:44 AM, Brandon Booth etu...@gmail.com wrote:

> I'm trying to insert a series of large datasets into an SQLite database.
> My plan was to loop through the datasets and insert chunks of rows into the
> database. I'm trying to get a single row to work and then expand it to work
> with blocks of rows.
>
> So far, this works:
> v1 = vals[1,1]
> v2 = vals[1,2]
> v3 = vals[1,3]
> v4 = vals[1,4]
> query(db,"INSERT INTO tbl VALUES ('$v1','$v2','$v3','$v4')")
>
> I'd like to do something like this:
> query(db,"INSERT INTO tbl VALUES ('$vals[1,1:4]')")
>
> My attempt to pass an array gives me an error that the table has 4 columns
> but only 1 value was supplied. How do I properly pass the array?
>
> Thanks.
>
> Brandon

Re: [julia-users] Re: MongoDB and Julia

2015-07-13 Thread Jacob Quinn

You may also try Pkg.add("ODBC") if you can find a working ODBC driver for
mongo. I feel like I've heard of people going this route.

-Jacob

On Mon, Jul 13, 2015 at 9:23 AM, Kevin Liu kevinliu2...@gmail.com wrote:

 Hey Stefan, thanks for replying. I have not opened an issue on Github's
 pzion/Mongo.jl. I will, and I will attempt to debug it. Thank you. Kevin

 On Monday, July 13, 2015 at 9:02:30 AM UTC-3, Stefan Karpinski wrote:

 Have you tried opening issues on the relevant packages? Most people here
 (myself included) won't know much about mongoDB or these packages.


 On Jul 13, 2015, at 12:27 AM, Kevin Liu kevinl...@gmail.com wrote:

 Any help would be greatly appreciated. I am even debating over the idea
 of contributing to the development of this package because I believe so
 much in the language and need to use MongoDB.

 On Sunday, July 12, 2015 at 4:17:44 AM UTC-3, Kevin Liu wrote:

 Hi,

 I have Julia 0.3, Mongodb-osx-x86_64-3.0.4, and Mongo-c-driver-1.1.9
 installed, but can't get Julia to access the Mongo Client through this
 'untestable' package https://github.com/pzion/Mongo.jl, according to
 http://pkg.julialang.org/.

 I have tried Lytol/Mongo.jl and the command require("Mongo.jl") can't
 open file Mongo.jl, or the auto-generated deps.jl.

 Is anyone having similar problems trying to make Julia work with Mongo?

 Thank you

 Kevin

Re: [julia-users] Re: Too many packages?

2015-07-13 Thread Jacob Quinn

Note there's also an open issue for requiring a higher overall standard for
officially registered packages in the JuliaLang/METADATA.jl package
repository. It's a big issue with a lot of work required to get to the
proposal, but it would lead to (hopefully) instilling more confidence in
users knowing that anything they add through `Pkg.add()` would meet some
acceptable level of quality and robustness.

-Jacob

On Mon, Jul 13, 2015 at 11:11 AM, Christoph Ortner 
christophortn...@gmail.com wrote:

 I seem to be in the minority "too many packages" camp. I would prefer
 stable updates of julia version which means that key functionality should
 be included in core, e.g. BLAS, sparse solvers, eig, eigs, basic plotting
 and so on and so forth. But at some point there was an idea of having core
 and Stdlib, which I think is equally acceptable.
 Christoph

Re: [julia-users] Re: MongoDB and Julia

2015-07-13 Thread Jacob Quinn

No worries. I realize it's a bit of a square peg-round hole there.

On Mon, Jul 13, 2015 at 2:07 PM, Kevin Liu kevinliu2...@gmail.com wrote:

 Hey Jacob, thanks for the suggestion. ODBC just doesn't sound like the
 optimal way to go for being too generic. I am studying its implications and
 alternatives, but probably won't follow with ODBC. I appreciate the help.

 On Monday, July 13, 2015 at 4:02:25 PM UTC-3, Jacob Quinn wrote:

 You may also try Pkg.add("ODBC") if you can find a working ODBC driver
 for mongo. I feel like I've heard of people going this route.

 -Jacob

 On Mon, Jul 13, 2015 at 9:23 AM, Kevin Liu kevinl...@gmail.com wrote:

 Hey Stefan, thanks for replying. I have not opened an issue on Github's
 pzion/Mongo.jl. I will, and I will attempt to debug it. Thank you. Kevin

 On Monday, July 13, 2015 at 9:02:30 AM UTC-3, Stefan Karpinski wrote:

 Have you tried opening issues on the relevant packages? Most people
 here (myself included) won't know much about mongoDB or these packages.


 On Jul 13, 2015, at 12:27 AM, Kevin Liu kevinl...@gmail.com wrote:

 Any help would be greatly appreciated. I am even debating over the idea
 of contributing to the development of this package because I believe so
 much in the language and need to use MongoDB.

 On Sunday, July 12, 2015 at 4:17:44 AM UTC-3, Kevin Liu wrote:

 Hi,

 I have Julia 0.3, Mongodb-osx-x86_64-3.0.4, and Mongo-c-driver-1.1.9
 installed, but can't get Julia to access the Mongo Client through this
 'untestable' package https://github.com/pzion/Mongo.jl, according to
 http://pkg.julialang.org/.

 I have tried Lytol/Mongo.jl and the command require("Mongo.jl") can't
 open file Mongo.jl, or the auto-generated deps.jl.

 Is anyone having similar problems trying to make Julia work with
 Mongo?

 Thank you

 Kevin

Re: [julia-users] Re: Help to eliminate Calendar.jl dependence

2015-07-10 Thread Jacob Quinn

On Fri, Jul 10, 2015 at 8:11 AM, Tom Breloff t...@breloff.com wrote:

 as


Tom,

Yes, the method I proposed won't work retroactively since the method for
getting the current local offset from GMT is `now() - now(Dates.UTC)`. If
you were to run that every second crossing over the daylight savings
moment, you'd see that it correctly adjusts for daylight savings, but it's
only going to give you the *current* offset from GMT. Something more
elaborate will require tapping into the tzinfo database (which TimeZones.jl
will do).

-Jacob

Re: [julia-users] Help to eliminate Calendar.jl dependence

2015-07-08 Thread Jacob Quinn

TimeZones.jl isn't quite ready for public consumption yet (enough public
clamor will eventually get me to fix it up).

How about the following?

function calcSecondsEpochToMidnight(secondsSinceEpoch::Integer)
    utc = DateTime(Date(Dates.unix2datetime(secondsSinceEpoch)))
    adjustment = Dates.Second(div(Dates.value(now() - now(Dates.UTC)),1000))
    return Dates.value(utc+adjustment)
end

Obviously this will only work if you're in EST (now() will return a
DateTime according to your system time), so it's not very portable, but it
also doesn't require any other timezone fiddling.
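
The two building blocks used above can be looked at in isolation. This is only a sketch assuming a recent Julia with the `Dates` standard library; the variable names and the example timestamp are made up for the illustration:

```julia
using Dates

# Current local offset from UTC in whole minutes. The two now() calls
# happen a moment apart, so round rather than truncate.
offset_minutes = round(Int, Dates.value(now() - now(Dates.UTC)) / 60_000)

# UTC midnight of the day that contains a given Unix timestamp.
secs = 1_436_400_000                      # arbitrary example timestamp
utc_midnight = DateTime(Date(unix2datetime(secs)))

# Back to seconds since the Unix epoch.
secs_to_utc_midnight = round(Int, datetime2unix(utc_midnight))
```

The offset reflects the system clock at the moment of the call, so it covers today's daylight-saving state only, not historical dates.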

-Jacob

On Wed, Jul 8, 2015 at 8:59 AM, Tom Breloff t...@breloff.com wrote:

 I have some code which requires figuring out the number of seconds from
 the Epoch until midnight (local time) in order to quickly compute the local
 TimeOfDay.  The reason is that I get passed a field which is seconds since
 Epoch, and I'd like to just subtract off the (cached) # seconds from
 Epoch--Midnight.

 Since I'm using a cached number, I don't care so much how long it takes to
 calculate.  Right now I use both Dates and Calendar.jl, but I'm wondering
 if I can accomplish this without the dependency on Calendar.jl (which I
 currently use ONLY to get the hours offset between Eastern US and UTC).  Is
 there a better way to write this function?


 function getHoursAdjustmentFromUTC(year::Integer, month::Integer,
 day::Integer)
   millisEST = Calendar.ymd(year, month, day, "EST5EDT").millis
   millisUTC = Calendar.ymd(year, month, day, "UTC").millis
   UInt64(round((millisEST - millisUTC) / (secondsInOneHour *
 millisInOneSecond)))
 end

 getEpochMillis() = UInt64(DateTime(1970,1,1).instant.periods.value)
 createUTCDateTimeFromSecondsSinceEpoch(secondsSinceEpoch::Integer) =
 DateTime(Dates.UTM(secondsSinceEpoch * millisInOneSecond +
 getEpochMillis()))


 # this is the function I care about... note that midnight refers to
 midnight local to Eastern US
 function calcSecondsEpochToMidnight(secondsSinceEpoch::Integer)

   dt = createUTCDateTimeFromSecondsSinceEpoch(secondsSinceEpoch)

   # get the hour adjustment using the Calendar module
   y = Dates.year(dt)
   m = Dates.month(dt)
   d = Dates.day(dt)
   hourAdjustment = getHoursAdjustmentFromUTC(y, m, d)

   millisMidnightUTC::UInt64 = DateTime(y, m, d).instant.periods.value
   millisMidnightEST::UInt64 = millisMidnightUTC + hourAdjustment *
 secondsInOneHour * millisInOneSecond

   return UInt64((millisMidnightEST - getEpochMillis()) / millisInOneSecond)
 end

Re: [julia-users] how to determine if a particular method signature defined

2015-07-06 Thread Jacob Quinn

http://docs.julialang.org/en/latest/stdlib/base/#Base.applicable
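
For the stricter question of whether one exact signature has been defined, a sketch in current Julia is shown below. `method_defined` is the hypothetical name from the question, not an existing Base function, and the approach of comparing each method's `sig` field is an assumption about what "defined" should mean here:

```julia
foo(x) = x

# Hypothetical helper: true only when a method with exactly this signature
# exists, as opposed to one that merely accepts these argument types.
method_defined(f, types::Tuple) =
    any(m -> m.sig == Tuple{typeof(f), types...}, methods(f))

# applicable answers the looser question: can foo be called with an Int?
callable_with_int = applicable(foo, 1)

exact_int = method_defined(foo, (Int,))
exact_any = method_defined(foo, (Any,))
```

With only `foo(x) = x` defined, the call is applicable for an `Int`, but the only exact signature present is the one taking `Any`.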

On Mon, Jul 6, 2015 at 11:25 AM, Simon Byrne simonby...@gmail.com wrote:

 If I have a generic method foo, is there a way I can tell if a particular
 signature has been defined?

 Note that I don't want method_exists (which simply determines if something
 can be dispatched), I want to determine if a particular definition has been
 made, e.g. if

 foo(x) = x

 then I want

 method_defined(foo,(Int,)) == false
 method_defined(foo,(Any,)) == true

[julia-users] JuliaCon Hacking

2015-06-26 Thread Jacob Quinn

A few of us are hacking tonight at the Hyatt Regency hotel for JuliaCon. 
We're on the 2nd floor at the end of the hall in the Aquarium room if 
anyone wants to join us.

-Jacob

Re: [julia-users] JuliaCon Hacking

2015-06-26 Thread Jacob Quinn

They actually just kicked us out for the night, so we're calling it a
night. Looking forward to some more hacking tomorrow!

-Jacob

On Sat, Jun 27, 2015 at 1:11 AM, Scott Jones scott.paul.jo...@gmail.com
wrote:

 Are you all still up hacking?

Re: [julia-users] subtracting two uint8's results in a Uint64?

2015-06-17 Thread Jacob Quinn

This has been changed on 0.4.

https://github.com/JuliaLang/julia/issues/3759
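
A quick sketch of the behaviour after that change, using the same values as the question below and the current spelling `UInt8`:

```julia
a = 0x33          # UInt8 literal
b = 0x36          # UInt8 literal

# The result stays UInt8 and wraps around modulo 256.
wrapped = a - b

# Convert explicitly when a signed, wider result is wanted.
widened = Int(a) - Int(b)
```

Here `wrapped` is `0xfd` and `widened` is `-3`.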

-Jacob

On Wed, Jun 17, 2015 at 4:33 PM, Phil Tomson philtom...@gmail.com wrote:

 Maybe this is expected, but it was a bit of a surprise to me:

 julia> function foo()
          red::Uint8 = 0x33
          blue::Uint8 = 0x36
          (red-blue)
        end
 julia> foo()
 0xfffd
 julia> typeof(foo())
 Uint64

 The fact that it overflowed wasn't surprising, but the fact that it got
 converted to a Uint64 is a bit surprising (it ended up being a very large
 number that got used in other calculations later which led to odd results)
 . So it looks like all of the math operators will always promote to the
 largest size (but keep the same signed or unsignedness).

 I'm wondering if it might make more sense if:
 Uint8 - Uint8 -> Uint8
 Or more generally: UintN op UintN -> UintN ?
 and:  IntN op IntN -> IntN

Re: [julia-users] DataFrames' readtable very slow compared to R's read.csv when loading ~7.6M csv rows

2015-06-05 Thread Jacob Quinn

@David,

Sorry for the slow response. It's been a busy week :)

Here's a quick rundown of the approach:

- In the still-yet-to-be-officially-published
https://github.com/quinnj/CSV.jl package, the bulk of the code goes into
creating a `CSV.File` type where the structure/metadata of the file is
parsed/detected/saved in a type (e.g. header, delimiter, newline, # of
columns, detected column types, etc.)
- `SQLite.create` and now `CSV.read` both take a `CSV.File` as input and
follow a similar process in parsing:
  - The actual file contents are mmapped; i.e. the entire file is loaded
into memory at once
  - There are currently three `readfield` methods (Int,Float64,String) that
take an open `CSV.Stream` type (which holds the mmapped data and the
current position of parsing), and read a single field according to what
the type of that column is supposed to be
  - for example, readfield(io::CSV.Stream, ::Type{Float64}, row, col),
will start reading at the current position of the `CSV.Stream` until it
hits the next delimiter, newline, or end of the file and then interpret the
contents as a Float64, returning `val, isnull`
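
The field-reading step described above can be sketched roughly as follows. This is not the CSV.jl implementation: `ByteStream` is a hypothetical stand-in for `CSV.Stream`, the signature drops the `row`/`col` arguments, and for simplicity the field bytes are copied before parsing, which the real code is designed to avoid:

```julia
# Hypothetical stand-in for CSV.Stream: the file bytes plus a cursor.
mutable struct ByteStream
    data::Vector{UInt8}
    pos::Int
end

# Read one field as Float64, returning (value, isnull).
function readfield(io::ByteStream, ::Type{Float64}; delim::UInt8 = UInt8(','))
    start = io.pos
    # Advance to the next delimiter, newline, or end of data.
    while io.pos <= length(io.data) &&
          io.data[io.pos] != delim && io.data[io.pos] != UInt8('\n')
        io.pos += 1
    end
    stop = io.pos - 1
    io.pos += 1                      # step past the delimiter or newline
    # Empty or unparseable fields are reported as null.
    val = tryparse(Float64, String(io.data[start:stop]))
    return val === nothing ? (0.0, true) : (val, false)
end

stream = ByteStream(Vector{UInt8}("1.5,,2\n"), 1)
first_field = readfield(stream, Float64)
second_field = readfield(stream, Float64)
third_field = readfield(stream, Float64)
```

The three calls walk the buffer left to right, and the empty middle field comes back flagged as null.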

That's pretty much it. One of the most critical performance keys for both
SQLite and CSV.read is non-copying strings once the file has been mmapped.
For SQLite, the sqlite3_bind_text library method actually has a flag to
indicate whether the text should be copied or not, so we're able to pass
the pointer to the position in the mmapped array directly. For the CSV.read
method, which returns a Vector of the columns (as typed arrays), I've
actually rolled a quick and dirty CString type that looks like

immutable CString
  ptr::Ptr{UInt8}
  len::Int
end

With a few extra method definitions, this type looks very close to a real
string type, but we can construct it by pointing directly to the mmapped
region (which currently isn't possible for native Julia string types). See
https://github.com/quinnj/Strings.jl for more brainstorming around this
alternative string implementation. You can convert a CString to a Julia
string by calling string(x::CString) or map(string,column) for an Array of
CSV.CStrings.
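
In current Julia syntax, the pointer-plus-length idea can be sketched like this. `PtrString` and `field_at` are hypothetical names for the illustration, and an ordinary byte vector stands in for the mmapped region:

```julia
# Hypothetical pointer-and-length view into a byte buffer.
struct PtrString
    ptr::Ptr{UInt8}
    len::Int
end

# Copy the bytes out into a regular Julia String only on request.
Base.string(s::PtrString) = unsafe_string(s.ptr, s.len)

# Build a view of `len` bytes starting at index `start` and materialise it.
# GC.@preserve keeps `buf` alive while the raw pointer is in use.
function field_at(buf::Vector{UInt8}, start::Int, len::Int)
    GC.@preserve buf string(PtrString(pointer(buf, start), len))
end

buf = Vector{UInt8}("alpha,beta")
second = field_at(buf, 7, 4)
```

The view itself holds no copy of the data, so it is only valid while the underlying buffer stays alive and unchanged.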

As an update on the performance on the Facebook Kaggle competition bids.csv
file:

-readcsv: 45 seconds, 33% gc time
-CSV.read: 19 seconds, 3% gc time
-SQLite.create: 25 seconds, 3.25% gc time

Anyway, hopefully I'll get around to cleaning up CSV.jl to be released
officially, but it's that last 10-20% that's always the hardest to finish
up :)

-Jacob



On Mon, Jun 1, 2015 at 4:25 PM, David Gold david.gol...@gmail.com wrote:

 @Jacob I'm just developing a working understanding of these issues. Would
 you please help me to get a better handle on your solution?

 My understanding thus far: Reading a (local) .csv file into a DataFrame
 using DataFrames.readtable involves reading the file into an IOStream and
 then parsing that stream into a form amenable to parsing by
 DataFrames.builddf, which builds the DataFrame object returned by
 readtable. The work required to get the contents of the .csv file into
 memory in a form that can be manipulated by Julia functions is
 work-intensive in this manner. However, with SQLite, the entire file can
 just be thrown into memory wholesale, along with some metadata (maybe not
 the right term?) that delineates the tabular properties of the data.

 What I am curious about, then (if this understanding is not too
 misguided), is how SQLite returns, say, a column of data that doesn't
 include, say, a bunch of delimiters. That is, what sort of parsing *does*
 SQLite do, and when?

 On Monday, June 1, 2015 at 1:48:16 PM UTC-4, Jacob Quinn wrote:

 The biggest single advantage SQLite has is the ability to mmap a file and
 just tell SQLite which pointer addresses start strings and how long they
 are, all without copying. The huge, huge bottleneck in most
 implementations, is not just identifying where a string starts and how long
 it is, but then allocating in program memory and copying the string into
 it. With SQLite, we can use an in-memory database, mmap the file, and tell
 SQLite where each string for a column lives by giving it the starting
 pointer address and how long it is. I've been looking into how to solve
 this problem over the last month or so (apart from Oscar's gc wizardry) and
 it just occurred to me last week that using SQLite may be the best way; so
 far, the results are promising!

 -Jacob

 On Mon, Jun 1, 2015 at 11:40 AM, verylu...@gmail.com wrote:

 Great, thank you Jacob, I will try it out!

 Do you have a writeup on differences in the way you read CSV files and
 the way it is currently done in Julia? Would love to know more!

 Obvious perhaps but for completeness: Reading the data using readcsv or
 readdlm does not improve much the metrics I reported, suggesting that the
 overhead from DataFrames is not much.

 Thank you again!

 On Monday, June 1, 2015 at 1:06:50 PM UTC-4, Jacob Quinn wrote:

 I've been meaning to clean some things up and properly

Re: [julia-users] IJulia: Swap shift-enter for enter?

2015-06-03 Thread Jacob Quinn

If you go into the IJulia package in your Sublime packages directory
(there's a menu item to "Browse Packages"), you'll find keymapping files for
each platform. Just find the one that says "shift+enter" and change it to
"enter" and save. Done.

-Jacob


On Wed, Jun 3, 2015 at 12:56 AM, RecentConvert giz...@gmail.com wrote:

 Is it possible to swap shift-enter for enter in IJulia?

 At the moment I use Sublime for my Julia coding but despite using it for
 months now I find shift-enter to execute to be a royal pain. No other
 program I use does this. I don't mind it in my scripts but in the command
 line I find it annoying.

 Given that it comes from the usage case of iPython notebooks it makes
 sense for them to have shift-enter to execute. I generally like the
 functionality of Sublime and haven't had time to find a suitable
 replacement, if there is one, which doesn't require shift-enter.

Re: [julia-users] DataFrames' readtable very slow compared to R's read.csv when loading ~7.6M csv rows

2015-06-01 Thread Jacob Quinn

I've been meaning to clean some things up and properly release the
functionality, but I have a new way to read in CSV files that beats
anything else out there that I know of. To get the functionality, you'll
need to be running 0.4 master, then do

Pkg.add("SQLite")
Pkg.checkout("SQLite","jq/updates")
Pkg.clone("https://github.com/quinnj/CSV.jl")
Pkg.clone("https://github.com/quinnj/Mmap.jl")

I then ran the following on the bids.csv file

using SQLite, CSV

db = SQLite.SQLiteDB()

ff = CSV.File("/Users/jacobquinn/Downloads/bids.csv")

@time lines = SQLite.create(db, ff,"temp2")

It took 18 seconds on my newish MBP. From the R data.table package, the
`fread` is the other fastest CSV I know of and it took 34 seconds on my
machine. I'm actually pretty surprised by that, since in other tests I've
done it was on par with the SQLite+CSV or sometimes slightly faster.

Now, you're not necessarily getting a Julia structure in this case, but
it's loading the data into an SQLite table, that you can then run
SQLite.query(db, sql_string) to do manipulations and such.

-Jacob


On Sun, May 31, 2015 at 9:42 PM, verylucky...@gmail.com wrote:

 Thank you Tim and Jiahao for your responses. Sorry, I did not mention in
 my OP that I was using Version 0.3.10-pre+1 (2015-05-30 11:26 UTC) Commit
 80dd75c* (1 day old release-0.3).

 I tried other releases as Tim suggested:

 On Version 0.4.0-dev+5121 (2015-05-31 12:13 UTC) Commit bfa8648* (0 days
 old master),
 the same command takes 14 minutes - half that it was taking with
 release-0.3 but still 3 times more than that taken by R's read.csv (5 min).
 More important, Julia process takes up 8GB memory (Rsession takes 1.6GB)
 output of the command `@time DataFrames.readtable("bids.csv");` is
 857.120 seconds  (352 M allocations: 16601 MB, 71.59% gc time) #
 reduced from 85% to 71%

 For completeness, On Version 0.4.0-dev+4451 (2015-04-22 21:55 UTC)
 ob/gctune/238ed08* (fork: 1 commits, 39 days), the command `@time
 DataFrames.readtable("bids.csv");` takes 21 minutes; the output of the
 macro is:
 elapsed time: 1303.167204109 seconds (18703 MB allocated, 76.58% gc time
 in 33 pauses with 31 full sweep)
 The process also takes up 8GB memory on the machine, more than the earlier
 one. My machine has also significantly slowed down - so perhaps the
 increase in memory when compared to release-0.3 is significant.

 On disabling gc, my machine (4GB laptop) goes soul searching; so it's not
 an option for now.

 Is this the best one can expect for now? I read the discussion on issue
 #10428 but I did not understand it well :-(

 Thank you!



 On Sunday, May 31, 2015 at 9:25:14 PM UTC-4, Jiahao Chen wrote:

 Not ideal, but for now you can try turning off the garbage collection
 while reading in the DataFrame.

 gc_disable()
 df = DataFrames.readtable("bids.csv")
 gc_enable()


 Thanks,

 Jiahao Chen
 Research Scientist
 MIT CSAIL

 On Mon, Jun 1, 2015 at 1:36 AM, Tim Holy tim@gmail.com wrote:

 If you're using julia 0.3, you might want to try current master and/or
 possibly the ob/gctune branch.

 https://github.com/JuliaLang/julia/issues/10428

 Best,
 --Tim

 On Sunday, May 31, 2015 09:50:03 AM verylu...@gmail.com wrote:
  Facebook's Kaggle competition has a dataset with ~7.6e6 rows with 9 columns
  (mostly strings).
  https://www.kaggle.com/c/facebook-recruiting-iv-human-or-bot/data

  Loading the dataset in R using read.csv takes 5 minutes and the resulting
  dataframe takes 0.6GB (RStudio takes a total of 1.6GB memory on my machine)

  t0 = proc.time(); a = read.csv("bids.csv"); proc.time()-t0

  user   system elapsed
  332.295   4.154 343.332

  object.size(a)

  601496056 bytes #(0.6 GB)

  Loading the same dataset using DataFrames' readtable takes about 30 minutes
  on the same machine (varies a bit, lowest is 25 minutes) and the resulting
  (Julia process, REPL on Terminal, takes 6GB memory on the same machine)

  (I added a couple of calls to the @time macro inside the readtable function
  to see what's taking time; the outcomes of these calls are below too)

  julia> @time DataFrames.readtable("bids.csv");
  WARNING: Begin readnrows call
  elapsed time: 29.517358476 seconds (2315258744 bytes allocated, 0.35% gc time)
  WARNING: End readnrows call
  WARNING: Begin builddf call
  elapsed time: 1809.506275842 seconds (18509704816 bytes allocated, 85.54% gc time)
  WARNING: End builddf call
  elapsed time: 1840.471467982 seconds (21808681500 bytes allocated, 84.12% gc time) #total time for loading
 
 
  Can you please suggest how I can improve load time and memory usage in
  DataFrames for sizes this big and bigger?
 
  Thank you!

Re: [julia-users] DataFrames' readtable very slow compared to R's read.csv when loading ~7.6M csv rows

2015-06-01 Thread Jacob Quinn

The biggest single advantage SQLite has is the ability to mmap a file and
just tell SQLite which pointer addresses start strings and how long they
are, all without copying. The huge bottleneck in most implementations is
not just identifying where a string starts and how long it is, but then
allocating program memory and copying the string into it.
With SQLite, we can use an in-memory database, mmap the file, and tell
SQLite where each string for a column lives by giving it the starting
pointer address and how long it is. I've been looking into how to solve
this problem over the last month or so (apart from Oscar's gc wizardry) and
it just occurred to me last week that using SQLite may be the best way; so
far, the results are promising!
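
To make the "pointer plus length, no copy" idea concrete, here is a minimal sketch that uses only Julia's `Mmap` standard library. It is not the SQLite+CSV code discussed in this thread: the function name `field_spans` is hypothetical, quoted fields are ignored, and a recent Julia 1.x is assumed.

```julia
using Mmap

# Scan memory-mapped CSV bytes and record (start offset, length) for every
# field, without allocating a String per field. Quoted fields are not handled.
function field_spans(bytes::Vector{UInt8}; delim::UInt8 = UInt8(','))
    spans = Tuple{Int,Int}[]          # (1-based start offset, length in bytes)
    start = 1
    for (i, b) in enumerate(bytes)
        if b == delim || b == UInt8('\n')
            push!(spans, (start, i - start))
            start = i + 1
        end
    end
    # Last field when the file does not end with a newline.
    start <= length(bytes) && push!(spans, (start, length(bytes) - start + 1))
    return spans
end

path, io = mktemp()
write(io, "id,name\n1,alice\n")
close(io)

bytes = Mmap.mmap(path)               # Vector{UInt8} backed by the file
spans = field_spans(bytes)

# A String is only materialized when one is actually needed:
offset, len = spans[2]
second = String(bytes[offset:offset + len - 1])
```

The scan itself only produces small integer pairs; the per-field allocation and copy is deferred until a consumer asks for a particular value.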

-Jacob

On Mon, Jun 1, 2015 at 11:40 AM, verylucky...@gmail.com wrote:

 Great, thank you Jacob, I will try it out!

 Do you have a writeup on differences in the way you read CSV files and the
 way it is currently done in Julia? Would love to know more!

 Obvious perhaps, but for completeness: reading the data using readcsv or
 readdlm does not improve the metrics I reported by much, suggesting that the
 overhead from DataFrames is small.

 Thank you again!

 On Monday, June 1, 2015 at 1:06:50 PM UTC-4, Jacob Quinn wrote:

 I've been meaning to clean some things up and properly release the
 functionality, but I have a new way to read in CSV files that beats
 anything else out there that I know of. To get the functionality, you'll
 need to be running 0.4 master, then do

 Pkg.add("SQLite")
 Pkg.checkout("SQLite", "jq/updates")
 Pkg.clone("https://github.com/quinnj/CSV.jl")
 Pkg.clone("https://github.com/quinnj/Mmap.jl")

 I then ran the following on the bids.csv file

 using SQLite, CSV

 db = SQLite.SQLiteDB()

 ff = CSV.File("/Users/jacobquinn/Downloads/bids.csv")

 @time lines = SQLite.create(db, ff, "temp2")

 It took 18 seconds on my newish MBP. `fread`, from the R data.table package,
 is the other fastest CSV reader I know of, and it took 34 seconds on my
 machine. I'm actually pretty surprised by that, since in other tests I've
 done it was on par with SQLite+CSV or sometimes slightly faster.

 Now, you're not necessarily getting a Julia structure in this case, but
 it's loading the data into an SQLite table, on which you can then run
 SQLite.query(db, sql_string) to do manipulations and such.

 -Jacob



Re: [julia-users] Best way in Julia to build a set of unique values?

2015-05-18 Thread Jacob Quinn

You could also take a look at JudyDicts.jl, which wraps the corresponding C
library. Supposedly, it's one of the most highly optimized Dict
implementations anywhere. I think the Julia package may need an update,
however.

https://github.com/tanmaykm/JudyDicts.jl

-Jacob

On Mon, May 18, 2015 at 4:07 PM, Steven G. Johnson stevenj@gmail.com
wrote:

 Scott, this looks pretty much exactly like what Tim's example does: you
 have a dictionary (aka associative array, aka mapping, depending on your
 terminology) mapping keys to a counter.

 Dicts are reasonably fast in Julia, although they could certainly be
 further optimized (like almost anything).
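
A minimal sketch of the "keys mapped to a counter" pattern described above, using only Base (the sample data is made up for illustration):

```julia
words = ["apple", "pear", "apple", "fig", "pear", "apple"]

# Map each key to a counter.
counts = Dict{String,Int}()
for w in words
    counts[w] = get(counts, w, 0) + 1
end

# If only the distinct values matter, a Set is enough.
uniques = Set(words)
```

`get(counts, w, 0)` supplies the starting count for a key that has not been seen yet, so no separate `haskey` check is needed.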

Re: [julia-users] SharedArray definition and assignation of values. Is this behavior a bug?

2015-05-18 Thread Jacob Quinn

I'm actually just about to do another round of windows testing on #11280,
so I'll test this out as well. Thanks for the report!

-Jacob

On Mon, May 18, 2015 at 6:27 PM, Sebastian Souyris 
sebastian.souy...@gmail.com wrote:

 It seems that there is a bug when you define several SharedArrays in one
 call (for example using include("file.jl")). Or maybe I'm missing
 something about how to use SharedArray. I'm using Windows 7. Let me explain
 with an example:

 This code has no problem. It correctly assigns the values of SharedArrays a
 and b:

 ##
 julia> a = SharedArray(Float64, (2));
 julia> b = SharedArray(Float64, (2));
 julia> for i in 1:2
            a[i] = i
        end
 julia> for i in 1:2
            b[i] = i+2
        end
 julia> a
 2-element SharedArray{Float64,1}:
  1.0
  2.0
 julia> b
 2-element SharedArray{Float64,1}:
  3.0
  4.0
 ##

 But the following code has a problem. It incorrectly assigns the same
 values to a and b:

 ##
 julia> a = SharedArray(Float64, (2)); b = SharedArray(Float64, (2));

 julia> for i in 1:2
            a[i] = i
        end

 julia> for i in 1:2
            b[i] = i+2
        end

 julia> a
 2-element SharedArray{Float64,1}:
  3.0
  4.0

 julia> b
 2-element SharedArray{Float64,1}:
  3.0
  4.0
 ##

 If you define multiple SharedArray in one call, the values of all the
 SharedArrays of that call are equal to the values of the last SharedArray that
 was defined and has assigned values.

 Is this behavior expected? Or is it a bug?


 Thanks!

Re: [julia-users] SharedArray definition and assignation of values. Is this behavior a bug?

2015-05-18 Thread Jacob Quinn

I'm not able to reproduce the above behavior with my latest changes to
#11280, so that's a good sign!

If you're feeling ambitious/able, feel free to give that PR a spin to see
if it fixes it for you as well.

-Jacob


Re: [julia-users] how do we convert an array{Float64,1} in a Float64?

2015-05-15 Thread Jacob Quinn

You'll have to clarify what you're looking for; it's not really clear from
your brief description. Perhaps sharing the code you're working with would
be easier?

-Jacob

On Fri, May 15, 2015 at 3:18 PM, Lytu lyans...@gmail.com wrote:

 Does anyone know how to convert an Array{Float64,1} to a Float64?
 Thank you
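
Since the question does not say what the array holds, the following is only a sketch of the two most common readings, both of which are assumptions: either the array has a single element that should be extracted, or many elements should be reduced to one number.

```julia
# Reading 1: a one-element array; index out its single element.
x = [3.5]                 # Array{Float64,1} with one entry
value = x[1]              # a plain Float64

# Reading 2: several elements; reduce them to a single Float64.
y = [1.0, 2.0, 3.0]
total = sum(y)
```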

Re: [julia-users] `include()` vs `require()`

2015-05-08 Thread Jacob Quinn

Also see the issue logged here:
https://github.com/JuliaLang/julia/issues/8000

On Fri, May 8, 2015 at 2:19 PM, Stefan Karpinski ste...@karpinski.org
wrote:

 help?> reload
 search: reload prevfloat parsefloat

 Base.reload(file::AbstractString)

    Like require, except forces loading of files regardless of
    whether they have been loaded before. Typically used when
    interactively developing libraries.

 On Fri, May 8, 2015 at 4:13 PM, Bob Nnamtrop bob.nnamt...@gmail.com
 wrote:

 OK. But then what is the difference between include and reload?

 On Fri, May 8, 2015 at 1:54 PM, Stefan Karpinski ste...@karpinski.org
 wrote:

 Why wouldn't it work in the REPL? It means "paste the contents of this
 file here."

 On Fri, May 8, 2015 at 3:46 PM, Bob Nnamtrop bob.nnamt...@gmail.com
 wrote:

   include is just about splitting a single file into multiple pieces

 Ok. But then why does include work at the REPL prompt? What is the
 difference between include and reload? I really think there are too many
 ways to do the same thing.

 Bob

 On Fri, May 8, 2015 at 11:58 AM, Stefan Karpinski ste...@karpinski.org
  wrote:

 Conceptually, they are quite different:

 - using/import/require are for acquiring shared resources, i.e.
   modules that different bits of code might all independently want to use.
 - include is just about splitting a single source file into
   multiple pieces.

 Not sure if that helps.

 On Fri, May 8, 2015 at 1:28 PM, Jack Minardi j...@minardi.org wrote:

 Thanks guys. I guess I thought only require didn't load twice as it
 is specifically mentioned in the doc string.

Re: [julia-users] storing @time

2015-05-07 Thread Jacob Quinn

http://docs.julialang.org/en/latest/stdlib/base/#Base.tic

On Thu, May 7, 2015 at 1:00 PM, Edward Chen echen...@gmail.com wrote:

 To whom it may concern:

 I am interested in testing the performance of my code.

 I am familiar with using @elapsed in the following way:

 metric = Float64[]
 push!(metric,@elapsed output = foo(input))

 Is there an analogous way for doing this with @time, or @allocated?

 Thanks,
 Ed
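
For storing timing and allocation figures the same way as with `@elapsed`, one option is `@timed`. A minimal sketch, assuming Julia 1.5 or later (where `@timed` returns a named tuple); `foo` is a hypothetical stand-in for the function being measured:

```julia
# Hypothetical workload standing in for foo(input) from the question.
foo(n) = sum(sqrt(i) for i in 1:n)

times = Float64[]   # elapsed seconds per run
bytes = Int64[]     # bytes allocated per run

for _ in 1:3
    stats = @timed foo(10_000)
    push!(times, stats.time)
    push!(bytes, stats.bytes)
    # stats.value holds the return value of foo, if it is needed.
end
```

`@allocated expr` can also be pushed into a vector directly, in the same way as `@elapsed`, when only the allocation count matters.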

Re: [julia-users] Panel Data using DataFrame (or other method?)

2015-05-05 Thread Jacob Quinn

Remember, the world is your oyster! Take a stab at creating a package and
sharing it! I'm sure you'd get some interest/feedback.

-Jacob

On Tue, May 5, 2015 at 10:53 AM, Nils Gudat nils.gu...@gmail.com wrote:

 Hm, that's a shame - I was hoping for something better than pandas' panel
 implementation in Julia, as I'm not a big fan of it (nor of R's plm
 package, but I guess that's what I'll have to revert to now).

Re: [julia-users] Get GMT time

2015-05-04 Thread Jacob Quinn

Yeah, the second one is a little obscure, because of a couple of issues.

- `UTC` isn't exported from the Dates module, so you'll have to use
  `Dates.UTC`
- `UTC` is a *type* instead of an instance of a type (that's what the
  ::Type{UTC} means)

So the correct way to call this is

now(Dates.UTC)
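
A short sketch of both calls side by side, assuming a recent Julia 1.x where `Dates` is a standard library that has to be loaded explicitly:

```julia
using Dates

local_time = now()            # system time in the local timezone
utc_time   = now(Dates.UTC)   # system time as UTC/GMT

# The argument is the type itself, not an instance of it:
is_type = Dates.UTC isa DataType
```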

On Mon, May 4, 2015 at 12:02 PM, Irving Rabin ipr1...@gmail.com wrote:

 Folks, I am a Julia newcomer. I need to get current time. And now() works
 just fine. But it returns a local time. And I need GMT time.

 I went to the documentation. It gave me a very nice description:

 now() → DateTime

 Returns a DateTime corresponding to the user’s system time including the
 system timezone locale.

 now(::Type{UTC}) → DateTime

 Returns a DateTime corresponding to the user’s system time as UTC/GMT.

 I spent an hour and still couldn't figure out how to call the second
 method.

Re: [julia-users] Julia Web Development Morsel Package

2015-05-04 Thread Jacob Quinn

Might be GnuTLS.jl, which Requests depends on and which loads a C
library.

On Mon, May 4, 2015 at 12:39 PM, George Thomas gmt.gtho...@gmail.com
wrote:

 Hi -
 I get a server segmentation fault as soon as I connect to the server
 via URL `localhost:8000/hello/name/` from a browser (Firefox or Chrome). I
 start the server in listening mode via `julia example/Hello.jl`, which is
 included in the Morsel package (installed via Pkg.add("Morsel")).

 This is the exact error that I see on the console running the server:

 signal (11): Segmentation fault
 unknown function (ip: -165069216)
 write at /usr/bin/../lib64/julia/sys.so (unknown line)
 unknown function (ip: 393195536)
 Segmentation fault
 Fontconfig error: Cannot load default config file


 I am on linux, Fedora 20. I ran 'yum update' to get the most recent
 updates of system libraries etc. I have
 LD_LIBRARY_PATH=/usr/local/lib:/usr/lib:/usr/local/lib64:/usr/lib64

 I have Julia version 0.3.7. This runs perfectly fine for any other package
 I experimented with so far.

 Thanks for useful hints that you might have to help fix this server
 segmentation error.

 Regards.

Re: [julia-users] readtable produce wrong column name

2015-04-30 Thread Jacob Quinn

DataFrame column names must be valid Julia identifiers, so readtable does
the conversion when reading data in.
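
As an illustration of the kind of conversion involved, here is a hypothetical sanitizer. It is not DataFrames' actual implementation; the function name `to_identifier` and the exact replacement rule are assumptions chosen to reproduce the `_somename_` result reported below.

```julia
# Hypothetical: replace every run of characters that cannot appear in a
# simple ASCII identifier with a single underscore.
to_identifier(name::AbstractString) = replace(name, r"[^A-Za-z0-9_]+" => "_")

a = to_identifier("(somename)")
b = to_identifier("name other")
```

With this rule, `"(somename)"` becomes `"_somename_"` and `"name other"` becomes `"name_other"`.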

-Jacob

On Thu, Apr 30, 2015 at 12:43 PM, Li Zhang fff...@gmail.com wrote:

 hi all,

 I first use writetable to write a dataframe to a csv file. some of the
 column names are (somename), or name  other.

 the output csv file showed exact name headers, but when i use readtable to
 read the csv file back to a dataframe, column names become _somename_ and
 name_other.

 I am not sure whether I missed some optional parameters that would do the
 right thing, or whether this is a bug?

 any thought?

Re: [julia-users] Re: Naming convention

2015-04-27 Thread Jacob Quinn

Official Julia issue to deprecate *, ^:
https://github.com/JuliaLang/julia/issues/11030

On Mon, Apr 27, 2015 at 10:19 AM, François Fayard francois.fay...@gmail.com
 wrote:

 So let's forget about ~ as it is already used by DataFrames which is a
 very important package.

 I just found that the project Coq is using ++ for string concatenation. It
 has the advantage of not overloading + and still be similar to Python's +.
 What do you think ?
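
For context, this is how the two operators under discussion behave on strings in Julia today, along with the function-call alternative:

```julia
greeting = "Hello, " * "world"   # `*` concatenates strings
echoed   = "ab"^3                # `^` repeats a string
joined   = string("x = ", 42)    # `string` concatenates and converts
```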

Re: [julia-users] Julia stuck at large floating point number array from source code

2015-04-13 Thread Jacob Quinn

Can you share the script? It's hard to troubleshoot this kind of problem
without seeing exactly what you're running.

On Mon, Apr 13, 2015 at 9:37 PM, Siyi Deng mr.siyi.d...@gmail.com wrote:

 No, I'm copy and pasting the array, from a text editor to the REPL, and it
 hangs there.

 On Monday, April 13, 2015 at 7:53:29 PM UTC-7, Stefan Karpinski wrote:

 It's kind of unclear what you're trying to do. Are you printing an array
 and the repl hangs?

 On Mon, Apr 13, 2015 at 10:49 PM, Siyi Deng mr.siy...@gmail.com wrote:

 Hi, I have a coefficient array which looks like b =
 [4.67933552111843e-07,-6.32591924726271e-05,-0.000160070579209537,
 ], with about 320 elements. The entire array in ascii is about 7000
 chars.

 I cannot paste the array directly into the REPL; Julia simply gets stuck. I
 cannot put it in a script and include it either; that also hangs the session.

 Is this a known issue? What shall I do to get my coefficients?

[julia-users] Efficient Data Transfer via HTTPS

2015-03-29 Thread Jacob Quinn

So I'm building a program that does the following:

* Have data stored either in a file (CSV or gzipped CSV) or a Julia 
structure (Array{T,2}, or other structures that support getindex(A,i,j))
* Need to do a POST request over HTTPS with Content-Type: text/csv, and 
ideally always as Content-Encoding: gzip

Challenges:
* Sometimes the data might be too big to fit in memory (from files), so I 
need some kind of transfer in chunks
* For regular CSV files or Julia structures, the data is obviously not 
gzipped already


I'm thinking of setting it up as follows:

* create an IOBuffer of min(data/file size, MAX_BUF_SIZE); (MAX_BUF_SIZE 
can be configured, probably default to 1GB or so)
* if it's a Julia structure, probably use writedlm to get into CSV format 
and then gzip the IOBuffer somehow?
* if it's a gzipped file, just readall(file) into IOBuffer
* if it's a delimited file, maybe gzip the whole thing then read in? or 
read it in in chunks and gzip the chunks?
* takebuf_string(IOBuffer) to put as the body to my HTTPS POST request


I'm mainly wondering about the soundness of my approach, in particular with 
regard to when/how to do the gzipping and, overall, how to avoid copying 
data as much as possible.

I think it would be nice to be able to do `g = GZipIOBuffer()` and then 
`write(g, data)` and `takebuf_string(g)` to get the raw gzipped data to 
send, but it doesn't look like that's currently set up (or possible or 
insane) with GZip.jl.
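
The chunking half of this plan can be sketched with Base alone. The following is an assumption-laden outline, not a finished design: `each_chunk` is a hypothetical helper, the compression and HTTPS steps are left as a comment because they need external packages, and a recent Julia 1.x is assumed.

```julia
# Hypothetical helper: call `f` on successive chunks of at most
# `max_buf_size` bytes, reusing one buffer so memory use stays bounded.
function each_chunk(f, io::IO, max_buf_size::Integer)
    buf = Vector{UInt8}(undef, max_buf_size)
    while !eof(io)
        n = readbytes!(io, buf, max_buf_size)
        f(view(buf, 1:n))   # a compressor and the request body writer would go here
    end
end

# Small in-memory stand-in for a CSV file, with a tiny chunk size.
io = IOBuffer("a,b\n1,2\n3,4\n")
chunks = String[]
each_chunk(io, 5) do chunk
    push!(chunks, String(copy(chunk)))   # copy, because the buffer is reused
end
```

Passing a view of the reused buffer keeps copying to a minimum; a consumer that needs to hold on to the bytes, as the example does, has to copy them itself.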

Re: [julia-users] Can Julia Import SAS Datasets or SAS Transport Files ?

2015-03-27 Thread Jacob Quinn

I used [this](http://support.sas.com/downloads/package.htm?pid=667) last
fall when I needed to convert some SAS files. It's Windows only, but got
the job done.

On Fri, Mar 27, 2015 at 2:53 PM, jorttx jack@gmail.com wrote:

 Is anyone working on the ability for Julia to import SAS datasets
 (*.sas7bdat files) as has been done for R ?  Julia looks great, but most of
 the data I need to work with originates in SAS.

Re: [julia-users] list of keyword arguments

2015-03-24 Thread Jacob Quinn

There's an open issue: https://github.com/JuliaLang/julia/issues/2758

Feel free to voice your support for a fix! Squeaky issues get the grease. :)

-Jacob

On Tue, Mar 24, 2015 at 4:49 PM, Pooya pooya.rez...@gmail.com wrote:

 Is there a way to get a list of keyword arguments using the function name?
 It does not show up in methods(myfunc), and I am trying to see if it is
 defined correctly, and if yes, why do I get the undefined keyword argument
 message when using @debug. Thanks!
