Re: Generic function resolution

2020-07-06 Thread Vindaar
The issue is that you only define `l` in `g` when it is called with type `B`. In 
the `let sk` line, however, you call `f` with type `C`, which calls `g` with the 
same type. Thus the `when` branch isn't part of the `g` instantiation for that 
case, so `l` is never defined.

Is that clear?
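A minimal sketch of what happens, using hypothetical types and string-returning procs (the real `f`, `g`, `l`, `B` and `C` are only known from the question):

```nim
# Hypothetical reconstruction: `l` only exists in instantiations of `g`
# where the `when` branch is active, i.e. when T is B.
type
  B = object
  C = object

proc g[T](x: T): string =
  when T is B:
    let l = 5                  # only compiled into g[B]
    result = "l = " & $l
  else:
    result = "l is never defined here"

proc f[T](x: T): string = g(x)

echo f(B())   # g[B]: the when branch exists, so l is defined
echo f(C())   # g[C]: the branch (and thus l) is compiled out entirely
```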


Re: Choosing Nim

2020-06-27 Thread Vindaar
> The experiment I worked on was BaBar at SLAC.

Oh, that's nice and also a kind of a funny coincidence. That means you were 
involved with an experiment studying weak CP violation whereas we're now trying 
to study strong CP violation, heh.

> Although the simulation and reconstruction and analysis code was written in 
> C++, book-keeping was better done in a scripting language. It was easier to do 
> this in Python. I _think_, but you would know better, that Python has moved to 
> the analysis area too. So, more researchers are using Python there.

Yes, a lot of people use Python all over the place in physics now. Most of my 
ATLAS colleagues write their ROOT scripts using pyroot / rootpy instead of 
ROOT's C++ abomination. Which is always kind of funny when you see them talking 
about "running python" but encountering segmentation faults... :)

> For analysis, I think Nim could have an advantage as it's faster. I think the 
> time to develop the code is about the same, but Nim would reduce the 
> execution time. It would also help to reduce the debugging time as a chunk of 
> time is spent keeping track of which variables are which type.

Oh yes, for sure! Unfortunately I feel most physicists don't realize that 
dynamic typing is a burden.

> I think I recall that someone wrote an interface to ROOT for Nim, so you can 
> read and manipulate your data in Nim.

Yes? I haven't seen that. Sounds to me like creating that wrapper would be kind 
of a pain, given that ROOT 1. is C++ and 2. essentially provides its own 
standard library.

> For your analysis are you using Nim? I think that would be a great thing. I 
> know nothing about Axions, or searches for Axions. I should look it up to 
> find out more.

Yep, I'm writing my whole analysis in Nim. Since the axion community is still 
pretty tiny (although it has been growing and should grow even more now after 
the last European Strategy for Particle Physics update, which really endorses 
axion searches) I'm not really forced to use some existing analysis framework. 
There was some code written by my predecessor, but that was all ROOT and 
MarlinTPC. I threw it all away and started from scratch. The code is here:

[https://github.com/Vindaar/TimepixAnalysis](https://github.com/Vindaar/TimepixAnalysis)

It's essentially one big mono repository for my whole thesis though. The most 
interesting code is in the Analysis directory.

In general axion searches all deal with the same big problem: given that axions 
haven't been detected yet, it means their detection via some interaction is 
hard (-> coupling constants are tiny). What does that imply? Of course that no 
matter what kind of experiment one builds, all sorts of background will 
massively dominate everything one measures. So they are all very low rate 
experiments, which need the best possible background suppression (both 
hardware- and software-wise). In that sense it's a little similar to
neutrino experiments, except even worse. Also, neutrino experiments nowadays 
have the benefit of simply having a lot more manpower and money to 
build in better locations (e.g. waaayy below ground to shield from cosmics) 
than we do.

My experiment - CAST - is simply sitting in a random hall at surface level at 
CERN. The only shielding from muons I have is ~20 cm of lead. So there's still 
like ~1 muon every couple of seconds in my detector. What we want to measure 
are X-rays which are the result of axions entering our magnet (LHC prototype 
dipole magnet, 9 m long, 9T magnetic field) and interacting with the virtual 
photons of the magnetic field. The axions would just be the result of X-rays 
interacting in the Sun, randomly converting to axions and then leaving the Sun 
unhindered.

Have a great weekend!


Re: Choosing Nim

2020-06-25 Thread Vindaar
I always like to hear about why people pick Nim!

> Many years ago I was tasked with looking after a database for a particle 
> physics experiment.

That's awesome! May I ask which experiment that was? Just curious, because I'm 
currently doing my PhD in Physics. The majority of my group actually works on 
ATLAS (both data analysis and hardware development for the HL-LHC), but I 
search for axions with CAST. :)


Re: gr.nim - floats in FFI

2020-05-06 Thread Vindaar
To be honest the comment by the guy who talks about the JSON representation is 
just... well.

The numbers there are just what the stringification of floats looks like. 
Consider the value in the middle on the order of e-16, which is just 0, or 
rather is supposed to be. If the JSON conversion were done by Nim, I'd say this 
is related to these issues:

[https://github.com/nim-lang/nim/issues?q=is%3Aissue+float+round+trip](https://github.com/nim-lang/nim/issues?q=is%3Aissue+float+round+trip)

But from what I understand it's done by GR internally. Either the code of the C 
example in your issue happens to produce "nicer" numbers (which as strings 
are 2 minus some epsilon), or there's some machinery involved that automatically 
turns the floats into "nice" strings, which for some reason isn't triggered by 
your code.

In any case though, the determination of the axis sizes should allow for some 
epsilon above a "nice" tick value and just cut off from there, precisely for 
this reason. And this should not rely on the stringification of floats to 
produce "nice" numbers (if you have to use a JSON representation 
internally...).
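The epsilon cutoff idea could look something like this (a hypothetical helper, not GR's or ggplotnim's actual code):

```nim
import math

# Snap a tick value to the nearest "nice" integer value if it is within
# some epsilon, instead of relying on float stringification being pretty.
proc snapTick(x: float, eps = 1e-10): float =
  let nice = round(x)
  if abs(x - nice) < eps: nice else: x

echo snapTick(2.220446049250313e-16)  # a "0" tick with float noise -> 0.0
echo snapTick(0.5)                    # a real value stays untouched
```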

ggplotnim still doesn't handle this either though. I've thought about it quite 
a few times, but so far have been too lazy to handle this correctly without 
throwing out information in cases where some apparent epsilon is a "real value" 
etc.

Also it's kinda funny that the data range calculation is confused by the offset 
in the numbers, but handles the 0 tick label correctly. I obviously don't know 
how they calculate their tick values exactly, but ggplotnim uses linspace 
internally and then you run into this exact issue again when determining the 
tick values and have to make sure you don't print the 0 tick label as 1e-16 
something.

Sorry, this was probably not that helpful. As far as I can see your code looks 
fine (though you should use the implicit result variable, set the size of 
result from the start and assign via result[i] = val! :)).
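That parenthetical advice sketched out (a hypothetical example proc, not the actual gr.nim code):

```nim
# Pre-size `result` once and assign by index, instead of growing it with add:
proc squares(xs: seq[float]): seq[float] =
  result = newSeq[float](xs.len)  # implicit result, sized from the start
  for i, x in xs:
    result[i] = x * x

echo squares(@[1.0, 2.0, 3.0])
```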


Re: Typography update - now it can render 99% the Google Fonts ttf.

2020-05-01 Thread Vindaar
I haven't had a use for this so far, but this is amazing!

You have a typo in your URL there. Missing the y at the end. :)


Re: ggplotnim - pretty native plots for us

2020-04-28 Thread Vindaar
I wasn't aware of the GR framework. It certainly looks interesting. However, it 
does _not_ look more lightweight than cairo. Just having Qt as a dependency is 
an immediate no-go for me, at least for a default backend (unless I'm missing 
something and you can easily get binaries without the Qt dependency and build 
it without it).

Also it obviously does a lot more than cairo. It's a full fledged visualization 
library.

For ggplotnim's purposes the only advantage it would have would be access to 
more backends, as far as I can see.

Adding a new backend to ginger is in principle as easy as providing these procs:

[https://github.com/Vindaar/ginger/blob/master/src/ginger/backendDummy.nim](https://github.com/Vindaar/ginger/blob/master/src/ginger/backendDummy.nim)

And see the actual cairo backend:

[https://github.com/Vindaar/ginger/blob/master/src/ginger/backendCairo.nim](https://github.com/Vindaar/ginger/blob/master/src/ginger/backendCairo.nim)

So feel free to add a new GR backend to ginger if you'd like!

To me the most important features I want from backends are:

  * png, pdf support: provided by cairo already
  * LaTeX handling of labels / text: will be done via a tikz backend (good to 
see that apparently GR is going that route for LaTeX too!)
  * an interactive viewer: not implemented, but can also be done via cairo. The 
more challenging aspect is writing the logic that allows for updates in the 
first place (and if possible incremental updates of the plot, but that's hard 
with the current implementation I think)
  * a Vega backend: well, has to be done by writing a Vega backend



I can totally see how GR can be a great library to build a powerful 
visualization library on, if used from the outset. It seems to take care of a 
lot of annoying details I had to get right myself.


Re: ggplotnim - pretty native plots for us

2020-04-27 Thread Vindaar
Sorry about that. When I started writing this I had no idea cairo would be such 
a pain on Windows.

There's an issue about it here: 
[https://github.com/Vindaar/ggplotnim/issues/57](https://github.com/Vindaar/ggplotnim/issues/57)

I haven't updated the README yet, mostly because I don't have a good solution 
either yet. The easiest for me on a practical level was to just install emacs 
and add it to my PATH (which is I guess equivalent to you using the Inkscape 
libraries).

I guess I can think about either adding working versions of the required 
libraries to the repository for windows (at least win64) or a script which 
clones the cairo repository and builds it locally. I haven't built cairo 
locally yet, so I don't know if it works well.

Now regarding your actual question. If you want to ship a program, which uses 
ggplotnim internally, you have to do what people do on Windows as far as I 
know: bundle all required DLLs with the program.

The other alternative would be a static build of cairo. I'll see what I can do 
to improve the situation. Thanks for the input!


Re: ggplotnim - pretty native plots for us

2020-04-26 Thread Vindaar
I'm happy to say that facet_wrap is finally back with version v0.3.5.

Normal classification by one or more (in this case 2) discrete variables:

Classification by discrete variable with free scales:

See the code for these two here: 
[https://github.com/Vindaar/ggplotnim/blob/master/recipes.org#facet-wrap-for-simple-grid-of-subplots](https://github.com/Vindaar/ggplotnim/blob/master/recipes.org#facet-wrap-for-simple-grid-of-subplots)

Other notable changes of the last few versions include:

  * all recipe plots are now also checked in the CI, based on the JSON 
representation of the final Viewport, which is drawn by ginger
  * bar plots can now show negative bars
  * gather on the arraymancer backend does not require all columns to be of the 
same type anymore
  * ridgeline plots were added. There's no recipe yet, because one thing still 
has to be fixed: the size of the topmost ridge is not scaled if the content 
(when an overlap > 1 is used) exceeds the size of the ridge. With the changes 
done for the facet_wrap fix, however, this is finally possible to implement.



See the full changelog for all recent changes:

[https://github.com/Vindaar/ggplotnim/blob/master/changelog.org](https://github.com/Vindaar/ggplotnim/blob/master/changelog.org)


Re: Iterate over fields

2020-04-17 Thread Vindaar
First of all see of course the docs here:

[https://nim-lang.github.io/Nim/macros.html#quote%2Ctyped%2Cstring](https://nim-lang.github.io/Nim/macros.html#quote%2Ctyped%2Cstring)

and the macro tutorial, specifically:

[https://nim-lang.github.io/Nim/tut3.html#introduction-generating-code](https://nim-lang.github.io/Nim/tut3.html#introduction-generating-code)

So the basic idea is that quote do allows you to write exactly the code you 
want to generate. In most cases that alone isn't very helpful, because if you 
can write the code explicitly, you could also just write a template / proc. 
That's where the backticks come in: they interpolate NimNodes defined in the 
current scope, inserting them in those places.

quote do is thus just a nice way to avoid building the AST manually (as I do in 
the newVar proc, for instance), while keeping the ability to insert NimNodes 
you calculate / determine based on what the macro is supposed to accomplish.

Another thing to keep in mind when using quote do is what happens to the stuff 
that's not quoted with backticks. As a rule of thumb (someone please correct me):

  * any procedure / template you use within quote do will be bound in the scope 
where the macro code is injected
  * any variables you introduce will be "gensym'd", that is, for each symbol you 
introduce a unique symbol will be created. So if you write var x = 5 within 
quote do, the final code won't contain the variable x, but something like x_12345.



The second means that if you want to refer to some variable that will be known 
in the scope in which the macro is used, you have to create the identifier 
manually and quote it. Due to the first point you fortunately don't have to do 
the same for procedures you want to use.
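A small sketch of that last point, assuming a caller-scope variable named x (the macro and names are made up for illustration):

```nim
import macros

# To assign to a variable from the scope where the macro is *used*,
# build the identifier manually and interpolate it with backticks.
# A plain `var x = ...` inside `quote do` would be gensym'd instead.
macro setX(value: untyped): untyped =
  let xIdent = ident"x"   # refers to the caller's `x`
  result = quote do:
    `xIdent` = `value`

var x = 0
setX(42)
doAssert x == 42   # the caller's x was assigned, not a gensym'd copy
```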


Re: Iterate over fields

2020-04-16 Thread Vindaar
> I'll look into your solution since I may need to adapt a few things (I've 
> simplified the real use cases to summarize them into a single problem). The 
> goal is also to learn Nim's macros. I've now spent probably as much 
> time on macros as it would have taken to write the solution by hand, but 
> it's not as fun.

If you have questions about my code there or general macro questions, just ask. 
I'll try to help!


Re: Iterate over fields

2020-04-16 Thread Vindaar
If all of your procs are going to look like newFooBar above, it's possible to 
generate them with a macro.


import macros, tables

type
  Tensor[T] = object
    discard

  Model = object
    field1: string
    field2: string
    field3: int

  FooObj = object
    field1: Tensor[float]
    field2: Table[string, float]
    field3: int

proc unpack[T; U](arg: T, to: var U) = discard

proc newVar(name, dtype: NimNode): NimNode =
  result = nnkVarSection.newTree(
    nnkIdentDefs.newTree(
      ident(name.toStrLit.strVal), # replace by new ident
      dtype,
      newEmptyNode()
    )
  )
  echo result.repr

macro genNewObjProc(obj, model: typed): untyped =
  let objFields = obj.getType[1].getTypeImpl[2]     # get recList of type
  let modelFields = model.getType[1].getTypeImpl[2] # get recList of type
  doAssert objFields.len == modelFields.len
  var body = newStmtList()
  let modelIdent = ident"model"
  # variable to hold object constructor `FooBar(field1: field1,...)`
  var objConstr = nnkObjConstr.newTree(obj)
  for i in 0 ..< objFields.len:
    let modelName = ident(modelFields[i][0].toStrLit.strVal) # replace by new ident
    doAssert eqIdent(objFields[i][0], modelName)
    let objType = objFields[i][1]
    body.add newVar(modelName, objType)
    body.add quote do:
      unpack(`modelIdent`.`modelName`, `modelName`)
    # add to object constructor
    objConstr.add nnkExprColonExpr.newTree(modelName, modelName)

  # add resulting `FooObj` call
  let resIdent = ident"result"
  body.add quote do:
    `resIdent` = `objConstr`

  let procParams = [obj, # return type
                    nnkIdentDefs.newTree(modelIdent,
                                         model,
                                         newEmptyNode())]
  result = newProc(name = ident("new" & obj.toStrLit.strVal),
                   params = procParams,
                   body = body)
  echo result.repr

genNewObjProc(FooObj, Model)



I'm not sure how helpful such a macro is if one isn't familiar with macros. But 
since it was fun to write I might as well give you a solution. :)


Re: Error: got proc, but expected proc {.closure.}

2020-04-15 Thread Vindaar
The proc you want to return shouldn't have a name. So this line:


result = proc differentiate(c: int): int =



should be


result = proc (c: int): int =


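In context, that could look like the following (the surrounding proc and its names are assumptions, since the original code isn't shown):

```nim
# A proc returning an anonymous closure proc. Bare proc *types*, like the
# return type here, default to the closure calling convention.
proc makeAdder(n: int): proc (c: int): int =
  result = proc (c: int): int = c + n  # unnamed, captures n

let addTwo = makeAdder(2)
doAssert addTwo(5) == 7
```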


Re: Undeclared field: 'keys' (iterator call)

2020-04-12 Thread Vindaar
Can you share a more complete example on 
[https://play.nim-lang.org](https://play.nim-lang.org)? 


Re: Undeclared field: 'keys' (iterator call)

2020-04-12 Thread Vindaar
I'm on my phone right now, so I won't try to find the correct issues.

This is a problem with toSeq in combination with method call syntax (UFCS). 
Call the keys iterator as a normal function call inside toSeq and it should 
work.
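That is (a minimal sketch):

```nim
import sequtils, tables

let t = {"a": 1, "b": 2}.toTable
# let ks = t.keys.toSeq     # fails: with method call syntax the compiler
#                           # looks for a field `keys` before the iterator
let ks = toSeq(t.keys)      # normal call syntax works
doAssert ks.len == 2
```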


Re: High to Low on sequence not working?

2020-04-09 Thread Vindaar
The reason it doesn't work is that N .. M in the context of a for loop 
implicitly calls countup.

You need to explicitly call countdown:


for i in countdown(high(seqStr), low(seqStr)):
  echo seqStr[i]




Re: ggplotnim - pretty native plots for us

2020-04-07 Thread Vindaar
Ok, so I just merged the arraymancer backend PR, which includes the PR for 
version v0.2.0.

v0.2.0 was mainly ridgeline plots and scale_*_reverse. Note that there is 
currently no recipe for a ridgeline plot; that will be added in the next few 
days. Also, they are not as nice as they should be (essentially the top ridge 
doesn't change its height depending on the max values in the ridge if 
overflowing of ridges into one another is allowed).

scale_*_reverse just allows reversing scales, as the name suggests.

Aside from that a few smaller things were added (theme_void) and a few recipes 
that use geom_tile (annotated heatmap and plotting the periodic table).

I'm not entirely happy with the state of version v0.3.0 though, since the 
formula mechanism introduces several breaking changes. Arguably reading 
formulas is now clearer (see the beginning of the README and especially the 
recipes, since they all have to be compliant with the new mechanism!), but it 
still requires code to be changed.

I think the amount of breakage is probably not that large, since not that many 
people will have used formulas for things yet anyway, also because using the DF 
was discouraged before, since it was slow.

Simple formulas, e.g. f{"hwy"}, remain unchanged anyway, same as f{5} to set 
some constant value for an aesthetic. Previously, formulas were only required 
for constant numbers not referring to columns, since the aes proc took 
string | FormulaNode. Now plain numbers are supported directly, so to set some 
constant value you can just write aes(width = 0.5) instead of 
aes(width = f{0.5}).

In any case, I wanted to get this PR off my chest, since it was way too large. 
I tried to avoid breaking changes as much as possible via macro magic, but this 
issue:

[https://github.com/nim-lang/Nim/issues/13913](https://github.com/nim-lang/Nim/issues/13913)

was the nail in the coffin. So I'm just releasing it now.

Feel free to open issues in case I broke your code. :)


Re: How to write shell scripts in Nim

2020-04-05 Thread Vindaar
Aside from putting that shebang line at the top of the file you want to run as 
a script, the file has to be saved as a NimScript file, namely use a .nims file 
ending.

someScript.nims 


#!/usr/bin/env nim

echo "Hello from NimScript!"
echo defined(NimScript)



and then in your terminal: 


chmod +x someScript.nims
./someScript.nims



and it should run just fine. 


Re: ggplotnim - pretty native plots for us

2020-04-03 Thread Vindaar
Thanks, maybe I'll give it a try to include it manually into the repo!

> improve performance and usability on complex apply/map

It will definitely help, but I'm already creating a single loop for each 
formula, no matter how many tensors are involved.

E.g.


let df = ... # some DF w/ cols A, B, C, D
df.mutate(f{"Foo" ~ `A` * `B` - `C` / `D`})



will already be rewritten to:


var
  col0_47816020 = toTensor(df["A"], float)
  col1_47816021 = toTensor(df["B"], float)
  col2_47816022 = toTensor(df["C"], float)
  col3_47816023 = toTensor(df["D"], float)
  res_47816024 = newTensor[float](df.len)
for idx in 0 ..< df.len:
  []=(res_47816024, idx,
      col0_47816020[idx] * col1_47816021[idx] - col2_47816022[idx] / col3_47816023[idx])
result = toColumn res_47816024



which is indeed a little slower than a manual map_inline, but still pretty 
fast. Compare the first plot from here:

[https://github.com/Vindaar/ggplotnim/tree/arraymancerBackend/benchmarks/pandas_compare](https://github.com/Vindaar/ggplotnim/tree/arraymancerBackend/benchmarks/pandas_compare)

Not sure where the variations map_inline sees are coming from though. Effects 
of OpenMP?

**Small aside about the types**

The data types are determined as float from the usage of *, / etc. They could 
be overridden by giving type hints:


f{int -> float: ...}
  ^--- type of involved tensors
 ^ type of resulting tensor



> AFAIK it should allow combining complex transformations and doing them in 
> a single pass instead of allocating many intermediate dataframes, so 
> performance can be an order of magnitude faster on zip/map/filter chains.

While this is certainly exciting to think about, I think it'd be pretty hard 
(for me in the near future anyway) to achieve while:

  1. keeping it simple to extend the library by adding new procs
  2. still allowing usage of the procs in the normal way, returning a new DF 
(without having differently named procs for in-place / not in-place variants).



But this is just me speculating from the not all that simple code of 
zero-functional. I guess having a custom operator like it does would allow us 
to replace the user given proc names though.

If you have a better idea of how to do efficient chaining that seems reasonable 
to implement, I'm all ears.

**What I'm working on**

Right now I'm rather worrying about having decent performance for group_by and 
inner_join though. I've been looking at 
[https://h2oai.github.io/db-benchmark](https://h2oai.github.io/db-benchmark) 
since yesterday. It's a rather brutal reality check, hehe.

Comparing my current code on the first of the 0.5 GB group_by examples to 
pandas and data.table was eye opening. In my current implementation of 
summarize for grouped data frames I actually return the sub data frames for 
each group and apply a simple reduce operation based on the user's formula. 
Well, what a surprise, that's slow. I haven't dug deep into data.table or 
pandas yet, but as far as I can tell they essentially special-case group_by + 
other operation and handle these by just aggregating over all groups in a 
single pass.

So I've implemented the same, and even for a single key with a single sum I'm 2 
times slower than running the code with pandas on my machine. To be fair, 
performing operations on sub groups individually is a nice 100x slower than 
pandas.

Still, the biggest performance cost I pay is for allowing grouping by columns 
of multiple data types: I need some way to check which subgroup a row belongs 
to. Since I can't create a tuple at runtime in order to just use normal 
comparison operators, I decided to calculate a hash for each row and compare 
that. That works well, but gives me that 2x speed penalty.

For the time being, though, I think I'm happy with that, unless I have a better 
idea / someone can point me to something that works in a typed language and 
doesn't involve huge amounts of boilerplate code.
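The row-hashing idea could be sketched like this (not ggplotnim's actual implementation; all names here are made up):

```nim
import hashes, tables

# Combine the hashes of a row's grouping-key values into a single Hash, so
# rows can be bucketed into subgroups with one integer comparison, regardless
# of the column types involved (here simplified to stringified values).
proc rowHash(vals: varargs[string]): Hash =
  var h: Hash = 0
  for v in vals:
    h = h !& hash(v)
  result = !$h

var groups = initTable[Hash, seq[int]]()
let keys = [@["x", "1"], @["x", "1"], @["y", "2"]]  # grouping keys of 3 rows
for i, key in keys:
  groups.mgetOrPut(rowHash(key), @[]).add i

doAssert groups.len == 2  # rows 0 and 1 share a subgroup, row 2 has its own
```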

So I'm currently working on an implementation that allows using user-defined 
formulas for aggregation without having to call a closure for each row.


Re: Template - how to prefix a function's name

2020-04-02 Thread Vindaar
Oh yes, I could have made that more clear.

Indeed, the default type is untyped. Both for the arguments as well as the 
return type.

And yes, untyped is required to make this work. Essentially untyped is just 
considered as a raw Nim identifier (nnkIdent in macro terms). If you used 
string as an argument, the compiler would understand that as a string at 
runtime. Since the name of the generated proc / etc. has to be known at compile 
time of course, this wouldn't work.

You _can_ (although I don't think with a template) hand in a static string, 
which is a string known at compile time, and construct an identifier from it. 
But unless you do more complicated macro things, where you might want to 
compute the names of the procs you generate, this won't be much different from 
just handing in a raw identifier.

An example:


import macros

macro genproc(prefix: static string): untyped =
  let procName = ident(prefix & "World")
  result = quote do:
    proc `procName`() = echo "Hello world"

genProc("hello") # works, string literal is static
helloWorld()

const foo = "alsoHello"
genProc(foo) # a const string is known at CT
alsoHelloWorld()

# and also
proc getName(): string =
  result = "finalHello"

var bar {.compileTime.}: string
static: bar = getName() # some CT computation
genProc(bar)
finalHelloWorld()





Re: ggplotnim - pretty native plots for us

2020-04-01 Thread Vindaar
Some simple benchmarks comparing the new backend to pandas at:

[https://github.com/Vindaar/ggplotnim/tree/arraymancerBackend/benchmarks/pandas_compare](https://github.com/Vindaar/ggplotnim/tree/arraymancerBackend/benchmarks/pandas_compare)

Note that I ran the code against a default pandas installation on my Void 
Linux, without BLAS. But I also compiled the Nim code without BLAS support.

It's just a port of a pandas / numpy comparison from here:

[https://github.com/mm-mansour/Fast-Pandas](https://github.com/mm-mansour/Fast-Pandas)

All in all the new backend (let's call it datamancer from now on, heh) is 
significantly faster for all operations that essentially just rely on 
@mratsim's work.

For a few others, specifically unique and sorting, it's slightly slower. But 
given the implementation of those I'm actually rather happy with that.

And especially for small data frame sizes the function call / looping overhead 
python has to bear is ridiculous.

I'll focus on finishing up the open PR (ridgelines and a bit more) and then 
finish this.


Re: ggplotnim - pretty native plots for us

2020-03-30 Thread Vindaar
I've started to implement the arraymancer backend into the actual code of 
ggplotnim now.

Most recipes are compiling and working fine now.

The rules for formula creation have changed a little bit, but actually provide 
more control now. There's no documentation about the rules until everything 
works, except: 
[https://github.com/Vindaar/ggplotnim/blob/arraymancerBackend/playground/arraymancer_backend.nim#L946-L958](https://github.com/Vindaar/ggplotnim/blob/arraymancerBackend/playground/arraymancer_backend.nim#L946-L958)
 and the modified recipes.

The code currently uses the arraymancer backend by default (pass 
-d:defaultBackend to use the old one; yeah, the irony).

Re: ggplotnim - pretty native plots for us

2020-03-28 Thread Vindaar
Super short update: I've essentially reached feature parity for the arraymancer 
backend DF now.

Still WIP, but the implementation currently lives on the arraymancerBackend 
branch in the playground dir here: 
[https://github.com/Vindaar/ggplotnim/blob/arraymancerBackend/playground](https://github.com/Vindaar/ggplotnim/blob/arraymancerBackend/playground)

One of the worst performance offenders before was gather if many columns were 
involved. An example from this issue: 
[https://github.com/Vindaar/ggplotnim/issues/39](https://github.com/Vindaar/ggplotnim/issues/39)
 is down from 12.5 s to only 0.05 s for the gather call alone.

Progress! :)


Re: ggplotnim - pretty native plots for us

2020-03-26 Thread Vindaar
So I did a thing today… (which is why I haven't answered yet).

This morning I took another look at a rewrite of the `DataFrame` using an 
arraymancer backend. Turns out by rethinking a bunch of things and especially 
the current implementation of the `FormulaNode`, I managed to come up with a 
seemingly working solution.

This is super WIP and I've only implemented `mutate`, `transmute` and `select` 
so far, but first results are promising.

Essentially the `FormulaNode` from before is now compiled into a closure, which 
returns a full column.

So the following formula:


f{"xSquared" ~ "x" * "x"}


will assume that each string is a column of a data frame and create the 
following closure:


proc(df: DataFrame): Column =
  var
    colx_47075074 = toTensor(df["x"], float)
    colx_47075075 = toTensor(df["x"], float)
    res_47075076 = newTensor[float](df.len)
  for idx in 0 ..< df.len:
    []=(res_47075076, idx, colx_47075075[idx] * colx_47075074[idx])
  result = toColumn res_47075076


The data types for the columns and the result data type are currently based on 
heuristics given things that appear in the formula. E.g. if math operators 
appear it's float, if boolean operators it's bool etc.

The data frame now looks like:


DataFrame* = object
  len*: int
  data*: Table[string, Column]
  case kind: DataFrameKind
  of dfGrouped:
    # a grouped data frame stores the keys of the groups and maps them to
    # a set of the categories
    groupMap: OrderedTable[string, HashSet[Value]]
  else: discard


where a `Column` is:


Column* = object
  case kind*: ColKind
  of colFloat: fCol*: Tensor[float]
  of colInt: iCol*: Tensor[int]
  of colBool: bCol*: Tensor[bool]
  of colString: sCol*: Tensor[string]
  of colObject: oCol*: Tensor[Value]


`colObject` is the fallback for columns, which contain more than one data type.

So I wrote a super simple for loop to get a rough idea of how fast/slow this 
might be:


import arraymancer_backend
import seqmath, sequtils, times
#import ggplotnim # for comparison with current implementation

proc main(df: var DataFrame, num: int) =
  let t0 = cpuTime()
  for i in 0 ..< num:
    df = df.mutate(f{"xSquared" ~ "x" * "x"})
  let t1 = cpuTime()
  echo "Took ", t1 - t0, " for ", num, " iter"

proc rawTensor(df: DataFrame, num: int) =
  var t = newTensor[float](df.len)
  let xT = df["x"].toTensor(float)
  let t0 = cpuTime()
  for i in 0 ..< num:
    for j in 0 ..< df.len:
      t[j] = xT[j] * xT[j]
  let t1 = cpuTime()
  echo "Took ", t1 - t0, " for ", num, " iter"

when isMainModule:
  const num = 1_000_000
  let x = linspace(0.0, 2.0, 1000)
  let y = x.mapIt(0.12 + it * it * 0.3 + 2.2 * it * it * it)
  var df = seqsToDf(x, y)
  main(df, num)
  rawTensor(df, num)


Gives us: new DF:

  * `Took 9.570060132 for 100 iter`



raw arraymancer tensor:

  * `Took 1.034196647 for 100 iter` (so still some crazy overhead!)



While the old DF took 23.3 seconds for only 100,000 iterations! So about a 
factor 23 slower than the new code.

Probably really bad comparison with pandas:


import time

import numpy as np
import pandas as pd

x = np.linspace(0.0, 2.0, 1000)
y = (0.12 + x * x * 0.3 + 2.2 * x * x * x)

df = pd.DataFrame({"x": x, "y": y})

def call():
    t0 = time.time()
    num = 100_000
    for i in range(num):
        df.assign(xSquared=df["x"] * df["x"])
    t1 = time.time()
    print("Took ", (t1 - t0), " for ", num, " iterations")

call()


`Took 60.24467134475708 for 100,000 iterations`. I suppose using assign and 
accessing the columns like this is probably super inefficient in pandas?

And a (also not very good) comparison with `NimData`


import nimdata

import seqmath, sequtils, times, sugar

proc main =
  let x = linspace(0.0, 2.0, 1000)
  let y = x.mapIt(0.12 + it * it * 0.3 + 2.2 * it * it * it)
  var df = DF.fromSeq(zip(x, y))
  df.take(5).show()
  echo df.count()
  
  const num = 1_000_000
  let t0 = cpuTime()
  for i in 0 ..< num:
    df = df.map(x => (x[0], x[0] * x[0])).cache()
  let t1 = cpuTime()
  echo "Took ", t1 - t0, " for ", num, " iter"

when isMainModule:
  main()


`Took 16.322826325 for 1,000,000 iter`

I'm definitely not saying the new code is faster than NimData or pandas, but 
it's definitely promising!

I'll see where this takes me. I think though I managed to implement the main 
things I was worried about. The rest should just be tedious work.

Will keep you all posted. 


Re: Template - how to prefix a function's name

2020-03-24 Thread Vindaar
You do it like this:


template mytempl(prefix) =
  proc `prefix World`() = echo "hello world"



See here:

[https://nim-lang.github.io/Nim/manual.html#templates-identifier-construction](https://nim-lang.github.io/Nim/manual.html#templates-identifier-construction)
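Invoked, a template like the one above generates a proc whose name is built from the argument. A checkable sketch (with a string-returning proc and made-up names instead of the echo version):

```nim
# Identifier construction: `prefix World` splices the untyped argument
# into a new identifier at instantiation time.
template makeGreeter(prefix) =
  proc `prefix World`(): string = "hello world"

makeGreeter(my)    # generates a proc named `myWorld`
doAssert myWorld() == "hello world"
```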


Re: ggplotnim - pretty native plots for us

2020-03-24 Thread Vindaar
@spip I'll answer your question below as well.

> Is this compatible with other libraries, such as arraymancer, etc? I think 
> that one of the biggest strengths of the python numerical ecosystem is the 
> good inter-operability of most plotting libraries with numpy. So if that is 
> not already the case I would suggest making that your highest priority.

The answer to that is "sort of". I'll need to explain a little to answer the 
why and what I mean by "sort of".

**The long answer**

Originally when I started the library I never planned to write a data frame 
library to go with this. I quickly realized however that (at least with a 
library like `ggplot2`) one doesn't work well without the other. In a normal 
plotting library every plotting function is a special case. Essentially each 
kind of plot wants data in a specific form / of a specific data type.

So in the beginning I specifically didn't want to use arraymancer internally. I 
love that library, but given that all I wanted to write was a "plotting 
library", this meant two things specifically for me:

  * The library is essentially a sink for the user's data. It doesn't return 
anything, so there's no reason for the internal data type to conform to any 
standards
  * If a user wants to create a plot, performance will **not** be an important 
consideration (which does not imply the performance of a plotting library does 
not matter!). Creating a plot will always be slow (compared to pure number 
crunching, anyway). There are use cases for libraries that can create plots at 
several hundred fps. But to be honest, if I need to create a huge number of 
plots and am thus performance sensitive, the question is whether a plotting 
library is the best tool in the first place.



For this reason I decided to avoid having arraymancer as a dependency: all its 
strengths are mostly useless for the intended purpose, and it would mean 
introducing an unnecessary dependency.

If a user is using arraymancer for calculations, it's easy to convert the 
required data to ggplotnim's data types. I felt the overhead of copying the 
data was not a big deal under the assumption mentioned above.

**But** , things did somewhat change when I started to write the data frame.

My first idea was actually to use `NimData`, since I really like that library. 
However, the (depending on viewpoint) advantage / disadvantage that their type 
is entirely defined via a schema at compile time, didn't appeal to me. I didn't 
want to end up with a ggplot2 clone that was super restrictive, because 
everything had to be known at compile time.

I was actually hoping that @bluenote would pick up his development of Kadro 
again:

[https://github.com/bluenote10/kadro](https://github.com/bluenote10/kadro)

That sounded perfectly suited. But since he didn't, I simply started to hack 
together something that suits the needs of the library.

Originally in fact the `DataFrame` type was generic and my goal was to write 
the code in such a way that the underlying type does not matter. This made 
things complicated though. In fact I even thought about an arraymancer backend 
from the start:

[https://github.com/Vindaar/ggplotnim/blob/master/playground/arraymancer_backend.nim](https://github.com/Vindaar/ggplotnim/blob/master/playground/arraymancer_backend.nim)

which however never progressed from there, mainly because I couldn't figure out 
how to make use of arraymancer's performance when the majority of data frame 
operations I did ended up copying data around. Which is how I ended up with 
@PMunch's persistent vector from Clojure. It kind of allowed me to "copy as 
much as I want" without the performance penalty.

This is how we got to the current situation. The data frame is okay-ish fast 
for the simple things needed to prepare a plot. For anything else I can't 
recommend it (also because it's extremely lenient about types!).

**tl;dr**

Compatibility with the "rest of the ecosystem" isn't there for practical 
reasons.

The thing is I'd love to profit from @mratsim's amazing work on arraymancer and 
laser!

Once I go back and reconsider performance of the data frame, I hope I will end 
up using as much of arraymancer as I can to be honest. I just need to figure 
out how to do it. :)

> Other than that, I didn't see mention of support for contour plots in the 
> docs. It is surprising how often those come in handy in many scenarios so I'd 
> like for you to add that if it is not available yet. Another thing I like to 
> do is to combine line plots with histograms and/or kernel density plots on 
> the X and Y axis (to get a quick idea of the distribution of the values, 
> particularly in time series). It would be neat to support for that too.

Good point. Contour plots are something I simply didn't think about.

I've never actually thought about how those are implemented before. I guess 
it's just a 2 dimensional KDE, right?

Since I will

ggplotnim - pretty native plots for us

2020-03-22 Thread Vindaar
Hey!

As many of you will be aware by now, I started to write a port of 
[ggplot2](https://ggplot2.tidyverse.org/) some time mid last year:

[https://github.com/Vindaar/ggplotnim](https://github.com/Vindaar/ggplotnim)

After many sometimes frantic sessions working on this, I'm finally approaching 
a first personal milestone: Essentially all features I consider essential for a 
plotting library (for my personal use cases!) are (or are about to be) 
implemented. This will mark the release of version `v0.3.0`.

The remaining features I will implement in the next few days are:

  * `geom_density`: to create smooth density estimates of continuous variables 
using kernel density estimation (KDE). I've implemented a naive KDE with 
complexity `O(m x n)` for testing and it works very well (but it's very slow 
obviously). I want to improve that before merging it. If anyone has a good 
resource for a simple to implement but reasonably performant KDE implementation 
/ algorithm, feel free to post it!
  * `geom_ridgeline`: ridgeline plots (or joyplots) are fun and pretty! Should 
be straightforward to implement.
  * re-activate `facet_wrap`: `facet_wrap` has been dormant for a few months 
now, because an internal rewrite broke them at some point. The implementation 
is there, but I need to fix the layouting, which is even more broken now than 
before. But that should also be fairly easy.



Now, the main reason I open this topic is to ask all of you about what I should 
focus on once the above is done.

# Possible things to work on

There are several ideas I have in my mind, but definitely not the time to 
tackle them at the same time. They are:

## properly implement the Vega-Lite backend

One of the main goals I had in mind when starting this whole project was to 
provide two different plotting backends. One native target to produce plots 
locally, fast and statically.

On the other hand, originally inspired by @mratsim's 
[monocle](https://github.com/numforge/monocle), a 
[Vega-Lite](https://vega.github.io/vega-lite/) backend to scratch that 
interactive / web based itch, which allows for easy sharing of plots 
**including data**!

I wrote a [proof of 
concept](https://github.com/Vindaar/ggplotnim#experimental-vega-lite-backend) 
and by now I have a pretty good idea (barring a lack of Vega experience) on how 
to implement this.

Essentially the whole processing of the plot as done now remains the same. 
This allows making use of the full functionality of `ggplotnim` without a lot 
of duplication. The drawing code will be replaced by a mapping to JSON instead.

The major work would be in defining said mapping. If I'm lucky I can even 
write it as a [ginger 
backend](https://github.com/Vindaar/ginger/blob/master/src/ginger/backendCairo.nim)
 with a - for Vega pretty obscure - API (`drawPoint`, `drawLine`, etc. 
essentially just adding data to a `JsonNode`). More likely it'll involve 
replacing the [drawing 
portion](https://github.com/Vindaar/ggplotnim/blob/master/src/ggplotnim/ggplot_drawing.nim#L342-L364)
 of `ggplotnim` with Vega-related drawing equivalents.

## improve `DataFrame` performance

The included data frame in `ggplotnim` is - for many operations anyways - 
abysmally slow.

While performance is nice, I mainly wanted something to work with "right now" 
instead of spending a lot of time writing a performant data frame.

The reasons for the poor performance are threefold, as far as I can tell:

  * for some operations the algorithms used are inefficient
  * the underlying data type is a `Value` similar to a `JsonNode`. Conversion 
to and from normal types is slow and operations on `Value` are also slow, since 
there are always case statements involved and at least one indirection to 
access the actual value.
  * each column is a `PersistentVector[Value]`. For most operations this is a 
major performance boost over a `seq[Value]`, since we avoid a large amount of 
copying. However, iterating over long vectors or building long vectors is slow.



One thing to improve performance would be to include the distinction between a 
pure column of one data type and `Value` columns (which are somewhat similar to 
`object` types in numpy / pandas if my superficial understanding of those is 
correct).

While I'm not certain, I believe that distinction alone would make the code a 
lot more complex and would definitely require heavy use of generics. Generics 
are something I specifically wanted to avoid in the context of a data frame, 
because each time I played around with toy data frames this became a headache.

The only idea I have to avoid generics would be to extend `Value` to also have 
a case for vector-like data, similar to `JsonNode`'s `JArray`. That would 
double the number of fields though.
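
For illustration, a minimal sketch of what such an extended `Value` variant 
type might look like (this is not the actual ggplotnim definition; the enum 
and field names here are made up):


type
  ValueKind = enum
    VNull, VBool, VInt, VFloat, VString, VObject, VVector
  Value = object
    case kind: ValueKind
    of VNull: discard
    of VBool: bval: bool
    of VInt: ival: int
    of VFloat: fval: float
    of VString: sval: string
    of VObject: fields: seq[(string, Value)]
    of VVector: vals: seq[Value]  # the additional `JArray`-like vector case


Run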

In any case, if I were to seriously attempt to improve performance of the data 
frames, I would stop messing around myself and first do some research into how 
data frames are handled elsewhere.

Re: Performance test agains Python

2020-03-06 Thread Vindaar
As far as I know such simple string manipulations are actually pretty fast in 
Python, so don't expect an amazing speed improvement over Python if your code 
is this simple.

In more "real world" examples you'll see Nim outperforming Python. 


Re: Performance test agains Python

2020-03-06 Thread Vindaar
You have to be aware that using strings this way is always going to be somewhat 
inefficient, since each replace call will make a copy!

In particular, in the following:


f2.writeLine(sLine.replace(sFind, sReplaced.replace("\"", "")))


Run

the sReplaced.replace("\"", "") seems unnecessary. Why not perform the 
replacement when defining sReplaced above? Since it doesn't seem to depend on 
the current line, it's going to be the same either way.

Also, as far as I can tell, the whole find seems unnecessary too. If replace 
cannot find the string sFind no replacement will take place. So you can just 
replace: 


if sLine.find(sFind) > -1:
  f2.writeLine(sLine.replace(sFind, sReplaced.replace("\"", "")))
else:
  f2.writeLine(sLine)


Run

by


f2.writeLine(sLine.replace(sFind, sReplaced)) # with `sReplaced` changed as 
above


Run

Especially given that the substring seems to be found in 1/4 of the cases, I 
imagine this should be faster. The little overhead of replace over find 
shouldn't matter in that case.

To be fair, both things also apply to the Python code. 


Re: Arraymancer and --gc:arc

2020-03-05 Thread Vindaar
As far as I'm aware these two things aren't possible with Arraymancer at the 
moment. But using a normal seq[T] there are ways to do both:

For interpolation:

  * numericalnim by @hugogranstrom: 
[https://github.com/HugoGranstrom/numericalnim#natural-cubic-splines](https://github.com/HugoGranstrom/numericalnim#natural-cubic-splines)
  * to an extent seqmath by @jlp765 (and me to an extent, but not on 
interpolation): 
[https://github.com/Vindaar/seqmath/blob/master/src/seqmath/smath.nim#L639](https://github.com/Vindaar/seqmath/blob/master/src/seqmath/smath.nim#L639)
 and below



And for FFT the only way I'm aware of at the moment is via the C library kiss 
FFT: 
[https://github.com/m13253/nim-kissfft](https://github.com/m13253/nim-kissfft)

I wrote down a couple of notes (mainly for myself and @kaushalmodi) about using 
kiss FFT from Nim a couple of days ago: 
[https://gist.github.com/Vindaar/fc158afbc75627260aed90264398e473](https://gist.github.com/Vindaar/fc158afbc75627260aed90264398e473)


Re: TimeFormatParseError using period character '.' as date separator

2020-03-02 Thread Vindaar
That's because only a few characters can be used directly in a format string.

To use other characters / longer strings, you have to put them between ' ' 
quotes, like so:


echo dt.format("'.'mm'.'dd")


Run

See below the table here:

[https://nim-lang.github.io/Nim/times.html#parsing-and-formatting-dates](https://nim-lang.github.io/Nim/times.html#parsing-and-formatting-dates)
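
A complete runnable sketch (the concrete date here is just an arbitrary 
example): characters between single quotes are passed through unchanged, while 
pattern letters like yyyy, MM and dd are substituted:


import times

let dt = initDateTime(1, mJan, 2020, 12, 0, 0, utc())
echo dt.format("yyyy'.'MM'.'dd")  # -> 2020.01.01


Run

Note that the month pattern is uppercase MM; lowercase mm means minutes.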


Re: Strange Macro Behavior

2020-02-27 Thread Vindaar
I don't understand what you actually want to accomplish, but I'm pretty sure 
you don't actually want a generic macro.

But your "would love to do that" idea actually works, with a small 
modification. Just don't use explicit types, but use varargs[typed]. Then you 
can extract the type information in the macro. As long as you don't want to do 
crazy things with types in macros, it all works well.


import macros, sugar

type Generator[T] = () -> T

proc gen1(): float =
  result = 42.0

proc gen2(): int =
  result = 66

proc gen3(): string =
  result = "Hello"

# works fine if you use `varargs[typed]`
macro genTuple(args: varargs[typed]): untyped =
  var types: seq[NimNode]
  var impls: seq[NimNode]
  for ch in args:
    types.add ch.getTypeImpl
    impls.add ch.getImpl
  echo types.repr
  echo impls.repr

genTuple(gen1, gen2, gen3)


Run

where I extracted both the actual implementation (if you wanted to do something 
to those procs / their bodies) and their types. From there you can do whatever 
you want with those types.


Re: Compile time FFI

2020-02-04 Thread Vindaar
I essentially had the same use case as @PMunch in the past.

When I thought about implementing reading Keras stored NNs in Arraymancer, one 
of the problems was that

  1. creating a NN in Arraymancer currently means using the DSL to design the 
network layout. Since that DSL generates a whole bunch of procs etc. doing that 
at runtime is problematic.
  2. Keras networks are stored in HDF5 files and the network layout in 
attributes of some groups.



So for the most straightforward way to implement this, I wanted to read the 
attributes of Keras HDF5 files at compile time. Given that HDF5 is a rather 
complicated file format, implementing my own Nim-based parser (even if only 
for attributes) wasn't really in the cards.

There's certainly solutions to this without requiring compile time FFI of 
course.


Re: Is "danger" define supposed to also define "release"?

2020-02-03 Thread Vindaar
Oh wow, I somehow thought it was intended behavior, since @mratsim advocated to 
compile with -d:release -d:danger almost immediately after -d:danger was 
introduced. So I thought you were aware of this.


Re: Help with set

2019-12-05 Thread Vindaar
The error message for the second case isn't as clear as it could be, at least 
if you don't know what to look for. It says:


Error: type mismatch: got  but expected 'CharSet 
= set[int16]'


Run

See the (int) after the range ...? That tells you the type it gets is actually 
of type int. Your set however takes int16. So to make it work you have to give 
explicit int16 literals:


x = {1'i16..9'i16, 15'i16, 45'i16..78'i16}


Run
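
Put together as a complete snippet (the CharSet alias is taken from the error 
message; the variable name x is assumed from the question):


type CharSet = set[int16]

var x: CharSet
x = {1'i16..9'i16, 15'i16, 45'i16..78'i16}
echo x.card  # number of elements: 9 + 1 + 34 = 44


Run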


Re: Getting fields of an object type at compile time?

2019-12-04 Thread Vindaar
You might want something like this:


import macros

type Foo = object
  a: int
  b: string

macro getFields(x: typed): untyped =
  let impl = getImpl(x)
  let objFields = impl[2][2] # 2 is nnkObjectTy, 2 2 is nnkRecList
  expectKind objFields, nnkRecList
  result = nnkBracket.newTree()
  for f in objFields:
    expectKind f, nnkIdentDefs
    result.add nnkPar.newTree(newLit f[0].strVal, newLit f[1].strVal)

for (fieldName, typeName) in getFields(Foo):
  echo "Field: ", fieldName, " with type name ", typeName


Run

It returns a bracket of (field name, type name) tuples. Both as strings, since 
you can't mix strings with types in a tuple. For more complicated objects you'd 
have to recurse on the fields with complex types of course.


Re: Advent of Nim 2019 megathread

2019-11-25 Thread Vindaar
I should participate again I guess. I fear I'll have even less time than last 
year though. We'll see!


Re: Empty sequence of specific type given problems when compiling with "cpp"

2019-11-21 Thread Vindaar
Ah yes, indeed. If DICT is just a proc that takes a Context, this also works 
btw:


let ret = DICT(@[])


Run

The compiler can deduce the type of the empty seq itself from the args of proc 
DICT(c: Context).

Although personally, if the use case of an empty seq as the argument comes up 
more often, I'd set the empty sequence as the default value for the argument 
instead: proc DICT(s: Context = @[]). And a proc newContext(len: int): Context 
helper is probably also useful to emphasize the intention.
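
As a sketch of that suggestion (assuming Context is a seq-based type; the 
element type and the proc bodies here are placeholders, not the real code):


type Context = seq[int]  # placeholder element type

proc newContext(len: int): Context =
  ## helper to emphasize the intention of creating a fresh context
  newSeq[int](len)

proc DICT(s: Context = @[]): int =
  ## placeholder body; with the default value, `DICT()` can be
  ## called without any arguments
  s.len

echo DICT()               # uses the empty default, prints 0
echo DICT(newContext(3))  # prints 3


Run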


Re: Empty sequence of specific type given problems when compiling with "cpp"

2019-11-21 Thread Vindaar
If I'm not missing something, you should call newSeq instead:


var ret = DICT(newSeq[Context]())


Run

But of course, any Nim code you write should either error during nim 
compilation or compile successfully. 


Re: Where is "taint mode" documented?

2019-11-20 Thread Vindaar
That part of the manual is now in the experimental document here:

[https://nim-lang.github.io/Nim/manual_experimental.html#taint-mode](https://nim-lang.github.io/Nim/manual_experimental.html#taint-mode)


Re: Confused about how to use ``inputStream`` with ``osproc.startProcess``

2019-11-18 Thread Vindaar
Sorry, I didn't see the post before.

You're almost there. There's two small things missing in your code.

  1. you should add a newline to the string you write to the input stream. cat 
wants that newline
  2. instead of closing the input stream, you flush it.



Note however that with cat at least the output stream will never be "done". So 
you need some stopping criteria.


import osproc, streams

proc cat(strs: seq[string]): string =
  var command = "cat"
  var p = startProcess(command, options = {poUsePath})
  var inp = inputStream(p)
  var outp = outputStream(p)
  
  for str in strs:
    # append a new line!
    inp.write(str & "\n")
  
  # make sure to flush the stream!
  inp.flush()
  var line = ""
  var happy = 0
  while p.running:
    if happy == strs.len:
      # with `cat` there will always theoretically be data coming from the
      # stream, so we have some artificial stopping criterium (maybe just Ctrl-C)
      break
    elif outp.readLine(line):
      result.add(line & "\n")
      inc happy
  close(p)

echo cat(@["hello", "world"])


Run


Re: Web applications and pattern match

2019-11-17 Thread Vindaar
Good to hear! I couldn't find the source for the new version of the live demo 
though. In the markdown document it's still the old code as far as I can tell.

Yes, please just ask!


Re: Problems with Emacs mode for Nim

2019-11-17 Thread Vindaar
Be that as it may, the fact remains that nim-mode is written in elisp. ;)

I don't even think these changes are hard at all, but I don't know my way 
around how to find the code responsible for those indentations without studying 
all of nim-mode first.


Re: Web applications and pattern match

2019-11-17 Thread Vindaar
So originally I wanted to write up a nice example to do the replacements via 
the scanf macro:

[https://nim-lang.github.io/Nim/strscans.html](https://nim-lang.github.io/Nim/strscans.html)

by defining tuples of strings to match against and their replacements, but I 
hit a dead end, because an element of a const tuple doesn't count as a static 
string for the pattern.

Also scanf turned out to be more problematic than I thought, because the $* 
term does not like to match any string until the end.

But since your book is (at least partly) about Nim macros and writing macros is 
fun, I built the following even longer version of your code, haha. It also 
includes a custom matcher that matches anything until the end of the string.


# File: web.nim
import strutils, os, strscans, macros
let input = open("rage.md")

let form = """
 
   


  """

echo "Content-type: text/html\n\n"
echo """
  

  
"""
echo ""

proc rest(input: string; match: var string, start: int): int =
  ## matches until the end of the string
  match = input[start .. input.high]
  # result is either 1 (string is empty) or the number of found chars
  result = max(1, input.len - start)

macro match(args, line: typed): untyped =
  ## match the `args` via `scanf` in `line`. `args` must be a `[]` of
  ## `(scanf string matcher, replacement string)` tuples, where the latter
  ## has to include a single `$#` to indicate the position of the replacement.
  ## The order of the `args` is important, since an if statement is built.
  let argImpl = args.getImpl
  expectKind argImpl, nnkBracket
  result = newStmtList()
  let matched = genSym(nskVar, "matched")
  result.add quote do:
    var `matched`: string
  var ifStmt = nnkIfStmt.newTree()
  for el in argImpl:
    expectKind el, nnkTupleConstr
    let toMatch = el[0]
    let toReplace = el[1]
    let ifBody = nnkStmtList.newTree(
      nnkCall.newTree(ident"echo",
                      nnkCall.newTree(ident"%", toReplace, matched)),
      nnkAsgn.newTree(matched, newLit("")))
    let ifCond = nnkCall.newTree(ident"scanf", line, toMatch, matched)
    ifStmt.add nnkElifBranch.newTree(ifCond, ifBody)
  result.add ifStmt
  echo result.repr

const h1title = ("# ${rest}", "$#")
const h2title = ("## ${rest}", "$#")
const elseLine = ("${rest}", "$#")
const replacements = [h1title, h2title, elseLine]
for line in input.lines:
  match(replacements, line)
  # produces:
  # var matched: string
  # if scanf(line, "# ${rest}", matched):
  #   echo h1title[1] % matched
  # elif scanf(line, "## ${rest}", matched):
  #   echo h2title[1] % matched
  # elif scanf(line, "${rest}", matched):
  #   echo elseLine[1] % matched
echo form

let qs = getEnv("QUERY_STRING", "none").split({'+'}).join(" ")
if qs != "none" and qs.len > 0:
  let output = open("visitors.txt", fmAppend)
  write(output, qs&"\n")
  output.close

let inputVisitors = open("visitors.txt")
for line in inputVisitors.lines:
  match(replacements, line)
inputVisitors.close
echo ""
input.close


Run

This is totally not practical, I'd say, and one's better off writing something 
by hand or using the excellent 
[https://github.com/zevv/npeg](https://github.com/zevv/npeg) by @zevv.

Still fun though. And if someone wants to improve on this...

Finally, to just remove a prefix of a string, you may just use removePrefix 
from strutils:

[https://nim-lang.github.io/Nim/strutils.html#removePrefix%2Cstring%2Cstring](https://nim-lang.github.io/Nim/strutils.html#removePrefix%2Cstring%2Cstring)

Note that it only works in place on a string. You could use the new `outplace` 
though:

[https://github.com/nim-lang/Nim/pull/12599](https://github.com/nim-lang/Nim/pull/12599)
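
A quick sketch of the in-place behavior (the string contents here are assumed 
for illustration):


import strutils

var s = "# My title"
s.removePrefix("# ")
echo s  # -> My title


Run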


Re: Problems with Emacs mode for Nim

2019-11-16 Thread Vindaar
You're right that there's a couple of examples where the indentation of 
nim-mode is all over the place and this is one of them. I've been meaning to 
take a look at this too, but lack of time and not being that experienced with 
elisp means I haven't done so.

Something like this is another: 


proc someProc(binWidth = 0.0,
  breaks: seq[float] = @[],
  binPosition = "none" # <- tab in this line will put it
# binPosition = "none", # <- here
 ): ReturnVal =


Run

The reason in your specific case of course is the tuple unpacking. It seems 
the opening parens are confusing nim-mode. In my case it's the default @[] for 
the breaks. Remove that and it works.

As far as I'm aware @krux02 did most of the recent development on nim-mode. 
Also @kaushalmodi comes to mind as someone who could probably fix this easily.


Re: Marshal and friends

2019-11-13 Thread Vindaar
> For JSON to() macro we have
> 
> > Heterogeneous arrays are not supported.
> 
> I have never seen that term in Nim world before ???

That just refers to the fact that in JSON you can of course have a 
heterogeneous array, like:


import json

let heterogeneous = %* [1, 2.5, "Hello"]


Run

and these simply cannot be mapped to Nim types properly.
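
To illustrate: a homogeneous array maps cleanly to a seq via to(), whereas a 
heterogeneous one can only be traversed as JsonNodes (a sketch):


import json

let homogeneous = %* [1, 2, 3]
doAssert homogeneous.to(seq[int]) == @[1, 2, 3]

let heterogeneous = %* [1, 2.5, "Hello"]
# there is no Nim type `seq[?]` this could map to; we can only
# inspect each element's JSON kind at runtime
for el in heterogeneous:
  echo el.kind  # JInt, JFloat, JString


Run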


Re: A taxonomy of Nim packages

2019-11-11 Thread Vindaar
I just opened an issue on the awesome-nim repo about adding a few more 
collaborators, so that PRs can be merged more quickly.

[https://github.com/VPashkov/awesome-nim/issues/65](https://github.com/VPashkov/awesome-nim/issues/65)


Re: What is the difference between "writeFile" and "newFileStream" and "write"?

2019-11-09 Thread Vindaar
The answer is simply: writeFile sure can write those and it shouldn't break it.

I do essentially the same here, the only difference is where the image comes 
from: 
[https://github.com/brentp/nim-plotly/blob/master/src/plotly/image_retrieve.nim#L119](https://github.com/brentp/nim-plotly/blob/master/src/plotly/image_retrieve.nim#L119)

And I just ran your first code and it works fine on my end.


Re: Nim for Statistics

2019-11-09 Thread Vindaar
May I ask what the main features are you'd require the ecosystem to provide for 
you to consider Nim?

I'm asking since everyone's use cases are different and in my opinion it's 
important to know what people actually want and need.

For instance for [ggplotnim](https://github.com/vindaar/ggplotnim) I know which 
features are important to me, so that's how I choose what to work on. But if I 
knew there were people who consider the stats aspects of ggplot2 to be more 
important than geom_whatever, then I would consider working on those instead.


Re: Nim for Statistics

2019-11-09 Thread Vindaar
To add to the very good answers so far, I'd mention that there is an issue 
which tracks scientific libraries here:

[https://github.com/nim-lang/needed-libraries/issues/77](https://github.com/nim-lang/needed-libraries/issues/77)

And to answer your explicit question whether Nim is _suitable_ for statistics, 
I'd answer with a definitive YES. But of course being suitable does not mean 
most libraries you'd like to use exist, just that in my opinion it's a perfect 
language to write / port those libraries in / to.

Aside from that I'm personally not a fan of e.g. jupyter notebooks anyways. And 
given the quick compile times I don't feel the need. I rather like to go the 
literate programming path, like e.g. here:

[https://github.com/Vindaar/TimepixAnalysis/tree/refactorRawManipulation/Doc/other](https://github.com/Vindaar/TimepixAnalysis/tree/refactorRawManipulation/Doc/other)


Re: Retrieving field names of an enumeration or other types?

2019-11-09 Thread Vindaar
If I understand you correctly, here you go: 


import macros

type
  Foo = enum
    foo = "Foo"
    bar = "Bar"
    more = "More"

macro enumFields(n: typed): untyped =
  let impl = getType(n)
  expectKind impl[1], nnkEnumTy
  result = nnkBracket.newTree()
  for f in impl[1]:
    case f.kind
    of nnkSym, nnkIdent:
      result.add newLit(f.strVal)
    else: discard

for f in enumFields(Foo):
  echo f


Run


Re: Requesting examples of macros in Nim

2019-10-22 Thread Vindaar
While I'm not sure what kind of features the `times -> j` syntax should allow 
(or if `times` and `->` are fixed), the simplest implementation for the second 
usage I can come up with is:


import macros, strutils, os

macro theMagicWord(statements: untyped): untyped =
  result = statements
  for st in statements:
    for node in st:
      if node.kind == nnkStrLit:
        node.strVal = node.strVal & ", Please."

proc parseArgs(cmd: NimNode): (NimNode, NimNode) =
  doAssert cmd.len == 2
  expectKind(cmd[1], nnkInfix)
  expectKind(cmd[1][0], nnkIdent)
  expectKind(cmd[1][1], nnkIdent)
  expectKind(cmd[1][2], nnkIdent)
  doAssert cmd[1][0].strVal == "->"
  doAssert cmd[1][1].strVal == "times"
  result = (cmd[0],      # leave cmd[0] as is, has to be valid integer expr
            cmd[1][2])   # identifier to use for loop

macro rpt(cmd: untyped, stmts: untyped): untyped =
  expectKind(cmd, nnkCommand)
  expectKind(stmts, nnkStmtList)
  let (toIdx, iterVar) = parseArgs(cmd)
  result = quote do:
    for `iterVar` in 1..`toIdx`:
      `stmts`
  echo result.repr

# old macro
#rpt j, paramStr(1).parseInt :
#  theMagicWord:
#echo j, "- Give me some bear"
#echo "Now"

rpt paramStr(1).parseInt times -> j:
  theMagicWord:
echo j, "- Give me some bear"
echo "Now"



Run


Re: netcdf for nim

2019-10-19 Thread Vindaar
Until now I didn't even know of grib2 files. I need to read up on what kind of 
file format that is first. I suppose it's based on HDF5 files, too? If it is, 
I should think we should be able to make it work.

Regarding other nimhdf5 users: I'd love to be corrected, but as far as I'm 
aware I'm the only actual user of HDF5 files in Nim land so far. Or the other 
people using nimhdf5 are so happy with it that they don't raise any issues. 
Given the state of basically non-existent documentation and extremely sparse 
examples (sorry about that :/), that'd be a surprise.


Re: Error: expression has no type (or is ambiguous)

2019-10-18 Thread Vindaar
The compiler complains because you assign the result of the cleanXmi call to 
newNode.

Thus cleanXmi has to have a return type.


Re: Nim beginners tutorial

2019-10-14 Thread Vindaar
I'll check it on my kindle tonight.

Ping me on gitter if you don't hear from me until tomorrow.


Re: How to use file system watcher (fsmonitor) in Nim?

2019-09-11 Thread Vindaar
Apparently there's now a wrapper for libfswatch:

[https://github.com/FedericoCeratto/nim-fswatch](https://github.com/FedericoCeratto/nim-fswatch)

While no help to you, maybe it's interesting for someone else reading this. I 
ported over fsmonitor to modern asyncdispatch in February, since I quickly had 
to hack together an online event display. But it only supports linux, same as 
the old fsmonitor.

[https://github.com/Vindaar/fsmonitor2](https://github.com/Vindaar/fsmonitor2)

However, looking at the File System Event API for OSX, adding support doesn't 
seem all that complicated:

[https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/FSEvents_ProgGuide/UsingtheFSEventsFramework/UsingtheFSEventsFramework.html](https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/FSEvents_ProgGuide/UsingtheFSEventsFramework/UsingtheFSEventsFramework.html)

I don't have a Mac, so attempting that would be a pain. Sounds like a fun 
weekend project for someone to attempt though. :)


Re: Need debugging help

2019-08-26 Thread Vindaar
I wasn't sure whether you actually fixed your code with that post now, but I 
was already looking at your code when I saw the post, so I continued.

I fixed the code differently, by just removing the c types you used. I trust 
that the test case works, because I'm not sure if I broke something. :)

[https://github.com/pb-cdunn/nim-help/pull/2](https://github.com/pb-cdunn/nim-help/pull/2)


Re: netcdf for nim

2019-08-26 Thread Vindaar
Hey!

As far as I'm aware there are no bindings to the NetCDF library so far. I 
personally don't have any experience working with NetCDF. However, I'm aware 
that since NetCDF4, it's actually just based on HDF5. So depending on your use 
cases it _might_ be possible to use 
[nimhdf5](https://github.com/vindaar/nimhdf5).

At least HDFView can easily read NetCDF4 files. So it should be no problem to 
read data from a .nc file using nimhdf5. Writing them and staying compatible 
with NetCDF might be more of a problem however. I don't know what the standard 
contains and if it's easy to manually follow it.

I'd be willing to give you some help if you want to attempt that.