Re: OT: why do people use python when it is slow?
On Thursday, 15 October 2015 at 02:20:42 UTC, jmh530 wrote:
> On Wednesday, 14 October 2015 at 22:11:56 UTC, data pulverizer wrote:
>> On Tuesday, 13 October 2015 at 23:26:14 UTC, Laeeth Isharc wrote:
>>> https://www.quora.com/Why-is-Python-so-popular-despite-being-so-slow
>>> Andrei suggested posting more widely.
>> I believe it is easier and more effective to start on the research side. D will need: [snip]
> Great list, but tons of work!

A journey of a thousand miles ...

I tried to start creating a data table type object by investigating variantArray: http://forum.dlang.org/thread/hhzavwrkbrkjzfohc...@forum.dlang.org but hit the snag that D is a static programming language, which may not allow the kind of dynamic behaviour you need for data-table-like objects.

I envisage such an object as being composed of arrays of vectors, where each vector represents a column in a table, as in R - easier for model matrix creation. Some people believe that you should work with arrays of tuple rows instead, which may be more big-data friendly. I am not overly wedded to either approach.

Anyway, it seems I have hit an inherent limitation in the language - correct me if I am wrong. The data frame needs dynamic behaviour: bind rows and columns, return parts of itself as a data table, etc., and since D is a static language we cannot do this.
Re: OT: why do people use python when it is slow?
On Tuesday, 13 October 2015 at 23:26:14 UTC, Laeeth Isharc wrote:
> https://www.quora.com/Why-is-Python-so-popular-despite-being-so-slow
> Andrei suggested posting more widely.

I am coming at D by way of R, C++, Python etc., so I speak as a statistician who is interested in data science applications.

It's about programmer time. You have to weigh the time it takes you to do the task in each programming language, and if you are doing statistical analysis now, R and Python come out streaks ahead.

The scope, roughly speaking, is Research -> Deployment. R and Python sit on the research side, and Python/JVM technologies sit on the deployment side (broadly speaking). The question is: where does D sit? What should D's data science strategy be?

To sit on the deployment side, D needs to grow its big data/noSQL infrastructure for a start, then hook into a whole ecosystem of analytic tools in an easy and straightforward manner. This will take a lot of work!

I believe it is easier and more effective to start on the research side. D will need:

1. A data table structure like R's data.frame or data.table. This is a dynamic data structure that represents a table and that can have lots of operations applied to it. It is the data structure that separates R from most programming languages, and it is what pandas tries to emulate. This includes text file and database i/o, from MySQL and ODBC for a start.

2. Formula class: the ability to talk about statistical models using formulas, e.g. y ~ x1 + x2 + x3 etc., and then use these formulas to generate model matrices for input into statistical algorithms.

3. Solid interface to a big data database, that allows a D data table <-> database easily.

4. Functional programming: especially around data table and array structures. R's apply(), lapply(), tapply(), plyr and now data.table(,, by = list()) provide powerful tools for data manipulation.

5. A factor data type for categorical variables. This is easy to implement! It ties into the creation of model matrices.

6. Nullable types make talking about missing data more straightforward and give you the opportunity to code them into a set value in your analysis. D is streaks ahead of Python here, but this is built into R at a basic level.

If D can get points 1, 2, 3 many people would be all over D, because it is a fantastic programming language and is wicked fast.
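As a sketch of what point 6 looks like in practice, D's std.typecons.Nullable can carry missing observations explicitly (the column data below is made up for illustration):

```d
import std.stdio : writeln;
import std.typecons : Nullable;
import std.algorithm : filter, map, sum;
import std.array : array;

void main()
{
    // A hypothetical column of observations; the middle value is missing.
    Nullable!double[] income =
        [Nullable!double(52_000.0), Nullable!double(), Nullable!double(61_500.0)];

    // Drop the missing values before summarising.
    auto observed = income.filter!(x => !x.isNull).map!(x => x.get).array;
    writeln("observed: ", observed);
    writeln("mean: ", observed.sum / observed.length);
}
```

The point being that missingness is carried in the type rather than by a sentinel value, so unhandled missing data fails loudly at the `.get` rather than silently propagating like nan.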
Re: OT: why do people use python when it is slow?
On Thursday, 15 October 2015 at 07:57:51 UTC, Russel Winder wrote:
> On Thu, 2015-10-15 at 06:48 +0000, data pulverizer via Digitalmars-d-learn wrote:
>> Just because D doesn't have this now doesn't mean it cannot. C doesn't have such capability, but R and Python do, even though R and CPython are just C code.

I think the way R does this is that its dynamic runtime environment is used to bind together native C basic-type arrays. I wonder if we could simulate dynamic behaviour by leveraging D's short compilation time to dynamically write/update data table source file(s) containing the structure of new/modified data tables?

> Pandas data structures rely on the NumPy n-dimensional array implementation; it is not beyond the bounds of possibility that that data structure could be realized as a D module.

Julia's DArray object is an interesting take on this: https://github.com/JuliaParallel/DistributedArrays.jl

I believe that parallelism on arrays and on data tables are different challenges. Data tables are easier since we can parallelise by row, thus the preference for row-based tuples.

> The core issue is to have a seriously efficient n-dimensional array that is amenable to data parallelism and is extensible. As far as I am aware currently (I will investigate more), the NumPy array is a good native code array, but has some issues with data parallelism, and Pandas has to do quite a lot of work to get the extensibility. I wonder how the R data.table works.

R's data.table is not currently parallelised.

> I have this nagging feeling that, like NumPy, data.table seems a lot better than it is. From small experiments D is (and Chapel is even more) hugely faster than Python/NumPy at things Python people think NumPy is brilliant for. Expectations of Python programmers are set by the scale of Python performance, so NumPy seems brilliant. Compared to the scale set by D and Chapel, NumPy is very disappointing. I bet the same is true of R (I have never really used R).
Thanks for notifying me about Chapel - something else interesting to investigate. When it comes to speed, R is very strange. Basic math operations (e.g. *, +, /) on an R array can be fast, but for-looping will kill speed by hundreds of times - most things are slow in R unless they are directly baked into its base operations. You can, however, write code in C or C++ and call it very easily from R using its Rcpp interface.

This is therefore an opportunity for D to step in. However, it is a journey of a thousand miles to get something production worthy. Python/NumPy/Pandas have had a very large number of programmer hours expended on them, and doing this poorly as D modules is likely worse than not doing it at all. I think D has a lot to offer the world of data science.
Re: OT: why do people use python when it is slow?
On Thursday, 15 October 2015 at 21:16:18 UTC, Laeeth Isharc wrote:
> Welcome... Looks like we have similar interests.

That's good to know.

>> To sit on the deployment side, D needs to grow its big data/noSQL infrastructure for a start, then hook into a whole ecosystem of analytic tools in an easy and straightforward manner. This will take a lot of work!
>
> Indeed. The dlangscience project managed by John Colvin is very interesting. It is not a pure stats project, but there will be many shared areas of need. He has some very interesting ideas, and being able to mix Python and D in a Jupyter notebook is rather nice (you can do this already).

Thanks for bringing my attention to this; it looks interesting.

> Take a look at Colvin's dlangscience draft white paper, and see what you would add. It's a chance to shape things whilst they are still fluid.

Good suggestion.

>> 3. Solid interface to a big data database, that allows a D data table <-> database easily
>
> Which ones do you have in mind for stats? The different choices seem to serve quite different needs. And when you say big data, how big do you typically mean?

What I mean is to start by tapping into current big data technologies. HDFS and Cassandra have C APIs which we can wrap for D.

>> 4. Functional programming: especially around data table and array structures. R's apply(), lapply(), tapply(), plyr and now data.table(,, by = list()) provide powerful tools for data manipulation.
>
> Any thoughts on what the design should look like?

Yes, I think this is easy to implement but still important. The real devil is my point #1, the dynamic data table object.
> To an extent there is a balance between wanting to explore data iteratively (when you don't know where you will end up), and wanting to build a robust process for production. I have been wondering myself about using LuaJIT to strap together D building blocks for the exploration (and calling it based on a custom console built around Adam Ruppe's terminal).

Sounds interesting.

>> 6. Nullable types make talking about missing data more straightforward and give you the opportunity to code them into a set value in your analysis. D is streaks ahead of Python here, but this is built into R at a basic level.
>
> So matrices with nullable types within? Is nan enough for you? If not, then it could be quite expensive if the back end is C.

I am not suggesting that we pass nullable matrices to C algorithms. Yes, nan is how this is done in practice, but you wouldn't have nans in your matrix at the point of modeling - they'll just propagate and trash your answer. Nullable types are useful in data acquisition and exploration - the more practical side of data handling. I was quite shocked to see them in D, when they are essentially absent from "high level" programming languages like Python. Real data is messy, and having nullable types is useful in processing, storing and summarizing raw data. I put this in as #6 because I think it is possible to do practical statistics by working around them with notional hacks. Nullables are something that C# and R have, and that Python's pandas has struggled with. The great news is that they are available in D, so we can use them.

>> If D can get points 1, 2, 3 many people would be all over D because it is a fantastic programming language and is wicked fast.
>
> What do you like best about it? And in your own domain, what have the biggest payoffs been in practice?

I am playing with D at the moment. To become useful to me the data table structure is a must. I previously said points 1, 2, and 3 would get data scientists sucked into D, but the data table structure is the seed.
A dynamic structure like that in D would catalyze the rest. Everything else is either wrappers or routine work - maybe a lot of it, but straightforward to implement. The data table structure is, for me, the real enigma. The way that R's data types are structured around SEXPs is the key to all of this; I am currently reading through R's internals documentation to get my head around it. https://cran.r-project.org/doc/manuals/r-release/R-ints.html
dynamic get from variantArray() data table
Hi, I am trying to use variantArray() as a data table object to hold columns, each of which is an array of a specific type. I need to be able to get values from the data table but I am having problems ...

```
import std.stdio;   // i/o
import std.variant; // type variations

void main(){
    // Columns of the table
    string[] names = ["walter", "paul", "jeff", "andrie"];
    int[] age = [55, 62, 27, 52];
    string[] language = ["D", "Haskell", "Julia", "D"];
    Variant[] dt = variantArray(names, age, language);
    foreach(col; dt){
        foreach(el; col){
            // here I try a kind of dynamic cast operator
            auto x = el.get!(type(el)); // gives error
            write(x);
        }
        write("\n");
    }
}
```

```
data_table.d(37): Error: cannot infer type for el
data_table.d(38): Error: undefined identifier 'el'
```

Help
DP
Re: dynamic get from variantArray() data table
Thanks for the suggestion Alex, however I need the dynamic behaviour properties of variantArray(); writing a struct each time would be undesirable. Perhaps I could boil the question down to something like:

```
auto x = dt[0][0];
auto y = x.get!(x.type - or whatever); // to get the actual value of x rather than a VariantN!... type
```

i.e. some kind of automatic cast back to the basic type.

On Tuesday, 13 October 2015 at 15:51:40 UTC, Alex Parrill wrote:
> On Tuesday, 13 October 2015 at 15:17:15 UTC, data pulverizer wrote:
>> [original question and code snipped]
>
> You're trying to iterate over a `Variant`, which isn't implemented. You don't want to use a variant here anyway; you should use a struct or tuple for each entry in the table.
>
> ```
> import std.typecons;
>
> alias Entry = Tuple!(string, int, string);
>
> void main() {
>     auto table = [
>         Entry("walter", 55, "D"),
>         Entry("paul", 62, "Haskell"),
>         ... // complete array
>     ];
>
>     foreach(entry; table) {
>         writeln(entry.expand);
>     }
> }
> ```
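A middle ground between full Variant dynamism and a fixed struct, as a sketch (not the only way): branch on the Variant's runtime .type before extracting with .get!T. The column data below mirrors the thread's example:

```d
import std.stdio : writeln;
import std.variant : Variant, variantArray;

void main()
{
    // Two "columns" of different element types, as in the original post.
    Variant[] dt = variantArray(["walter", "paul"], [55, 62]);

    // A Variant's runtime type is inspected with .type, but the static
    // type for .get!T must still be spelled out per case - there is no
    // fully dynamic get in a static language.
    foreach (col; dt)
    {
        if (col.type == typeid(string[]))
            writeln(col.get!(string[]));
        else if (col.type == typeid(int[]))
            writeln(col.get!(int[]));
    }
}
```

This keeps the column container heterogeneous while limiting the per-type branching to one place; a real data table would hide this dispatch behind its API.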
Re: Repeated struct definitions for graph data structures and in/out naming conflict in C library
Thanks, the library now compiles.

On Sunday, 3 January 2016 at 13:45:13 UTC, anonymous wrote:
> On 03.01.2016 14:30, data pulverizer wrote:
>> I am trying to access functionality in the glpk C library using extern(C). It has graph structs in its header file that are specified in an odd recurring manner that I cannot reproduce in D:
>
> I don't see what's odd about this. What exactly are you struggling with?
>
>> typedef struct glp_graph glp_graph;
>> typedef struct glp_vertex glp_vertex;
>> typedef struct glp_arc glp_arc;
>
> You can just drop these. http://dlang.org/ctod.html#tagspace
>
>> struct glp_graph { ... glp_vertex **v; /* glp_vertex *v[1+nv_max]; */ };
>
> Drop the semicolon after the struct declaration, and move the asterisks one place to the left (purely style):
>
> ```
> struct glp_graph { glp_vertex** v; }
> ```
>
>> struct glp_vertex { ... glp_arc *in; glp_arc *out; };
>
> As above, and rename in/out to something else, e.g. in_ and out_:
>
> ```
> struct glp_vertex { glp_arc* in_; glp_arc* out_; }
> ```
>
>> struct glp_arc { glp_vertex *tail; glp_vertex *head; glp_arc *t_prev; glp_arc *t_next; glp_arc *h_prev; glp_arc *h_next; };
>
> Nothing new here.
>
>> you may also spot that the in and out keywords are used as members in the struct, which gives an error in D. These structs are required for functions in the library so need to be included in the D interface file.
>
> Just rename them to something else. In D code that uses the struct, you use the new names. C code doesn't need to be changed, as the name doesn't matter once compiled.
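Putting anonymous's fragments together, the whole binding might look like the sketch below. The `...` fields from the C header are elided here, but in a real binding they must be kept, in their original order, so that the D struct layout matches the C one:

```d
// Sketch of a D interface to the glpk graph structs.
extern (C):

struct glp_graph
{
    // ... preceding fields from the C header go here, in order ...
    glp_vertex** v;   // glp_vertex* v[1+nv_max]
}

struct glp_vertex
{
    // ... preceding fields from the C header go here, in order ...
    glp_arc* in_;     // renamed: `in` is a D keyword
    glp_arc* out_;    // renamed: `out` is a D keyword
}

struct glp_arc
{
    glp_vertex* tail;
    glp_vertex* head;
    glp_arc* t_prev;
    glp_arc* t_next;
    glp_arc* h_prev;
    glp_arc* h_next;
}
```

The renames are safe because C struct member names are not part of the ABI; only the field offsets matter, which is why the elided fields cannot simply be dropped.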
Repeated struct definitions for graph data structures and in/out naming conflict in C library
Dear D Gurus,

I am trying to access functionality in the glpk C library using extern(C). It has graph structs in its header file that are specified in an odd recurring manner that I cannot reproduce in D:

```
typedef struct glp_graph glp_graph;
typedef struct glp_vertex glp_vertex;
typedef struct glp_arc glp_arc;

struct glp_graph
{   ...
    glp_vertex **v; /* glp_vertex *v[1+nv_max]; */
};

struct glp_vertex
{   ...
    glp_arc *in;
    glp_arc *out;
};

struct glp_arc
{   glp_vertex *tail;
    glp_vertex *head;
    glp_arc *t_prev;
    glp_arc *t_next;
    glp_arc *h_prev;
    glp_arc *h_next;
};
```

You may also spot that the in and out keywords are used as members in the structs, which gives an error in D. These structs are required for functions in the library so need to be included in the D interface file. How do I reproduce these structs in D?

Thanks
noob in c macro preprocessor hell converting gsl library header files
I have been converting C numeric libraries and depositing them here: https://github.com/dataPulverizer. So far I have glpk and nlopt converted on a like-for-like C function basis. I am now stuck on the gsl library, primarily because of the C preprocessor code, which I am very new to. The following few are particularly baffling to me:

```
#define INLINE_FUN extern inline
// used in gsl_pow_int.h:
INLINE_FUN double gsl_pow_2(const double x) { return x*x; }
```

Could I just ignore the INLINE_FUN and use an alias for a function pointer declaration? For example ...

```
alias gsl_pow_2 = double gsl_pow_2(const(double) x);
```

```
#define INLINE_DECL
// used in interpolation/gsl_interp.h:
INLINE_DECL size_t gsl_interp_bsearch(const double x_array[], double x, size_t index_lo, size_t index_hi);
```

I would guess the same as for INLINE_FUN?

```
#define GSL_VAR extern
// used in rng/gsl_rng.h:
GSL_VAR const gsl_rng_type *gsl_rng_borosh13;
```

Perhaps GSL_VAR can be ignored and I could use:

```
gsl_rng_borosh13 const(gsl_rng_type)*;
```

I have been using these kinds of fixes and have not been able to get the rng module to recognise the ported functions, meaning that something has been lost in translation. I am currently getting the following error:

```
rng_example.o: In function `_Dmain':
rng_example.d:(.text._Dmain+0x13): undefined reference to `gsl_rng_print_state'
collect2: error: ld returned 1 exit status
```

I can't seem to call any of the functions, but the types are recognized.

Thanks in advance
Re: noob in c macro preprocessor hell converting gsl library header files
On Wednesday, 6 January 2016 at 13:59:44 UTC, John Colvin wrote:
>> Could I just ignore the INLINE_FUN and use an alias for a function pointer declaration? For example ...
>> alias gsl_pow_2 = double gsl_pow_2(const(double) x);
>
> Yes, you should be able to ignore INLINE_FUN.
>
> ```
> double gsl_pow_2(const double x);
> ```
>
> is the correct declaration.
>
>> perhaps GSL_VAR can be ignored and I could use: gsl_rng_borosh13 const(gsl_rng_type)*;
>
> It should be
>
> ```
> extern gsl_rng_type* gsl_rng_borosh13;
> ```

I see. Thanks.

> I think you might have some confusion between function declarations:
>
> ```
> T myFunction(Q myArg);
> ```
>
> function pointer type declarations:
>
> ```
> alias MyFunctionPointerType = T function(Q myArg);
> ```
>
> and function pointer declarations:
>
> ```
> MyFunctionPointerType myFunctionPointer;
> ```

Sorry, in my haste to describe what I was doing I wrote down a function pointer instead of a function - my original code was a function. Your suggestion of looking at the https://github.com/abrown25/gsld library is a good call. I'll probably end up sending a pull request to that library after using it as a basic outline of how to deal with these preprocessor macros.
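Collecting John's corrections, a sketch of the D-side declarations might look like this (gsl_rng_type is treated as an opaque struct here purely for brevity; the real binding would spell out its fields or forward to a full declaration):

```d
// Sketch, assuming the usual meanings of the gsl macros.
extern (C)
{
    // INLINE_FUN double gsl_pow_2(...) -> a plain declaration; the
    // symbol is still exported by the compiled library.
    double gsl_pow_2(const double x);

    // INLINE_DECL gets the same treatment; a C `const double x_array[]`
    // parameter is a pointer in D.
    size_t gsl_interp_bsearch(const(double)* x_array, double x,
                              size_t index_lo, size_t index_hi);
}

struct gsl_rng_type;  // opaque for this sketch

// GSL_VAR const gsl_rng_type *gsl_rng_borosh13 is a global that the
// library defines; on the D side that is an extern __gshared variable.
extern (C) extern __gshared const(gsl_rng_type)* gsl_rng_borosh13;
```

On the `undefined reference` error: that is a link-time failure rather than a translation problem, and usually means the library wasn't handed to the linker, e.g. `dmd rng_example.d -L-lgsl -L-lgslcblas` (the exact library names are an assumption here).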
Re: ndslice: convert a sliced object to T[]
On Wednesday, 15 June 2016 at 14:14:23 UTC, Seb wrote:
> On Wednesday, 15 June 2016 at 13:13:05 UTC, data pulverizer wrote:
>> And where can I find more cool tricks like that?
>
> Browse the source code and the unittests. Phobos is an amazing resource :)

Very true! That's great, many thanks!
Re: ndslice: convert a sliced object to T[]
On Wednesday, 15 June 2016 at 02:50:30 UTC, Seb wrote:
> On Wednesday, 15 June 2016 at 02:43:37 UTC, data pulverizer wrote:
>> How do I unravel a sliced item T[].sliced(...) to an array T[]? For instance:
>> import std.experimental.ndslice;
>> auto slice = new int[12].sliced(3, 4);
>> int[] x = ??;
>
> A slice is just a _view_ on your memory; the easiest way is to save a reference to your array like this:
>
> ```
> int[] arr = new int[12];
> auto slice = arr.sliced(3, 4);
> slice[1, 1] = 42;
> arr // [0, 0, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0]
> ```
>
> For the general case, you should give `byElement` a try: https://dlang.org/phobos/std_experimental_ndslice_selection.html#byElement

In that case:

```
import std.array : array;
int[] x = slice.byElement.array;
```

Thanks, now I can go to bed!
ndslice: convert a sliced object to T[]
How do I unravel a sliced item T[].sliced(...) to an array T[]? For instance:

```
import std.experimental.ndslice;
auto slice = new int[12].sliced(3, 4);
int[] x = ??;
```

Thanks
Re: ndslice: convert a sliced object to T[]
On Wednesday, 15 June 2016 at 08:53:22 UTC, Andrea Fontana wrote:
>> I guess foreach would not copy the elements? For example: foreach(el; slice.byElement) x ~= el; But it feels wrong to be doing work pulling elements that already exist by using foreach. I feel as if I am missing something obvious but can't get it.
>
> The question is: why do you need to put them inside an array? If you can, leave them in the lazy range and work on it.

I need this to work with external libraries that only deal with one-dimensional arrays.
Re: ndslice: convert a sliced object to T[]
On Wednesday, 15 June 2016 at 07:45:12 UTC, Andrea Fontana wrote:
>> I definitely don't want to create a copy! I thought .byElement would provide a range, which I assume is a reference - am I forcing it to copy by using .array?
>
> Yes. You're forcing it to read all elements and copy them into a new array.

I guess foreach would not copy the elements? For example:

```
foreach(el; slice.byElement) x ~= el;
```

But it feels wrong to be doing work pulling elements that already exist by using foreach. I feel as if I am missing something obvious but can't get it.
Re: ndslice: convert a sliced object to T[]
On Wednesday, 15 June 2016 at 03:17:39 UTC, Seb wrote:
>> in that case: import std.array : array; int[] x = slice.byElement.array;
>
> Are you sure you want to create a _copy_ of your data? In most cases you don't need that ;-)
>
>> thanks, now I can go to bed!
>
> You are welcome. Sleep tight!

Thanks, I did. I definitely don't want to create a copy! I thought .byElement would provide a range, which I assume is a reference - am I forcing it to copy by using .array?
Re: ndslice: convert a sliced object to T[]
On Wednesday, 15 June 2016 at 09:32:21 UTC, Andrea Fontana wrote:
> Then I think the slice.byElement.array is the right solution.

The problem with that is that it slows down the code. I compared matrix multiplication between R and D (using the cblas adaptor and ndslice). Matrices A and B are both n by n with n = 4000, and both implementations call openblas.

R elapsed time: 2.709 s
D (cblas and ndslice): 3.593 s

The R code:

```
n = 4000; A = matrix(runif(n*n), nr = n); B = matrix(runif(n*n), nr = n)
system.time(C <- A%*%B)
```

The D code:

```
import std.stdio : writeln;
import std.experimental.ndslice;
import std.random : Random, uniform;
import std.conv : to;
import std.array : array;
import cblas;
import std.datetime : StopWatch;

T[] runif(T)(ulong len, T min, T max){
    T[] arr = new T[len];
    Random gen;
    for(ulong i = 0; i < len; ++i)
        arr[i] = uniform(min, max, gen);
    return arr;
}

// Random matrix
auto rmat(T)(ulong nrow, ulong ncol, T min, T max){
    return runif(nrow*ncol, min, max).sliced(nrow, ncol);
}

auto matrix_mult(T)(Slice!(2, T*) a, Slice!(2, T*) b){
    int M = to!int(a.shape[0]);
    int K = to!int(a.shape[1]);
    int N = to!int(b.shape[1]);
    T[] A = a.byElement.array;
    T[] B = b.byElement.array;
    T[] C = new T[M*N];
    gemm(Order.ColMajor, Transpose.NoTrans, Transpose.NoTrans,
         M, N, K, 1., A.ptr, K, B.ptr, N, 0, C.ptr, N);
    return C.sliced(M, N);
}

void main()
{
    int n = 4000;
    auto A = rmat(n, n, 0., 1.);
    auto B = rmat(n, n, 0., 1.);
    StopWatch sw;
    sw.start();
    auto C = matrix_mult(A, B);
    sw.stop();
    writeln("Time taken: \n\t", sw.peek().msecs, " [ms]");
}
```

In my system monitor I can see the copy phase in the D process running on a single core. There should be a way to go from ndslice to T[] without copying. Using a foreach loop is even slower.
Re: ndslice: convert a sliced object to T[]
Oh, I didn't see that runif now returns a tuple.
Re: ndslice: convert a sliced object to T[]
On Wednesday, 15 June 2016 at 12:10:32 UTC, Seb wrote:
> As said, you can avoid the copy (see below). I also profiled it a bit and it was interesting to see that 50% of the runtime is spent on generating the random matrix. On my machine now both scripts take 1.5s when compiled with

I didn't benchmark the RNG, but I did notice it took a lot of time to generate the matrix; for now I am focused on the BLAS side of things.

I am puzzled about how your code works. Firstly, I didn't know that you could substitute an array for its first element in D, though I am aware that a pointer to an array's first element is equivalent to passing the array in C:

```
auto matrix_mult(T)(T[] A, T[] B, Slice!(2, T*) a, Slice!(2, T*) b){
    ...
    gemm(Order.ColMajor, Transpose.NoTrans, Transpose.NoTrans,
         M, N, K, 1., A.ptr, K, B.ptr, N, 0, C.ptr, N);
    return C.sliced(M, N);
}
```

Secondly, I am especially puzzled about using the second element to stand in for the slice itself. How does that work? And where can I find more cool tricks like that?

```
void main()
{
    ...
    auto C = matrix_mult(ta[0], tb[0], ta[1], tb[1]);
    sw.stop();
    writeln("Time taken: \n\t", sw.peek().msecs, " [ms]");
}
```

Many thanks!
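For reference, the copy-avoiding trick under discussion can be sketched like this (written against the old std.experimental.ndslice API used in this thread): rmat returns a tuple of the flat backing array and the slice view over the same memory, so t[0] is the raw T[] whose .ptr goes to gemm, while t[1] keeps the 2-D shape information:

```d
import std.experimental.ndslice : sliced;
import std.random : Random, uniform;
import std.typecons : tuple;

// Random matrix: return both the flat backing array and the 2-D slice
// view over it. No element is copied - both tuple members reference
// the same memory.
auto rmat(T)(size_t nrow, size_t ncol, T min, T max)
{
    auto arr = new T[nrow * ncol];
    Random gen;
    foreach (ref x; arr)
        x = uniform(min, max, gen);
    return tuple(arr, arr.sliced(nrow, ncol));
}
```

A std.typecons.Tuple is indexed at compile time, which is why `ta[0]` (the array) and `ta[1]` (the slice) can be passed as separate arguments to matrix_mult - that is the "second element standing in for the slice" in Seb's code.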
Re: Functions that return type
On Sunday, 17 January 2016 at 02:08:06 UTC, Timon Gehr wrote:
> On 01/16/2016 11:50 PM, data pulverizer wrote:
>> I guess the constraints are that of a static language.
>
> (This is not true.)

Could you please explain?
Re: Functions that return type
On Saturday, 16 January 2016 at 21:22:15 UTC, data pulverizer wrote:
> Is it possible to create a function that returns a Type, like typeof() does? Something such as: Type returnInt(){ return int; } More to the point, what is the Type of a type such as int?

p.s. I am aware I could do typeof(1) to return int, but I am looking for something more elegant and some understanding.
Functions that return type
Is it possible to create a function that returns a Type, like typeof() does? Something such as:

```
Type returnInt(){ return int; }
```

More to the point, what is the Type of a type such as int?

Thanks
Re: Functions that return type
On Saturday, 16 January 2016 at 21:59:22 UTC, data pulverizer wrote:
>> p.s. I am aware I could do typeof(1) to return int, but I am looking for something more elegant and some understanding.

Thanks for all the answers. I guess I have been writing a lot of Julia, where I take creating arrays and tuples of types for granted; there, types are themselves of type DataType. I am aware that you can create tuples of types in D, but then they cannot be easily manipulated, e.g. (int, float)[0] = string or similar. You have to immediately alias the tuple, and there are a limited number of operations you can do with the resulting type. I guess the constraints are those of a static language.
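For what it's worth, D's compile-time machinery covers a fair amount of this: type lists can be manipulated with std.meta, and a template can play the role of a "function that returns a type". A small sketch:

```d
import std.meta : AliasSeq, Replace;

// A type tuple, and a compile-time "edit" of it: Replace swaps the
// first occurrence of float for string, roughly the (int, float)[0] =
// string operation asked about, done functionally.
alias Types = AliasSeq!(int, float);
alias Swapped = Replace!(float, string, Types); // AliasSeq!(int, string)
static assert(is(Swapped[1] == string));

// And a template that "returns" int, the compile-time analogue of the
// Type returnInt(){ return int; } in the question.
template returnInt() { alias returnInt = int; }
static assert(is(returnInt!() == int));
```

The difference from Julia is that all of this happens at compile time only; types never exist as runtime values, which is presumably what Timon means by the constraints not being those of a static language per se.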
Re: Speed of csvReader
On Thursday, 21 January 2016 at 10:40:39 UTC, data pulverizer wrote:
> On Thursday, 21 January 2016 at 10:20:12 UTC, Rikki Cattermole wrote:
>> Okay, without registering I'm not gonna get that data. So, usual things to think about: did you turn on release mode? What about inlining? Lastly, how about disabling the GC?
>> import core.memory : GC;
>> GC.disable();
>> dmd -release -inline code.d
>
> That helped a lot. I disabled the GC and inlined as you suggested and the time is now:
>
> Time (s): 8.754
>
> However, R's data.table package gives us:
>
> system.time(x <- fread("Acquisition_2009Q2.txt", sep = "|", colClasses = rep("character", 22)))
>    user  system elapsed
>   0.852   0.021   0.872
>
> I should probably have begun with this timing. It's not my intention to turn this into a speed-only competition; however, the ingest of files and speed of calculation is very important to me.

I should probably add compiler version info:

```
~$ dmd --version
DMD64 D Compiler v2.069.2
Copyright (c) 1999-2015 by Digital Mars written by Walter Bright
```

Running Ubuntu 14.04 LTS.
Re: Speed of csvReader
On Thursday, 21 January 2016 at 11:08:18 UTC, Ali Çehreli wrote:
> On 01/21/2016 02:40 AM, data pulverizer wrote:
>> dmd -release -inline code.d
>
> These two as well please: -O -boundscheck=off
>
>> the ingest of files and speed of calculation is very important to me.
>
> We should understand why D is slow in this case. :)
> Ali

Thank you, adding those two flags brings down the time a little more ...

Time (s): 6.832
Speed of csvReader
I have been reading large text files with D's csv file reader and have found it slow compared to R's read.table function, which is not known to be particularly fast. Here I am reading Fannie Mae mortgage acquisition data, which can be found here after registering: http://www.fanniemae.com/portal/funding-the-market/data/loan-performance-data.html

D code:

```
import std.algorithm;
import std.array;
import std.file;
import std.csv;
import std.stdio;
import std.typecons;
import std.datetime;

alias row_type = Tuple!(string, string, string, string, string, string, string, string,
                        string, string, string, string, string, string, string, string,
                        string, string, string, string, string, string);

void main(){
    StopWatch sw;
    sw.start();
    auto buffer = std.file.readText("Acquisition_2009Q2.txt");
    auto records = csvReader!row_type(buffer, '|').array;
    sw.stop();
    double time = sw.peek().msecs;
    writeln("Time (s): ", time/1000);
}
```

Time (s): 13.478

R code:

```
system.time(x <- read.table("Acquisition_2009Q2.txt", sep = "|", colClasses = rep("character", 22)))
   user  system elapsed
  7.810   0.067   7.874
```

R takes about half as long to read the file, and both read the data in the "equivalent" type format. Am I doing something incorrect here?
Re: Speed of csvReader
On Thursday, 21 January 2016 at 10:20:12 UTC, Rikki Cattermole wrote:
> Okay, without registering I'm not gonna get that data. So, usual things to think about: did you turn on release mode? What about inlining? Lastly, how about disabling the GC?
>
> ```
> import core.memory : GC;
> GC.disable();
> ```
>
> dmd -release -inline code.d

That helped a lot. I disabled the GC and inlined as you suggested and the time is now:

Time (s): 8.754

However, R's data.table package gives us:

```
system.time(x <- fread("Acquisition_2009Q2.txt", sep = "|", colClasses = rep("character", 22)))
   user  system elapsed
  0.852   0.021   0.872
```

I should probably have begun with this timing. It's not my intention to turn this into a speed-only competition; however, the ingest of files and speed of calculation is very important to me.
Re: Speed of csvReader
On Thursday, 21 January 2016 at 14:56:13 UTC, Saurabh Das wrote:
> On Thursday, 21 January 2016 at 14:32:52 UTC, Saurabh Das wrote:
>> On Thursday, 21 January 2016 at 13:42:11 UTC, Edwin van Leeuwen wrote:
>>> Is it csvReader or readText that is slow? i.e. could you move sw.start() one line down (after the readText command) and see how long just the csvReader part takes?
>>
>> Please try this:
>>
>> ```
>> auto records = File("Acquisition_2009Q2.txt").byLine.joiner("\n").csvReader!row_type('|').array;
>> ```
>>
>> Can you put up some sample data and share the number of records in the file as well?
>
> Actually, since you're aiming for speed, this might be better:
>
> ```
> sw.start();
> auto records = File("Acquisition_2009Q2.txt").byChunk(1024*1024).joiner.map!(a => cast(dchar)a).csvReader!row_type('|').array;
> sw.stop();
> ```
>
> Please do verify that the end result is the same - I'm not 100% confident of the cast.
> Thanks, Saurabh

@Saurabh I have tried your latest suggestion and the time reduces fractionally to:

Time (s): 6.345

The previous suggestion actually increased the time.

@Edwin van Leeuwen The csvReader is what takes the most time; the readText takes 0.229 s.
Re: Speed of csvReader
On Thursday, 21 January 2016 at 15:17:08 UTC, data pulverizer wrote: On Thursday, 21 January 2016 at 14:56:13 UTC, Saurabh Das wrote: On Thursday, 21 January 2016 at 14:32:52 UTC, Saurabh Das Actually since you're aiming for speed, this might be better: sw.start(); auto records = File("Acquisition_2009Q2.txt").byChunk(1024*1024).joiner.map!(a => cast(dchar)a).csvReader!row_type('|').array sw.stop(); Please do verify that the end result is the same - I'm not 100% confident of the cast. Thanks, Saurabh @Saurabh I have tried your latest suggestion and the time reduces fractionally to: Time (s): 6.345 the previous suggestion actually increased the time @Edwin van Leeuwen The csvReader is what takes the most time, the readText takes 0.229 s p.s. @Saurabh the result looks fine from the cast. Thanks
Re: Speed of csvReader
On Thursday, 21 January 2016 at 16:25:55 UTC, bachmeier wrote: On Thursday, 21 January 2016 at 10:48:15 UTC, data pulverizer wrote: Running Ubuntu 14.04 LTS In that case, have you looked at http://lancebachmeier.com/rdlang/ If this is a serious bottleneck you can solve it with two lines evalRQ(`x <- fread("Acquisition_2009Q2.txt", sep = "|", colClasses = rep("character", 22))`); auto x = RMatrix(evalR("x")); and then you've got access to the data in D. Thanks. That's certainly something to try.
Re: Speed of csvReader
On Thursday, 21 January 2016 at 16:01:33 UTC, wobbles wrote: Interesting that reading a file is so slow. Your timings from R, is that including reading the file also? Yes, it's just insane, isn't it?
Re: Speed of csvReader
On Thursday, 21 January 2016 at 17:17:52 UTC, Saurabh Das wrote: On Thursday, 21 January 2016 at 17:10:39 UTC, data pulverizer wrote: On Thursday, 21 January 2016 at 16:01:33 UTC, wobbles wrote: Interesting that reading a file is so slow. Your timings from R, is that including reading the file also? Yes, it's just insane, isn't it? It is insane. Earlier in the thread we were tackling the wrong problem clearly. Hence the adage, "measure first" :-/. As suggested by Edwin van Leeuwen, can you give us a timing of: auto records = File("Acquisition_2009Q2.txt", "r").byLine.map!(a => a.split("|").array).array; Thanks, Saurabh Good news and bad news. I was going for something similar to what you have above and both slash the time a lot: Time (s): 1.024 But now the output is a little garbled. For some reason the splitter isn't splitting correctly - or we are not applying it properly. Line 0: ["11703051", "RETAIL", "BANK OF AMERICA, N.A.|4.875|207000|3", "0", "03/200", "|05", "2009|75", "75|1|26", "80", "|N", "|", "O ", "ASH", "OU", " REFINANCE|PUD|1|INVE", "TOR", "C", "|801||FRM", "\n\n", "863", "", "FRM"]
Re: Speed of csvReader
On Thursday, 21 January 2016 at 18:31:17 UTC, data pulverizer wrote: Good news and bad news. I was going for something similar to what you have above and both slash the time a lot: Time (s): 1.024 But now the output is a little garbled. For some reason the splitter isn't splitting correctly - or we are not applying it properly. Line 0: ["11703051", "RETAIL", "BANK OF AMERICA, N.A.|4.875|207000|3", "0", "03/200", "|05", "2009|75", "75|1|26", "80", "|N", "|", "O ", "ASH", "OU", " REFINANCE|PUD|1|INVE", "TOR", "C", "|801||FRM", "\n\n", "863", "", "FRM"] I should probably include the first few lines of the file: 10511550|RETAIL|FLAGSTAR CAPITAL MARKETS CORPORATION|5|222000|360|04/2009|06/2009|44|44|2|37|823|NO|NO CASH-OUT REFINANCE|PUD|1|PRINCIPAL|AZ|863||FRM 11031040|BROKER|SUNTRUST MORTGAGE INC.|4.99|456000|360|03/2009|05/2009|83|83|1|47|744|NO|NO CASH-OUT REFINANCE|SF|1|PRINCIPAL|MD|211|12|FRM 11445182|CORRESPONDENT|CITIMORTGAGE, INC.|4.875|172000|360|05/2009|07/2009|80|80|2|25|797|NO|CASH-OUT REFINANCE|SF|1|PRINCIPAL|TX|758||FRM 11703051|RETAIL|BANK OF AMERICA, N.A.|4.875|207000|360|03/2009|05/2009|75|75|1|26|806|NO|NO CASH-OUT REFINANCE|PUD|1|INVESTOR|CO|801||FRM 16033316|CORRESPONDENT|JPMORGAN CHASE BANK, NATIONAL ASSOCIATION|5|17|360|05/2009|07/2009|80|80|1|23|771|NO|CASH-OUT REFINANCE|PUD|1|PRINCIPAL|VA|224||FRM It's interesting that the output first array is not the same as the input
Re: Speed of csvReader
On Thursday, 21 January 2016 at 23:58:35 UTC, H. S. Teoh wrote: On Thu, Jan 21, 2016 at 11:29:49PM +, data pulverizer via Digitalmars-d-learn wrote: On Thursday, 21 January 2016 at 21:24:49 UTC, H. S. Teoh wrote: >On Thu, Jan 21, 2016 at 07:11:05PM +, Jesse Phillips via >This piqued my interest today, so I decided to take a shot at >writing a fast CSV parser. First, I downloaded a sample >large CSV file from: [...] Hi H. S. Teoh, I tried to compile your code (fastcsv.d) on my machine but I get crt1.o errors, for example: .../crt1.o(.debug_info): relocation 0 has invalid symbol index 0 are there flags that I should be compiling with or some other thing that I am missing? Did you supply a main() function? If not, it won't run, because fastcsv.d is only a module. If you want to run the benchmark, you'll have to compile both benchmark.d and fastcsv.d together. T Thanks, I got used to getting away with running the "script" file in the same folder as a single-file module - it usually works but occasionally (like now) I have to compile both together as you suggested.
Re: Speed of csvReader
On Thursday, 21 January 2016 at 21:24:49 UTC, H. S. Teoh wrote: On Thu, Jan 21, 2016 at 07:11:05PM +, Jesse Phillips via This piqued my interest today, so I decided to take a shot at writing a fast CSV parser. First, I downloaded a sample large CSV file from: [...] Hi H. S. Teoh, I tried to compile your code (fastcsv.d) on my machine but I get crt1.o errors, for example: .../crt1.o(.debug_info): relocation 0 has invalid symbol index 0 are there flags that I should be compiling with or some other thing that I am missing?
Re: Speed of csvReader
On Thursday, 21 January 2016 at 20:46:15 UTC, Gerald Jansen wrote: On Thursday, 21 January 2016 at 09:39:30 UTC, data pulverizer wrote: I have been reading large text files with D's csv file reader and have found it slow compared to R's read.table function This great blog post has an optimized FastReader for CSV files: http://tech.adroll.com/blog/data/2014/11/17/d-is-for-data-science.html Thanks a lot Gerald, the blog and the discussions were very useful and revealing - for me it shows that you can use D to write fast code, and then, if you need to wring out more performance, go as low-level as you want, all without leaving the D language or its tooling ecosystem.
Re: Speed of csvReader
On Thursday, 21 January 2016 at 23:58:35 UTC, H. S. Teoh wrote: are there flags that I should be compiling with or some other thing that I am missing? Did you supply a main() function? If not, it won't run, because fastcsv.d is only a module. If you want to run the benchmark, you'll have to compile both benchmark.d and fastcsv.d together. T Great benchmarks! This is something else for me to learn from.
Re: Speed of csvReader
On Friday, 22 January 2016 at 02:16:14 UTC, H. S. Teoh wrote: On Thu, Jan 21, 2016 at 04:50:12PM -0800, H. S. Teoh via Digitalmars-d-learn wrote: [...] > > https://github.com/quickfur/fastcsv [...] Fixed some boundary condition crashes and reverted doubled quote handling in unquoted fields (since those are illegal according to RFC 4180). Performance is back in the ~1200 msec range. T Hi H. S. Teoh, I have used your fastcsv on my file: import std.file; import fastcsv; import std.stdio; import std.datetime; void main(){ StopWatch sw; sw.start(); auto input = cast(string) read("Acquisition_2009Q2.txt"); auto mydata = fastcsv.csvToArray!('|')(input); sw.stop(); double time = sw.peek().msecs; writeln("Time (s): ", time/1000); } $ dmd file_read_5.d fastcsv.d $ ./file_read_5 Time (s): 0.679 Fastest so far, very nice.
Re: Speed of csvReader
On Friday, 22 January 2016 at 21:41:46 UTC, data pulverizer wrote: On Friday, 22 January 2016 at 02:16:14 UTC, H. S. Teoh wrote: [...] Hi H. S. Teoh, I have used your fastcsv on my file: import std.file; import fastcsv; import std.stdio; import std.datetime; void main(){ StopWatch sw; sw.start(); auto input = cast(string) read("Acquisition_2009Q2.txt"); auto mydata = fastcsv.csvToArray!('|')(input); sw.stop(); double time = sw.peek().msecs; writeln("Time (s): ", time/1000); } $ dmd file_read_5.d fastcsv.d $ ./file_read_5 Time (s): 0.679 Fastest so far, very nice. I guess the next step is allowing Tuple rows with mixed types.
Re: Speed of csvReader
On Thursday, 21 January 2016 at 18:46:03 UTC, Justin Whear wrote: On Thu, 21 Jan 2016 18:37:08 +, data pulverizer wrote: It's interesting that the output first array is not the same as the input byLine reuses a buffer (for speed) and the subsequent split operation just returns slices into that buffer. So when byLine progresses to the next line the strings (slices) returned previously now point into a buffer with different contents. You should either use byLineCopy or .idup to create copies of the relevant strings. If your use-case allows for streaming and doesn't require having all the data present at once, you could continue to use byLine and just be careful not to refer to previous rows. Thanks. It now works with byLineCopy() Time (s): 1.128
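Justin Whear's point about buffer reuse can be put in a minimal sketch; the file name and pipe-delimited layout here are invented for illustration.

```d
// Sketch of the byLine pitfall and the byLineCopy fix; "data.txt"
// and its pipe-delimited layout are made up for illustration.
import std.algorithm : map;
import std.array : array, split;
import std.stdio : File, writeln;

void main()
{
    // byLine reuses a single internal buffer, so slices produced by
    // split would later point into overwritten memory. byLineCopy
    // allocates a fresh string per line, so the slices stay valid.
    auto records = File("data.txt", "r")
        .byLineCopy
        .map!(line => line.split("|"))
        .array;
    writeln(records);
}
```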
Re: Speed of csvReader
On Thursday, 21 January 2016 at 19:08:38 UTC, data pulverizer wrote: On Thursday, 21 January 2016 at 18:46:03 UTC, Justin Whear wrote: On Thu, 21 Jan 2016 18:37:08 +, data pulverizer wrote: It's interesting that the output first array is not the same as the input byLine reuses a buffer (for speed) and the subsequent split operation just returns slices into that buffer. So when byLine progresses to the next line the strings (slices) returned previously now point into a buffer with different contents. You should either use byLineCopy or .idup to create copies of the relevant strings. If your use-case allows for streaming and doesn't require having all the data present at once, you could continue to use byLine and just be careful not to refer to previous rows. Thanks. It now works with byLineCopy() Time (s): 1.128 Currently the timing is similar to python pandas: # Script (Python 2.7.6) import pandas as pd import time col_types = {'col1': str, 'col2': str, 'col3': str, 'col4': str, 'col5': str, 'col6': str, 'col7': str, 'col8': str, 'col9': str, 'col10': str, 'col11': str, 'col12': str, 'col13': str, 'col14': str, 'col15': str, 'col16': str, 'col17': str, 'col18': str, 'col19': str, 'col20': str, 'col21': str, 'col22': str} begin = time.time() x = pd.read_csv('Acquisition_2009Q2.txt', sep = '|', dtype = col_types) end = time.time() print end - begin $ python file_read.py 1.19544792175
Scala Spark-like RDD for D?
Are there any plans to create a Scala Spark-like RDD class for D (https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)? This is a powerful model that has taken the data science world by storm; it would be useful to have something like this in the D world. Most of the algorithms in statistics/data science are iterative in nature, which fits well with this kind of data model. I read through the Kind Of Container thread, which has some relationship with this issue (https://forum.dlang.org/thread/n07rh8$dmb$1...@digitalmars.com). It looks like immutability would be the way to go for an RDD data structure. But I am not wedded to any model as long as we can have something that provides the same functionality as the RDD. As an alternative, are there plans for parallel/cluster computing frameworks for D? Apologies if I am kicking a hornet's nest. It is not my intention. Thanks
Re: Scala Spark-like RDD for D?
On Monday, 15 February 2016 at 11:09:10 UTC, data pulverizer wrote: Are there any plans to create a Scala Spark-like RDD class for D (https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)? This is a powerful model that has taken the data science world by storm; it would be useful to have something like this in the D world. Most of the algorithms in statistics/data science are iterative in nature, which fits well with this kind of data model. I read through the Kind Of Container thread, which has some relationship with this issue (https://forum.dlang.org/thread/n07rh8$dmb$1...@digitalmars.com). It looks like immutability would be the way to go for an RDD data structure. But I am not wedded to any model as long as we can have something that provides the same functionality as the RDD. As an alternative, are there plans for parallel/cluster computing frameworks for D? Apologies if I am kicking a hornet's nest. It is not my intention. Thanks Perhaps the question is too prescriptive. Another way is: Does D have a big data strategy? But I tried to anchor it to some currently functioning framework, which is why I suggested RDD.
Re: Obtaining argument names in (variadic) functions
On Wednesday, 16 March 2016 at 20:53:42 UTC, JR wrote: On Wednesday, 16 March 2016 at 20:24:38 UTC, data pulverizer wrote: Hi D gurus, is there a way to obtain parameter names within the function body? I am particularly interested in variadic functions. Something like: void myfun(T...)(T x){ foreach(i, arg; x) writeln(i, " : ", arg); } void main(){ myfun(a = 2, b = "two", c = 2.0); } // should print a : 2 b : two c : 2.0 Thanks in advance Loving the mixins and tuples You can do it precisely like that if the variables/symbols you pass as (template) arguments are properly declared first. http://dpaste.dzfl.pl/0b452efeaaab void printVars(Args...)() if (Args.length > 0) { import std.stdio : writefln; foreach (i, arg; Args) { writefln("%s\t%s:\t%s", typeof(Args[i]).stringof, Args[i].stringof, arg); } } void main() { int abc = 3; string def = "58"; float ghi = 3.14f; double jkl = 3.14; printVars!(abc,def,ghi,jkl)(); } That's brilliant! Thanks JR
Re: Struct array assignment behaviour using example from Programming in D, chapter 78
On Thursday, 24 March 2016 at 18:46:14 UTC, Ali Çehreli wrote: On 03/24/2016 10:24 AM, data pulverizer wrote: > I have been playing with the matrix example given at the end of chapter > 78 of Ali Çehreli's For reference, it's "Multi-dimensional operator overloading example" here: http://ddili.org/ders/d.en/templates_more.html >having problems with overloading the opAssign operator. > > rows is a private int[][] in a Matrix struct. > > I have added the following ... > > Matrix opAssign(int[][] arr) > { > this.rows = arr; > // rows = arr // does not work either ... > return this; > } > > However this does not work (no error occurs, it just doesn't do > anything) How are you testing it? The following worked for me: 1) Added that opAssign() to the struct. (Verified that it gets called.) 2) Tested with the following code: auto m2 = Matrix(); auto rows = [ [ 1, 2 ], [ 3, 4 ], [ 5, 6 ] ]; m2 = rows; writeln(m2); (I've tested with a dynamically generated 'rows' as well.) Ali Thank you. Let me try to ask the question again. The problem I am experiencing is to do with opIndexAssign(). 
I added the following public operators: Matrix opAssign(int[][] arr) { writeln(__FUNCTION__); this.rows = arr; return this; } Matrix opAssign(Matrix mat) { writeln(__FUNCTION__); this.rows = mat.rows; return this; } Matrix opIndexAssign(A...)(int[][] arr, A arguments) if(A.length <= 2){ writeln(__FUNCTION__); Matrix subMatrix = opIndex(arguments); assert(((arr.length == subMatrix.nrow()) & (arr[0].length == subMatrix.ncol())), "Array dimensions do not match matrix replacement.\n"); /*foreach(i, row; subMatrix.rows){ row[] = arr[i]; }*/ subMatrix = arr; // Does not work return subMatrix; } Matrix opIndexAssign(A...)(Matrix mat, A arguments) if(A.length <= 2){ writeln(__FUNCTION__); Matrix subMatrix = opIndex(arguments); assert(((mat.nrow() == subMatrix.nrow()) & (mat.ncol() == subMatrix.ncol())), "Array dimensions do not match matrix replacement.\n"); /*foreach(i, row; subMatrix.rows){ row[] = mat.rows[i]; }*/ subMatrix = mat; // Does not work return subMatrix; } void main(){ // Here we test opAssign int[][] auto a = Matrix(); a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]; // this works writeln(a); // opIndexAssign int[][] a[0..2, 0..2] = [[88, 88], [88, 88]]; // this does not work writeln(a); auto b = Matrix(); // opAssign Matrix b = a; // this works writeln(b); b = [[88, 88, 88, 88], [88, 88, 88, 88], [88, 88, 88, 88], [88, 88, 88, 88]]; // opIndexAssign Matrix b[0..3, 0..3] = a; // this does not work writeln(b); } If you uncomment the foreach lines, the opIndexAssign() calls work.
Re: Struct array assignment behaviour using example from Programming in D, chapter 78
On Friday, 25 March 2016 at 08:53:20 UTC, Ali Çehreli wrote: On 03/25/2016 12:00 AM, data pulverizer wrote: > On Thursday, 24 March 2016 at 18:46:14 UTC, Ali Çehreli wrote: >> On 03/24/2016 10:24 AM, data pulverizer wrote: >> > I have been playing with the matrix example given at the end >> of chapter >> > 78 of Ali Çehreli's >> >> For reference, it's "Multi-dimensional operator overloading example" >> here: >> >> http://ddili.org/ders/d.en/templates_more.html >> >> >having problems with overloading the opAssign operator. > Thank you. Let me try to ask the question again. The problem I am > experiencing is to do with opIndexAssign(). > > I added the following public operators: > > Matrix opAssign(int[][] arr) > { > writeln(__FUNCTION__); > this.rows = arr; > return this; > } The problem is due to the aliasing of 'rows' members of Matrix objects. subMatrix is supposed to be a reference into some elements of an existing Matrix. As soon as we do the above assignment, this Matrix (which may be a subMatrix in a specific context) breaks lose from its actual Matrix elements. We need to implement the function above "in place": Matrix opAssign(int[][] arr) { writeln(__FUNCTION__); if (rows.length < arr.length) { rows.length = arr.length; } foreach (i, row; arr) { const newLength = row.length; if (rows[i].length < newLength) { rows[i].length = newLength; } rows[i][0..newLength] = row[]; } return this; } (There must be an existing function that does that.) > Matrix opAssign(Matrix mat) > { > writeln(__FUNCTION__); > this.rows = mat.rows; Same thing applies above: We need to assign to this.rows in place (which is easier by taking advantage of the previous function): this = mat.rows; > return this; > } No changes needed for the other two functions but I would 'return this' instead of 'return subMatrix' for them as well. Ali That's great! Thank you very much for the fix and extra suggestions, and for your patience putting up with my poorly formulated question! 
Looks like I need to go and read all the structs and operators chapters thoroughly!
Re: Struct array assignment behaviour using example from Programming in D, chapter 78
On Thursday, 24 March 2016 at 17:24:38 UTC, data pulverizer wrote: I have been playing with the matrix example given at the end of chapter 78 of Ali Çehreli's fabulous book and am having problems with overloading the opAssign operator. rows is a private int[][] in a Matrix struct. I have added the following ... Matrix opAssign(int[][] arr) { this.rows = arr; // rows = arr // does not work either ... return this; } However this does not work (no error occurs, it just doesn't do anything) but this does work ... Matrix opAssign(int[][] arr) { foreach(i, row; rows){ row[] = arr[i]; } return this; } The second is not efficient since it has to loop to assign the array. Is there a more efficient way of overwriting the array? Sorry. Please disregard. I'll drive home and ask this question properly!
Struct array assignment behaviour using example from Programming in D, chapter 78
I have been playing with the matrix example given at the end of chapter 78 of Ali Çehreli's fabulous book and am having problems with overloading the opAssign operator. rows is a private int[][] in a Matrix struct. I have added the following ... Matrix opAssign(int[][] arr) { this.rows = arr; // rows = arr // does not work either ... return this; } However this does not work (no error occurs, it just doesn't do anything) but this does work ... Matrix opAssign(int[][] arr) { foreach(i, row; rows){ row[] = arr[i]; } return this; } The second is not efficient since it has to loop to assign the array. Is there a more efficient way of overwriting the array?
Template recursion error on table struct
I am attempting to create a table struct with generic column types using templates. The subTable() member function subsets the table; however, I am getting a template recursion error. I know where the problem comes from, but I don't know how to resolve it. I am modelling it after the matrix example in Ali Çehreli's book: http://ddili.org/ders/d.en/templates_more.html import std.stdio; template ColumnTable(T...){ struct ColumnTable{ private: typeof(T) data; string[] names; struct Range{ size_t begin; size_t end; } // This function is the source of the issue auto subTable(Range rowRange, Range columnRange)(){ auto new_data = data[columnRange.begin .. columnRange.end]; auto output = ColumnTable!(new_data)(new_data); // This is the problem string[] new_names = names[columnRange.begin .. columnRange.end]; output.setNames(new_names); return output; } public: this(T...)(T args){ data = args; foreach(i, arg; T){ names ~= args[i].stringof; } } void setNames(string[] names){ this.names = names; } void test(){ writeln(subTable!(Range(0, 2), Range(0, 2))()); } } } void main(){ string[] names = ["tariq", "sharma", "peter", "rakel"]; double[] salary = [44.5, 32.2, 40.1, 28.1]; int[] age = [24, 20, 22, 25, 19]; writeln(ColumnTable!(names, salary, age)(names, salary, age)); }
Re: Template recursion error on table struct
p.s. I realise that the ColumnTable call is a little ponderous but I tidy it up in a convenience wrapper function: auto CreateDataTable(Args...)(){ string[] names; foreach(i, arg; Args){ names ~= Args[i].stringof; } auto df = ColumnTable!(Args)(Args); df.setNames(names); return df; } // auto myTable = CreateDataTable!(names, salary, age)();
Re: Random Access I/O
On Saturday, 26 March 2016 at 00:10:23 UTC, Chris Williams wrote: I need to be able to perform random access I/O against a file, creating a new file if it doesn't exist, or opening as-is (no truncation) if it already exists. None of the access modes for std.stdio.File seem to allow that. Any usage of the "w" mode causes my code to consider the file empty if it pre-exists (though, it doesn't always actually truncate the disk on file?) If I was coding in C, I would use open() as it gives more options for access: http://pubs.opengroup.org/onlinepubs/009695399/functions/open.html However, I don't see this exposed in phobos anywhere? The Programming in D book chapter on Files http://ddili.org/ders/d.en/files.html will help. I think the "std.stdio.File struct" section on the same page has what you need. Also, take a look at http://dlang.org/phobos/std_stdio.html#.File.open.
Re: Initializing global delegate variable - bug or on purpose?
On Friday, 25 March 2016 at 20:54:28 UTC, Atila Neves wrote: int delegate(int) dg = (i) => i * 2; Error: non-constant nested delegate literal expression __lambda3 int delegate(int) dg; static this() { dg = i => i * 2; // ok } Am I doing anything wrong? Atila Hmm, looks like your first delegate is a function type and the second is a function instance. So the first version written like this ... import std.stdio; alias dg = int delegate(int); dg make_dg(){ return i => i*2; } void main(){ auto my_dg = make_dg(); writeln(my_dg(3)); } will work.
Re: Initializing global delegate variable - bug or on purpose?
On Friday, 25 March 2016 at 23:40:37 UTC, data pulverizer wrote: On Friday, 25 March 2016 at 20:54:28 UTC, Atila Neves wrote: int delegate(int) dg = (i) => i * 2; Error: non-constant nested delegate literal expression __lambda3 int delegate(int) dg; static this() { dg = i => i * 2; // ok } Am I doing anything wrong? Atila Hmm, looks like your first delegate is a function type and the second is a function instance. So the first version written like this ... import std.stdio; alias dg = int delegate(int); dg make_dg(){ return i => i*2; } void main(){ auto my_dg = make_dg(); writeln(my_dg(3)); } will work. In fact for the second case I'd probably need to see a working struct/class prototype to make a firm comment on it.
Re: Template recursion error on table struct
On Saturday, 26 March 2016 at 06:28:42 UTC, Ali Çehreli wrote: WARNING: Do not try to compile this code. Your computer may be unresponsive for a while. :) On 03/25/2016 02:54 PM, data pulverizer wrote: > I am attempting to create a table struct with generic column types using > templates. However, the template arguments are not types; rather, aliases. > template ColumnTable(T...){ > struct ColumnTable{ > private: > typeof(T) data; > string[] names; > struct Range{ > size_t begin; > size_t end; > } > // This function is the source of the issue > auto subTable(Range rowRange, Range columnRange)(){ > auto new_data = data[columnRange.begin .. columnRange.end]; > auto output = ColumnTable!(new_data)(new_data); // This is the > problem new_data is a local variable. So, this instantiation of ColumnTable is with that symbol. > void main(){ > string[] names = ["tariq", "sharma", "peter", "rakel"]; > double[] salary = [44.5, 32.2, 40.1, 28.1]; > int[] age = [24, 20, 22, 25, 19]; > writeln(ColumnTable!(names, salary, age)(names, salary, age)); > } Likewise, that instantiation of ColumnTable is with the symbols 'names', 'salary', and 'age'. Is that what you want? Or do you want to instantiate with their types? Can you explain some more what you are trying to do. Ali I am trying to build a table similar to R's data.frame but with unbounded data types. I am hoping that D's flexible template programming methods will allow a table-like class or struct to be constructed with "naked" types, i.e. not wrapped in a variant. The ColumnTable object consists of vector columns that are bound together in a tuple (the data member) and accessed in a similar way to your matrix example.
The table will have column names and should be accessible by: table[0..4, "colname"] table[0..4, ["colname_1", "colname_2", "colname_3"]] table[0..$, 0..4] and should be able to be subsetted and inserted into by another table table_1[0..4, 3..4] = table_2 table_1[0..4, ["colname_1", "colname_2"]] = table_2 and columns could be added or overwritten table[0..$, 0] = someVector table.columnBind(someVector) column-bind two tables together ... auto new_table = ColumnTable(table_1, table_2) auto new_table = ColumnTable(table_1, someVector) row-bind two tables having the same column characteristics auto new_table = rbind(table_1, table_2) Then perhaps do something similar for row-based tables.
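A minimal sketch of the string-keyed slicing syntax in the wish list above. The SimpleTable struct, its all-double columns and the data are invented for illustration; a real implementation would need to be generic over column types, as discussed in the thread.

```d
// Invented sketch of table[rowRange, "colname"] from the wish list.
import std.stdio : writeln;

struct SimpleTable
{
    string[] names;
    double[][] columns; // one array per column, as described above

    // t[i .. j] inside an index expression lowers to opSlice!dim(i, j)
    size_t[2] opSlice(size_t dim)(size_t begin, size_t end)
    {
        return [begin, end];
    }

    // t[0 .. 2, "salary"] lowers to opIndex(opSlice!0(0, 2), "salary")
    double[] opIndex(size_t[2] rows, string name)
    {
        foreach (i, n; names)
            if (n == name)
                return columns[i][rows[0] .. rows[1]];
        assert(0, "no such column: " ~ name);
    }
}

void main()
{
    auto t = SimpleTable(["salary"], [[44.5, 32.2, 40.1, 28.1]]);
    writeln(t[0 .. 2, "salary"]); // first two salary entries
}
```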
Re: Template recursion error on table struct
On Saturday, 26 March 2016 at 06:28:42 UTC, Ali Çehreli wrote: Likewise, that instantiation of ColumnTable is with the symbols 'names', 'salary', and 'age'. Is that what you want? Or do you want to instantiate with their types? Can you explain some more what you are trying to do. Ali I guess I am trying to instantiate the class using symbols that should pass a tuple right to the data member. Does that mean that I need to use a mixin template instead of a template?
Re: Template recursion error on table struct
On Saturday, 26 March 2016 at 09:47:10 UTC, Ali Çehreli wrote: Please ignore my earlier response. :) On 03/25/2016 02:54 PM, data pulverizer wrote: > template ColumnTable(T...){ [...] > auto output = ColumnTable!(new_data)(new_data); // This is the > problem You want to slice the template arguments there. The following removes the infinite recursion: auto output = ColumnTable!(T[columnRange.begin .. columnRange.end])(new_data); Ali Thanks a lot! The subTable() method includes a loop to subset the rows which I forgot to include originally ... auto subTable(Range rowRange, Range columnRange)(){ auto new_data = data[columnRange.begin .. columnRange.end]; foreach(i, col; new_data){ new_data[i] = col[rowRange.begin .. rowRange.end]; } auto output = ColumnTable!(T[columnRange.begin .. columnRange.end])(new_data); string[] new_names = names[columnRange.begin .. columnRange.end]; output.setNames(new_names); return output; }
Obtaining argument names in (variadic) functions
Hi D gurus, is there a way to obtain parameter names within the function body? I am particularly interested in variadic functions. Something like: void myfun(T...)(T x){ foreach(i, arg; x) writeln(i, " : ", arg); } void main(){ myfun(a = 2, b = "two", c = 2.0); } // should print a : 2 b : two c : 2.0 Thanks in advance Loving the mixins and tuples
Re: Obtaining argument names in (variadic) functions
On Wednesday, 16 March 2016 at 21:05:43 UTC, JR wrote: On Wednesday, 16 March 2016 at 20:43:09 UTC, jkpl wrote: I try to anticipate the reason why you want this. [...] I use something *kinda* sort of similar in my toy project to print all fields of a struct, for debugging purposes when stuff goes wrong. Getting the names of the member variables is crucial then. http://dpaste.dzfl.pl/748c4dd97de6 That's a nice learning piece. I think "with" is cool, reminds me of a nice R feature.
Pointers vs functional or array semantics
I have noticed that some numerical packages written in D use pointer semantics heavily (not referring to packages that link to C libraries). I am in the process of writing code for a numerical computing library and would like to know whether there are times when addressing an array using pointers conveys performance benefits over using D's array or functional semantics?
Re: Pointers vs functional or array semantics
On Saturday, 25 February 2017 at 11:15:53 UTC, ketmar wrote: data pulverizer wrote: I have noticed that some numerical packages written in D use pointer semantics heavily (not referring to packages that link to C libraries). I am in the process of writing code for a numerical computing library and would like to know whether there are times when addressing an array using pointers conveys performance benefits over using D's array or functional semantics? using `arr.ptr[n]` instead of `arr[n]` bypasses bounds checking. this may be desirable in tight loops (while disabling bounds checking globally is not). but note that `foreach (int n; arr)` doesn't do bounds checking in loop too (afair), so you prolly better use `foreach` instead of pointers. this way your code will be fast, but still safe. Thanks ketmar and thanks in advance to anyone else that comments.
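ketmar's two options can be put side by side in a small sketch; both functions below are illustrative, not from any package.

```d
// Illustrative only: the same sum written with raw pointer indexing
// (no bounds checks) and with foreach (also no per-element checks).
import std.stdio : writeln;

double sumPtr(const(double)[] arr)
{
    double total = 0;
    auto p = arr.ptr; // .ptr indexing bypasses bounds checking
    foreach (i; 0 .. arr.length)
        total += p[i];
    return total;
}

double sumForeach(const(double)[] arr)
{
    double total = 0;
    foreach (x; arr) // iterates the slice directly, stays safe
        total += x;
    return total;
}

void main()
{
    auto v = [1.0, 2.0, 3.0, 4.0];
    writeln(sumPtr(v), " ", sumForeach(v)); // both sums are 10
}
```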
Convert call to a string
I'd like to convert a call to a string for debug printing purposes, for example: ``` import std.stdio : writeln; void someFunction(int x, string y){} string myCall = debugPrint(someFunction(1, "hello")); writeln(myCall); ``` writes: someFunction(1, "hello") Does this functionality exist? If not, how can I construct it? Please note that the call `someFunction(1, "hello")` should also be executed. Thank you
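The thread's replies are not shown above, but one hedged way to construct this yourself is to pass the call as a compile-time string and mixin() it, which changes the call site slightly from the question's sketch:

```d
// Hedged sketch, not the thread's actual answer: take the call as a
// compile-time string, print it verbatim, and mixin() to execute it.
import std.stdio : writeln;

auto debugPrint(string call)()
{
    writeln(call);      // print the call exactly as written
    return mixin(call); // and evaluate it (works for void calls too)
}

void someFunction(int x, string y) {}

void main()
{
    debugPrint!`someFunction(1, "hello")`(); // prints: someFunction(1, "hello")
}
```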
Re: Convert call to a string
On Wednesday, 15 February 2017 at 22:07:22 UTC, data pulverizer wrote: That's great, thanks both of you!
Re: Template-style polymorphism in table structure
On Monday, 5 September 2016 at 06:45:07 UTC, data pulverizer wrote: On Sunday, 4 September 2016 at 14:49:30 UTC, Lodovico Giaretta wrote: Your getCol(i) could become getCol!T(i) and return an instance of GenericVector!T directly, after checking that the required column has in fact that type: GenericVector!T getCol!T(size_t i) { if(typeid(cols[i]) == typeid(GenericVector!T)) return cast(GenericVector!T)cols[i]; else // assert(0) or throw exception } I just realized that typeid only gives the class and not the actual type, so the object will still need to be cast as you mentioned above; however, your function above will not infer T, so the user will have to provide it. I wonder if there is a way to dispatch the right type by a dynamic cast, or I fear that ZombineDev may be correct and the types will have to be limited, which I definitely want to avoid! Just found this on dynamic dispatching (https://wiki.dlang.org/Dispatching_an_object_based_on_its_dynamic_type) but even if you took this approach, you'd still have to register all the types you would be using at the start of the script for all your methods. It's either that or an explicitly limited type list, as ZombineDev suggests.
Re: Templates problem
On Thursday, 8 September 2016 at 10:18:36 UTC, Russel Winder wrote: I am certainly hoping that Chapel will be the language to displace NumPy for serious computation in the Python-sphere. Given its foundation in the PGAS model, it has all the parallelism needs, both cluster and local, built in. Given Chapel there is no need to look at C++, D, Rust, Cython, etc. I can see where you are coming from; I have taken a look at Chapel and high performance computing is their top priority. I think they hope that it will be the next Fortran, but I think it is very much a domain-specific language. They have clearly given plenty of thought to distributed computing, parallelization and concurrency, which could yield some very nice performance advantages. However, Python's advantage is that it is a dynamic language and can act as a front end to algorithms written in C/C++, for instance as Google has done with TensorFlow. In the future it could even act as a front end to Chapel since they now have a C API. However, I feel as if computer programming languages are still in this static-dynamic partnership, e.g. Python with C/C++, R with Fortran/C/C++. It means overhead: always maintaining code in more than one language, and amending your interface every time you change something in one or the other. In essence, nothing fundamentally different is happening with current new languages. I hate to sound like a broken record, but what Sparrow proposes is a unification in which all kinds of overheads go away. Making something like that work with the principles of Sparrow would be a revolution in computing.
Re: Template-style polymorphism in table structure
On Sunday, 4 September 2016 at 14:02:03 UTC, Lodovico Giaretta wrote: Your code is not very D style ... Well I guess I could have consolidated the multiple constructors in GenericVector(T) and DataFrame?
Re: Template-style polymorphism in table structure
On Sunday, 4 September 2016 at 14:49:30 UTC, Lodovico Giaretta wrote: On Sunday, 4 September 2016 at 14:24:12 UTC, data pulverizer wrote: On Sunday, 4 September 2016 at 14:20:24 UTC, data pulverizer wrote: @Lodovico Giaretta BTW what do you mean that my code is not very D style? Please expand on this ... There can be fewer constructors. In fact, a typesafe variadic ctor also works for the single element case and for the array case. But you already recognized that. Instead of reinventing the wheel for your GenericVector!T, you could use an `alias this` to directly inherit all operations on the underlying array, without having to reimplement them (like your append method). Your getCol(i) could become getCol!T(i) and return an instance of GenericVector!T directly, after checking that the required column has in fact that type:
```
GenericVector!T getCol(T)(size_t i)
{
    if (typeid(cols[i]) == typeid(GenericVector!T))
        return cast(GenericVector!T) cols[i];
    else
        assert(0); // or throw an exception
}
```
Another solution: if you don't need to dynamically change the type of the columns, you can have the addColumn function create a new type. I show you with Tuples because it's easier:
```
Tuple!(T, U) append(U, T...)(Tuple!T tup, U col)
{
    return Tuple!(T, U)(tup.expand, col);
}

Tuple!int t1;
Tuple!(int, double) t2 = t1.append(2.0);
Tuple!(int, double, char) t3 = t2.append('c');
```
Thank you for the very useful suggestions, I shall take these forward. On the suggestion of creating Tuple-like tables, I already tried that but found, as you said, that once the table is created, adding/removing columns essentially creates a different data type, which needs a new variable name each time.
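Lodovico's `alias this` point can be sketched as follows (a minimal illustration, not from the thread): the wrapper forwards array operations to the underlying slice, so no hand-written append method is needed.

```d
import std.stdio : writeln;

class GenericVector(T) {
    T[] data;
    alias data this;   // forwards indexing, slicing, ~=, .length, etc.
    this(T[] arr) { data = arr; }
}

void main() {
    auto v = new GenericVector!int([1, 2, 3]);
    v ~= [4, 5];                   // no append method required
    writeln(v[0], " ", v.length);  // 1 5
}
```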
I am building a table type I hope will be used for data manipulation for data science and statistics applications, so I require a data structure that can allow adding and removing columns of various types as well as a data structure that can cope with any type that hasn't been planned for, which is why I selected this polymorphic template approach. It is more flexible than other data structures I have seen in dynamic programming languages R's data frame and Python pandas. Even Scala's Spark dataframes rely on wrapping everything in Any and the user still has to write a special data structure for each new type. The only thing that is similar to this approach is Julia's DataFrame but Julia - though a very good programming language has limitations. I feel as if I am constantly scratching the surface of what D can do, but I have recently managed to get more time on my hands and it looks as if that will continue into the future which will mean more focusing on D, improving my generic programming skills and hopefully creating some useful artifacts. Perhaps I need to read Andrei's Modern C++ Design book for a better way to think about generics.
Re: Template-style polymorphism in table structure
On Sunday, 4 September 2016 at 09:55:53 UTC, data pulverizer wrote: My main question is how to return GenericVector!(T) from the getCol() method in the Table class instead of BaseVector. I think I just solved my own query: change the BaseVector interface to a class and override it in the GenericVector(T) class:
```
class BaseVector {
    BaseVector get(size_t) { return new BaseVector; }
}

class GenericVector(T) : BaseVector {
    T[] data;
    alias data this;
    override GenericVector get(size_t i) {
        return new GenericVector!(T)(data[i]);
    }
    this(T[] arr) { this.data = arr; }
    this(T elem) { this.data ~= elem; }
    void append(T[] arr) { this.data ~= arr; }
    override string toString() const { return format("%s", data); }
}

class Table {
    // ... as before
}

void main() {
    auto index = new GenericVector!(int)([1, 2, 3, 4, 5]);
    auto numbers = new GenericVector!(double)([1.1, 2.2, 3.3, 4.4, 5.5]);
    auto names = new GenericVector!(string)(["one", "two", "three", "four", "five"]);
    Table df = new Table(index, numbers, names);
    // now prints table.GenericVector!int.GenericVector
    writeln(typeid(df.getCol(0)));
}
```
Re: Template-style polymorphism in table structure
On Sunday, 4 September 2016 at 14:20:24 UTC, data pulverizer wrote: On Sunday, 4 September 2016 at 14:07:54 UTC, data pulverizer wrote: @Lodovico Giaretta Thanks I just saw your update! @Lodovico Giaretta BTW what do you mean that my code is not very D style? Please expand on this ...
Re: Template-style polymorphism in table structure
On Sunday, 4 September 2016 at 14:07:54 UTC, data pulverizer wrote: @Lodovico Giaretta Thanks I just saw your update!
Template-style polymorphism in table structure
I am trying to build a data table object with unrestricted column types. The approach I am taking is to define a generic BaseVector interface and a subtype GenericVector(T) that implements it. I then build a Table class that stores its columns as a BaseVector array. My main question is how to return GenericVector!(T) from the getCol() method in the Table class instead of BaseVector. Perhaps my Table implementation somehow needs to be linked to GenericVector(T), or maybe I have written a BaseTable and need something like a GenericTable(T...). My previous approach created a tuple-type data object, but once created its type structure (column type configuration) could not be changed, so no addition/removal of columns.
```
import std.stdio : writeln, write, writefln;
import std.format : format;

interface BaseVector {
    BaseVector get(size_t);
}

class GenericVector(T) : BaseVector {
    T[] data;
    alias data this;
    GenericVector get(size_t i) {
        return new GenericVector!(T)(data[i]);
    }
    this(T[] arr) { this.data = arr; }
    this(T elem) { this.data ~= elem; }
    void append(T[] arr) { this.data ~= arr; }
    override string toString() const { return format("%s", data); }
}

class Table {
private:
    BaseVector[] data;
public:
    // How to return GenericVector!(T) here instead of BaseVector?
    BaseVector getCol(size_t i) { return data[i]; }
    this(BaseVector[] x...) {
        foreach (col; x) this.data ~= col;
    }
    this(BaseVector[] x) { this.data ~= x; }
    this(Table x, BaseVector[] y...) {
        this.data = x.data;
        foreach (col; y) this.data ~= col;
    }
    void append(BaseVector[] x...) {
        foreach (col; x) this.data ~= col;  // was `~= x`, which appended all columns each pass
    }
}

void main() {
    auto index = new GenericVector!(int)([1, 2, 3, 4, 5]);
    auto numbers = new GenericVector!(double)([1.1, 2.2, 3.3, 4.4, 5.5]);
    auto names = new GenericVector!(string)(["one", "two", "three", "four", "five"]);
    Table df = new Table(index, numbers, names);
    // I'd like this to be GenericVector!(T)
    writeln(typeid(df.getCol(0)));
}
```
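One hedged way to get a typed column back, along the lines later discussed in the thread (the helper name `typedCol` is made up), is a templated free function that downcasts the stored BaseVector:

```d
import std.format : format;
import std.stdio : writeln;

class BaseVector {}

class GenericVector(T) : BaseVector {
    T[] data;
    this(T[] arr) { data = arr; }
    override string toString() const { return format("%s", data); }
}

class Table {
    BaseVector[] cols;
    this(BaseVector[] x...) { cols = x.dup; }
    BaseVector getCol(size_t i) { return cols[i]; }
}

// Hypothetical helper (not in the original post): the caller supplies T,
// the cast checks the dynamic type, and a mismatch fails loudly.
GenericVector!T typedCol(T)(Table df, size_t i) {
    auto col = cast(GenericVector!T) df.getCol(i);
    assert(col !is null, "column is not of type " ~ T.stringof);
    return col;
}

void main() {
    auto df = new Table(new GenericVector!int([1, 2, 3]),
                        new GenericVector!string(["a", "b", "c"]));
    writeln(df.typedCol!int(0));     // [1, 2, 3]
    writeln(df.typedCol!string(1));  // ["a", "b", "c"]
}
```

The cost is that the user must name the type at the call site; dispatching on the dynamic type without that is exactly the difficulty raised later in the thread.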
Re: Templates problem
On Wednesday, 7 September 2016 at 20:57:15 UTC, bachmeier wrote: What are you doing with Rcpp that you can't do with D? That's a very good point; there's nothing that R + C++ can do that is out of D's reach. But I wonder if we can go further.
Re: Templates problem
On Wednesday, 7 September 2016 at 20:37:50 UTC, jmh530 wrote: On Wednesday, 7 September 2016 at 19:19:23 UTC, data pulverizer I don't see any reason why D can't implement pandas DataFrames without needing to change the language at all http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html It's just a lot of work. The simplest I can think of is a struct containing a tuple that contains slices of equal length and an array of strings containing column names. You could have a specialization with a two-dimensional array (or ndslice). p.s. it goes beyond just tables, ... having dynamic capability in a static compiled language really does take computing to a different place indeed.
Re: Templates problem
On Wednesday, 7 September 2016 at 21:25:30 UTC, jmh530 wrote: Consider a potential use case. You have an existing data frame and you want to add a column of data to it that has a different type than the existing frame. I imagine the function call would look something like: auto newFrame = oldFrame.addCol(newData); Yes, but from a usability point of view this would be very poor - forcing the user to create a new variable each time they modified a table. I am aware that databases do this but it is hidden away. ... I only wonder if you would lose performance if you wanted something fully dynamic. A static approach is a good starting place. Yes you would, which is why I see the hyper-meta route as the potential solution to this issue.
Re: Templates problem
On Wednesday, 7 September 2016 at 20:29:51 UTC, deXtoRious wrote: On Wednesday, 7 September 2016 at 19:19:23 UTC, data pulverizer wrote: The "One language to rule them all" motif of Julia has hit the rocks; one reason is because they now realize that their language is being held back because the compiler cannot infer certain types for example: http://www.johnmyleswhite.com/notebook/2015/11/28/why-julias-dataframes-are-still-slow/ As an avid user of Julia, I'm going to have to disagree very strongly with this statement. The language is progressing very nicely and while it doesn't aim to be the best choice for every programming task imaginable... Ahem (http://www.wired.com/2014/02/julia/), I'm not saying that the Julia founders approved that title, we all know how the press can inflate things, but there was a certain rhetoric that Julia was creating something super-special that would change everything.
Re: Templates problem
On Wednesday, 7 September 2016 at 21:07:20 UTC, data pulverizer wrote: Don't get me wrong, I still think Julia is a very cool language. My opinion is that we should have more languages. Let me correct myself ... I think that hyper-meta-programming as in Sparrow could certainly revolutionize computing. I think that's a big deal.
Re: Templates problem
On Wednesday, 7 September 2016 at 20:37:50 UTC, jmh530 wrote: On Wednesday, 7 September 2016 at 19:19:23 UTC, data pulverizer wrote: For some time I have been considering a problem to do with creating tables with unbounded types; one of the failed attempts is here: https://forum.dlang.org/thread/gdjaoxypicsxlfvzw...@forum.dlang.org?page=1 I then exchanged emails with Lucian, Sparrow's creator, and he very quickly and simply outlined the solution to the problem. Thereafter I read his PhD thesis - one of the most informative texts in computer science I have read and very well written. At the moment, there are lots of languages attempting to solve the dynamic-static loop: being able to have features inherent in dynamic programming languages, while keeping the safety and performance that comes with a static compiled programming language, and doing so in a language that doesn't cause your brain to bleed. The "One language to rule them all" motif of Julia has hit the rocks; one reason is that they now realize their language is being held back because the compiler cannot infer certain types, for example: http://www.johnmyleswhite.com/notebook/2015/11/28/why-julias-dataframes-are-still-slow/ I don't see any reason why D can't implement pandas DataFrames without needing to change the language at all http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html It's just a lot of work. The simplest I can think of is a struct containing a tuple that contains slices of equal length and an array of strings containing column names. You could have a specialization with a two-dimensional array (or ndslice). You're quite right that D doesn't need to change at all to implement something like pandas or R's data frames, but I am thinking of how to go further. Very often in data science applications types will turn up that are required but are not currently configured for your table.
The choice you have is either to modify the code or, as Scala does, to give programmers the ability to write their own interface to the type so that it can be stored in their DataFrame. The best solution is a data table that can cope with an arbitrary number of types, which can be done in Sparrow.
Re: Templates problem
On Wednesday, 7 September 2016 at 21:01:59 UTC, deXtoRious wrote: That's just typical press nonsense, and even they quote Bezanson saying how Julia isn't at all suited to a whole host of applications. Julia certainly has (justifiable, imho, though only time will tell) ... Don't get me wrong, I still think Julia is a very cool language. My opinion is that we should have more languages.
Re: Templates problem
On Wednesday, 7 September 2016 at 20:57:15 UTC, bachmeier wrote: I too come from the R world and I have been playing the game of flitting between R and C++, using C++ (through Rcpp) to speed up slow things in R, for some time, and I have been looking for a better solution. What are you doing with Rcpp that you can't do with D? Sorry, I'll correct myself again! Because R is a dynamic programming language, you can do things in it that you could not do in D; however, they would be very inefficient. Hyper-meta-programming takes this barrier away.
Re: Templates problem
On Friday, 9 September 2016 at 13:32:16 UTC, Russel Winder wrote: Should we be giving up on D and switching to Sparrow? Most certainly not! I don't think it has to be either D or Sparrow. There is a quote I liked from one of Walter's presentations. Someone asked the question: "What happens when the next great modelling idea comes along?" Walter: "D will absorb it" Link to the youtube vid (https://youtu.be/WKRRgcEk0wg?t=2205) Polyglot programmers tend to be better programmers. This is not opinion; there is experimental evidence for this in the psychology of programming literature. I certainly think that training programmers on lots of different programming languages and paradigms will produce better programmers on average. I guess the analogy is multi-lingual language training for children. However, this does not affect my original point about the overhead caused by constantly switching languages in development, and the host of benefits that would come from making Sparrow's programming model work.
Re: Templates problem
On Wednesday, 7 September 2016 at 15:04:38 UTC, jmh530 wrote: On Wednesday, 7 September 2016 at 11:37:44 UTC, Russel Winder wrote: I really don't see what's not working in this. Trying to get new D users from Python users is the main problem. I came to D from Python/R/Matlab. The biggest issue for me wasn't error messages so much as the lack of good libraries for a lot of things. Nevertheless, the longer I've been using D, the more I agree that there could be some improvements in D's error messages. Andrei had posted about the Sparrow language a while back https://forum.dlang.org/thread/ne3265$uef$1...@digitalmars.com?page=1 He liked their use of concepts. I think at a minimum it would enable better error messages. I too come from the R world and I have been playing the game of flitting between R and C++, using C++ (through Rcpp) to speed up slow things in R, for some time, and I have been looking for a better solution. For some time I have been considering a problem to do with creating tables with unbounded types; one of the failed attempts is here: https://forum.dlang.org/thread/gdjaoxypicsxlfvzw...@forum.dlang.org?page=1 I then exchanged emails with Lucian, Sparrow's creator, and he very quickly and simply outlined the solution to the problem. Thereafter I read his PhD thesis - one of the most informative texts in computer science I have read and very well written. At the moment, there are lots of languages attempting to solve the dynamic-static loop: being able to have features inherent in dynamic programming languages, while keeping the safety and performance that comes with a static compiled programming language, and doing so in a language that doesn't cause your brain to bleed.
The "One language to rule them all" motif of Julia has hit the rocks; one reason is that they now realize their language is being held back because the compiler cannot infer certain types, for example: http://www.johnmyleswhite.com/notebook/2015/11/28/why-julias-dataframes-are-still-slow/ A language that can create arbitrarily complex programs is the kind of thing that changes the world. I don't think D should be left out, and it should take Sparrow very seriously indeed.
Cryptic C function pointer for conversion
I have come across a function pointer in C that I am attempting to convert, and am not sure what the correct interpretation is:
```
// The C code:
void (*(*xDlSym)(sqlite3_vfs*, void*, const char *zSymbol))(void);
```
The best I can tell is that this is a function pointer that returns a function that returns void, and the correct translation to D is:
```
alias void function(sqlite3_vfs*, void*, const char *zSymbol) ptr;
ptr* function() xDlSym;
```
I've never seen a construction like this before so my interpretation might be wrong!
Re: Cryptic C function pointer for conversion
On Saturday, 17 December 2016 at 14:06:07 UTC, ketmar wrote: On Saturday, 17 December 2016 at 13:39:27 UTC, data pulverizer wrote: that is what it means, in D:
```
// void (*(*xDlSym)(sqlite3_vfs*, void*, const char *zSymbol))(void);
struct sqlite3_vfs {}
extern(C) {
    alias RetRes = void function ();
    alias DeclType = RetRes function (sqlite3_vfs* a, void* b, const char* zSymbol);
    DeclType xDlSym;

    void zoo () {}
    RetRes goo (sqlite3_vfs* a, void* b, const char* zSymbol) { return &zoo; }
}

void main () {
    xDlSym = &goo;
}
```
at least that is what i managed to decode, fed to a C(++) compiler and translated to D. p.s. I confirmed your interpretation on stackoverflow: http://stackoverflow.com/questions/8722817/syntax-for-a-pointer-to-a-function-returning-a-function-pointer-in-c
Re: Cryptic C function pointer for conversion
On Saturday, 17 December 2016 at 14:06:07 UTC, ketmar wrote: that is what it means, in D: //void (*(*xDlSym)(sqlite3_vfs*,void*, const char *zSymbol))(void); struct sqlite3_vfs {} extern(C) { alias RetRes = void function (); alias DeclType = RetRes function (sqlite3_vfs *a,void *b, const char *zSymbol); ... } Thanks ketmar, I guess this means I got it the other way round: the function pointer that is returned is the one that takes and returns void. at least that is what i managed to decode, fed to C(++) compiler and translate to D. Does this mean that you can translate C code to D natively? I am currently only aware of the dstep package.
Re: Exporting template function instances to C
On Thursday, 23 March 2017 at 17:58:21 UTC, H. S. Teoh wrote: On Thu, Mar 23, 2017 at 05:29:22PM +, data pulverizer via Thanks. Is there a less ham-handed way of exporting them other than wrapping them in functions as I have? Wrapping them in functions is probably the simplest way to call them from C. You *could*, I suppose, use their mangled names directly, then you wouldn't need a wrapper, but that would be rather difficult to use on the C end. On the D side, there's .mangleof that will tell you what mangled names to use, but if you're calling from C you don't have that luxury. T Thanks. Mangling sounds painful and scary, I think I'll stick to wrapping which sounds much less dangerous.
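A small sketch of the `.mangleof` route H. S. Teoh mentions (illustrative only, not from the thread): referencing the instantiation emits it, and `.mangleof` reveals the symbol name a C caller would have to declare, which is why wrapping tends to win in practice.

```d
import std.stdio : writeln;

T mult(T)(T x, T y) { return x * y; }

void main() {
    // The mangled name of the double instantiation: awkward to use from C.
    writeln(mult!double.mangleof);
    writeln(mult(3.0, 4.0));  // 12
}
```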
Undefined Reference calling D from C using static linking
I am trying to call a D function from C. Here is the D code:
```
/* dcode.d */
extern (C) nothrow @nogc @system
{
    double multNum(double x, double y)
    {
        return x*y;
    }
}
```
Then the C code:
```
/* ccode.c */
#include <stdio.h>

extern double multNum(double x, double y);

int main()
{
    printf("output: %f", multNum(3.0, 4.0));
    return 0;
}
```
Then I compile with:
```
ldc2 -c dcode.d
gcc -c ccode.c
gcc -o output ccode.o dcode.o
```
I get the error:
```
dcode.o: In function `ldc.register_dso':
dcode.d:(.text.ldc.register_dso+0x6e): undefined reference to `_d_dso_registry'
collect2: error: ld returned 1 exit status
```
Compiler versions:
```
$ ldc2 --version
LDC - the LLVM D compiler (1.1.0):
  based on DMD v2.071.2 and LLVM 3.9.1
  built with LDC - the LLVM D compiler (1.1.0)
  Default target: x86_64-unknown-linux-gnu
  Host CPU: ivybridge
  http://dlang.org - http://wiki.dlang.org/LDC
  Registered Targets:
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
```
```
$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```
I would appreciate it if someone could point out my mistake. Thank you in advance.
Re: Undefined Reference calling D from C using static linking
On Thursday, 23 March 2017 at 10:16:22 UTC, Nicholas Wilson wrote: It has to do with module references to druntime stuff. You can either try adding a pragma(LDC_no_module_info); //I think it is spelled correctly. or you can use ldc to link and it will link druntime: `gcc -c ccode.c` then `ldc2 dcode.d ccode.o`. I don't know how well that will work. Many thanks, I tried:
```
pragma(LDC_no_moduleinfo) // https://wiki.dlang.org/LDC-specific_language_changes#LDC_no_moduleinfo
```
which worked; your method of doing the final compilation using ldc2 (or dmd) also works :-) Is there a dmd equivalent for `pragma(LDC_no_moduleinfo);`? Attempting the final compilation `gcc -o output ccode.o dcode.o` after the second stage compilation `dmd -c dcode.d` gives an error:
```
dcode.o: In function `_D5dcode7__arrayZ':
dcode.d:(.text._D5dcode7__arrayZ+0x23): undefined reference to `_d_arraybounds'
dcode.o: In function `_D5dcode8__assertFiZv':
dcode.d:(.text._D5dcode8__assertFiZv+0x23): undefined reference to `_d_assert'
dcode.o: In function `_D5dcode15__unittest_failFiZv':
dcode.d:(.text._D5dcode15__unittest_failFiZv+0x23): undefined reference to `_d_unittest'
dcode.o:(.text.d_dso_init[.data.d_dso_rec]+0x22): undefined reference to `_d_dso_registry'
collect2: error: ld returned 1 exit status
```
```
dmd --version
DMD64 D Compiler v2.073.2
Copyright (c) 1999-2016 by Digital Mars written by Walter Bright
```
Re: Exporting template function instances to C
On Friday, 24 March 2017 at 01:00:31 UTC, Nicholas Wilson wrote: On Thursday, 23 March 2017 at 19:46:43 UTC, data pulverizer wrote: [snip] Thanks. Mangling sounds painful and scary, I think I'll stick to wrapping which sounds much less dangerous. There's nothing scary or dangerous about it. It happens automatically to allow overloads and templates, so that you get a unique symbol for each version (unless you use extern(C), extern(C++) or pragma(mangle)). C++, Java and any other compiled language that has overloads does mangling. Heck, you can even do it in C with __attribute__((overloadable)) (at least with clang); it just transparently mangles the name (just as in D) as whatever C++ would mangle it as. So instead of doing
```
T mult(T)(T x, T y) { return x*y; }
```
do something like
```
template mult(T)
{
    extern(C++) T mult(T x, T y) { return x*y; }
}
```
in D, and then in C (noting that you have to declare the name and signature anyway)
```
__attribute__((overloadable)) float mult(float, float);
__attribute__((overloadable)) double mult(double, double);
```
which I think is the least painful way of doing it. I seem to remember somewhere in Phobos there is
```
template Instantiate(alias a)
{
    alias Instantiate = a;
}
```
to instantiate templates; because you reference them from another symbol it somehow magically works. Used like
```
Instantiate!(mult!float); // at module scope
```
Thanks a lot ...
I was half joking, playing with the name "mangling", but I appreciate your explanations and suggestions.
Re: Exporting template function instances to C
On Saturday, 25 March 2017 at 06:17:15 UTC, Nicholas Wilson wrote: On Saturday, 25 March 2017 at 02:21:33 UTC, data pulverizer wrote: Thanks a lot ... I was half joking playing with the name "mangling" but I appreciate your explanations and suggestions. This is the internet, I can't tell if you're a newb or sarcastic, and given this is a learn forum I'm going to make a conservative estimate of the former ;) (possibly 'cause I was half asleep when answering) I would however be interested to know if the extern(C++)/__attribute__((overloadable))/Instantiate combo actually worked. I usually try to be as clear as possible; I just couldn't help slipping in a joke - with a name like "mangling" I could not resist. I am definitely a noob to compilers; my day job is data scientist/statistician. However, I really like the D programming language and I am increasing my knowledge by blogging, writing D interfaces for C libraries, and building a low performance BLAS alternative to GLAS (https://github.com/dataPulverizer/dblas) - it's funny because it's true! The article I am working on is connecting D to other languages. The draft of my article interfacing D to C and Fortran is here (https://github.com/dataPulverizer/interface-d-c-fortran). I can't decide if name mangling deserves its own topic. Feel free to put in a suggestion/pull request to either.
Re: Exporting template function instances to C
On Thursday, 23 March 2017 at 16:38:02 UTC, Adam D. Ruppe wrote: On Thursday, 23 March 2017 at 16:28:18 UTC, data pulverizer wrote: alias mult!double dmult; alias mult!float fmult; Those are just aliases in the D compiler; they don't actually exist in the object file for C to use like regular functions. Templates need to actually be *used* to be instantiated for export, and aliases make them easier to use, but don't actually use them yet. Thanks. Is there a less ham-handed way of exporting them other than wrapping them in functions as I have?
Re: Undefined Reference calling D from C using static linking
On Thursday, 23 March 2017 at 11:32:25 UTC, Nicholas Wilson wrote: On Thursday, 23 March 2017 at 10:49:37 UTC, data pulverizer wrote: [snip] Those functions are the bounds checking function, the non-unittest assert function, the unittest function, and the module registration function, respectively. dmd -boundscheck=off -release should get rid of the first two; you didn't compile with -unittest so I'm not sure why the third one is there at all. For _d_dso_registry all I can suggest is to see what -betterC gets you.
I just compiled `dmd -c dcode.d -betterC -boundscheck=off` (-betterC probably makes -boundscheck=off irrelevant but I threw it in as a prayer) and I am still getting:
```
dcode.o:(.text.d_dso_init[.data.d_dso_rec]+0x22): undefined reference to `_d_dso_registry'
collect2: error: ld returned 1 exit status
```
Exporting template function instances to C
I have noticed that the following will not successfully export `dmult` and `fmult` to C:
```
extern (C) nothrow @nogc @system:
pragma(LDC_no_moduleinfo);

T mult(T)(T x, T y) { return x*y; }

alias mult!double dmult;
alias mult!float fmult;
```
but this will:
```
extern (C) nothrow @nogc @system:
pragma(LDC_no_moduleinfo);

T mult(T)(T x, T y) { return x*y; }

double dmult(double x, double y) { return mult(x, y); }
float fmult(float x, float y) { return mult(x, y); }
```
Why is that?
Re: Template specialisation for range of types
On Sunday, 12 March 2017 at 20:15:43 UTC, Meta wrote:
```
import std.stdio : writeln;
import std.traits : ConstOf;

auto max(T)(T x, T y)
{
    writeln("General template");
    return x > y ? x : y;
}

auto max(T : const U, U)(T* x, T* y) // <- Changed `ConstOf!U` to `const U`
{
    writeln("Const template");
    return *x > *y ? x : y;
}

void main(){
    const double p = 2.4, q = 3;
    writeln(max(&p, &q)); // Prints "Const template"
}
```
This is great Meta, thanks very much! I was trying to avoid using template constraints because the more cases you add, the more complicated the constraints get.
Re: Template specialisation for range of types
On Sunday, 12 March 2017 at 20:15:43 UTC, Meta wrote: auto max(T: const U, U)(T* x, T* y) <- Changed `ConstOf!U` to `const U` { writeln("Const template"); return *x > *y ? x : y; } How detailed can I be about the template specialisation? From an example in the book "C++ Templates: The Complete Guide" we can have: ``` /* pointer const reference */ template <typename T> inline T* const& max(T* const& a, T* const& b) { return *a < *b ? b : a; } /* const reference const pointer */ template <typename T> inline T const* const& max(T* const* const& a, T* const* const& b) { ...; } ``` What would be the equivalent in D?
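For comparison, here is a rough sketch of what a D analogue might look like. This is my own guess rather than a tested equivalent: D's const is transitive, so there is no exact counterpart to C++'s top-level `T* const&` distinction, and the specialisation pattern below is an assumption.

```
import std.stdio : writeln;

// Specialisation that deconstructs a pointer-to-const parameter;
// U is deduced as the pointed-to type.
auto max(T : const(U)*, U)(T a, T b)
{
    return *a < *b ? b : a;
}

void main()
{
    const double p = 2.4, q = 3;
    writeln(*max(&p, &q)); // dereferences the returned pointer: the larger value, 3
}
```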
Re: Template specialisation for range of types
On Sunday, 12 March 2017 at 19:32:37 UTC, ketmar wrote: data pulverizer wrote: In this case I would like to use the ConstOf specialisation instead of the default implementation for the inputs which are const. actually, the second template is not instantiable at all. you want to do type deconstruction at instantiation, and that doesn't work. i.e. what your code wants to do (as it is written) is to have `T` in the second template be equal to `double`. you cannot deconstruct the type like that in a template. what you *can* do, though, is this: auto max(T)(const(T)* x, const(T)* y) this way it will select your second template. If I change the implementation of the second template to your above declaration, I get the error: ``` max.max called with argument types (const(double)*, const(double)*) matches both: max.d(34): max.max!(const(double)*).max(const(double)* x, const(double)* y) and: max.d(42): max.max!double.max(const(double)* x, const(double)* y) ``` I need at least those two implementations for the different cases: a general "default" one, and one for specific types and type qualifications.
Template specialisation for range of types
Hello all, I am attempting to write templates for differently qualified types using specialisations. Below is an example for const and non-const outlining my approach: ``` import std.stdio : writeln; import std.traits : ConstOf; auto max(T)(T x, T y) { writeln("General template"); return x > y ? x : y; } auto max(T: ConstOf!U, U)(T* x, T* y) { writeln("Const template"); return *x > *y ? x : y; } void main(){ const double p = 2.4, q = 3; writeln(max(&p, &q)); } ``` I get this output: ``` General template 7FFE5B3759A8 ``` In this case I would like to use the ConstOf specialisation instead of the default implementation for the inputs which are const. Thanks for your answers in advance
Re: Issue with template constraints in numeric types
On Thursday, 3 August 2017 at 12:31:00 UTC, Adam D. Ruppe wrote: On Thursday, 3 August 2017 at 12:24:02 UTC, data pulverizer wrote: import std.traits: isIntegral, isNumeric; Are you familiar with isFloatingPoint? http://dpldocs.info/experimental-docs/std.traits.isFloatingPoint.html if(is(T: double) && isNumeric!T) Keep in mind that T:double here means "if T can implicitly convert to double". Since int can implicitly convert to double too, this case covers both families! You might want to try == instead of : for a more exact match. Thank you very much! What about this case: ``` T test(T: double)(T x, T y) { return x*y; } auto test(T)(T x, T y) { return 5*test!double(x, y); } ``` which also gives: ``` int test: 4 double test: 4 ```
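Following Adam's suggestion, one way to sketch the fix (my own untested reading of his advice, not code from the thread) is to use trait-based constraints so that `int` can no longer slip through the implicit conversion to `double`:

```
import std.stdio : writeln;
import std.traits : isFloatingPoint, isIntegral;

// Matches only genuine floating-point types, not int-convertible ones
auto test(T)(T x, T y) if (isFloatingPoint!T)
{
    writeln("floating point test");
    return x * y;
}

// Matches only integral types; forwards to the floating-point version
auto test(T)(T x, T y) if (isIntegral!T)
{
    writeln("integral test");
    return 5 * test!double(x, y);
}

void main()
{
    test(2.0, 2.0); // floating point test
    test(2, 2);     // integral test, then floating point test
}
```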
Launch and obtain thread output during compile time
Hi all, Is it possible to launch/spawn a thread/fibre or some other appropriate item and obtain an immutable/enum or some other appropriate output at compile time? For instance, returning an immutable(string) from the external thread to be used as the input to a template parameter or a CTFE function. To be clear, the thread carries out run-time processing, e.g. reading a file. Thanks in advance.
Re: Launch and obtain thread output during compile time
On Sunday, 13 August 2017 at 08:09:28 UTC, Petar Kirov [ZombineDev] wrote: On Sunday, 13 August 2017 at 07:37:15 UTC, data pulverizer wrote: Hi all, Is it possible to launch/spawn a thread/fibre or some other appropriate item and obtain an immutable/enum or some appropriate output at compile-time? For instance return an immutable(string) from the external thread to be used as the input to a template parameter or a CTFE function. To be clear the thread carries out run-time processing e.g. reading a file. Thanks in advance. No, CTFE is single-threaded and additionally it is required that functions executed at compile-time are "pure", i.e. they don't affect the global state of the system and don't use any non-portable facilities. Essentially, only pure computation is allowed. Though, specifically reading files is allowed at compile-time via the `string s = import("file.txt");` syntax, provided that the file `file.txt` is located in a directory, specified to the compiler by the `-J` flag. For more information see: http://dlang.org/spec/expression.html#import_expressions Thank you. Great explanation!
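To make the import-expression route concrete, a minimal sketch (assuming a `file.txt` sitting in a directory passed to the compiler via `-J`):

```
// Compile with:  dmd -J. app.d
// The -J flag whitelists directories for compile-time file reads.
enum string contents = import("file.txt"); // file contents read at compile time

// Emitted during compilation, not at run time:
pragma(msg, "file.txt is ", contents.length, " bytes long");

void main() {}
```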
Specify dmd or ldc compiler and version in a json dub file?
Hi, I would like to know how to specify dmd or ldc compiler and version in a json dub file. Thanks in advance.
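As far as I know, later versions of dub let you constrain the accepted compiler and frontend version through a `toolchainRequirements` section in dub.json (a sketch under that assumption; the package name and version ranges below are made-up examples):

```
{
    "name": "myproject",
    "toolchainRequirements": {
        "dmd": ">=2.076",
        "frontend": ">=2.076"
    }
}
```

To require LDC instead, I believe you would use an `"ldc"` key (e.g. `"ldc": ">=1.10.0"`) and set `"dmd": "no"` to exclude dmd; check the dub package format documentation for the exact keys your dub version supports.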