Re: Best way to read CSV data file into Mir (2d array) ndslice?

2022-09-21 Thread mw via Digitalmars-d-learn

On Wednesday, 21 September 2022 at 19:14:30 UTC, jmh530 wrote:

> I just tried doing it with `std.csv`, but my version was a bit
> awkward since it doesn't seem quite so straightforward to just
> take the result of csvReader and put it in an array. I had to
> read it in element by element. I also wanted to allocate the
> array up front, but to do that I needed to know how big it was
> and ended up doing two passes on reading the data, which isn't
> ideal.

Thanks, as you said this isn't ideal.

For Mir to catch up with numpy, an easy way to read CSV files and
import data is a must if it is to attract data scientists.

In numpy/pandas, it's just a *one*-liner.

I logged an issue here as a feature request:

https://github.com/libmir/mir-algorithm/issues/442



Re: Best way to read CSV data file into Mir (2d array) ndslice?

2022-09-21 Thread jmh530 via Digitalmars-d-learn

On Wednesday, 21 September 2022 at 13:08:14 UTC, jmh530 wrote:

> On Wednesday, 21 September 2022 at 05:31:48 UTC, mw wrote:
>
>> Hi,
>>
>> I'm just wondering what is the best way to read a CSV data file
>> into a Mir (2d array) ndslice? Esp. if it can parse dates into
>> int/float.
>>
>> I searched a bit, but can't find any example.
>>
>> Thanks.
>
> It probably can't hurt to try the simplest approach first.
> `std.csv` can return an input range that you can then use to
> create an ndslice. Offhand, I don't know of any D tools that are
> alternatives to `std.csv` for reading CSVs.
>
> ndslice assumes homogeneous data, but you can put the dates (as
> `Date` types) in the labels (as in data frames). However,
> there's still a bit to be desired in terms of getting that
> functionality integrated into the rest of the package [1].
>
> [1] https://github.com/libmir/mir-algorithm/issues/426

I just tried doing it with `std.csv`, but my version was a bit
awkward since it doesn't seem quite so straightforward to just
take the result of csvReader and put it in an array. I had to read
it in element by element. I also wanted to allocate the array up
front, but to do that I needed to know how big it was and ended up
doing two passes on reading the data, which isn't ideal.


```d
import std.csv;
import std.stdio: writeln;
import mir.ndslice.allocation: slice;

void main() {
    string text =
        "date,x1,x2\n1/31/2010,65,2.5\n2/28/2010,123,7.5";

    // Two independent readers over the same text:
    // one to count the rows, one to fill the slice.
    auto records_firstpass = text.csvReader!double(["x1","x2"]);
    auto records_secondpass = text.csvReader!double(["x1","x2"]);

    // First pass: count the records so the slice can be allocated up front.
    size_t len = 0;
    foreach (record; records_firstpass) {
        len++;
    }
    auto data = slice!double(len, 2);

    // Second pass: copy each field into its (row, column) position.
    size_t i = 0;
    size_t j;
    foreach (record; records_secondpass)
    {
        j = 0;
        foreach (r; record) {
            data[i, j] = r;
            j++;
        }
        i++;
    }
    writeln(data);
}
```
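
One possible way to avoid the second pass, as a rough sketch rather than a recommended Mir idiom: append every value to a single growable buffer and only afterwards view that buffer as a 2d slice with `sliced`. The helper name `readColumns` is made up for illustration, and the sketch assumes that every selected column is numeric and that mir-algorithm is available as a dependency.

```d
import std.array : appender;
import std.csv : csvReader;
import std.stdio : writeln;
import mir.ndslice.slice : sliced;

/// Hypothetical helper (not part of Mir or Phobos): reads the named
/// numeric columns in a single pass and returns a rows-by-columns slice.
auto readColumns(string text, string[] columns)
{
    // Row-major buffer that grows as records are read.
    auto buf = appender!(double[]);
    foreach (record; text.csvReader!double(columns))
        foreach (value; record)
            buf.put(value);

    // The row count is only known once reading has finished.
    immutable rows = buf.data.length / columns.length;
    return buf.data.sliced(rows, columns.length);
}

void main()
{
    string text = "date,x1,x2\n1/31/2010,65,2.5\n2/28/2010,123,7.5";
    auto data = readColumns(text, ["x1", "x2"]);
    writeln(data);
}
```

The trade-off is that the buffer may reallocate while it grows, in exchange for reading the input only once.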


Re: Best way to read CSV data file into Mir (2d array) ndslice?

2022-09-21 Thread jmh530 via Digitalmars-d-learn

On Wednesday, 21 September 2022 at 05:31:48 UTC, mw wrote:

> Hi,
>
> I'm just wondering what is the best way to read a CSV data file
> into a Mir (2d array) ndslice? Esp. if it can parse dates into
> int/float.
>
> I searched a bit, but can't find any example.
>
> Thanks.

It probably can't hurt to try the simplest approach first.
`std.csv` can return an input range that you can then use to
create an ndslice. Offhand, I don't know of any D tools that are
alternatives to `std.csv` for reading CSVs.


ndslice assumes homogeneous data, but you can put the dates (as
`Date` types) in the labels (as in data frames). However,
there's still a bit to be desired in terms of getting that
functionality integrated into the rest of the package [1].


[1] https://github.com/libmir/mir-algorithm/issues/426
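
On the "parse date into int/float" part of the question, here is a minimal sketch of one option that stays within a homogeneous `double` slice: convert each date to its day number and store it as the first column. The names `Row`, `parseUsDate` and `readWithDates` are hypothetical, the M/D/YYYY format is assumed from sample data of the form `1/31/2010`, and this is not an existing Mir feature.

```d
import std.array : appender, split;
import std.conv : to;
import std.csv : csvReader;
import std.datetime.date : Date;
import std.exception : enforce;
import mir.ndslice.slice : sliced;

/// Hypothetical row layout matching a "date,x1,x2" header.
struct Row
{
    string date;
    double x1;
    double x2;
}

/// Hypothetical helper: parses "M/D/YYYY" into a std.datetime Date.
Date parseUsDate(string s)
{
    auto parts = s.split("/");
    enforce(parts.length == 3, "expected M/D/YYYY, got: " ~ s);
    return Date(parts[2].to!int, parts[0].to!int, parts[1].to!int);
}

/// Reads the assumed layout into an n-by-3 slice: [day number, x1, x2].
auto readWithDates(string text)
{
    auto buf = appender!(double[]);
    foreach (row; text.csvReader!Row(["date", "x1", "x2"]))
    {
        // dayOfGregorianCal is the day count starting at 1 for 1 January, year 1.
        buf.put(cast(double) parseUsDate(row.date).dayOfGregorianCal);
        buf.put(row.x1);
        buf.put(row.x2);
    }
    return buf.data.sliced(buf.data.length / 3, 3);
}
```

Calling `readWithDates("date,x1,x2\n1/31/2010,65,2.5\n2/28/2010,123,7.5")` should give a 2-by-3 slice whose first column differs by 28 between the two rows, the number of days between the two dates. Keeping the dates as `Date` labels instead, as suggested above, avoids mixing units in one matrix.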