Re: text based file formats

2022-12-22 Thread Per Nordlöw via Digitalmars-d-announce

On Wednesday, 21 December 2022 at 04:19:46 UTC, 9il wrote:
> It has already been replaced with
> [mir.csv](https://github.com/libmir/mir-ion/blob/master/source/mir/csv.d). Mir is faster, SIMD accelerated, and supports numbers and timestamp recognition.


Great work. Will this module be extracted into a separate package?


Re: text based file formats

2022-12-21 Thread Walter Bright via Digitalmars-d-announce

On 12/20/2022 1:51 PM, Adrian Matoga wrote:
> I frequently find it useful for a text data file parser to call a diagnostic
> callback instead of assuming some default behavior (whether that's forgiving,
> printing warnings, throwing or something else). With template callback
> parameters the parser can throw if the user wants it or stay pure nothrow if no
> action is required.


Yes, sometimes I think this might be the right answer.


Re: text based file formats

2022-12-21 Thread Walter Bright via Digitalmars-d-announce

On 12/21/2022 6:27 AM, Adam D Ruppe wrote:

> On Tuesday, 20 December 2022 at 00:16:57 UTC, Walter Bright wrote:
> > LOL, learn something every day! I've even written my own, but it isn't very
> > good.
>
> Yeah, I wrote a csv module too back in... I think 2010, before Phobos had one.
>
> It is about 90 lines, still works. Nothing special but I actually kinda like it.
>
> https://github.com/adamdruppe/arsd/blob/master/csv.d


What this all means is Phobos could use a better one!


Re: text based file formats

2022-12-21 Thread Walter Bright via Digitalmars-d-announce

On 12/20/2022 8:19 PM, 9il wrote:
> It has already been replaced with
> [mir.csv](https://github.com/libmir/mir-ion/blob/master/source/mir/csv.d). Mir
> is faster, SIMD accelerated, and supports numbers and timestamp recognition.



Propose this for Phobos?


Re: text based file formats

2022-12-21 Thread Walter Bright via Digitalmars-d-announce

On 12/20/2022 11:46 AM, John Colvin wrote:

> We use this at work with some light tweaks, it's done a lot of work


Sweet!


Re: text based file formats

2022-12-21 Thread John Colvin via Digitalmars-d-announce

On Wednesday, 21 December 2022 at 04:19:46 UTC, 9il wrote:

> On Tuesday, 20 December 2022 at 19:46:36 UTC, John Colvin wrote:
> > On Tuesday, 20 December 2022 at 00:40:07 UTC, H. S. Teoh wrote:
> > > [...]
> >
> > We use this at work with some light tweaks, it's done a lot of work
>
> It has already been replaced with
> [mir.csv](https://github.com/libmir/mir-ion/blob/master/source/mir/csv.d). Mir is faster, SIMD accelerated, and supports numbers and timestamp recognition.


Hah, so it has! Well anyway, it did do a lot of hard work for us 
for a long time, so thanks :)


Re: text based file formats

2022-12-21 Thread Adam D Ruppe via Digitalmars-d-announce

On Tuesday, 20 December 2022 at 00:16:57 UTC, Walter Bright wrote:
> LOL, learn something every day! I've even written my own, but
> it isn't very good.


Yeah, I wrote a csv module too back in... I think 2010, before 
Phobos had one.


It is about 90 lines, still works. Nothing special but I actually 
kinda like it.


https://github.com/adamdruppe/arsd/blob/master/csv.d


Re: text based file formats

2022-12-21 Thread Tejas via Digitalmars-d-announce

On Wednesday, 21 December 2022 at 04:19:46 UTC, 9il wrote:

> On Tuesday, 20 December 2022 at 19:46:36 UTC, John Colvin wrote:
> > On Tuesday, 20 December 2022 at 00:40:07 UTC, H. S. Teoh wrote:
> > > [...]
> >
> > We use this at work with some light tweaks, it's done a lot of work
>
> It has already been replaced with
> [mir.csv](https://github.com/libmir/mir-ion/blob/master/source/mir/csv.d). Mir is faster, SIMD accelerated, and supports numbers and timestamp recognition.


Wow, I didn't even know `mir.csv` was a thing


Thank you very much!!!



Re: text based file formats

2022-12-20 Thread 9il via Digitalmars-d-announce

On Tuesday, 20 December 2022 at 19:46:36 UTC, John Colvin wrote:

> On Tuesday, 20 December 2022 at 00:40:07 UTC, H. S. Teoh wrote:
> > On Mon, Dec 19, 2022 at 04:16:57PM -0800, Walter Bright via
> > Digitalmars-d-announce wrote:
> > > On 12/19/2022 4:35 AM, Adam D Ruppe wrote:
> > > > On Monday, 19 December 2022 at 09:55:47 UTC, Walter Bright wrote:
> > > > > Curious why CSV isn't in the list.
> > > >
> > > > Maybe std.csv is already good enough?
> > >
> > > LOL, learn something every day! I've even written my own, but
> > > it isn't very good.
> >
> > There's also my little experimental csv parser that was
> > designed to be as fast as possible:
> >
> > https://github.com/quickfur/fastcsv
> >
> > However, it can only handle input that fits in memory (using
> > std.mmfile is one possible workaround), has a static limit on
> > field sizes, and does not do validation.
> >
> > T
>
> We use this at work with some light tweaks, it's done a lot of work


It has already been replaced with 
[mir.csv](https://github.com/libmir/mir-ion/blob/master/source/mir/csv.d). Mir is faster, SIMD accelerated, and supports numbers and timestamp recognition.




Re: text based file formats

2022-12-20 Thread Adrian Matoga via Digitalmars-d-announce
On Sunday, 18 December 2022 at 16:12:35 UTC, rikki cattermole 
wrote:

> > * make it @safe and pure if possible (and it's likely possible)
>
> pure is always a worry for me, but yeah @safe and ideally
> nothrow (if they are forgiving which they absolutely should be,
> there is no reason to throw an exception until it's time to
> inspect it).


I frequently find it useful for a text data file parser to call a 
diagnostic callback instead of assuming some default behavior 
(whether that's forgiving, printing warnings, throwing or 
something else). With template callback parameters the parser can 
throw if the user wants it or stay pure nothrow if no action is 
required.
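
A minimal sketch of the callback idea above, assuming a toy checker rather than a real parser. The names `countGoodLines`, `ignore`, `strict` and `lenient` are hypothetical and exist only for illustration. The callback is passed as a template alias parameter, so the compiler infers the attributes of each instantiation separately: with a do-nothing callback the function can be called from `pure nothrow @safe` code, and with a throwing callback only that instantiation throws.

```d
import std.exception : assertThrown;

/// Hypothetical helper: counts newline-terminated lines that have exactly
/// `expected` comma-separated fields, reporting every other line to `onDiag`.
size_t countGoodLines(alias onDiag)(const(char)[] input, size_t expected)
{
    size_t good, lineNo, fields = 1;
    foreach (char c; input)
    {
        if (c == ',')
            ++fields;
        else if (c == '\n')
        {
            ++lineNo;
            if (fields == expected)
                ++good;
            else
                onDiag(lineNo, fields); // the caller decides what happens
            fields = 1;
        }
    }
    return good;
}

// A callback that does nothing: this instantiation is inferred pure nothrow.
void ignore(size_t line, size_t fields) pure nothrow @safe @nogc {}

// A callback that throws: only this instantiation can throw.
void strict(size_t line, size_t fields) pure @safe
{
    throw new Exception("unexpected field count");
}

// Compiles only because countGoodLines!ignore is inferred pure nothrow @safe.
size_t lenient(const(char)[] input) pure nothrow @safe
{
    return countGoodLines!ignore(input, 3);
}

void main()
{
    enum data = "a,b,c\n1,2\n4,5,6\n";
    assert(lenient(data) == 2);
    assertThrown!Exception(countGoodLines!strict(data, 3));
}
```

The sketch iterates over code units by hand because range-based splitting of a `char[]` decodes UTF-8 and can throw, which would defeat the `nothrow` inference.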


Re: text based file formats

2022-12-20 Thread H. S. Teoh via Digitalmars-d-announce
On Tue, Dec 20, 2022 at 07:46:36PM +0000, John Colvin via 
Digitalmars-d-announce wrote:
[...]
> > There's also my little experimental csv parser that was designed to
> > be as fast as possible:
> > 
> > https://github.com/quickfur/fastcsv
> > 
> > However it can only handle input that fits in memory (using std.mmfile
> > is one possible workaround), has a static limit on field sizes, and does
> > not do validation.
[...]
> We use this at work with some light tweaks, it's done a lot of work

Wow, I never expected it to be actually useful. :-P  Good to know it's
worth something!


T

-- 
They say that "guns don't kill people, people kill people." Well I think the 
gun helps. If you just stood there and yelled BANG, I don't think you'd kill 
too many people. -- Eddie Izzard, Dressed to Kill


Re: text based file formats

2022-12-20 Thread John Colvin via Digitalmars-d-announce

On Tuesday, 20 December 2022 at 00:40:07 UTC, H. S. Teoh wrote:
> On Mon, Dec 19, 2022 at 04:16:57PM -0800, Walter Bright via
> Digitalmars-d-announce wrote:
> > On 12/19/2022 4:35 AM, Adam D Ruppe wrote:
> > > On Monday, 19 December 2022 at 09:55:47 UTC, Walter Bright wrote:
> > > > Curious why CSV isn't in the list.
> > >
> > > Maybe std.csv is already good enough?
> >
> > LOL, learn something every day! I've even written my own, but
> > it isn't very good.
>
> There's also my little experimental csv parser that was
> designed to be as fast as possible:
>
> https://github.com/quickfur/fastcsv
>
> However it can only handle input that fits in memory (using
> std.mmfile is one possible workaround), has a static limit on
> field sizes, and does not do validation.
>
> T

We use this at work with some light tweaks, it's done a lot of work


Re: text based file formats

2022-12-19 Thread H. S. Teoh via Digitalmars-d-announce
On Mon, Dec 19, 2022 at 04:16:57PM -0800, Walter Bright via 
Digitalmars-d-announce wrote:
> On 12/19/2022 4:35 AM, Adam D Ruppe wrote:
> > On Monday, 19 December 2022 at 09:55:47 UTC, Walter Bright wrote:
> > > Curious why CSV isn't in the list.
> > 
> > Maybe std.csv is already good enough?
> 
> LOL, learn something every day! I've even written my own, but it isn't very 
> good.

There's also my little experimental csv parser that was designed to be
as fast as possible:

https://github.com/quickfur/fastcsv

However it can only handle input that fits in memory (using std.mmfile
is one possible workaround), has a static limit on field sizes, and does
not do validation.
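
A rough sketch of the std.mmfile workaround mentioned above, not taken from fastcsv itself. The helper name `countLinesMapped` and the temporary file name are made up for illustration. It maps a file read-only and views the mapping as text without copying it onto the heap, so a parser that expects the whole input as one slice can still be handed a large file.

```d
import std.algorithm.searching : count;
import std.file : remove, tempDir, write;
import std.mmfile : MmFile;
import std.path : buildPath;

/// Hypothetical helper: maps `path` read-only and counts its newlines.
/// A real program would pass `text` to its CSV parser instead.
size_t countLinesMapped(string path)
{
    auto mapped = new MmFile(path);            // read-only mapping
    scope (exit) destroy(mapped);              // unmap when done
    auto text = cast(const(char)[]) mapped[];  // whole file as a slice, no copy
    return text.count('\n');
}

void main()
{
    // Placeholder file so the sketch runs on its own.
    auto path = buildPath(tempDir, "mmfile_csv_sketch.csv");
    write(path, "a,b\n1,2\n");
    scope (exit) remove(path);

    assert(countLinesMapped(path) == 2);
}
```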


T

-- 
Debian GNU/Linux: Cray on your desktop.


Re: text based file formats

2022-12-19 Thread Walter Bright via Digitalmars-d-announce

On 12/19/2022 4:35 AM, Adam D Ruppe wrote:

> On Monday, 19 December 2022 at 09:55:47 UTC, Walter Bright wrote:
> > Curious why CSV isn't in the list.
>
> Maybe std.csv is already good enough?


LOL, learn something every day! I've even written my own, but it isn't very 
good.


Re: text based file formats

2022-12-19 Thread bachmeier via Digitalmars-d-announce

On Sunday, 18 December 2022 at 15:56:38 UTC, Robert Schadek wrote:

> I complained before that D and phobos need more stuff.
> But I can't do it all by myself, but I can ask for help.
>
> So here it goes https://github.com/burner/textbasedfileformats
>
> As it says on the tin, text based file formats is a library of SAX and
> DOM parsers for text based file formats.
>
> I would like to get the following file formats in.
>
> * json (JSON5) there is actually some code in there already
> * xml, there is some code already, the old std.experimental.xml code
> * yaml, maybe there is something in code.dlang.org to be reused
> * toml, maybe there is something in code.dlang.org to be reused
>   * ini, can likely be parsed by the toml parser
> * sdl, I know I know, but D uses it.


A natural complement to this would be the functionality in 
https://github.com/eBay/tsv-utils


I've created versions of the filter and select functions that 
take a string as input and return a string or string[] as output. 
It's a performant way to query text files. Most important, all 
the hard work is already done.


Re: text based file formats

2022-12-19 Thread Robert Schadek via Digitalmars-d-announce

replay -> reply




Re: text based file formats

2022-12-19 Thread Robert Schadek via Digitalmars-d-announce
> Curious why CSV isn't in the list. I encounter that a lot at
> tax time.


As Adam said, std.csv is already there and it's, at least from my
perspective, okay enough.


That being said, I liked how you quoted me here

> On Monday, 19 December 2022 at 09:55:47 UTC, Walter Bright wrote:
> > On 12/18/2022 7:56 AM, Robert Schadek wrote:
> > > So stop talking, and start creating PR's.
> >
> > Yup!


and replay, create a PR that puts it on the list ;-)


Re: text based file formats

2022-12-19 Thread Adam D Ruppe via Digitalmars-d-announce

On Monday, 19 December 2022 at 09:55:47 UTC, Walter Bright wrote:

> Curious why CSV isn't in the list.


Maybe std.csv is already good enough?
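
For reference, a small sketch of what reading typed records with Phobos' std.csv looks like. The sample data and the helper name `sumSecondColumn` are invented for illustration; `csvReader` and `Tuple` are the real Phobos names.

```d
import std.csv : csvReader;
import std.typecons : Tuple;

/// Hypothetical helper: sums the integer second column of two-column CSV text.
int sumSecondColumn(string text)
{
    int total;
    // Each record is converted to a (string, int) tuple as it is read.
    foreach (record; csvReader!(Tuple!(string, int))(text))
        total += record[1];
    return total;
}

void main()
{
    assert(sumSecondColumn("apples,3\npears,5\n") == 8);
}
```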


Re: text based file formats

2022-12-19 Thread Walter Bright via Digitalmars-d-announce

On 12/18/2022 7:56 AM, Robert Schadek wrote:

> So stop talking, and start creating PR's.


Yup!

Curious why CSV isn't in the list. I encounter that a lot at tax time.

https://en.wikipedia.org/wiki/Comma-separated_values

Maybe just ask OpenAI?


Re: text based file formats

2022-12-19 Thread Per Nordlöw via Digitalmars-d-announce

On Sunday, 18 December 2022 at 15:56:38 UTC, Robert Schadek wrote:

> So stop talking, and start creating PR's.
> For the project admin stuff, this will use github. There are
> milestones for the five formats, so please start creating the
> issues you want/can work on and start typing.


If I were you I would join forces with Ilya and work on getting 
the mir libraries doing text-parsing integrated into Phobos.


Re: text based file formats

2022-12-18 Thread CM via Digitalmars-d-announce

On Sunday, 18 December 2022 at 15:56:38 UTC, Robert Schadek wrote:

> * sdl, I know I know, but D uses it.


Thank you for remembering it. I feel like I'm one of the few who 
prefer SDL to YAML, JSON, and the like.


Re: text based file formats

2022-12-18 Thread rikki cattermole via Digitalmars-d-announce

On 19/12/2022 4:56 AM, Robert Schadek wrote:
> * xml, there is some code already, the old std.experimental.xml code

I've toyed with std.experimental.xml.

I'm not convinced that it is a good code base for inclusion.


> * no return by ref


As a bit of a follow up of what we were talking about on BeerConf:

Because these are not data structures, they won't own externally facing 
memory (that's the GC's job). So these lifetime issues with ref should 
never be encountered.


> * make it @safe and pure if possible (and it's likely possible)

pure is always a worry for me, but yeah @safe and ideally nothrow (if 
they are forgiving which they absolutely should be, there is no reason 
to throw an exception until it's time to inspect it).


Re: text based file formats

2022-12-18 Thread Adam D Ruppe via Digitalmars-d-announce

On Sunday, 18 December 2022 at 15:56:38 UTC, Robert Schadek wrote:
> * xml, there is some code already, the old std.experimental.xml code


my dom.d doesn't do the sax parser part but has its own 
advantages over the other things (including being continually 
maintained for over a decade, unlike the phobos things)


text based file formats

2022-12-18 Thread Robert Schadek via Digitalmars-d-announce

I complained before that D and phobos need more stuff.
But I can't do it all by myself, but I can ask for help.

So here it goes https://github.com/burner/textbasedfileformats

As it says on the tin, text based file formats is a library of SAX and 
DOM parsers for text based file formats.


I would like to get the following file formats in.

* json (JSON5) there is actually some code in there already
* xml, there is some code already, the old std.experimental.xml code
* yaml, maybe there is something in code.dlang.org to be reused
* toml, maybe there is something in code.dlang.org  to be reused
  * ini, can likely be parsed by the toml parser
* sdl, I know I know, but D uses it.

There are a few design guidelines I would like to adhere to.
* If it exists in phobos, use phobos
* have the DOM parser based on the sax parser
* no return by ref
* make it @safe and pure if possible (and it's likely possible)
* share the std.sumtype type if possible (yaml, toml should work)
* no @nogc, this should eventually get into phobos
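
To make the "DOM parser based on the sax parser" guideline concrete, here is a minimal sketch using a toy INI-like format. None of it is taken from the textbasedfileformats repository; `parseSax`, `DomBuilder` and `parseDom` are hypothetical names. The SAX layer only reports events to a handler, and the DOM layer is just a handler that stores what it is told.

```d
import std.string : indexOf, splitLines, strip;

/// SAX-style layer: walks INI-like text and reports events to `handler`.
void parseSax(Handler)(string text, ref Handler handler)
{
    foreach (raw; text.splitLines)
    {
        auto line = raw.strip;
        if (line.length == 0)
            continue;
        if (line[0] == '[' && line[$ - 1] == ']')
        {
            handler.onSection(line[1 .. $ - 1]);
            continue;
        }
        auto eq = line.indexOf('=');
        if (eq > 0) // lines without a key are silently skipped in this sketch
            handler.onEntry(line[0 .. eq].strip, line[eq + 1 .. $].strip);
    }
}

/// DOM layer: an event handler that stores everything it is told.
struct DomBuilder
{
    string[string][string] sections; // section name -> key -> value
    private string current;          // entries before any section go under ""

    void onSection(string name)
    {
        current = name;
        if (name !in sections)
            sections[name] = null;
    }

    void onEntry(string key, string value)
    {
        if (current !in sections)
            sections[current] = null;
        sections[current][key] = value;
    }
}

/// The DOM parser is a thin wrapper around the SAX parser.
string[string][string] parseDom(string text)
{
    DomBuilder builder;
    parseSax(text, builder);
    return builder.sections;
}

void main()
{
    auto dom = parseDom("[package]\nname = demo\nversion=1.0\n");
    assert(dom["package"]["name"] == "demo");
    assert(dom["package"]["version"] == "1.0");
}
```

Users who only need streaming can supply their own handler to `parseSax` and never pay for the tree.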

So stop talking, and start creating PR's.
For the project admin stuff, this will use github. There are 
milestones for the five formats, so please start creating the 
issues you want/can work on and start typing.