Re: tsv-utils 2.0 release: Named field support

2020-08-01 Thread tastyminerals via Digitalmars-d-announce

On Sunday, 26 July 2020 at 20:28:56 UTC, Jon Degenhardt wrote:

Hi all,

I'm happy to announce a new major release of eBay's TSV 
Utilities. The 2.0 release supports named field selection in 
all of the tools, a significant usability enhancement.


[...]


Really nice to see the update! tsv-utils is one of the projects I 
advertise when talking about D speed to colleagues ;)


Re: tsv-utils 2.0 release: Named field support

2020-07-29 Thread Jon Degenhardt via Digitalmars-d-announce

On Tuesday, 28 July 2020 at 15:57:57 UTC, bachmeier wrote:
Thanks for your work. I've recommended tsv-utils to some 
students for their data analysis. It's a nice substitute for a 
database depending on what you're doing. It really helps that 
you can store your "database" in a repo like any other text 
file. I'm going to be checking out the new version soon.


Thanks for the support and for checking out the tools! Much 
appreciated.




Re: tsv-utils 2.0 release: Named field support

2020-07-28 Thread bachmeier via Digitalmars-d-announce

On Tuesday, 28 July 2020 at 00:16:05 UTC, Jon Degenhardt wrote:

On Monday, 27 July 2020 at 14:32:27 UTC, aberba wrote:

On Sunday, 26 July 2020 at 20:28:56 UTC, Jon Degenhardt wrote:
I'm happy to announce a new major release of eBay's TSV 
Utilities. The 2.0 release supports named field selection in 
all of the tools, a significant usability enhancement.


So I didn't check it out until today and I'm really 
impressed by the documentation, presentation and just about 
everything.


Thanks for the kind words, and for taking the time to check out 
the toolkit. Both are very much appreciated!


Thanks for your work. I've recommended tsv-utils to some students 
for their data analysis. It's a nice substitute for a database 
depending on what you're doing. It really helps that you can 
store your "database" in a repo like any other text file. I'm 
going to be checking out the new version soon.


Re: tsv-utils 2.0 release: Named field support

2020-07-27 Thread Jon Degenhardt via Digitalmars-d-announce

On Monday, 27 July 2020 at 14:32:27 UTC, aberba wrote:

On Sunday, 26 July 2020 at 20:28:56 UTC, Jon Degenhardt wrote:
I'm happy to announce a new major release of eBay's TSV 
Utilities. The 2.0 release supports named field selection in 
all of the tools, a significant usability enhancement.


So I didn't check it out until today and I'm really impressed 
by the documentation, presentation and just about everything.


Thanks for the kind words, and for taking the time to check out 
the toolkit. Both are very much appreciated!


Re: tsv-utils 2.0 release: Named field support

2020-07-27 Thread Ali Çehreli via Digitalmars-d-announce

On 7/27/20 7:32 AM, aberba wrote:

> Goes to show most of us will do just fine with GC code. Our job is to
> learn how to use it well.

Exactly. My programs sometimes run for minutes on dozens of gigabytes of 
files. Compared to that, the number of garbage collections and the total 
time spent in them are comically low.[1]


Ali

[1] Michael Parker shows how to profile the GC here:

  https://dlang.org/blog/2017/06/16/life-in-the-fast-lane/

Spoiler: It's as simple as passing the --DRT-gcopt=profile:1 command 
line option to any compiled program.
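
For anyone who wants to try the option, here is a minimal sketch. The file name gc_demo.d and the toy program are made up for illustration, and the script assumes dmd is on the PATH (it skips the run otherwise). The runtime consumes the --DRT-gcopt argument itself and prints a GC summary when the program exits.

```shell
# Hypothetical toy program: repeated array appends allocate through the GC.
cat > gc_demo.d <<'EOF'
import std.stdio;

void main()
{
    int[] values;
    foreach (i; 0 .. 1_000_000)
        values ~= i;   // appending grows the array via GC allocation
    writeln(values.length);
}
EOF

# Assumption: dmd is installed. Build, then run with GC profiling enabled.
if command -v dmd >/dev/null 2>&1; then
    dmd -ofgc_demo gc_demo.d
    ./gc_demo --DRT-gcopt=profile:1
else
    echo "dmd not found; skipping the GC profile run"
fi
```

The same option should work unchanged with any other compiled D program, since it is handled by the D runtime rather than by the program.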


Re: tsv-utils 2.0 release: Named field support

2020-07-27 Thread aberba via Digitalmars-d-announce

On Sunday, 26 July 2020 at 20:28:56 UTC, Jon Degenhardt wrote:

Hi all,

I'm happy to announce a new major release of eBay's TSV 
Utilities. The 2.0 release supports named field selection in 
all of the tools, a significant usability enhancement.


[...]



So I didn't check it out until today and I'm really impressed 
by the documentation, presentation and just about everything.


I personally don't do data science and related stuff, yet. 
However I'm sure my data science friend is REALLY going to like 
this.



Unnecessary GC allocation was avoided, but GC was used rather 
than manual memory management. Higher-level I/O primitives were 
used rather than custom buffer management.


Goes to show most of us will do just fine with GC code. Our job 
is to learn how to use it well.


tsv-utils 2.0 release: Named field support

2020-07-26 Thread Jon Degenhardt via Digitalmars-d-announce

Hi all,

I'm happy to announce a new major release of eBay's TSV 
Utilities. The 2.0 release supports named field selection in all 
of the tools, a significant usability enhancement.


For those not familiar, tsv-utils is a set of command line tools 
for manipulating tabular data files of the type commonly found in 
machine learning and data mining environments. The tools cover 
filtering, statistics, sampling, joins, etc. They are patterned 
after traditional Unix command line tools like 'cut', 'grep', 
'sort', etc., and are intended to work with these tools. Each tool is a 
standalone executable. Most people will only care about a subset 
of the tools. It is not necessary to learn the entire toolkit to 
get value from the tools.


The tools are all written in D and are the fastest tools of their 
type available (benchmarks are on the GitHub repository).


Previous versions of the tools referenced fields by field number, 
same as traditional Unix tools like 'cut'. In version 2.0, 
tsv-utils tools take fields either by field number or by field 
name, for files with header lines. A few examples using 
'tsv-select', a tool similar to 'cut' that also supports field 
reordering and dropping fields:


$ # Field numbers: Output fields 2 and 1, in that order.
$ tsv-select -f 2,1 data.tsv

$ # Field names: Output the 'Name' and 'RecordNum' fields.
$ tsv-select -H -f Name,RecordNum data.tsv

$ # Drop the 'Color' field, keep everything else.
$ tsv-select -H --exclude Color file.tsv

$ # Drop all the fields ending in '_time'
$ tsv-select -H -e '*_time' data.tsv
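
To make the idea of header-based selection concrete, here is a rough sketch using plain awk. It is not how tsv-select is implemented; the helper name select_by_name and the sample file sample.tsv are made up for illustration. It reads the header line, maps each requested name to its field number, and then prints those fields in the requested order.

```shell
# Hypothetical sample file with a header line.
printf 'RecordNum\tName\tColor\n1\tapple\tred\n2\tpear\tgreen\n' > sample.tsv

# select_by_name NAMES FILE
#   NAMES: comma-separated field names; FILE: TSV file with a header line.
select_by_name() {
    awk -F'\t' -v OFS='\t' -v names="$1" '
        NR == 1 {
            # Map each header name to its field number.
            for (i = 1; i <= NF; i++) pos[$i] = i
            n = split(names, want, ",")
            for (j = 1; j <= n; j++) {
                if (!(want[j] in pos)) {
                    print "unknown field: " want[j] > "/dev/stderr"
                    exit 1
                }
                idx[j] = pos[want[j]]
            }
        }
        {
            # Emit the requested fields in the requested order
            # (the header line is passed through the same way).
            line = $(idx[1])
            for (j = 2; j <= n; j++) line = line OFS $(idx[j])
            print line
        }
    ' "$2"
}

select_by_name Name,RecordNum sample.tsv
```

Note that 'cut -f 2,1' would print the fields in file order rather than the order given, which is one reason a reordering tool is handy here.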

More information is available on the tsv-utils GitHub repository, 
including documentation and pre-built binaries: 
https://github.com/eBay/tsv-utils


--Jon