Re: Profiling
On Wednesday, 10 February 2021 at 23:42:31 UTC, mw wrote:
> On Wednesday, 10 February 2021 at 11:52:51 UTC, JG wrote:
> > As a follow up question I would like to know what tool people use to profile d programs?
>
> I use this one: https://code.dlang.org/packages/profdump
>
> e.g.
> ```
> dub build --build=debug --build=profile
> # run your program to generate trace.log
> profdump -b trace.log trace.log.b
> profdump -f --dot --threshold 1 trace.log trace.log.dot
> echo 'view it with: xdot trace.log.dot'
> ```

Nice, didn't even know that existed
Re: Trying to reduce memory usage
On Friday, 12 February 2021 at 02:22:35 UTC, H. S. Teoh wrote:
> This turns the OP's O(n log n) algorithm into an O(n) algorithm, doesn't need to copy the entire content of the file into memory, and also uses much less memory by storing only hashes.

But this kind of hash may be too weak to avoid hash collisions. For data this large, slower but stronger algorithms like SHA are advisable. Associative arrays also use the same weak algorithm, so you can run into collision issues there too. Using the hash of the string data as the key can therefore be a problem. I always use a quick hash as the key, but actually hold a collection of hashes under each key and do a lookup to be on the safe side.
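For illustration, here is one possible reading of the "quick hash as key, collection of hashes as value" idea. It is only a sketch under assumptions: the struct name `SeenLines`, the method `insertIfNew`, and the choice of SHA-256 (via `std.digest.sha.sha256Of`) are mine, not something the post specifies.

```d
import std.digest.sha : sha256Of;
import std.stdio : stdin, stdout;

/// Hypothetical helper: remembers lines by a quick hash plus a SHA-256 digest.
struct SeenLines
{
    // Key: quick built-in hash. Value: all strong digests seen under that key.
    private ubyte[32][][size_t] buckets;

    /// Returns true the first time a line is offered, false for repeats.
    bool insertIfNew(const(char)[] line)
    {
        immutable quick = hashOf(line);
        auto strong = sha256Of(line); // ubyte[32]

        if (auto bucket = quick in buckets)
        {
            // Same quick hash: only treat it as a duplicate if the
            // strong digest matches as well.
            foreach (ref known; *bucket)
                if (known == strong)
                    return false;
        }
        buckets[quick] ~= strong; // creates the bucket if it is missing
        return true;
    }
}

void main()
{
    SeenLines seen;
    foreach (line; stdin.byLine)
        if (seen.insertIfNew(line))
            stdout.writeln(line);
}
```

The trade-off is that every line pays for a SHA-256 computation and each distinct line costs at least 32 bytes, so this is slower and heavier than storing a single `size_t` per line, in exchange for making a silent loss of a distinct line far less likely.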
Re: Trying to reduce memory usage
On Fri, Feb 12, 2021 at 01:45:23AM +, mw via Digitalmars-d-learn wrote:
> On Friday, 12 February 2021 at 01:23:14 UTC, Josh wrote:
> > I'm trying to read in a text file that has many duplicated lines and
> > output a file with all the duplicates removed.
>
> If you only need to remove duplicates, keep (and compare) a string
> hash for each line is good enough. Memory usage should be just n x
> integers.
[...]

+1. This can be even done on-the-fly: you don't even need to use .sort or .uniq. Just something like this:

	bool[size_t] hashes;
	foreach (line; stdin.byLine) {
		auto h = hashOf(line); // use a suitable hash function here
		if (h !in hashes) {
			outfile.writeln(line);
			hashes[h] = true;
		}
		// else this line already seen before; just skip it
	}

This turns the OP's O(n log n) algorithm into an O(n) algorithm, doesn't need to copy the entire content of the file into memory, and also uses much less memory by storing only hashes.

T

-- 
MASM = Mana Ada Sistem, Man!
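As a self-contained variant of the snippet above, the following sketch wraps the hash table in a small helper and filters stdin to stdout. The names `SeenHashes` and `insertIfNew` are hypothetical, and the built-in `hashOf` is used only as a placeholder for "a suitable hash function".

```d
import std.stdio : stdin, stdout;

/// Hypothetical helper: remembers one size_t hash per distinct line.
struct SeenHashes
{
    private bool[size_t] seen;

    /// Returns true the first time a hash is offered, false afterwards.
    bool insertIfNew(const(char)[] line)
    {
        // byLine reuses its buffer, so the hash is taken immediately
        // and the line itself is never stored.
        immutable h = hashOf(line);
        if (h in seen)
            return false;
        seen[h] = true;
        return true;
    }
}

void main()
{
    SeenHashes seen;
    foreach (line; stdin.byLine)
        if (seen.insertIfNew(line))
            stdout.writeln(line);
}
```

Assuming the program is built as `dedupe`, it could be run as `dedupe < infile > outfile`. Note that two different lines with the same hash would be treated as duplicates, so the second one would be dropped silently.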
Re: Trying to reduce memory usage
On Friday, 12 February 2021 at 01:23:14 UTC, Josh wrote:
> I'm trying to read in a text file that has many duplicated lines and output a file with all the duplicates removed.

If you only need to remove duplicates, keeping (and comparing) a string hash for each line is good enough. Memory usage should be just n x integers.
Trying to reduce memory usage
I'm trying to read in a text file that has many duplicated lines and output a file with all the duplicates removed.

By the end of this code snippet, the memory usage is ~5x the size of the infile (which can be multiple GB each), and when this is in a loop the memory usage becomes unmanageable and often results in an OutOfMemory error or just a complete lock up of the system.

Is there a way to reduce the memory usage of this code without sacrificing speed to any noticeable extent? My assumption is the .sort.uniq needs improving, but I can't think of an easier/not much slower way of doing it.

Windows 10 x64
LDC - the LLVM D compiler (1.21.0-beta1): based on DMD v2.091.0 and LLVM 10.0.0

---
auto filename = "path\\to\\file.txt.temp";
auto array = appender!(string[]);
File infile = File(filename, "r");
foreach (line; infile.byLine) {
    array ~= line.to!string;
}
File outfile = File(stripExtension(filename), "w");
foreach (element; (array[]).sort.uniq) {
    outfile.myrawWrite(element ~ "\n"); // used to not print the \r on windows
}
outfile.close;
array.clear;
array.shrinkTo(0);
infile.close;
---

Thanks.
Re: how to properly compare this type?
On 2/9/21 6:12 PM, Jack wrote:
> static if(is(typeof(__traits(getMember, A, member)) == string function(string)))

That's not what you want. string function(string) is a *pointer* to a function that accepts a string and returns a string.

In addition to getting the overloads (you only get one "b" in the list of members), take the address of the overload.

This worked for me:

	foreach(overload; __traits(getOverloads, A, member))
		static if(is(typeof(&overload) == string function(string)))
		{
			arr ~= member;
		}

-Steve
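To make the overload check concrete, here is a minimal sketch of the same technique in a complete program. The original `A` is not shown in the thread, so the struct below, with static functions only, is a hypothetical stand-in, and `membersWithSignature` is an invented name.

```d
import std.stdio : writeln;

// Hypothetical stand-in for the poster's type; the real A is not shown.
struct A
{
    static string b(string s) { return s; }
    static int b(int i) { return i; }        // second overload of "b"
    static string c(string s) { return s ~ "!"; }
    static void d() {}
}

/// Collects the names of members that have an overload whose address
/// has the type `string function(string)`.
string[] membersWithSignature()
{
    string[] arr;
    // allMembers lists each name once, even if it has several overloads.
    foreach (member; __traits(allMembers, A))
        foreach (overload; __traits(getOverloads, A, member))
            static if (is(typeof(&overload) == string function(string)))
                arr ~= member;
    return arr;
}

void main()
{
    writeln(membersWithSignature());
}
```

Both loops run over compile-time sequences, so the `static if` is evaluated once per overload during compilation and only the matching `arr ~= member` statements remain in the generated function.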
Re: Real simple unresolved external symbols question...
On Thursday, 11 February 2021 at 00:18:23 UTC, H. S. Teoh wrote:
> On Wed, Feb 10, 2021 at 11:35:27PM +, WhatMeWorry via Digitalmars-d-learn wrote:
> [...]
> > Okay, thanks. Then why does the README.md at https://github.com/dlang/druntime say "Runtime is typically linked together with Phobos in a release such that the compiler only has to link to a single library to provide the user with the runtime and the standard library."
>
> Probably outdated information. Somebody should submit a PR for it. ;-)
>
> T

Can someone in here update the docs?