Too slow readln

2017-07-16 Thread unDEFER via Digitalmars-d-learn

Hello, there!

I have the next "grep" code:
https://dpaste.dzfl.pl/7b7273f96ab2

And I have the directory to run it:
$ time /home/undefer/MyFiles/Projects/TEST/D/grep "HELLO" .
./strace.log: [pid 18365] write(1, "HELLO\n", 6HELLO

real1m17.096s
user0m54.828s
sys 0m13.340s

The same result I get with ldc2..

The same with bash and grep:
$ time for i in `find .`; do file -b "$i" | grep -q text && grep 
-a "HELLO" "$i"; done

[pid 18365] write(1, "HELLO\n", 6HELLO

real0m42.461s
user0m23.244s
sys 0m22.300s

Only `file` for all files:
$ time find . -exec file {} + >/dev/null

real0m15.013s
user0m14.556s
sys 0m0.436s

Only grep for all files:
$ for i in `find .`; do file -b "$i" | grep -q text && echo "$i"; 
done > LIST1

$ time for i in `cat LIST1`; do grep -a "HELLO" "$i"; done
[pid 18365] write(1, "HELLO\n", 6HELLO

real0m4.431s
user0m1.112s
sys 0m3.148s

So 15+4.4 much lesser than 42.46. Why? How "find" so fast can run 
"file" so many times?

And why 42.461s much lesser 1m17.096s?

The second version of grep:
https://dpaste.dzfl.pl/9db5bc2f0a26

$ time /home/undefer/MyFiles/Projects/TEST/D/grep2 "HELLO" `cat 
LIST1`

./strace.log: [pid 18365] write(1, "HELLO\n", 6HELLO

real0m1.871s
user0m1.824s
sys 0m0.048s

$ time grep -a "HELLO" `cat LIST1`
./strace.log:[pid 18365] write(1, "HELLO\n", 6HELLO

real0m0.075s
user0m0.044s
sys 0m0.028s

The profiler says that readln eats CPU. So why 0m0.075s much 
lesser 0m1.871s?


How to write in D grep not slower than GNU grep?


Re: Too slow readln

2017-07-16 Thread unDEFER via Digitalmars-d-learn

On Sunday, 16 July 2017 at 17:37:34 UTC, Jon Degenhardt wrote:

On Sunday, 16 July 2017 at 17:03:27 UTC, unDEFER wrote:

[snip]

How to write in D grep not slower than GNU grep?


GNU grep is pretty fast, it's tough to beat it reading one line 
at a time. That's because it can play a bit of a trick and do 
the initial match ignoring line boundaries and correct line 
boundaries later. There's a good discussion in this thread 
("Why GNU grep is fast" by Mike Haertel): 
https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html


--Jon


Thank you. I understand yet another trick:
$ find . -exec file -bi {} +
is the same
$ file -bi `find .`


Re: Too slow readln

2017-07-16 Thread Jon Degenhardt via Digitalmars-d-learn

On Sunday, 16 July 2017 at 17:03:27 UTC, unDEFER wrote:

[snip]

How to write in D grep not slower than GNU grep?


GNU grep is pretty fast, it's tough to beat it reading one line 
at a time. That's because it can play a bit of a trick and do the 
initial match ignoring line boundaries and correct line 
boundaries later. There's a good discussion in this thread ("Why 
GNU grep is fast" by Mike Haertel): 
https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html


--Jon


Re: Avoid if statements for checking neighboring indexes in a 2D array

2017-07-16 Thread Timon Gehr via Digitalmars-d-learn

On 16.07.2017 19:10, Timon Gehr wrote:

...

(This works even if there are * at the border.)


Well, not really. :)

Version that actually works if there are * at the border:

import std.stdio, std.range, std.algorithm, std.array;
char[][] arr;
int componentSize(int row,int col){
if(row>=arr.length||col>=arr[row].length||arr[row][col]!='*')
return 0;
arr[row][col]='x';
return 1+cartesianProduct(iota(row-1,row+2),iota(col-1,col+2))
.map!(a=>componentSize(a.expand)).sum;
}
void main (){
arr=["***",
 "*xx",
 "xx**xxx",
 "xxx*x**",
 "**x"].map!dup.array;

cartesianProduct(iota(cast(int)arr.length),iota(cast(int)arr[0].length))
.filter!(a=>arr[a[0]][a[1]]=='*')
.each!(a=>writeln(componentSize(a.expand)));
}


Re: Avoid if statements for checking neighboring indexes in a 2D array

2017-07-16 Thread Timon Gehr via Digitalmars-d-learn

On 16.07.2017 12:37, kerdemdemir wrote:
My goal is to find connected components in a 2D array for example 
finding connected '*'

chars below.

   x x x x x x
   x x x x x x
   x x * * x x
   x x * * x x
   x x x * * x
   * x x x x x


There are two connected '*' group in this example. First group is 
composes of six '*' located closer to middle and the second group 
composes only one '*' char located in the left bottom corner.


Do to this I generally implement a recursive algorithm which repeat 
calling the same function by checking all neighbors around the current 
index. I generally end up with something like :


void foo( int row, int col)
{
 //Do something here like caching the index

 if ( twoDimensionData[row - 1][col] == '*')
foo(row- 1, col);
 else if ( twoDimensionData[row + 1][col] == '*')
foo(row+ 1, col);
 else if ( twoDimensionData[row - 1 ][col - 1] == '*')
foo(row - 1, col - 1);

//. I need 5 more of this bad boys I mean if checks
}
...


It is wrong to explore in only one direction, so I assume you do not 
mean "else".



Is there any better way to achieve this

foreach(i;row-1..row+2){
foreach(j;col-1..col+2){
if(i==row && j==col) continue;
if(twoDimensionData[i][j] == '*')
foo(row,col);
}
}

with cool std functions like 
enumerate or iota without needing to write eight if checks?


cartesianProduct(iota(row-1,row+2),iota(col-1,col+2))
.filter!(a=>(a[0]!=row||a[1]!=col))
.filter!(a=>twoDimensionData[a[0]][a[1]]=='*')
.each!(a=>foo(a.expand));

(You can usually drop the first filter because "doing something" will 
usually involve checking if the node has been visited and returning or 
else marking the node as visited.)


Re: Avoid if statements for checking neighboring indexes in a 2D array

2017-07-16 Thread Timon Gehr via Digitalmars-d-learn

On 16.07.2017 18:55, Timon Gehr wrote:

On 16.07.2017 12:37, kerdemdemir wrote:
My goal is to find connected components in a 2D array for example 
finding connected '*'

chars below.

   x x x x x x
   x x x x x x
   x x * * x x
   x x * * x x
   x x x * * x
   * x x x x x


There are two connected '*' group in this example. First group is 
composes of six '*' located closer to middle and the second group 
composes only one '*' char located in the left bottom corner.


Do to this I generally implement a recursive algorithm which repeat 
calling the same function by checking all neighbors around the current 
index. I generally end up with something like :


void foo( int row, int col)
{
 //Do something here like caching the index

 if ( twoDimensionData[row - 1][col] == '*')
foo(row- 1, col);
 else if ( twoDimensionData[row + 1][col] == '*')
foo(row+ 1, col);
 else if ( twoDimensionData[row - 1 ][col - 1] == '*')
foo(row - 1, col - 1);

//. I need 5 more of this bad boys I mean if checks
}
...


It is wrong to explore in only one direction, so I assume you do not 
mean "else".



Is there any better way to achieve this

foreach(i;row-1..row+2){
 foreach(j;col-1..col+2){
 if(i==row && j==col) continue;
 if(twoDimensionData[i][j] == '*')
 foo(row,col);
 }
}

with cool std functions like enumerate or iota without needing to 
write eight if checks?


cartesianProduct(iota(row-1,row+2),iota(col-1,col+2))
 .filter!(a=>(a[0]!=row||a[1]!=col))
 .filter!(a=>twoDimensionData[a[0]][a[1]]=='*')
 .each!(a=>foo(a.expand));

(You can usually drop the first filter because "doing something" will 
usually involve checking if the node has been visited and returning or 
else marking the node as visited.)


Ivan's example in this style:

import std.stdio, std.range, std.algorithm, std.array;
char[][] arr;
int componentSize(size_t row, size_t col){
if(row>=arr.length||col>=arr[row].length||arr[row][col]!='*')
return 0;
arr[row][col]='x';
return 1+cartesianProduct(iota(row-1,row+2),iota(col-1,col+2))
.map!(a=>componentSize(a.expand)).sum;
}
void main (){
arr=["xx*",
 "*xx",
 "xx**xxx",
 "xx**x**",
 "xxx"].map!dup.array;
cartesianProduct(iota(arr.length),iota(arr[0].length))
.filter!(a=>arr[a[0]][a[1]]=='*')
.each!(a=>writeln(componentSize(a.expand)));
}

(This works even if there are * at the border.)


Re: Too slow readln

2017-07-16 Thread unDEFER via Digitalmars-d-learn
I understand the main problem. dirEntries by default follows 
symlinks.

Without it my first grep works only 28.338s. That really cool!


Re: Avoid if statements for checking neighboring indexes in a 2D array

2017-07-16 Thread Nicholas Wilson via Digitalmars-d-learn

On Sunday, 16 July 2017 at 10:37:39 UTC, kerdemdemir wrote:
My goal is to find connected components in a 2D array for 
example finding connected '*'

chars below.

  x x x x x x
  x x x x x x
  x x * * x x
  x x * * x x
  x x x * * x
  * x x x x x


There are two connected '*' group in this example. First group 
is composes of six '*' located closer to middle and the second 
group composes only one '*' char located in the left bottom 
corner.


Do to this I generally implement a recursive algorithm which 
repeat calling the same function by checking all neighbors 
around the current index. I generally end up with something 
like :


void foo( int row, int col)
{
//Do something here like caching the index

if ( twoDimensionData[row - 1][col] == '*')
   foo(row- 1, col);
else if ( twoDimensionData[row + 1][col] == '*')
   foo(row+ 1, col);
else if ( twoDimensionData[row - 1 ][col - 1] == '*')
   foo(row - 1, col - 1);

//. I need 5 more of this bad boys I mean if checks
}

Is there any better way to achieve this with cool std functions 
like enumerate or iota without needing to write eight if checks?


What you probably want is a convolution, used a lot in image 
processing.


insted of using recursion you walk left to right in blocks of 3x3 
and compute a "sum"
and then do the same vertically, then each cell contains the 
number of neighbours that are *.

In this case you want the kernel
111
101
111

   o o o x x x
   o o o x x x
   o o # * x x
   x x * * x x
   x x x * * x
   * x x x x x

   x o o o x x
   x o o o x x
   x o # # x x
   x x * * x x
   x x x * * x
   * x x x x x

   x x o o o x
   x x o o o x
   x x # # o x
   x x * * x x
   x x x * * x
   * x x x x x

   x x x o o o
   x x x o o o
   x x * # o o
   x x * * x x
   x x x * * x
   * x x x x x

Have a look at the video on http://halide-lang.org describing the 
different methods used.




Re: Avoid if statements for checking neighboring indexes in a 2D array

2017-07-16 Thread Ivan Kazmenko via Digitalmars-d-learn

On Sunday, 16 July 2017 at 10:37:39 UTC, kerdemdemir wrote:
My goal is to find connected components in a 2D array for 
example finding connected '*'

chars below.

  x x x x x x
  x x x x x x
  x x * * x x
  x x * * x x
  x x x * * x
  * x x x x x



...


Is there any better way to achieve this with cool std functions 
like enumerate or iota without needing to write eight if checks?


I don't know of a library facility to do this.

Still, there is a language-agnostic way to make it more concise.  
Instead of repeating eight similar blocks, define an array of 
delta-rows and delta-columns to neighboring cells, and use that 
array in a loop.  A complete example follows:


-
import std.algorithm, std.array, std.range, std.stdio;

immutable int dirs = 8;
immutable int [dirs] dRow = [-1, -1, -1,  0, +1, +1, +1,  0];
immutable int [dirs] dCol = [-1,  0, +1, +1, +1,  0, -1, -1];

char [] [] arr;

int componentSizeRecur (int row, int col)
{
int res = 1;
arr[row][col] = 'x';
foreach (dir; 0..dirs)
{
auto nRow = row + dRow[dir];
auto nCol = col + dCol[dir];
if (arr[nRow][nCol] == '*')
res += componentSizeRecur (nRow, nCol);
}
return res;
}

void main ()
{
arr = ["xxx",
   "*xx",
   "xx**xxx",
   "xx**x*x",
   "xxx",
   ].map !(line => line.dup).array;

foreach (row; 0..arr.length)
foreach (col; 0..arr.front.length)
if (arr[row][col] == '*')
writeln (componentSizeRecur (row, col));
}
-

If the neighbors array is regular and known in advance (like, 4 
edge-connected cells, or 8 corner-connected cells as here), you 
may also like to loop over possible deltas and pick the good 
ones, like below:


-
int componentSizeRecur (int row, int col)
{
int res = 1;
arr[row][col] = 'x';
foreach (dRow; -1..+2)
foreach (dCol; -1..+2)
if (dRow || dCol)
{
auto nRow = row + dRow;
auto nCol = col + dCol;
if (arr[nRow][nCol] == '*')
res += componentSizeRecur (nRow, nCol);
}
return res;
}
-

I have to make two additional notes.

1. This works only if the border does not contain '*' characters. 
 To make it work without that restriction, either add two 
sentinel rows and columns at the four borders of the array, or 
put an if on nRow and nCol before using them.


2. The recursive solution can eat up lots of stack.  If you 
intend using it on large arrays, make sure you don't hit the 
stack size limit of the environment.  On Windows, it can be 
achieved by a compiler switch like "-L/STACK:268435456".  On 
Linux, the "ulimit" command may help.


Ivan Kazmenko.



Re: Avoid if statements for checking neighboring indexes in a 2D array

2017-07-16 Thread kerdemdemir via Digitalmars-d-learn

Hi Guys,

@Nicholas , thanks a lot for cool solution but actually I weren't 
working on image processing. I was trying to solve 
"http://codeforces.com/contest/828/problem/B;. I really needed 
finding connected components this time.


@Ivan, your solution is much more elegant than what I did. But I 
find @Timon's solution with cartesian product a bit nicer in this 
case since I love to see std function more and more.


Thanks guys for all your advises. D community is really the best.

Here is my solution to question. It seems I didn't get it working 
completely yet. In my debugger(Msvc MonoD) even there are many 
rows it seems Recurse function only loops the columns in the 
first row. And debugger is jumping so strangely I couldn't tag 
the problem.
But I don't expect a big change there should be a small bug that 
is it.


Sorry if code contains some foreign words I just replaced many 
variable names from my native language I might be missing some.


import std.stdio;
import std.string;
import std.algorithm;
import std.conv;
import std.array;
import std.range;
import std.math;

int totalrow;
int totalcolumn;
dchar[][] twoDimensionArray;

struct ConnectedElementsSolver
{
this(  dchar[][] twoDimArray )
{
m_twoDimStruct = twoDimArray;
Recurse(0, 0);
}

void Recurse ( int row, int column )
{
if( row < 0 || column < 0  )
return;

for ( ; row <  m_twoDimStruct.length ; row++  )
{
for ( ; column <  m_twoDimStruct[row].length; column++  
)
{
Process( row, column, m_twoDimStruct.length, 
m_twoDimStruct[row].length );

}
}
}

	void Process( int row, int column, ulong maxrow, ulong maxcolumn 
)

{
		if( row < 0 || column < 0 || row >= maxrow || column >= 
maxcolumn  )

return;

if (  m_twoDimStruct[row][column] == 'B' )
{
m_twoDimStruct[row][column] = 'W';
m_tempResult.Process(row, column );
Process(row-1,column-1, maxrow, maxcolumn);
Process(row,column-1, maxrow, maxcolumn);
Process(row+1,column-1, maxrow, maxcolumn); 

Process(row-1,column, maxrow, maxcolumn);   

Process(row+1,column, maxrow, maxcolumn);   

Process(row-1,column+1, maxrow, maxcolumn); 

Process(row,column+1, maxrow, maxcolumn);   
Process(row-1,column+1, maxrow, maxcolumn);
}
else
{
if ( m_tempResult.HowManyFilled )
m_results ~= m_tempResult;
m_tempResult.Resetle();
}   
}

SquareCandidate   m_tempResult;
SquareCandidate[] m_results;
dchar[][] m_twoDimStruct;
}

struct SquareCandidate
{
int MaxY;
int MinY;
int MaxX;
int MinX;
int HowManyFilled;

this( int howManyFilled )
{
HowManyFilled = howManyFilled;
}

void Resetle()
{
this = SquareCandidate();
}


void Process( int row, int column )
{
HowManyFilled++;
MaxY = max( column, MaxY);
MinY = min( column, MinY);
MaxX = max( row, MaxX);
MinX = min( row, MinX);
}

int FindEmptySlots()
{
int kareKenarUzunlugu = max(MaxX-MinX, MaxY-MinY);
int kareAlani = kareKenarUzunlugu*kareKenarUzunlugu;
return kareAlani - HowManyFilled;
}

bool CanCreateSquare( int xMax, int yMax )
{
int xUzunlugu = MaxX-MinX;
int yUzunlugu = MaxY-MinY;
if ( xUzunlugu > yUzunlugu )
{
return yMax >= xUzunlugu;
}
else
{
return xMax >= yUzunlugu;
}
}
}


void main()
{
auto dimensions = stdin.readln.strip.split().map!(a => 
to!int(a)).array();

totalrow = dimensions[0];
totalcolumn = dimensions[1];
twoDimensionArray = stdin
.byLine()
.take(totalrow)
.map!(line => line
  .map!(a => to!dchar(a))
  .array())
.array;

	ConnectedElementsSolver baglantiliElemCozucu = 
ConnectedElementsSolver(twoDimensionArray);
	bool isAnyProblemMakingSquare = 
baglantiliElemCozucu.m_results.any!(a => 

Re: Avoid if statements for checking neighboring indexes in a 2D array

2017-07-16 Thread kerdemdemir via Digitalmars-d-learn
Of course now I will try to have it work first. Than replace for 
loops with Cartesian product calls. Than I will make 2D array 
template and even maybe with random access range. And finally for 
being able to use this class later in the some other coding 
challenge I will make Searching( == 'W' part) and Process 
functions passed by parameter.


Thanks a lot for help.
Erdem


Re: Too slow readln

2017-07-16 Thread Ali Çehreli via Digitalmars-d-learn

On 07/16/2017 10:37 AM, Jon Degenhardt wrote:


There's a good discussion in this thread ("Why GNU grep is fast" by Mike
Haertel):
https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html

--Jon


Another fast GNU utility was on Reddit a month ago:


https://www.reddit.com/r/programming/comments/6gxf02/how_is_gnus_yes_so_fast_xpost_runix/

Ali



Re: stacktrace for InvalidMemoryOperationError

2017-07-16 Thread crimaniak via Digitalmars-d-learn

On Saturday, 15 July 2017 at 18:14:13 UTC, Joakim wrote:

core.exception.InvalidMemoryOperationError@src/core/exception.d(696): Invalid 
memory operation

...

See the wiki page about this:

https://wiki.dlang.org/InvalidMemoryOperationError

If you can't do all that, look for places you might be 
allocating in a destructor.  The recent GC allocation flag -vgc 
might help:


https://dlang.org/blog/2017/06/16/life-in-the-fast-lane/


Yes, I found it already and make documentation proposition about 
https://issues.dlang.org/show_bug.cgi?id=17642
Then it turned out that it was more difficult to avoid this 
problem than to find the cause. Yes, I have a heavy cleanup in 
the destructor and I can't definitely make this cleanup @nogc. So 
first I tried to make problematic object instances RefCounted. In 
my case, it means change ClientController[string] to 
RefCounted!ClientController[string] and create an object by lib 
interface, not by new. I have to change ClientController from 
object to struct because automem can't work with the object. I 
hunted all extra copies of these objects and kill them all, so 
now I sure there are no copies except the main container. I tried 
different libraries from std.typecons.RefCounted to automem. And 
I fail to make it _really_ reference counted. It's not deleted 
when I delete it from the container. Every time it's collected by 
GC and application is dropped. So I decided that in the container 
itself. Obviously, the associative array does not delete the 
elements themselves on demand, but loses them and gives them to 
the garbage collector.
I didn't find any map or multimap @nogc container so I make the 
ugly solution: I move all destruction to method destructMe() and 
in ~this() now is just assert(destructMeIsCalled). And now I have 
to call manually destructMe() before object removing. It seems it 
works. I'm wondering if there is a less ugly way to have the map 
of reference counted objects.


Re: Add property setting to chain

2017-07-16 Thread closescreen via Digitalmars-d-learn

Thanks for reply, Ali.


Avoid if statements for checking neighboring indexes in a 2D array

2017-07-16 Thread kerdemdemir via Digitalmars-d-learn
My goal is to find connected components in a 2D array for example 
finding connected '*'

chars below.

  x x x x x x
  x x x x x x
  x x * * x x
  x x * * x x
  x x x * * x
  * x x x x x


There are two connected '*' group in this example. First group is 
composes of six '*' located closer to middle and the second group 
composes only one '*' char located in the left bottom corner.


Do to this I generally implement a recursive algorithm which 
repeat calling the same function by checking all neighbors around 
the current index. I generally end up with something like :


void foo( int row, int col)
{
//Do something here like caching the index

if ( twoDimensionData[row - 1][col] == '*')
   foo(row- 1, col);
else if ( twoDimensionData[row + 1][col] == '*')
   foo(row+ 1, col);
else if ( twoDimensionData[row - 1 ][col - 1] == '*')
   foo(row - 1, col - 1);

//. I need 5 more of this bad boys I mean if checks
}

Is there any better way to achieve this with cool std functions 
like enumerate or iota without needing to write eight if checks?