[PING]RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-26 Thread Iyer, Balaji V
Hi Jakub,
Did you get a chance to look at this?

Thanks,

Balaji V. Iyer.

 -Original Message-
 From: Iyer, Balaji V
 Sent: Monday, February 24, 2014 6:17 PM
 To: 'Jakub Jelinek'
 Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
 
 Hi Jakub,
   Please see my responses below.
 
  -Original Message-
  From: Iyer, Balaji V
  Sent: Thursday, February 20, 2014 11:38 PM
  To: Jakub Jelinek
  Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez';
  'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
  Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
 
  Hi Jakub,
  I have attached the fixed patch and have answered your questions
  below.
 
   -Original Message-
   From: Jakub Jelinek [mailto:ja...@redhat.com]
   Sent: Wednesday, February 19, 2014 6:24 AM
   To: Iyer, Balaji V
   Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez';
   'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
   Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
  
   On Wed, Feb 19, 2014 at 04:43:06AM +, Iyer, Balaji V wrote:
Attached, please find a patch with the test case attached (for1.cc).
The patch is the same but the cp-changelog has been modified to
reflect the new test-case.  Is this OK to install?
  
   1) have you tested the patch at all?  I see
   FAIL: g++.dg/gomp/for-1.C -std=c++98  (test for errors, line 27)
   FAIL: g++.dg/gomp/for-1.C -std=c++98 (test for excess errors)
   FAIL: g++.dg/gomp/for-1.C -std=c++11  (test for errors, line 27)
   FAIL: g++.dg/gomp/for-1.C -std=c++11 (test for excess errors)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (internal compiler error)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line
   30)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line
   37)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line
   40)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for excess errors)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (internal compiler error)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line
   30)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line
   37)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line
   40)
   FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for excess errors)
   regressions caused by the patch, that is of course unacceptable.
  
 
  Fixed. I apologize for them. I have confirmed that it is OK now.
 
 
   2) try this updated cf3.cc, e.g. with -O2 -fcilkplus if you can't
   find out why calling something multiple times is a bad idea,
   actually the latest patch is even worse than the older one, you now
   create 3 calls to the end method and 3 calls to operator-.  There
   should be just one call to that, before the #pragma omp parallel
   obviously, anything that
  doesn't do that is just bad.
   I don't see a point in having if clause on the _Cilk_for, just keep
   it on the #pragma omp parallel only, at ompexp time you can easily
   find it there, there is no point to check it again in the parallel
   body of the
  function.
  
 
  I have removed the if-clause from the _Cilk_for and now it is just in
  #pragma omp parallel.
 
  I have removed the 3rd operator-, but I am not able to remove the 2nd.
  I am looking into it, but I am not able to do it. The thing is, first
  operator- was for the if-clause, which I need to calculate the
  loop-count for the
  __cilkrts_cilk_for_64 function. The second one is not necessary
  because the end-value does not matter for _Cilk_for since they will be
  replaced with the low and high values.  I tried several things such as
  stopping gimplifcation of the cond value, or replacing it with a
  constant, etc and those are causing other problems elsewhere.
 
   The thing is, I am not able to find a way to automate this. I can't
  assume the cond's  end-value is same as count, because this is only
  true if we have an iterator and the handle_omp_for_iterator function
  modifies the cond value correctly. I need to use the count value
  (retval.0) as retval.1 but count-value is computed at a later time
  than handle_omp_for_iterator (since it does not have any knowledge
  about the #pragma omp parallel). It is giving the correct answers for
  the benchmarks and is removing the 2nd operator- when optimization is
 turned on for the inlinable operator-.
 
  Can you provide me some advice about how to do it?
 
 I have fixed the issue. Now the 2nd operator- does not occur. Attached,
 please fixed a fixed patch. Is this OK for trunk?
 
 Thanks,
 
 Balaji V. Iyer.


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-19 Thread Jakub Jelinek
On Wed, Feb 19, 2014 at 04:43:06AM +, Iyer, Balaji V wrote:
 Attached, please find a patch with the test case attached (for1.cc). The
 patch is the same but the cp-changelog has been modified to reflect the
 new test-case.  Is this OK to install?

1) have you tested the patch at all?  I see
FAIL: g++.dg/gomp/for-1.C -std=c++98  (test for errors, line 27)
FAIL: g++.dg/gomp/for-1.C -std=c++98 (test for excess errors)
FAIL: g++.dg/gomp/for-1.C -std=c++11  (test for errors, line 27)
FAIL: g++.dg/gomp/for-1.C -std=c++11 (test for excess errors)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (internal compiler error)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 30)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 37)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 40)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for excess errors)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (internal compiler error)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 30)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 37)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 40)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for excess errors)
regressions caused by the patch, that is of course unacceptable.

2) try this updated cf3.cc, e.g. with -O2 -fcilkplus if you can't find out
why calling something multiple times is a bad idea, actually the latest patch
is even worse than the older one, you now create 3 calls to the end method
and 3 calls to operator-.  There should be just one call to that, before the
#pragma omp parallel obviously, anything that doesn't do that is just bad.
I don't see a point in having if clause on the _Cilk_for, just keep it on
the #pragma omp parallel only, at ompexp time you can easily find it there,
there is no point to check it again in the parallel body of the function.

typedef __PTRDIFF_TYPE__ ptrdiff_t;

template typename T
class I
{
public:
  typedef ptrdiff_t difference_type;
  I ();
  ~I ();
  I (T *);
  I (const I );
  T operator * ();
  T *operator - ();
  T operator [] (const difference_type ) const;
  I operator = (const I );
  I operator ++ ();
  I operator ++ (int);
  I operator -- ();
  I operator -- (int);
  I operator += (const difference_type );
  I operator -= (const difference_type );
  I operator + (const difference_type ) const;
  I operator - (const difference_type ) const;
  template typename S friend bool operator == (IS , IS );
  template typename S friend bool operator == (const IS , const IS );
  template typename S friend bool operator  (IS , IS );
  template typename S friend bool operator  (const IS , const IS );
  template typename S friend bool operator = (IS , IS );
  template typename S friend bool operator = (const IS , const IS );
  template typename S friend bool operator  (IS , IS );
  template typename S friend bool operator  (const IS , const IS );
  template typename S friend bool operator = (IS , IS );
  template typename S friend bool operator = (const IS , const IS );
  template typename S friend typename IS::difference_type operator - (IS 
, IS );
  template typename S friend typename IS::difference_type operator - (const 
IS , const IS );
  template typename S friend IS operator + (typename IS::difference_type 
, const IS );
private:
  T *p;
};
template typename T IT::I () : p (0) {}
template typename T IT::~I () {}
template typename T IT::I (T *x) : p (x) {}
template typename T IT::I (const I x) : p (x.p) {}
template typename T T IT::operator * () { return *p; }
template typename T T *IT::operator - () { return p; }
template typename T T IT::operator [] (const difference_type x) const { 
return p[x]; }
template typename T IT IT::operator = (const I x) { p = x.p; return 
*this; }
template typename T IT IT::operator ++ () { ++p; return *this; }
template typename T IT IT::operator ++ (int) { return I (p++); }
template typename T IT IT::operator -- () { --p; return *this; }
template typename T IT IT::operator -- (int) { return I (p--); }
template typename T IT IT::operator += (const difference_type x) { p += 
x; return *this; }
template typename T IT IT::operator -= (const difference_type x) { p -= 
x; return *this; }
template typename T IT IT::operator + (const difference_type x) const { 
return I (p + x); }
template typename T __attribute__((noinline)) IT IT::operator - (const 
difference_type x) const { __asm (); return I (p - x); }
template typename T bool operator == (IT x, IT y) { return x.p == y.p; }
template typename T bool operator == (const IT x, const IT y) { return 
x.p == y.p; }
template typename T bool operator != (IT x, IT y) { return !(x == y); }
template typename T bool operator != (const IT x, const IT y) { return 
!(x == y); }
template typename T bool operator  (IT x, IT y) { return x.p  y.p; }
template typename T bool operator  (const IT x, const IT y) { return 
x.p  y.p; }
template typename T bool operator = (IT x, IT y) { return x.p = y.p; }

[PING] RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Iyer, Balaji V
Hi Jakub,
Did you get a chance to look at this patch?

Thanks,

Balaji V. Iyer.

 -Original Message-
 From: Iyer, Balaji V
 Sent: Monday, February 10, 2014 5:07 PM
 To: Jakub Jelinek
 Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
 
 Hi Jakub,
 
  -Original Message-
  From: Jakub Jelinek [mailto:ja...@redhat.com]
  Sent: Monday, February 10, 2014 12:58 PM
  To: Iyer, Balaji V
  Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez';
  'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
  Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
  On Fri, Feb 07, 2014 at 10:14:21PM +, Iyer, Balaji V wrote:
 Attached, please find a fixed patch. Along with it, I have also
   added
   2 changelog files for C and C++ respectively.
 
  Have you even looked at the second testcase I've posted?
  gimplification ICEs on it with your latest patch, because firstprivate
  clause is added for the same variable multiple times, and it seems
  parallel still isn't around _Cilk_for.
 
 I looked at both but forgot to test them with my implementation. Sorry
 about this. I have fixed the ICE issue. To make sure this does not happen
 further, I have added your test cf3.C into test suite (renamed to cf3.cc). I
 hope that is OK with you.
 
 I have attached a fixed patch and Changelogs. Is this OK?
 
 Thanks,
 
 Balaji V. Iyer.
 
 
  Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Jakub Jelinek
On Mon, Feb 10, 2014 at 10:07:18PM +, Iyer, Balaji V wrote:
 I looked at both but forgot to test them with my implementation. Sorry
 about this.  I have fixed the ICE issue.  To make sure this does not
 happen further, I have added your test cf3.C into test suite (renamed to
 cf3.cc).  I hope that is OK with you.

The testcase is GPL as the original libgomp.c++/for-1.C testcase, so sure.
Perhaps it would be much better though if instead of having a compile time
testcase you'd just do what libgomp.c++/for-1.C does, just replace all the
#pragma omp parallel for in there with _Cilk_for and turn it into a runtime
testcase.

 I have attached a fixed patch and Changelogs. Is this OK?

Looks better (note, still looking just at the dumps), but not completely ok
yet.  On cf3.cc, I see in *.gimple:

D.2883 = Jint::begin (j);
Iint::I (i, D.2883);
D.2885 = Jint::end (j);
retval.0 = operator-int (D.2885, i);
D.2886 = retval.0 / 2;
#pragma omp parallel firstprivate(i) if(D.2886) shared(D.2865) shared(j)
  {
difference_type retval.1;
const struct I  D.2888;
const difference_type D.2866;
long int D.2889;
struct I  D.2890;

try
  {

_Cilk_for (D.2864 = 0; D.2864  retval.1; D.2864 = D.2864 + 2) 
private(D.2864)
  {
D.2889 = D.2864 - D.2865;
D.2866 = D.2889;
try
  {
D.2890 = Iint::operator+= (i, D.2866);

First a minor nit, there is extra newline before _Cilk_for:
  newline_and_indent (buffer, spc);
  if (flag_cilkplus
   gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
pp_string (buffer, _Cilk_for ();
  else
pp_string (buffer, for ();
I guess for _Cilk_for collapse is never  1, right?  If that is the case,
then perhaps you should move the newline_and_indent (buffer, spc); call
into the else block.

More importantly, what is retval.1?  I'd expect you should be using retval.0
there and have it also as firstprivate(retval.0) on the parallel.
In *.omplower dump I actually see:
retval.0 = operator-int (D.2885, i);
...
retval.1 = operator-int (D.2888, i);
i.e. operator-int is called twice.

Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Iyer, Balaji V


 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Wednesday, February 12, 2014 9:59 AM
 To: Iyer, Balaji V
 Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
 On Mon, Feb 10, 2014 at 10:07:18PM +, Iyer, Balaji V wrote:
  I looked at both but forgot to test them with my implementation. Sorry
  about this.  I have fixed the ICE issue.  To make sure this does not
  happen further, I have added your test cf3.C into test suite (renamed
  to cf3.cc).  I hope that is OK with you.
 
 The testcase is GPL as the original libgomp.c++/for-1.C testcase, so sure.
 Perhaps it would be much better though if instead of having a compile time
 testcase you'd just do what libgomp.c++/for-1.C does, just replace all the
 #pragma omp parallel for in there with _Cilk_for and turn it into a runtime
 testcase.
 
I really don't want to do that because I don't think there is a 1:1 match-up 
between the rules of #pragma omp for and _Cilk_for. 

  I have attached a fixed patch and Changelogs. Is this OK?
 
 Looks better (note, still looking just at the dumps), but not completely ok
 yet.  On cf3.cc, I see in *.gimple:
 
 D.2883 = Jint::begin (j);
 Iint::I (i, D.2883);
 D.2885 = Jint::end (j);
 retval.0 = operator-int (D.2885, i);
 D.2886 = retval.0 / 2;
 #pragma omp parallel firstprivate(i) if(D.2886) shared(D.2865) 
 shared(j)
   {
 difference_type retval.1;
 const struct I  D.2888;
 const difference_type D.2866;
 long int D.2889;
 struct I  D.2890;
 
 try
   {
 
 _Cilk_for (D.2864 = 0; D.2864  retval.1; D.2864 = D.2864 + 2)
 private(D.2864)
   {
 D.2889 = D.2864 - D.2865;
 D.2866 = D.2889;
 try
   {
 D.2890 = Iint::operator+= (i, D.2866);
 
 First a minor nit, there is extra newline before _Cilk_for:
   newline_and_indent (buffer, spc);
   if (flag_cilkplus
gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
 pp_string (buffer, _Cilk_for ();
   else
 pp_string (buffer, for (); I guess for _Cilk_for collapse is 
 never  1,
 right?  If that is the case, then perhaps you should move the
 newline_and_indent (buffer, spc); call into the else block.
 

OK. I will fix this and send you a patch? 

 More importantly, what is retval.1?  I'd expect you should be using retval.0
 there and have it also as firstprivate(retval.0) on the parallel.
 In *.omplower dump I actually see:
 retval.0 = operator-int (D.2885, i); ...
 retval.1 = operator-int (D.2888, i); i.e. 
 operator-int is
 called twice.
 

Yes, one is for the if-clause and one is for the condition. It really doesn't 
matter because we get of the stuff in the condition and replace with our own 
for loop with something like the for-loop shown  below. So retval1 doesn't come 
into picture. It is only alive from parser to the expand_cilk_for function.

For (i = low; i  high; i++)
{
_Cilk_for_body
}

So, is there any other changes that you need me to make?

Thanks,

Balaji V. Iyer.


   Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Jakub Jelinek
On Wed, Feb 12, 2014 at 03:14:23PM +, Iyer, Balaji V wrote:
  The testcase is GPL as the original libgomp.c++/for-1.C testcase, so sure.
  Perhaps it would be much better though if instead of having a compile time
  testcase you'd just do what libgomp.c++/for-1.C does, just replace all the
  #pragma omp parallel for in there with _Cilk_for and turn it into a runtime
  testcase.
  

 I really don't want to do that because I don't think there is a 1:1
 match-up between the rules of #pragma omp for and _Cilk_for.

But there is nothing OpenMP specific on any of the tests, all could as well
be tested by removing all the #pragma omp ... lines and just be tested as
normal C+++ loops.  Is there anything that _Cilk_for wouldn't handle?

IMNSHO if you remove all the #pragma omp parallel for lines (even with any
clauses it sometimes has) and replace for with _Cilk_for on the following
line, you should have a valid Cilk+ program.

Only f11 would need to be changed from:
#pragma omp parallel
  {
#pragma omp for nowait
for (T i = x; i = y; i += 3)
  baz (i);
#pragma omp single
{
  T j = y + 3;
  baz (j);
}
  }
to say:
  _Cilk_for (T i = x; i = y; i += 3)
baz (i);
  {
T j = y + 3;
baz (j);
  }
so that it performs the same thing.

  First a minor nit, there is extra newline before _Cilk_for:
newline_and_indent (buffer, spc);
if (flag_cilkplus
 gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
  pp_string (buffer, _Cilk_for ();
else
  pp_string (buffer, for (); I guess for _Cilk_for collapse is 
  never  1,
  right?  If that is the case, then perhaps you should move the
  newline_and_indent (buffer, spc); call into the else block.
  
 
 OK. I will fix this and send you a patch? 

Sure.

  More importantly, what is retval.1?  I'd expect you should be using retval.0
  there and have it also as firstprivate(retval.0) on the parallel.
  In *.omplower dump I actually see:
  retval.0 = operator-int (D.2885, i); ...
  retval.1 = operator-int (D.2888, i); i.e. 
  operator-int is
  called twice.
  
 
 Yes, one is for the if-clause and one is for the condition. It really doesn't 
 matter because we get of the stuff in the condition and replace with our own 
 for loop with something like the for-loop shown  below. So retval1 doesn't 
 come into picture. It is only alive from parser to the expand_cilk_for 
 function.
 
 For (i = low; i  high; i++)
 {
   _Cilk_for_body
 }

No, it really does matter.  Just look at the *.optimized dump with -O0 
-fcilkplus:

  retval.0_4 = operator-int (_3, i);
in _Z3foo1JIiE function and
  _4 = .omp_data_i_2(D)-j;
  _5 = Jint::end (_4);
  retval.1_6 = operator-int (_5, i);
  retval.3_7 = retval.1_6;
in _Z3foo1JIiE._cilk_for_fn.0.  All the 4 statements are dead, you really
shouldn't emit them, even when optimizing, if e.g. the operator- isn't
inline, g++ won't be able to optimize it away.

Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Iyer, Balaji V
   More importantly, what is retval.1?  I'd expect you should be using
   retval.0 there and have it also as firstprivate(retval.0) on the parallel.
   In *.omplower dump I actually see:
   retval.0 = operator-int (D.2885, i); ...
   retval.1 = operator-int (D.2888, i);
   i.e. operator-int is called twice.
  
 
  Yes, one is for the if-clause and one is for the condition. It really 
  doesn't
 matter because we get of the stuff in the condition and replace with our own
 for loop with something like the for-loop shown  below. So retval1 doesn't
 come into picture. It is only alive from parser to the expand_cilk_for 
 function.
 
  For (i = low; i  high; i++)
  {
  _Cilk_for_body
  }
 
 No, it really does matter.  Just look at the *.optimized dump with -O0 -
 fcilkplus:
 
   retval.0_4 = operator-int (_3, i);
 in _Z3foo1JIiE function and
   _4 = .omp_data_i_2(D)-j;
   _5 = Jint::end (_4);
   retval.1_6 = operator-int (_5, i);
   retval.3_7 = retval.1_6;
 in _Z3foo1JIiE._cilk_for_fn.0.  All the 4 statements are dead, you really
 shouldn't emit them, even when optimizing, if e.g. the operator- isn't inline,
 g++ won't be able to optimize it away.
 

I looked at the test code you send me (cf3.cc) at -O1 and it is removing all 
the lines you have shown above. Yes, I would imagine -O0 to have code that can 
be redundant or unnecessary. Some of it could be the artifact of internal code 
insertion. But isn't the main job of the instruction scheduler to remove all 
these redundant work? Besides, it is just a function call. The compiler at -O2, 
-O and -O3 removes the chunk of code that you mentioned.

-Balaji V. Iyer.


   Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Jakub Jelinek
On Wed, Feb 12, 2014 at 05:04:38PM +, Iyer, Balaji V wrote:
 I looked at the test code you send me (cf3.cc) at -O1 and it is removing
 all the lines you have shown above.  Yes, I would imagine -O0 to have code
 that can be redundant or unnecessary.  Some of it could be the artifact of
 internal code insertion.  But isn't the main job of the instruction
 scheduler to remove all these redundant work?  Besides, it is just a
 function call.  The compiler at -O2, -O and -O3 removes the chunk of code
 that you mentioned.

As I said, just change the testcase so that the operator isn't inline, and
suddenly even -O3 will not be able to remove the call.

Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Iyer, Balaji V


 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Wednesday, February 12, 2014 12:10 PM
 To: Iyer, Balaji V
 Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
 On Wed, Feb 12, 2014 at 05:04:38PM +, Iyer, Balaji V wrote:
  I looked at the test code you send me (cf3.cc) at -O1 and it is
  removing all the lines you have shown above.  Yes, I would imagine -O0
  to have code that can be redundant or unnecessary.  Some of it could
  be the artifact of internal code insertion.  But isn't the main job of
  the instruction scheduler to remove all these redundant work?
  Besides, it is just a function call.  The compiler at -O2, -O and -O3
  removes the chunk of code that you mentioned.
 
 As I said, just change the testcase so that the operator isn't inline, and
 suddenly even -O3 will not be able to remove the call.

I am sorry, I do not see any operators being asked to inline explicitly..

class I
{
public:
  typedef ptrdiff_t difference_type;
  I ();
  ~I ();
  I (T *);
  I (const I );
  T operator * ();
  T *operator - ();
  T operator [] (const difference_type ) const;
  I operator = (const I );
  I operator ++ ();
  I operator ++ (int);
  I operator -- ();
  I operator -- (int);
  I operator += (const difference_type );
  I operator -= (const difference_type );
  I operator + (const difference_type ) const;
  I operator - (const difference_type ) const;
  template typename S friend bool operator == (IS , IS );
  template typename S friend bool operator == (const IS , const IS );
  template typename S friend bool operator  (IS , IS );
  template typename S friend bool operator  (const IS , const IS );
  template typename S friend bool operator = (IS , IS );
  template typename S friend bool operator = (const IS , const IS );
  template typename S friend bool operator  (IS , IS );
  template typename S friend bool operator  (const IS , const IS );
  template typename S friend bool operator = (IS , IS );
  template typename S friend bool operator = (const IS , const IS );
  template typename S friend typename IS::difference_type operator - (IS 
, IS );
  template typename S friend typename IS::difference_type operator - (const 
IS , const IS );
  template typename S friend IS operator + (typename IS::difference_type 
, const IS );
private:
  T *p;
};


 
   Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-10 Thread Jakub Jelinek
On Fri, Feb 07, 2014 at 10:14:21PM +, Iyer, Balaji V wrote:
   Attached, please find a fixed patch. Along with it, I have also
 added 2 changelog files for C and C++ respectively.

Have you even looked at the second testcase I've posted?
gimplification ICEs on it with your latest patch, because firstprivate
clause is added for the same variable multiple times, and it seems parallel
still isn't around _Cilk_for.

Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-07 Thread Jakub Jelinek
On Wed, Feb 05, 2014 at 05:27:26AM +, Iyer, Balaji V wrote:
   Attached, please find a fixed patch (diff.txt) that will do as you 
 requested (model _Cilk_for like a #pragma omp parallel for). Along with this, 
 I have also attached two Changelog entries (1 for C and 1 for C++).
   It passes all the tests on my x86_64 box (both 32 and 64 bit modes) and 
 does not affect any other tests in the testsuite.
   Is this Ok for trunk?

A step in the right direction, but I still see issues just from looking at
the *.gimple dump:

For the first testcase, I see:
iter = std::vectorint::begin (array); [return slot optimization]
iter.1 = iter;
D.13615 = std::vectorint::end (array); [return slot optimization]
try
  {
retval.0 = __gnu_cxx::operator-int*, std::vectorint  
(D.13615, iter);
  }
finally
  {
D.13615 = {CLOBBER};
  }
#pragma omp parallel schedule(cilk-for grain,0) if(retval.0)
#shared(iter.1) shared(D.13632) shared(D.13615) shared(iter)
  {
difference_type retval.2;
const difference_type D.13633;
int D.13725;
struct __normal_iterator  D.13726;
bool retval.3;
int  D.13728;
int D.13729;
int  D.13732;

iter = iter.1;
 private(D.13631)
_Cilk_for (D.13631 = 0; D.13631 != retval.2; D.13631 = D.13631 
+ 1)
  D.13725 = D.13631 - D.13632;

So, the issues I see:
1) what is iter.1, why do you have it at all, and, after all, the iterator
is a class that needs to be constructed/destructed in the general way, so
creating any further copies of something is both costly and undesirable

2) the schedule clause doesn't belong on the omp parallel, but on the _Cilk_for

3) iter should be firstprivate, and there should be no explicit private var
with assignment during gimplification, just handle it like any other
firstprivate during omp lowering

4) the printing looks weird for _Cilk_for, as I said earlier, the clauses
should probably be printed after the closing ) of _Cilk_for rather than
after nothing on the previous line; also, there is no {} printed around the
_Cilk_for body and the next line is weirdly indented

But more importantly, if I create some testcase with a generic C++
conforming iterator (copied over from
libgomp/testsuite/libgomp.c++/for-1.C), as in the second testcase, the
*.gimple dump shows that _Cilk_for is still around the #pragma omp parallel.
The intent of the second testcase is that you can really eyeball all the
ctors/dtors/copy ctors etc. that should happen, and for -O0 shouldn't be
really inlined.

Jakub
#include vector

void
foo (std::vectorint array)
{
  _Cilk_for (std::vectorint::iterator iter = array.begin(); iter != 
array.end(); iter++)
  {
if (*iter  == 6)
  *iter = 13;
  }
}
typedef __PTRDIFF_TYPE__ ptrdiff_t;

template typename T
class I
{
public:
  typedef ptrdiff_t difference_type;
  I ();
  ~I ();
  I (T *);
  I (const I );
  T operator * ();
  T *operator - ();
  T operator [] (const difference_type ) const;
  I operator = (const I );
  I operator ++ ();
  I operator ++ (int);
  I operator -- ();
  I operator -- (int);
  I operator += (const difference_type );
  I operator -= (const difference_type );
  I operator + (const difference_type ) const;
  I operator - (const difference_type ) const;
  template typename S friend bool operator == (IS , IS );
  template typename S friend bool operator == (const IS , const IS );
  template typename S friend bool operator  (IS , IS );
  template typename S friend bool operator  (const IS , const IS );
  template typename S friend bool operator = (IS , IS );
  template typename S friend bool operator = (const IS , const IS );
  template typename S friend bool operator  (IS , IS );
  template typename S friend bool operator  (const IS , const IS );
  template typename S friend bool operator = (IS , IS );
  template typename S friend bool operator = (const IS , const IS );
  template typename S friend typename IS::difference_type operator - (IS 
, IS );
  template typename S friend typename IS::difference_type operator - (const 
IS , const IS );
  template typename S friend IS operator + (typename IS::difference_type 
, const IS );
private:
  T *p;
};
template typename T IT::I () : p (0) {}
template typename T IT::~I () {}
template typename T IT::I (T *x) : p (x) {}
template typename T IT::I (const I x) : p (x.p) {}
template typename T T IT::operator * () { return *p; }
template typename T T *IT::operator - () { return p; }
template typename T T IT::operator [] (const difference_type x) const { 
return p[x]; }
template typename T IT IT::operator = (const I x) { p = x.p; return 
*this; }
template typename T IT IT::operator ++ () { ++p; return 

RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-07 Thread Iyer, Balaji V


 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek
 Sent: Friday, February 7, 2014 9:03 AM
 To: Iyer, Balaji V
 Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
 On Wed, Feb 05, 2014 at 05:27:26AM +, Iyer, Balaji V wrote:
  Attached, please find a fixed patch (diff.txt) that will do as you
 requested (model _Cilk_for like a #pragma omp parallel for). Along with this,
 I have also attached two Changelog entries (1 for C and 1 for C++).
  It passes all the tests on my x86_64 box (both 32 and 64 bit modes)
 and does not affect any other tests in the testsuite.
  Is this Ok for trunk?
 
 A step in the right direction, but I still see issues just from looking at the
 *.gimple dump:
 
 For the first testcase, I see:
 iter = std::vectorint::begin (array); [return slot optimization]
 iter.1 = iter;
 D.13615 = std::vectorint::end (array); [return slot 
 optimization]
 try
   {
 retval.0 = __gnu_cxx::operator-int*, std::vectorint  
 (D.13615,
 iter);
   }
 finally
   {
 D.13615 = {CLOBBER};
   }
 #pragma omp parallel schedule(cilk-for grain,0) if(retval.0)
 #shared(iter.1) shared(D.13632) shared(D.13615) shared(iter)
   {
 difference_type retval.2;
 const difference_type D.13633;
 int D.13725;
 struct __normal_iterator  D.13726;
 bool retval.3;
 int  D.13728;
 int D.13729;
 int  D.13732;
 
 iter = iter.1;
  private(D.13631)
 _Cilk_for (D.13631 = 0; D.13631 != retval.2; D.13631 = 
 D.13631 + 1)
   D.13725 = D.13631 - D.13632;
 
 So, the issues I see:
 1) what is iter.1, why do you have it at all, and, after all, the iterator is 
 a class
 that needs to be constructed/destructed in the general way, so creating any
 further copies of something is both costly and undesirable
 

Well, to get the loop count, I need to calculate it using operator-(array.end 
(), iter).

Now, if I do that iter is already set. I need to reset iter back to the 
original one (array.begin ()) in the child function. This is why I used a 
temporary variable called iter1.



 2) the schedule clause doesn't belong on the omp parallel, but on the
 _Cilk_for
 

What if grain is a variable say x? If I have it in the _Cilk_for, then won't 
it create omp_data_i-x. That is not correct. It should just emit x. But let 
me look into this to make sure...

 3) iter should be firstprivate, and there should be no explicit private var 
 with
 assignment during gimplification, just handle it like any other firstprivate
 during omp lowering
 

Do you mean to say I should manually insert a firstprivate for iter and not the 
system figure out that it is shared? 


 4) the printing looks weird for _Cilk_for, as I said earlier, the clauses 
 should
 probably be printed after the closing ) of _Cilk_for rather than after nothing
 on the previous line; also, there is no {} printed around the _Cilk_for body
 and the next line is weirdly indented
 

Ok will look into this.

 But more importantly, if I create some testcase with a generic C++
 conforming iterator (copied over from libgomp/testsuite/libgomp.c++/for-
 1.C), as in the second testcase, the *.gimple dump shows that _Cilk_for is 
 still
 around the #pragma omp parallel.
 The intent of the second testcase is that you can really eyeball all the
 ctors/dtors/copy ctors etc. that should happen, and for -O0 shouldn't be
 really inlined.
 
   Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-07 Thread Jakub Jelinek
On Fri, Feb 07, 2014 at 02:33:41PM +, Iyer, Balaji V wrote:
  So, the issues I see:
  1) what is iter.1, why do you have it at all, and, after all, the iterator 
  is a class
  that needs to be constructed/destructed in the general way, so creating any
  further copies of something is both costly and undesirable
  
 
 Well, to get the loop count, I need to calculate it using operator-(array.end 
 (), iter).
 
 Now, if I do that iter is already set. I need to reset iter back to the
 original one (array.begin ()) in the child function.  This is why I used a
 temporary variable called iter1.

operator- shouldn't really change iter, if it does, it is purely the user's
fault, isn't it?  It isn't operator -=, so it shouldn't really change
array.end () either.

  2) the schedule clause doesn't belong on the omp parallel, but on the
  _Cilk_for
  
 
 What if grain is a variable say x? If I have it in the _Cilk_for, then
 won't it create omp_data_i-x.  That is not correct.  It should just emit
 x. But let me look into this to make sure...

You certainly should gimplify the clause operand before the omp parallel, it
must be an integral anyway, right?  So just use get_temp_regvar?
Then simply use firstprivate on the #pragma omp parallel.  When you actually
omp expand, you'll still be able to find the original variable and look it
up on the parallel.  But, if you can't make it work, guess I could live with
the clause on the parallel.

  3) iter should be firstprivate, and there should be no explicit private var 
  with
  assignment during gimplification, just handle it like any other firstprivate
  during omp lowering
  
 
 Do you mean to say I should manually insert a firstprivate for iter and
 not the system figure out that it is shared?

Yes.  The class iterator is quite special thing, because already the C++ FE
lowers it to an integral iterator instead.  And when you make it
firstprivate, omp lowering/expansion should take care of running the copy
constructor/destructor in the parallel for you.

Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-01-31 Thread Iyer, Balaji V
Hello Everyone,
Did anyone get a chance to look at this patch (link to patches: 
http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01612.html)? I tried to do as 
Jakub mentioned but it hits a road-block when it comes to iterators due to 
variable scoping issues.
This patch does not disrupt any other parts of the code and all the 
code here are wrapped inside a check for for Cilk Plus enabling. It also passes 
all the tests and does not disrupt any existing failing or passing ones for 
both 32 and 64 bit modes in my x86_64 machine. 

It is the last feature to make Cilk Plus feature-complete. Is the patch OK for 
trunk?

Thanks,

Balaji V. Iyer.

 -Original Message-
 From: Iyer, Balaji V
 Sent: Wednesday, January 29, 2014 10:54 AM
 To: Jakub Jelinek
 Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
 
 Hi Jakub,
 
  -Original Message-
  From: Jakub Jelinek [mailto:ja...@redhat.com]
  Sent: Wednesday, January 29, 2014 6:31 AM
  To: Iyer, Balaji V
  Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez';
  'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
  Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
  On Tue, Jan 28, 2014 at 04:55:38PM +, Iyer, Balaji V wrote:
 I thought about it a bit more, and the main issue here is that we
   need access to the _Cilk_for loop's components both inside the child
   function and the parent function.
 
  I guess for the C++ iterators, if in the _Cilk_for model you need to
  provide number of iterations before parallelization, it really depends
  on what the standard allows and what you want to do exactly.
 
 Yes, I need the value before the parallelization context hits. This is why in 
 my
 last patch I had the parallel around the body and omp for around the _Cilk-
 for statement.
 
 
  If you need to provide the iteration count before spawning the threads
  and the standard allows you that, then just lower it in the C++ FE
  already so that you do:
vectorint::iterator temp = array.begin ();
sizetype tempcount = (array.end () - temp); before the parallel, and then
#pragma omp parallel firstprivate(temp, tempcount)
  _Cilk_for (sizetype temp2 = 0; temp2  tempcount; temp2++)
{
  vectorint::iterator ii = temp + temp2;
  body
}
 
 This is kind of what I did (atlest tried to accomplish what you mentioned
 above). I can look into doing this, but is it possible for you to accept the 
 patch
 as-is and we will look into fixing it in the future?
 
 Thanks,
 
 Balaji V. Iyer.
 
  or similar.  The C++ FE needs to lower the C++ iterators anyway, the
  middle- end can really only work with integral or pointer iterators,
  and it depends on how exactly the Cilk+ standard defines _Cilk_for
  with iterators (what methods must be implemented on the iterators and
  what methods and in what order should be called).
 
  Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-01-29 Thread Jakub Jelinek
On Tue, Jan 28, 2014 at 04:55:38PM +, Iyer, Balaji V wrote:
   I thought about it a bit more, and the main issue here is that we
 need access to the _Cilk_for loop's components both inside the child
 function and the parent function.

I guess for the C++ iterators, if in the _Cilk_for model you need to provide
number of iterations before parallelization, it really depends on what the
standard allows and what you want to do exactly.
If you need to provide the iteration count before spawning the threads and
the standard allows you that, then just lower it in the C++ FE already
so that you do:
  vectorint::iterator temp = array.begin ();
  sizetype tempcount = (array.end () - temp);
before the parallel, and then
  #pragma omp parallel firstprivate(temp, tempcount)
_Cilk_for (sizetype temp2 = 0; temp2  tempcount; temp2++)
  {
vectorint::iterator ii = temp + temp2;
body
  }
or similar.  The C++ FE needs to lower the C++ iterators anyway, the
middle-end can really only work with integral or pointer iterators, and it
depends on how exactly the Cilk+ standard defines _Cilk_for with iterators
(what methods must be implemented on the iterators and what methods and in
what order should be called).

Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-01-29 Thread Iyer, Balaji V
Hi Jakub,

 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Wednesday, January 29, 2014 6:31 AM
 To: Iyer, Balaji V
 Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
 On Tue, Jan 28, 2014 at 04:55:38PM +, Iyer, Balaji V wrote:
  I thought about it a bit more, and the main issue here is that we
  need access to the _Cilk_for loop's components both inside the child
  function and the parent function.
 
 I guess for the C++ iterators, if in the _Cilk_for model you need to provide
 number of iterations before parallelization, it really depends on what the
 standard allows and what you want to do exactly.

Yes, I need the value before the parallelization context hits. This is why in 
my last patch I had the parallel around the body and omp for around the 
_Cilk-for statement. 


 If you need to provide the iteration count before spawning the threads and
 the standard allows you that, then just lower it in the C++ FE already so that
 you do:
   vectorint::iterator temp = array.begin ();
   sizetype tempcount = (array.end () - temp); before the parallel, and then
   #pragma omp parallel firstprivate(temp, tempcount)
 _Cilk_for (sizetype temp2 = 0; temp2  tempcount; temp2++)
   {
 vectorint::iterator ii = temp + temp2;
 body
   }

This is kind of what I did (atlest tried to accomplish what you mentioned 
above). I can look into doing this, but is it possible for you to accept the 
patch as-is and we will look into fixing it in the future?

Thanks,

Balaji V. Iyer.

 or similar.  The C++ FE needs to lower the C++ iterators anyway, the middle-
 end can really only work with integral or pointer iterators, and it depends on
 how exactly the Cilk+ standard defines _Cilk_for with iterators (what
 methods must be implemented on the iterators and what methods and in
 what order should be called).
 
   Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-01-28 Thread Iyer, Balaji V


 -Original Message-
 From: Iyer, Balaji V
 Sent: Monday, January 27, 2014 4:36 PM
 To: Jakub Jelinek
 Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
 
  -Original Message-
  From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
  ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek
  Sent: Monday, January 27, 2014 3:50 PM
  To: Iyer, Balaji V
  Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez';
  'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
  Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
  On Mon, Jan 27, 2014 at 08:41:14PM +, Iyer, Balaji V wrote:
 Did you get a chance to look at this _Cilk_for patch?
 
  IMHO it is not as much work as you are fearing, at most a few hours of
  work to get it right, and well worth doing.  So, please at least try
  it out and if you get stuck with it, explain why.
 
 Hi Jakub,
   I tried it that way in the original patch submission for C
 (http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01369.html), but it hit a
 dead-end when I was trying to get STL iterators working for C++. This is why I
 re-structured things this way to get them both working.
 
 Thanks,
 
 Balaji V. Iyer.
 

Hi Jakub,
I thought about it a bit more, and the main issue here is that we need 
access to the _Cilk_for loop's components both inside the child function and 
the parent function.

So, at the moment, I have modelled the _Cilk_for as something like this:

#pragma omp for  schedule (runtime: grain)
_Cilk-for (vectorint::iterator ii = array.begin (); ii != array.end (); ii++)
#pragma omp parallel 
{
body
}

From what I understand, you feel this is a bit ugly and you want this to be 
modelled something like this?

#pragma omp parallel for schedule (runtime: grain)
_Cilk_for (vectorint::iterator ii = array.begin (); ii != array.end(); ii++)
{
body
}

Am I right?

As it stands, doing it the way you suggested did not work when we have 
iterators since iterator expansion pushed inside the child function and its 
expanded variables are not accessible outside the child function by 
gimplify_omp_for. That is, the expansion is put after #pragma omp parallel for 
and that is all pulled into the child function and thus the information to 
compute the count is lost for the parent function.

There is a hack that I think may get around this. This is a bit ugly and really 
is not the way I would think of _Cilk_fors. I am OK with trying this if you 
will accept it.
 
If I do something like this:
 
#pragma omp parallel for schedule (runtime:grain) if ((array.end() - 
array.begin ())/1)
_Cilk_for (vector int::iterator ii = array.begin (); ii != array.end (); ii++)
{
body
}

The new addition is if clause where if ((end - start) / step)

Then, in the expand_parallel_task, I can extract the if (...) clause and then 
pass the expression as a parameter for the loop-count. Yes, it's bit ugly but 
if you are willing to accept it, I can try to implement this.

Please let me know.

Thanks,

Balaji V. Iyer.


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-01-27 Thread Jakub Jelinek
On Mon, Jan 27, 2014 at 08:41:14PM +, Iyer, Balaji V wrote:
   Did you get a chance to look at this _Cilk_for patch? 

IMHO it is not as much work as you are fearing, at most a few hours of work
to get it right, and well worth doing.  So, please at least try it out
and if you get stuck with it, explain why.

Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-01-27 Thread Iyer, Balaji V
 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek
 Sent: Monday, January 27, 2014 3:50 PM
 To: Iyer, Balaji V
 Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
 'r...@redhat.com'
 Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
 
 On Mon, Jan 27, 2014 at 08:41:14PM +, Iyer, Balaji V wrote:
  Did you get a chance to look at this _Cilk_for patch?
 
 IMHO it is not as much work as you are fearing, at most a few hours of work
 to get it right, and well worth doing.  So, please at least try it out and if 
 you
 get stuck with it, explain why.

Hi Jakub,
I tried it that way in the original patch submission for C 
(http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01369.html), but it hit a 
dead-end when I was trying to get STL iterators working for C++. This is why I 
re-structured things this way to get them both working.

Thanks,

Balaji V. Iyer.

 
   Jakub