[PING]RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-26 Thread Iyer, Balaji V
Hi Jakub,
Did you get a chance to look at this?

Thanks,

Balaji V. Iyer.

> -Original Message-
> From: Iyer, Balaji V
> Sent: Monday, February 24, 2014 6:17 PM
> To: 'Jakub Jelinek'
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'r...@redhat.com'
> Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
> 
> Hi Jakub,
>   Please see my responses below.
> 
> > -Original Message-
> > From: Iyer, Balaji V
> > Sent: Thursday, February 20, 2014 11:38 PM
> > To: Jakub Jelinek
> > Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez';
> > 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
> > Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
> >
> > Hi Jakub,
> > I have attached the fixed patch and have answered your questions
> > below.
> >
> > > -Original Message-
> > > From: Jakub Jelinek [mailto:ja...@redhat.com]
> > > Sent: Wednesday, February 19, 2014 6:24 AM
> > > To: Iyer, Balaji V
> > > Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez';
> > > 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
> > > Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> > >
> > > On Wed, Feb 19, 2014 at 04:43:06AM +, Iyer, Balaji V wrote:
> > > > Attached, please find a patch with the test case attached (for1.cc).
> > > > The patch is the same but the cp-changelog has been modified to
> > > > reflect the new test-case.  Is this OK to install?
> > >
> > > 1) have you tested the patch at all?  I see
> > > FAIL: g++.dg/gomp/for-1.C -std=c++98  (test for errors, line 27)
> > > FAIL: g++.dg/gomp/for-1.C -std=c++98 (test for excess errors)
> > > FAIL: g++.dg/gomp/for-1.C -std=c++11  (test for errors, line 27)
> > > FAIL: g++.dg/gomp/for-1.C -std=c++11 (test for excess errors)
> > > FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (internal compiler error)
> > > FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line
> > > 30)
> > > FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line
> > > 37)
> > > FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line
> > > 40)
> > > FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for excess errors)
> > > FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (internal compiler error)
> > > FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line
> > > 30)
> > > FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line
> > > 37)
> > > FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line
> > > 40)
> > > FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for excess errors)
> > > regressions caused by the patch, that is of course unacceptable.
> > >
> >
> > Fixed. I apologize for them. I have confirmed that it is OK now.
> >
> >
> > > 2) try this updated cf3.cc, e.g. with -O2 -fcilkplus if you can't
> > > find out why calling something multiple times is a bad idea,
> > > actually the latest patch is even worse than the older one, you now
> > > create 3 calls to the end method and 3 calls to operator-.  There
> > > should be just one call to that, before the #pragma omp parallel
> > > obviously, anything that
> > doesn't do that is just bad.
> > > I don't see a point in having if clause on the _Cilk_for, just keep
> > > it on the #pragma omp parallel only, at ompexp time you can easily
> > > find it there, there is no point to check it again in the parallel
> > > body of the
> > function.
> > >
> >
> > I have removed the if-clause from the _Cilk_for and now it is just in
> > #pragma omp parallel.
> >
> > I have removed the 3rd operator-, but I am not able to remove the 2nd.
> > I am looking into it, but I am not able to do it. The thing is, first
> > operator- was for the if-clause, which I need to calculate the
> > loop-count for the
> > __cilkrts_cilk_for_64 function. The second one is not necessary
> > because the end-value does not matter for _Cilk_for since they will be
> > replaced with the low and high values.  I tried several things such as
> > stopping gimplifcation of the cond value, or replacing it with a
> > constant, etc and those are causing other problems elsewhere.
> >
> >  The thing is, I am not able to find a way to automate this. I can't
> > assume the cond's  end-value is same as count, because this is only
> > true if we have an iterator and the handle_omp_for_iterator function
> > modifies the cond value correctly. I need to use the count value
> > (retval.0) as retval.1 but count-value is computed at a later time
> > than handle_omp_for_iterator (since it does not have any knowledge
> > about the #pragma omp parallel). It is giving the correct answers for
> > the benchmarks and is removing the 2nd operator- when optimization is
> turned on for the inlinable operator-.
> >
> > Can you provide me some advice about how to do it?
> 
> I have fixed the issue. Now the 2nd operator- does not occur. Attached,
> please fixed a fixed patch. Is this OK for trunk?
> 
> Thanks,
> 
> Balaji V. Iyer.


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-19 Thread Jakub Jelinek
On Wed, Feb 19, 2014 at 04:43:06AM +, Iyer, Balaji V wrote:
> Attached, please find a patch with the test case attached (for1.cc). The
> patch is the same but the cp-changelog has been modified to reflect the
> new test-case.  Is this OK to install?

1) have you tested the patch at all?  I see
FAIL: g++.dg/gomp/for-1.C -std=c++98  (test for errors, line 27)
FAIL: g++.dg/gomp/for-1.C -std=c++98 (test for excess errors)
FAIL: g++.dg/gomp/for-1.C -std=c++11  (test for errors, line 27)
FAIL: g++.dg/gomp/for-1.C -std=c++11 (test for excess errors)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (internal compiler error)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 30)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 37)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98  (test for warnings, line 40)
FAIL: g++.dg/gomp/for-19.C -std=gnu++98 (test for excess errors)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (internal compiler error)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 30)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 37)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11  (test for warnings, line 40)
FAIL: g++.dg/gomp/for-19.C -std=gnu++11 (test for excess errors)
regressions caused by the patch, that is of course unacceptable.

2) try this updated cf3.cc, e.g. with -O2 -fcilkplus if you can't find out
why calling something multiple times is a bad idea, actually the latest patch
is even worse than the older one, you now create 3 calls to the end method
and 3 calls to operator-.  There should be just one call to that, before the
#pragma omp parallel obviously, anything that doesn't do that is just bad.
I don't see a point in having if clause on the _Cilk_for, just keep it on
the #pragma omp parallel only, at ompexp time you can easily find it there,
there is no point to check it again in the parallel body of the function.

typedef __PTRDIFF_TYPE__ ptrdiff_t;

template 
class I
{
public:
  typedef ptrdiff_t difference_type;
  I ();
  ~I ();
  I (T *);
  I (const I &);
  T &operator * ();
  T *operator -> ();
  T &operator [] (const difference_type &) const;
  I &operator = (const I &);
  I &operator ++ ();
  I operator ++ (int);
  I &operator -- ();
  I operator -- (int);
  I &operator += (const difference_type &);
  I &operator -= (const difference_type &);
  I operator + (const difference_type &) const;
  I operator - (const difference_type &) const;
  template  friend bool operator == (I &, I &);
  template  friend bool operator == (const I &, const I &);
  template  friend bool operator < (I &, I &);
  template  friend bool operator < (const I &, const I &);
  template  friend bool operator <= (I &, I &);
  template  friend bool operator <= (const I &, const I &);
  template  friend bool operator > (I &, I &);
  template  friend bool operator > (const I &, const I &);
  template  friend bool operator >= (I &, I &);
  template  friend bool operator >= (const I &, const I &);
  template  friend typename I::difference_type operator - (I 
&, I &);
  template  friend typename I::difference_type operator - (const 
I &, const I &);
  template  friend I operator + (typename I::difference_type 
, const I &);
private:
  T *p;
};
template  I::I () : p (0) {}
template  I::~I () {}
template  I::I (T *x) : p (x) {}
template  I::I (const I &x) : p (x.p) {}
template  T &I::operator * () { return *p; }
template  T *I::operator -> () { return p; }
template  T &I::operator [] (const difference_type &x) const { 
return p[x]; }
template  I &I::operator = (const I &x) { p = x.p; return 
*this; }
template  I &I::operator ++ () { ++p; return *this; }
template  I I::operator ++ (int) { return I (p++); }
template  I &I::operator -- () { --p; return *this; }
template  I I::operator -- (int) { return I (p--); }
template  I &I::operator += (const difference_type &x) { p += 
x; return *this; }
template  I &I::operator -= (const difference_type &x) { p -= 
x; return *this; }
template  I I::operator + (const difference_type &x) const { 
return I (p + x); }
template  __attribute__((noinline)) I I::operator - (const 
difference_type &x) const { __asm (""); return I (p - x); }
template  bool operator == (I &x, I &y) { return x.p == y.p; }
template  bool operator == (const I &x, const I &y) { return 
x.p == y.p; }
template  bool operator != (I &x, I &y) { return !(x == y); }
template  bool operator != (const I &x, const I &y) { return 
!(x == y); }
template  bool operator < (I &x, I &y) { return x.p < y.p; }
template  bool operator < (const I &x, const I &y) { return 
x.p < y.p; }
template  bool operator <= (I &x, I &y) { return x.p <= y.p; }
template  bool operator <= (const I &x, const I &y) { return 
x.p <= y.p; }
template  bool operator > (I &x, I &y) { return x.p > y.p; }
template  bool operator > (const I &x, const I &y) { return 
x.p > y.p; }
template  bool operator >= (I &x, I &y) { return x.p >= y.p; }
template  bool operator >= (const I &x, const I &y) { return 
x.p >= y.p; }
te

RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Iyer, Balaji V


> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Wednesday, February 12, 2014 12:10 PM
> To: Iyer, Balaji V
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'r...@redhat.com'
> Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> 
> On Wed, Feb 12, 2014 at 05:04:38PM +, Iyer, Balaji V wrote:
> > I looked at the test code you send me (cf3.cc) at -O1 and it is
> > removing all the lines you have shown above.  Yes, I would imagine -O0
> > to have code that can be redundant or unnecessary.  Some of it could
> > be the artifact of internal code insertion.  But isn't the main job of
> > the instruction scheduler to remove all these redundant work?
> > Besides, it is just a function call.  The compiler at -O2, -O and -O3
> > removes the chunk of code that you mentioned.
> 
> As I said, just change the testcase so that the operator isn't inline, and
> suddenly even -O3 will not be able to remove the call.

I am sorry, I do not see any operators being asked to inline explicitly..

class I
{
public:
  typedef ptrdiff_t difference_type;
  I ();
  ~I ();
  I (T *);
  I (const I &);
  T &operator * ();
  T *operator -> ();
  T &operator [] (const difference_type &) const;
  I &operator = (const I &);
  I &operator ++ ();
  I operator ++ (int);
  I &operator -- ();
  I operator -- (int);
  I &operator += (const difference_type &);
  I &operator -= (const difference_type &);
  I operator + (const difference_type &) const;
  I operator - (const difference_type &) const;
  template  friend bool operator == (I &, I &);
  template  friend bool operator == (const I &, const I &);
  template  friend bool operator < (I &, I &);
  template  friend bool operator < (const I &, const I &);
  template  friend bool operator <= (I &, I &);
  template  friend bool operator <= (const I &, const I &);
  template  friend bool operator > (I &, I &);
  template  friend bool operator > (const I &, const I &);
  template  friend bool operator >= (I &, I &);
  template  friend bool operator >= (const I &, const I &);
  template  friend typename I::difference_type operator - (I 
&, I &);
  template  friend typename I::difference_type operator - (const 
I &, const I &);
  template  friend I operator + (typename I::difference_type 
, const I &);
private:
  T *p;
};


> 
>   Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Jakub Jelinek
On Wed, Feb 12, 2014 at 05:04:38PM +, Iyer, Balaji V wrote:
> I looked at the test code you send me (cf3.cc) at -O1 and it is removing
> all the lines you have shown above.  Yes, I would imagine -O0 to have code
> that can be redundant or unnecessary.  Some of it could be the artifact of
> internal code insertion.  But isn't the main job of the instruction
> scheduler to remove all these redundant work?  Besides, it is just a
> function call.  The compiler at -O2, -O and -O3 removes the chunk of code
> that you mentioned.

As I said, just change the testcase so that the operator isn't inline, and
suddenly even -O3 will not be able to remove the call.

Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Iyer, Balaji V
> > > More importantly, what is retval.1?  I'd expect you should be using
> > > retval.0 there and have it also as firstprivate(retval.0) on the parallel.
> > > In *.omplower dump I actually see:
> > > retval.0 = operator- (D.2885, &i); ...
> > > retval.1 = operator- (D.2888, &i);
> > > i.e. operator- is called twice.
> > >
> >
> > Yes, one is for the if-clause and one is for the condition. It really 
> > doesn't
> matter because we get of the stuff in the condition and replace with our own
> for loop with something like the for-loop shown  below. So retval1 doesn't
> come into picture. It is only alive from parser to the expand_cilk_for 
> function.
> >
> > For (i = low; i < high; i++)
> > {
> > <_Cilk_for_body>
> > }
> 
> No, it really does matter.  Just look at the *.optimized dump with -O0 -
> fcilkplus:
> 
>   retval.0_4 = operator- (_3, &i);
> in _Z3foo1JIiE function and
>   _4 = .omp_data_i_2(D)->j;
>   _5 = J::end (_4);
>   retval.1_6 = operator- (_5, &i);
>   retval.3_7 = retval.1_6;
> in _Z3foo1JIiE._cilk_for_fn.0.  All the 4 statements are dead, you really
> shouldn't emit them, even when optimizing, if e.g. the operator- isn't inline,
> g++ won't be able to optimize it away.
> 

I looked at the test code you send me (cf3.cc) at -O1 and it is removing all 
the lines you have shown above. Yes, I would imagine -O0 to have code that can 
be redundant or unnecessary. Some of it could be the artifact of internal code 
insertion. But isn't the main job of the instruction scheduler to remove all 
these redundant work? Besides, it is just a function call. The compiler at -O2, 
-O and -O3 removes the chunk of code that you mentioned.

-Balaji V. Iyer.


>   Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Jakub Jelinek
On Wed, Feb 12, 2014 at 03:14:23PM +, Iyer, Balaji V wrote:
> > The testcase is GPL as the original libgomp.c++/for-1.C testcase, so sure.
> > Perhaps it would be much better though if instead of having a compile time
> > testcase you'd just do what libgomp.c++/for-1.C does, just replace all the
> > #pragma omp parallel for in there with _Cilk_for and turn it into a runtime
> > testcase.
> > 

> I really don't want to do that because I don't think there is a 1:1
> match-up between the rules of #pragma omp for and _Cilk_for.

But there is nothing OpenMP specific on any of the tests, all could as well
be tested by removing all the #pragma omp ... lines and just be tested as
normal C+++ loops.  Is there anything that _Cilk_for wouldn't handle?

IMNSHO if you remove all the #pragma omp parallel for lines (even with any
clauses it sometimes has) and replace for with _Cilk_for on the following
line, you should have a valid Cilk+ program.

Only f11 would need to be changed from:
#pragma omp parallel
  {
#pragma omp for nowait
for (T i = x; i <= y; i += 3)
  baz (i);
#pragma omp single
{
  T j = y + 3;
  baz (j);
}
  }
to say:
  _Cilk_for (T i = x; i <= y; i += 3)
baz (i);
  {
T j = y + 3;
baz (j);
  }
so that it performs the same thing.

> > First a minor nit, there is extra newline before _Cilk_for:
> >   newline_and_indent (buffer, spc);
> >   if (flag_cilkplus
> >   && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
> > pp_string (buffer, "_Cilk_for (");
> >   else
> > pp_string (buffer, "for ("); I guess for _Cilk_for collapse is 
> > never > 1,
> > right?  If that is the case, then perhaps you should move the
> > newline_and_indent (buffer, spc); call into the else block.
> > 
> 
> OK. I will fix this and send you a patch? 

Sure.

> > More importantly, what is retval.1?  I'd expect you should be using retval.0
> > there and have it also as firstprivate(retval.0) on the parallel.
> > In *.omplower dump I actually see:
> > retval.0 = operator- (D.2885, &i); ...
> > retval.1 = operator- (D.2888, &i); i.e. 
> > operator- is
> > called twice.
> > 
> 
> Yes, one is for the if-clause and one is for the condition. It really doesn't 
> matter because we get of the stuff in the condition and replace with our own 
> for loop with something like the for-loop shown  below. So retval1 doesn't 
> come into picture. It is only alive from parser to the expand_cilk_for 
> function.
> 
> For (i = low; i < high; i++)
> {
>   <_Cilk_for_body>
> }

No, it really does matter.  Just look at the *.optimized dump with -O0 
-fcilkplus:

  retval.0_4 = operator- (_3, &i);
in _Z3foo1JIiE function and
  _4 = .omp_data_i_2(D)->j;
  _5 = J::end (_4);
  retval.1_6 = operator- (_5, &i);
  retval.3_7 = retval.1_6;
in _Z3foo1JIiE._cilk_for_fn.0.  All the 4 statements are dead, you really
shouldn't emit them, even when optimizing, if e.g. the operator- isn't
inline, g++ won't be able to optimize it away.

Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Iyer, Balaji V


> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Wednesday, February 12, 2014 9:59 AM
> To: Iyer, Balaji V
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'r...@redhat.com'
> Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> 
> On Mon, Feb 10, 2014 at 10:07:18PM +, Iyer, Balaji V wrote:
> > I looked at both but forgot to test them with my implementation. Sorry
> > about this.  I have fixed the ICE issue.  To make sure this does not
> > happen further, I have added your test cf3.C into test suite (renamed
> > to cf3.cc).  I hope that is OK with you.
> 
> The testcase is GPL as the original libgomp.c++/for-1.C testcase, so sure.
> Perhaps it would be much better though if instead of having a compile time
> testcase you'd just do what libgomp.c++/for-1.C does, just replace all the
> #pragma omp parallel for in there with _Cilk_for and turn it into a runtime
> testcase.
> 
I really don't want to do that because I don't think there is a 1:1 match-up 
between the rules of #pragma omp for and _Cilk_for. 

> > I have attached a fixed patch and Changelogs. Is this OK?
> 
> Looks better (note, still looking just at the dumps), but not completely ok
> yet.  On cf3.cc, I see in *.gimple:
> 
> D.2883 = J::begin (j);
> I::I (&i, D.2883);
> D.2885 = J::end (j);
> retval.0 = operator- (D.2885, &i);
> D.2886 = retval.0 / 2;
> #pragma omp parallel firstprivate(i) if(D.2886) shared(D.2865) 
> shared(j)
>   {
> difference_type retval.1;
> const struct I & D.2888;
> const difference_type D.2866;
> long int D.2889;
> struct I & D.2890;
> 
> try
>   {
> 
> _Cilk_for (D.2864 = 0; D.2864 < retval.1; D.2864 = D.2864 + 2)
> private(D.2864)
>   {
> D.2889 = D.2864 - D.2865;
> D.2866 = D.2889;
> try
>   {
> D.2890 = I::operator+= (&i, &D.2866);
> 
> First a minor nit, there is extra newline before _Cilk_for:
>   newline_and_indent (buffer, spc);
>   if (flag_cilkplus
>   && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
> pp_string (buffer, "_Cilk_for (");
>   else
> pp_string (buffer, "for ("); I guess for _Cilk_for collapse is 
> never > 1,
> right?  If that is the case, then perhaps you should move the
> newline_and_indent (buffer, spc); call into the else block.
> 

OK. I will fix this and send you a patch? 

> More importantly, what is retval.1?  I'd expect you should be using retval.0
> there and have it also as firstprivate(retval.0) on the parallel.
> In *.omplower dump I actually see:
> retval.0 = operator- (D.2885, &i); ...
> retval.1 = operator- (D.2888, &i); i.e. 
> operator- is
> called twice.
> 

Yes, one is for the if-clause and one is for the condition. It really doesn't 
matter because we get of the stuff in the condition and replace with our own 
for loop with something like the for-loop shown  below. So retval1 doesn't come 
into picture. It is only alive from parser to the expand_cilk_for function.

For (i = low; i < high; i++)
{
<_Cilk_for_body>
}

So, is there any other changes that you need me to make?

Thanks,

Balaji V. Iyer.


>   Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Jakub Jelinek
On Mon, Feb 10, 2014 at 10:07:18PM +, Iyer, Balaji V wrote:
> I looked at both but forgot to test them with my implementation. Sorry
> about this.  I have fixed the ICE issue.  To make sure this does not
> happen further, I have added your test cf3.C into test suite (renamed to
> cf3.cc).  I hope that is OK with you.

The testcase is GPL as the original libgomp.c++/for-1.C testcase, so sure.
Perhaps it would be much better though if instead of having a compile time
testcase you'd just do what libgomp.c++/for-1.C does, just replace all the
#pragma omp parallel for in there with _Cilk_for and turn it into a runtime
testcase.

> I have attached a fixed patch and Changelogs. Is this OK?

Looks better (note, still looking just at the dumps), but not completely ok
yet.  On cf3.cc, I see in *.gimple:

D.2883 = J::begin (j);
I::I (&i, D.2883);
D.2885 = J::end (j);
retval.0 = operator- (D.2885, &i);
D.2886 = retval.0 / 2;
#pragma omp parallel firstprivate(i) if(D.2886) shared(D.2865) shared(j)
  {
difference_type retval.1;
const struct I & D.2888;
const difference_type D.2866;
long int D.2889;
struct I & D.2890;

try
  {

_Cilk_for (D.2864 = 0; D.2864 < retval.1; D.2864 = D.2864 + 2) 
private(D.2864)
  {
D.2889 = D.2864 - D.2865;
D.2866 = D.2889;
try
  {
D.2890 = I::operator+= (&i, &D.2866);

First a minor nit, there is extra newline before _Cilk_for:
  newline_and_indent (buffer, spc);
  if (flag_cilkplus
  && gimple_omp_for_kind (gs) == GF_OMP_FOR_KIND_CILKFOR)
pp_string (buffer, "_Cilk_for (");
  else
pp_string (buffer, "for (");
I guess for _Cilk_for collapse is never > 1, right?  If that is the case,
then perhaps you should move the newline_and_indent (buffer, spc); call
into the else block.

More importantly, what is retval.1?  I'd expect you should be using retval.0
there and have it also as firstprivate(retval.0) on the parallel.
In *.omplower dump I actually see:
retval.0 = operator- (D.2885, &i);
...
retval.1 = operator- (D.2888, &i);
i.e. operator- is called twice.

Jakub


[PING] RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-12 Thread Iyer, Balaji V
Hi Jakub,
Did you get a chance to look at this patch?

Thanks,

Balaji V. Iyer.

> -Original Message-
> From: Iyer, Balaji V
> Sent: Monday, February 10, 2014 5:07 PM
> To: Jakub Jelinek
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'r...@redhat.com'
> Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
> 
> Hi Jakub,
> 
> > -Original Message-
> > From: Jakub Jelinek [mailto:ja...@redhat.com]
> > Sent: Monday, February 10, 2014 12:58 PM
> > To: Iyer, Balaji V
> > Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez';
> > 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
> > Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> >
> > On Fri, Feb 07, 2014 at 10:14:21PM +, Iyer, Balaji V wrote:
> > >   Attached, please find a fixed patch. Along with it, I have also
> > > added
> > > 2 changelog files for C and C++ respectively.
> >
> > Have you even looked at the second testcase I've posted?
> > gimplification ICEs on it with your latest patch, because firstprivate
> > clause is added for the same variable multiple times, and it seems
> > parallel still isn't around _Cilk_for.
> 
> I looked at both but forgot to test them with my implementation. Sorry
> about this. I have fixed the ICE issue. To make sure this does not happen
> further, I have added your test cf3.C into test suite (renamed to cf3.cc). I
> hope that is OK with you.
> 
> I have attached a fixed patch and Changelogs. Is this OK?
> 
> Thanks,
> 
> Balaji V. Iyer.
> 
> >
> > Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-10 Thread Jakub Jelinek
On Fri, Feb 07, 2014 at 10:14:21PM +, Iyer, Balaji V wrote:
>   Attached, please find a fixed patch. Along with it, I have also
> added 2 changelog files for C and C++ respectively.

Have you even looked at the second testcase I've posted?
gimplification ICEs on it with your latest patch, because firstprivate
clause is added for the same variable multiple times, and it seems parallel
still isn't around _Cilk_for.

Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-07 Thread Jakub Jelinek
On Fri, Feb 07, 2014 at 02:33:41PM +, Iyer, Balaji V wrote:
> > So, the issues I see:
> > 1) what is iter.1, why do you have it at all, and, after all, the iterator 
> > is a class
> > that needs to be constructed/destructed in the general way, so creating any
> > further copies of something is both costly and undesirable
> > 
> 
> Well, to get the loop count, I need to calculate it using operator-(array.end 
> (), &iter).
> 
> Now, if I do that iter is already set. I need to reset iter back to the
> original one (array.begin ()) in the child function.  This is why I used a
> temporary variable called iter1.

operator- shouldn't really change iter, if it does, it is purely the user's
fault, isn't it?  It isn't operator -=, so it shouldn't really change
array.end () either.

> > 2) the schedule clause doesn't belong on the omp parallel, but on the
> > _Cilk_for
> > 
> 
> What if grain is a variable say "x"? If I have it in the _Cilk_for, then
> won't it create omp_data_i->x.  That is not correct.  It should just emit
> "x." But let me look into this to make sure...

You certainly should gimplify the clause operand before the omp parallel, it
must be an integral anyway, right?  So just use get_temp_regvar?
Then simply use firstprivate on the #pragma omp parallel.  When you actually
omp expand, you'll still be able to find the original variable and look it
up on the parallel.  But, if you can't make it work, guess I could live with
the clause on the parallel.

> > 3) iter should be firstprivate, and there should be no explicit private var 
> > with
> > assignment during gimplification, just handle it like any other firstprivate
> > during omp lowering
> > 
> 
> Do you mean to say I should manually insert a firstprivate for iter and
> not the system figure out that it is shared?

Yes.  The class iterator is quite special thing, because already the C++ FE
lowers it to an integral iterator instead.  And when you make it
firstprivate, omp lowering/expansion should take care of running the copy
constructor/destructor in the parallel for you.

Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-02-07 Thread Iyer, Balaji V


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek
> Sent: Friday, February 7, 2014 9:03 AM
> To: Iyer, Balaji V
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'r...@redhat.com'
> Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> 
> On Wed, Feb 05, 2014 at 05:27:26AM +, Iyer, Balaji V wrote:
> > Attached, please find a fixed patch (diff.txt) that will do as you
> requested (model _Cilk_for like a #pragma omp parallel for). Along with this,
> I have also attached two Changelog entries (1 for C and 1 for C++).
> > It passes all the tests on my x86_64 box (both 32 and 64 bit modes)
> and does not affect any other tests in the testsuite.
> > Is this Ok for trunk?
> 
> A step in the right direction, but I still see issues just from looking at the
> *.gimple dump:
> 
> For the first testcase, I see:
> iter = std::vector::begin (array); [return slot optimization]
> iter.1 = iter;
> D.13615 = std::vector::end (array); [return slot 
> optimization]
> try
>   {
> retval.0 = __gnu_cxx::operator- > 
> (&D.13615,
> &iter);
>   }
> finally
>   {
> D.13615 = {CLOBBER};
>   }
> #pragma omp parallel schedule(cilk-for grain,0) if(retval.0)
> #shared(iter.1) shared(D.13632) shared(D.13615) shared(iter)
>   {
> difference_type retval.2;
> const difference_type D.13633;
> int D.13725;
> struct __normal_iterator & D.13726;
> bool retval.3;
> int & D.13728;
> int D.13729;
> int & D.13732;
> 
> iter = iter.1;
>  private(D.13631)
> _Cilk_for (D.13631 = 0; D.13631 != retval.2; D.13631 = 
> D.13631 + 1)
>   D.13725 = D.13631 - D.13632;
> 
> So, the issues I see:
> 1) what is iter.1, why do you have it at all, and, after all, the iterator is 
> a class
> that needs to be constructed/destructed in the general way, so creating any
> further copies of something is both costly and undesirable
> 

Well, to get the loop count, I need to calculate it using operator-(array.end 
(), &iter).

Now, if I do that iter is already set. I need to reset iter back to the 
original one (array.begin ()) in the child function. This is why I used a 
temporary variable called iter1.



> 2) the schedule clause doesn't belong on the omp parallel, but on the
> _Cilk_for
> 

What if grain is a variable say "x"? If I have it in the _Cilk_for, then won't 
it create omp_data_i->x. That is not correct. It should just emit "x." But let 
me look into this to make sure...

> 3) iter should be firstprivate, and there should be no explicit private var 
> with
> assignment during gimplification, just handle it like any other firstprivate
> during omp lowering
> 

Do you mean to say I should manually insert a firstprivate for iter and not the 
system figure out that it is shared? 


> 4) the printing looks weird for _Cilk_for, as I said earlier, the clauses 
> should
> probably be printed after the closing ) of _Cilk_for rather than after nothing
> on the previous line; also, there is no {} printed around the _Cilk_for body
> and the next line is weirdly indented
> 

Ok will look into this.

> But more importantly, if I create some testcase with a generic C++
> conforming iterator (copied over from libgomp/testsuite/libgomp.c++/for-
> 1.C), as in the second testcase, the *.gimple dump shows that _Cilk_for is 
> still
> around the #pragma omp parallel.
> The intent of the second testcase is that you can really eyeball all the
> ctors/dtors/copy ctors etc. that should happen, and for -O0 shouldn't be
> really inlined.
> 
>   Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-02-07 Thread Jakub Jelinek
On Wed, Feb 05, 2014 at 05:27:26AM +, Iyer, Balaji V wrote:
>   Attached, please find a fixed patch (diff.txt) that will do as you 
> requested (model _Cilk_for like a #pragma omp parallel for). Along with this, 
> I have also attached two Changelog entries (1 for C and 1 for C++).
>   It passes all the tests on my x86_64 box (both 32 and 64 bit modes) and 
> does not affect any other tests in the testsuite.
>   Is this Ok for trunk?

A step in the right direction, but I still see issues just from looking at
the *.gimple dump:

For the first testcase, I see:
iter = std::vector::begin (array); [return slot optimization]
iter.1 = iter;
D.13615 = std::vector::end (array); [return slot optimization]
try
  {
retval.0 = __gnu_cxx::operator- > 
(&D.13615, &iter);
  }
finally
  {
D.13615 = {CLOBBER};
  }
#pragma omp parallel schedule(cilk-for grain,0) if(retval.0)
#shared(iter.1) shared(D.13632) shared(D.13615) shared(iter)
  {
difference_type retval.2;
const difference_type D.13633;
int D.13725;
struct __normal_iterator & D.13726;
bool retval.3;
int & D.13728;
int D.13729;
int & D.13732;

iter = iter.1;
 private(D.13631)
_Cilk_for (D.13631 = 0; D.13631 != retval.2; D.13631 = D.13631 
+ 1)
  D.13725 = D.13631 - D.13632;

So, the issues I see:
1) what is iter.1, why do you have it at all, and, after all, the iterator
is a class that needs to be constructed/destructed in the general way, so
creating any further copies of something is both costly and undesirable

2) the schedule clause doesn't belong on the omp parallel, but on the _Cilk_for

3) iter should be firstprivate, and there should be no explicit private var
with assignment during gimplification, just handle it like any other
firstprivate during omp lowering

4) the printing looks weird for _Cilk_for, as I said earlier, the clauses
should probably be printed after the closing ) of _Cilk_for rather than
after nothing on the previous line; also, there is no {} printed around the
_Cilk_for body and the next line is weirdly indented

But more importantly, if I create some testcase with a generic C++
conforming iterator (copied over from
libgomp/testsuite/libgomp.c++/for-1.C), as in the second testcase, the
*.gimple dump shows that _Cilk_for is still around the #pragma omp parallel.
The intent of the second testcase is that you can really eyeball all the
ctors/dtors/copy ctors etc. that should happen, and for -O0 shouldn't be
really inlined.

Jakub
#include 

void
foo (std::vector &array)
{
  _Cilk_for (std::vector::iterator iter = array.begin(); iter != 
array.end(); iter++)
  {
if (*iter  == 6)
  *iter = 13;
  }
}
typedef __PTRDIFF_TYPE__ ptrdiff_t;

template 
class I
{
public:
  typedef ptrdiff_t difference_type;
  I ();
  ~I ();
  I (T *);
  I (const I &);
  T &operator * ();
  T *operator -> ();
  T &operator [] (const difference_type &) const;
  I &operator = (const I &);
  I &operator ++ ();
  I operator ++ (int);
  I &operator -- ();
  I operator -- (int);
  I &operator += (const difference_type &);
  I &operator -= (const difference_type &);
  I operator + (const difference_type &) const;
  I operator - (const difference_type &) const;
  template  friend bool operator == (I &, I &);
  template  friend bool operator == (const I &, const I &);
  template  friend bool operator < (I &, I &);
  template  friend bool operator < (const I &, const I &);
  template  friend bool operator <= (I &, I &);
  template  friend bool operator <= (const I &, const I &);
  template  friend bool operator > (I &, I &);
  template  friend bool operator > (const I &, const I &);
  template  friend bool operator >= (I &, I &);
  template  friend bool operator >= (const I &, const I &);
  template  friend typename I::difference_type operator - (I 
&, I &);
  template  friend typename I::difference_type operator - (const 
I &, const I &);
  template  friend I operator + (typename I::difference_type 
, const I &);
private:
  T *p;
};
template  I::I () : p (0) {}
template  I::~I () {}
template  I::I (T *x) : p (x) {}
template  I::I (const I &x) : p (x.p) {}
template  T &I::operator * () { return *p; }
template  T *I::operator -> () { return p; }
template  T &I::operator [] (const difference_type &x) const { 
return p[x]; }
template  I &I::operator = (const I &x) { p = x.p; return 
*this; }
template  I &I::operator ++ () { ++p; return *this; }
template  I I::operator ++ (int) { return I (p++); }
template  I &I::operator -- () { --p; return *this; }
template  I I::operator -- (int) { return I (p--); }
template  I &I::operator += (const difference_type &x) { p += 
x; return

RE: [PING] [PATCH] _Cilk_for for C and C++

2014-01-31 Thread Iyer, Balaji V
Hello Everyone,
Did anyone get a chance to look at this patch (link to patches: 
http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01612.html)? I tried to do as 
Jakub mentioned but it hits a road-block when it comes to iterators due to 
variable scoping issues.
This patch does not disrupt any other parts of the code and all the 
code here are wrapped inside a check for for Cilk Plus enabling. It also passes 
all the tests and does not disrupt any existing failing or passing ones for 
both 32 and 64 bit modes in my x86_64 machine. 

It is the last feature to make Cilk Plus feature-complete. Is the patch OK for 
trunk?

Thanks,

Balaji V. Iyer.

> -Original Message-
> From: Iyer, Balaji V
> Sent: Wednesday, January 29, 2014 10:54 AM
> To: Jakub Jelinek
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'r...@redhat.com'
> Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
> 
> Hi Jakub,
> 
> > -Original Message-
> > From: Jakub Jelinek [mailto:ja...@redhat.com]
> > Sent: Wednesday, January 29, 2014 6:31 AM
> > To: Iyer, Balaji V
> > Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez';
> > 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
> > Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> >
> > On Tue, Jan 28, 2014 at 04:55:38PM +, Iyer, Balaji V wrote:
> > >   I thought about it a bit more, and the main issue here is that we
> > > need access to the _Cilk_for loop's components both inside the child
> > > function and the parent function.
> >
> > I guess for the C++ iterators, if in the _Cilk_for model you need to
> > provide number of iterations before parallelization, it really depends
> > on what the standard allows and what you want to do exactly.
> 
> Yes, I need the value before the parallelization context hits. This is why in 
> my
> last patch I had the parallel around the body and omp for around the _Cilk-
> for statement.
> 
> 
> > If you need to provide the iteration count before spawning the threads
> > and the standard allows you that, then just lower it in the C++ FE
> > already so that you do:
> >   vector::iterator temp = array.begin ();
> >   sizetype tempcount = (array.end () - temp); before the parallel, and then
> >   #pragma omp parallel firstprivate(temp, tempcount)
> > _Cilk_for (sizetype temp2 = 0; temp2 < tempcount; temp2++)
> >   {
> > vector::iterator ii = temp + temp2;
> > 
> >   }
> 
> This is kind of what I did (atlest tried to accomplish what you mentioned
> above). I can look into doing this, but is it possible for you to accept the 
> patch
> as-is and we will look into fixing it in the future?
> 
> Thanks,
> 
> Balaji V. Iyer.
> 
> > or similar.  The C++ FE needs to lower the C++ iterators anyway, the
> > middle- end can really only work with integral or pointer iterators,
> > and it depends on how exactly the Cilk+ standard defines _Cilk_for
> > with iterators (what methods must be implemented on the iterators and
> > what methods and in what order should be called).
> >
> > Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-01-29 Thread Iyer, Balaji V
Hi Jakub,

> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Wednesday, January 29, 2014 6:31 AM
> To: Iyer, Balaji V
> Cc: 'Jason Merrill'; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'r...@redhat.com'
> Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> 
> On Tue, Jan 28, 2014 at 04:55:38PM +, Iyer, Balaji V wrote:
> > I thought about it a bit more, and the main issue here is that we
> > need access to the _Cilk_for loop's components both inside the child
> > function and the parent function.
> 
> I guess for the C++ iterators, if in the _Cilk_for model you need to provide
> number of iterations before parallelization, it really depends on what the
> standard allows and what you want to do exactly.

Yes, I need the value before the parallelization context hits. This is why in 
my last patch I had the parallel around the body and omp for around the 
_Cilk-for statement. 


> If you need to provide the iteration count before spawning the threads and
> the standard allows you that, then just lower it in the C++ FE already so that
> you do:
>   vector::iterator temp = array.begin ();
>   sizetype tempcount = (array.end () - temp); before the parallel, and then
>   #pragma omp parallel firstprivate(temp, tempcount)
> _Cilk_for (sizetype temp2 = 0; temp2 < tempcount; temp2++)
>   {
> vector::iterator ii = temp + temp2;
> 
>   }

This is kind of what I did (atlest tried to accomplish what you mentioned 
above). I can look into doing this, but is it possible for you to accept the 
patch as-is and we will look into fixing it in the future?

Thanks,

Balaji V. Iyer.

> or similar.  The C++ FE needs to lower the C++ iterators anyway, the middle-
> end can really only work with integral or pointer iterators, and it depends on
> how exactly the Cilk+ standard defines _Cilk_for with iterators (what
> methods must be implemented on the iterators and what methods and in
> what order should be called).
> 
>   Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-01-29 Thread Jakub Jelinek
On Tue, Jan 28, 2014 at 04:55:38PM +, Iyer, Balaji V wrote:
>   I thought about it a bit more, and the main issue here is that we
> need access to the _Cilk_for loop's components both inside the child
> function and the parent function.

I guess for the C++ iterators, if in the _Cilk_for model you need to provide
number of iterations before parallelization, it really depends on what the
standard allows and what you want to do exactly.
If you need to provide the iteration count before spawning the threads and
the standard allows you that, then just lower it in the C++ FE already
so that you do:
  vector::iterator temp = array.begin ();
  sizetype tempcount = (array.end () - temp);
before the parallel, and then
  #pragma omp parallel firstprivate(temp, tempcount)
_Cilk_for (sizetype temp2 = 0; temp2 < tempcount; temp2++)
  {
vector::iterator ii = temp + temp2;

  }
or similar.  The C++ FE needs to lower the C++ iterators anyway, the
middle-end can really only work with integral or pointer iterators, and it
depends on how exactly the Cilk+ standard defines _Cilk_for with iterators
(what methods must be implemented on the iterators and what methods and in
what order should be called).

Jakub


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-01-28 Thread Iyer, Balaji V


> -Original Message-
> From: Iyer, Balaji V
> Sent: Monday, January 27, 2014 4:36 PM
> To: Jakub Jelinek
> Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'r...@redhat.com'
> Subject: RE: [PING] [PATCH] _Cilk_for for C and C++
> 
> > -Original Message-
> > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> > ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek
> > Sent: Monday, January 27, 2014 3:50 PM
> > To: Iyer, Balaji V
> > Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez';
> > 'gcc-patches@gcc.gnu.org'; 'r...@redhat.com'
> > Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> >
> > On Mon, Jan 27, 2014 at 08:41:14PM +, Iyer, Balaji V wrote:
> > >   Did you get a chance to look at this _Cilk_for patch?
> >
> > IMHO it is not as much work as you are fearing, at most a few hours of
> > work to get it right, and well worth doing.  So, please at least try
> > it out and if you get stuck with it, explain why.
> 
> Hi Jakub,
>   I tried it that way in the original patch submission for C
> (http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01369.html), but it hit a
> dead-end when I was trying to get STL iterators working for C++. This is why I
> re-structured things this way to get them both working.
> 
> Thanks,
> 
> Balaji V. Iyer.
> 

Hi Jakub,
I thought about it a bit more, and the main issue here is that we need 
access to the _Cilk_for loop's components both inside the child function and 
the parent function.

So, at the moment, I have modelled the _Cilk_for as something like this:

#pragma omp for  schedule (runtime: grain)
_Cilk-for (vector::iterator ii = array.begin (); ii != array.end (); ii++)
#pragma omp parallel 
{

}

>From what I understand, you feel this is a bit ugly and you want this to be 
>modelled something like this?

#pragma omp parallel for schedule (runtime: grain)
_Cilk_for (vector::iterator ii = array.begin (); ii != array.end(); ii++)
{

}

Am I right?

As it stands, doing it the way you suggested did not work when we have 
iterators since iterator expansion pushed inside the child function and its 
expanded variables are not accessible outside the child function by 
gimplify_omp_for. That is, the expansion is put after #pragma omp parallel for 
and that is all pulled into the child function and thus the information to 
compute the count is lost for the parent function.

There is a hack that I think may get around this. This is a bit ugly and really 
is not the way I would think of _Cilk_fors. I am OK with trying this if you 
will accept it.
 
If I do something like this:
 
#pragma omp parallel for schedule (runtime:grain) if ((array.end() - 
array.begin ())/1)
_Cilk_for (vector ::iterator ii = array.begin (); ii != array.end (); ii++)
{

}

The new addition is if clause where "if (( - ) / )"

Then, in the expand_parallel_task, I can extract the if (...) clause and then 
pass the expression as a parameter for the loop-count. Yes, it's bit ugly but 
if you are willing to accept it, I can try to implement this.

Please let me know.

Thanks,

Balaji V. Iyer.


RE: [PING] [PATCH] _Cilk_for for C and C++

2014-01-27 Thread Iyer, Balaji V
> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek
> Sent: Monday, January 27, 2014 3:50 PM
> To: Iyer, Balaji V
> Cc: Jason Merrill; 'Jeff Law'; 'Aldy Hernandez'; 'gcc-patches@gcc.gnu.org';
> 'r...@redhat.com'
> Subject: Re: [PING] [PATCH] _Cilk_for for C and C++
> 
> On Mon, Jan 27, 2014 at 08:41:14PM +, Iyer, Balaji V wrote:
> > Did you get a chance to look at this _Cilk_for patch?
> 
> IMHO it is not as much work as you are fearing, at most a few hours of work
> to get it right, and well worth doing.  So, please at least try it out and if 
> you
> get stuck with it, explain why.

Hi Jakub,
I tried it that way in the original patch submission for C 
(http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01369.html), but it hit a 
dead-end when I was trying to get STL iterators working for C++. This is why I 
re-structured things this way to get them both working.

Thanks,

Balaji V. Iyer.

> 
>   Jakub


Re: [PING] [PATCH] _Cilk_for for C and C++

2014-01-27 Thread Jakub Jelinek
On Mon, Jan 27, 2014 at 08:41:14PM +, Iyer, Balaji V wrote:
>   Did you get a chance to look at this _Cilk_for patch? 

IMHO it is not as much work as you are fearing, at most a few hours of work
to get it right, and well worth doing.  So, please at least try it out
and if you get stuck with it, explain why.

Jakub