Re: Issue with _Cilk_for

2014-01-31 Thread Jakub Jelinek
On Wed, Jan 29, 2014 at 10:58:55PM +, Iyer, Balaji V wrote:
 This is the testcase I am using:
 
 _Cilk_for (vectorint::iterator iter = array.begin(); iter != array.end();
   iter++)
 {
if (*iter  == 6)
  *iter = 13;
 }

 ==
  try
 {
   #pragma omp parallel schedule(cilk-for,0) 
 if(__gnu_cxx::operator-int*, std::vectorint  ((const struct 
 __normal_iterator ) (const struct __normal_iterator *) TARGET_EXPR 
 D.28395, std::vectorint::end (array), (const struct __normal_iterator ) 
 (const struct __normal_iterator *) iter))
 {
   {
 {
   struct iterator iter;
   difference_type D.28409;
   difference_type D.28410;
 
 struct iterator iter;
   cleanup_point  Unknown tree: expr_stmt
   (void) (iter = TARGET_EXPR D.28366, std::vectorint::begin (array)) 
 ;
 difference_type D.28409;
 difference_type D.28410;
grainsize = 0 if(__gnu_cxx::operator-int*, 
 std::vectorint  ((const struct __normal_iterator ) (const struct 
 __normal_iterator *) TARGET_EXPR D.28395, std::vectorint::end (array), 
 (const struct __normal_iterator ) (const struct __normal_iterator *) iter))
 {
   cleanup_point  Unknown tree: expr_stmt
   (void) (D.28410 = 0) 
   _Cilk_for (D.28409 = 0; D.28409 != cleanup_point 
 __gnu_cxx::operator-int*, std::vectorint  ((const struct 
 __normal_iterator ) (const struct __normal_iterator *) TARGET_EXPR 
 D.28395, std::vectorint::end (array), (const struct __normal_iterator ) 
 (const struct __normal_iterator *) iter); D.28409 = D.28409 + 1)

Can you explain why you emit anything in between the parallel and _Cilk_for? 
I don't see why it should be needed.
Depending on what the Cilk+ standard allows you to do (can array.begin ()
be evaluated multiple times or not, and if not, can you invoke say a copy
constructor on it), you should just emit an EXPR_STMT that initializes
an artificial scalar (say get_temp_regvar created) to
(array.end () - array.begin ()) before you add the parallel, or,
if array.begin () can't be evaluated multiple times, then construct some
temporary before the parallel with array.begin () as initializer
and then do the subtraction between array.end () and that temporary.

Then the parallel, with _Cilk_for immediately in it and just another
temporary scalar as loop iterator there, and only inside of _Cilk_for
declare the iterator var and construct (if it has been declared in _Cilk_for, 
otherwise
just initialize it) to, depending on what Cilk+ requires, either to
array.begin () and then operator+ it to the corresponding value, or
construct already to array.begin () + scalariv.

Jakub


Re: Issue with _Cilk_for

2014-01-31 Thread Jakub Jelinek
On Fri, Jan 31, 2014 at 04:42:57PM +0100, Jakub Jelinek wrote:
 Can you explain why you emit anything in between the parallel and _Cilk_for? 
 I don't see why it should be needed.
 Depending on what the Cilk+ standard allows you to do (can array.begin ()
 be evaluated multiple times or not, and if not, can you invoke say a copy
 constructor on it), you should just emit an EXPR_STMT that initializes
 an artificial scalar (say get_temp_regvar created) to
 (array.end () - array.begin ()) before you add the parallel, or,
 if array.begin () can't be evaluated multiple times, then construct some
 temporary before the parallel with array.begin () as initializer
 and then do the subtraction between array.end () and that temporary.
 
 Then the parallel, with _Cilk_for immediately in it and just another
 temporary scalar as loop iterator there, and only inside of _Cilk_for
 declare the iterator var and construct (if it has been declared in _Cilk_for, 
 otherwise
 just initialize it) to, depending on what Cilk+ requires, either to
 array.begin () and then operator+ it to the corresponding value, or
 construct already to array.begin () + scalariv.

Actually, small correction, the iterator variable likely should be
constructed before the parallel and just marked as firstprivate on the
parallel, then you even don't need any special temporary.

So for:

_Cilk_for (vectorint::iterator iter = array.begin(); iter != array.end(); 
iter++)
{
  if (*iter == 6)
*iter = 13;
}

emit basically:
  {
Iterator iter;
Iterator::Iterator(iter, array.begin());
ptrdiff_t tmpcount = Iterator::operator-(array.end(), iter);
ptrdiff_t tmpiv;
ptrdiff_t tmplast = 0;
#pragma omp parallel firstprivate(iter, tmpcount, tmplast) private(tmpiv)
{
  _Cilk_for(tmpiv = 0; tmpiv  tmpcount; tmpiv++)
  {
Iterator::operator+(iter, tmpiv - tmplast);
tmplast = tmpiv;
{
  if (*iter == 6)
*iter = 13;
}
  }
}
  }

Jakub


RE: Issue with _Cilk_for

2014-01-31 Thread Iyer, Balaji V


 -Original Message-
 From: Jakub Jelinek [mailto:ja...@redhat.com]
 Sent: Friday, January 31, 2014 11:04 AM
 To: Iyer, Balaji V
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: Issue with _Cilk_for
 
 On Fri, Jan 31, 2014 at 04:42:57PM +0100, Jakub Jelinek wrote:
  Can you explain why you emit anything in between the parallel and
 _Cilk_for?
  I don't see why it should be needed.
  Depending on what the Cilk+ standard allows you to do (can array.begin
  () be evaluated multiple times or not, and if not, can you invoke say
  a copy constructor on it), you should just emit an EXPR_STMT that
  initializes an artificial scalar (say get_temp_regvar created) to
  (array.end () - array.begin ()) before you add the parallel, or, if
  array.begin () can't be evaluated multiple times, then construct some
  temporary before the parallel with array.begin () as initializer and
  then do the subtraction between array.end () and that temporary.
 
  Then the parallel, with _Cilk_for immediately in it and just another
  temporary scalar as loop iterator there, and only inside of _Cilk_for
  declare the iterator var and construct (if it has been declared in
  _Cilk_for, otherwise just initialize it) to, depending on what Cilk+
  requires, either to array.begin () and then operator+ it to the
  corresponding value, or construct already to array.begin () + scalariv.
 
 Actually, small correction, the iterator variable likely should be constructed
 before the parallel and just marked as firstprivate on the parallel, then you
 even don't need any special temporary.
 
 So for:
 
 _Cilk_for (vectorint::iterator iter = array.begin(); iter != array.end(); 
 iter++)
 {
   if (*iter == 6)
 *iter = 13;
 }
 
 emit basically:
   {
 Iterator iter;
 Iterator::Iterator(iter, array.begin());
 ptrdiff_t tmpcount = Iterator::operator-(array.end(), iter);
 ptrdiff_t tmpiv;
 ptrdiff_t tmplast = 0;
 #pragma omp parallel firstprivate(iter, tmpcount, tmplast) private(tmpiv)
 {
   _Cilk_for(tmpiv = 0; tmpiv  tmpcount; tmpiv++)
   {
 Iterator::operator+(iter, tmpiv - tmplast);
 tmplast = tmpiv;
   {
   if (*iter == 6)
   *iter = 13;
 }
   }
 }
   }
 

I tried to do this. The thing is, you had me model like this:

#pragma omp parallel for 
_Cilk_for (...)
{
Body
}

Now, the compiler will do something like this:

#pragma OMP parallel
{
#pragma Omp for 
{
_Cilk_for (...)
Body
}
}

Now, if by the time it starts to look at evaluating/breaking up the _Cilk-for's 
parts, we are already at the scope inside #pragma omp parallel. If I try to pop 
here, it has some scope issues with #pragma omp parallel. This requires a lot 
of rewrite of existing OMP code to port it for _Cilk_for. This can be done but 
will take significant time which I don't think I have since we are close to end 
of stage3.

But, if I had the parallel around the body, then the body is transplanted into 
the child function and everything I need for _Cilk-for is done. Yes, it does 
not make sense for an OMP expert. I am nowhere close to being an expert and 
even I had a hard time to wrap my head around this.

We can look into what you want, but in the meantime, can you accept this patch 
with the way I have modelled so that the feature can make it into 4.9?

Thanks,

Balaji V. Iyer.



   Jakub


Re: Issue with _Cilk_for

2014-01-31 Thread Jakub Jelinek
On Fri, Jan 31, 2014 at 04:18:28PM +, Iyer, Balaji V wrote:
 I tried to do this. The thing is, you had me model like this:
 
 #pragma omp parallel for 
 _Cilk_for (...)
 {
   Body
 }
 
 Now, the compiler will do something like this:
 
 #pragma OMP parallel
 {
   #pragma Omp for 
   {
   _Cilk_for (...)
   Body
   }
 }
 
 Now, if by the time it starts to look at evaluating/breaking up the
 _Cilk-for's parts, we are already at the scope inside #pragma omp
 parallel.  If I try to pop here, it has some scope issues with #pragma omp
 parallel.  This requires a lot of rewrite of existing OMP code to port it
 for _Cilk_for.  This can be done but will take significant time which I
 don't think I have since we are close to end of stage3.

It really doesn't require lots of rewriting, after all, OpenMP handles
the C++ iterators very similarly.

 We can look into what you want, but in the meantime, can you accept this
 patch with the way I have modelled so that the feature can make it into
 4.9?

No.  In GCC we do not rush in bad changes just because there is time
pressure.  As Cilk+ is new in GCC 4.9, if you adjust things properly in
say 2-3 weeks frame, we might still make an exception and allow it in
during stage4, otherwise it will have to wait for 5.0.

Jakub


RE: Issue with _Cilk_for

2014-01-31 Thread Iyer, Balaji V


 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek
 Sent: Friday, January 31, 2014 11:26 AM
 To: Iyer, Balaji V
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: Issue with _Cilk_for
 
 On Fri, Jan 31, 2014 at 04:18:28PM +, Iyer, Balaji V wrote:
  I tried to do this. The thing is, you had me model like this:
 
  #pragma omp parallel for
  _Cilk_for (...)
  {
  Body
  }
 
  Now, the compiler will do something like this:
 
  #pragma OMP parallel
  {
  #pragma Omp for
  {
  _Cilk_for (...)
  Body
  }
  }
 
  Now, if by the time it starts to look at evaluating/breaking up the
  _Cilk-for's parts, we are already at the scope inside #pragma omp
  parallel.  If I try to pop here, it has some scope issues with #pragma
  omp parallel.  This requires a lot of rewrite of existing OMP code to
  port it for _Cilk_for.  This can be done but will take significant
  time which I don't think I have since we are close to end of stage3.
 
 It really doesn't require lots of rewriting, after all, OpenMP handles the C++
 iterators very similarly.
 
  We can look into what you want, but in the meantime, can you accept
  this patch with the way I have modelled so that the feature can make
  it into 4.9?
 
 No.  In GCC we do not rush in bad changes just because there is time
 pressure.  As Cilk+ is new in GCC 4.9, if you adjust things properly in say 
 2-3
 weeks frame, we might still make an exception and allow it in during stage4,
 otherwise it will have to wait for 5.0.

Hi Jakub,
I am not asking to rush anything. Personally, I won't submit anything 
if I don't think it is a correct solution, especially for something this big 
into a system that is used by a large population. On the top level, I don't 
understand why it matters where the #pragma omp parallel is placed. Can you 
please explain me that? I agree it feels weird when you think about from OMP 
standpoint, but so what? It is just used in the internal representation for 
_Cilk_for so that I can use the OMP routines to create child functions and 
transplant the correct parts of code to it. _Cilk_for behaves very differently 
compared to OMP for.  It is almost like apple and orange when it comes to 
functionality. To my best knowledge, there is no additional scoping or overhead 
instructions inserted when I put #prgma omp parallel around the body.

I am just asking this to get the whole picture. Please don't think this as an 
argument/flame.

Thanks,

Balaji V. Iyer.

 
   Jakub