Re: [VOTE] Release Apache Tez-0.9.2 RC0

2019-03-29 Thread Rohini Palaniswamy
 +1 (binding)

- Verified md5, sha512 and the signature.
- Looked at the Release notes (
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12314426&version=12342390
)
- Built the source and ran the unit tests and they are good.

 Regards,
 Rohini

On Wed, Mar 27, 2019 at 6:15 AM Kuhu Shukla  wrote:

> Thank you Jon. We need at least one more binding +1 to have this release
> shipped as the official 0.9.2. Reminding all other contributors and
> PMC members/committers (esp. Pig and Hive devs) on this list to validate the
> above-mentioned RC at the earliest.
>
> Appreciate the support.
>
> Regards,
> Kuhu
>
> On Tue, 26 Mar 2019 at 17:30, Jonathan Eagles  wrote:
>
> > +1. I have validated this release and signatures.
> >
> > On Tue, Mar 19, 2019 at 6:12 PM Kuhu Shukla  wrote:
> > >
> > > Hello Tez folks,
> > >
> > > I have created a tez-0.9.2 release candidate, rc0.
> > >
> > > Git Source Tag:
> > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf/tez/repo?p=tez.git;a=log;h=refs/tags/release-0.9.2-rc0
> > >
> > > Staging site :
> > >
> > > https://dist.apache.org/repos/dist/dev/tez/apache-tez-0.9.2-rc0/
> > >
> > > Nexus Staging URL :
> > >
> > > https://repository.apache.org/content/repositories/orgapachetez-1065
> > >
> > > PGP release keys (signed using ) :
> > >
> > > http://pgp.surfnet.nl/pks/lookup?op=get&search=0x4405B74BAAFFE291
> > >
> > > KEYS file available at :
> > >
> > > https://dist.apache.org/repos/dist/release/tez/KEYS
> > >
> > > One can look into the issues fixed in this release at:
> > >
> > > https://issues.apache.org/jira/projects/TEZ/versions/12342390
> > >
> > >
> > > Vote will be open for at least 72 hours or until the required number of
> > PMC
> > > votes is obtained. Please reply to this thread with any
> > > issues/comments/concerns.
> > >
> > > [ ] +1 approve
> > > [ ] +0 no opinion
> > > [ ] -1 disapprove (and reason why)
> > >
> > > Here is my +1 (binding).
> > >
> > > Thanks and Regards,
> > >
> > > Kuhu Shukla
> >
>


Re: [VOTE] Move master to Hadoop 3+ and create separate 0.9.x line

2018-04-16 Thread Rohini Palaniswamy
+1. I don't see a problem for Pig as this is being done mainly for the
hadoop dependencies conflict and there are no API changes in Tez. At least
till we get to the point where we introduce Hadoop 3 specific code into
Tez, Pig compiled with older versions of Tez will continue to run with Tez
master.

On Thu, Apr 12, 2018 at 5:33 PM, Gopal Vijayaraghavan 
wrote:

> +1
>
> Cheers,
> Gopal
>
>
> On 4/12/18, 5:22 PM, "Eric Wohlstadter"  wrote:
>
> Just a friendly reminder that this vote is still open.
>
> On Wed, Apr 11, 2018 at 6:33 AM, Jason Lowe 
> wrote:
>
> > There was a discussion thread that was started two weeks before the
> > vote thread, see
> > http://mail-archives.apache.org/mod_mbox/tez-dev/201803.mbox/browser
> .
> > Granted there weren't many comments, but there was a discussion
> thread
> > with no voiced objections well in advance of the vote thread.
> >
> > Jason
> >
> >
> > On Tue, Apr 10, 2018 at 10:18 AM, Jonathan Eagles  >
> > wrote:
> > > Thoughts/Inputs/Discussion from Pig/Hive/Flink/Scalding/Scope
> > communities?
> > >
> > > I wish we had used a discussion thread to gather more input from
> > > Pig/Hive/Flink/Scalding/Scope community before starting this vote
> whose
> > > outcome affects them. Without discussion or votes from those
> communities
> > > I'm not sure of the community support for this decision. Should we
> consider
> > > canceling this vote to gather input first?
> > >
> > > On Mon, Apr 9, 2018 at 10:09 AM, Kuhu Shukla
> 
> > > wrote:
> > >
> > >> +1.
> > >>
> > >> Thank you Eric for floating the proposal.
> > >>
> > >> Regards,
> > >> Kuhu
> > >>
> > >> On Mon, Apr 9, 2018 at 9:56 AM, Jason Lowe  >
> > wrote:
> > >>
> > >> > +1
> > >> >
> > >> > Jason
> > >> >
> > >> > On Fri, Apr 6, 2018 at 4:45 PM, Eric Wohlstadter <
> wohls...@cs.ubc.ca>
> > >> > wrote:
> > >> > > Please vote (binding or non-binding) on the following proposal.
> The
> > vote
> > >> > will
> > >> > > be open until 3pm (Pacific) April 13th.
> > >> > >
> > >> > >
> > >> > > Proposal: Move master to support minimum Hadoop 3+ (0.10.x
> line) and
> > >> > create
> > >> > > separate branch for Hadoop 2 (0.9.x line)
> > >> > >
> > >> > >
> > >> > > Details:
> > >> > >
> > >> > >
> > >> > >
> > >> > >- Tez master branch would support only Hadoop 3+ moving
> forward
> > >> > >
> > >> > >
> > >> > >- As a general policy, Maven dependencies on master are
> required
> > not
> > >> > to
> > >> > >have conflicts with the dependencies of the corresponding
> minimum
> > >> > >supported Hadoop (the dependency versions can vary between
> Tez
> > >> master
> > >> > and
> > >> > >Hadoop if the versions are advertised as compatible by the
> > >> dependency
> > >> > >provider).
> > >> > >
> > >> > >- As a general policy, dependency conflicts between Tez and
> > Hadoop
> > >> > >should be resolved by using compatible jars. Shims/Shading
> could
> > be
> > >> > used on
> > >> > >a case-by-case basis, but not as a general policy.
> > >> > >
> > >> > >
> > >> > >- A separate branch and distribution (e.g. on Maven
> Central)
> > will be
> > >> > >created to maintain the 0.9.x line with minimum support for
> > Hadoop
> > >> > 2.7.x
> > >> > >
> > >> > >
> > >> > >
> > >> > >- Bug fixes would be required to be pushed to both
> master and
> > the
> > >> > >0.9.x line (unless they are specific to one of them)
> > >> > >
> > >> > >
> > >> > >
> > >> > >- Major feature or performance improvements would be
> required to
> > be
> > >> > >pushed to both master and the 0.9.x line (unless they
> require
> > Hadoop
> > >> > 3+ or
> > >> > >have dependent library conflicts with Hadoop 2.x, in which
> case
> > they
> > >> > may be
> > >> > >pushed only to master)
> > >> > >
> > >> > >
> > >> > >
> > >> > >- Minor feature or performance improvements can be pushed
> only to
> > >> > master
> > >> >
> > >>
> >
>
>
>
>


[jira] [Created] (TEZ-3877) Delete spill files once merge is done

2017-12-15 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-3877:
---

 Summary: Delete spill files once merge is done
 Key: TEZ-3877
 URL: https://issues.apache.org/jira/browse/TEZ-3877
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rohini Palaniswamy


  I see that spill files are not deleted right after merge completes. We should 
do that, as they take up a lot of space and we can't afford that wastage when 
Tez takes up a lot of shuffle space with complex DAGs. [~jlowe] told me they 
are only cleaned up after the application completes, as they are written in the 
app directory and not the container directory. That also has to be fixed so 
that they are cleaned up by the node manager during task failures or container 
crashes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (TEZ-3865) A new vertex manager to partition data for STORE

2017-11-14 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-3865:
---

 Summary: A new vertex manager to partition data for STORE
 Key: TEZ-3865
 URL: https://issues.apache.org/jira/browse/TEZ-3865
 Project: Apache Tez
  Issue Type: New Feature
Reporter: Rohini Palaniswamy


Restricting the number of files in output is a very common use case. In Pig, 
users currently add an ORDER BY, GROUP BY or DISTINCT with the required 
parallelism before STORE to achieve it. All of these operations create 
unnecessary overhead in processing. It would be ideal if the STORE clause 
supported the PARALLEL statement and the partitioning of data was handled in a 
simpler and more efficient manner.

Partitioning of the data can be achieved using a very efficient vertex manager 
as described below. Going to call it PartitionVertexManager (PVM) for now till 
someone proposes a better name. Will explain using Pig examples, but the 
logic is the same for Hive as well.

There are multiple cases to consider when storing
1) No partitions
   - Data is stored into a single directory using FileOutputFormat 
implementations
2) Partitions
  - Data is stored into multiple partitions. Case of static or dynamic 
partitioning with HCat
3) HBase
I have kind of forgotten what exactly my thoughts were on this when storing 
to multiple regions. Will update once I remember.

Let us consider the below script with pig.exec.bytes.per.reducer (this setting 
is usually translated to tez.shuffle-vertex-manager.desired-task-input-size 
with ShuffleVertexManager) set to 1G.
{code}
A = LOAD 'data' ;
B = GROUP A BY $0 PARALLEL 1000;
C = FOREACH B GENERATE group, COUNT(A.a), SUM(A.b), ..;
D = STORE C into 'output' using SomeStoreFunc() PARALLEL 20;
{code}

The implementation will have 3 vertices.
v1 - LOAD vertex
v2 - GROUP BY vertex
v3 - STORE vertex

PVM will be used on v3. It is going to be similar to ShuffleVertexManager but 
with some differences. The main difference is that the source vertex does not 
care about the parallelism of the destination vertex, and the number of 
partitioned outputs it produces does not depend on it.

1) Case of no partitions
   Each task in vertex v2 will produce a single partition output (no 
Partitioner is required). The PVM will bucket this single-partition data from 
the 1000 source tasks into multiple destination tasks of v3, trying to keep 1G 
per task with a max of 20 tasks (auto parallelism).
   
2) Partitions
   Let us say the table has 2 partition keys (dt and region). Since there could 
be any number of regions for a given date, we will use the store parallelism as 
the upper limit on the number of partitions, i.e. a HashPartitioner with 
numReduceTasks as 20 and (dt, region) as the partition key. If there are only 5 
regions, then each task of v2 will produce 5 partitions (with the remaining 15 
being empty) if there is no hash collision. If there are 30 regions, then each 
task of v2 will produce 20 partitions.

   The PVM, when it groups, will try to group all Partition0 segments as much 
as possible into one v3 task. Based on skew it could end up in more tasks, i.e. 
there is no restriction that one partition goes to the same reducer task. Doing 
this avoids having to open multiple ORC files in one task when doing dynamic 
partitioning and will be very efficient, reducing namespace usage even further 
while keeping file sizes more uniform.
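The size-based grouping described above can be sketched roughly as follows. This is a hypothetical standalone illustration, not Tez code (the class and method names are invented): greedily pack same-numbered partition segments from the source tasks into destination tasks up to a desired bytes-per-task target, capped at the store parallelism.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only (not part of Tez): groups partition segments
 * produced by source tasks into destination tasks by size, aiming for
 * targetBytes per task with an upper bound on the number of tasks.
 */
public class PartitionGrouper {

    /** Greedily packs segment sizes into groups of about targetBytes,
     *  capped at maxTasks groups (the last group absorbs any overflow).
     *  Returns the total bytes assigned to each destination task. */
    public static List<Long> group(long[] segmentSizes, long targetBytes, int maxTasks) {
        List<Long> taskLoads = new ArrayList<>();
        long current = 0;
        for (long size : segmentSizes) {
            // Close the current group when adding this segment would
            // exceed the target, unless we are about to hit the task cap.
            if (current > 0 && current + size > targetBytes && taskLoads.size() < maxTasks - 1) {
                taskLoads.add(current);
                current = 0;
            }
            current += size;
        }
        if (current > 0) {
            taskLoads.add(current);
        }
        return taskLoads;
    }

    public static void main(String[] args) {
        // 1000 source tasks each producing a 5 MB segment of partition 0:
        long[] sizes = new long[1000];
        java.util.Arrays.fill(sizes, 5L * 1024 * 1024);
        long oneGB = 1024L * 1024 * 1024;
        List<Long> tasks = group(sizes, oneGB, 20);
        // ~5 GB total at a 1 GB target -> a handful of destination tasks,
        // well under the cap of 20 (the "auto parallelism" case above).
        System.out.println("destination tasks: " + tasks.size());
    }
}
```

The real vertex manager would of course work from the sizes reported in data movement events rather than a flat array, but the packing decision is the same shape.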





[jira] [Resolved] (TEZ-2319) DAG history in HDFS

2016-09-16 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy resolved TEZ-2319.
-
Resolution: Duplicate

TEZ-2628 addresses this.

> DAG history in HDFS
> ---
>
> Key: TEZ-2319
> URL: https://issues.apache.org/jira/browse/TEZ-2319
> Project: Apache Tez
>  Issue Type: New Feature
>    Reporter: Rohini Palaniswamy
>
>   We have processes, that parse jobconf.xml and job history details (map and 
> reduce task details, etc) in avro files from HDFS and load them into hive 
> tables for analysis for mapreduce jobs. Would like to have Tez also make this 
> information written to a history file in HDFS when AM or each DAG completes 
> so that we can do analytics on Tez jobs. 





[jira] [Created] (TEZ-3391) MR split file validation should be done in the AM

2016-08-01 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-3391:
---

 Summary: MR split file validation should be done in the AM
 Key: TEZ-3391
 URL: https://issues.apache.org/jira/browse/TEZ-3391
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rohini Palaniswamy


  We had a case where the split metadata size exceeded 1000. Instead of the 
job failing from validation during initialization in the AM, as in mapreduce, 
each of the tasks failed doing that validation during initialization.





[jira] [Created] (TEZ-3385) DAGClient API should be accessible outside of DAG submission

2016-07-28 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-3385:
---

 Summary: DAGClient API should be accessible outside of DAG 
submission
 Key: TEZ-3385
 URL: https://issues.apache.org/jira/browse/TEZ-3385
 Project: Apache Tez
  Issue Type: New Feature
Reporter: Rohini Palaniswamy


  In PIG-4958, I had to resort to

DAGClient client = new DAGClientImpl(appId, dagID, new TezConfiguration(conf), 
null);

This is not good, as DAGClientImpl is an internal class and not something users 
should be referring to. Tez needs to have an API that gives a DAGClient given 
the appId, dagID and configuration. This is something basic, like 
JobClient.getJob(String jobID). 







[jira] [Created] (TEZ-3242) Reduce bytearray copy with TezEvent Serialization and deserialization

2016-05-05 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-3242:
---

 Summary: Reduce bytearray copy with TezEvent Serialization and 
deserialization
 Key: TEZ-3242
 URL: https://issues.apache.org/jira/browse/TEZ-3242
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy


Byte arrays are created for serializing protobuf messages and parsing them, 
which creates a lot of garbage when we have a lot of events. 

{code}
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at 
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
at 
org.apache.tez.runtime.api.impl.TezEvent.serializeEvent(TezEvent.java:197)
at org.apache.tez.runtime.api.impl.TezEvent.write(TezEvent.java:268)
at 
org.apache.tez.runtime.api.impl.TezHeartbeatResponse.write(TezHeartbeatResponse.java:95)
at 
org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:202)
at 
org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:128)
at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:82)
at org.apache.hadoop.ipc.Server.setupResponse(Server.java:2496)
{code}
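For illustration only (this is not Tez code), the kind of extra copy this stack trace points at can be sketched with plain JDK streams: `toByteArray()` materializes a second copy of the buffered bytes, while `writeTo()` streams them straight to the destination without the intermediate array.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Standalone sketch of the extra byte-array copy during serialization
 * and a streaming alternative that avoids it. Method names are invented
 * for the example.
 */
public class SerializationCopyDemo {

    // Copying approach: toByteArray() duplicates the internal buffer,
    // so the serialized payload briefly exists twice on the heap.
    static byte[] serializeWithCopy(byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.write(payload);
        dos.flush();
        return bos.toByteArray(); // full copy of the buffer
    }

    // Streaming approach: writeTo() pushes the buffered bytes straight
    // to the destination stream without materializing a second array.
    static void serializeWithoutCopy(byte[] payload, OutputStream out) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.write(payload);
        dos.flush();
        bos.writeTo(out); // no intermediate byte[] copy
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[]{1, 2, 3};
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        serializeWithoutCopy(payload, sink);
        System.out.println(java.util.Arrays.equals(sink.toByteArray(), payload)); // true
    }
}
```

With many events in flight, avoiding that one copy per message meaningfully reduces transient garbage, which is the spirit of this issue.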





One to one edges and local fetch

2016-04-29 Thread Rohini Palaniswamy
I was under the assumption that we optimized 1-1 edge scheduling to reuse
containers or run as much as possible on the same node and do a local fetch.
But that does not seem to be the case. Is there a jira for this already? I
could not find any.

Regards,
Rohini


[jira] [Created] (TEZ-3140) Reduce AM memory usage while serialization

2016-02-25 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-3140:
---

 Summary: Reduce AM memory usage while serialization
 Key: TEZ-3140
 URL: https://issues.apache.org/jira/browse/TEZ-3140
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.7.1


   There is an unnecessary copy of the userpayload byte array during serialization.





[jira] [Created] (TEZ-3008) AM java.io.tmpdir should be set to $PWD/tmp

2015-12-16 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-3008:
---

 Summary: AM java.io.tmpdir should be set to $PWD/tmp
 Key: TEZ-3008
 URL: https://issues.apache.org/jira/browse/TEZ-3008
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Rohini Palaniswamy


For tasks it is already done by TezRuntimeChildJVM

{code}
Path childTmpDir = new Path(Environment.PWD.$(),
YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR);
vargs.add("-Djava.io.tmpdir=" + childTmpDir);
{code}

Need to do this in the AM as well. Mapreduce has uber mode, which usually 
causes more problems (MAPREDUCE-6576) when user code writes to java.io.tmpdir, 
as it defaults to /tmp in the AM and fills up disk space on cluster nodes. Even 
though there is no uber mode with the Tez AM to cause that problem, there is 
still the chance that user output committer code, which runs in the AM, can 
write something to java.io.tmpdir.
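A minimal standalone sketch of what the AM-side equivalent could look like, assuming the container working directory is available as a string (the class name, helper name, and example path below are hypothetical; "tmp" stands in for YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR):

```java
import java.nio.file.Paths;

/**
 * Hypothetical sketch, not actual Tez code: build the JVM argument that
 * points java.io.tmpdir under the container working directory, so the
 * NodeManager cleans it up with the container instead of leaking /tmp.
 */
public class AmTmpDirArg {

    static String tmpDirJvmArg(String containerWorkDir) {
        // Equivalent in spirit to what TezRuntimeChildJVM does for tasks.
        return "-Djava.io.tmpdir=" + Paths.get(containerWorkDir, "tmp");
    }

    public static void main(String[] args) {
        // Example container dir (hypothetical path):
        System.out.println(tmpDirJvmArg("/grid/yarn/nm-local-dir/container_01"));
    }
}
```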







[jira] [Resolved] (TEZ-3008) AM java.io.tmpdir should be set to $PWD/tmp

2015-12-16 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy resolved TEZ-3008.
-
Resolution: Duplicate

Sorry, my bad. I did not link the jiras with the original Oozie issue which 
brought out the problem, and I thought I had not filed it yet. I did search, 
but I did not get it right.

> AM java.io.tmpdir should be set to $PWD/tmp
> ---
>
> Key: TEZ-3008
> URL: https://issues.apache.org/jira/browse/TEZ-3008
> Project: Apache Tez
>  Issue Type: Improvement
>    Reporter: Rohini Palaniswamy
>  Labels: newbie
>
> For tasks it is already done by TezRuntimeChildJVM
> {code}
> Path childTmpDir = new Path(Environment.PWD.$(),
> YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR);
> vargs.add("-Djava.io.tmpdir=" + childTmpDir);
> {code}
> Need to do this in the AM as well. Mapreduce has uber mode, which usually 
> causes more problems (MAPREDUCE-6576) when user code writes to java.io.tmpdir, 
> as it defaults to /tmp in the AM and fills up disk space on cluster nodes. 
> Even though there is no uber mode with the Tez AM to cause that problem, there 
> is still the chance that user output committer code, which runs in the AM, can 
> write something to java.io.tmpdir.





[jira] [Created] (TEZ-2969) Tasks list should have diagnostics next to logs

2015-12-03 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-2969:
---

 Summary: Tasks list should have diagnostics next to logs
 Key: TEZ-2969
 URL: https://issues.apache.org/jira/browse/TEZ-2969
 Project: Apache Tez
  Issue Type: Improvement
  Components: UI
Reporter: Rohini Palaniswamy


Many times failed task logs have no stacktrace (container killed due to 
timeout, preemption or exceeding physical memory usage). Diagnostics for those 
errors are present in the task or task attempt page, but not on the task or 
task attempts list page, which is the most visited and leads to confusion. It 
would be good to have some kind of tooltip (diagnostics messages are long) or 
an icon with a link to indicate that there is a diagnostic message.





Why spill in UnorderedPartitionedKVWriter?

2015-11-18 Thread Rohini Palaniswamy
  Came across a job which was taking a long time in
UnorderedPartitionedKVWriter.mergeAll. Saw that it was decompressing and
reading data from spill files (8500 spills) and then writing the final
compressed merge file. Why do we need spill files for
UnorderedPartitionedKVWriter? Why not just buffer and keep writing directly
to the final file, which would save a lot of time?

Regards,
Rohini


Re: Problem when running our code with tez

2015-08-30 Thread Rohini Palaniswamy
 A possible solution is to use conf.get("mapreduce.workflow.id") +
conf.get("mapreduce.workflow.node.name")
  Daniel, currently they are set only in the vertex conf and will not be
available for MROutput.

Shiri,
   Can you tell us your actual use case which led to implementing a
RecordWriter that writes to a jobID directory? It looks like you want to
write to a temporary directory and do some custom processing before
committing. Are you committing to some external directory other than the
actual output directory, which requires you to use a jobID directory
instead of the _temporary directory mapreduce uses in general?

Regards,
Rohini

On Sun, Aug 30, 2015 at 2:20 AM, Shiri Marron shiri.mar...@amdocs.com
wrote:

 +Nir

 -Original Message-
 From: Hersh Shafer
 Sent: Thursday, August 27, 2015 11:45 AM
 To: Daniel Dai; dev@tez.apache.org; d...@pig.apache.org; Shiri Marron
 Cc: Almog Shunim
 Subject: RE: Problem when running our code with tez

 +Shiri

 -Original Message-
 From: Daniel Dai [mailto:da...@hortonworks.com]
 Sent: Wednesday, August 26, 2015 1:57 AM
 To: dev@tez.apache.org; d...@pig.apache.org
 Cc: Hersh Shafer; Almog Shunim
 Subject: Re: Problem when running our code with tez

 JobID is vague in Tez; you should use dagId instead. However, I don't see a
 way you can get the DagId within RecordWriter/OutputCommitter. A possible
 solution is to use conf.get("mapreduce.workflow.id") +
 conf.get("mapreduce.workflow.node.name"). Note both are Pig specific
 configuration and only applicable if you run with Pig.

 Daniel
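A rough illustration of the workaround Daniel describes, with a plain Map standing in for Hadoop's Configuration (the class name and the directory-naming scheme are hypothetical; only the two property names come from the thread):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: derive a unique temp directory name from the
 * Pig-set workflow properties instead of the (non-unique under Tez)
 * jobID. A plain Map stands in for Hadoop's Configuration here.
 */
public class WorkflowTempDir {

    static String tempDirName(Map<String, String> conf) {
        // Pig-specific properties; only set when running under Pig.
        String workflowId = conf.get("mapreduce.workflow.id");
        String nodeName = conf.get("mapreduce.workflow.node.name");
        if (workflowId == null || nodeName == null) {
            throw new IllegalStateException(
                "workflow properties not set (not running under Pig?)");
        }
        // Unique per script run and per node in the script's plan.
        return workflowId + "_" + nodeName;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapreduce.workflow.id", "pig_1440491467561");
        conf.put("mapreduce.workflow.node.name", "scope-12");
        System.out.println(tempDirName(conf)); // pig_1440491467561_scope-12
    }
}
```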




 On 8/25/15, 2:08 PM, Hitesh Shah hit...@apache.org wrote:

 +dev@pig as this might be a question better answered by Pig developers.

 This probably won't answer your question but should give you some
 background info. When Pig uses Tez, it may end up running multiple dags
 within the same YARN application, therefore the "jobId" (in case of MR,
 the job Id maps to the application Id from YARN) may not be unique.
 Furthermore, there are cases where multiple vertices within the same
 DAG could write to HDFS, hence both dagId and vertexId are required to
 guarantee uniqueness when writing to a common location.

 thanks
 Hitesh
 
 
 On Aug 25, 2015, at 7:29 AM, Shiri Marron shiri.mar...@amdocs.com
 wrote:
 
  Hi,
 
   We are trying to run our existing workflows, which contain pig
 scripts, on tez (version 0.5.2.2.2.6.0-2800, hdp 2.2), but we are
 facing some problems when we run our code with tez.

   In our code, we are writing and reading from/to a temp directory
 which we create with a name based on the jobID:

   Part 1 - We extend org.apache.hadoop.mapreduce.RecordWriter, and
 in close() we take the jobID from the TaskAttemptContext context.
 Meaning, each task writes a file to this directory in the close()
 method according to the jobID from the context.
   Part 2 - At the end of the whole job (after all the tasks have
 completed), we have our custom outputCommitter (which extends
 org.apache.hadoop.mapreduce.OutputCommitter), and in commitJob() it
 looks for that directory of the job and handles all the files under
 it - the jobID is taken from JobContext context.getJobID().toString().

   We noticed that when we use tez, this mechanism doesn't work, since
 the jobID from the tez task (part 1) is the original job id combined
 with the vertex id, for example 14404914675610 instead of
 1440491467561. So the directory name in part 2 is different than in
 part 1.

   We looked for a way to retrieve only the vertex id or only the job
 id, but didn't find one - in the configuration, the property
 mapreduce.job.id also had the vertex id appended, and no other
 property value was equal to the original job id.

   Can you please advise how we can solve this issue? Is there a way to
 get the original jobID when we're in part 1?
 
  Regards,
  Shiri Marron
  Amdocs
 
  This message and the information contained herein is proprietary and
 confidential and subject to the Amdocs policy statement,  you may
 review at http://www.amdocs.com/email_disclaimer.asp
 
 





Re: [VOTE] Release Apache Tez-0.7.0 RC0

2015-05-15 Thread Rohini Palaniswamy
+1.

Ran a big Pig job (47K tasks) that was affected by TEZ-776. Ran into OOM
quickly because Pig was giving preference to yarn.app.mapreduce.am.command-opts
instead of using the default of tez.am.launch.cmd-opts, which had -XX:+UseNUMA.
With -XX:+UseNUMA the AM used much less memory, and it completed fine even when
running 13K tasks in parallel. So that is good. We will fix Pig to add
-XX:+UseNUMA if we pick up mapreduce settings -
https://issues.apache.org/jira/browse/PIG-4555. Bikas is investigating why
there is such a huge difference with and without NUMA for the Tez AM, but I
would not consider that a blocker for the release, as the tez default has NUMA.

Regards,
Rohini

On Fri, May 15, 2015 at 11:27 AM, Siddharth Seth ss...@apache.org wrote:

 +1. Verified signatures, checksums and rat checks. Ran a couple of simple
 jobs successfully.

 On Thu, May 14, 2015 at 1:58 AM, Jianfeng (Jeff) Zhang 
 jzh...@hortonworks.com wrote:

 
  Hi folks,
 
 
  I have created a tez-0.7.0 release candidate, rc0.
 
  Tez 0.7.0 is supposed to include all the bug fixes and improvements in
  0.5.4 and 0.6.1 (both will be released soon). Besides that, 0.7.0 also
  makes many performance and scalability enhancements, like:

1.  Reduce AM mem usage caused by storing TezEvents
2.  Stabilize and improve the pipelined shuffle; make PipelinedSorter
  the default sorter
3.  Remove the 2 GB memlimit restriction in MergeManager
4.  Enable local fetch optimization by default
 
  GIT source tag
 
 https://git-wip-us.apache.org/repos/asf/tez/repo?p=tez.git;a=log;h=refs/tags/release-0.7.0-rc0
 
  Staging site:
  https://dist.apache.org/repos/dist/dev/tez/tez-0.7.0-src-rc0/
 
  Nexus Staging URL:
  https://repository.apache.org/content/repositories/orgapachetez-1023
 
  PGP release keys
  http://pgp.mit.edu/pks/lookup?op=get&search=0x090FBE14D9B17D1F
 
  KEYS file available at
 https://dist.apache.org/repos/dist/release/tez/KEYS
 
  One can look into the issues fixed in this release at
 
 https://issues.apache.org/jira/issues/?jql=project+%3D+TEZ+AND+fixVersion+%3D+0.7.0+AND+status+%3D+Resolved+ORDER+BY+priority+DESC
 
 
  Vote will be open for at least 72 hours.
 
  [ ] +1 approve
  [ ] +0 no opinion
  [ ] -1 disapprove (and reason why)
 
 
  I begin with my vote with my +1
 
 
  Best Regard,
  Jeff Zhang
 
 



Re: [DISCUSS] Drop Java 6 support in 0.8

2015-05-15 Thread Rohini Palaniswamy
+1.

On Fri, May 15, 2015 at 3:11 PM, Chris K Wensel ch...@wensel.net wrote:

 +1

 fwiw, Cascading 3 is using JDK 1.7 language features, and is tested on JDK
 1.8 nightly.

 ckw

  On May 15, 2015, at 2:57 PM, Gopal Vijayaraghavan gop...@apache.org
 wrote:
 
  Hi,
 
  +1 here.
 
  PIG & Hive are JDK7 targetVersions.
 
  https://github.com/apache/pig/blob/trunk/build.xml#L64
 
  https://github.com/apache/hive/blob/master/pom.xml#L616
 
 
  Flink & Spark have JDK8 code in them via activation.
 
  From earlier mails, I remember seeing only JDK7 bugs reported for Tez so
  far.
 
  Datameer also recommends JDK 1.7, IBM Big Insights uses JDK 1.7. Hadapt
  had /opt/teradata/jvm64/jdk7.
 
  HDInsight uses JDK7 Zulu (AFAIK, there's also a 1.8 Zulu available on
 A1).
 
  So I feel like we're well past the chasm here.
 
  Cheers,
  Gopal
 
  On 5/15/15, 2:16 PM, Mohammad Islam misla...@yahoo.com.INVALID
 wrote:
 
  Hi Sid,
  What are the statuses of other Hadoop projects?
  Overall, I'm +1 on this.
  Regards,
  Mohammad
 
 
 On Friday, May 15, 2015 10:57 AM, Siddharth Seth ss...@apache.org
  wrote:
 
 
  Java 6 support ended quite a while ago. Trying to support it gets in the
  way of using libraries which may work with Java 7 only (Netty for
 example
  in TEZ-2450). I think we should move Tez 0.8 to work with Java 7 as the
  minimum version. Thoughts?
  Thanks,
  Sid
 
 
 
 

 —
 Chris K Wensel
 ch...@wensel.net







Re: [VOTE] Release Apache Tez-0.6.0 RC0

2015-01-26 Thread Rohini Palaniswamy
Jon,
I see that this has already been released. But with the latest internal
tez build from master, I see that there are a lot of test failures in Pig,
most of them due to two different exceptions. I will file jiras today after
tracking down which jiras caused the problem and whether they went into
0.6. A patch release might be required based on that.

Regards,
Rohini

On Fri, Jan 23, 2015 at 8:29 AM, Sreenath Somarajapuram 
ssomarajapu...@hortonworks.com wrote:

 +1 (non-binding)
 Verified checksums. Build from src and did a smoke on UI.

 - Sreenath

 On Fri, Jan 23, 2015 at 9:28 PM, Rajesh Balamohan rbalamo...@apache.org
 wrote:

  +1 (binding)
 
   Verified signatures & checksums. Ran a few jobs and they ran successfully.
  
   Got an error while testing. Will file a separate jira for it.
 
  ~Rajesh.B
 
  On Fri, Jan 23, 2015 at 7:06 PM, Jianfeng (Jeff) Zhang 
  jzh...@hortonworks.com wrote:
 
   +1
  
   built from source code, and run some tez-example jobs in single-node
  hadoop
  
  
   Best Regards,
   Jeff Zhang
  
  
   On Fri, Jan 23, 2015 at 9:20 PM, Prakash Ramachandran 
   pramachand...@hortonworks.com wrote:
  
+1 (non-binding)
verified checksums, built from src and ran jobs locally.
tested ui functionality
   
thanks
Prakash
   
   
On 1/23/15 1:25 AM, Siddharth Seth wrote:
   
+1 (binding).
Verified signatures and checksums. Built from src and ran some jobs
successfully.
   
Thanks
- Sid
   
On Tue, Jan 20, 2015 at 2:05 PM, Jonathan Eagles jeag...@gmail.com
 
wrote:
   
 I have created a tez-0.6.0 release candidate (rc0).
   
GIT source tag
   
https://git-wip-us.apache.org/repos/asf/tez/repo?p=tez.git;
a=log;h=refs/tags/release-0.6.0-rc0
   
Staging site:
https://dist.apache.org/repos/dist/dev/tez/tez-0.6.0-src-rc0/
   
Nexus Staging URL:
   
  https://repository.apache.org/content/repositories/orgapachetez-1019/
   
PGP release keys:
http://pgp.mit.edu/pks/lookup?op=getsearch=0x7CF638ACEF9F98AE
   
KEYS file available at
https://dist.apache.org/repos/dist/release/tez/KEYS
   
List of issues fixed in the release:
https://issues.apache.org/jira/browse/TEZ/fixforversion/12327652/
   
Vote will be open for at least 72 hours and will close on 1/23/2015
   
[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)
   
   
I begin the vote with my +1.
   
Thanks,
jeagles
   
   
   
--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or
  entity
to which it is addressed and may contain information that is
   confidential,
privileged and exempt from disclosure under applicable law. If the
  reader
of this message is not the intended recipient, you are hereby
 notified
   that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender
   immediately
and delete it from your system. Thank You.
   
  
 




Handling ATS downtime

2015-01-21 Thread Rohini Palaniswamy
Folks,
 In the middle of a big discussion on how to get delegation tokens from
ATS for Oozie jobs, another question came up: what is the behaviour of
running tez jobs if ATS goes down? I haven't tried it out, but my guess is
that the job is going to fail. Or do we do something now to handle the
failure and still have the job complete successfully?

Regards,
Rohini