Re: [VOTE] Release Apache Tez-0.9.2 RC0

2019-03-29 Thread Rohini Palaniswamy
 +1 (binding)

- Verified md5, sha512 and the signature.
- Looked at the Release notes (
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12314426&version=12342390
)
- Built the source and ran the unit tests and they are good.

 Regards,
 Rohini

On Wed, Mar 27, 2019 at 6:15 AM Kuhu Shukla  wrote:

> Thank you Jon. We need at least one more binding +1 to have this release
> shipped as the official 0.9.2. Reminding all other contributors and
> PMC members/committers (esp. Pig and Hive devs) on this list to validate the
> above-mentioned RC at the earliest.
>
> Appreciate the support.
>
> Regards,
> Kuhu
>
> On Tue, 26 Mar 2019 at 17:30, Jonathan Eagles  wrote:
>
> > +1. I have validated this release and signatures.
> >
> > On Tue, Mar 19, 2019 at 6:12 PM Kuhu Shukla  wrote:
> > >
> > > Hello Tez folks,
> > >
> > > I have created a tez-0.9.2 release candidate, rc0.
> > >
> > > Git Source Tag:
> > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf/tez/repo?p=tez.git;a=log;h=refs/tags/release-0.9.2-rc0
> > >
> > > Staging site :
> > >
> > > https://dist.apache.org/repos/dist/dev/tez/apache-tez-0.9.2-rc0/
> > >
> > > Nexus Staging URL :
> > >
> > > https://repository.apache.org/content/repositories/orgapachetez-1065
> > >
> > > PGP release keys (signed using ) :
> > >
> > > http://pgp.surfnet.nl/pks/lookup?op=get&search=0x4405B74BAAFFE291
> > >
> > > KEYS file available at :
> > >
> > > https://dist.apache.org/repos/dist/release/tez/KEYS
> > >
> > > One can look into the issues fixed in this release at:
> > >
> > > https://issues.apache.org/jira/projects/TEZ/versions/12342390
> > >
> > >
> > > Vote will be open for at least 72 hours or until the required number of
> > PMC
> > > votes is obtained. Please reply to this thread with any
> > > issues/comments/concerns.
> > >
> > > [ ] +1 approve
> > > [ ] +0 no opinion
> > > [ ] -1 disapprove (and reason why)
> > >
> > > Here is my +1 (binding).
> > >
> > > Thanks and Regards,
> > >
> > > Kuhu Shukla
> >
>


Re: [VOTE] Move master to Hadoop 3+ and create separate 0.9.x line

2018-04-16 Thread Rohini Palaniswamy
+1. I don't see a problem for Pig as this is being done mainly for the
hadoop dependencies conflict and there are no API changes in Tez. At least
till we get to the point where we introduce Hadoop 3 specific code into
Tez, Pig compiled with older versions of Tez will continue to run with Tez
master.

On Thu, Apr 12, 2018 at 5:33 PM, Gopal Vijayaraghavan 
wrote:

> +1
>
> Cheers,
> Gopal
>
>
> On 4/12/18, 5:22 PM, "Eric Wohlstadter"  wrote:
>
> Just a friendly reminder that this vote is still open.
>
> On Wed, Apr 11, 2018 at 6:33 AM, Jason Lowe 
> wrote:
>
> > There was a discussion thread that was started two weeks before the
> > vote thread, see
> > http://mail-archives.apache.org/mod_mbox/tez-dev/201803.mbox/browser
> .
> > Granted there weren't many comments, but there was a discussion
> thread
> > with no voiced objections well in advance of the vote thread.
> >
> > Jason
> >
> >
> > On Tue, Apr 10, 2018 at 10:18 AM, Jonathan Eagles  >
> > wrote:
> > > Thoughts/Inputs/Discussion from Pig/Hive/Flink/Scalding/Scope
> > communities?
> > >
> > > I wish we had used a discussion thread to gather more input from
> > > Pig/Hive/Flink/Scalding/Scope community before starting this vote
> whose
> > > outcome affects them. Without discussion or votes from those
> communities
> > > I'm not sure of the community support for this decision. Should we
> consider
> > > canceling this vote to gather input first?
> > >
> > > On Mon, Apr 9, 2018 at 10:09 AM, Kuhu Shukla
> 
> > > wrote:
> > >
> > >> +1.
> > >>
> > >> Thank you Eric for floating the proposal.
> > >>
> > >> Regards,
> > >> Kuhu
> > >>
> > >> On Mon, Apr 9, 2018 at 9:56 AM, Jason Lowe  >
> > wrote:
> > >>
> > >> > +1
> > >> >
> > >> > Jason
> > >> >
> > >> > On Fri, Apr 6, 2018 at 4:45 PM, Eric Wohlstadter <
> wohls...@cs.ubc.ca>
> > >> > wrote:
> > >> > > Please vote (binding or non-binding) on the following proposal.
> The
> > vote
> > >> > will
> > >> > > be open until 3pm (Pacific) April 13th.
> > >> > >
> > >> > >
> > >> > > Proposal: Move master to support minimum Hadoop 3+ (0.10.x
> line) and
> > >> > create
> > >> > > separate branch for Hadoop 2 (0.9.x line)
> > >> > >
> > >> > >
> > >> > > Details:
> > >> > >
> > >> > >
> > >> > >
> > >> > >- Tez master branch would support only Hadoop 3+ moving
> forward
> > >> > >
> > >> > >
> > >> > >- As a general policy, Maven dependencies on master are
> required
> > not
> > >> > to
> > >> > >have conflicts with the dependencies of the corresponding
> minimum
> > >> > >supported Hadoop (the dependency versions can vary between
> Tez
> > >> master
> > >> > and
> > >> > >Hadoop if the versions are advertised as compatible by the
> > >> dependency
> > >> > >provider).
> > >> > >
> > >> > >- As a general policy, dependency conflicts between Tez and
> > Hadoop
> > >> > >should be resolved by using compatible jars. Shims/Shading
> could
> > be
> > >> > used on
> > >> > >a case-by-case basis, but not as a general policy.
> > >> > >
> > >> > >
> > >> > >- A separate branch and distribution (e.g. on Maven
> Central)
> > will be
> > >> > >created to maintain the 0.9.x line with minimum support for
> > Hadoop
> > >> > 2.7.x
> > >> > >
> > >> > >
> > >> > >
> > >> > >- Bug fixes would be required to be pushed to both
> master and
> > the
> > >> > >0.9.x line (unless they are specific to one of them)
> > >> > >
> > >> > >
> > >> > >
> > >> > >- Major feature or performance improvements would be
> required to
> > be
> > >> > >pushed to both master and the 0.9.x line (unless they
> require
> > Hadoop
> > >> > 3+ or
> > >> > >have dependent library conflicts with Hadoop 2.x, in which
> case
> > they
> > >> > may be
> > >> > >pushed only to master)
> > >> > >
> > >> > >
> > >> > >
> > >> > >- Minor feature or performance improvements can be pushed
> only to
> > >> > master
> > >> >
> > >>
> >
>
>
>
>


[jira] [Created] (TEZ-3877) Delete spill files once merge is done

2017-12-15 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-3877:
---

 Summary: Delete spill files once merge is done
 Key: TEZ-3877
 URL: https://issues.apache.org/jira/browse/TEZ-3877
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rohini Palaniswamy


  I see that spill files are not deleted right after merge completes. We should 
do that, as they take up a lot of space and we can't afford that wastage when 
Tez takes up a lot of shuffle space with complex DAGs. [~jlowe] told me they 
are only cleaned up after the application completes, as they are written in the 
app directory and not the container directory. That also has to be fixed so 
that they are cleaned up by the node manager during task failures or container 
crashes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (TEZ-3865) A new vertex manager to partition data for STORE

2017-11-14 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-3865:
---

 Summary: A new vertex manager to partition data for STORE
 Key: TEZ-3865
 URL: https://issues.apache.org/jira/browse/TEZ-3865
 Project: Apache Tez
  Issue Type: New Feature
Reporter: Rohini Palaniswamy


Restricting the number of files in output is a very common use case. In Pig, 
users currently add an ORDER BY, GROUP BY or DISTINCT with the required 
parallelism before STORE to achieve it. All of these operations create 
unnecessary overhead in processing. It would be ideal if the STORE clause 
supported the PARALLEL statement and the partitioning of data was handled in a 
simpler and more efficient manner.

Partitioning of the data can be achieved using a very efficient vertex manager 
as described below. Going to call it PartitionVertexManager (PVM) for now till 
someone proposes a better name. Will explain using Pig examples, but the 
logic is the same for Hive as well.

There are multiple cases to consider when storing
1) No partitions
   - Data is stored into a single directory using FileOutputFormat 
implementations
2) Partitions
  - Data is stored into multiple partitions. Case of static or dynamic 
partitioning with HCat
3) HBase
I have kind of forgotten what exactly my thoughts were on this when storing 
to multiple regions. Will update once I remember.

Let us consider the below script with pig.exec.bytes.per.reducer (this setting 
is usually translated to tez.shuffle-vertex-manager.desired-task-input-size 
with ShuffleVertexManager) set to 1G.
{code}
A = LOAD 'data' ;
B = GROUP A BY $0 PARALLEL 1000;
C = FOREACH B GENERATE group, COUNT(A.a), SUM(A.b), ..;
D = STORE C into 'output' using SomeStoreFunc() PARALLEL 20;
{code}

The implementation will have 3 vertices.
v1 - LOAD vertex
v2 - GROUP BY vertex
v3 - STORE vertex

PVM will be used on v3. It is going to be similar to ShuffleVertexManager but 
with some differences. The main difference is that the source vertex does not 
care about the parallelism of the destination vertex, and the number of 
partitioned outputs it produces does not depend on it.

1) Case of no partitions
   Each task in vertex v2 will produce a single partition output (no 
Partitioner is required). The PVM will bucket this single-partition data from 
the 1000 source tasks into multiple destination tasks of v3, trying to keep 1G 
per task with a max of 20 tasks (auto parallelism).
   
2) Partitions
   Let us say the table has 2 partition keys (dt and region). Since there could 
be any number of regions for a given date, we will use the store parallelism as 
the upper limit on the number of partitions, i.e. a HashPartitioner with 
numReduceTasks as 20 and (dt, region) as the partition key. If there are only 5 
regions, then each task of v2 will produce 5 partitions (with the remaining 15 
being empty) if there is no hash collision. If there are 30 regions, then each 
task of v2 will produce 20 partitions.

   The PVM, when it groups, will try to group all Partition0 segments as much 
as possible into one v3 task. Based on skew it could end up in more tasks, i.e. 
there is no restriction that one partition goes to the same reducer task. Doing 
this avoids having to open multiple ORC files in one task when doing dynamic 
partitioning and will be very efficient, reducing namespace usage even further 
while keeping file sizes more uniform.
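The size-based grouping described above can be sketched roughly as follows. This is a hypothetical standalone illustration, not Tez code (the class and method names are invented): greedily pack same-numbered partition segments from the source tasks into destination tasks up to a desired bytes-per-task target, capped at the store parallelism.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only (not part of Tez): groups partition segments
 * produced by source tasks into destination tasks by size, aiming for
 * targetBytes per task with an upper bound on the number of tasks.
 */
public class PartitionGrouper {

    /** Greedily packs segment sizes into groups of about targetBytes,
     *  capped at maxTasks groups (the last group absorbs any overflow).
     *  Returns the total bytes assigned to each destination task. */
    public static List<Long> group(long[] segmentSizes, long targetBytes, int maxTasks) {
        List<Long> taskLoads = new ArrayList<>();
        long current = 0;
        for (long size : segmentSizes) {
            // Close the current group when adding this segment would
            // exceed the target, unless we are about to hit the task cap.
            if (current > 0 && current + size > targetBytes && taskLoads.size() < maxTasks - 1) {
                taskLoads.add(current);
                current = 0;
            }
            current += size;
        }
        if (current > 0) {
            taskLoads.add(current);
        }
        return taskLoads;
    }

    public static void main(String[] args) {
        // 1000 source tasks each producing a 5 MB segment of partition 0:
        long[] sizes = new long[1000];
        java.util.Arrays.fill(sizes, 5L * 1024 * 1024);
        long oneGB = 1024L * 1024 * 1024;
        List<Long> tasks = group(sizes, oneGB, 20);
        // ~5 GB total at a 1 GB target -> a handful of destination tasks,
        // well under the cap of 20 (the "auto parallelism" case above).
        System.out.println("destination tasks: " + tasks.size());
    }
}
```

The real vertex manager would of course work from the sizes reported in data movement events rather than a flat array, but the packing decision is the same shape.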





[jira] [Resolved] (TEZ-2319) DAG history in HDFS

2016-09-16 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy resolved TEZ-2319.
-
Resolution: Duplicate

TEZ-2628 addresses this.

> DAG history in HDFS
> ---
>
> Key: TEZ-2319
> URL: https://issues.apache.org/jira/browse/TEZ-2319
> Project: Apache Tez
>  Issue Type: New Feature
>    Reporter: Rohini Palaniswamy
>
>   We have processes, that parse jobconf.xml and job history details (map and 
> reduce task details, etc) in avro files from HDFS and load them into hive 
> tables for analysis for mapreduce jobs. Would like to have Tez also make this 
> information written to a history file in HDFS when AM or each DAG completes 
> so that we can do analytics on Tez jobs. 





[jira] [Created] (TEZ-3391) MR split file validation should be done in the AM

2016-08-01 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-3391:
---

 Summary: MR split file validation should be done in the AM
 Key: TEZ-3391
 URL: https://issues.apache.org/jira/browse/TEZ-3391
 Project: Apache Tez
  Issue Type: Bug
Reporter: Rohini Palaniswamy


  We had a case where the split metadata size exceeded 1000. Instead of the 
job failing from validation during initialization in the AM, as in mapreduce, 
each of the tasks failed doing that validation during initialization.





[jira] [Created] (TEZ-3385) DAGClient API should be accessible outside of DAG submission

2016-07-28 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-3385:
---

 Summary: DAGClient API should be accessible outside of DAG 
submission
 Key: TEZ-3385
 URL: https://issues.apache.org/jira/browse/TEZ-3385
 Project: Apache Tez
  Issue Type: New Feature
Reporter: Rohini Palaniswamy


  In PIG-4958, I had to resort to

DAGClient client = new DAGClientImpl(appId, dagID, new TezConfiguration(conf), 
null);

This is not good, as DAGClientImpl is an internal class and not something users 
should be referring to. Tez needs to have an API that gives a DAGClient given 
the appId, dagID and configuration. This is something basic, like 
JobClient.getJob(String jobID). 







[jira] [Created] (TEZ-3242) Reduce bytearray copy with TezEvent Serialization and deserialization

2016-05-05 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-3242:
---

 Summary: Reduce bytearray copy with TezEvent Serialization and 
deserialization
 Key: TEZ-3242
 URL: https://issues.apache.org/jira/browse/TEZ-3242
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy


Byte arrays are created for serializing protobuf messages and parsing them, 
which creates a lot of garbage when we have a lot of events. 

{code}
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at 
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
at 
org.apache.tez.runtime.api.impl.TezEvent.serializeEvent(TezEvent.java:197)
at org.apache.tez.runtime.api.impl.TezEvent.write(TezEvent.java:268)
at 
org.apache.tez.runtime.api.impl.TezHeartbeatResponse.write(TezHeartbeatResponse.java:95)
at 
org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:202)
at 
org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:128)
at org.apache.hadoop.io.ObjectWritable.write(ObjectWritable.java:82)
at org.apache.hadoop.ipc.Server.setupResponse(Server.java:2496)
{code}
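For illustration only (this is not Tez code), the kind of extra copy this stack trace points at can be sketched with plain JDK streams: `toByteArray()` materializes a second copy of the buffered bytes, while `writeTo()` streams them straight to the destination without the intermediate array.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Standalone sketch of the extra byte-array copy during serialization
 * and a streaming alternative that avoids it. Method names are invented
 * for the example.
 */
public class SerializationCopyDemo {

    // Copying approach: toByteArray() duplicates the internal buffer,
    // so the serialized payload briefly exists twice on the heap.
    static byte[] serializeWithCopy(byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.write(payload);
        dos.flush();
        return bos.toByteArray(); // full copy of the buffer
    }

    // Streaming approach: writeTo() pushes the buffered bytes straight
    // to the destination stream without materializing a second array.
    static void serializeWithoutCopy(byte[] payload, OutputStream out) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.write(payload);
        dos.flush();
        bos.writeTo(out); // no intermediate byte[] copy
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[]{1, 2, 3};
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        serializeWithoutCopy(payload, sink);
        System.out.println(java.util.Arrays.equals(sink.toByteArray(), payload)); // true
    }
}
```

With many events in flight, avoiding that one copy per message meaningfully reduces transient garbage, which is the spirit of this issue.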





One to one edges and local fetch

2016-04-29 Thread Rohini Palaniswamy
I was under the assumption that we optimized 1-1 edge scheduling to reuse
containers or run as much as possible on the same node and do a local fetch.
But that does not seem to be the case. Is there a jira for this already? I
could not find any.

Regards,
Rohini


[jira] [Created] (TEZ-3140) Reduce AM memory usage while serialization

2016-02-25 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-3140:
---

 Summary: Reduce AM memory usage while serialization
 Key: TEZ-3140
 URL: https://issues.apache.org/jira/browse/TEZ-3140
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.7.1


   There is an unnecessary copy of the userpayload byte array during serialization.





[jira] [Created] (TEZ-3008) AM java.io.tmpdir should be set to $PWD/tmp

2015-12-16 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-3008:
---

 Summary: AM java.io.tmpdir should be set to $PWD/tmp
 Key: TEZ-3008
 URL: https://issues.apache.org/jira/browse/TEZ-3008
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Rohini Palaniswamy


For tasks it is already done by TezRuntimeChildJVM

{code}
Path childTmpDir = new Path(Environment.PWD.$(),
YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR);
vargs.add("-Djava.io.tmpdir=" + childTmpDir);
{code}

Need to do this in the AM as well. Mapreduce has uber mode, which usually 
causes more problems (MAPREDUCE-6576) when user code writes to java.io.tmpdir, 
as it defaults to /tmp in the AM and fills up disk space on cluster nodes. Even 
though there is no uber mode with the Tez AM to cause that problem, there is 
still the chance that user output committer code, which runs in the AM, can 
write something to java.io.tmpdir.
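A minimal standalone sketch of what the AM-side equivalent could look like, assuming the container working directory is available as a string (the class name, helper name, and example path below are hypothetical; "tmp" stands in for YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR):

```java
import java.nio.file.Paths;

/**
 * Hypothetical sketch, not actual Tez code: build the JVM argument that
 * points java.io.tmpdir under the container working directory, so the
 * NodeManager cleans it up with the container instead of leaking /tmp.
 */
public class AmTmpDirArg {

    static String tmpDirJvmArg(String containerWorkDir) {
        // Equivalent in spirit to what TezRuntimeChildJVM does for tasks.
        return "-Djava.io.tmpdir=" + Paths.get(containerWorkDir, "tmp");
    }

    public static void main(String[] args) {
        // Example container dir (hypothetical path):
        System.out.println(tmpDirJvmArg("/grid/yarn/nm-local-dir/container_01"));
    }
}
```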







[jira] [Resolved] (TEZ-3008) AM java.io.tmpdir should be set to $PWD/tmp

2015-12-16 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy resolved TEZ-3008.
-
Resolution: Duplicate

Sorry, my bad. I did not link the jiras with the original Oozie issue which 
brought out the problem, and I thought I had not filed it yet. I did search, 
but I did not get it right.

> AM java.io.tmpdir should be set to $PWD/tmp
> ---
>
> Key: TEZ-3008
> URL: https://issues.apache.org/jira/browse/TEZ-3008
> Project: Apache Tez
>  Issue Type: Improvement
>    Reporter: Rohini Palaniswamy
>  Labels: newbie
>
> For tasks it is already done by TezRuntimeChildJVM
> {code}
> Path childTmpDir = new Path(Environment.PWD.$(),
> YarnConfiguration.DEFAULT_CONTAINER_TEMP_DIR);
> vargs.add("-Djava.io.tmpdir=" + childTmpDir);
> {code}
> Need to do this in the AM as well. Mapreduce has uber mode, which usually 
> causes more problems (MAPREDUCE-6576) when user code writes to java.io.tmpdir, 
> as it defaults to /tmp in the AM and fills up disk space on cluster nodes. 
> Even though there is no uber mode with the Tez AM to cause that problem, there 
> is still the chance that user output committer code, which runs in the AM, can 
> write something to java.io.tmpdir.





[jira] [Created] (TEZ-2969) Tasks list should have diagnostics next to logs

2015-12-03 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created TEZ-2969:
---

 Summary: Tasks list should have diagnostics next to logs
 Key: TEZ-2969
 URL: https://issues.apache.org/jira/browse/TEZ-2969
 Project: Apache Tez
  Issue Type: Improvement
  Components: UI
Reporter: Rohini Palaniswamy


Many times failed task logs have no stacktrace (container killed due to 
timeout, preemption or exceeding physical memory usage). Diagnostics for those 
errors are present in the task or task attempt page, but not on the task or 
task attempts list page, which is the most visited and leads to confusion. It 
would be good to have some kind of tooltip (diagnostics messages are long) or 
an icon with a link to indicate that there is a diagnostic message.





Why spill in UnorderedPartitionedKVWriter?

2015-11-18 Thread Rohini Palaniswamy
  Came across a job which was taking a long time in
UnorderedPartitionedKVWriter.mergeAll. Saw that it was decompressing and
reading data from spill files (8500 spills) and then writing the final
compressed merge file. Why do we need spill files for
UnorderedPartitionedKVWriter? Why not just buffer and keep writing directly
to the final file, which would save a lot of time?

Regards,
Rohini


Re: Problem when running our code with tez

2015-08-30 Thread Rohini Palaniswamy
 A possible solution is to use conf.get("mapreduce.workflow.id") +
conf.get("mapreduce.workflow.node.name")
  Daniel, currently they are set only in the vertex conf and will not be
available for MROutput.

Shiri,
   Can you tell us your actual use case which led to implementing a
RecordWriter that writes to a jobID directory? It looks like you want to
write to a temporary directory and do some custom processing before
committing. Are you committing to some external directory other than the
actual output directory, which requires you to use a jobID directory
instead of the _temporary directory mapreduce uses in general?

Regards,
Rohini

On Sun, Aug 30, 2015 at 2:20 AM, Shiri Marron shiri.mar...@amdocs.com
wrote:

 +Nir

 -Original Message-
 From: Hersh Shafer
 Sent: Thursday, August 27, 2015 11:45 AM
 To: Daniel Dai; dev@tez.apache.org; d...@pig.apache.org; Shiri Marron
 Cc: Almog Shunim
 Subject: RE: Problem when running our code with tez

 +Shiri

 -Original Message-
 From: Daniel Dai [mailto:da...@hortonworks.com]
 Sent: Wednesday, August 26, 2015 1:57 AM
 To: dev@tez.apache.org; d...@pig.apache.org
 Cc: Hersh Shafer; Almog Shunim
 Subject: Re: Problem when running our code with tez

 JobID is vague in Tez; you should use dagId instead. However, I don't see a
 way you can get the DagId within RecordWriter/OutputCommitter. A possible
 solution is to use conf.get("mapreduce.workflow.id") +
 conf.get("mapreduce.workflow.node.name"). Note both are Pig specific
 configuration and only applicable if you run with Pig.

 Daniel
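A rough illustration of the workaround Daniel describes, with a plain Map standing in for Hadoop's Configuration (the class name and the directory-naming scheme are hypothetical; only the two property names come from the thread):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: derive a unique temp directory name from the
 * Pig-set workflow properties instead of the (non-unique under Tez)
 * jobID. A plain Map stands in for Hadoop's Configuration here.
 */
public class WorkflowTempDir {

    static String tempDirName(Map<String, String> conf) {
        // Pig-specific properties; only set when running under Pig.
        String workflowId = conf.get("mapreduce.workflow.id");
        String nodeName = conf.get("mapreduce.workflow.node.name");
        if (workflowId == null || nodeName == null) {
            throw new IllegalStateException(
                "workflow properties not set (not running under Pig?)");
        }
        // Unique per script run and per node in the script's plan.
        return workflowId + "_" + nodeName;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapreduce.workflow.id", "pig_1440491467561");
        conf.put("mapreduce.workflow.node.name", "scope-12");
        System.out.println(tempDirName(conf)); // pig_1440491467561_scope-12
    }
}
```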




 On 8/25/15, 2:08 PM, Hitesh Shah hit...@apache.org wrote:

 +dev@pig as this might be a question better answered by Pig developers.

 This probably won't answer your question but should give you some
 background info. When Pig uses Tez, it may end up running multiple dags
 within the same YARN application, therefore the "jobId" (in case of MR,
 the job Id maps to the application Id from YARN) may not be unique.
 Furthermore, there are cases where multiple vertices within the same
 DAG could write to HDFS, hence both dagId and vertexId are required to
 guarantee uniqueness when writing to a common location.

 thanks
 Hitesh
 
 
 On Aug 25, 2015, at 7:29 AM, Shiri Marron shiri.mar...@amdocs.com
 wrote:
 
  Hi,
 
   We are trying to run our existing workflows, which contain pig
 scripts, on tez (version 0.5.2.2.2.6.0-2800, hdp 2.2), but we are
 facing some problems when we run our code with tez.

   In our code, we are writing and reading from/to a temp directory
 which we create with a name based on the jobID:

   Part 1 - We extend org.apache.hadoop.mapreduce.RecordWriter, and
 in close() we take the jobID from the TaskAttemptContext context.
 Meaning, each task writes a file to this directory in the close()
 method according to the jobID from the context.
   Part 2 - At the end of the whole job (after all the tasks have
 completed), we have our custom outputCommitter (which extends
 org.apache.hadoop.mapreduce.OutputCommitter), and in commitJob() it
 looks for that directory of the job and handles all the files under
 it - the jobID is taken from JobContext context.getJobID().toString().

   We noticed that when we use tez, this mechanism doesn't work, since
 the jobID from the tez task (part 1) is the original job id combined
 with the vertex id, for example 14404914675610 instead of
 1440491467561. So the directory name in part 2 is different than in
 part 1.

   We looked for a way to retrieve only the vertex id or only the job
 id, but didn't find one - in the configuration, the property
 mapreduce.job.id also had the vertex id appended, and no other
 property value was equal to the original job id.

   Can you please advise how we can solve this issue? Is there a way to
 get the original jobID when we're in part 1?
 
  Regards,
  Shiri Marron
  Amdocs
 
  This message and the information contained herein is proprietary and
 confidential and subject to the Amdocs policy statement,  you may
 review at http://www.amdocs.com/email_disclaimer.asp
 
 





Re: [VOTE] Release Apache Tez-0.7.0 RC0

2015-05-15 Thread Rohini Palaniswamy
+1.

Ran a big Pig job (47K tasks) that was affected by TEZ-776. Ran into OOM
quickly because Pig was giving preference to yarn.app.mapreduce.am.command-opts
instead of using the default of tez.am.launch.cmd-opts, which had -XX:+UseNUMA.
With -XX:+UseNUMA the AM used much less memory, and it completed fine even when
running 13K tasks in parallel. So that is good. We will fix Pig to add
-XX:+UseNUMA if we pick up mapreduce settings -
https://issues.apache.org/jira/browse/PIG-4555. Bikas is investigating why
there is such a huge difference with and without NUMA for the Tez AM, but I
would not consider that a blocker for the release, as the tez default has NUMA.

Regards,
Rohini

On Fri, May 15, 2015 at 11:27 AM, Siddharth Seth ss...@apache.org wrote:

 +1. Verified signatures, checksums and rat checks. Ran a couple of simple
 jobs successfully.

 On Thu, May 14, 2015 at 1:58 AM, Jianfeng (Jeff) Zhang 
 jzh...@hortonworks.com wrote:

 
  Hi folks,
 
 
  I have created a tez-0.7.0 release candidate, rc0.
 
  Tez 0.7.0 is supposed to include all the bug fixes and improvements in
  0.5.4 and 0.6.1 (both will be released soon). Besides that, 0.7.0 also
  makes many performance and scalability enhancements, like:

1.  Reduce AM mem usage caused by storing TezEvents
2.  Stabilize and improve the pipelined shuffle; make PipelinedSorter
  the default sorter
3.  Remove the 2 GB memlimit restriction in MergeManager
4.  Enable local fetch optimization by default
 
  GIT source tag
 
 https://git-wip-us.apache.org/repos/asf/tez/repo?p=tez.git;a=log;h=refs/tags/release-0.7.0-rc0
 
  Staging site:
  https://dist.apache.org/repos/dist/dev/tez/tez-0.7.0-src-rc0/
 
  Nexus Staging URL:
  https://repository.apache.org/content/repositories/orgapachetez-1023
 
  PGP release keys
  http://pgp.mit.edu/pks/lookup?op=get&search=0x090FBE14D9B17D1F
 
  KEYS file available at
 https://dist.apache.org/repos/dist/release/tez/KEYS
 
  One can look into the issues fixed in this release at
 
 https://issues.apache.org/jira/issues/?jql=project+%3D+TEZ+AND+fixVersion+%3D+0.7.0+AND+status+%3D+Resolved+ORDER+BY+priority+DESC
 
 
  Vote will be open for at least 72 hours.
 
  [ ] +1 approve
  [ ] +0 no opinion
  [ ] -1 disapprove (and reason why)
 
 
  I begin with my vote with my +1
 
 
  Best Regard,
  Jeff Zhang
 
 



Re: [DISCUSS] Drop Java 6 support in 0.8

2015-05-15 Thread Rohini Palaniswamy
+1.

On Fri, May 15, 2015 at 3:11 PM, Chris K Wensel ch...@wensel.net wrote:

 +1

 fwiw, Cascading 3 is using JDK 1.7 language features, and is tested on JDK
 1.8 nightly.

 ckw

  On May 15, 2015, at 2:57 PM, Gopal Vijayaraghavan gop...@apache.org
 wrote:
 
  Hi,
 
  +1 here.
 
  PIG & Hive are JDK7 targetVersions.
 
  https://github.com/apache/pig/blob/trunk/build.xml#L64
 
  https://github.com/apache/hive/blob/master/pom.xml#L616
 
 
  Flink & Spark have JDK8 code in them via activation.
 
  From earlier mails, I remember seeing only JDK7 bugs reported for Tez so
  far.
 
  Datameer also recommends JDK 1.7, IBM Big Insights uses JDK 1.7. Hadapt
  had /opt/teradata/jvm64/jdk7.
 
  HDInsight uses JDK7 Zulu (AFAIK, there's also a 1.8 Zulu available on
 A1).
 
  So I feel like we're well past the chasm here.
 
  Cheers,
  Gopal
 
  On 5/15/15, 2:16 PM, Mohammad Islam misla...@yahoo.com.INVALID
 wrote:
 
  Hi Sid,
  What are the statuses of other Hadoop projects?
  Overall, I'm +1 on this.
  Regards,
  Mohammad
 
 
 On Friday, May 15, 2015 10:57 AM, Siddharth Seth ss...@apache.org
  wrote:
 
 
  Java 6 support ended quite a while ago. Trying to support it gets in the
  way of using libraries which may work with Java 7 only (Netty for
 example
  in TEZ-2450). I think we should move Tez 0.8 to work with Java 7 as the
  minimum version. Thoughts?
  Thanks,
  Sid
 
 
 
 

 —
 Chris K Wensel
 ch...@wensel.net







Re: [VOTE] Release Apache Tez-0.6.0 RC0

2015-01-26 Thread Rohini Palaniswamy
Jon,
I see that this has already been released. But with the latest internal
tez build from master, I see that there are a lot of test failures in Pig,
most of them due to two different exceptions. I will file jiras today after
tracking down which jiras caused the problem and whether they went into
0.6. A patch release might be required based on that.

Regards,
Rohini

On Fri, Jan 23, 2015 at 8:29 AM, Sreenath Somarajapuram 
ssomarajapu...@hortonworks.com wrote:

 +1 (non-binding)
 Verified checksums. Build from src and did a smoke on UI.

 - Sreenath

 On Fri, Jan 23, 2015 at 9:28 PM, Rajesh Balamohan rbalamo...@apache.org
 wrote:

  +1 (binding)
 
   Verified signatures & checksums. Ran a few jobs and they ran successfully.
  
   Got an error while testing. Will file a separate jira for it.
 
  ~Rajesh.B
 
  On Fri, Jan 23, 2015 at 7:06 PM, Jianfeng (Jeff) Zhang 
  jzh...@hortonworks.com wrote:
 
   +1
  
   built from source code, and run some tez-example jobs in single-node
  hadoop
  
  
   Best Regards,
   Jeff Zhang
  
  
   On Fri, Jan 23, 2015 at 9:20 PM, Prakash Ramachandran 
   pramachand...@hortonworks.com wrote:
  
+1 (non-binding)
verified checksums, built from src and ran jobs locally.
tested ui functionality
   
thanks
Prakash
   
   
On 1/23/15 1:25 AM, Siddharth Seth wrote:
   
+1 (binding).
Verified signatures and checksums. Built from src and ran some jobs
successfully.
   
Thanks
- Sid
   
On Tue, Jan 20, 2015 at 2:05 PM, Jonathan Eagles jeag...@gmail.com
 
wrote:
   
 I have created a tez-0.6.0 release candidate (rc0).
   
GIT source tag
   
https://git-wip-us.apache.org/repos/asf/tez/repo?p=tez.git;
a=log;h=refs/tags/release-0.6.0-rc0
   
Staging site:
https://dist.apache.org/repos/dist/dev/tez/tez-0.6.0-src-rc0/
   
Nexus Staging URL:
   
  https://repository.apache.org/content/repositories/orgapachetez-1019/
   
PGP release keys:
http://pgp.mit.edu/pks/lookup?op=getsearch=0x7CF638ACEF9F98AE
   
KEYS file available at
https://dist.apache.org/repos/dist/release/tez/KEYS
   
List of issues fixed in the release:
https://issues.apache.org/jira/browse/TEZ/fixforversion/12327652/
   
Vote will be open for at least 72 hours and will close on 1/23/2015
   
[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)
   
   
I begin the vote with my +1.
   
Thanks,
jeagles
   
   
   
--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or
  entity
to which it is addressed and may contain information that is
   confidential,
privileged and exempt from disclosure under applicable law. If the
  reader
of this message is not the intended recipient, you are hereby
 notified
   that
any printing, copying, dissemination, distribution, disclosure or
forwarding of this communication is strictly prohibited. If you have
received this communication in error, please contact the sender
   immediately
and delete it from your system. Thank You.
   
  
 




Handling ATS downtime

2015-01-21 Thread Rohini Palaniswamy
Folks,
 In the middle of a big discussion on how to get delegation tokens from
ATS for Oozie jobs, another question came up: what is the behaviour of
running tez jobs if ATS goes down? I haven't tried it out, but my guess is
that the job is going to fail. Or do we do something now to handle the
failure and still have the job complete successfully?

Regards,
Rohini