Fwd: Travel Assistance applications open. Please inform your communities

2018-02-14 Thread Alan Gates
-- Forwarded message --
From: Gavin McDonald 
Date: Wed, Feb 14, 2018 at 1:34 AM
Subject: Travel Assistance applications open. Please inform your communities
To: travel-assista...@apache.org


Hello PMCs.

Please could you forward on the below email to your dev and user lists.

Thanks

Gav…

—
The Travel Assistance Committee (TAC) are pleased to announce that travel
assistance applications for ApacheCon NA 2018 are now open!

We will be supporting ApacheCon NA Montreal, Canada on 24th - 29th
September 2018

TAC exists to help those that would like to attend ApacheCon events, but
are unable to do so for financial reasons.
For more info on this year's applications and qualifying criteria, please
visit the TAC website at < http://www.apache.org/travel/ >. Applications
are now open and will close 1st May.

*Important*: Applications close on May 1st, 2018. Applicants have until the
closing date above to submit their applications (which should contain as
much supporting material as required to efficiently and accurately process
their request); this will enable TAC to announce successful awards shortly
afterwards.

As usual, TAC expects to deal with a range of applications from a diverse
range of backgrounds. We therefore encourage (as always) anyone thinking
about sending in an application to do so ASAP.
We look forward to greeting many of you in Montreal.

Kind Regards,
Gavin - (On behalf of the Travel Assistance Committee)
—


CFP for Dataworks Summit Sydney

2017-05-03 Thread Alan Gates
The Australia/Pacific version of Dataworks Summit is in Sydney this year, 
September 20-21.   This is a great place to talk about work you are doing in 
Apache Pig or how you are using Pig.  Information on submitting an abstract is 
at https://dataworkssummit.com/sydney-2017/abstracts/submit-abstract/

Tracks:
Apache Hadoop
Apache Spark and Data Science
Cloud and Applications
Data Processing and Warehousing
Enterprise Adoption
IoT and Streaming
Operations, Governance and Security

Deadline: Friday, May 26th, 2017.

Alan.



Call for abstracts open for Dataworks & Hadoop Summit San Jose

2017-01-31 Thread Alan Gates
The Dataworks & Hadoop summit will be in San Jose June 13-15, 2017.  The call 
for abstracts closes February 10.  You can submit an abstract at 
http://tinyurl.com/dwsj17CFA

There are tracks for Hadoop, data processing and warehousing, governance and 
security, IoT and streaming, cloud and operations, and Spark and data science.  
As always the talks will be chosen by committees from the relevant communities.

Alan.

Re: Request for addition as contributor

2016-07-12 Thread Alan Gates
Done.  Welcome to the Pig project!

Alan.

> On Jul 12, 2016, at 06:56, Adam Szita  wrote:
> 
> Hi,
> 
> Can you add my userid (szita) as contributor to Pig please.
> 
> Thanks,
> Adam



Re: [VOTE] Release Pig 0.16.0 (candidate 0)

2016-06-03 Thread Alan Gates
+1.  Checked the signatures, did a build, ran a smoke test.  Looks good.

Alan.

> On Jun 1, 2016, at 23:39, Daniel Dai  wrote:
> 
> Hi,
> 
> I have created a candidate build for Pig 0.16.0.
> 
> Keys used to sign the release are available at
> http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.
> 
> Please download, test, and try it out:
> http://people.apache.org/~daijy/pig-0.16.0-rc0/
> 
> Release notes and the rat report are available at the same location.
> 
> Should we release this? Vote closes on Monday EOD, June 6th 2016.
> 
> Thanks,
> Daniel



Re: please unsubscribe

2016-06-01 Thread Alan Gates
To unsubscribe send email to dev-unsubscr...@pig.apache.org

Alan.

> On Jun 1, 2016, at 07:40, asser dennis  wrote:
> 
> 



Re: Using Hive UDF in pig

2016-04-13 Thread Alan Gates
Pig does not have access to Hive’s metastore to locate default functions, so 
you will have to use the full class name.

Alan.

> On Apr 13, 2016, at 01:51, Siddhi Mehta  wrote:
> 
> Hey Guys,
> 
> I have created a custom Hive UDF and have registered it as a permanent
> function using
> 
> CREATE FUNCTION myfunc AS 'com.package.mycustomfunc' USING JAR
> 'applog-udf.jar', FILE 'distributedcachedir';
> 
> 
> I want to make use of the same hive udf in pig as per jira PIG-3294
> .
> 
> 
> I am able to successfully use the udf if I define it using the full class
> name
> 
> define myfunc HiveUDF('com.package.mycustomfunc');
> 
> 
> My assumption was that custom UDF's can also be defined using the
> functionName/alias rather than the classname.
> 
> 
> When I try to do the same I keep getting errors since it cannot resolve the
> udf name using builtins
> 
> 
> define myfunc HiveUDF('default.myfunc');
> 
> 
> Is this assumption correct or do custom hive udf's need to be referenced
> via their full class name
> 
> --Siddhi
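For readers following this thread, the distinction Alan describes can be sketched in Pig Latin (the jar, class, and field names below are hypothetical, taken from Siddhi's example rather than a tested script):

```pig
-- Register the jar that contains the compiled Hive UDF
register 'applog-udf.jar';

-- Works: HiveUDF is given the full Java class name
define myfunc HiveUDF('com.package.mycustomfunc');

-- Fails for custom UDFs: Pig cannot reach Hive's metastore,
-- so the 'default.myfunc' alias cannot be resolved
-- define myfunc HiveUDF('default.myfunc');

a = load 'input.txt' as (field1:chararray);
b = foreach a generate myfunc(field1);
```

Only functions Pig can resolve without the metastore (i.e. by class name, or Hive built-ins) work with HiveUDF; metastore-registered aliases do not.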



Re: Pig UDF Submission to Piggybank

2016-01-18 Thread Alan Gates
Step one is to open a JIRA ticket on Pig's JIRA: 
https://issues.apache.org/jira/browse/PIG  In this you should describe 
the UDF you've built and features it will add.


Step two is to attach the code as a patch.  This should be generated by 
the svn diff facility (I think there's an option in Eclipse to generate 
your patch, but I'm not an Eclipse user so I don't know for sure).  This 
patch should be attached to the JIRA ticket.  Then mark the ticket as 
"patch available".  This will tell Pig developers that it's ready for 
review.


Alan.


Sudeep Pandey 
January 16, 2016 at 0:14
Good Morning:

I have created JAVA UDF using Eclipse IDE and packaged the jar with the
same IDE. I imported the jar in single node Hortonworks cluster in Virtual
machine.
I ran the code and achieved successful result.

I would like to submit the UDF code to Piggybank. What steps should I
follow? I see lots of information in the 'How to Contribute' section of the
Apache Pig website. That information was related to 'ant', 'patch', etc. I
don't understand those. Do I need to do those?

Please suggest minimum steps to submit my UDF.

Thanks,
Sudeep Pandey



Re: Using Maven instead of Ant?

2015-11-05 Thread Alan Gates
I think we're all for making the switch, just no one's gotten around to 
doing it.


Alan.


Niels Basjes 
November 5, 2015 at 4:52
Hi,

For me, using the ant build system in Pig is extremely difficult.
Today I spent about 2 hours trying to simply compile and run a test in
piggybank (in case you wonder, it was this one:
https://issues.apache.org/jira/browse/PIG-4689 ).
I have not been able to get it to work. In the end I created the test in a
separate (maven) project and after I got that working I copied everything
into the pig source tree and pulled the patch.

Many other projects (like Avro, where I'm one of the committers) use Maven,
which makes it trivial to import the project (and sub-projects like
piggybank) into almost any IDE. I happen to use IntelliJ.

Has such a switch (from ant to maven, or anything else) been considered for
the Pig project before?
Do you guys also think it's a good idea to make such a switch?



[jira] [Commented] (PIG-4405) Adding 'map[]' support to mock/Storage

2015-07-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644574#comment-14644574
 ] 

Alan Gates commented on PIG-4405:
-

Based on the way it's used I'm surprised to see the HashMap wrapped in a Tuple. 
 That will work because Pig allows nesting of types, but it doesn't seem 
necessary for what you're trying to do.

 Adding 'map[]' support to mock/Storage
 --

 Key: PIG-4405
 URL: https://issues.apache.org/jira/browse/PIG-4405
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.14.0
Reporter: Niels Basjes
Assignee: Niels Basjes
 Fix For: 0.16.0

 Attachments: PIG-4405-20150723.patch


 The mock/Storage class contains convenience methods for creating a bag and a tuple 
 when doing unit tests. Pig, however, has 3 complex data types (see 
 http://pig.apache.org/docs/r0.14.0/basic.html#Simple+and+Complex ) and the 
 third one (the map) is not yet present in such a convenience method.
 Feature request: Add such a method to facilitate testing map[] output better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: PigMix extension

2015-07-15 Thread Alan Gates
The initial goal of PigMix was definitely to give the project a way to 
measure itself against MapReduce and between different versions of 
releases.  So that falls into your synthetic category.


That said, if adding a field enables extending the benchmark into new 
territory and makes it more useful, then that seems like a clear win.


Alan.


Keren Ouaknine <ker...@gmail.com>
July 14, 2015 at 12:44
Hi,

I am working on expanding the PigMix benchmark.
I am interested in adding queries matching more realistic use cases, such as
finding the highest revenue of a page or the burst of activity for a
specific page. Additionally, I would like to add OLTP-like queries such as
finding other users from the same neighborhood looking at a specific page.

The current PigMix table does not have an id for a page access (see details
on page_views here: https://cwiki.apache.org/confluence/display/PIG/PigMix).
Therefore I cannot run the above queries.

I am wondering why was this field omitted from the schema of page_views?
It seems a fundamental field for all aggregation queries on page_views.

I see two options: either there is another use case that this schema
targets (what is it?) or the benchmark's goal is not to target real use
cases and is merely oriented towards a synthetic performance and
measurement goal.

Any ideas?

Thank you,
Keren

PS: I sent this email to both the dev and user mailing lists, not to
spam us :) but because these queries are both a user and a development
concern.




Re: [VOTE] Release Pig 0.15.0 (candidate 1)

2015-06-01 Thread Alan Gates
+1.  Checked the keys and signature.  Looked for any binary files in the 
source.  Made sure there were no snapshot dependencies.  Ran test-commit 
and a quick smoke test.


Alan.


Daniel Dai <da...@hortonworks.com>
June 1, 2015 at 12:04
Hi,

I have created a candidate build for Pig 0.15.0.

Keys used to sign the release are available at
http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.

Please download, test, and try it out:
http://people.apache.org/~daijy/pig-0.15.0-candidate-1/

Release notes and the rat report are available at the same location.

Should we release this? Vote closes on Thursday EOD, June 4th 2015.

Thanks,
Daniel



Re: [VOTE] Release Pig 0.15.0 (candidate 0)

2015-05-26 Thread Alan Gates
+1.  Downloaded, checked signature and hash, built, ran test-commit and 
simple local smoke test.


Alan.


Daniel Dai <da...@hortonworks.com>
May 25, 2015 at 20:36
Hi,

I have created a candidate build for Pig 0.15.0.

Keys used to sign the release are available at
http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.

Please download, test, and try it out:
http://people.apache.org/~daijy/pig-0.15.0-candidate-0/

Release notes and the rat report are available at the same location.

Should we release this? Vote closes on Thursday EOD, May 28th 2015.

Thanks,
Daniel



Re: HowTo build pig

2015-05-15 Thread Alan Gates

Usually 'ant' or 'ant jar' will build the jars.

Alan.


Serega Sheypak <serega.shey...@gmail.com>
May 14, 2015 at 1:40
Hi, trying to contribute
https://issues.apache.org/jira/browse/PIG-4550

Hi, can you give me a guide for building Pig? Usually I use maven.
I see that ivy requires two properties: hadoopVersion and hbaseVersion.

1. Is there any list of properties required to build the project?
2. Do I have to contribute to trunk?
3. Which hadoop/hbase versions do I have to pick?
4. Do I have to test contribution against different combinations of
hadoop/hbase versions?

I've read this one:
https://cwiki.apache.org/confluence/display/PIG/HowToContribute

but it gives general rules, nothing specific.



[jira] [Updated] (PIG-4525) Clarify Scalar has more than one row in the output.

2015-04-30 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-4525:

   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Patch committed.  Thanks Niels.

 Clarify Scalar has more than one row in the output.
 -

 Key: PIG-4525
 URL: https://issues.apache.org/jira/browse/PIG-4525
 Project: Pig
  Issue Type: Improvement
Reporter: Niels Basjes
Assignee: Niels Basjes
Priority: Trivial
 Fix For: 0.15.0

 Attachments: PIG-4525-2015-04-30-1115.patch


 The exception "Scalar has more than one row in the output." is correct, yet it is 
 a reason for many (starting) Pig developers to search the internet for a 
 solution.
 I propose (and I'll include a patch) to simply extend the exception message 
 with a hint towards the right solution.
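For context, here is a sketch of the scalar-projection pattern that typically triggers this message (file and alias names are made up for illustration):

```pig
a = load 'input.txt' as (x:int, y:int);
b = group a all;
c = foreach b generate MAX(a.x) as max_x;   -- c has exactly one row

-- Fine: c is used as a scalar, and it holds a single row
d = foreach a generate x, c.max_x;

g = load 'groups.txt' as (z:int);
-- Fails at runtime with "Scalar has more than one row in the output."
-- if g holds more than one row:
-- e = foreach a generate x, g.z;
```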



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-3294) Allow Pig use Hive UDFs

2015-04-07 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14484012#comment-14484012
 ] 

Alan Gates commented on PIG-3294:
-

+1.

I agree it makes sense to make HCatLoader/Storer share the conversion code.  We 
can file a separate JIRA for that.

 Allow Pig use Hive UDFs
 ---

 Key: PIG-3294
 URL: https://issues.apache.org/jira/browse/PIG-3294
 Project: Pig
  Issue Type: New Feature
Reporter: Daniel Dai
Assignee: Daniel Dai
  Labels: gsoc2013, java
 Fix For: 0.15.0

 Attachments: PIG-3294-1.patch, PIG-3294-2.patch, PIG-3294-3.patch, 
 PIG-3294-4.patch, PIG-3294-5.patch, PIG-3294-before-refactory.patch


 It would be nice if Pig provided some interoperability with Hive. We can wrap 
 Hive UDFs in Pig so we can use Hive UDFs in Pig.
 This is a candidate project for Google summer of code 2013. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2013



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-3294) Allow Pig use Hive UDFs

2015-04-02 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393057#comment-14393057
 ] 

Alan Gates commented on PIG-3294:
-

The checking in of Hive code is ugly.  We need to make sure that gets removed 
before a release so we don't end up forking.

In POForEach you are visiting the physical plan at run time to determine if we 
need the last record.  Could this not be done at compile time to save time and 
runtime?

HiveUtils.java: much of this code to convert Hive types to Pig types must 
already be in HCat.  Is it not possible to re-use that?

 Allow Pig use Hive UDFs
 ---

 Key: PIG-3294
 URL: https://issues.apache.org/jira/browse/PIG-3294
 Project: Pig
  Issue Type: New Feature
Reporter: Daniel Dai
Assignee: Daniel Dai
  Labels: gsoc2013, java
 Fix For: 0.15.0

 Attachments: PIG-3294-1.patch, PIG-3294-2.patch, PIG-3294-3.patch, 
 PIG-3294-4.patch, PIG-3294-before-refactory.patch


 It would be nice if Pig provided some interoperability with Hive. We can wrap 
 Hive UDFs in Pig so we can use Hive UDFs in Pig.
 This is a candidate project for Google summer of code 2013. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2013



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4417) Pig's register command should support automatic fetching of jars from repo.

2015-03-24 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378198#comment-14378198
 ] 

Alan Gates commented on PIG-4417:
-

A couple of comments:
# Review board is great for reviewing the patch, but to be official it has to 
be attached here too.
# Why is the DownloadResolver all static?  Why not make it an object with a 
single method?  This is just a style gripe and not a blocker for checking in 
the code.
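The notation proposed in the ticket, next to the existing form, would look roughly like this in a Pig script (the coordinates and paths below are illustrative, not taken from the patch):

```pig
-- Existing: register a jar from the local file system
register '/home/user/udfs/myudfs.jar';

-- Proposed: Gradle-style group:module:version coordinates,
-- fetched automatically from a repository
register 'com.example:example-udfs:1.0';
```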

 Pig's register command should support automatic fetching of jars from repo.
 ---

 Key: PIG-4417
 URL: https://issues.apache.org/jira/browse/PIG-4417
 Project: Pig
  Issue Type: Improvement
Reporter: Akshay Rai
Assignee: Akshay Rai

 Currently Pig's register command takes a local path to a dependency jar. 
 This clutters the local file-system as users may forget to remove this jar 
 later.
 It would be nice if Pig supported a Gradle like notation to download the jar 
 from a repository.
 Ex: At the top of the Pig script a user could add
 register 'group:module:version'; 
 It should be backward compatible and should support a local file path if so 
 desired.
 RB: https://reviews.apache.org/r/31662/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: use hcatalog in eclipse/pig

2015-02-27 Thread Alan Gates

What error message are you getting?

Alan.


李运田 <cumt...@163.com>
February 26, 2015 at 18:58
I want to use hcatalog in eclipse to deal with tables in hive,
but I can't store a table into hive:
pigServer.registerQuery("tmp = load 'pig' using org.apache.hcatalog.pig.HCatLoader();");
pigServer.registerQuery("tmp = foreach tmp generate id;");
pigServer.registerQuery("store tmp into 'hive' using org.apache.hcatalog.pig.HCatStorer();");

I can store into a file:
pigServer.registerQuery("a = LOAD '/user/hadoop/pig.txt';");
pigServer.store("a", "/user/hadoop/pig1.txt");
pigServer.registerQuery("store a into '/user/hadoop/pig2.txt';");
Perhaps the hcatalog jars are wrong?


Re: Newbie

2015-02-04 Thread Alan Gates
Are you looking for ways to contribute to the project?  A great way is 
to find a JIRA someone has filed and fix it.  Another is to write a UDF 
that you always wished Pig had.  If you do that, be sure and file a JIRA 
for it.  Also, check out the Developer Documentation section in 
https://cwiki.apache.org/confluence/display/PIG/Index


Alan.


Dilip Ramesh <dilip...@gmail.com>
February 4, 2015 at 5:55
Hello All,

I'm new to the developers list. I have used Pig before. Can anyone guide me
to a good start here?

Thank You,
D


*Dilip Ramesh*


*President - Nirmaan Goa (http://www.nirmaan.org/chapters/goa)
B.E. (Hons.) Computer Science, III year
Birla Institute of Technology & Science, Pilani, K.K. Birla Goa Campus
+91 9561442426 | https://www.facebook.com/dilip.ramesh.19*



Re: [VOTE] Release Pig 0.14.0 (candidate 1)

2014-11-17 Thread Alan Gates
+1, checked the signatures, checked the LICENSE and NOTICE files, 
checked for stray binaries, built it and ran some basic smoke tests.


Alan.


Daniel Dai <da...@hortonworks.com>
November 16, 2014 at 19:17
Hi,

I have created a candidate build for Pig 0.14.0.

Keys used to sign the release are available at
http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.

Please download, test, and try it out:
http://people.apache.org/~daijy/pig-0.14.0-candidate-1/

Release notes and the rat report are available at the same location.

Should we release this? Vote closes on next Wednesday EOD, Nov 19th 2014.

Thanks,
Daniel



--
Sent with Postbox http://www.getpostbox.com

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: Pig 0.14.0 release plan

2014-11-07 Thread Alan Gates
If you're wanting to get HIVE-8484 into Hive 0.14, you should talk to 
Gunther ASAP as he is planning on rolling a release candidate today I 
believe.


Alan.


Lorand Bendig <lben...@gmail.com>
November 7, 2014 at 0:23
Hi Daniel,

Currently Pig fetch mode throws an exception if a query is performed
through HCatalog.
At the Pig side I fixed PIG-4238, but there's a tiny patch at the Hive
side as well (HIVE-8484) which would be great to have in 0.14. As a Hive
committer, would you please review it?


Thanks,
Lorand







[jira] [Commented] (PIG-4253) Add a SequenceID UDF

2014-10-30 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191040#comment-14191040
 ] 

Alan Gates commented on PIG-4253:
-

+1

 Add a SequenceID UDF
 

 Key: PIG-4253
 URL: https://issues.apache.org/jira/browse/PIG-4253
 Project: Pig
  Issue Type: Improvement
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0

 Attachments: PIG-4253-1.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Drop support for Hadoop 0.20 from Pig 0.14

2014-09-17 Thread Alan Gates

+1.

Alan.


Rohini Palaniswamy <rohini.adi...@gmail.com>
September 16, 2014 at 21:38
Hi,
Hadoop has matured far beyond Hadoop 0.20; it has had two major releases
since then, and there has been no development on branch-0.20 (
http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/) for 3
years now. It is high time we drop support for Hadoop 0.20 and only support
the Hadoop 1.x and 2.x lines going forward. This will reduce the maintenance
effort and also enable us to write more efficient code and cut down on
reflection.

Vote closes on Tuesday, Sep 23 2014.

Thanks,
Rohini





Re: [VOTE] Drop support for JDK 6 from Pig 0.14

2014-09-17 Thread Alan Gates

+1.

Alan.


Rohini Palaniswamy <rohini.adi...@gmail.com>
September 16, 2014 at 21:47
Hi,
Hadoop is dropping support for JDK6 from hadoop-2.7 this year as
mentioned in the mail below. Pig should also move to JDK7 to be able to
compile against future hadoop 2.x releases and start making releases with
jars (binaries, maven repo) compiled in JDK 7. This would also open it up
for developers to code with JDK7 specific APIs.

Vote closes on Tuesday, Sep 23 2014.

Thanks,
Rohini




-- Forwarded message --
From: Arun C Murthy a...@hortonworks.com
Date: Tue, Aug 19, 2014 at 10:52 AM
Subject: Dropping support for JDK6 in Apache Hadoop
To: d...@hbase.apache.org d...@hbase.apache.org, d...@hive.apache.org,
dev@pig.apache.org, d...@oozie.apache.org
Cc: common-...@hadoop.apache.org common-...@hadoop.apache.org


[Apologies for the wide distribution.]

Dear HBase/Hive/Pig/Oozie communities,

We, over at Hadoop, are considering dropping support for JDK6 this year.

As you may be aware, we just released hadoop-2.5.0 and are now considering
making the next release, i.e. hadoop-2.6.0, the *last* release of Apache
Hadoop which supports JDK6. This means, from hadoop-2.7.0 onwards we will
not support JDK6 anymore and we *may* start relying on JDK7-specific apis.

Now, the above is a proposal and we do not want to pull the trigger
without talking to projects downstream - hence the request for your
feedback.


Please feel free to forward this to other communities you might deem to be
at risk from this too.

thanks,
Arun





Re: [DISCUSS] Re: Dropping support for JDK6 in Apache Hadoop

2014-08-26 Thread Alan Gates
I'm +1 on both of these.  But as a side note, Hive actually still 
supports Hadoop 0.20, so your statement below isn't quite true.


Alan.


Rohini Palaniswamy <rohini.adi...@gmail.com>
August 26, 2014 at 9:36
Pig has supported jdk7 since Pig 0.10. I think we should drop support for
JDK6 from Pig 0.14 and also publish maven binaries with jdk 1.7 from Pig
0.14.

Also it is high time to drop support for Hadoop 0.20. None of the other
hadoop projects officially support Hadoop 0.20 anymore. I would like to get
rid of the reflection in code w.r.t. UGI, and be able to add support for
fetching Credentials in UDFs, LoadFunc and StoreFunc, etc.

If there are no major objections, will start two separate voting threads
for that.

Regards,
Rohini







Re: [VOTE] Release Pig 0.13.0 (candidate 1)

2014-06-30 Thread Alan Gates
+1, checked the signatures, did a build and ran commit-test, ran some 
smoke tests, built piggy-bank and ran its unit tests.  I did see one 
unit test failure in piggybank (PigDBStorage), though Daniel couldn't 
reproduce it in his environment.


Alan.


Daniel Dai <da...@hortonworks.com>
June 29, 2014 at 2:50 AM
Hi,

I have created a candidate build for Pig 0.13.0.

Keys used to sign the release are available at
http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.

Please download, test, and try it out:
http://people.apache.org/~daijy/pig-0.13.0-candidate-1/

Release notes and the rat report are available at the same location.

Should we release this? Vote closes on Wednesday, July 2nd 2014.

Thanks,
Daniel





[jira] [Commented] (PIG-2122) Parameter Substitution doesn't work in the Grunt shell

2014-06-25 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043760#comment-14043760
 ] 

Alan Gates commented on PIG-2122:
-

+1 for the patch.

[~olgan], I don't see the backwards compatibility issue.  By definition this is 
for interactive sessions, so users can't have existing scripts that change 
behavior.  I suppose someone somewhere might regularly use $x in his 
interactive session and expect it to come out as $x rather than complain that 
it can't make the substitution, but that seems 1) unlikely, and 2) easy to fix.

 Parameter Substitution doesn't work in the Grunt shell
 --

 Key: PIG-2122
 URL: https://issues.apache.org/jira/browse/PIG-2122
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.8.0, 0.8.1, 0.12.0
Reporter: Grant Ingersoll
Assignee: Daniel Dai
Priority: Minor
 Fix For: 0.14.0

 Attachments: PIG-2122-1.patch


 Simple param substitution and things like %declare (as copied out of the 
 docs) don't work in the grunt shell.
 Start Pig with: bin/pig -x local -p time=FOO
 {quote}
 foo = LOAD '/user/grant/foo.txt' AS (a:chararray, b:chararray, c:chararray);
 Y = foreach foo generate *, '$time';
 dump Y;
 {quote}
 Output:
 {quote}
 2011-06-13 20:22:24,197 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input 
 paths to process : 1
 (1 2 3,,,$time)
 (4 5 6,,,$time)
 {quote}
 Same script, stored in junk.pig, run as: bin/pig -x local -p time=FOO junk.pig
 {quote}
 2011-06-13 20:23:38,864 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input 
 paths to process : 1
 (1 2 3,,,FOO)
 (4 5 6,,,FOO)
 {quote}
 Also, things like don't work (nor does %declare):
 {quote}
 grunt> %default DATE '20090101';
 2011-06-13 20:18:19,943 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Encountered  PATH %default  at line 1, 
 column 1.
 Was expecting one of:
 EOF 
 cat ...
 fs ...
 sh ...
 cd ...
 cp ...
 copyFromLocal ...
 copyToLocal ...
 dump ...
 describe ...
 aliases ...
 explain ...
 help ...
 kill ...
 ls ...
 mv ...
 mkdir ...
 pwd ...
 quit ...
 register ...
 rm ...
 rmf ...
 set ...
 illustrate ...
 run ...
 exec ...
 scriptDone ...
  ...
 EOL ...
 ; ...
 
 Details at logfile: 
 /Users/grant.ingersoll/projects/apache/pig/release-0.8.1/pig_1308002917912.log
 {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (PIG-4019) Compilation broken after TEZ-1169

2014-06-18 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14036154#comment-14036154
 ] 

Alan Gates commented on PIG-4019:
-

+1

 Compilation broken after TEZ-1169
 -

 Key: PIG-4019
 URL: https://issues.apache.org/jira/browse/PIG-4019
 Project: Pig
  Issue Type: Bug
  Components: tez
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.14.0

 Attachments: PIG-4019-1.patch


 Error message:
 {code}
 [javac] 
 /Users/daijy/pig/src/org/apache/pig/backend/hadoop/executionengine/tez/PartitionerDefinedVertexManager.java:95:
  
 setVertexParallelism(int,org.apache.tez.dag.api.VertexLocationHint,java.util.Map<java.lang.String,org.apache.tez.dag.api.EdgeManagerDescriptor>,java.util.Map<java.lang.String,org.apache.tez.runtime.api.RootInputSpecUpdate>)
  in org.apache.tez.dag.api.VertexManagerPluginContext cannot be applied to 
 (int,nulltype,java.util.Map<java.lang.String,org.apache.tez.dag.api.EdgeManagerDescriptor>)
 [javac] context.setVertexParallelism(dynamicParallelism, 
 null, edgeManagers);
 [javac]^
 {code}





[jira] [Updated] (PIG-3373) XMLLoader returns non-matching nodes when a tag name spans through the block boundary

2014-05-02 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3373:


Status: Open  (was: Patch Available)

Sorry, but the patch no longer applies and I couldn't figure out how to apply it 
manually.

 XMLLoader returns non-matching nodes when a tag name spans through the block 
 boundary
 -

 Key: PIG-3373
 URL: https://issues.apache.org/jira/browse/PIG-3373
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Affects Versions: site
Reporter: Ahmed Eldawy
Assignee: Ahmed Eldawy
  Labels: patch
 Attachments: PIG3373.patch, PIG3373_1.patch, PIG3373_2.patch, 
 PIG3373_3.patch, bad-file.xml.bz2, test-file-2.xml.bz2


 When a node's start tag spans two blocks, the tag is returned even if it is not 
 of the requested type.
 Example: For the following input file
 <event id=3423>
 <ev
 --- BLOCK BOUNDARY ---
 entually id=dfasd>
 XMLLoader with tag type 'event' should return only the first one but it 
 actually returns both of them
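For context, a typical piggybank XMLLoader invocation that exercises this tag-matching path might look like the sketch below; the file name and schema are illustrative, not taken from the report.

{code}
REGISTER piggybank.jar;
-- each matched 'event' element should arrive as one chararray record
events = LOAD 'bad-file.xml'
         USING org.apache.pig.piggybank.storage.XMLLoader('event')
         AS (doc:chararray);
DUMP events;
{code}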





[jira] [Updated] (PIG-3735) UDF to data cleanse the dirty data with expected pattern

2014-04-29 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3735:


Status: Open  (was: Patch Available)

Canceling patch pending inclusion of a unit test.

 UDF to data cleanse the dirty data with expected pattern
 

 Key: PIG-3735
 URL: https://issues.apache.org/jira/browse/PIG-3735
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.10.1
Reporter: Rekha Joshi
Assignee: Rekha Joshi
  Labels: piggybank
 Fix For: 0.10.1

 Attachments: PIG-3735.1.patch


 In data processing, the data is often not clean.
 This UDF works on large-scale data and cleanses the data to an expected pattern





[jira] [Commented] (PIG-3613) UDF for SimilarityMatching between strings with matching scores

2014-04-22 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977111#comment-13977111
 ] 

Alan Gates commented on PIG-3613:
-

[~rekhajoshm], thanks for the update.  You need to add a unit test so we can 
confirm this works as we make changes to Pig going forward.

 UDF for SimilarityMatching between strings with matching scores
 ---

 Key: PIG-3613
 URL: https://issues.apache.org/jira/browse/PIG-3613
 Project: Pig
  Issue Type: Task
  Components: piggybank
Affects Versions: 0.10.1
Reporter: Rekha Joshi
Assignee: Rekha Joshi
  Labels: piggybank
 Fix For: 0.10.1

 Attachments: PIG-3613.0.patch, PIG-3613.1.patch


 It would be great if we could do similarity matching between strings on big 
 data using a Pig UDF.
 The proposed UDF works on a tuple of strings and gives a matching score.





[jira] [Updated] (PIG-3613) UDF for SimilarityMatching between strings with matching scores

2014-04-22 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3613:


Status: Open  (was: Patch Available)

 UDF for SimilarityMatching between strings with matching scores
 ---

 Key: PIG-3613
 URL: https://issues.apache.org/jira/browse/PIG-3613
 Project: Pig
  Issue Type: Task
  Components: piggybank
Affects Versions: 0.10.1
Reporter: Rekha Joshi
Assignee: Rekha Joshi
  Labels: piggybank
 Fix For: 0.10.1

 Attachments: PIG-3613.0.patch, PIG-3613.1.patch


 It would be great if we could do similarity matching between strings on big 
 data using a Pig UDF.
 The proposed UDF works on a tuple of strings and gives a matching score.





[jira] [Commented] (PIG-3892) Pig distribution for hadoop 2

2014-04-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970001#comment-13970001
 ] 

Alan Gates commented on PIG-3892:
-

+1 for 1.  IIRC bin/hadoop has a -version option, so we don't even need to 
depend on magic jars being present, we can just ask hadoop.

 Pig distribution for hadoop 2
 -

 Key: PIG-3892
 URL: https://issues.apache.org/jira/browse/PIG-3892
 Project: Pig
  Issue Type: Bug
  Components: build
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.13.0


 Currently the Pig distribution only bundles pig.jar for Hadoop 1. Hadoop 2 
 users need to compile again using the -Dhadoopversion=23 flag. That is 
 quite a confusing process. We need to make Pig work with Hadoop 2 out of the 
 box. I am thinking of two approaches:
 1. Bundle both pig-h1.jar and pig-h2.jar in the distribution, and bin/pig will 
 choose the right pig.jar to run
 2. Make two Pig distributions, one for Hadoop 1 and one for Hadoop 2
 Any opinion?





Re: [VOTE] Release Pig 0.12.1 (Candidate 0)

2014-04-10 Thread Alan Gates
+1

Reviewed LICENSE, NOTICE, RELEASE_NOTES, and README.  Built, built piggybank 
and ran tests, ran a local smoke test.

Alan.

On Apr 7, 2014, at 1:22 PM, Prashant Kommireddi prkommire...@apache.org wrote:

 I have created a candidate build for Pig 0.12.1. This is a maintenance
 release to Pig 0.12.0 with a few critical bug fixes.
 
 Keys used to sign the release are available at
 http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.
 
 Please download, test, and try it out:
 
 http://people.apache.org/~prkommireddi/pig-0.12.1-candidate-0/
 
 
 Release notes and the rat report are available from the same location.
 
 
 List of issues fixed in this release
 
 http://svn.apache.org/viewvc/pig/branches/branch-0.12/CHANGES.txt?view=markup
 
 Should we release this? Vote closes EOD this Thursday, April 10th.
 
 -Prashant


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: UDF for converting datatype bytearray to UUID

2014-02-24 Thread Alan Gates
Definitely.  File a JIRA with a patch and one of us can review it.

Alan.

On Feb 22, 2014, at 8:08 PM, deepak rosario tharigopla 
rozartharigo...@gmail.com wrote:

 HI Guys,
 
 I have written a UDF which, when used around a column like an aggregate
 function in a Pig script, will convert a bytearray to a UUID or hex chars
 read from a Cassandra table's UUID column. It could be a handy
 function for users when they use Pig scripts on a Cassandra database.
 
 Does this qualify as a UDF to be added to the master Pig piggybank?
 
 Please comment..
 
 -- 
 Thanks & Regards
 Deepak Rosario Pancras
 *Achiever/Responsibility/Arranger/Maximizer/Harmony*




[jira] [Commented] (PIG-3774) Piggybank Over UDF get wrong result

2014-02-20 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907765#comment-13907765
 ] 

Alan Gates commented on PIG-3774:
-

+1.  

 Piggybank Over UDF get wrong result
 ---

 Key: PIG-3774
 URL: https://issues.apache.org/jira/browse/PIG-3774
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.1, 0.13.0

 Attachments: PIG-3774-1.patch








Re: Dev here. How can I help?

2014-02-14 Thread Alan Gates
Definitely start on whatever you like.  Once you pick a JIRA to start on send 
mail to the list so we can assign it to you.  That way if people have questions 
or feedback on it they know someone’s working on it.

Alan.

On Feb 14, 2014, at 2:03 AM, Kris Peeters peetersk...@gmail.com wrote:

 I'm a dev with quite a few years of experience. I love pig and I want to
 help out. I browsed through the Jira tickets. Is there anything that has a
 higher priority and that a newbie in Pig can start on? Or can I just start
 on whatever I'd like?




[jira] [Commented] (PIG-3642) Direct HDFS access for small jobs (fetch)

2014-01-02 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13860531#comment-13860531
 ] 

Alan Gates commented on PIG-3642:
-

I don't think this will result in the same local mode/mr mode problem that we 
had before.  The issue there was we tried (and failed) to have two modes where 
Pig provided all features.  This is much more limited to doing things locally 
that can easily be done locally.

 Direct HDFS access for small jobs (fetch) 
 --

 Key: PIG-3642
 URL: https://issues.apache.org/jira/browse/PIG-3642
 Project: Pig
  Issue Type: Improvement
Reporter: Lorand Bendig
Assignee: Lorand Bendig
 Fix For: 0.13.0

 Attachments: PIG-3642.patch


 With this patch I'd like to add the possibility to directly read data from 
 HDFS instead of launching MR jobs in case of simple (map-only) tasks. Hive 
 already has this feature (fetch). This patch shares some similarities with 
 the local mode of Pig 0.6. Here, fetching kicks off when the following holds 
 for a script:
 * it contains only LIMIT, FILTER, UNION (if no split is generated), STREAM, 
 (nested) FOREACH with expression operators, custom UDFs, etc.
 * no scalar aliases
 * no SampleLoader
 * single leaf job
 * DUMP (no STORE)
 The feature is enabled by default and can be toggled with:
 * -N or -no_fetch 
 * set opt.fetch true/false; 
 There's no STORE support because I wanted to make it explicit that this 
 optimization is for launching small/simple scripts during development, 
 rather than querying and filtering large numbers of rows on the client 
 machine. However, a threshold could be given on the input size (an 
 estimation) to determine whether to prefer fetch over MR jobs, similar to 
 what Hive's '{{hive.fetch.task.conversion.threshold}}' does. (through Pig's 
 LoadMetadata#getStatistic ?)
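Under the patch as described, toggling the optimization from a script might look like this sketch (the property name opt.fetch is taken from the description above; the data and schema are illustrative):

{code}
set opt.fetch true;  -- on by default per the description; shown for clarity
A = LOAD 'input' USING PigStorage('\t') AS (f1:chararray, f2:int);
B = FILTER A BY f2 > 10;
C = LIMIT B 5;
DUMP C;  -- map-only plan ending in DUMP: eligible for direct HDFS fetch
{code}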





[jira] [Commented] (PIG-3622) Allow casting bytearray fileds to bytearray type

2013-12-13 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848068#comment-13848068
 ] 

Alan Gates commented on PIG-3622:
-

Have you tested that this works ok with the rest of the code?  Does something 
remove the (unnecessary) cast?  If not it seems like there will be issues, as 
there is no binary cast in Pig.  

 Allow casting bytearray fileds to bytearray type
 

 Key: PIG-3622
 URL: https://issues.apache.org/jira/browse/PIG-3622
 Project: Pig
  Issue Type: Improvement
 Environment: 0.12
Reporter: Redis Liu
Priority: Minor
 Attachments: 3622-v1.patch


 test.pig:
 AA = load '1.txt' USING PigStorage(' ') as (a:bytearray, b:chararray, 
 c:chararray);
 AA1 = filter AA by a == '1';
 AA2 = foreach AA1 generate *, ( a == '1' ? a : null ) as myd;
 dump AA2;
 the INPUT file 1.txt is as below:
 a b c
 1 2 3
 4 5 6
 2 3 4
 b a c
 c a b
 run the pig script in this way:
 # pig -x local test.pig
 It'll fail with this error message:
 Pig Stack Trace
 ---
 ERROR 1051: Cannot cast to bytearray
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
 open iterator for alias AA2
   at org.apache.pig.PigServer.openIterator(PigServer.java:882)
   at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
   at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
   at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
   at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
   at org.apache.pig.Main.run(Main.java:607)
   at org.apache.pig.Main.main(Main.java:156)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:200)
 Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias AA2
   at org.apache.pig.PigServer.storeEx(PigServer.java:984)
   at org.apache.pig.PigServer.store(PigServer.java:944)
   at org.apache.pig.PigServer.openIterator(PigServer.java:857)
   ... 12 more
 Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
 ERROR 1059: 
 file test.pig, line 7, column 6 Problem while reconciling output schema of 
 ForEach
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.throwTypeCheckerException(TypeCheckingRelVisitor.java:142)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:182)
   at 
 org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:76)
   at 
 org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at org.apache.pig.PigServer$Graph.compile(PigServer.java:1733)
   at org.apache.pig.PigServer$Graph.compile(PigServer.java:1710)
   at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1411)
   at org.apache.pig.PigServer.storeEx(PigServer.java:979)
   ... 14 more
 Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
 ERROR 2216: 
 file test.pig, line 7, column 34 Problem getting fieldSchema for (Name: 
 Cast Type: bytearray Uid: 17)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:603)
   at 
 org.apache.pig.newplan.logical.expression.BinCondExpression.accept(BinCondExpression.java:84)
   at 
 org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visitExpressionPlan(TypeCheckingRelVisitor.java:191)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:157)
   at 
 org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:242)
   at 
 org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:174)
   ... 21 more
 Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
 ERROR 1051: Cannot cast to bytearray
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:494)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.insertCast

[jira] [Updated] (PIG-3622) Allow casting bytearray fileds to bytearray type

2013-12-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3622:


Assignee: Redis Liu

 Allow casting bytearray fileds to bytearray type
 

 Key: PIG-3622
 URL: https://issues.apache.org/jira/browse/PIG-3622
 Project: Pig
  Issue Type: Improvement
 Environment: 0.12
Reporter: Redis Liu
Assignee: Redis Liu
Priority: Minor
 Attachments: 3622-v1.patch


 test.pig:
 AA = load '1.txt' USING PigStorage(' ') as (a:bytearray, b:chararray, 
 c:chararray);
 AA1 = filter AA by a == '1';
 AA2 = foreach AA1 generate *, ( a == '1' ? a : null ) as myd;
 dump AA2;
 the INPUT file 1.txt is as below:
 a b c
 1 2 3
 4 5 6
 2 3 4
 b a c
 c a b
 run the pig script in this way:
 # pig -x local test.pig
 It'll fail with this error message:
 Pig Stack Trace
 ---
 ERROR 1051: Cannot cast to bytearray
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
 open iterator for alias AA2
   at org.apache.pig.PigServer.openIterator(PigServer.java:882)
   at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:774)
   at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
   at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
   at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
   at org.apache.pig.Main.run(Main.java:607)
   at org.apache.pig.Main.main(Main.java:156)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:200)
 Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias AA2
   at org.apache.pig.PigServer.storeEx(PigServer.java:984)
   at org.apache.pig.PigServer.store(PigServer.java:944)
   at org.apache.pig.PigServer.openIterator(PigServer.java:857)
   ... 12 more
 Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
 ERROR 1059: 
 file test.pig, line 7, column 6 Problem while reconciling output schema of 
 ForEach
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.throwTypeCheckerException(TypeCheckingRelVisitor.java:142)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:182)
   at 
 org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:76)
   at 
 org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at org.apache.pig.PigServer$Graph.compile(PigServer.java:1733)
   at org.apache.pig.PigServer$Graph.compile(PigServer.java:1710)
   at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1411)
   at org.apache.pig.PigServer.storeEx(PigServer.java:979)
   ... 14 more
 Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
 ERROR 2216: 
 file test.pig, line 7, column 34 Problem getting fieldSchema for (Name: 
 Cast Type: bytearray Uid: 17)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:603)
   at 
 org.apache.pig.newplan.logical.expression.BinCondExpression.accept(BinCondExpression.java:84)
   at 
 org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visitExpressionPlan(TypeCheckingRelVisitor.java:191)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:157)
   at 
 org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:242)
   at 
 org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:174)
   ... 21 more
 Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
 ERROR 1051: Cannot cast to bytearray
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:494)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.insertCast(TypeCheckingExpVisitor.java:472)
   at 
 org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:599)
   ... 30 more

[jira] [Updated] (PIG-3619) Provide XPath function

2013-12-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3619:


Assignee: Saad Patel

 Provide XPath function
 --

 Key: PIG-3619
 URL: https://issues.apache.org/jira/browse/PIG-3619
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Saad Patel
Assignee: Saad Patel
 Attachments: xpath.patch


 Xml is often loaded using XMLLoader with a record boundary tag as one of the 
 parameters. A common use case is to then extract data from those records. 
 XPath would allow those extractions to be done very easily. I'm proposing a 
 patch that adds simple XPath support as a UDF.
 Example usage of the XPath UDF would be:
 {code}
 extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), 
 XPath(record, 'book/title');
 {code}
 The proposed UDF also caches the last XML document. This is helpful for 
 improving performance when performing multiple consecutive XPath extractions 
 on the same XML document, as in the example above. 





[jira] [Resolved] (PIG-3619) Provide XPath function

2013-12-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-3619.
-

Resolution: Fixed

Patch checked in.  Thanks Saad.

 Provide XPath function
 --

 Key: PIG-3619
 URL: https://issues.apache.org/jira/browse/PIG-3619
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Saad Patel
Assignee: Saad Patel
 Attachments: xpath.patch


 Xml is often loaded using XMLLoader with a record boundary tag as one of the 
 parameters. A common use case is to then extract data from those records. 
 XPath would allow those extractions to be done very easily. I'm  proposing a 
 patch that adds simple XPath support as a UDF.
 Example usage of this the XPath UDF would be:
 {code}
 extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), 
 XPath(record, 'book/title');
 {code}
 The proposed UDF also caches the last XML document. This is helpful for 
 improving performance when performing multiple consecutive XPath extractions 
 on the same XML document, as in the example above. 





[jira] [Commented] (PIG-3558) ORC support for Pig

2013-12-09 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843632#comment-13843632
 ] 

Alan Gates commented on PIG-3558:
-

+1.

 ORC support for Pig
 ---

 Key: PIG-3558
 URL: https://issues.apache.org/jira/browse/PIG-3558
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.13.0

 Attachments: PIG-3558-1.patch, PIG-3558-2.patch, PIG-3558-3.patch


 Adding LoadFunc and StoreFunc for ORC.





[jira] [Commented] (PIG-3548) Allow pig to load multiple paths specified in a filenames.txt

2013-11-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13824148#comment-13824148
 ] 

Alan Gates commented on PIG-3548:
-

Could you store the parameters in a file rather than specify them on the 
command line?  See http://pig.apache.org/docs/r0.12.0/cont.html#Parameter-Sub 
for details.

 Allow pig to load multiple paths specified in a filenames.txt
 -

 Key: PIG-3548
 URL: https://issues.apache.org/jira/browse/PIG-3548
 Project: Pig
  Issue Type: Improvement
Reporter: Madhavi Nadig

 I have a list of paths stored in a filenames.txt. I would like to load them 
 all using a single LOAD command. The paths don't conform to one or more 
 regexes, so they have to be specified individually.
 So far I've used the -param option with pig to specify them. But it results 
 in an extremely long command line and I'm afraid I won't be able to scale my 
 script.
 shell : pig -param read_paths=my-long-list-of-paths something.pig
 something.pig : requests = LOAD '$read_paths' USING PigStorage(',');
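The parameter-file approach Alan points to would look roughly like this; the file name and paths are illustrative only.

{code}
-- params.txt contains one assignment per line, e.g.:
--   read_paths=/logs/2013/10/01/part-0,/logs/2013/10/02/part-0
-- invoked as: pig -param_file params.txt something.pig
requests = LOAD '$read_paths' USING PigStorage(',');
{code}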





Re: How do we determine 'stable' pig version?

2013-10-22 Thread Alan Gates
I don't think we should change our use of stable.  Our usage is in line with 
the Hadoop usage of the term in their releases.  To the best of our knowledge 
as Apache developers it is stable.  It passes all of the tests we have.  We 
have no criteria for deciding stability beyond this.

Alan.

On Oct 22, 2013, at 4:00 PM, Daniel Dai wrote:

 Yes, we can revisit. The question is how to determine stability? 0.11.1
 has been released for a while and should be considered stable, but it actually
 contains problems raised just recently. After we release 0.12.1, how soon
 should we declare it a stable release?
 
 Thanks,
 Daniel
 
 
 On Tue, Oct 22, 2013 at 2:25 PM, Koji Noguchi knogu...@yahoo-inc.com wrote:
 
 Thanks Daniel, Olga!  Keeping 3 versions would be nice.
 
 As for 'stable', can we revisit the definition?
 If it's *always* pointing to the latest release, I don't see the need for
 having this link(dir).
 Is it adding any value?
 
 Koji
 
 
 
 
 On Oct 22, 2013, at 1:43 PM, Daniel Dai da...@hortonworks.com wrote:
 
 That totally makes sense. Let's keep both downloads and documentation for 3
 versions.
 
 Thanks,
 Daniel
 
 
 On Tue, Oct 22, 2013 at 10:20 AM, Olga Natkovich onatkov...@yahoo.com
 wrote:
 
 Couple of suggestions:
 
 (1) I think we are trying to go for a more frequent release model and in
 that case it would make sense to keep perhaps 3 releases. Based on our
 experience at Yahoo, Pig 10 is the really stable release. We recently
 found
 a couple of critical bugs in 11 for which we posted patches. Also the
 community knows that we delayed a couple of key bug fixes in 12 till 12.1
 (2) Our documentation needs to be consistent with the number of releases
 we advertise as supported. Our docs currently go all the way to Pig 9.
 
 Olga
 
 
 
 On Tuesday, October 22, 2013 10:13 AM, Daniel Dai 
 da...@hortonworks.com
 wrote:
 
 Hi, Koji,
 Here is the criteria I use:
 (i) How do we determine how many releases to show on the front download
 page?
 We usually keep two most recent releases on the front page according to
 https://cwiki.apache.org/confluence/display/PIG/HowToRelease.
 
 (ii) How do we determine which release is considered 'stable' ?
 Here stable means passing all tests, peer reviewed. It does not mean
 production stable. Actually there is no way for us to know production
 stability until users download it, use it and give feedback. That's why we
 will continue fixing bugs after major releases and make minor releases.
 
 Thanks,
 Daniel
 
 
 
 On Tue, Oct 22, 2013 at 9:45 AM, Koji Noguchi knogu...@yahoo-inc.com
 wrote:
 
 
 When I went to the pig release download page (through
 http://www.apache.org/dyn/closer.cgi/pig), I only saw 0.11.1 and 0.12
 available.
 I later learned that there is an 'archive' link(
 http://archive.apache.org/dist/pig/)  that list other versions (0.8 to
 0.10).
 
 Two questions.
 
 (i) How do we determine how many releases to show on the front download
 page?
 
 (ii) How do we determine which release is considered 'stable' ?
 
 I still consider the stable version to be 0.10.1 so I was surprised not
 to
 see that available on the front download page
 and even more surprised to see release 0.12 flagged as 'stable'.
 
 Koji
 
 
 
 
 
 
 
 
 

Re: [VOTE] Release Pig 0.12.0 (candidate 2)

2013-10-07 Thread Alan Gates
+1.  Downloaded, ran commit-test, piggybank unit tests, tutorial, and simple 
local mode smoke tests.  Looked over the CHANGES, README, RELEASE_NOTES files 
to make sure they looked reasonable.

Alan.

On Oct 7, 2013, at 12:28 PM, Daniel Dai wrote:

 Hi,
 
 I have created a candidate build for Pig 0.12.0.
 
 Keys used to sign the release are available at
 http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup
 
 Please download, test, and try it out:
 
 http://people.apache.org/~daijy/pig-0.12.0-candidate-2/
 
 Should we release this? Vote closes on EOD this Thursday, Oct 10th.
 
 Thanks,
 Daniel
 




Re: [Discussion] Any thoughts on PIG-3457?

2013-09-30 Thread Alan Gates
We should separate two concerns.  If I understand correctly we don't need any 
of these changes in 0.12.  So we should revert these patches from the 0.12 
branch so that we can get it released quickly in a backwards compatible way.  

We will then have plenty of time to discuss the separate question of how we 
proceed going forward (deprecated APIs or new APIs).

Alan.

On Sep 30, 2013, at 11:45 AM, Cheolsoo Park wrote:

 Hi Jeremy,
 
 What you're saying makes sense, and a patch is welcome. ;-) But the
 complexity comes from the fact that there are many classes associated with
 one another, and it seems necessary to bring back all of them together in order
 to provide full backward compatibility.
 
 After spending many hours on the weekend, I concluded that adding more
 workarounds (classes, methods, packages, etc) to the current code makes it
 only less maintainable and readable. So I prefer a simpler approach.
 
 For example, we can just publish two jars - pig.jar w/ the old API and pig-new.jar
 w/ new API - maybe not in 0.12 but in 0.13. Since we already have a
 tez-branch, we can use it to manage the new version of classes. Then, users
 can switch to pig-new.jar gradually in 0.13 and 0.14. When we finally merge
 tez-branch into trunk, we can publish a single jar again.
 
 Of course, this is not trivial either because we have to maintain two
 branches. But I feel that managing two branches independently is easier
 than maintaining all sorts of workarounds for backward compatibility in the
 source code. In addition, we will have more flexibility in terms of
 designing new API because we will be completely free from backward
 compatibility. No?
 
 Thanks,
 Cheolsoo
 
 On Mon, Sep 30, 2013 at 11:12 AM, Jeremy Karn jk...@mortardata.com wrote:
 
 What about the option of leaving all of the MR-specific logic in the
 original classes, but marking those methods as deprecated and telling people
 to switch to an MR-specific class that extends the original one?
 So for example:
 
 JobStats - Reverted to being as it was before PIG-3419 but with all MR
 specific logic deprecated.
 MRJobStats - Would just extend JobStats.
 
 If we did this, external software could switch their code from using
 JobStats to MRJobStats at their own pace and without breaking against any
 specific version of Pig.  After a few versions the MR specific logic could
 be removed from JobStats and pushed into MRJobStats and it shouldn't break
 anything for people that had made that change.
 
 I'm not familiar with all of the changes in PIG-3419 so this might not work
 everywhere.
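The deprecate-then-subclass migration path Jeremy describes can be sketched in plain Java. This is a hypothetical illustration, not Pig's actual JobStats code; the fields and method names are simplified stand-ins:

```java
// Step 1 (now): MR-specific logic stays in JobStats but is deprecated, so
// existing callers keep compiling against any Pig version.
abstract class JobStats {
    protected int numberMaps;

    // Engine-neutral API stays here permanently.
    public abstract String getJobId();

    /** @deprecated MR-specific; callers should move to MRJobStats. */
    @Deprecated
    public int getNumberMaps() {
        return numberMaps;
    }
}

// Step 2 (now): MRJobStats starts as a thin subclass that callers can switch
// to at their own pace. In a later release the MR logic moves down here and
// the deprecated JobStats method is removed; subclass callers are unaffected.
class MRJobStats extends JobStats {
    private final String jobId;

    MRJobStats(String jobId, int numberMaps) {
        this.jobId = jobId;
        this.numberMaps = numberMaps;
    }

    @Override
    public String getJobId() {
        return jobId;
    }
}
```

External code that switches from `JobStats` to `MRJobStats` early keeps working both before and after the MR logic is pushed down, which is the point of the proposal.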
 
 
 On Mon, Sep 30, 2013 at 1:43 PM, Cheolsoo Park piaozhe...@gmail.com
 wrote:
 
 To be specific, we will need to revert all the following commits in
 order:
 
 
 commit ad1b87d4ba073680ad0a7fc8c76baeb8b611c982
 Author: Cheolsoo Park cheol...@apache.org
 Date:   Fri Sep 20 22:47:29 2013 +
 
PIG-3471: Add a base abstract class for ExecutionEngine (cheolsoo)
 
git-svn-id:
 
 
 https://svn.apache.org/repos/asf/pig/trunk@1525165 13f79535-47bb-0310-9956-ffa450edef68
 
 commit 4305a6f4737d07396ae13fd95d7c1da7933b38a1
 Author: Jianyong Dai da...@apache.org
 Date:   Wed Sep 18 19:09:49 2013 +
 
PIG-3457: Provide backward compatibility for PigStatsUtil and
 JobStats
 
git-svn-id:
 
 
 https://svn.apache.org/repos/asf/pig/trunk@1524532 13f79535-47bb-0310-9956-ffa450edef68
 
 commit e85cf34c92713aa697a1cda7a9c2b3db139350f7
 Author: Cheolsoo Park cheol...@apache.org
 Date:   Wed Sep 18 15:37:58 2013 +
 
PIG-3464: Mark ExecType and ExecutionEngine interfaces as evolving
 (cheolsoo)
 
 commit fd8b7cdf9292b305f02386d560c25298ab492a0b
 Author: Cheolsoo Park cheol...@apache.org
 Date:   Fri Aug 30 20:04:29 2013 +
 
PIG-3419: Pluggable Execution Engine (achalsoni81 via cheolsoo)
 
git-svn-id:
 
 
 https://svn.apache.org/repos/asf/pig/trunk@1519062 13f79535-47bb-0310-9956-ffa450edef68
 
 
 
 
 On Mon, Sep 30, 2013 at 10:33 AM, Daniel Dai da...@hortonworks.com
 wrote:
 
 Thanks Cheolsoo! My opinion is that providing backward compatibility for
 PigStats is a must; otherwise it could cause havoc. I imagine PigStats is
 widely used by Pig users via the PigRunner and PPNL interfaces. People use
 PigStats to collect MR job details of the Pig job. Though PigStats is
 marked Evolving, this is mostly for extending PigStats, not limiting it as
 PIG-3419 did. Even if we really need to change it, we need to communicate
 very well with users over time; Pig 0.12 is not an option.
 
 PIG-3457 tries to provide backward compatibility for PigStats, but just
 like Cheolsoo said, it is far from ideal. I now tend to agree with Rohini's
 suggestion on PIG-3419: roll back PIG-3419 until we find a better way.
 PIG-3419 seems a little premature. Besides the above-mentioned PigStats
 issue, I've already found 2 additional issues:
 1. explain shows the unoptimized logical plan instead of the optimized one
 2. HangingJobKiller is removed
 
 What do others think?
 
 

[jira] [Commented] (PIG-3468) PIG-3123 breaks e2e test Jython_Diagnostics_2

2013-09-24 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776917#comment-13776917
 ] 

Alan Gates commented on PIG-3468:
-

+1

 PIG-3123 breaks e2e test Jython_Diagnostics_2
 -

 Key: PIG-3468
 URL: https://issues.apache.org/jira/browse/PIG-3468
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12.0

 Attachments: PIG-3468-1.patch


 PIG-3123 optimized TypeCastInserter by adding a castInserted flag for LOLoad 
 which do not need a LOForEach just to do the pruning. However, this flag is 
 also used in illustrate to visualize the output from the loader 
 (DisplayExamples:110). That's why Jython_Diagnostics_2 is broken.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Which file translates the program into a map reduce plan

2013-09-19 Thread Alan Gates
Checkout 
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java

Alan.

On Sep 19, 2013, at 3:54 PM, Abdollahian Noghabi, Shadi wrote:

 Hi,
 
 I want to find which file in pig converts the physical plan into the map 
 reduce plan. Actually, I want to get some information out of the map reduce 
 plan, but I cannot find in which file it is located. I would be more than 
 happy if anyone could guide me where is the directory and the file.
 
 Thanks,
 Shadi




[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-17 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769786#comment-13769786
 ] 

Alan Gates commented on PIG-3255:
-

I gave my +1 above, so we're good from my viewpoint.

 Avoid extra byte array copy in streaming deserialize
 

 Key: PIG-3255
 URL: https://issues.apache.org/jira/browse/PIG-3255
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch, 
 PIG-3255-4.patch, PIG-3255-5.patch


 PigStreaming.java:
  public Tuple deserialize(byte[] bytes) throws IOException {
 Text val = new Text(bytes);  
 return StorageUtil.textToTuple(val, fieldDel);
 }
 Should remove new Text(bytes) copy and construct the tuple directly from the 
 bytes



[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-12 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765732#comment-13765732
 ] 

Alan Gates commented on PIG-3255:
-

+1

 Avoid extra byte array copy in streaming deserialize
 

 Key: PIG-3255
 URL: https://issues.apache.org/jira/browse/PIG-3255
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch


 PigStreaming.java:
  public Tuple deserialize(byte[] bytes) throws IOException {
 Text val = new Text(bytes);  
 return StorageUtil.textToTuple(val, fieldDel);
 }
 Should remove new Text(bytes) copy and construct the tuple directly from the 
 bytes



[jira] [Commented] (PIG-3333) Fix remaining Windows core unit test failures

2013-09-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764878#comment-13764878
 ] 

Alan Gates commented on PIG-3333:
-

+1

 Fix remaining Windows core unit test failures
 -

 Key: PIG-3333
 URL: https://issues.apache.org/jira/browse/PIG-3333
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-3333-1.patch, PIG-3333-2.patch


 I combined a bunch of Windows unit test fixes into one patch to make things 
 cleaner. They all originated from obvious Windows/Unix inconsistencies, which 
 include:
 1. Path separator inconsistency: / vs \
 2. Path component separator inconsistency: : vs ;
 3. volume: is not acceptable as a URI
 4. Unix tools/commands (eg, bash, rm) do not exist on Windows
 5. .sh scripts need a .cmd companion on Windows
 6. \r\n vs \n as newline
 7. Environment variables use different names (USER vs USERNAME)
 8. File not closed: not an issue on Unix, but an issue on Windows (not able 
 to remove an open file)



[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13764980#comment-13764980
 ] 

Alan Gates commented on PIG-3255:
-

I don't know if anyone is using StreamToPig either, but marking an interface as 
stable and then changing it without deprecation or anything isn't cool.  So no, 
I don't think this change is ok.

We could add the proposed function public Tuple deserialize(byte[] bytes, int 
offset, int length) throws IOException; to the interface and change Pig to 
call it if it's present or use the old one if not.  
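The shape of that evolution can be sketched in self-contained Java. This is an illustration only: Pig's real interface is StreamToPig, its methods throw IOException, and it produces Tuples via Hadoop's Text class; here a plain String stands in for all of that. The point is that the new offset/length overload decodes straight from a slice of a shared buffer, while a default fallback keeps old implementations working:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Old single-argument method keeps working; the new offset/length overload
// avoids materializing a trimmed byte[] copy before deserializing.
interface StreamDeserializer {
    String deserialize(byte[] bytes);

    // Fallback for old implementations: copy the slice, then delegate.
    default String deserialize(byte[] bytes, int offset, int length) {
        return deserialize(Arrays.copyOfRange(bytes, offset, offset + length));
    }
}

class DirectDeserializer implements StreamDeserializer {
    @Override
    public String deserialize(byte[] bytes) {
        return deserialize(bytes, 0, bytes.length);
    }

    @Override
    public String deserialize(byte[] bytes, int offset, int length) {
        // Decode straight from the slice; no intermediate byte[] copy.
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }
}
```

Callers holding a large input buffer can then pass (buffer, recordStart, recordLength) and skip the per-record copy entirely.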

 Avoid extra byte array copy in streaming deserialize
 

 Key: PIG-3255
 URL: https://issues.apache.org/jira/browse/PIG-3255
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch


 PigStreaming.java:
  public Tuple deserialize(byte[] bytes) throws IOException {
 Text val = new Text(bytes);  
 return StorageUtil.textToTuple(val, fieldDel);
 }
 Should remove new Text(bytes) copy and construct the tuple directly from the 
 bytes



[jira] [Commented] (PIG-3255) Avoid extra byte array copy in streaming deserialize

2013-09-11 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765123#comment-13765123
 ] 

Alan Gates commented on PIG-3255:
-

At compile time, but not at runtime.  At runtime Pig would need to reflect the 
class implementing StreamToPig and see if it contained a deserialize method 
that matches your new signature.  You could then pick which method to call 
based on that.  As Jeremy suggests, you could instead do that with a new 
interface (PigToStreamV2) and then at compile time determine which interface is 
being implemented and act accordingly.  This is actually better than what I 
initially suggested as the determination can be made at compile time.  If you 
choose this route you should also change PigToStreamV2 to an abstract class so 
that in the future we can add methods without going through this dance.
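The runtime reflection check described in the first option can be sketched as follows. This is illustrative, not Pig's code: a stand-in interface replaces StreamToPig, and a caller probes the concrete class for the wider deserialize signature, dispatching to it when present and falling back to the old copy-based path otherwise:

```java
import java.lang.reflect.Method;
import java.util.Arrays;

// Stand-in for the old stable interface.
interface OldStream {
    String deserialize(byte[] bytes);
}

final class StreamCaller {
    static String call(OldStream impl, byte[] bytes, int offset, int length) {
        try {
            // Present only on "new-style" implementations.
            Method m = impl.getClass()
                    .getMethod("deserialize", byte[].class, int.class, int.class);
            return (String) m.invoke(impl, bytes, offset, length);
        } catch (NoSuchMethodException e) {
            // Old-style implementation: fall back to copying the slice.
            return impl.deserialize(Arrays.copyOfRange(bytes, offset, offset + length));
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}

class NewImpl implements OldStream {
    public String deserialize(byte[] bytes) {
        return deserialize(bytes, 0, bytes.length);
    }
    public String deserialize(byte[] bytes, int offset, int length) {
        return "new:" + new String(bytes, offset, length);
    }
}

class OldImpl implements OldStream {
    public String deserialize(byte[] bytes) {
        return "old:" + new String(bytes);
    }
}
```

The compile-time alternative Alan prefers (a V2 interface, checked with `instanceof`) avoids the per-setup reflection probe entirely, which is why it is the better route.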

 Avoid extra byte array copy in streaming deserialize
 

 Key: PIG-3255
 URL: https://issues.apache.org/jira/browse/PIG-3255
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3255-1.patch, PIG-3255-2.patch, PIG-3255-3.patch


 PigStreaming.java:
  public Tuple deserialize(byte[] bytes) throws IOException {
 Text val = new Text(bytes);  
 return StorageUtil.textToTuple(val, fieldDel);
 }
 Should remove new Text(bytes) copy and construct the tuple directly from the 
 bytes



Re: Propose UDF

2013-09-04 Thread Alan Gates
A few questions:

1) Why did you try to use RANK?  I don't see how rank is part of this.
2) The semantics here aren't clear to me.  record_id appears to be crossed with 
name and id but name and id appear to be chosen in order.  If this is join 
semantics I'd have expected two more entries in B, one with (1, Alan, 8) and 
one with (1, Sarai, 7).  If you were just taking each element in order I'd have 
expected the last row to be (null, Sarai, 8) instead.
3) I'm not familiar with the name NLET.  Does that refer to a particular 
function or algorithm?

Alan.

On Aug 31, 2013, at 6:20 PM, Alan del Rio Mendez wrote:

 Hi Dev Team,
 
 I developed a UDF to handle the following situation on Pig 0.10 and want to
 see if I could contribute with it to the project.
 
 Let us consider a BAG A with the following data:
 
 A:{record_id:(1),names:{(ALAN),(SARAI)},ids:{(7),(8)}}
 
 and an expected bag B
 
 B:{{record_id:(1),name:(ALAN),
 id:(7)},{record_id:(1),name:(SARAI), id:(8)}}
 
 Basically I propose a UDF NLET that takes N data bags containing the same
 M elements each of them and creates M tuples with N fields and that is used
 this way:
 
 B = FOREACH A GENERATE record_id, FLATTEN(NLET(names,ids));
 
 I tried to handle the situation described above using JOIN and RANK to
 join the databags; even though that approach is not optimal, it didn't
 work either - using RANK for the join generated runtime errors:
 
 B1 = FOREACH A GENERATE record_id, FLATTEN(names);
 B11 = RANK B1;
 B2 = FOREACH A GENERATE FLATTEN(ids);
 B22 = RANK B2;
 C = JOIN B11 BY rank_B1 LEFT OUTER, B22 BY rank_B2;   -- runtime error
 
 I spent some time reading the reference manual:
 http://pig.apache.org/docs/r0.8.1/piglatin_ref2.html
 http://pig.apache.org/docs/r0.11.0/basic.html
 and didn't identify a workaround for what I'm describing. I also read the
 UDF manual http://wiki.apache.org/pig/UDFManual to develop the NLET UDF.
 
 So far the UDF does generate the expected result/tuples but doesn't add
 the schema information. If nobody has implemented this and it is worth
 approving, I can spend time on adding the schema information and proper
 documentation.
 
 PS. I'm starting to get involved in the community and I will try to send
 emails before future development starts to avoid duplicated efforts.
 
 Best regards
 Alan del Rio
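The proposed NLET semantics (zip N bags of M elements each into M tuples of N fields) can be sketched in plain Java. This is a stand-in, not the actual UDF: a real version would be a Pig EvalFunc operating on DataBags and Tuples, whereas here Lists stand in for both:

```java
import java.util.ArrayList;
import java.util.List;

// Zip N "bags" that each hold the same number M of elements into M tuples
// of N fields, in element order.
final class Nlet {
    static List<List<Object>> zip(List<?>... bags) {
        int m = bags.length == 0 ? 0 : bags[0].size();
        for (List<?> bag : bags) {
            if (bag.size() != m) {
                throw new IllegalArgumentException("bags must have the same size");
            }
        }
        List<List<Object>> tuples = new ArrayList<>();
        for (int i = 0; i < m; i++) {
            List<Object> tuple = new ArrayList<>();
            for (List<?> bag : bags) {
                tuple.add(bag.get(i));   // i-th element of each bag
            }
            tuples.add(tuple);
        }
        return tuples;
    }
}
```

Here zip({ALAN, SARAI}, {7, 8}) yields [(ALAN, 7), (SARAI, 8)], matching bag B in the proposal once record_id is prepended by the FOREACH.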




Re: Slow Group By operator

2013-08-22 Thread Alan Gates
When data comes out of a map task, Hadoop serializes it so that it can know its 
exact size as it writes it into the output buffer.  To run it through the 
combiner it needs to deserialize it again, and then re-serialize it when it 
comes out.  So each pass through the combiner costs a serialize/deserialization 
pass, which is expensive and not worth it unless the data reduction is 
significant.  

In other words, the combiner can be slow because Java lacks a sizeof operator.

Alan.

On Aug 22, 2013, at 4:01 AM, Benjamin Jakobus wrote:

 Hi Cheolsoo,
 
 Thanks - I will try this now and get back to you.
 
 Out of interest; could you explain (or point me towards resources that
 would) why the combiner would be a problem?
 
 Also, could the fact that Pig builds an intermediary data structure (?)
 whilst Hive just performs a sort then the arithmetic operation explain the
 slowdown?
 
 (Apologies, I'm quite new to Pig/Hive - just my guesses).
 
 Regards,
 Benjamin
 
 
 On 22 August 2013 01:07, Cheolsoo Park piaozhe...@gmail.com wrote:
 
 Hi Benjamin,
 
 Thank you very much for sharing detailed information!
 
 1) From the runtime numbers that you provided, the mappers are very slow.
 
 CPU time spent (ms): 5,081,610 / 168,740 / 5,250,350
 CPU time spent (ms): 5,052,700 / 178,220 / 5,230,920
 CPU time spent (ms): 5,084,430 / 193,480 / 5,277,910
 
 2) In your GROUP BY query, you have an algebraic UDF COUNT.
 
 I am wondering whether disabling combiner will help here. I have seen a lot
 of cases where combiner actually hurt performance significantly if it
 doesn't combine mapper outputs significantly. Briefly looking at
 generate_data.pl in PIG-200, it looks like a lot of random keys are
 generated. So I guess you will end up with a large number of small bags
 rather than a small number of large bags. If that's the case, combiner will
 only add overhead to mappers.
 
 Can you try to include "set pig.exec.nocombiner true;" and see whether
 it helps?
 
 Thanks,
 Cheolsoo
 
 
 
 
 
 
 On Wed, Aug 21, 2013 at 3:52 AM, Benjamin Jakobus jakobusbe...@gmail.com
 wrote:
 
 Hi Cheolsoo,
 
 What's your query like? Can you share it? Do you call any algebraic UDF
 after group by? I am wondering whether combiner matters in your test.
 I have been running 3 different types of queries.
 
 The first was performed on datasets of 6 different sizes:
 
 
   - Dataset size 1: 30,000 records (772KB)
   - Dataset size 2: 300,000 records (6.4MB)
   - Dataset size 3: 3,000,000 records (63MB)
   - Dataset size 4: 30 million records (628MB)
   - Dataset size 5: 300 million records (6.2GB)
   - Dataset size 6: 3 billion records (62GB)
 
 The datasets scale linearly, whereby the size equates to 3000 * 10^n.
 A seventh dataset consisting of 1,000 records (23KB) was produced to
 perform join
 operations on. Its schema is as follows:
 name - string
 marks - integer
 gpa - float
 The data was generated using the generate_data.pl Perl script available
 for download from https://issues.apache.org/jira/browse/PIG-200 to produce
 the datasets. The results are as follows:
 
 
              Set 1    Set 2    Set 3    Set 4     Set 5     Set 6
 Arithmetic   32.82    36.21    49.49    83.25     423.63    3900.78
 Filter 10%   32.94    34.32    44.56    66.68     295.59    2640.52
 Filter 90%   33.93    32.55    37.86    53.22     197.36    1657.37
 Group        49.43    53.34    69.84    105.12    497.61    4394.21
 Join         49.89    50.08    78.55    150.39    1045.34   10258.19
 
 Averaged performance of arithmetic, join, group, order, distinct select
 and filter operations on six datasets using Pig. Scripts were configured
 to use 8 reduce and 11 map tasks.
 
 
 
              Set 1    Set 2    Set 3    Set 4     Set 5     Set 6
 Arithmetic   32.84    37.33    72.55    300.08    2633.72   27821.19
 Filter 10%   32.36    53.28    59.22    209.5     1672.3    18222.19
 Filter 90%   31.23    32.68    36.8     69.55     331.88    3320.59
 Group        48.27    47.68    46.87    53.66     141.36    1233.4
 Join         48.54    56.86    104.6    517.5     4388.34   -
 Distinct     48.73    53.28    72.54    109.77    -         -
 
 Averaged performance of arithmetic, join, group, distinct select and
 filter operations on six datasets using Hive. Scripts were configured
 to use 8 reduce and 11 map tasks.
 
 (If you want to see the standard deviation, let me know).
 
 So, to summarize the results: Pig outperforms Hive, with the exception of
 using *Group By*.
 
 The Pig scripts used for this benchmark are as follows:
 *Arithmetic*
 -- 

Re: JsonLoader fails the pig job in case of malformed json input

2013-08-08 Thread Alan Gates
Definitely, please provide a patch.

Alan.

On Aug 8, 2013, at 4:58 AM, Demeter Sztanko wrote:

 Hi all,
 
 Suppose I have a text file that contains only one line:
 {"a", "bad"}
 
 This is obviously not valid JSON.
 
 This input fails this simple script:
 b = load 'bad.input' using JsonLoader('a0: chararray');
 dump b;
 
 
 Same script works fine for this line:
 {"a": "good"}
 
 I was expecting it to just skip the line and continue.
 
 I could not find any bug report for this. Is anyone working on it?
 If not, would you mind if I submit a patch for it?
 Simple handling of the exception seems to solve the problem.
 
 Thanks,
 
 Dimi.
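The skip-bad-records pattern being proposed can be sketched in self-contained Java. A real fix would catch the JSON parser's exception inside JsonLoader and advance to the next input line; since the real parser isn't reproduced here, a toy validity check stands in for it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class SkipBadRecords {
    // Toy stand-in for the JSON parser: a record is "valid" only if it
    // looks like {"key": "value"}.
    static Optional<String> tryParse(String line) {
        try {
            if (!line.matches("\\{\\s*\"[^\"]+\"\\s*:\\s*\"[^\"]+\"\\s*\\}")) {
                throw new IllegalArgumentException("malformed: " + line);
            }
            return Optional.of(line);
        } catch (IllegalArgumentException e) {
            // Swallow the error and signal "skip this record".
            return Optional.empty();
        }
    }

    static List<String> loadAll(List<String> lines) {
        List<String> good = new ArrayList<>();
        for (String line : lines) {
            tryParse(line).ifPresent(good::add);   // bad lines are skipped
        }
        return good;
    }
}
```

One design note: silently dropping records can mask data problems, so a production version would typically also bump a "bad record" counter or log a warning per skipped line.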



Re: Pig and Storm

2013-07-24 Thread Alan Gates
This sounds exciting.  The next question is how do you plan to do it?  Would a 
physical plan be translated to a Storm job (or jobs)?  Would it need a 
different physical plan?  Or would you just have the connection at the language 
layer and all the planning separate?  Do you envision needing 
extensions/changes to the language to support Storm?  Feel free to add a page 
to Pig's wiki with your thoughts on an approach.

Alan.

On Jul 23, 2013, at 9:52 AM, Pradeep Gollakota wrote:

 Hi Pig Developers,
 
 I wanted to reach out to you all and ask for your opinion on something.
 
 As a Pig user, I have come to love Pig as a framework. Pig provides a great
 set of abstractions that make working with large datasets easy. Currently
 Pig is only backed by Hadoop. However, with the rise of Twitter Storm
 as a distributed real-time processing engine, Pig users are missing out on
 a great opportunity to be able to work with Pig in Storm. As a user of Pig,
 Hadoop and Storm, and keeping with the Pig philosophy of "Pigs live
 anywhere", I'd like to get your thoughts on starting the implementation of
 a Pig backend for Storm.
 
 Thanks
 Pradeep



[jira] [Updated] (PIG-2248) Pig parser does not detect when a macro name masks a UDF name

2013-07-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2248:


Status: Open  (was: Patch Available)

Canceling patch as discussion is still on-going as to best approach

 Pig parser does not detect when a macro name masks a UDF name
 -

 Key: PIG-2248
 URL: https://issues.apache.org/jira/browse/PIG-2248
 Project: Pig
  Issue Type: Bug
  Components: parser
Affects Versions: 0.9.0
Reporter: Alan Gates
Assignee: Johnny Zhang
Priority: Minor
 Attachments: PIG-2248.patch.txt, PIG-2248.patch.txt, 
 PIG-2248.patch.txt, PIG-2248.patch.txt


 Pig accepts a macro like:
 {code}
 define COUNT(in_relation, min_gpa) returns c {
 b = filter $in_relation by gpa >= $min_gpa;
$c = foreach b generate age, name;
}
 {code}
 This should produce a warning that it is masking a UDF.



[jira] [Commented] (PIG-3389) Set job.name does not work with dump command

2013-07-24 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718904#comment-13718904
 ] 

Alan Gates commented on PIG-3389:
-

+1

 Set job.name does not work with dump command
 --

 Key: PIG-3389
 URL: https://issues.apache.org/jira/browse/PIG-3389
 Project: Pig
  Issue Type: Bug
  Components: grunt
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Priority: Minor
 Fix For: 0.12

 Attachments: PIG-3389.patch


 The job.name property can be used to overwrite the default job name in Pig, 
 but the dump command does not honor it.
 To reproduce the issue, run the following commands in Grunt shell in MR mode:
 {code}
 SET job.name 'FOO';
 a = LOAD '/foo';
 DUMP a;
 {code}
 You will see the job name is not 'FOO' in the JT UI. However, using store 
 instead of dump sets the job name correctly.



[jira] [Updated] (PIG-3247) Piggybank functions to mimic OVER clause in SQL

2013-07-19 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3247:


  Resolution: Fixed
Release Note: Added OVER clause like functionality in Piggybank.
  Status: Resolved  (was: Patch Available)

Patch committed.  Thanks Cheolsoo for the review.

 Piggybank functions to mimic OVER clause in SQL
 ---

 Key: PIG-3247
 URL: https://issues.apache.org/jira/browse/PIG-3247
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: Over.2.patch, Over.patch


 In order to test Hive I have written some UDFs to mimic the behavior of SQL's 
 OVER clause.  I thought they would be useful to share.



[jira] [Resolved] (PIG-3372) test

2013-07-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved PIG-3372.
-

Resolution: Invalid

 test
 

 Key: PIG-3372
 URL: https://issues.apache.org/jira/browse/PIG-3372
 Project: Pig
  Issue Type: Test
  Components: impl
Reporter: Manuel
Priority: Trivial

 test



Fwd: DesignLounge @ HadoopSummit

2013-06-24 Thread Alan Gates


Begin forwarded message:

 From: Eric Baldeschwieler eri...@hortonworks.com
 Date: June 23, 2013 9:32:12 PM PDT
 To: common-...@hadoop.apache.org common-...@hadoop.apache.org, 
 mapreduce-...@hadoop.apache.org mapreduce-...@hadoop.apache.org, 
 hdfs-...@hadoop.apache.org hdfs-...@hadoop.apache.org
 Subject: DesignLounge @ HadoopSummit
 Reply-To: common-...@hadoop.apache.org
 
 Hi Folks,
 
 I've integrated the feedback I've gotten on the design lounge.  A couple of 
 clarifications:
 
 1) The space will be open both days of the formal summit.  Apache Committers 
 / contributors are invited to stop by any time and use the space to meet / 
 network any time during the show.
 
 2) Below I've listed the times that various project members have suggested 
 they will be present to talk with others contributors about their project.  
 If we get a big showing for any of these slots we'll encourage folks to do 
 the unconference thing: Select a set of topics they want to talk about and 
 break up into groups to do so.
 
 3) This is an experiment.  Our goal is to make the summit as useful as 
 possible to the folks who build the Apache projects in the Apache Hadoop 
 stack.  Please let me know how it works for you and ideas for making this 
 even more effective.
 
 Committed times so far, with topic champion (Note - I've adjusted suggested 
 times to fit with the program a bit more smoothly):
 
 Wednesday
 11-1 - Hive - Ashutosh - The stinger initiative and other Hive activities
 2 - 4 - Security breakout - Kevin Minder - HSSO, Knox, Rhino
 3 - 4 - Frameworks to run services like HBase on Yarn - Weave, Hoya … - 
 Devaraj Das
 4 - 5 - Accumulo - Billie Rinaldi
 
 
 Thursday
 11-1 - Finishing Yarn - Arun Murthy - Near term improvements needed
 2 - 4 - HDFS - Suresh & Sanjay
 4 - 5 - Getting involved in Apache - Billie Rinaldi
 
 
 See you all soon!
 
 E14
 
 PS Please forward to other Apache -dev lists and CC me.  Thanks!
 
 On Jun 11, 2013, at 10:42 AM, Eric Baldeschwieler eri...@hortonworks.com 
 wrote:
 
 Hi Folks,
 
 We thought we'd try something new at Hadoop Summit this year to build upon 
 two pieces of feedback I've heard a lot this year:
 
  • Apache project developers would like to take advantage of the Hadoop 
 summit to meet with their peers to on work on specific technical details of 
 their projects
  • That they want to do this during the summit, not before it starts or 
 at night. I've been told BoFs and other such traditional formats have not 
 historically worked for them, because they end up being about educating 
 users about their projects, not actually working with their peers on how to 
 make their projects better.
 So we are creating a space in the summit - marked in the event guide as 
 DesignLounge - concurrent with the presentation tracks where Apache Project 
 contributors can meet with their peers to plan the future of their project 
 or work through various technical issues near and dear to their hearts.
 
 We're going to provide white boards and message boards and let folks take it 
 from there in an unconference style.  We think there will be room for about 
 4 groups to meet at once.  Interested? Let me know what you think.  Send me 
 any ideas for how we can make this work best for you.
 
 The room will be 231A and B at the Hadoop Summit and will run from 10:30am 
 to 5:00pm on Day 1 (26th June), and we can also run from 10:30am to 5:00pm 
on Day 2 (27th June) if we have a lot of topics that folks want to cover.
 
 Some of the early topics some folks told me they hope can be covered:
 
  • Hadoop Core security proposals.  There are a couple of detailed 
 proposals circulating.  Let's get together and hash out the differences.
  • Accumulo 1.6 features
  • The Hive vectorization project.  Discussion of the design and how to 
 phase it in incrementally with minimum complexity.
  • Finishing Yarn - what things need to get done NOW to make Yarn more 
 effective
 If you are a project lead for one of the Apache projects, look at the 
 schedule below and suggest a few slots when you think it would be best for 
 your project to meet.  I'll try to work out a schedule where no more than 2 
 projects are using the lounge at once.  
 
 Day 1, 26th June: 10:30am - 12:30pm, 1:45pm - 3:30pm, 3:45pm - 5:00pm
 
 Day 2, 27th June: 10:30am - 12:30pm, 1:45pm - 3:30pm, 3:45pm - 5:00pm
 
 It will be up to you, the hadoop contributors, from there.
 
 Look forward to seeing you all at the summit,
 
 E14
 
 PS Please forward to the other -dev lists.  This event is for folks on the 
 -dev lists.
 
 



Fwd: DesignLounge @ HadoopSummit

2013-06-13 Thread Alan Gates


Begin forwarded message:

 From: Eric Baldeschwieler eri...@hortonworks.com
 Date: June 11, 2013 10:46:25 AM PDT
 To: common-...@hadoop.apache.org common-...@hadoop.apache.org
 Subject: DesignLounge @ HadoopSummit
 Reply-To: common-...@hadoop.apache.org
 
 Hi Folks,
 
 We thought we'd try something new at Hadoop Summit this year to build upon 
 two pieces of feedback I've heard a lot this year:
 
Apache project developers would like to take advantage of the Hadoop summit 
to meet with their peers to work on specific technical details of their 
projects
 That they want to do this during the summit, not before it starts or at 
 night. I've been told BoFs and other such traditional formats have not 
 historically worked for them, because they end up being about educating users 
 about their projects, not actually working with their peers on how to make 
 their projects better.
 So we are creating a space in the summit - marked in the event guide as 
 DesignLounge - concurrent with the presentation tracks where Apache Project 
 contributors can meet with their peers to plan the future of their project or 
 work through various technical issues near and dear to their hearts.
 
 We're going to provide white boards and message boards and let folks take it 
 from there in an unconference style.  We think there will be room for about 4 
 groups to meet at once.  Interested? Let me know what you think.  Send me any 
 ideas for how we can make this work best for you.
 
 The room will be 231A and B at the Hadoop Summit and will run from 10:30am to 
 5:00pm on Day 1 (26th June), and we can also run from 10:30am to 5:00pm on 
Day 2 (27th June) if we have a lot of topics that folks want to cover.
 
 Some of the early topics some folks told me they hope can be covered:
 
 Hadoop Core security proposals.  There are a couple of detailed proposals 
 circulating.  Let's get together and hash out the differences.
 Accumulo 1.6 features
 The Hive vectorization project.  Discussion of the design and how to phase it 
 in incrementally with minimum complexity.
 Finishing Yarn - what things need to get done NOW to make Yarn more effective
 If you are a project lead for one of the Apache projects, look at the 
 schedule below and suggest a few slots when you think it would be best for 
 your project to meet.  I'll try to work out a schedule where no more than 2 
 projects are using the lounge at once.  
 
 Day 1, 26th June: 10:30am - 12:30pm, 1:45pm - 3:30pm, 3:45pm - 5:00pm
 
 Day 2, 27th June: 10:30am - 12:30pm, 1:45pm - 3:30pm, 3:45pm - 5:00pm
 
 It will be up to you, the hadoop contributors, from there.
 
 Look forward to seeing you all at the summit,
 
 E14
 
 PS Please forward to the other -dev lists.  This event is for folks on the 
 -dev lists.
 



Re: Uploading patches for review

2013-06-06 Thread Alan Gates
I think it's fine for a reviewer to ask for a particular patch to be put in 
review board.  I think it would also be fine to put in our HowToContribute doc 
that for larger patches putting it in review board may help get it reviewed 
more quickly.  I'm not in favor of requiring it, as some reviewers don't use 
review board.

Alan.

On Jun 6, 2013, at 2:21 AM, Rohini Palaniswamy wrote:

 Hi,
Reviewing uploaded patches is easy for a few lines of change. But when
 the change is larger it is hard to read, review is more time-consuming, and at
 times you have to switch between the patch and Eclipse to get more context.
 Without the surrounding code it is also easy to miss things in review. Can
 we make it a practice to put patches up on Review Board for review when they
 are somewhat bigger? Commenting on the patch is also a
 breeze in Review Board.
 
 Thoughts ???
 
 Regards,
 Rohini



[jira] [Updated] (PIG-2956) Invalid cache specification for some streaming statement

2013-05-29 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2956:


Status: Patch Available  (was: Open)

 Invalid cache specification for some streaming statement
 

 Key: PIG-2956
 URL: https://issues.apache.org/jira/browse/PIG-2956
 Project: Pig
  Issue Type: Sub-task
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-2956-1_0.10.patch, PIG-2956-1.patch, PIG-2956-2.patch


 Another category of failure in e2e tests, such as ComputeSpec_1, 
 ComputeSpec_2, ComputeSpec_3, RaceConditions_1, RaceConditions_3, 
 RaceConditions_4, RaceConditions_7, RaceConditions_8.
 Here is stack:
 ERROR 6003: Invalid cache specification. File doesn't exist: C:/Program Files 
 (x86)/GnuWin32/bin/head.exe
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException:
  ERROR 2017: Internal error creating job configuration.
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:723)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:258)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:151)
 at org.apache.pig.PigServer.launchPlan(PigServer.java:1318)
 at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1303)
 at org.apache.pig.PigServer.execute(PigServer.java:1293)
 at org.apache.pig.PigServer.executeBatch(PigServer.java:364)
 at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:133)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
 at org.apache.pig.Main.run(Main.java:561)
 at org.apache.pig.Main.main(Main.java:111)
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6003: 
 Invalid cache specification. File doesn't exist: C:/Program Files 
 (x86)/GnuWin32/bin/head.exe
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1151)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1129)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:447)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2956) Invalid cache specification for some streaming statement

2013-05-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669566#comment-13669566
 ] 

Alan Gates commented on PIG-2956:
-

+1

 Invalid cache specification for some streaming statement
 

 Key: PIG-2956
 URL: https://issues.apache.org/jira/browse/PIG-2956
 Project: Pig
  Issue Type: Sub-task
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-2956-1_0.10.patch, PIG-2956-1.patch, PIG-2956-2.patch


 Another category of failure in e2e tests, such as ComputeSpec_1, 
 ComputeSpec_2, ComputeSpec_3, RaceConditions_1, RaceConditions_3, 
 RaceConditions_4, RaceConditions_7, RaceConditions_8.
 Here is stack:
 ERROR 6003: Invalid cache specification. File doesn't exist: C:/Program Files 
 (x86)/GnuWin32/bin/head.exe
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException:
  ERROR 2017: Internal error creating job configuration.
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:723)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:258)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:151)
 at org.apache.pig.PigServer.launchPlan(PigServer.java:1318)
 at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1303)
 at org.apache.pig.PigServer.execute(PigServer.java:1293)
 at org.apache.pig.PigServer.executeBatch(PigServer.java:364)
 at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:133)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
 at org.apache.pig.Main.run(Main.java:561)
 at org.apache.pig.Main.main(Main.java:111)
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6003: 
 Invalid cache specification. File doesn't exist: C:/Program Files 
 (x86)/GnuWin32/bin/head.exe
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1151)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.setupDistributedCache(JobControlCompiler.java:1129)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:447)



[jira] [Commented] (PIG-3257) Add unique identifier UDF

2013-05-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669593#comment-13669593
 ] 

Alan Gates commented on PIG-3257:
-

Would it make you happy if we added to the javadoc comments on this function 
not to use it as a key in the same job it's generated in?

 Add unique identifier UDF
 -

 Key: PIG-3257
 URL: https://issues.apache.org/jira/browse/PIG-3257
 Project: Pig
  Issue Type: Improvement
  Components: internal-udfs
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3257.patch


 It would be good to have a Pig function to generate unique identifiers.



[jira] [Commented] (PIG-3333) Fix remaining Windows core unit test failures

2013-05-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669771#comment-13669771
 ] 

Alan Gates commented on PIG-3333:
-

StreamingCommand.addPathToCache - This appears to always convert the path from 
/ to \.  Don't we only want to do this in the Windows case?  Alternatively we 
could always convert / and \ to System.getProperty("file.separator").

JavaCompilerHelp.addClassToPath - Rather than an if on Windows/Unix, why not just 
change it to 
{code}
this.classPath = this.classPath + System.getProperty("path.separator") + path;
{code}

It looks like a bunch of \r's slipped into TestSample.java
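The suggestion above can be sketched with the JDK's built-in separator constants, which avoid per-platform branches entirely (the method names here are illustrative, not Pig's actual code):

```java
import java.io.File;

public class PortablePaths {
    // Build a classpath string portably: File.pathSeparator is ";" on Windows
    // and ":" on Unix, equivalent to System.getProperty("path.separator").
    static String addToClassPath(String classPath, String entry) {
        return classPath + File.pathSeparator + entry;
    }

    // Normalize a path to the platform's component separator instead of
    // hard-coding a "/" to "\" conversion.
    static String normalize(String path) {
        return path.replace('/', File.separatorChar)
                   .replace('\\', File.separatorChar);
    }

    public static void main(String[] args) {
        System.out.println(addToClassPath("a.jar", "b.jar"));
        System.out.println(normalize("foo/bar"));
    }
}
```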



 Fix remaining Windows core unit test failures
 -

 Key: PIG-3333
 URL: https://issues.apache.org/jira/browse/PIG-3333
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-3333-1.patch


 I combine a bunch of Windows unit test fixes into one patch to make things 
 cleaner. They all originated from obvious Windows/Unix inconsistencies, which 
 include:
 1. Path separator inconsistency: / vs \
 2. Path component separator inconsistency: : vs ;
 3. volume: is not acceptable as a URI
 4. Unix tools/commands (eg, bash, rm) do not exist in Windows
 5. .sh scripts need a .cmd companion in Windows
 6. \r\n vs \n as newline
 7. Environment variables use different names (USER vs USERNAME)
 8. File not closed: not an issue in Unix, but an issue in Windows (not able 
 to remove an open file)
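Most of the categories in this list map to standard portable Java idioms; a brief stdlib sketch of those idioms (illustrative only, not Pig's actual fix):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class PortabilityDemo {
    public static void main(String[] args) throws IOException {
        // 1 & 2: take separators from the platform instead of hard-coding them.
        String fileSep = java.io.File.separator;      // "\" on Windows, "/" on Unix
        String pathSep = java.io.File.pathSeparator;  // ";" on Windows, ":" on Unix

        // 6: the platform newline instead of a literal "\n".
        String newline = System.lineSeparator();

        // 7: the user.name system property works on both platforms, unlike
        // the USER / USERNAME environment variables.
        String user = System.getProperty("user.name");

        // 8: try-with-resources guarantees the file is closed; Windows
        // cannot delete a file that is still open.
        Path tmp = Files.createTempFile("demo", ".txt");
        try (BufferedReader r = Files.newBufferedReader(tmp)) {
            r.readLine();
        }
        Files.delete(tmp);  // would fail on Windows if r were still open

        System.out.println(user + pathSep + fileSep + newline);
    }
}
```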



[jira] [Commented] (PIG-3334) Fix Windows piggybank unit test failures

2013-05-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669774#comment-13669774
 ] 

Alan Gates commented on PIG-3334:
-

+1

 Fix Windows piggybank unit test failures
 

 Key: PIG-3334
 URL: https://issues.apache.org/jira/browse/PIG-3334
 Project: Pig
  Issue Type: Sub-task
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-3334-1.patch






[jira] [Commented] (PIG-3337) Fix remaining Window e2e tests

2013-05-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13669776#comment-13669776
 ] 

Alan Gates commented on PIG-3337:
-

+1

 Fix remaining Window e2e tests
 --

 Key: PIG-3337
 URL: https://issues.apache.org/jira/browse/PIG-3337
 Project: Pig
  Issue Type: Sub-task
  Components: e2e harness
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-3337-1.patch






[jira] [Commented] (PIG-3257) Add unique identifier UDF

2013-05-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13668691#comment-13668691
 ] 

Alan Gates commented on PIG-3257:
-

No it would not, but it would be very weird to use this as a key anyway, since 
it would produce a different random key for each record.  I can't see how it 
would matter whether it produced random key X1 vs random key X2 for any given 
record.

 Add unique identifier UDF
 -

 Key: PIG-3257
 URL: https://issues.apache.org/jira/browse/PIG-3257
 Project: Pig
  Issue Type: Improvement
  Components: internal-udfs
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3257.patch


 It would be good to have a Pig function to generate unique identifiers.



[jira] [Comment Edited] (PIG-3257) Add unique identifier UDF

2013-05-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13668748#comment-13668748
 ] 

Alan Gates edited comment on PIG-3257 at 5/28/13 10:32 PM:
---

I don't see how records can be missing or redundant.  Take the following query:

{code}
A = load ...
B = group A by UUID();
C = foreach B...
{code}

This won't reduce at all.  For every record it is totally irrelevant what 
particular value its key is, because it's guaranteed to be unique for each 
record.  So 1) this is a totally meaningless thing to do; 2) if a particular 
map does get rerun or is used in speculative execution it doesn't matter 
because which particular key is generated by UUID is irrelevant.  The way this 
is intended to be used is something like this:

{code}
A = load 'over100k' using org.apache.hcatalog.pig.HCatLoader();
B = foreach A generate *, UUID();
C = group B by s;
D = foreach C generate flatten(B), SUM(B.i) as sum_b;
E = group B by si;
F = foreach E generate flatten(B), SUM(B.f) as sum_f;
G = join D by uuid, F by uuid;
H = foreach G generate D::B::s, sum_b, sum_f;
store H into 'output';
{code}


  was (Author: alangates):
I don't see how records can be missing or redundant.  Take the following 
query:

{code}
A = load ...
B = group A by UUID();
C = foreach B...
{code}

This won't reduce at all.  For every record it is totally irrelevant what 
particular value its key is, because it's guaranteed to be unique for each 
record.  So 1) this is a totally meaningless thing to do; 2) if a particular 
map does get rerun or is used in speculative execution it doesn't matter 
because which particular key is generated by UUID is irrelevant.  The way this 
is intended to be used is something like this:

{code}
A = load 'over100k' using org.apache.hcatalog.pig.HCatLoader();
B = foreach A generate *, UUID();
C = group B by s;
D = foreach C generate flatten(B), SUM(B.i) as sum_b;
E = group B by si;
F = foreach E generate flatten(B), SUM(B.f) as sum_f;
G = join D by uuid, F by uuid;
H = foreach G generate D::B::s, sum_b, sum_f;
store H into 'output';
{code}

  
 Add unique identifier UDF
 -

 Key: PIG-3257
 URL: https://issues.apache.org/jira/browse/PIG-3257
 Project: Pig
  Issue Type: Improvement
  Components: internal-udfs
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3257.patch


 It would be good to have a Pig function to generate unique identifiers.



[jira] [Commented] (PIG-3257) Add unique identifier UDF

2013-05-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13668748#comment-13668748
 ] 

Alan Gates commented on PIG-3257:
-

I don't see how records can be missing or redundant.  Take the following query:

{code}
A = load ...
B = group A by UUID();
C = foreach B...
{code}

This won't reduce at all.  For every record it is totally irrelevant what 
particular value its key is, because it's guaranteed to be unique for each 
record.  So 1) this is a totally meaningless thing to do; 2) if a particular 
map does get rerun or is used in speculative execution it doesn't matter 
because which particular key is generated by UUID is irrelevant.  The way this 
is intended to be used is something like this:

{code}
A = load 'over100k' using org.apache.hcatalog.pig.HCatLoader();
B = foreach A generate *, UUID();
C = group B by s;
D = foreach C generate flatten(B), SUM(B.i) as sum_b;
E = group B by si;
F = foreach E generate flatten(B), SUM(B.f) as sum_f;
G = join D by uuid, F by uuid;
H = foreach G generate D::B::s, sum_b, sum_f;
store H into 'output';
{code}
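The proposed UDF is essentially a wrapper around the JDK's UUID generator; a minimal stand-alone sketch of the behavior being discussed (each call yields a fresh value, which is why grouping on it never collapses records):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class UuidDemo {
    // Each call returns a fresh identifier, so grouping records by UUID()
    // produces one group per record and the job "won't reduce at all".
    static String nextId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            seen.add(nextId());
        }
        System.out.println(seen.size());  // 1000: no duplicates
    }
}
```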


 Add unique identifier UDF
 -

 Key: PIG-3257
 URL: https://issues.apache.org/jira/browse/PIG-3257
 Project: Pig
  Issue Type: Improvement
  Components: internal-udfs
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3257.patch


 It would be good to have a Pig function to generate unique identifiers.



Re: CHANGES.txt in trunk

2013-05-06 Thread Alan Gates
Cool, just wanted to make sure.  I agree this is a good idea.

Alan.

On May 5, 2013, at 7:06 PM, Rohini Palaniswamy wrote:

 Alan,
  I meant relocating only - Moving jiras from 0.12 to 0.11.x releases
 section :).
 
 Regards,
 Rohini
 
 
 On Fri, May 3, 2013 at 3:08 PM, Alan Gates ga...@hortonworks.com wrote:
 
What do you mean by remove?  They should still be in the file.  They may need
 to be relocated under the 0.11 section.  But the trunk CHANGES file should
 include all changes that are on trunk.
 
 Alan.
 
 On May 3, 2013, at 1:34 PM, Rohini Palaniswamy wrote:
 
 Hi,
  I see lot of patches that went into 0.11 are under trunk in the
 CHANGES.txt. Should we sync the file with the CHANGES.txt in branch-0.11
 and remove those jiras from trunk that went into 0.11? What is the usual
 process of updating CHANGES.txt when a jira is checked both into a branch
 and also trunk?
 
 Regards,
 Rohini
 
 



Re: CHANGES.txt in trunk

2013-05-03 Thread Alan Gates
What do you mean by remove?  They should still be in the file.  They may need to be 
relocated under the 0.11 section.  But the trunk CHANGES file should include 
all changes that are on trunk.

Alan.

On May 3, 2013, at 1:34 PM, Rohini Palaniswamy wrote:

 Hi,
   I see lot of patches that went into 0.11 are under trunk in the
 CHANGES.txt. Should we sync the file with the CHANGES.txt in branch-0.11
 and remove those jiras from trunk that went into 0.11? What is the usual
 process of updating CHANGES.txt when a jira is checked both into a branch
 and also trunk?
 
 Regards,
 Rohini



Re: A major addition to Pig. Working with spatial data

2013-05-02 Thread Alan Gates
I know this is frustrating, but the different licenses do have different 
requirements that make it so that Apache can't ship GPL code.  A legal 
explanation is at http://www.apache.org/licenses/GPL-compatibility.html  For 
additional info on the LGPL specific questions see 
http://www.apache.org/legal/3party.html

As far as pulling it in via ivy, the issue isn't so much where the code lives 
as much as what code we are requiring to make Pig work.  If something that is 
[L]GPL is required for Pig it violates Apache rules as outlined above.  It also 
would be a show stopper for a lot of companies that redistribute Pig and that 
are allergic to GPL software.

So, as I said before, if you wanted to continue with that library and they are 
not willing to relicense it then it would have to be bolted on after Apache Pig 
is built.  Nothing stops you from doing this by downloading Apache Pig, adding 
this library and your code, and redistributing, though it wouldn't then be open 
to all Pig users.

Alan.

On May 1, 2013, at 6:08 PM, Ahmed Eldawy wrote:

 Thanks for your response. I was never good at differentiating all those
 open source licenses. I mean, what is the point of making open source licenses
 if they block me from using a library in an open source project? Anyway,
 I'm not going to get into a debate here. Just one question: if we use JTS as a
 library (jar file) without adding the code in Pig, is it still a violation?
 We'll use ivy, for example, to download the jar file when compiling.
 On May 1, 2013 7:50 PM, Alan Gates ga...@hortonworks.com wrote:
 
 Passing on the technical details for a moment, I see a licensing issue.
 JTS is licensed under LGPL.  Apache projects cannot contain or ship
 [L]GPL.  Apache does not meet the requirements of GPL and thus we cannot
 repackage their code. If you wanted to go forward using that class this
 would have to be packaged as an add on that was downloaded separately and
 not from Apache.  Another option is to work with the JTS community and see
 if they are willing to dual license their code under BSD or Apache license
 so that Pig could include it.  If neither of those are an option you would
 need to come up with a new class to contain your spatial data.
 
 Alan.
 
 On May 1, 2013, at 5:40 PM, Ahmed Eldawy wrote:
 
 Hi all,
 First, sorry for the long email. I wanted to put all my thoughts here
 and
 get your feedback.
 I'm proposing a major addition to Pig that will greatly increase its
 functionality and user base. It is simply to add spatial support to the
 language and the framework. I've already started working on that but I
 don't want it to be just another branch. I want it, eventually, to be
 merged with the trunk of Apache Pig. So, I'm sending this email mainly to
reach out to the main contributors of Pig to see the feasibility of this.
 This addition is a part of a big project we have been working on in
 University of Minnesota; the project is called Spatial Hadoop.
 http://spatialhadoop.cs.umn.edu. It's about building a MapReduce
 framework
 (Hadoop) that is capable of maintaining and analyzing spatial data
 efficiently. I'm the main guy behind that project and since we released
 its
 first version, we received very encouraging responses from different
 groups
 in the research and industrial community. I'm sure the addition we want
 to
 make to Pig Latin will be widely accepted by the people in the spatial
 community.
 I'm proposing a plan here while we're still in the early phases of this
 task to be able to discuss it with the main contributors and see its
 feasibility. First of all, I think that we need to change the core of Pig
 to be able to support spatial data. Providing a set of UDFs only is not
 enough. The main reason is that Pig Latin does not provide a way to
 create
 a new data type which is needed for spatial data. Once we have the
 spatial
 data types we need, the functionality can be expanded using more UDFs.
 
 Here's the plan as I see it.
 1- Introduce a new primitive data type Geometry which represents all
 spatial data types. In the underlying system, this will map to
 com.vividsolutions.jts.geom.Geometry. This is a class from Java Topology
 Suite (JTS) [http://www.vividsolutions.com/jts/JTSHome.htm], a stable
 and
 efficient open source Java library for spatial data types and algorithms.
 It is very popular in the spatial community and a C++ port of it is used
 in
 PostGIS [http://postgis.net/] (a spatial library for Postgres). JTS also
 conforms with Open Geospatial Consortium (OGC) [
 http://www.opengeospatial.org/] which is an open standard for the
 spatial
 data types. The Geometry data type is read from and written to text files
 using the Well Known Text (WKT) format. There is also a way to convert it
 to/from binary so that it can work with binary files and streams.
 2- Add functions that manipulate spatial data types. These will be added
 as
 UDFs and we will not need to mess with the internals of Pig. Most
 probably

Re: A major addition to Pig. Working with spatial data

2013-05-01 Thread Alan Gates
Passing on the technical details for a moment, I see a licensing issue.  JTS is 
licensed under LGPL.  Apache projects cannot contain or ship [L]GPL.  Apache 
does not meet the requirements of GPL and thus we cannot repackage their code. 
If you wanted to go forward using that class this would have to be packaged as 
an add on that was downloaded separately and not from Apache.  Another option 
is to work with the JTS community and see if they are willing to dual license 
their code under BSD or Apache license so that Pig could include it.  If 
neither of those are an option you would need to come up with a new class to 
contain your spatial data.

Alan.

On May 1, 2013, at 5:40 PM, Ahmed Eldawy wrote:

 Hi all,
  First, sorry for the long email. I wanted to put all my thoughts here and
 get your feedback.
  I'm proposing a major addition to Pig that will greatly increase its
 functionality and user base. It is simply to add spatial support to the
 language and the framework. I've already started working on that but I
 don't want it to be just another branch. I want it, eventually, to be
 merged with the trunk of Apache Pig. So, I'm sending this email mainly to
reach out to the main contributors of Pig to see the feasibility of this.
 This addition is a part of a big project we have been working on in
 University of Minnesota; the project is called Spatial Hadoop.
 http://spatialhadoop.cs.umn.edu. It's about building a MapReduce framework
 (Hadoop) that is capable of maintaining and analyzing spatial data
 efficiently. I'm the main guy behind that project and since we released its
 first version, we received very encouraging responses from different groups
 in the research and industrial community. I'm sure the addition we want to
 make to Pig Latin will be widely accepted by the people in the spatial
 community.
 I'm proposing a plan here while we're still in the early phases of this
 task to be able to discuss it with the main contributors and see its
 feasibility. First of all, I think that we need to change the core of Pig
 to be able to support spatial data. Providing a set of UDFs only is not
 enough. The main reason is that Pig Latin does not provide a way to create
 a new data type which is needed for spatial data. Once we have the spatial
 data types we need, the functionality can be expanded using more UDFs.
 
 Here's the plan as I see it.
 1- Introduce a new primitive data type Geometry which represents all
 spatial data types. In the underlying system, this will map to
 com.vividsolutions.jts.geom.Geometry. This is a class from Java Topology
 Suite (JTS) [http://www.vividsolutions.com/jts/JTSHome.htm], a stable and
 efficient open source Java library for spatial data types and algorithms.
 It is very popular in the spatial community and a C++ port of it is used in
 PostGIS [http://postgis.net/] (a spatial library for Postgres). JTS also
 conforms with Open Geospatial Consortium (OGC) [
 http://www.opengeospatial.org/] which is an open standard for the spatial
 data types. The Geometry data type is read from and written to text files
 using the Well Known Text (WKT) format. There is also a way to convert it
 to/from binary so that it can work with binary files and streams.
 2- Add functions that manipulate spatial data types. These will be added as
 UDFs and we will not need to mess with the internals of Pig. Most probably,
 there will be one new class for each operation (e.g., union or
 intersection). I think it would be good to put these new operations inside
 the core of Pig so that users can use them without having to write the fully
 qualified class name. Also, since there is no way to implicitly cast a
 spatial data type to a non-spatial data type, there will not be any
 conflicts between existing operations and new ones. All new operations, and
 only the new operations, will work on spatial data types. Here is an
 initial list of operations that can be added. All of these operations are
 already implemented in JTS, and the UDFs added to Pig will be just wrappers
 around them.
 **Predicates (used for spatial filtering)
 Equals
 Disjoint
 Intersects
 Touches
 Crosses
 Within
 Contains
 Overlaps
 
 **Operations
 Envelope
 Area
 Length
 Buffer
 ConvexHull
 Intersection
 Union
 Difference
 SymDifference
 
 **Aggregate functions
 Accum
 ConvexHull
 Union
 
 3- The third step is to implement spatial indexes (e.g., Grid or R-tree).
 Pig loader and storer classes will be created for those indexes. Note
 that we currently have SpatialOutputFormat and SpatialInputFormat for those
 indexes inside the Spatial Hadoop project, but we need to tweak them to
 work with Pig.
 
 4- (Advanced) Implement more sophisticated algorithms for spatial
 operations that utilize the indexes. For example, we can have a specific
 algorithm for spatial range query or spatial join. Again, we already have
 algorithms built for different operations implemented in Spatial Hadoop as
 MapReduce programs, but they will need 

[jira] [Updated] (PIG-3010) Allow UDFs to flatten themselves

2013-04-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3010:


Status: Open  (was: Patch Available)

Patch no longer applies.  This causes review board to not show the diffs 
either.  Sorry for waiting so long on this.

 Allow UDFs to flatten themselves
 -

 Key: PIG-3010
 URL: https://issues.apache.org/jira/browse/PIG-3010
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3010-0.patch, PIG-3010-1.patch, 
 PIG-3010-2_nowhitespace.patch, PIG-3010-2.patch, PIG-3010-3_nows.patch, 
 PIG-3010-3.patch, PIG-3010-4_nows.patch, PIG-3010-4.patch, 
 PIG-3010-5_nows.patch, PIG-3010-5.patch


 This is something I thought would be cool for a while, so I sat down and did 
 it because I think there are some useful debugging tools it'd help with.
 The idea is that if you attach an annotation to a UDF, the Tuple or DataBag 
 you output will be flattened. This is quite powerful. A very common pattern 
 is:
 a = foreach data generate Flatten(MyUdf(thing)) as (a,b,c);
 This would let you just do:
 a = foreach data generate MyUdf(thing);
 With the exact same result!
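The mechanism can be sketched as a runtime-retained marker annotation that the planner checks by reflection. The annotation and class names below are hypothetical; the actual patch may use different ones.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical marker annotation; the patch's real annotation name may differ.
@Retention(RetentionPolicy.RUNTIME)
@interface OutputsFlattened {}

// A UDF class carrying the marker; Pig would flatten its Tuple/DataBag output.
@OutputsFlattened
class MyUdf {}

public class FlattenCheck {
    // How the planner could detect, at script-compile time, that a UDF's
    // output should be flattened.
    static boolean wantsFlattening(Class<?> udfClass) {
        return udfClass.isAnnotationPresent(OutputsFlattened.class);
    }

    public static void main(String[] args) {
        System.out.println(wantsFlattening(MyUdf.class)); // true
    }
}
```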

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Reopened] (PIG-3164) Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix.

2013-04-25 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reopened PIG-3164:
-


Backed these changes out; I should never have checked them in.  I missed that 
this was only in test and not in main, so I ended up compiling the wrong thing 
to make sure this worked.

UDFs should not be added under piggybank/java/src/test.  That's for unit tests 
for the UDF.  The UDFs should be under piggybank/java/src/main.  

Thanks Niels for catching my mistake.

 Pig current releases lack a UDF endsWith. This UDF tests if a given string 
 ends with the specified suffix.
 -

 Key: PIG-3164
 URL: https://issues.apache.org/jira/browse/PIG-3164
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Anuroopa George
Assignee: Anuroopa George
 Fix For: 0.12

 Attachments: ENDSWITH.java.patch, ENDSWITH_updated.java


 Pig's current releases lack a UDF endsWith. This UDF tests whether a given 
 string ends with the specified suffix. It returns true if the character 
 sequence represented by the suffix argument is a suffix of the character 
 sequence represented by the given string, and false otherwise. True is also 
 returned if the given suffix is an empty string or is equal to the given 
 string.
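The described semantics map directly onto Java's String.endsWith, which already handles the empty-suffix and equal-string cases. A minimal sketch of the core logic, stripped of Pig's EvalFunc scaffolding (class and method names here are illustrative):

```java
public class EndsWithUdf {
    // Core logic of the proposed ENDSWITH UDF. Null inputs yield null,
    // following the usual Pig UDF convention for missing data.
    public static Boolean endsWith(String str, String suffix) {
        if (str == null || suffix == null) {
            return null;
        }
        // String.endsWith returns true for an empty suffix and for a
        // suffix equal to the whole string, matching the description.
        return str.endsWith(suffix);
    }
}
```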

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3027) pigTest unit test needs a newline filter for comparisons of golden multi-line

2013-04-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3027:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks John.

 pigTest unit test needs a newline filter for comparisons of golden multi-line
 -

 Key: PIG-3027
 URL: https://issues.apache.org/jira/browse/PIG-3027
 Project: Pig
  Issue Type: Sub-task
  Components: build
Affects Versions: 0.10.0
Reporter: John Gordon
Assignee: John Gordon
 Fix For: 0.12

 Attachments: PIG-3027.trunk.1.patch


 pigTest leverages assertOutput throughout for text file comparisons to golden 
 checked-in baselines.  This method doesn't take into account line ending 
 differences across platforms.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3198) Let users use any function from PigType - PigType as if it were builtin

2013-04-18 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13635744#comment-13635744
 ] 

Alan Gates commented on PIG-3198:
-

I looked through this.  Other than stray tabs (rather than spaces) in some of 
the files it looks good.  +1.  I think this is exciting functionality.  I'm 
glad to see it added.

 Let users use any function from PigType - PigType as if it were builtin
 -

 Key: PIG-3198
 URL: https://issues.apache.org/jira/browse/PIG-3198
 Project: Pig
  Issue Type: Bug
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3198-0.patch


 This idea is an extension of PIG-2643. Ideally, someone should be able to 
 call any function currently registered in Pig as if it were builtin.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3173) Partition filter push down does not happen when partition key conditions include an AND and OR construct

2013-04-18 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3173:


Status: Open  (was: Patch Available)

Canceling patch until feedback from Dmitriy is addressed.

 Partition filter push down does not happen when partition key conditions 
 include an AND and OR construct
 --

 Key: PIG-3173
 URL: https://issues.apache.org/jira/browse/PIG-3173
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.1
Reporter: Rohini Palaniswamy
Assignee: Rohini Palaniswamy
 Fix For: 0.12

 Attachments: PIG-3173-1.patch


 A = load 'db.table' using org.apache.hcatalog.pig.HCatLoader();
 B = filter A by (region=='usa' AND dt=='201302051800') OR (region=='uk' AND 
 dt=='201302051800');
 C = foreach B generate name, age;
 DUMP C;
 gives the below warning and scans the whole table.
 2013-02-06 22:22:16,233 [main] WARN  
 org.apache.pig.newplan.PColFilterExtractor  - No partition filter push down: 
 You have an partition column (region ) in a construction like: (pcond  and 
 ...) or (pcond and ...) where pcond is a condition on a partition column.
 2013-02-06 22:22:16,233 [main] WARN  
 org.apache.pig.newplan.PColFilterExtractor  - No partition filter push down: 
 You have an partition column (datestamp ) in a construction like: (pcond  and 
 ...) or (pcond and ...) where pcond is a condition on a partition column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (PIG-3164) Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix.

2013-04-18 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-3164:
---

Assignee: Anuroopa George

 Pig current releases lack a UDF endsWith. This UDF tests if a given string 
 ends with the specified suffix.
 -

 Key: PIG-3164
 URL: https://issues.apache.org/jira/browse/PIG-3164
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Anuroopa George
Assignee: Anuroopa George
 Fix For: 0.12

 Attachments: ENDSWITH.java.patch, ENDSWITH_updated.java


 Pig's current releases lack a UDF endsWith. This UDF tests whether a given 
 string ends with the specified suffix. It returns true if the character 
 sequence represented by the suffix argument is a suffix of the character 
 sequence represented by the given string, and false otherwise. True is also 
 returned if the given suffix is an empty string or is equal to the given 
 string.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3164) Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix.

2013-04-18 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3164:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks Anuroopa.

 Pig current releases lack a UDF endsWith. This UDF tests if a given string 
 ends with the specified suffix.
 -

 Key: PIG-3164
 URL: https://issues.apache.org/jira/browse/PIG-3164
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Anuroopa George
Assignee: Anuroopa George
 Fix For: 0.12

 Attachments: ENDSWITH.java.patch, ENDSWITH_updated.java


 Pig's current releases lack a UDF endsWith. This UDF tests whether a given 
 string ends with the specified suffix. It returns true if the character 
 sequence represented by the suffix argument is a suffix of the character 
 sequence represented by the given string, and false otherwise. True is also 
 returned if the given suffix is an empty string or is equal to the given 
 string.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3114) Duplicated macro name error when using pigunit

2013-04-18 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3114:


Status: Open  (was: Patch Available)

Canceling patch pending agreement on how to address the issue.

 Duplicated macro name error when using pigunit
 --

 Key: PIG-3114
 URL: https://issues.apache.org/jira/browse/PIG-3114
 Project: Pig
  Issue Type: Bug
  Components: parser
Affects Versions: 0.11
Reporter: Chetan Nadgire
Assignee: Chetan Nadgire
 Fix For: 0.12

 Attachments: PIG-3114.patch, PIG-3114.patch


 I'm using PigUnit to test a Pig script within which a macro is defined.
 Pig runs fine on the cluster, but I get a parsing error with PigUnit.
 So I tried a very basic Pig script with a macro and got a similar error.
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during 
 parsing. line 9 null. Reason: Duplicated macro name 'my_macro_1'
   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607)
   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1546)
   at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
   at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:988)
   at 
 org.apache.pig.pigunit.pig.GruntParser.processPig(GruntParser.java:61)
   at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
   at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
   at 
 org.apache.pig.pigunit.pig.PigServer.registerScript(PigServer.java:56)
   at org.apache.pig.pigunit.PigTest.registerScript(PigTest.java:160)
   at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:231)
   at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:261)
   at FirstPigTest.MyPigTest.testTop2Queries(MyPigTest.java:32)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:176)
   at junit.framework.TestCase.runBare(TestCase.java:141)
   at junit.framework.TestResult$1.protect(TestResult.java:122)
   at junit.framework.TestResult.runProtected(TestResult.java:142)
   at junit.framework.TestResult.run(TestResult.java:125)
   at junit.framework.TestCase.run(TestCase.java:129)
   at junit.framework.TestSuite.runTest(TestSuite.java:255)
   at junit.framework.TestSuite.run(TestSuite.java:250)
   at 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
   at 
 org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
   at 
 org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
   at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
   at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
   at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
   at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
 Caused by: Failed to parse: line 9 null. Reason: Duplicated macro name 
 'my_macro_1'
   at 
 org.apache.pig.parser.QueryParserDriver.makeMacroDef(QueryParserDriver.java:406)
   at 
 org.apache.pig.parser.QueryParserDriver.expandMacro(QueryParserDriver.java:277)
   at 
 org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:178)
   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1599)
   ... 30 more
  
 Pig script which is failing :
 {code:title=test.pig|borderStyle=solid}
 DEFINE my_macro_1 (QUERY, A) RETURNS C {
 $C = ORDER $QUERY BY total DESC, $A;
 } ;
 data =  LOAD 'input' AS (query:CHARARRAY);
 queries_group = GROUP data BY query;
 queries_count = FOREACH queries_group GENERATE group AS query, COUNT(data) AS 
 total;
 queries_ordered = my_macro_1(queries_count, query);
 queries_limit = LIMIT queries_ordered 2;
 STORE queries_limit INTO 'output';
 {code}
 If I remove the macro, PigUnit works fine. Even just defining a macro without 
 using it results in a parsing error.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3237) Pig current releases lack a UDF MakeSet(). This UDF returns a set value (a string containing substrings separated by , characters) consisting of the strings that have the

2013-04-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3237:


Fix Version/s: (was: 0.10.0)
   Status: Open  (was: Patch Available)

Thanks for the patch.  Some belated feedback.

# Please add some documentation (preferably in the form of javadocs on the 
class) explaining what this does.  Looking over the code it's not clear to me 
what you're trying to accomplish or even how this is related to creating a set.
# It needs unit tests
# You're hard-wiring the number of allowed tokens in a couple of places; bits[] 
and strings[] both have hard-coded sizes.  This will result in 
IndexOutOfBoundsExceptions with no error message indicating why.  These should 
be extensible, or at least check the bounds and tell users they have exceeded 
them.
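For reference, the described semantics can be sketched with bounds checking instead of hard-coded sizes. This assumes the function is modeled on MySQL's MAKE_SET (bit i of the first argument selects the i-th string); the class and method names are illustrative, not the patch's.

```java
import java.util.ArrayList;
import java.util.List;

public class MakeSetSketch {
    // Bit i of `bits` selects strings[i]; selected strings are joined
    // with commas. The loop bound comes from the actual argument count,
    // so no index can run past the arrays.
    public static String makeSet(long bits, String... strings) {
        List<String> selected = new ArrayList<>();
        for (int i = 0; i < strings.length && i < 64; i++) {
            if ((bits & (1L << i)) != 0 && strings[i] != null) {
                selected.add(strings[i]);
            }
        }
        return String.join(",", selected);
    }
}
```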

 Pig current releases lack a UDF MakeSet(). This UDF returns a set value (a 
 string containing substrings separated by , characters) consisting of the 
 strings that have the corresponding bit in the first argument
 

 Key: PIG-3237
 URL: https://issues.apache.org/jira/browse/PIG-3237
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.10.0
Reporter: Seethal Vincent
 Attachments: MakeSet.java.patch


 Pig current releases lack a UDF MakeSet(). This UDF returns a set value (a 
 string containing substrings separated by , characters) consisting of the 
 strings that have the corresponding bit in the first argument

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3238) Pig current releases lack a UDF Stuff(). This UDF deletes a specified length of characters and inserts another set of characters at a specified starting point.

2013-04-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3238:


Fix Version/s: (was: 0.10.0)
   Status: Open  (was: Patch Available)

 Pig current releases lack a UDF Stuff(). This UDF deletes a specified length 
 of characters and inserts another set of characters at a specified starting 
 point.
 ---

 Key: PIG-3238
 URL: https://issues.apache.org/jira/browse/PIG-3238
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.10.0
Reporter: Sonu Prathap
 Attachments: Stuff.java.patch


 Pig current releases lack a UDF Stuff(). This UDF deletes a specified length 
 of characters and inserts another set of characters at a specified starting 
 point.
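A sketch of the described semantics, assuming they mirror SQL Server's STUFF (1-based start position); the class and method names are illustrative:

```java
public class StuffSketch {
    // Delete `length` characters of `str` starting at 1-based position
    // `start`, then insert `replacement` at that position. Returns null
    // for out-of-range arguments rather than throwing.
    public static String stuff(String str, int start, int length, String replacement) {
        if (str == null || replacement == null
                || start < 1 || start > str.length() || length < 0) {
            return null;
        }
        int from = start - 1;                            // convert to 0-based
        int to = Math.min(from + length, str.length());  // clamp deletion end
        return str.substring(0, from) + replacement + str.substring(to);
    }
}
```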

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3215) [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files

2013-04-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3215:


Status: Open  (was: Patch Available)

 [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files
 

 Key: PIG-3215
 URL: https://issues.apache.org/jira/browse/PIG-3215
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Reporter: MIYAKAWA Taku
Assignee: MIYAKAWA Taku
  Labels: piggybank
 Attachments: LTSVLoader-6.html, LTSVLoader.html, PIG-3215-6.patch, 
 PIG-3215.patch


 LTSV, or Labeled Tab-separated Values format is now getting popular in Japan 
 for log files, especially of web servers. The goal of this jira is to add 
 LTSVLoader in PiggyBank to load LTSV files.
 LTSV is based on TSV, so columns are separated by tab characters.
 Additionally, each column consists of a label and a value, separated by a :
 character.
 Read about LTSV on http://ltsv.org/.
 h4. Example LTSV file (access.log)
 Columns are separated by tab characters.
 {noformat}
 host:host1.example.org	req:GET /index.html	ua:Opera/9.80
 host:host1.example.org	req:GET /favicon.ico	ua:Opera/9.80
 host:pc.example.com	req:GET /news.html	ua:Mozilla/5.0
 {noformat}
 h4. Usage 1: Extract fields from each line
 Users can specify an input schema and get columns as Pig fields.
 This example loads the LTSV file shown in the previous section.
 {code}
 -- Parses the access log and counts the number of lines
 -- for each pair of the host column and the ua column.
 access = LOAD 'access.log' USING 
 org.apache.pig.piggybank.storage.LTSVLoader('host:chararray, ua:chararray');
 grouped_access = GROUP access BY (host, ua);
 count_for_host_ua = FOREACH grouped_access GENERATE group.host, group.ua, 
 COUNT(access);
 DUMP count_for_host_ua;
 {code}
 The below text will be printed out.
 {noformat}
 (host1.example.org,Opera/9.80,2)
 (pc.example.com,Firefox/5.0,1)
 {noformat}
 h4. Usage 2: Extract a map from each line
 Users can get a map for each LTSV line. The key of a map is a label of the 
 LTSV column. The value of a map comes from characters after : in the LTSV 
 column.
 {code}
 -- Parses the access log and projects the user agent field.
 access = LOAD 'access.log' USING 
 org.apache.pig.piggybank.storage.LTSVLoader() AS (m:map[]);
 user_agent = FOREACH access GENERATE m#'ua' AS ua;
 DUMP user_agent;
 {code}
 The below text will be printed out.
 {noformat}
 (Opera/9.80)
 (Opera/9.80)
 (Firefox/5.0)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3190) Add LuceneTokenizer and SnowballTokenizer to Pig - useful text tokenization

2013-04-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3190:


Status: Open  (was: Patch Available)

Canceling patch until issues around location and build failures are resolved.

 Add LuceneTokenizer and SnowballTokenizer to Pig - useful text tokenization
 ---

 Key: PIG-3190
 URL: https://issues.apache.org/jira/browse/PIG-3190
 Project: Pig
  Issue Type: Bug
  Components: internal-udfs
Affects Versions: 0.11
Reporter: Russell Jurney
Assignee: Russell Jurney
 Fix For: 0.12

 Attachments: PIG-3190-2.patch, PIG-3190-3.patch, PIG-3190.patch


 TOKENIZE is literally useless. The Lucene Standard/Snowball tokenizers, as 
 used by varaha, are much more useful for actual tasks: 
 https://github.com/Ganglion/varaha/blob/master/src/main/java/varaha/text/TokenizeText.java

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3193) Fix ant docs warnings

2013-04-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13633081#comment-13633081
 ] 

Alan Gates commented on PIG-3193:
-

+1.  For the two you didn't fix, why don't you open a separate JIRA so that you 
can resolve this one with the issues you addressed.

 Fix ant docs warnings
 ---

 Key: PIG-3193
 URL: https://issues.apache.org/jira/browse/PIG-3193
 Project: Pig
  Issue Type: Bug
  Components: build, documentation
Affects Versions: 0.11
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
  Labels: newbie
 Fix For: 0.12

 Attachments: PIG-3193.patch


 I see many warnings every time I run ant clean docs. They don't break the 
 build, but it would be nice if we could clean them up.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2767) Pig creates wrong schema after dereferencing nested tuple fields

2013-04-16 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13633111#comment-13633111
 ] 

Alan Gates commented on PIG-2767:
-

+1.

 Pig creates wrong schema after dereferencing nested tuple fields
 

 Key: PIG-2767
 URL: https://issues.apache.org/jira/browse/PIG-2767
 Project: Pig
  Issue Type: Bug
  Components: parser
Affects Versions: 0.10.0
 Environment: Amazon EMR, patched to use Pig 0.10.0
Reporter: Jonathan Packer
Assignee: Daniel Dai
 Fix For: 0.12

 Attachments: PIG-2767-1.patch, test_data.txt


 The following script fails:
 data = LOAD 'test_data.txt' USING PigStorage() AS (f1: int, f2: int, f3:
 int, f4: int);
 nested = FOREACH data GENERATE f1, (f2, f3, f4) AS nested_tuple;
 dereferenced = FOREACH nested GENERATE f1, nested_tuple.(f2, f3);
 DESCRIBE dereferenced;
 uses_dereferenced = FOREACH dereferenced GENERATE nested_tuple.f3;
 DESCRIBE uses_dereferenced;
 The schema of dereferenced should be {f1: int, nested_tuple: (f2: int,
 f3: int)}. DESCRIBE thinks it is {f1: int, f2: int} instead. When dump is
 used, however, the data actually follows the correct schema, e.g.
 (1,(2,3))
 (5,(6,7))
 ...
 This is not just a problem with DESCRIBE. Because the schema is incorrect,
 the reference to nested_tuple in the uses_dereferenced statement is
 considered to be invalid, and the script fails to run. The error is:
 Invalid field projection. Projected field [nested_tuple] does not exist in
 schema: f1:int,f2:int.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (PIG-3186) tar/deb/pkg ant targets should depend on piggybank

2013-04-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-3186:
---

Assignee: Lorand Bendig

 tar/deb/pkg ant targets should depend on piggybank
 --

 Key: PIG-3186
 URL: https://issues.apache.org/jira/browse/PIG-3186
 Project: Pig
  Issue Type: Bug
Reporter: Bill Graham
Assignee: Lorand Bendig
  Labels: low-hanging-fruit, simple
 Fix For: 0.12

 Attachments: piggy.patch


 The tar, deb and rpm artifacts should contain piggybank but they don't when 
 built via ant unless piggybank is built separately.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3186) tar/deb/pkg ant targets should depend on piggybank

2013-04-16 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3186:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks Lorand.

 tar/deb/pkg ant targets should depend on piggybank
 --

 Key: PIG-3186
 URL: https://issues.apache.org/jira/browse/PIG-3186
 Project: Pig
  Issue Type: Bug
Reporter: Bill Graham
Assignee: Lorand Bendig
  Labels: low-hanging-fruit, simple
 Fix For: 0.12

 Attachments: piggy.patch


 The tar, deb and rpm artifacts should contain piggybank but they don't when 
 built via ant unless piggybank is built separately.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-200) Pig Performance Benchmarks

2013-04-15 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13632338#comment-13632338
 ] 

Alan Gates commented on PIG-200:


+1.  Latest patch changes look good.  I think it would be good to get this 
checked in and maintained going forward.

 Pig Performance Benchmarks
 --

 Key: PIG-200
 URL: https://issues.apache.org/jira/browse/PIG-200
 Project: Pig
  Issue Type: Task
Reporter: Amir Youssefi
Assignee: Alan Gates
 Fix For: 0.2.0

 Attachments: generate_data.pl, perf-0.6.patch, perf.hadoop.patch, 
 perf.patch, pig-0.8.1-vs-0.9.0.png, PIG-200-0.12.patch, pigmix2.patch, 
 pigmix_pig0.11.patch


 To benchmark Pig performance, we need to have a TPC-H like Large Data Set 
 plus Script Collection. This is used in comparison of different Pig releases, 
 Pig vs. other systems (e.g. Pig + Hadoop vs. Hadoop Only).
 Here is Wiki for small tests: http://wiki.apache.org/pig/PigPerformance
 I am currently running long-running Pig scripts over data-sets in the order 
 of tens of TBs. Next step is hundreds of TBs.
 We need to have an open large-data set (open source scripts which generate 
 data-set) and detailed scripts for important operations such as ORDER, 
 AGGREGATION etc.
 We can call those the Pig Workouts: Cardio (short processing), Marathon (long 
 running scripts) and Triathlon (Mix). 
 I will update this JIRA with more details of current activities soon.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3186) tar/deb/pkg ant targets should depend on piggybank

2013-03-29 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617680#comment-13617680
 ] 

Alan Gates commented on PIG-3186:
-

Is this ready for review?  If so please click Submit Patch so we know to 
review it.  Thanks for the patch.

 tar/deb/pkg ant targets should depend on piggybank
 --

 Key: PIG-3186
 URL: https://issues.apache.org/jira/browse/PIG-3186
 Project: Pig
  Issue Type: Bug
Reporter: Bill Graham
  Labels: low-hanging-fruit, simple
 Fix For: 0.12

 Attachments: piggy.patch


 The tar, deb and rpm artifacts should contain piggybank but they don't when 
 built via ant unless piggybank is built separately.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3247) Piggybank functions to mimic OVER clause in SQL

2013-03-26 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3247:


Attachment: Over.2.patch

A new version of the patch that fixes an error in the percent_rank calculation 
and adds the ability to specify the return type of the Over function.

 Piggybank functions to mimic OVER clause in SQL
 ---

 Key: PIG-3247
 URL: https://issues.apache.org/jira/browse/PIG-3247
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: Over.2.patch, Over.patch


 In order to test Hive I have written some UDFs to mimic the behavior of SQL's 
 OVER clause.  I thought they would be useful to share.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3257) Add unique identifier UDF

2013-03-22 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3257:


Attachment: PIG-3257.patch

 Add unique identifier UDF
 -

 Key: PIG-3257
 URL: https://issues.apache.org/jira/browse/PIG-3257
 Project: Pig
  Issue Type: Improvement
  Components: internal-udfs
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3257.patch


 It would be good to have a Pig function to generate unique identifiers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3257) Add unique identifier UDF

2013-03-22 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3257:


Status: Patch Available  (was: Open)

A simple UDF that calls Java's UUID.randomUUID() method.  I believe this 
could be done with a combination of the piggybank ToString function and using 
StringInvoker for UUID.randomUUID, but this seems like a useful and simple 
enough thing to just build in.
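The core of such a UDF is a one-line call to the JDK, sketched here without Pig's EvalFunc scaffolding (names are illustrative):

```java
import java.util.UUID;

public class UniqueIdUdf {
    // UUID.randomUUID() yields a random (version 4) UUID; the UDF would
    // return its 36-character string form as a chararray.
    public static String nextId() {
        return UUID.randomUUID().toString();
    }
}
```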

 Add unique identifier UDF
 -

 Key: PIG-3257
 URL: https://issues.apache.org/jira/browse/PIG-3257
 Project: Pig
  Issue Type: Improvement
  Components: internal-udfs
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 0.12

 Attachments: PIG-3257.patch


 It would be good to have a Pig function to generate unique identifiers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

