unittest for Jython pig UDFs

2013-09-16 Thread Serega Sheypak
Hi, I'm trying to integrate a Jython UDF into my Maven project. I have a problem running scripts where @outputSchema(blabla) is defined. Here is the error: File "/home/ssa/devel/etl-masterdata/gsmcell-merger/src/test/python/Test.py", line 3, in <module> from pig.udf.mergerUDF import * File
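The traceback above is cut off, so the exact failure is unknown. One common cause, assumed here, is that Pig supplies the outputSchema decorator only when it runs the script itself, so the name is undefined under a plain Jython/Python test runner. A minimal sketch of a fallback decorator follows; merge_cells and its schema string are hypothetical stand-ins, not the poster's code.

```python
# Fallback for unit tests: define outputSchema only if Pig has not
# already provided it (assumption: outside Pig the name is undefined).
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        """No-op stand-in that just records the schema string."""
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


# Hypothetical UDF used only to illustrate the pattern.
@outputSchema("merged:chararray")
def merge_cells(left, right):
    return "%s|%s" % (left, right)
```

With the fallback in place, the UDF module can be imported and called directly from a test, for example merge_cells("a", "b").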

RE: Converting xml to csv

2013-09-16 Thread william.dowling
Your example had newlines in the employee element. The regular expression .* does not match newlines. One way to remove newlines is REPLACE(x,'[\\n]',''). If the text ranges you are interested in do not contain newlines, for example if you are interested in employee_id but do not care about its
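The newline behaviour described above can be reproduced in a few lines. This is a sketch in Python rather than Pig; Python's re module, like the Java regex engine behind Pig, does not let . match a newline by default. The sample record is made up for illustration.

```python
import re

# Made-up record with newlines inside the employee element.
record = "<employee>\n  <employee_id>42</employee_id>\n</employee>"
pattern = r"<employee>(.*)</employee>"

# '.' stops at newlines, so the pattern cannot span the element.
no_match = re.search(pattern, record)

# Removing the newlines first (the REPLACE step) lets '.*' span it.
flattened = record.replace("\n", "")
match = re.search(pattern, flattened)
inner = match.group(1)
```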

Pig Parameter Substitution

2013-09-16 Thread Siddhi Mehta
Hey all, how does Pig handle null param values? Should there be an exception on a null param value? Currently it just translates it to the String null, e.g. InputStream queryStream = IOUtils.toInputStream("A = LOAD '$VAL' using PigStorage()", "UTF-8"); Map<String, String>
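If failing fast on a null parameter is the desired behaviour, the check can be done before the script is handed to Pig. The sketch below is in Python, not the Java of the message, and does not use Pig's API; substitute_params is a hypothetical helper, and string.Template merely happens to share the $NAME placeholder syntax.

```python
from string import Template


def substitute_params(script, params):
    """Substitute $NAME placeholders, rejecting null (None) values."""
    missing = sorted(name for name, value in params.items() if value is None)
    if missing:
        raise ValueError("null value for parameter(s): " + ", ".join(missing))
    return Template(script).substitute(params)


script = "A = LOAD '$VAL' using PigStorage();"
resolved = substitute_params(script, {"VAL": "/data/in"})
```

Calling substitute_params(script, {"VAL": None}) raises ValueError instead of producing a script that loads a path literally named after the null value.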

Re: Pig Parameter Substitution

2013-09-16 Thread Ruslan Al-Fakikh
Hi, Not sure whether it helps, but I did a lot of testing in such cases. "Test and see" was my main approach. It is really tricky sometimes. Also, you can try the -dryrun option when launching Pig. Best Regards, Ruslan Al-Fakikh https://www.odesk.com/users/~015b7b5f617eb89923 On Tue, Sep 17, 2013

how to load custom Writable class from sequence file?

2013-09-16 Thread Yang
I tried to do a quick and dirty inspection of some of our data feeds, which are encoded as gzipped SequenceFiles. Basically, I did a = load 'myfile' using ..SequenceFileLoader() AS ( mykey, myvalue); but it gave me an error: 2013-09-16 17:34:28,915 [Thread-5] INFO

Re: how to load custom Writable class from sequence file?

2013-09-16 Thread Pradeep Gollakota
The problem is that pig only speaks its data types. So you need to tell it how to translate from your custom writable to a pig datatype. Apparently elephant-bird has some support for doing this type of thing... take a look at this SO post

Re: how to load custom Writable class from sequence file?

2013-09-16 Thread Yang
I think my custom type has toString(), well at least writable() says it's writable to bytes, so supposedly if I force it to bytes or string, pig should be able to cast like load ... AS ( k:chararray, v:chararray); but this actually fails On Mon, Sep 16, 2013 at 6:22 PM, Pradeep Gollakota

Re: how to load custom Writable class from sequence file?

2013-09-16 Thread Pradeep Gollakota
That's correct... The load ... AS (k:chararray, v:chararray); doesn't actually do what you think it does. The AS statement tells Pig what the schema types are, so it will call the appropriate LoadCaster method to get it into the right type. A LoadCaster object defines how to map byte[] into
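The idea that the declared type selects a conversion method, rather than performing the conversion itself, can be illustrated with a toy model. This is a Python sketch, not Pig's Java LoadCaster interface; Utf8Caster, CASTER_METHODS and apply_schema are hypothetical names.

```python
class Utf8Caster(object):
    """Toy stand-in for a caster: turns raw bytes into typed values."""

    def bytes_to_chararray(self, raw):
        return raw.decode("utf-8")

    def bytes_to_int(self, raw):
        return int(raw.decode("utf-8"))


# The declared type only picks which caster method gets called.
CASTER_METHODS = {"chararray": "bytes_to_chararray", "int": "bytes_to_int"}


def apply_schema(raw_fields, schema, caster):
    """Convert each raw field using the method its declared type selects."""
    return tuple(
        getattr(caster, CASTER_METHODS[type_name])(raw)
        for raw, (_name, type_name) in zip(raw_fields, schema)
    )


row = apply_schema(
    [b"k1", b"42"],
    [("k", "chararray"), ("v", "int")],
    Utf8Caster(),
)
```

In this model, declaring a field as chararray works only if the caster knows how to decode the underlying bytes; bytes produced by a custom serialization would need a caster written for that format.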

Re: how to load custom Writable class from sequence file?

2013-09-16 Thread Pradeep Gollakota
It doesn't look like the SequenceFileLoader from the piggybank has much support. The elephant bird version looks like it does what you need it to do. https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/SequenceFileLoader.java You'll have to