Hi, I'm trying to integrate a Jython UDF into my Maven project.
I have a problem running scripts where
@outputSchema(blabla) is defined.
Here is the error:
File "/home/ssa/devel/etl-masterdata/gsmcell-merger/src/test/python/Test.py", line 3, in <module>
  from pig.udf.mergerUDF import *
File
Your example had newlines in the employee element. The regular expression .*
does not match newlines. One way to remove newlines is REPLACE(x,'[\\n]','').
If the text ranges you are interested in do not contain newlines, for example
if you are interested in employee_id but do not care about its
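A minimal plain-Java sketch of the newline behaviour described above, assuming Pig's regex functions follow java.util.regex semantics (Pig documents their regex arguments as Java regular expressions). The sample XML and the `extract` helper are hypothetical; the `(?s)` inline flag is standard Java regex syntax, but I have not verified it inside a Pig script.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexNewlineDemo {
    // Hypothetical helper: returns capture group 1 of the first match, or null.
    static String extract(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input with newlines inside the employee element.
        String xml = "<employee>\n<employee_id>42</employee_id>\n</employee>";

        // '.' does not match '\n' by default, so this finds nothing.
        String noMatch = extract(xml, "<employee>(.*)</employee>");

        // Strip newlines first, the Java equivalent of REPLACE(x, '[\\n]', '').
        String flat = xml.replaceAll("[\\n]", "");
        String afterReplace = extract(flat, "<employee>(.*)</employee>");

        // Alternatively, the (?s) (DOTALL) flag lets '.' match newlines too.
        String dotAll = extract(xml, "(?s)<employee>(.*)</employee>");

        // A field that sits on a single line matches without any changes.
        String id = extract(xml, "<employee_id>(.*)</employee_id>");

        System.out.println(noMatch);      // null
        System.out.println(afterReplace); // <employee_id>42</employee_id>
        System.out.println(dotAll);
        System.out.println(id);           // 42
    }
}
```

The last case is why a field like employee_id can be extracted directly when its own text range has no newlines.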
Hey All,
How does Pig handle null parameter values?
Should a null parameter value raise an exception?
Currently it just translates it to the string "null",
e.g.:
InputStream queryStream = IOUtils.toInputStream("A = LOAD '$VAL'
using PigStorage();", "UTF-8");
Map<String, String>
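Until Pig itself rejects null values, one workaround is to validate the parameter map on the client side before it is handed to Pig together with queryStream. This is a minimal sketch; `requireNonNullParams` is a hypothetical helper, not part of Pig's API, and the concatenation line only shows that plain Java also renders null as the text "null" (it is not Pig's substitution code).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParamCheck {
    // Hypothetical helper: fail fast instead of letting a null value
    // reach parameter substitution and end up as the literal text "null".
    static void requireNonNullParams(Map<String, String> params) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (e.getValue() == null) {
                missing.add(e.getKey());
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Null value for Pig parameter(s): " + missing);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("VAL", null);

        // String concatenation turns null into "null", the same symptom as above.
        System.out.println("A = LOAD '" + params.get("VAL") + "' using PigStorage();");

        try {
            requireNonNullParams(params);
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```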
Hi,
Not sure whether it helps, but I did a lot of testing in such cases. "Test
and see" was my main approach; it is really tricky sometimes. You can also
try the -dryrun option when launching Pig.
Best Regards,
Ruslan Al-Fakikh
https://www.odesk.com/users/~015b7b5f617eb89923
On Tue, Sep 17, 2013
I tried to do a quick-and-dirty inspection of some of our data feeds, which
are stored as gzipped SequenceFiles.
Basically, I did:
a = load 'myfile' using ..SequenceFileLoader() AS (mykey, myvalue);
but it gave me an error:
2013-09-16 17:34:28,915 [Thread-5] INFO
The problem is that Pig only speaks its own data types, so you need to tell it
how to translate from your custom Writable to a Pig data type.
Apparently Elephant Bird has some support for doing this type of thing;
take a look at this SO post.
I think my custom type has toString(); at least, being a Writable, it can be
written to bytes. So presumably if I force it to bytes or a string, Pig
should be able to cast it,
like:
load ... AS (k:chararray, v:chararray);
but this actually fails.
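A minimal sketch of why forcing the value to chararray is not enough: the bytes a Writable-style write(DataOutput) produces are a binary encoding, not the toString() text. `MyValue` is a hypothetical type with the same write(DataOutput) shape as Hadoop's Writable (no Hadoop dependency, so it runs standalone).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class WritableBytesDemo {
    // Hypothetical custom type with a Hadoop-style write(DataOutput) method.
    static class MyValue {
        final int id;
        final String name;

        MyValue(int id, String name) {
            this.id = id;
            this.name = name;
        }

        void write(DataOutput out) throws IOException {
            out.writeInt(id);   // 4 raw bytes, not decimal digits
            out.writeUTF(name); // 2-byte length prefix, then modified UTF-8
        }

        @Override
        public String toString() {
            return id + "\t" + name;
        }
    }

    // Serializes the value the way it would be stored in the file.
    static byte[] serialize(MyValue v) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            v.write(out);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        MyValue v = new MyValue(42, "cell-a");
        byte[] raw = serialize(v);
        // Reading the raw bytes as text does not give back toString().
        String asText = new String(raw, StandardCharsets.UTF_8);
        System.out.println(v.toString().equals(asText)); // false
        System.out.println(raw.length);                  // 12
    }
}
```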
On Mon, Sep 16, 2013 at 6:22 PM, Pradeep Gollakota
That's correct...
The load ... AS (k:chararray, v:chararray); doesn't actually do what you
think it does. The AS clause tells Pig what the schema types are, so it
will call the appropriate LoadCaster method to get each field into the right type.
A LoadCaster object defines how to map byte[] into
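A rough, self-contained sketch of what a caster for a custom type has to do: reverse the type's own serialization before producing the chararray. The method name mirrors LoadCaster's bytesToCharArray, but this class does not implement Pig's interface, and the field layout (an int followed by writeUTF text) is a hypothetical assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class CasterSketch {
    // Hypothetical stand-in for a LoadCaster-style conversion: decode the
    // custom binary layout (int, then writeUTF text) into its text form.
    static String bytesToCharArray(byte[] b) throws IOException {
        if (b == null) {
            return null; // treat missing bytes as a null field
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(b))) {
            int id = in.readInt();
            String name = in.readUTF();
            return id + "\t" + name;
        }
    }

    public static void main(String[] args) throws IOException {
        // Build bytes in the assumed layout, then decode them.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(42);
            out.writeUTF("cell-a");
        }
        System.out.println(bytesToCharArray(bytes.toByteArray()));
    }
}
```

In a real loader this decoding would live in the loader or its caster; Elephant Bird's approach to custom Writables is the one to check for the actual API.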
It doesn't look like the SequenceFileLoader from the piggybank has much
support for this. The Elephant Bird version looks like it does what you need it to
do.
https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/SequenceFileLoader.java
You'll have to