Re: HCatalog select one column from hive

2014-05-29 Thread David LaBarbera
The pig script should be as easy as hive_table = LOAD 'a_table' USING org.apache.hcatalog.pig.HCatLoader(); hive_column = FOREACH hive_table GENERATE a_column; The trickier part could be the pig infrastructure. You can try just running the script with -useHCatalog. You may need to register the h

Re: Embedded pig in java

2014-04-03 Thread David LaBarbera
Try running with java -cp pig.jar idlocal.Idlocal David On Apr 3, 2014, at 7:54 AM, Junior Tsire wrote: > Hi everybody, > > I have been following a tutorial on how to embed pig in java because I am > currently learning it. I used the sample code that I found on > http://pig.apache.org/docs/

Re: Unit test for Pig UDF using DistributedCache

2014-02-10 Thread David LaBarbera
One approach is to separate your code from the pig wrapper. That way you only need to unit test your business logic. An example would be something like public class wrapperUdf extends EvalFunc { public Integer exec() { foo.exec() } public List getCachedFiles { } } public class foo { pub
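The separation described in this message can be sketched roughly as follows. This is a minimal illustration, not the poster's code: the class names BlocklistLogic and BlocklistLogicCheck and the block-list behavior are hypothetical, and the Pig wrapper is only outlined in a comment because it needs the Pig jar on the classpath.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical business-logic class: plain Java with no Pig or Hadoop
// imports, so it can be unit tested without a cluster or DistributedCache.
class BlocklistLogic {
    private final Set<String> blocked;

    // The lines would normally come from the cached file; the test passes
    // them in directly instead of reading the cache.
    BlocklistLogic(List<String> lines) {
        this.blocked = new HashSet<>(lines);
    }

    // Returns 1 if the value is on the block list, 0 otherwise.
    Integer exec(String value) {
        return blocked.contains(value) ? 1 : 0;
    }
}

// Assumed shape of the thin Pig wrapper (not compiled here):
//   a class extending EvalFunc<Integer> whose exec(Tuple) reads the cached
//   file once, builds a BlocklistLogic, and delegates to logic.exec(...).
// Only the wrapper touches the cache-file hook mentioned in the message.

class BlocklistLogicCheck {
    public static void main(String[] args) {
        BlocklistLogic logic = new BlocklistLogic(Arrays.asList("spam", "junk"));
        if (logic.exec("spam") != 1) throw new AssertionError("blocked value should give 1");
        if (logic.exec("ham") != 0) throw new AssertionError("unknown value should give 0");
        System.out.println("ok");
    }
}
```

With this split, only the few lines of wrapper code depend on Pig, and the logic class can be covered by an ordinary unit test.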

Re: How can I pass command-line parameters with whitespace to an apache pig script?

2013-04-24 Thread David LaBarbera
My first thought is to try flt='\'a1==1 AND a2=2\'' but mostly I want to recommend running pig with the dry run (-r or -dryrun) flag so you can see how the substitution is being made. David On Apr 24, 2013, at 7:25 AM, Abhijit Chanda wrote: > Hi, > > I want to pass a filter statement with

Re: pig script - failed reading input from s3

2013-04-08 Thread David LaBarbera
Try fs.s3n.aws… and also load from s3 data = load 's3n://...' The "n" stands for native. I believe S3 also supports block device storage (s3://) which allows bigger files to be stored. I don't know how (if at all) the two types interact. David On Apr 7, 2013, at 1:11 PM, Panshul Whisper w

Re: Loader for small files

2013-02-11 Thread David LaBarbera
e; > > & a few other 'dfs' options given below: > > mapreduce.min.split.size > mapreduce.max.split.size > > Thanks. > > On Mon, Feb 11, 2013 at 10:29 AM, David LaBarbera < > davidlabarb...@localresponse.com> wrote: > >> You could st

Re: Loader for small files

2013-02-11 Thread David LaBarbera
You could store your data in smaller block sizes. Do something like hadoop fs HADOOP_OPTS="-Ddfs.block.size=1048576 -Dfs.local.block.size=1048576" -cp /org-input /small-block-input You might only need one of those parameters. You can verify the block size with hadoop fsck /small-block-input In yo

Re: Pig prints help options in start

2013-02-11 Thread David LaBarbera
Before the help information, do you see any message like JAVA_HOME not set … David On Feb 4, 2013, at 12:11 PM, Ionut Ignatescu wrote: > ./pig > I expected to run Grunt shell, like in the previous version. > I strongly believe it's a problem on my side - how I adopted Apache Pig, > but I don't

Re: How to read Mahout generated sequence files in Pig

2013-02-06 Thread David LaBarbera
The elephant bird sequence file loader should work; you'll just need to register the mahout jar with the vector writable they use. David On Feb 4, 2013, at 7:06 PM, Harsha wrote: > keeyong, > we used elephantbird( https://github.com/kevinweil/elephant-bird ) from > twitter to read/write s

Re: How do I load JSON in Pig?

2012-11-21 Thread David LaBarbera
Try com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') This should allow access to nested objects as nested maps ($0#'level1'#'level2'#'level3' …) David On Nov 21, 2012, at 12:56 AM, Saxifrage Cucvara wrote: > I'm also experiencing problems working with JSON objects in Pig. > > I have

Re: issues with using JOIN inside a MACRO?

2012-11-08 Thread David LaBarbera
Have you tried using a positional reference ($0)? David On Nov 7, 2012, at 6:44 PM, Yang wrote: > hadoop@ip-10-245-54-191:~/top50/new$ cat a.pig > DEFINE mymacro(blah, zoo) RETURNS foo { > x = JOIN $blah BY id, $zoo BY id; >y = JOIN x BY $blah::id, $zoo BY id; > $foo = foreach y generat

Re: force schema with TOBAG

2012-10-31 Thread David LaBarbera
sugar for "TOTUPLE( )" that was introduced in 0.10. (Sorry > that I forgot that "( )" doesn't work in 0.9.) > > Thanks, > Cheolsoo > > On Wed, Oct 31, 2012 at 4:53 AM, David LaBarbera < > davidlabarb...@localresponse.com> wrote: > >> Cheol

Re: force schema with TOBAG

2012-10-31 Thread David LaBarbera
, 0L), you mean { ('$ID_NULL'), > (0) }. But I believe that what you want is { ('$ID_NULL', 0) } given the > schema of relation 1. > > Thanks, > Cheolsoo > > On Tue, Oct 30, 2012 at 10:22 AM, David LaBarbera < > davidlabarb...@localresponse.com> wrote

force schema with TOBAG

2012-10-30 Thread David LaBarbera
I have a cogroup which effectively does a full outer join of two relations. Some of the relations are blank, so I have a FOREACH statement like grouped = COGROUP relation1 BY x, relation2 BY y; normalized = FOREACH grouped { normal1 = TOBAG('$ID_NULL', 0L); value1 = ( IsEmpty(relation1) ? n

Re: FOREACH GENERATE Conditional?

2012-10-24 Thread David LaBarbera
Try http://pig.apache.org/docs/r0.8.1/piglatin_ref2.html#SPLIT On Oct 24, 2012, at 2:51 AM, Eli Finkelshteyn wrote: > Hi folks, > I have a pig script that right now looks like this: > > … > likes = FILTER main_set BY blah == 'a' AND meh == 'b'; > likes_time = FOREACH likes GENERATE date, 'likes