from:"David LaBarbera"

Re: HCatalog select one column from hive

2014-05-29 Thread David LaBarbera

The pig script should be as easy as hive_table = LOAD 'a_table' USING org.apache.hcatalog.pig.HCatLoader(); hive_column = FOREACH hive_table GENERATE a_column; The trickier part could be the pig infrastructure. You can try just running the script with -useHCatalog. You may need to register the h

Re: Embedded pig in java

2014-04-03 Thread David LaBarbera

Try running with java -cp pig.jar idlocal.Idlocal David On Apr 3, 2014, at 7:54 AM, Junior Tsire wrote: > Hi everybody, > > I have been following a tutorial on how to embed pig in java because I am > currently learning it. I used the sample code that I found on > http://pig.apache.org/docs/

Re: Unit test for Pig UDF using DistributedCache

2014-02-10 Thread David LaBarbera

One approach is to separate your code from the pig wrapper. That way you only need to unit test your business logic. An example would be something like public class wrapperUdf extends EvalFunc { public Integer exec() { foo.exec() } public List getCachedFiles { } } public class foo { pub
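The separation suggested in this message can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the thread: the tab-separated lookup format, the `List<String>` constructor argument, and the `String` key passed to `exec` are all hypothetical. The idea is that the Pig wrapper (an `EvalFunc` that, as far as I know, declares its cached files through `getCacheFiles()`) would read the cached file into lines and hand them to `Foo`, so `Foo` itself has no Pig or Hadoop dependency and can be tested with plain Java.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical business-logic class. It imports nothing from Pig or Hadoop,
// so a unit test can build it directly from in-memory lines.
class Foo {
    private final Map<String, Integer> lookup = new HashMap<>();

    // Assumed input format: one "key<TAB>integer" pair per line.
    // Lines without a tab are skipped; a non-numeric value would throw
    // NumberFormatException.
    Foo(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                lookup.put(parts[0], Integer.valueOf(parts[1].trim()));
            }
        }
    }

    // Returns the value stored for the key, or null when the key is unknown.
    Integer exec(String key) {
        return lookup.get(key);
    }
}
```

A test can then construct `Foo` from a small hard-coded list and check `exec` directly, while the wrapper UDF only has to read the cached file and delegate.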

Re: How can I pass command-line parameters with whitespace to an apache pig script?

2013-04-24 Thread David LaBarbera

My first thought is to try flt='\'a1==1 AND a2=2\'' but mostly want to recommend running pig with the dry run (-r or -dryrun) flag so you can see how the substitution is being made. David On Apr 24, 2013, at 7:25 AM, Abhijit Chanda wrote: > Hi, > > I want to pass a filter statement with

Re: pig script - failed reading input from s3

2013-04-08 Thread David LaBarbera

Try fs.s3n.aws… and also load from s3 data = load 's3n://...' The "n" stands for native. I believe S3 also supports block device storage (s3://) which allows bigger files to be stored. I don't know how (if at all) the two types interact. David On Apr 7, 2013, at 1:11 PM, Panshul Whisper w

Re: Loader for small files

2013-02-11 Thread David LaBarbera

e; > > & a few other 'dfs' options given below: > > mapreduce.min.split.size > mapreduce.max.split.size > > Thanks. > > On Mon, Feb 11, 2013 at 10:29 AM, David LaBarbera < > davidlabarb...@localresponse.com> wrote: > >> You could st

Re: Loader for small files

2013-02-11 Thread David LaBarbera

You could store your data in smaller block sizes. Do something like hadoop fs HADOOP_OPTS="-Ddfs.block.size=1048576 -Dfs.local.block.size=1048576" -cp /org-input /small-block-input You might only need one of those parameters. You can verify the block size with hadoop fsck /small-block-input In yo

Re: Pig prints help options in start

2013-02-11 Thread David LaBarbera

Before the help information, do you see any message like JAVA_HOME not set … David On Feb 4, 2013, at 12:11 PM, Ionut Ignatescu wrote: > ./pig > I expected to run Grunt shell, like in the previous version. > I strongly believe it's a problem on my side - how I adopted Apache Pig, > but I don't

Re: How to read Mahout generated sequence files in Pig

2013-02-06 Thread David LaBarbera

The elephant bird sequence file loader should work, you'll just need to register the mahout jar with the vector writable they use. David On Feb 4, 2013, at 7:06 PM, Harsha wrote: > keeyong, >we used elephantbird( https://github.com/kevinweil/elephant-bird ) from > twitter to read/write s

Re: How do I load JSON in Pig?

2012-11-21 Thread David LaBarbera

Try com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') This should allow access to nested objects as nested maps ($0#'level1'#'level2'#'level3' …) David On Nov 21, 2012, at 12:56 AM, Saxifrage Cucvara wrote: > I'm also experiencing problems working with JSON objects in Pig. > > I have

Re: issues with using JOIN inside a MACRO?

2012-11-08 Thread David LaBarbera

Have you tried using a positional reference ($0)? David On Nov 7, 2012, at 6:44 PM, Yang wrote: > hadoop@ip-10-245-54-191:~/top50/new$ cat a.pig > DEFINE mymacro(blah, zoo) RETURNS foo { > x = JOIN $blah BY id, $zoo BY id; >y = JOIN x BY $blah::id, $zoo BY id; > $foo = foreach y generat

Re: force schema with TOBAG

2012-10-31 Thread David LaBarbera

sugar for "TOTUPLE( )" that was introduced in 0.10. (Sorry > that I forgot that "( )" doesn't work in 0.9.) > > Thanks, > Cheolsoo > > On Wed, Oct 31, 2012 at 4:53 AM, David LaBarbera < > davidlabarb...@localresponse.com> wrote: > >> Cheol

Re: force schema with TOBAG

2012-10-31 Thread David LaBarbera

, 0L), you mean { ('$ID_NULL'), > (0) }. But I believe that what you want is { ('$ID_NULL', 0) } given the > schema of relation 1. > > Thanks, > Cheolsoo > > On Tue, Oct 30, 2012 at 10:22 AM, David LaBarbera < > davidlabarb...@localresponse.com> wrote

force schema with TOBAG

2012-10-30 Thread David LaBarbera

I have a cogroup which effectively does a full outer join of two relations. Some of the relations are blank, so I have a FOREACH statement like grouped = COGROUP relation1 BY x, relation2 BY y; normalized = FOREACH grouped { normal1 = TOBAG('$ID_NULL', 0L); value1 = ( IsEmpty(relation1) ? n

Re: FOREACH GENERATE Conditional?

2012-10-24 Thread David LaBarbera

Try http://pig.apache.org/docs/r0.8.1/piglatin_ref2.html#SPLIT On Oct 24, 2012, at 2:51 AM, Eli Finkelshteyn wrote: > Hi folks, > I have a pig script that right now looks like this: > > … > likes = FILTER main_set BY blah == 'a' AND meh == 'b'; > likes_time = FOREACH likes GENERATE date, 'likes

15 matches
