The pig script should be as easy as
hive_table = LOAD 'a_table' USING org.apache.hcatalog.pig.HCatLoader();
hive_column = FOREACH hive_table GENERATE a_column;
The trickier part could be the pig infrastructure. You can try just running the
script with -useHCatalog. You may need to register the h
Try running with
java -cp pig.jar idlocal.Idlocal
David
On Apr 3, 2014, at 7:54 AM, Junior Tsire wrote:
> Hi everybody,
>
> I have been following a tutorial on how to embed pig in java because I am
> currently learning it. I used the sample code that I found on
> http://pig.apache.org/docs/
One approach is to separate your code from the pig wrapper. That way you
only need to unit test your business logic.
An example would be something like
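import java.io.IOException;
import java.util.List;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
public class wrapperUdf extends EvalFunc<Integer> {
    // Pig calls exec(Tuple) once per record; delegate to the plain class.
    public Integer exec(Tuple input) throws IOException {
        return foo.exec();
    }
    // Pig's hook is named getCacheFiles(); returning null means no cached files.
    public List<String> getCacheFiles() {
        return null;
    }
}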
public class foo {
pub
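The separation described above can be sketched without any Pig dependency. This is only a minimal sketch under assumptions: the class name FieldLength, its compute method, and the string-length logic are hypothetical stand-ins for real business logic. The Pig wrapper would do nothing more than unpack the Tuple and call compute, so only this class needs unit tests.

```java
// Hypothetical business-logic class: no Pig imports, so it can be
// unit tested without a Pig runtime on the classpath.
final class FieldLength {
    private FieldLength() {}

    // Returns the length of the value, or null for null input
    // (returning null for null input is a common UDF convention).
    static Integer compute(String value) {
        return value == null ? null : value.length();
    }
}

// Small driver standing in for a unit test.
final class FieldLengthDemo {
    public static void main(String[] args) {
        System.out.println(FieldLength.compute("pig"));
        System.out.println(FieldLength.compute(null));
    }
}
```

A wrapper UDF would then contain a single call such as FieldLength.compute((String) input.get(0)), which leaves very little Pig-specific code untested.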
My first thought is to try
flt='\'a1==1 AND a2=2\''
but mostly want to recommend running pig with the dry run (-r or -dryrun) flag
so you can see how the substitution is being made.
David
On Apr 24, 2013, at 7:25 AM, Abhijit Chanda
wrote:
> Hi,
>
> I want to pass a filter statement with
Try
fs.s3n.aws…
and also load from s3
data = load 's3n://...'
The "n" stands for native. I believe S3 also supports block device storage
(s3://) which allows bigger files to be stored. I don't know how (if at all)
the two types interact.
David
On Apr 7, 2013, at 1:11 PM, Panshul Whisper w
e;
>
> & a few other 'dfs' options given below:
>
> mapreduce.min.split.size
> mapreduce.max.split.size
>
> Thanks.
>
> On Mon, Feb 11, 2013 at 10:29 AM, David LaBarbera <
> davidlabarb...@localresponse.com> wrote:
>
>> You could st
You could store your data in smaller block sizes. Do something like
hadoop fs -Ddfs.block.size=1048576 -Dfs.local.block.size=1048576 -cp /org-input /small-block-input
You might only need one of those parameters. You can verify the block size with
hadoop fsck /small-block-input
In yo
Before the help information, do you see any message like JAVA_HOME not set …
David
On Feb 4, 2013, at 12:11 PM, Ionut Ignatescu wrote:
> ./pig
> I expected to run Grunt shell, like in the previous version.
> I strongly believe it's a problem on my side - how I adopted Apache Pig,
> but I don't
The elephant-bird sequence file loader should work; you'll just need to
register the Mahout jar that contains the VectorWritable class they use.
David
On Feb 4, 2013, at 7:06 PM, Harsha wrote:
> keeyong,
> we used elephant-bird (https://github.com/kevinweil/elephant-bird) from
> twitter to read/write s
Try
com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad')
This should allow access to nested objects as nested maps
($0#'level1'#'level2'#'level3' …)
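As a rough analogy for what the chained # dereference does (this is plain Java, not Pig or elephant-bird code), each # step is a map lookup applied to the result of the previous step. The class and method names below are hypothetical, and returning null for a missing level is this sketch's own choice, which I believe matches how Pig treats an absent map key.

```java
import java.util.Map;

// Hypothetical helper illustrating chained map dereference.
final class NestedLookup {
    private NestedLookup() {}

    // Follows each key in turn; returns null as soon as a level
    // is missing or is not a map.
    static Object lookup(Object root, String... keys) {
        Object current = root;
        for (String key : keys) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }
}
```

With a document shaped like {level1: {level2: {level3: value}}}, lookup(doc, "level1", "level2", "level3") plays the role of $0#'level1'#'level2'#'level3'.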
David
On Nov 21, 2012, at 12:56 AM, Saxifrage Cucvara
wrote:
> I'm also experiencing problems working with JSON objects in Pig.
>
> I have
Have you tried using a positional reference ($0)?
David
On Nov 7, 2012, at 6:44 PM, Yang wrote:
> hadoop@ip-10-245-54-191:~/top50/new$ cat a.pig
> DEFINE mymacro(blah, zoo) RETURNS foo {
> x = JOIN $blah BY id, $zoo BY id;
> y = JOIN x BY $blah::id, $zoo BY id;
> $foo = foreach y generat
sugar for "TOTUPLE( )" that was introduced in 0.10. (Sorry
> that I forgot that "( )" doesn't work in 0.9.)
>
> Thanks,
> Cheolsoo
>
> On Wed, Oct 31, 2012 at 4:53 AM, David LaBarbera <
> davidlabarb...@localresponse.com> wrote:
>
>> Cheol
, 0L), you mean { ('$ID_NULL'),
> (0) }. But I believe that what you want is { ('$ID_NULL', 0) } given the
> schema of relation 1.
>
> Thanks,
> Cheolsoo
>
> On Tue, Oct 30, 2012 at 10:22 AM, David LaBarbera <
> davidlabarb...@localresponse.com> wrote
I have a cogroup which effectively does a full outer join of two relations.
Some of the relations are blank, so I have a FOREACH statement like
grouped = COGROUP relation1 BY x, relation2 BY y;
normalized = FOREACH grouped {
normal1 = TOBAG('$ID_NULL', 0L);
value1 = ( IsEmpty(relation1) ? n
Try
http://pig.apache.org/docs/r0.8.1/piglatin_ref2.html#SPLIT
On Oct 24, 2012, at 2:51 AM, Eli Finkelshteyn wrote:
> Hi folks,
> I have a pig script that right now looks like this:
>
> …
> likes = FILTER main_set BY blah == 'a' AND meh == 'b';
> likes_time = FOREACH likes GENERATE date, 'likes