Just tried this:
----------------------------------------------------
REGISTER 'hdfs:///lib/avro-1.7.2.jar';
REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
REGISTER 'hdfs:///lib/piggybank.jar';
DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
avro = load '/data/2012/trace_ejb3/2012-01-0*.avro' USING
AvroStorage();
groups = group avro by tracetype;
dump groups;
----------------------------------------------------
gave me:
<file avro-test.pig, line 10, column 23> Invalid field projection.
Projected field [tracetype] does not exist.
Pig Stack Trace
---------------
ERROR 1025:
<file avro-test.pig, line 10, column 23> Invalid field projection.
Projected field [tracetype] does not exist.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias groups
at org.apache.pig.PigServer.openIterator(PigServer.java:862)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:682)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:555)
at org.apache.pig.Main.main(Main.java:111)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: org.apache.pig.PigException: ERROR 1002: Unable to store alias groups
at org.apache.pig.PigServer.storeEx(PigServer.java:961)
at org.apache.pig.PigServer.store(PigServer.java:924)
at org.apache.pig.PigServer.openIterator(PigServer.java:837)
... 12 more
Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 1025:
<file avro-test.pig, line 10, column 23> Invalid field projection.
Projected field [tracetype] does not exist.
at org.apache.pig.newplan.logical.expression.ProjectExpression.findColNum(ProjectExpression.java:183)
at org.apache.pig.newplan.logical.expression.ProjectExpression.setColumnNumberFromAlias(ProjectExpression.java:166)
at org.apache.pig.newplan.logical.visitor.ColumnAliasConversionVisitor$1.visit(ColumnAliasConversionVisitor.java:53)
at org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:207)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:101)
at org.apache.pig.newplan.logical.relational.LOCogroup.accept(LOCogroup.java:235)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1621)
at org.apache.pig.PigServer$Graph.compile(PigServer.java:1616)
at org.apache.pig.PigServer$Graph.access$200(PigServer.java:1339)
at org.apache.pig.PigServer.storeEx(PigServer.java:956)
... 14 more
================================================================================
Maybe globbing with [] doesn't work, but the wildcard does? I have no idea
why I get the error above, though.
Kind regards,
Bart
Cheolsoo Park wrote on 25.11.2012 15:33:
Hi Bart,
avro = load '/data/2012/trace_ejb3/2012-01-*.avro' USING
AvroStorage();
gives me:
Schema for avro unknown.
This should work. The error that you're getting is not from AvroStorage
but PigServer.
grep -r "Schema for .* unknown" *
src/org/apache/pig/PigServer.java:
System.out.println("Schema for " + alias + " unknown.");
...
It looks like you have an error in your Pig script. Can you please
provide your Pig script and the schema of your avro files that reproduce
the error?
Thanks,
Cheolsoo
On Sun, Nov 25, 2012 at 1:02 AM, Bart Verwilst <[email protected]>
wrote:
Hi,
I've tried loading a csv with PigStorage(), getting this:
txt = load '/import.mysql/trace_ejb3_2011/part-m-00000' USING
PigStorage(',');
describe txt;
Schema for txt unknown.
Maybe this is because it is a CSV, so a schema is hard to figure out.
Any other suggestions? Our whole Hadoop setup is built around being able
to selectively load avro files to run our jobs on; if this doesn't work,
then we're pretty much screwed. :)
Thanks in advance!
Bart
Russell Jurney wrote on 24.11.2012 20:23:
I suspect the problem is AvroStorage, not globbing. Try this with
PigStorage.
Russell Jurney twitter.com/rjurney
On Nov 24, 2012, at 5:15 AM, Bart Verwilst <[email protected]>
wrote:
Hello,
Thanks for your suggestion!
I switched my avro load statement to avro = load '$INPUT' USING
AvroStorage();
However I get the same results this way:
$ pig -p INPUT=/data/2012/trace_ejb3/2012-01-02.avro avro-test.pig
which: no hbase in (:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
<snip>
avro: {id: long,timestamp: long,latitude: int,longitude: int,speed: int,heading: int,terminalid: int,customerid: chararray,mileage: int,creationtime: long,tracetype: int,traceproperties: {ARRAY_ELEM: (id: long,value: chararray,pkey: chararray)}}
$ pig -p INPUT="/data/2012/trace_ejb3/2012-01-0[12].avro" avro-test.pig
which: no hbase in (:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
<snip>
2012-11-24 14:11:17,309 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
Caused by: java.net.URISyntaxException: Illegal character in path at index 31: /data/2012/trace_ejb3/2012-01-0[12].avro
$ pig -p INPUT='/data/2012/trace_ejb3/2012-01-0[12].avro' avro-test.pig
which: no hbase in (:/usr/lib64/qt-3.3/bin:/usr/java/jdk1.6.0_33/bin/:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/local/bin)
<snip>
2012-11-24 14:12:05,085 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
Details at logfile: /var/lib/hadoop-hdfs/pig_1353762722742.log
Caused by: java.net.URISyntaxException: Illegal character in path at index 31: /data/2012/trace_ejb3/2012-01-0[12].avro
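For reference, the index reported by the URISyntaxException can be checked locally. The sketch below is plain POSIX shell and does not touch Pig or Hadoop; char_at is a hypothetical helper name, and the two paths are the ones quoted in the error messages in this thread. It suggests that the character being rejected is the opening [ or { of the glob itself, not anything else in the path.

```shell
# char_at is a hypothetical helper: print the character at a 0-based index of a string.
char_at() {
    # cut counts characters from 1, so shift the 0-based index by one
    printf '%s' "$1" | cut -c"$(( $2 + 1 ))"
}

# Paths and indexes as reported in the error messages in this thread
bracket_char=$(char_at '/data/2012/trace_ejb3/2012-01-0[12].avro' 31)
brace_char=$(char_at '/data/2012/trace_ejb3/2012-01-{01,02}.avro' 30)

echo "index 31: $bracket_char"
echo "index 30: $brace_char"
```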
Deepak Tiwari wrote on 24.11.2012 00:41:
Hi,
I don't have a system to test it on right now, but I have been passing
the path in with the -p parameter and it works.
Change the line to accept a parameter, like
avro = load '$INPUT' USING AvroStorage();
bin/pig -p INPUT="/data/2012/trace_ejb3/2012-01-0[12].avro" <scriptName>
I think if you don't use double quotes, then the expansion is done by
the OS.
Please let us know if it doesn't work...
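The quoting point can be tried out without a cluster. This is a minimal sketch in POSIX shell, assuming a scratch directory with two empty files named like the ones in this thread; count_args is a hypothetical helper that only reports how many arguments it received. Unquoted, the shell expands the pattern against local files before the program ever sees it; quoted, the pattern is passed through as one literal argument.

```shell
# Scratch directory with two empty files (hypothetical local stand-ins for the HDFS files)
tmpdir=$(mktemp -d)
touch "$tmpdir/2012-01-01.avro" "$tmpdir/2012-01-02.avro"

# count_args is a hypothetical helper: print the number of arguments received
count_args() { echo "$#"; }

# Unquoted pattern: the shell expands it to the matching local files
unquoted=$(count_args "$tmpdir"/2012-01-0[12].avro)

# Quoted pattern: passed through literally as a single argument
quoted=$(count_args "$tmpdir/2012-01-0[12].avro")

echo "unquoted=$unquoted quoted=$quoted"
rm -rf "$tmpdir"
```

With the two files present this prints unquoted=2 quoted=1. Either single or double quotes keep the pattern literal, so quoting only rules out the shell as the culprit; it does not by itself make the receiving program accept the pattern.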
On Fri, Nov 23, 2012 at 12:45 PM, Bart Verwilst
<[email protected]>
wrote:
Hello,
I have the following files on HDFS:
-rw-r--r-- 3 hdfs supergroup 22989179 2012-11-22 11:17 /data/2012/trace_ejb3/2012-01-01.avro
-rw-r--r-- 3 hdfs supergroup 240551819 2012-11-22 14:27 /data/2012/trace_ejb3/2012-01-02.avro
-rw-r--r-- 3 hdfs supergroup 324464635 2012-11-22 18:28 /data/2012/trace_ejb3/2012-01-03.avro
-rw-r--r-- 3 hdfs supergroup 345526418 2012-11-22 21:30 /data/2012/trace_ejb3/2012-01-04.avro
-rw-r--r-- 3 hdfs supergroup 351322916 2012-11-23 00:28 /data/2012/trace_ejb3/2012-01-05.avro
-rw-r--r-- 3 hdfs supergroup 325953043 2012-11-23 04:32 /data/2012/trace_ejb3/2012-01-06.avro
-rw-r--r-- 3 hdfs supergroup 107019156 2012-11-23 05:58 /data/2012/trace_ejb3/2012-01-07.avro
-rw-r--r-- 3 hdfs supergroup 46392850 2012-11-23 06:37 /data/2012/trace_ejb3/2012-01-08.avro
-rw-r--r-- 3 hdfs supergroup 361970930 2012-11-23 10:06 /data/2012/trace_ejb3/2012-01-09.avro
-rw-r--r-- 3 hdfs supergroup 398462505 2012-11-23 13:44 /data/2012/trace_ejb3/2012-01-10.avro
-rw-r--r-- 3 hdfs supergroup 400785976 2012-11-23 17:17 /data/2012/trace_ejb3/2012-01-11.avro
-rw-r--r-- 3 hdfs supergroup 400027565 2012-11-23 20:43 /data/2012/trace_ejb3/2012-01-12.avro
Using Pig 0.10.0-cdh4.1.2, I try to load those files and describe
them.
REGISTER 'hdfs:///lib/avro-1.7.2.jar';
REGISTER 'hdfs:///lib/json-simple-1.1.1.jar';
REGISTER 'hdfs:///lib/piggybank.jar';
DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
avro = load '/data/2012/trace_ejb3/2012-01-01.avro' USING
AvroStorage();
describe avro;
This works, and so does 2012-01-02.avro.
However, as soon as I want to include multiple files, no dice.
avro = load '/data/2012/trace_ejb3/2012-01-{01,02}.avro' USING
AvroStorage();
gives me:
2012-11-23 21:41:07,475 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. null
Caused by: java.net.URISyntaxException: Illegal character in path at index 30: /data/2012/trace_ejb3/2012-01-{01,02}.avro
avro = load '/data/2012/trace_ejb3/2012-01-*.avro' USING
AvroStorage();
gives me:
Schema for avro unknown.
avro = load '/data/2012/trace_ejb3/2012-01-0[12].avro' USING
AvroStorage();
also gives me:
Caused by: java.net.URISyntaxException: Illegal character in path at index 31: /data/2012/trace_ejb3/2012-01-0[12].avro
What am I doing wrong here? According to
http://hadoop.apache.org/docs/r0.21.0/api/org/apache/hadoop/fs/FileSystem.html#globStatus%28org.apache.hadoop.fs.Path%29
this should all be acceptable input?
Thanks in advance!
Kind regards,
Bart
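As a local cross-check of what the bracket pattern from the question is meant to select, here is a small POSIX shell sketch. It assumes that a [12] character class in a Hadoop glob behaves like the shell's, as the globStatus documentation linked above describes; it says nothing about how AvroStorage handles the path. matches_bracket is a hypothetical helper, and the file names are the first four from the HDFS listing.

```shell
# matches_bracket is a hypothetical helper: print the names that match 2012-01-0[12].avro
matches_bracket() {
    for f in "$@"; do
        case "$f" in
            2012-01-0[12].avro) printf '%s\n' "$f" ;;
        esac
    done
}

# First four file names from the HDFS listing in this thread
matched=$(matches_bracket 2012-01-01.avro 2012-01-02.avro 2012-01-03.avro 2012-01-04.avro)
echo "$matched"
```

Only 2012-01-01.avro and 2012-01-02.avro are printed, which is the selection the load statement was intended to make.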