Re: FLUME AVRO

2012-08-12 Thread Harsh J
Abhishek,

Moving this to the user@flume lists, as it is Flume-specific.

P.S. Please do not cross-post to multiple lists; it does not guarantee
you a faster response, nor is mailing a *-dev list relevant to your
question here. Help avoid additional inbox noise! :)

On Thu, Aug 9, 2012 at 10:43 PM, abhiTowson cal
abhishek.dod...@gmail.com wrote:
 hi all,

 Can log data be converted into Avro when it is sent from source to sink?

 Regards
 Abhishek



-- 
Harsh J


Re: Avro vs Json

2012-08-12 Thread Harsh J
Moving this to the user@avro lists. Please use the right lists for the
best answers and the right people.

I'd pick Avro out of the two - it is very well designed for typed data
and has a very good serializer/deserializer implementation, aside
from the schema advantages. FWIW, Avro has a tojson CLI tool to
dump the Avro binary format out as JSON structures, which would help
if you want readability and/or integration with apps/systems that
already depend on JSON.
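
For a concrete picture, here is a minimal Java sketch (the Event schema, its
fields, and the events.avro file name are hypothetical, not from this thread)
of writing JSON-shaped data as Avro records into a container file; because the
schema is stored in the file header, the same file can later be dumped back to
JSON or read by any Avro-aware application.

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

import java.io.File;

public class JsonToAvroSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical schema mirroring the incoming JSON documents.
    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"long\"},"
      + "{\"name\":\"message\",\"type\":\"string\"}]}");

    GenericRecord event = new GenericData.Record(schema);
    event.put("id", 42L);
    event.put("message", "hello");

    // The schema travels with the file, so any Avro-aware reader
    // (or the tojson CLI tool) can recover a JSON view of it later.
    DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
        new GenericDatumWriter<GenericRecord>(schema));
    writer.create(schema, new File("events.avro"));
    writer.append(event);
    writer.close();
  }
}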

On Sun, Aug 12, 2012 at 10:41 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
 We get data in JSON format. I was initially thinking of simply storing JSON
 in HDFS for processing. I see there is Avro, which does a similar thing but
 most likely stores it in a more optimized format. I wanted to get users'
 opinions on which one is better.



-- 
Harsh J


Re: Avro

2012-08-05 Thread Nitin Kesarwani
Mohit,

You can use this patch to suit your need:
https://issues.apache.org/jira/browse/PIG-2579

New fields in the Avro schema descriptor file need to have a non-null default
value. Hence, using the new schema file, you should be able to read older
data as well. Try it out; it is very straightforward.

Hope this helps!
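
As a rough sketch of what that resolution looks like in code, independent of
the Pig patch above (the User record, its fields, and the users.avro file name
are hypothetical): data written with an older schema is read back through a
newer reader schema that adds a field with a default, and old records simply
pick up the default value.

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

import java.io.File;

public class SchemaEvolutionSketch {
  public static void main(String[] args) throws Exception {
    // Newer reader schema: adds an "email" field with a non-null default.
    Schema readerSchema = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"email\",\"type\":\"string\",\"default\":\"unknown\"}]}");

    // The writer schema comes from the file header; passing the reader
    // schema makes Avro resolve old records against the new layout.
    GenericDatumReader<GenericRecord> datumReader =
        new GenericDatumReader<GenericRecord>(null, readerSchema);
    DataFileReader<GenericRecord> fileReader =
        new DataFileReader<GenericRecord>(new File("users.avro"), datumReader);
    for (GenericRecord user : fileReader) {
      System.out.println(user);  // old records show email = "unknown"
    }
    fileReader.close();
  }
}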

On Sun, Aug 5, 2012 at 12:01 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

I've heard that Avro provides a good way of dealing with changing schemas.
 I am not sure how it could be done without keeping some kind of structure
 along with the data. Are there any good examples and documentation that I
 can look at?


-N


Re: Avro

2012-08-05 Thread Mohit Anchlia
On Sat, Aug 4, 2012 at 11:43 PM, Nitin Kesarwani bumble@gmail.com wrote:

 Mohit,

 You can use this patch to suit your need:
 https://issues.apache.org/jira/browse/PIG-2579

 New fields in the Avro schema descriptor file need to have a non-null default
 value. Hence, using the new schema file, you should be able to read older
 data as well. Try it out; it is very straightforward.

 Hope this helps!


Thanks! I am new to Avro; what's the best place to see some examples of how
Avro deals with schema changes? I am trying to find some.


 On Sun, Aug 5, 2012 at 12:01 AM, Mohit Anchlia mohitanch...@gmail.com
 wrote:

 I've heard that Avro provides a good way of dealing with changing schemas.
  I am not sure how it could be done without keeping some kind of structure
  along with the data. Are there any good examples and documentation that I
  can look at?
 

 -N



Re: Avro vs Protocol Buffer

2012-07-19 Thread Bruno Freudensprung

Once new results are available, you might be interested in:
https://github.com/eishay/jvm-serializers/wiki/
https://github.com/eishay/jvm-serializers/wiki/Staging-Results

My2cts,

Bruno.

On 16/07/2012 22:49, Mike S wrote:

Strictly from a speed and performance perspective, is Avro as fast as
Protocol Buffers?





Re: Avro vs Protocol Buffer

2012-07-19 Thread Harsh J
+1 to what Bruno has pointed you at. I personally like Avro for its data
files (the schema is stored in the file, and it is a good, splittable
container for typed data records). I think serde speed is on par with
Thrift, if not faster today. Thrift offers no optimized data container
format AFAIK.
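
A small illustration of that self-describing container (the events.avro file
name is hypothetical): the reader supplies no schema at all, because the file
header carries the writer schema, which is also what makes the format a good
splittable input for MapReduce.

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

import java.io.File;

public class ReadAvroContainerSketch {
  public static void main(String[] args) throws Exception {
    // No schema given here: it is read from the container file header.
    DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
        new File("events.avro"), new GenericDatumReader<GenericRecord>());
    System.out.println("Writer schema: " + reader.getSchema());
    for (GenericRecord record : reader) {
      System.out.println(record);  // toString() renders a JSON-like view
    }
    reader.close();
  }
}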

On Thu, Jul 19, 2012 at 1:57 PM, Bruno Freudensprung
bruno.freudenspr...@temis.com wrote:
 Once new results are available, you might be interested in:
 https://github.com/eishay/jvm-serializers/wiki/
 https://github.com/eishay/jvm-serializers/wiki/Staging-Results

 My2cts,

 Bruno.

 On 16/07/2012 22:49, Mike S wrote:

 Strictly from a speed and performance perspective, is Avro as fast as
 Protocol Buffers?





-- 
Harsh J


Re: Avro, Hadoop 0.20.2, Jackson Error

2012-03-29 Thread Deepak Nettem
Hi,

I have moved to CDH3, which doesn't have this issue. Hope that helps anybody
stuck with the same problem.

best,
Deepak

On Mon, Mar 26, 2012 at 11:19 PM, Scott Carey sc...@richrelevance.com wrote:

 Does it still happen if you configure avro-tools to use

 <dependency>
   <groupId>org.apache.avro</groupId>
   <artifactId>avro-tools</artifactId>
   <version>1.6.3</version>
   <classifier>nodeps</classifier>
 </dependency>

 ?

 You have two Hadoops, two Jacksons, and even two avro:avro artifacts in
 your classpath if you use the Avro bundle jar with the default classifier.

 The avro-tools jar is not intended for inclusion in a project, as it is a jar
 with dependencies inside.
 https://cwiki.apache.org/confluence/display/AVRO/Build+Documentation#BuildDocumentation-ProjectStructure

 On 3/26/12 7:52 PM, Deepak Nettem deepaknet...@gmail.com wrote:

 When I include some Avro code in my Mapper, I get this error:
 
 Error:
 org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;
 
 Particularly, just these two lines of code:
 
 InputStream in = getClass().getResourceAsStream("schema.avsc");
 Schema schema = Schema.parse(in);
 
 This code works perfectly when run as a standalone application outside of
 Hadoop. Why do I get this error, and what's the best way to get rid of it?
 
 I am using Hadoop 0.20.2, and writing code in the new API.
 
 I found that the Hadoop lib directory contains jackson-core-asl-1.0.1.jar
 and jackson-mapper-asl-1.0.1.jar.
 
 I removed these, but when running with hadoop I got this error:
 Exception in thread "main" java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException
 
 I am using Maven as a build tool, and my pom.xml has this dependency:
 
 <dependency>
   <groupId>org.codehaus.jackson</groupId>
   <artifactId>jackson-mapper-asl</artifactId>
   <version>1.5.2</version>
   <scope>compile</scope>
 </dependency>
 
 
 
 
 I added the dependency:
 
 
 <dependency>
   <groupId>org.codehaus.jackson</groupId>
   <artifactId>jackson-core-asl</artifactId>
   <version>1.5.2</version>
   <scope>compile</scope>
 </dependency>
 
 But that still gives me this error:
 
 Error: org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;
 
 -
 
 I also tried replacing the earlier dependencies with these:
 
 <dependency>
   <groupId>org.apache.avro</groupId>
   <artifactId>avro-tools</artifactId>
   <version>1.6.3</version>
 </dependency>

 <dependency>
   <groupId>org.apache.avro</groupId>
   <artifactId>avro</artifactId>
   <version>1.6.3</version>
 </dependency>
 
 
 <dependency>
   <groupId>org.codehaus.jackson</groupId>
   <artifactId>jackson-mapper-asl</artifactId>
   <version>1.8.8</version>
   <scope>compile</scope>
 </dependency>

 <dependency>
   <groupId>org.codehaus.jackson</groupId>
   <artifactId>jackson-core-asl</artifactId>
   <version>1.8.8</version>
   <scope>compile</scope>
 </dependency>
 
 And this is my app dependency tree:
 
 [INFO] --- maven-dependency-plugin:2.1:tree (default-cli) @ AvroTest ---
 [INFO] org.avrotest:AvroTest:jar:1.0-SNAPSHOT
 [INFO] +- junit:junit:jar:3.8.1:test (scope not updated to compile)
 [INFO] +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
 [INFO] +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
 [INFO] +- net.sf.json-lib:json-lib:jar:jdk15:2.3:compile
 [INFO] |  +- commons-beanutils:commons-beanutils:jar:1.8.0:compile
 [INFO] |  +- commons-collections:commons-collections:jar:3.2.1:compile
 [INFO] |  +- commons-lang:commons-lang:jar:2.4:compile
 [INFO] |  +- commons-logging:commons-logging:jar:1.1.1:compile
 [INFO] |  \- net.sf.ezmorph:ezmorph:jar:1.0.6:compile
 [INFO] +- org.apache.avro:avro-tools:jar:1.6.3:compile
 [INFO] |  \- org.slf4j:slf4j-api:jar:1.6.4:compile
 [INFO] +- org.apache.avro:avro:jar:1.6.3:compile
 [INFO] |  +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
 [INFO] |  \- org.xerial.snappy:snappy-java:jar:1.0.4.1:compile
 [INFO] \- org.apache.hadoop:hadoop-core:jar:0.20.2:compile
 [INFO]    +- commons-cli:commons-cli:jar:1.2:compile
 [INFO]    +- xmlenc:xmlenc:jar:0.52:compile
 [INFO]    +- commons-httpclient:commons-httpclient:jar:3.0.1:compile
 [INFO]    +- commons-codec:commons-codec:jar:1.3:compile
 [INFO]    +- commons-net:commons-net:jar:1.4.1:compile
 [INFO]    +- org.mortbay.jetty:jetty:jar:6.1.14:compile
 [INFO]    +- org.mortbay.jetty:jetty-util:jar:6.1.14:compile
 [INFO]    +- tomcat:jasper-runtime:jar:5.5.12:compile
 [INFO]    +- tomcat:jasper-compiler:jar:5.5.12:compile
 [INFO]    +- org.mortbay.jetty:jsp-api-2.1:jar:6.1.14:compile
 [INFO]    +- org.mortbay.jetty:jsp-2.1:jar:6.1.14:compile
 [INFO]    |  \- ant:ant:jar:1.6.5:compile
 [INFO]    +- commons-el:commons-el:jar:1.0:compile
 [INFO]    +- net.java.dev.jets3t:jets3t:jar

Re: Avro, Hadoop 0.20.2, Jackson Error

2012-03-26 Thread Scott Carey
Does it still happen if you configure avro-tools to use

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-tools</artifactId>
  <version>1.6.3</version>
  <classifier>nodeps</classifier>
</dependency>

?

You have two Hadoops, two Jacksons, and even two avro:avro artifacts in
your classpath if you use the Avro bundle jar with the default classifier.

The avro-tools jar is not intended for inclusion in a project, as it is a jar
with dependencies inside.
https://cwiki.apache.org/confluence/display/AVRO/Build+Documentation#BuildDocumentation-ProjectStructure

On 3/26/12 7:52 PM, Deepak Nettem deepaknet...@gmail.com wrote:

When I include some Avro code in my Mapper, I get this error:

Error:
org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;

Particularly, just these two lines of code:

InputStream in = getClass().getResourceAsStream("schema.avsc");
Schema schema = Schema.parse(in);

This code works perfectly when run as a standalone application outside of
Hadoop. Why do I get this error, and what's the best way to get rid of it?

I am using Hadoop 0.20.2, and writing code in the new API.

I found that the Hadoop lib directory contains jackson-core-asl-1.0.1.jar
and jackson-mapper-asl-1.0.1.jar.

I removed these, but when running with hadoop I got this error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException

I am using Maven as a build tool, and my pom.xml has this dependency:

<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-mapper-asl</artifactId>
  <version>1.5.2</version>
  <scope>compile</scope>
</dependency>




I added the dependency:


<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-core-asl</artifactId>
  <version>1.5.2</version>
  <scope>compile</scope>
</dependency>

But that still gives me this error:

Error: org.codehaus.jackson.JsonFactory.enable(Lorg/codehaus/jackson/JsonParser$Feature;)Lorg/codehaus/jackson/JsonFactory;

-

I also tried replacing the earlier dependencies with these:

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-tools</artifactId>
  <version>1.6.3</version>
</dependency>

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.6.3</version>
</dependency>


<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-mapper-asl</artifactId>
  <version>1.8.8</version>
  <scope>compile</scope>
</dependency>

<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-core-asl</artifactId>
  <version>1.8.8</version>
  <scope>compile</scope>
</dependency>

And this is my app dependency tree:

[INFO] --- maven-dependency-plugin:2.1:tree (default-cli) @ AvroTest ---
[INFO] org.avrotest:AvroTest:jar:1.0-SNAPSHOT
[INFO] +- junit:junit:jar:3.8.1:test (scope not updated to compile)
[INFO] +- org.codehaus.jackson:jackson-mapper-asl:jar:1.8.8:compile
[INFO] +- org.codehaus.jackson:jackson-core-asl:jar:1.8.8:compile
[INFO] +- net.sf.json-lib:json-lib:jar:jdk15:2.3:compile
[INFO] |  +- commons-beanutils:commons-beanutils:jar:1.8.0:compile
[INFO] |  +- commons-collections:commons-collections:jar:3.2.1:compile
[INFO] |  +- commons-lang:commons-lang:jar:2.4:compile
[INFO] |  +- commons-logging:commons-logging:jar:1.1.1:compile
[INFO] |  \- net.sf.ezmorph:ezmorph:jar:1.0.6:compile
[INFO] +- org.apache.avro:avro-tools:jar:1.6.3:compile
[INFO] |  \- org.slf4j:slf4j-api:jar:1.6.4:compile
[INFO] +- org.apache.avro:avro:jar:1.6.3:compile
[INFO] |  +- com.thoughtworks.paranamer:paranamer:jar:2.3:compile
[INFO] |  \- org.xerial.snappy:snappy-java:jar:1.0.4.1:compile
[INFO] \- org.apache.hadoop:hadoop-core:jar:0.20.2:compile
[INFO]    +- commons-cli:commons-cli:jar:1.2:compile
[INFO]    +- xmlenc:xmlenc:jar:0.52:compile
[INFO]    +- commons-httpclient:commons-httpclient:jar:3.0.1:compile
[INFO]    +- commons-codec:commons-codec:jar:1.3:compile
[INFO]    +- commons-net:commons-net:jar:1.4.1:compile
[INFO]    +- org.mortbay.jetty:jetty:jar:6.1.14:compile
[INFO]    +- org.mortbay.jetty:jetty-util:jar:6.1.14:compile
[INFO]    +- tomcat:jasper-runtime:jar:5.5.12:compile
[INFO]    +- tomcat:jasper-compiler:jar:5.5.12:compile
[INFO]    +- org.mortbay.jetty:jsp-api-2.1:jar:6.1.14:compile
[INFO]    +- org.mortbay.jetty:jsp-2.1:jar:6.1.14:compile
[INFO]    |  \- ant:ant:jar:1.6.5:compile
[INFO]    +- commons-el:commons-el:jar:1.0:compile
[INFO]    +- net.java.dev.jets3t:jets3t:jar:0.7.1:compile
[INFO]    +- org.mortbay.jetty:servlet-api-2.5:jar:6.1.14:compile
[INFO]    +- net.sf.kosmosfs:kfs:jar:0.3:compile
[INFO]    +- hsqldb:hsqldb:jar:1.8.0.10:compile
[INFO]    +- oro:oro:jar:2.0.8:compile
[INFO]    \- org.eclipse.jdt:core:jar:3.1.1:compile

I still get the same error.

Somebody please please help me with this. I need to resolve this asap!!

Best,
Deepak



Hadoop Serialization: Avro

2011-11-26 Thread Leonardo Urbina
Hey everyone,

First time posting to the list. I'm currently writing a Hadoop job that
will run daily and whose output will be part of the next day's
input. Also, the output will potentially be read by other programs for
later analysis.

Since my program's output is used as part of the next day's input, it would
be nice if it was stored in some binary format that is easy to read the
next time around. But this format also needs to be readable by other
outside programs, not necessarily written in Java. After searching for a
while it seems that Avro is what I want to be using. In any case, I have
been looking around for a while and I can't seem to find a single example
of how to use Avro within a Hadoop job.

It seems that in order to use Avro I need to change the io.serializations
value; however, I don't know which value should be specified. Furthermore, I
found that there are classes Avro{Input,Output}Format, but these use a
series of other Avro classes which, as far as I understand, seem to need the
use of other Avro classes such as AvroWrapper, AvroKey, AvroValue, and, as
far as I can tell, Avro* (with * replaced by pretty much any Hadoop
class name). It seems, however, that these are used so that the Avro format
is used throughout the Hadoop process to pass objects around.

I just want to use Avro to save my output and read it again as input next
time around. So far I have been using SequenceFile{Input,Output}Format and
have implemented the Writable interface in the relevant classes; however,
this is not portable to other languages. Is there a way to use Avro without
a substantial rewrite (using Avro* classes) of my Hadoop job? Thanks in
advance,

Best,
-Leo

-- 
Leo Urbina
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Department of Mathematics
lurb...@mit.edu
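
For reference, a minimal word-count-style sketch of a Hadoop job written
against the older org.apache.avro.mapred API (class names and input/output
paths are hypothetical, and it assumes the input is an Avro data file of
strings). The schemas handed to AvroJob drive the key/value types, and the
job's output is a plain Avro container file that a later run, or a non-Java
program, can read back.

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroCollector;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.AvroMapper;
import org.apache.avro.mapred.AvroReducer;
import org.apache.avro.mapred.Pair;
import org.apache.avro.util.Utf8;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Reporter;

public class AvroWordCountSketch {

  // Mapper: input datum is a line (string schema), output is a (word, 1) pair.
  public static class MapImpl extends AvroMapper<Utf8, Pair<Utf8, Long>> {
    @Override
    public void map(Utf8 line, AvroCollector<Pair<Utf8, Long>> collector,
                    Reporter reporter) throws IOException {
      for (String word : line.toString().split("\\s+")) {
        if (word.length() > 0) {
          collector.collect(new Pair<Utf8, Long>(new Utf8(word), 1L));
        }
      }
    }
  }

  // Reducer: sums the counts for each word.
  public static class ReduceImpl extends AvroReducer<Utf8, Long, Pair<Utf8, Long>> {
    @Override
    public void reduce(Utf8 word, Iterable<Long> counts,
                       AvroCollector<Pair<Utf8, Long>> collector,
                       Reporter reporter) throws IOException {
      long sum = 0;
      for (long count : counts) {
        sum += count;
      }
      collector.collect(new Pair<Utf8, Long>(word, sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(AvroWordCountSketch.class);
    job.setJobName("avro-wordcount-sketch");

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // AvroJob wires up the Avro input/output formats and the shuffle
    // serialization from these schemas, so io.serializations does not
    // have to be edited by hand.
    AvroJob.setInputSchema(job, Schema.create(Schema.Type.STRING));
    AvroJob.setOutputSchema(job,
        new Pair<Utf8, Long>(new Utf8(""), 0L).getSchema());
    AvroJob.setMapperClass(job, MapImpl.class);
    AvroJob.setReducerClass(job, ReduceImpl.class);

    JobClient.runJob(job);
  }
}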


Re: Hadoop Serialization: Avro

2011-11-26 Thread Brock Noland
Hi,

Depending on the response you get here, you might also post the
question separately on avro-user.

On Sat, Nov 26, 2011 at 1:46 PM, Leonardo Urbina lurb...@mit.edu wrote:
 Hey everyone,

 First time posting to the list. I'm currently writing a hadoop job that
 will run daily and whose output will be part of the next day's
 input. Also, the output will potentially be read by other programs for
 later analysis.

 Since my program's output is used as part of the next day's input, it would
 be nice if it was stored in some binary format that is easy to read the
 next time around. But this format also needs to be readable by other
 outside programs, not necessarily written in Java. After searching for a
 while it seems that Avro is what I want to be using. In any case, I have
 been looking around for a while and I can't seem to find a single example
 of how to use Avro within a Hadoop job.

 It seems that in order to use Avro I need to change the io.serializations
 value, however I don't know which value should be specified. Furthermore, I
 found that there are classes Avro{Input,Output}Format but these use a
 series of other Avro classes which, as far as I understand, seem to need the
 use of other Avro classes such as AvroWrapper, AvroKey, AvroValue, and as
 far as I am concerned Avro* (with * replaced with pretty much any Hadoop
 class name). It seems however that these are used so that the Avro format
 is used throughout the Hadoop process to pass objects around.

 I just want to use Avro to save my output and read it again as input next
 time around. So far I have been using SequenceFile{Input,Output}Format, and
 have implemented the Writable interface in the relevant classes, however
 this is not portable to other languages. Is there a way to use Avro without
 a substantial rewrite (using Avro* classes) of my Hadoop job? Thanks in
 advance,

 Best,
 -Leo

 --
 Leo Urbina
 Massachusetts Institute of Technology
 Department of Electrical Engineering and Computer Science
 Department of Mathematics
 lurb...@mit.edu



Re: Hadoop Serialization: Avro

2011-11-26 Thread Leonardo Urbina
Thanks, I will send the question to that list as well.

Best,
-Leo

Sent from my phone

On Nov 26, 2011, at 7:32 PM, Brock Noland br...@cloudera.com wrote:

 Hi,

 Depending on the response you get here, you might also post the
 question separately on avro-user.

 On Sat, Nov 26, 2011 at 1:46 PM, Leonardo Urbina lurb...@mit.edu wrote:
 Hey everyone,

 First time posting to the list. I'm currently writing a hadoop job that
 will run daily and whose output will be part of the next day's
 input. Also, the output will potentially be read by other programs for
 later analysis.

 Since my program's output is used as part of the next day's input, it would
 be nice if it was stored in some binary format that is easy to read the
 next time around. But this format also needs to be readable by other
 outside programs, not necessarily written in Java. After searching for a
 while it seems that Avro is what I want to be using. In any case, I have
 been looking around for a while and I can't seem to find a single example
 of how to use Avro within a Hadoop job.

 It seems that in order to use Avro I need to change the io.serializations
 value, however I don't know which value should be specified. Furthermore, I
 found that there are classes Avro{Input,Output}Format but these use a
 series of other Avro classes which, as far as I understand, seem to need the
 use of other Avro classes such as AvroWrapper, AvroKey, AvroValue, and as
 far as I am concerned Avro* (with * replaced with pretty much any Hadoop
 class name). It seems however that these are used so that the Avro format
 is used throughout the Hadoop process to pass objects around.

 I just want to use Avro to save my output and read it again as input next
 time around. So far I have been using SequenceFile{Input,Output}Format, and
 have implemented the Writable interface in the relevant classes, however
 this is not portable to other languages. Is there a way to use Avro without
 a substantial rewrite (using Avro* classes) of my Hadoop job? Thanks in
 advance,

 Best,
 -Leo

 --
 Leo Urbina
 Massachusetts Institute of Technology
 Department of Electrical Engineering and Computer Science
 Department of Mathematics
 lurb...@mit.edu



Hadoop/CDH + Avro

2011-09-13 Thread GOEKE, MATTHEW (AG/1000)
Would anyone happen to be able to share a good reference for Avro integration 
with Hadoop? I can find plenty of material on using Avro by itself, but I 
have found little to no documentation on how to implement it as both the 
protocol and as custom key/value types.

Thanks,
Matt
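
For orientation, a rough driver-side sketch of the org.apache.avro.mapred
wiring as I understand it (the Click schema, paths, and class names are
hypothetical): the schemas handed to AvroJob are what install the Avro
input/output formats, the AvroKey/AvroValue wrapper types for intermediate
data, and the Avro shuffle serialization; the AvroMapper/AvroReducer
subclasses that do the actual work are omitted here.

import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroJob;
import org.apache.avro.mapred.Pair;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class AvroJobWiringSketch {
  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(AvroJobWiringSketch.class);
    job.setJobName("avro-wiring-sketch");

    // Hypothetical record schema for the job's input and output.
    Schema click = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Click\",\"fields\":["
      + "{\"name\":\"user\",\"type\":\"string\"},"
      + "{\"name\":\"count\",\"type\":\"long\"}]}");

    // Input and final output are Avro container files of Click records;
    // the intermediate map output is an Avro (string, Click) pair.
    // AvroJob derives the formats, wrapper key/value types, and the
    // serialization config from these schemas.
    AvroJob.setInputSchema(job, click);
    AvroJob.setMapOutputSchema(job,
        Pair.getPairSchema(Schema.create(Schema.Type.STRING), click));
    AvroJob.setOutputSchema(job, click);

    // AvroJob.setMapperClass / AvroJob.setReducerClass would register
    // AvroMapper / AvroReducer subclasses here (omitted in this sketch),
    // after which JobClient.runJob(job) submits the job as usual.

    FileInputFormat.setInputPaths(job, new Path("clicks-in"));
    FileOutputFormat.setOutputPath(job, new Path("clicks-out"));
  }
}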