Lewis,

You are using a pseudo-distributed HDFS, but are still providing a local gora.avrostore.output.path. Can you try with a location on HDFS?
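For example, something along these lines in gora.properties should do — the directory below is just an illustration, any HDFS location your user can write to is fine (and I am not certain of the exact key for the query-side path off the top of my head, so verify it against how DataStoreFactory resolves properties):

```properties
gora.datastore.default=org.apache.gora.avro.store.DataFileAvroStore
# point the store at HDFS instead of file:///
gora.avrostore.output.path=hdfs://localhost:9000/user/lewis/gora.output
# possibly also needed so queries resolve the same location -- verify the key
gora.avrostore.input.path=hdfs://localhost:9000/user/lewis/gora.output
```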
Enis

On Wed, Oct 10, 2012 at 1:40 PM, Lewis John Mcgibbney <[email protected]> wrote:
> Hi,
>
> For the sake of obtaining a pure understanding of this myself I'm
> trying to use DataFileAvroStore with the gora-tutorial LogManager
> scenario... with little luck. Config as follows:
>
> gora.properties
> ---------------
> gora.datastore.default=org.apache.gora.avro.store.DataFileAvroStore
> gora.avrostore.output.path=file:///home/lewis/ASF/gora_trunk/gora.output
>
> gora-datafileavrostore-mapping.xml
> ----------------------------------
> non-existent... yet
>
> I'm running Hadoop 1.0.1 (for compatibility with Gora trunk) in
> pseudo-distributed mode with the following settings:
>
> core-site.xml
> -------------
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://localhost:9000</value>
>     <description>URI of NameNode.</description>
>   </property>
> </configuration>
>
> hdfs-site.xml
> -------------
> <property>
>   <name>dfs.replication</name>
>   <value>1</value>
>   <description></description>
> </property>
>
> <property>
>   <name>dfs.name.dir</name>
>   <value>/home/lewis/ASF/hadoop_output/dfs/name/</value>
>   <description>Path on the local filesystem where the NameNode
>   stores the namespace and transaction logs persistently.</description>
> </property>
>
> <property>
>   <name>dfs.data.dir</name>
>   <value>/home/lewis/ASF/hadoop_output/dfs/data/</value>
>   <description>Comma-separated list of paths on the local
>   filesystem of a DataNode where it should store its blocks.
>   </description>
> </property>
>
> mapred-site.xml
> ---------------
> <property>
>   <name>mapred.job.tracker</name>
>   <value>localhost:9001</value>
>   <description>URI of job tracker.</description>
> </property>
>
> <property>
>   <name>mapred.system.dir</name>
>   <value>/home/lewis/ASF/hadoop_output/mapred/system_files</value>
>   <description>Path on HDFS where the MapReduce framework
>   stores system files, e.g. /hadoop/mapred/system/.</description>
> </property>
>
> <property>
>   <name>mapred.local.dir</name>
>   <value>/home/lewis/ASF/hadoop_output/mapred/</value>
>   <description>Comma-separated list of paths on the local filesystem
>   where temporary MapReduce data is written.</description>
> </property>
>
> <property>
>   <name>mapred.child.java.opts</name>
>   <value>-Xmx1024m</value>
>   <description>Memory allocated to the mapred child
>   tasks.</description>
> </property>
>
> I've been running this setup with both Nutch 2.x (head) and Cassandra
> 1.1.1, as well as the goraci module, so I know my current Hadoop setup
> is OK. When I parse the webserver logs within the tutorial module
> everything is fine; however, when I attempt to query an individual
> record I get:
>
> lewis@lewis-desktop:~/ASF/gora_trunk$ ./bin/gora logmanager -query 10
> Exception in thread "main" java.lang.IllegalArgumentException: Can not
> create a Path from a null string
>         at org.apache.hadoop.fs.Path.checkPathArg(Path.java:78)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:90)
>         at org.apache.gora.avro.store.DataFileAvroStore.createFsInput(DataFileAvroStore.java:85)
>         at org.apache.gora.avro.store.DataFileAvroStore.executeQuery(DataFileAvroStore.java:67)
>         at org.apache.gora.store.impl.FileBackedDataStoreBase.execute(FileBackedDataStoreBase.java:163)
>         at org.apache.gora.query.impl.QueryBase.execute(QueryBase.java:71)
>         at org.apache.gora.tutorial.log.LogManager.query(LogManager.java:156)
>         at org.apache.gora.tutorial.log.LogManager.main(LogManager.java:246)
>
> Before I head over to the Hadoop forums I thought it best to fire this
> one off here, as it primarily concerns Gora config and fitting this
> around Hadoop.
>
> Any thoughts would be excellent here...
>
> Thanks
>
> Lewis
>
>
> On Wed, Oct 10, 2012 at 12:57 AM, Enis Söztutar <[email protected]> wrote:
> > Sorry, it's been some time since I last looked into these. AvroStore
> > uses files and writes data with DatumWriter directly, whereas
> > DataFileAvroStore uses the data file, which is an Avro file format.
> > This format supports blocks, so they can be split for MapReduce tasks.
> >
> > Yes, all FileBackedDataStores work on top of files stored at a Hadoop
> > file system. Even the local file system should work.
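To make the "blocks, so they can be split" point concrete: here is a toy sketch in Python of the idea behind the Avro object container format that DataFileAvroStore writes. This is NOT the real Avro wire format (real Avro uses schemas, codecs, and a random per-file sync marker); it only illustrates why a block/sync-marker layout lets a MapReduce split start at an arbitrary byte offset.

```python
import io
import struct

# Toy block-based container: records are grouped into blocks, each block
# carries a record count and byte length and ends with a 16-byte sync
# marker. A reader handed an arbitrary offset scans forward to the next
# marker and consumes whole blocks from there -- which is what makes
# such files splittable for MapReduce.

SYNC = b"SYNCMARKER-demo!"  # 16 bytes; real Avro uses a random per-file marker


def write_blocks(records, block_size=2):
    """Serialize records into length-prefixed blocks separated by SYNC."""
    buf = io.BytesIO()
    for i in range(0, len(records), block_size):
        block = records[i:i + block_size]
        payload = b"".join(struct.pack(">I", len(r)) + r for r in block)
        buf.write(struct.pack(">I", len(block)))    # records in this block
        buf.write(struct.pack(">I", len(payload)))  # block byte length
        buf.write(payload)
        buf.write(SYNC)
    return buf.getvalue()


def read_from(data, offset):
    """Read every record from the first block boundary at or after offset."""
    if offset == 0:
        start = 0
    else:
        start = data.find(SYNC, offset) + len(SYNC)  # skip to next boundary
    records = []
    while start < len(data):
        count, length = struct.unpack_from(">II", data, start)
        body = data[start + 8:start + 8 + length]
        pos = 0
        for _ in range(count):
            (rlen,) = struct.unpack_from(">I", body, pos)
            records.append(body[pos + 4:pos + 4 + rlen])
            pos += 4 + rlen
        start += 8 + length + len(SYNC)
    return records
```

A reader starting mid-block simply loses the partial block in front of it; the adjacent split processes that block instead, so between them no record is read twice or skipped.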

