Hi Renato,

I am using Gora 0.5 from Apache (org.apache.gora). I only experimented a
little with Gora from com.argonio.gora (version 0.5.3, built just for
Hadoop 2.4.0 without the shim layer).

I'll try to post a full report a bit later.

BR,
Alex



On Sat, Oct 4, 2014 at 5:07 PM, Renato Marroquín Mogrovejo <
[email protected]> wrote:

> Hi Alex,
>
> Just a quick question: why are you using Gora from com.argonio.gora? Why
> don't you use the Apache one, org.apache.gora? And could you tell us what
> exactly is going wrong?
>
>
> Renato M.
>
> 2014-10-04 14:03 GMT+02:00 k4200 <[email protected]>:
>
> >  Hi Alex,
> >
> > > But info about other experiences with Nutch 2 + Hadoop 2 would also be good.
> >
> > I set up Nutch 2.3 + CDH 4.7 (HBase 0.94, Hadoop 2.0 etc) a few months
> > ago, and it's working fine.
> >
> > I used the latest code from svn with no modifications, and followed
> > the tutorial below:
> > http://wiki.apache.org/nutch/Nutch2Tutorial
> >
> > HTH,
> > Kaz
> >
> > 2014-10-03 22:03 GMT+09:00 Alex Median <[email protected]>:
> > >
> > > Hi,
> > >
> > > For about a month I have been trying to install Nutch 2.3 in this
> > > configuration (see subject).
> > > Nutch 2 with Hadoop 1 was chosen a few months ago, and some of the
> > > coding is already done.
> > > We chose Amazon AWS Elastic MapReduce (EMR) as the platform.
> > > Unfortunately, the EMR Hadoop 1 version on an old Debian does not suit us.
> > > Therefore, we need to set up Nutch 2 in exactly this configuration:
> > > Hadoop 2.4.0 + HBase 0.94.18 (Amazon Linux: AMI version: 3.2.1, Hadoop
> > > distribution: Amazon 2.4.0, Applications: HBase 0.94.18)
> > >
> > > But info about other experiences with Nutch 2 + Hadoop 2 would also be good.
> > >
> > > What has been done in the latest iteration of the installation on a
> > > local computer:
> > >
> > > 1. Nutch 2.x
> > > 1.1 svn current 2.x version
> > > 1.2. prepared scripts:
> > > 1.2.1 ivy:
> > > <dependency org="org.apache.hadoop" name="hadoop-common" rev="2.4.0">..
> > > <dependency org="org.apache.hadoop" name="hadoop-mapreduce-client-core"
> > > rev="2.4.0">..
> > > <dependency org="org.apache.gora" name="gora" rev="0.5"
> > > conf="*->default" />
> > > <dependency org="org.apache.gora" name="gora-hbase" rev="0.5"
> > > conf="*->default" />
> > > etc.
> > > 1.2.2 default.properties:
> > > hadoop.version=2.4.0
> > > version=2.3-SNAPSHOT
> > > etc.
> > > 1.3. added public int getFieldsCount() { return Field.values().length; }
> > > to ProtocolStatus.java, ParseStatus.java, Host.java, WebPage.java.
> > >
> > > 2. HBase
> > > 2.1 svn HBase 0.94.18
> > > 2.2 prepared for Protobuf 2.5.0 [1], also thanks to Dobromyslov [5]
> > > 2.3 also generated hbase-0.94.18-hadoop-2.4.0.jar
> > >
> > > 3. Gora 0.5 (versions 0.4, 0.6-SNAPSHOT, and 0.5.3 from
> > > com.argonio.gora were also tested)
> > >
> > > 4. Avro 1.7.6 (also played with versions 1.7.4, 1.7.7)
> > > 4.1 svn
> > > 4.2 patched for AVRO-813[2]
> > > 4.3 patched for AVRO-882 [3] and rolled back
> > > 4.4 patched as mentioned in [4]: commented out the throwing of
> > > EOFException at
> > > org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473),
> > > etc.
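
Regarding step 1.3 above, here is a minimal sketch of that change. Only the
getFieldsCount() method comes from the list above; the WebPageSketch class and
its three Field constants are hypothetical stand-ins for the real Nutch classes
(ProtocolStatus, ParseStatus, Host, WebPage), which are assumed to each carry a
nested Field enum, as the method body implies.

```java
// Sketch of step 1.3. WebPageSketch and its enum constants are hypothetical
// placeholders, not the real Nutch WebPage class.
class WebPageSketch {

    // Hypothetical subset of fields; the real enum declares more entries.
    enum Field {
        BASE_URL, STATUS, FETCH_TIME
    }

    // The method from step 1.3: reports how many fields the enum declares.
    public int getFieldsCount() {
        return Field.values().length;
    }

    public static void main(String[] args) {
        // Prints the number of constants declared in Field.
        System.out.println(new WebPageSketch().getFieldsCount());
    }
}
```

With the three placeholder constants, running it prints 3. In the real classes
the count would follow whatever each Field enum declares.
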
> > >
> > > After many weeks of investigating numerous exceptions, we made a number
> > > of changes to the Nutch 2.x and Avro 1.7.6 code to suppress the
> > > exceptions and get a little further. We had some success: Nutch appears
> > > to run, but it is unstable and produces incorrect results. All the
> > > stages we need (inject, generate, fetch, parse, updatedb) pass in a
> > > cycle, but some functionality is broken and ignored.
> > > It seems that, because of our limited Nutch/Hadoop/HBase experience, we
> > > broke the normal data exchange between Nutch and HBase (including Gora
> > > and Avro). Perhaps some of the fields (and/or some of the data formats)
> > > are read and written incorrectly. For example, many markers are lost and
> > > are temporarily emulated in code to get through the steps; data in the
> > > batchId field is lost; scoring is also broken.
> > >
> > > Please help us! Perhaps the necessary working builds and/or scripts and
> > > patches exist somewhere. Maybe someone has had a positive experience
> > > with this. I'm ready to publish all my diffs and exception traces.
> > > Also, I would be very grateful if someone could tell me when we can
> > > expect a new Nutch 2.3 release; it seems that it will be
> > > Hadoop2-compatible.
> > >
> > > [1] http://hbase.apache.org/book/configuration.html
> > > [2] https://issues.apache.org/jira/browse/AVRO-813
> > > [3] https://issues.apache.org/jira/browse/AVRO-882
> > > http://mail-archives.apache.org/mod_mbox/avro-user/201108.mbox/%3ccaanh3_9_cqqbmt4vqyzg8-ikfo4nnlpcuzbbwd4kqoavpek...@mail.gmail.com%3E
> > > [4] http://mail-archives.apache.org/mod_mbox/nutch-user/201409.mbox/%3cCAEmTxX9HrRM00SxerFAdRdZy=wVAd9xCchDTuLaxPQ=wi0q...@mail.gmail.com%3e
> > > [5] http://stackoverflow.com/questions/13946725/configuring-hbase-standalone-mode-with-apache-nutch-java-lang-illegalargumente
> > > https://github.com/dobromyslov
> > > BR,
> > > Alex Median
> >
>
