Hi Renato, I am using Gora 0.5 from Apache (org.apache.gora). I only experimented a little with Gora from com.argonio.gora (version 0.5.3, just built for Hadoop 2.4.0 without the shim layer).
I'll try to post a full report a bit later.

BR,
Alex

On Sat, Oct 4, 2014 at 5:07 PM, Renato Marroquín Mogrovejo <[email protected]> wrote:

> Hi Alex,
>
> Just a quick question: why are you using Gora from com.argonio.gora? Why
> don't you use the Apache one, org.apache.gora? And could you tell us what
> exactly is going wrong?
>
>
> Renato M.
>
> 2014-10-04 14:03 GMT+02:00 k4200 <[email protected]>:
>
> > Hi Alex,
> >
> > > But information about other experiences with Nutch2+hadoop2 would also be good.
> >
> > I set up Nutch 2.3 + CDH 4.7 (HBase 0.94, Hadoop 2.0 etc) a few months
> > ago, and it's working fine.
> >
> > I used the latest code from svn with no modifications, and followed
> > the tutorial below:
> > http://wiki.apache.org/nutch/Nutch2Tutorial
> >
> > HTH,
> > Kaz
> >
> > 2014-10-03 22:03 GMT+09:00 Alex Median <[email protected]>:
> > >
> > > Hi,
> > >
> > > For about a month I have been installing Nutch 2.3 in this
> > > configuration (see subject).
> > > Nutch 2 with Hadoop 1 was chosen a few months ago, and some of the
> > > coding is already done.
> > > We chose Amazon AWS Elastic MapReduce (EMR) as a platform.
> > > Unfortunately, the EMR Hadoop 1 version on an old Debian does not suit us.
> > > Therefore, we need to set up Nutch 2 in exactly the above configuration:
> > > Hadoop 2.4.0 + HBase 0.94.18 (Amazon Linux: AMI version:3.2.1, Hadoop
> > > distribution:Amazon 2.4.0, Applications:HBase 0.94.18)
> > >
> > > But information about other experiences with Nutch2+hadoop2 would also be good.
> > >
> > > What has been done in the latest iteration of the installation on a local
> > > computer:
> > >
> > > 1. Nutch 2.x
> > > 1.1 svn current 2.x version
> > > 1.2. prepared scripts:
> > > 1.2.1 ivy:
> > > <dependency org="org.apache.hadoop" name="hadoop-common" rev="2.4.0">..
> > > <dependency org="org.apache.hadoop" name="hadoop-mapreduce-client-core" rev="2.4.0">..
> > > <dependency org="org.apache.gora" name="gora" rev="0.5" conf="*->default" />
> > > <dependency org="org.apache.gora" name="gora-hbase" rev="0.5" conf="*->default" />
> > > etc.
> > > 1.2.2 default.properties:
> > > hadoop.version=2.4.0
> > > version=2.3-SNAPSHOT
> > > etc.
> > > 1.3. added public int getFieldsCount() { return Field.values().length; } to
> > > ProtocolStatus.java, ParseStatus.java, Host.java, WebPage.java.
> > >
> > > 2. HBase
> > > 2.1 svn HBase 0.94.18
> > > 2.2 prepared for Protobuf 2.5.0 [1], also thanks to Dobromyslov [5]
> > > 2.3 also generated hbase-0.94.18-hadoop-2.4.0.jar
> > >
> > > 3. Gora 0.5 (versions 0.4, 0.6-SNAPSHOT, and 0.5.3 from
> > > com.argonio.gora were also tested)
> > >
> > > 4. Avro 1.7.6 (also played with versions 1.7.4, 1.7.7)
> > > 4.1 svn
> > > 4.2 patched for AVRO-813 [2]
> > > 4.3 patched for AVRO-882 [3] and rolled back
> > > 4.4 patched as mentioned in [4]: commented out the throwing of EOFException at
> > > org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473), etc.
> > >
> > > After investigating numerous exceptions over many weeks, a number of
> > > changes have been made to the Nutch 2.x and Avro 1.7.6 code to suppress
> > > exceptions and get a little further. We had some success: Nutch appears
> > > to run, but it is unstable and incorrect. All the stages we need
> > > pass in a cycle (inject, generate, fetch, parse, updatedb), but some
> > > functionality is broken and ignored.
> > > It seems that, because of our limited Nutch/Hadoop/HBase experience, we
> > > broke the normal data exchange between Nutch and HBase (also with Gora
> > > and Avro). Perhaps some of the fields (and/or some of the data formats)
> > > are read and written incorrectly. For example, many markers are lost and
> > > temporarily emulated in code to pass through the steps; data in the
> > > batchId field is lost; scoring is also broken.
> > >
> > > Please help us!
> > > Perhaps the necessary working assemblies and/or scripts and patches
> > > exist somewhere. Maybe someone has had a positive experience with this.
> > > I'm ready to publish all my diffs and exception traces.
> > > Also, I would be very grateful if someone could tell me when we can
> > > expect a new Nutch 2.3 release; it seems that it will be
> > > Hadoop2-compatible.
> > >
> > > [1] http://hbase.apache.org/book/configuration.html
> > > [2] https://issues.apache.org/jira/browse/AVRO-813
> > > [3] https://issues.apache.org/jira/browse/AVRO-882
> > > http://mail-archives.apache.org/mod_mbox/avro-user/201108.mbox/%3ccaanh3_9_cqqbmt4vqyzg8-ikfo4nnlpcuzbbwd4kqoavpek...@mail.gmail.com%3E
> > > [4] http://mail-archives.apache.org/mod_mbox/nutch-user/201409.mbox/%3cCAEmTxX9HrRM00SxerFAdRdZy=wVAd9xCchDTuLaxPQ=wi0q...@mail.gmail.com%3e
> > > [5] http://stackoverflow.com/questions/13946725/configuring-hbase-standalone-mode-with-apache-nutch-java-lang-illegalargumente
> > > https://github.com/dobromyslov
> > >
> > > BR,
> > > Alex Median
> > >
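
For reference, step 1.3 of the quoted message adds a getFieldsCount() method to ProtocolStatus.java, ParseStatus.java, Host.java and WebPage.java. A minimal standalone sketch of that method follows. The class name FieldsCountSketch and the three-value Field enum are hypothetical stand-ins, not the real generated classes, whose Field enums list the actual persistent fields.

```java
// Sketch of the getFieldsCount() addition described in step 1.3.
// FieldsCountSketch and its Field values are hypothetical; the real
// classes (WebPage, Host, ...) carry their own generated Field enum.
public class FieldsCountSketch {

    // Hypothetical stand-in for the Field enum of a generated class.
    public enum Field { BASE_URL, STATUS, FETCH_TIME }

    // Same body as quoted in the message: the number of declared fields.
    public int getFieldsCount() {
        return Field.values().length;
    }

    public static void main(String[] args) {
        System.out.println(new FieldsCountSketch().getFieldsCount()); // prints 3
    }
}
```

Because the count is derived from the enum, it stays in step with the declared fields if the class is regenerated with a different schema.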

