This is, and always has been, a Nutch issue. Please take it to the Nutch user@ list; it is a Nutch configuration issue, nothing more.
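Before posting to the Nutch list, it may help to confirm how many documents Solr actually holds. What follows is only a minimal sketch in Python, not part of Nutch or Solr. The helper names (count_url, num_found, count_documents) are hypothetical, and it assumes the standard /select handler with JSON output is reachable at the collection URL quoted in the message below.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(core_url):
    """Build a select URL that matches every document but returns no rows."""
    # q=*:* matches all documents; rows=0 keeps the response small.
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return core_url.rstrip("/") + "/select?" + params


def num_found(response_text):
    """Extract the total hit count from a Solr JSON select response."""
    return json.loads(response_text)["response"]["numFound"]


def count_documents(core_url, timeout=10):
    """Ask a running Solr core or collection how many documents it holds.

    Assumption: core_url points at a live instance, for example
    http://localhost:8983/solr/collection1 as used in this thread.
    """
    with urlopen(count_url(core_url), timeout=timeout) as resp:
        return num_found(resp.read().decode("utf-8"))
```

Calling count_documents("http://localhost:8983/solr/collection1") against the running instance and getting 0 back right after solrindex would suggest that nothing was sent or committed, which points back at the Nutch configuration rather than at Solr.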
On Saturday, February 20, 2016, Tom Running <runningt...@gmail.com> wrote:

> Lewis and Furkan,
>
> Thank you both for kindly explaining and providing great tips to help me
> get Nutch, Gora and HBase working. I can see Nutch's crawl data in
> HBase under the Webpage table by running scan 'webpage' within the hbase
> shell. Thank you.
>
> I am still trying to get SOLR to work.
> I ran this command:
> ./nutch solrindex http://localhost:8983/solr -all
>
> ****** it came back with the following info *****
> ****** doesn't seem to have any problem there ****
> IndexingJob: starting
> Active IndexWriters :
> SOLRIndexWriter
> solr.server.url : URL of the SOLR instance (mandatory)
> solr.commit.size : buffer size when sending to SOLR (default 1000)
> solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
> solr.auth : use authentication (default false)
> solr.auth.username : username for authentication
> solr.auth.password : password for authentication
> IndexingJob: done.
>
> *** it doesn't seem to have any errors ***
>
> However, when I launch the SOLR web UI, I cannot query or find anything
> under the default collection1, gettingstarted_shard1_replica1,
> or gettingstarted_shard2_replica1.
>
> I have also tried this option (with collection1) and am still not
> able to query anything:
> ./nutch solrindex http://localhost:8983/solr/collection1 -all
>
> I downloaded SOLR 4.10.3 and started it as is with the command:
> /home/solr/bin/solr start -e cloud -noprompt
>
> I did not modify any configuration file, nor did I post any file or
> directory from within SOLR.
> I am assuming this command
> ./nutch solrindex http://localhost:8983/solr/collection1
> will do all the posting and indexing for SOLR.
>
> Any ideas what I am missing here? Do I need to do anything on the
> SOLR side for this to work?
>
> Thank you very much.
> Tom
>
> On Sat, Feb 20, 2016 at 4:07 AM, Furkan KAMACI <furkankam...@gmail.com> wrote:
>
>> Hi Tom,
>>
>> Download and configure both HBase and Solr and bring them up. You do not
>> need to build Gora in your case (nor HBase or Solr). It is a
>> dependency included in Nutch.
>>
>> Nutch will crawl webpages and use Gora as a backend system to communicate
>> with HBase and Solr.
>>
>> Kind Regards,
>> Furkan KAMACI
>> On 20 Feb 2016 at 10:45, "Tom Running" <runningt...@gmail.com> wrote:
>>
>>> I meant SOLR 4.10.3 instead of SOLR 2.X
>>>
>>> On Sat, Feb 20, 2016 at 3:44 AM, Tom Running <runningt...@gmail.com> wrote:
>>>
>>>> Great. Thank you.
>>>>
>>>> I am just wondering: how will building GORA help with anything in
>>>> my situation? Probably not at all, right? It doesn't seem that I need
>>>> to use any of the build output.
>>>>
>>>> It seems GORA is already included in the SOLR 2.X and HBase 0.98.9
>>>> releases. Is this a correct assumption?
>>>>
>>>> Thank you.
>>>> Tom
>>>>
>>>> On Sat, Feb 20, 2016 at 1:35 AM, Lewis John Mcgibbney <lewis.mcgibb...@gmail.com> wrote:
>>>>
>>>>> Hi Tom,
>>>>> All you need to do is ensure that the gora-hbase dependency is uncommented
>>>>> within $NUTCH_HOME/ivy/ivy.xml
>>>>> https://github.com/apache/nutch/blob/2.x/ivy/ivy.xml#L116
>>>>>
>>>>> You then need to ensure that the storage.data.store.class is
>>>>> correct in $NUTCH_HOME/conf/nutch-default.xml.
>>>>> This needs to be set to
>>>>> 'org.apache.gora.hbase.store.HBaseStore'
>>>>>
>>>>> https://github.com/apache/nutch/blob/2.x/conf/nutch-default.xml#L1333-L1371
>>>>>
>>>>> Finally, you need to configure $NUTCH_HOME/conf/gora.properties
>>>>> https://github.com/apache/nutch/blob/2.x/conf/gora.properties
>>>>> Make sure that the correct gora-hbase configuration is included.
>>>>>
>>>>> That is all you need to do.
>>>>> Lewis
>>>>>
>>>>> On Fri, Feb 19, 2016 at 10:29 PM, Tom Running <runningt...@gmail.com> wrote:
>>>>>
>>>>>> Furkan,
>>>>>>
>>>>>> What you mentioned is exactly what I am trying to accomplish.
>>>>>> > Using Nutch to crawl websites and storing them at Hbase and
>>>>>> > indexing at Solr via Gora?
>>>>>>
>>>>>> I need a bit more help to make sure what I am about to do is correct.
>>>>>>
>>>>>> #1.
>>>>>> After successfully building GORA, I have the following two .jar files
>>>>>> in the /gora/gora-solr/lib/ directory. There are lots of .jar files in
>>>>>> the /lib directory, but only two related to Solr:
>>>>>> solr-solrj-4.10.3.jar
>>>>>> solr-core-4.10.3.jar
>>>>>>
>>>>>> #2.
>>>>>> In the Solr source distribution directory I also see the exact same
>>>>>> .jar files. This is a source code download. I have not built this
>>>>>> Solr yet.
>>>>>>
>>>>>> /home/solr/dist
>>>>>> solr-solrj-4.10.3.jar
>>>>>> solr-core-4.10.3.jar
>>>>>> solr-4.10.3.war
>>>>>>
>>>>>> My question is: should I copy the two Solr files in #1 to
>>>>>> /home/solr/dist/ and then build Solr?
>>>>>>
>>>>>> #3.
>>>>>> Should I also do the same thing for HBase: copy
>>>>>> /gora/gora-hbase/lib/hbase-* into /hbase/lib/ and then build HBase?
>>>>>>
>>>>>> Thank you.
>>>>>> Tom
>>>>>>
>>>>>> On Wed, Feb 17, 2016 at 5:31 PM, Furkan KAMACI <furkankam...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Tom,
>>>>>>>
>>>>>>> What is your aim?
>>>>>>> Using Nutch to crawl websites, storing them in
>>>>>>> HBase and indexing them in Solr via Gora? Do you have any other use cases?
>>>>>>>
>>>>>>> Put simply, you can think of Gora as acting like the Hibernate of the
>>>>>>> NoSQL ecosystem in your use case. So it will not run as a service; it
>>>>>>> will be a dependency.
>>>>>>>
>>>>>>> Kind Regards,
>>>>>>> Furkan KAMACI
>>>>>>> On 17 Feb 2016 at 22:13, "Lewis John Mcgibbney" <lewis.mcgibb...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Tom,
>>>>>>>> You can just follow this tutorial:
>>>>>>>> http://wiki.apache.org/nutch/Nutch2Tutorial
>>>>>>>> replacing the gora-hbase configuration in your Nutch
>>>>>>>> conf/nutch-default.xml and conf/gora.properties, and the relevant
>>>>>>>> dependency in ivy/ivy.xml, with the gora-solr equivalent.
>>>>>>>> If you have any more issues, please let us know. No, Gora does not
>>>>>>>> run as a service; it is a dependency and is managed through your
>>>>>>>> client dependency manager (which in Nutch 2.X is Ivy).
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> On Wed, Feb 17, 2016 at 12:04 PM, Tom Running <runningt...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Furkan and Lewis,
>>>>>>>>>
>>>>>>>>> Thank you for your response to my SOS. I tried various suggestions,
>>>>>>>>> including editing the pom.xml file and downgrading the Java JDK
>>>>>>>>> version to 1.7, then removed the .m2 folder and ran mvn clean install
>>>>>>>>> again, and it built successfully.
>>>>>>>>>
>>>>>>>>> Now that Gora is successfully built, I am trying to understand how to
>>>>>>>>> get Gora to run or start so that the following three packages
>>>>>>>>> work together with GORA: Nutch, Solr and HBase.
>>>>>>>>> Does Gora start as a service?
>>>>>>>>> Or,
>>>>>>>>> to get the other three packages to work with GORA, will I need to copy
>>>>>>>>> the *.jar files to the lib folder of each package (Nutch, Solr and HBase)?
>>>>>>>>>
>>>>>>>>> *I am a bit confused about how to get these packages to work with
>>>>>>>>> GORA. I have read GORA's quickstart guide but am still not clear
>>>>>>>>> on what to do.*
>>>>>>>>>
>>>>>>>>> *Can you provide some direction?*
>>>>>>>>>
>>>>>>>>> *Thank you.*
>>>>>>>>>
>>>>>>>>> *Tom*
>>>>>>>>>
>>>>>>>>> On Wed, Feb 17, 2016 at 1:56 PM, Furkan KAMACI <furkankam...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Tom,
>>>>>>>>>>
>>>>>>>>>> It seems that your Maven is in offline mode. There may be a
>>>>>>>>>> problem with your settings.xml or the environment variable for the
>>>>>>>>>> Maven home. How do you build your project? Could you build it with
>>>>>>>>>> the -X option and send the output?
>>>>>>>>>>
>>>>>>>>>> Kind Regards,
>>>>>>>>>> Furkan KAMACI
>>>>>>>>>> On 17 Feb 2016 at 20:51, "Tom Running" <runningt...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> What should I do about the error below?
>>>>>>>>>>
>>>>>>>>>> [INFO] Building Apache Gora :: Accumulo 0.6.1
>>>>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>>>>> [WARNING] The POM for org.apache.accumulo:accumulo-core:jar:1.5.1 is missing, no dependency information available
>>>>>>>>>> [WARNING] The POM for org.apache.accumulo:accumulo-minicluster:jar:1.5.1 is missing, no dependency information available
>>>>>>>>>> [WARNING] The POM for org.jboss.netty:netty:jar:3.2.2.Final is missing, no dependency information available
>>>>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>>>>> [INFO] Reactor Summary:
>>>>>>>>>> [INFO]
>>>>>>>>>> [INFO] Apache Gora ........................................ SUCCESS [ 1.468 s]
>>>>>>>>>> [INFO] Apache Gora :: Compiler ............................ SUCCESS [ 0.121 s]
>>>>>>>>>> [INFO] Apache Gora :: Compiler-CLI ........................ SUCCESS [ 0.032 s]
>>>>>>>>>> [INFO] Apache Gora :: Shims Hadoop ........................ SUCCESS [ 0.543 s]
>>>>>>>>>> [INFO] Apache Gora :: Shims Hadoop 1.x .................... SUCCESS [ 0.190 s]
>>>>>>>>>> [INFO] Apache Gora :: Shims Hadoop 2.x .................... SUCCESS [ 0.295 s]
>>>>>>>>>> [INFO] Apache Gora :: Shims Distribution .................. SUCCESS [ 0.026 s]
>>>>>>>>>> [INFO] Apache Gora :: Core ................................ SUCCESS [ 0.806 s]
>>>>>>>>>> [INFO] Apache Gora :: Accumulo ............................ FAILURE [ 0.120 s]
>>>>>>>>>> [INFO] Apache Gora :: Cassandra ........................... SKIPPED
>>>>>>>>>> [INFO] Apache Gora :: GoraCI .............................. SKIPPED
>>>>>>>>>> [INFO] Apache Gora :: HBase ............................... SKIPPED
>>>>>>>>>> [INFO] Apache Gora :: MongoDB ............................. SKIPPED
>>>>>>>>>> [INFO] Apache Gora :: Solr ................................ SKIPPED
>>>>>>>>>> [INFO] Apache Gora :: Tutorial ............................ SKIPPED
>>>>>>>>>> [INFO] Apache Gora :: Sources-Dist ........................ SKIPPED
>>>>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>>>>> [INFO] BUILD FAILURE
>>>>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>>>>> [INFO] Total time: 6.359 s
>>>>>>>>>> [INFO] Finished at: 2016-02-17T02:00:39-05:00
>>>>>>>>>> [INFO] Final Memory: 25M/61M
>>>>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>>>>> [ERROR] Failed to execute goal on project gora-accumulo: Could not
>>>>>>>>>> resolve dependencies for project
>>>>>>>>>> org.apache.gora:gora-accumulo:bundle:0.6.1: The following artifacts
>>>>>>>>>> could not be resolved: org.apache.gora:gora-core:jar:0.6.1,
>>>>>>>>>> org.apache.gora:gora-core:jar:tests:0.6.1,
>>>>>>>>>> org.apache.accumulo:accumulo-core:jar:1.5.1,
>>>>>>>>>> org.apache.accumulo:accumulo-minicluster:jar:1.5.1,
>>>>>>>>>> jline:jline:jar:0.9.1,
>>>>>>>>>> org.jboss.netty:netty:jar:3.2.2.Final,
>>>>>>>>>> org.codehaus.jackson:jackson-jaxrs:jar:1.8.3,
>>>>>>>>>> org.codehaus.jackson:jackson-xc:jar:1.8.3: Cannot access central
>>>>>>>>>> (https://repo.maven.apache.org/maven2) in offline mode and the
>>>>>>>>>> artifact org.apache.gora:gora-core:jar:0.6.1 has not been downloaded
>>>>>>>>>> from it before. -> [Help 1]
>>>>>>>>>> [ERROR]
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Lewis*
>>>>>
>>>>> --
>>>>> *Lewis*

--
*Lewis*