Ohh, I'm a bit confused. Which of the following is true in 'deploy' mode:

1. Data cannot be stored in Cassandra; HBase is the only way.
2. Data will be stored in Cassandra, but you need a Hadoop cluster anyway
   (maybe just a single node) which won't be storing any data and is there
   just to make Nutch happy.
On 23 Feb 2018 22:08, "Yossi Tamari" <[email protected]> wrote:

> Hi Kaliyug,
>
> Nutch 2 still requires Hadoop to run; it just allows you to store data
> somewhere other than HDFS.
> The only way to run Nutch without Hadoop is local mode, which is only
> recommended for testing. To do that, run ./runtime/local/bin/crawl.
>
> Yossi.
>
> > -----Original Message-----
> > From: Kaliyug Antagonist [mailto:[email protected]]
> > Sent: 23 February 2018 20:26
> > To: [email protected]
> > Subject: Nutch pointed to Cassandra, yet, asks for Hadoop
> >
> > Windows 10, Nutch 2.3.1, Cassandra 3.11.1
> >
> > I have extracted and built Nutch under Cygwin's home directory.
> >
> > I believe that the Cassandra server is working:
> >
> > INFO  [main] 2018-02-23 16:20:41,077 StorageService.java:1442 -
> > JOINING: Finish joining ring
> > INFO  [main] 2018-02-23 16:20:41,820 SecondaryIndexManager.java:509 -
> > Executing pre-join tasks for: CFS(Keyspace='test', ColumnFamily='test')
> > INFO  [main] 2018-02-23 16:20:42,161 StorageService.java:2268 - Node
> > localhost/127.0.0.1 state jump to NORMAL
> > INFO  [main] 2018-02-23 16:20:43,049 NativeTransportService.java:75 -
> > Netty using Java NIO event loop
> > INFO  [main] 2018-02-23 16:20:43,358 Server.java:155 - Using Netty
> > Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a,
> > netty-codec=netty-codec-4.0.44.Final.452812a,
> > netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a,
> > netty-codec-http=netty-codec-http-4.0.44.Final.452812a,
> > netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a,
> > netty-common=netty-common-4.0.44.Final.452812a,
> > netty-handler=netty-handler-4.0.44.Final.452812a,
> > netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb,
> > netty-transport=netty-transport-4.0.44.Final.452812a,
> > netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
> > netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a,
> > netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a,
> > netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> > INFO  [main] 2018-02-23 16:20:43,359 Server.java:156 - Starting listening for
> > CQL clients on localhost/127.0.0.1:9042 (unencrypted)...
> > INFO  [main] 2018-02-23 16:20:43,941 CassandraDaemon.java:527 - Not
> > starting RPC server as requested. Use JMX
> > (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
> >
> > I did the following check:
> >
> > apache-cassandra-3.11.1\bin>nodetool status
> > Datacenter: datacenter1
> > ========================
> > Status=Up/Down
> > |/ State=Normal/Leaving/Joining/Moving
> > --  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
> > UN  127.0.0.1  273.97 KiB  256     100.0%            dab932f2-d138-4a1a-acd4-f63cbb16d224  rack1
> >
> > cqlsh connects:
> >
> > apache-cassandra-3.11.1\bin>cqlsh
> >
> > WARNING: console codepage must be set to cp65001 to support utf-8 encoding
> > on Windows platforms.
> > If you experience encoding problems, change your console codepage with
> > 'chcp 65001' before starting cqlsh.
> >
> > Connected to Test Cluster at 127.0.0.1:9042.
> > [cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]
> > Use HELP for help.
> > WARNING: pyreadline dependency missing. Install to enable tab completion.
> > cqlsh> describe keyspaces
> >
> > system_schema  system_auth  system  system_distributed  test  system_traces
> >
> > I followed the tutorial 'Setting up NUTCH 2.x with CASSANDRA'
> > <https://wiki.apache.org/nutch/Nutch2Cassandra> and added the respective
> > entries in the properties and XML files.
> >
> > I go to the Cygwin prompt and attempt to crawl. Instead of using Cassandra,
> > it asks for Hadoop (HBase, probably):
> >
> > /home/apache-nutch-2.3.1
> > $ ./runtime/deploy/bin/crawl urls/ crawl/ 1
> > No SOLRURL specified. Skipping indexing.
> > which: no hadoop in (<dump of the classpath entries>)
> > Can't find Hadoop executable. Add HADOOP_HOME/bin to the path or run in
> > local mode.
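
For anyone hitting the same wall: Yossi's local-mode suggestion translates to
roughly the following (a minimal sketch; it reuses the seed directory, crawl
ID, and round count from the deploy-mode command above):

    $ cd /home/apache-nutch-2.3.1
    $ ./runtime/local/bin/crawl urls/ crawl/ 1    # <seedDir> <crawlID> <numberOfRounds>

Local mode runs the whole crawl in a single JVM, so there is no Hadoop
executable check to fail.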

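As for the "respective entries" from the Nutch2Cassandra tutorial, they amount
to roughly this (a sketch for Nutch 2.3.1; the property names follow the wiki
page, and localhost:9160 assumes a default single-node setup):

In conf/nutch-site.xml:

    <property>
      <name>storage.data.store.class</name>
      <value>org.apache.gora.cassandra.store.CassandraStore</value>
      <description>Default class for storing data</description>
    </property>

In conf/gora.properties:

    # The gora-cassandra binding bundled with Nutch 2.3.1 speaks Thrift on
    # port 9160; note the Cassandra log above says the RPC (Thrift) server
    # was not started (nodetool enablethrift turns it on).
    gora.cassandrastore.servers=localhost:9160

After editing, rebuild with 'ant runtime' so that both runtime/local and
runtime/deploy pick up the new configuration.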

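And if deploy mode is what's actually wanted, Yossi's reply points to the
second reading in the question at the top: the data lives in Cassandra, but a
Hadoop cluster is still needed to run the jobs. The error message itself names
the fix: put a Hadoop installation's bin directory on the PATH. A sketch,
where /opt/hadoop is a placeholder for a real Hadoop install:

    $ export HADOOP_HOME=/opt/hadoop             # placeholder path
    $ export PATH="$HADOOP_HOME/bin:$PATH"
    $ ./runtime/deploy/bin/crawl urls/ crawl/ 1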