Hi Kaliyug,
Nutch 2 still requires Hadoop to run; it just allows you to store data
somewhere other than HDFS.
The only way to run Nutch without a Hadoop installation is local mode, which
is only recommended for testing. To do that, run ./runtime/local/bin/crawl
instead of ./runtime/deploy/bin/crawl.
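
As a rough illustration, here is a minimal sketch of picking between the two scripts based on whether a hadoop executable is on the PATH. The NUTCH_HOME variable and the pick_crawl_script function are hypothetical names used only for this example (they are not part of Nutch), and the arguments are the ones from your original command. It only prints the command it would run.

```shell
#!/bin/sh
# NUTCH_HOME is an assumed variable pointing at the extracted Nutch source
# tree; it defaults to the current directory.
NUTCH_HOME="${NUTCH_HOME:-.}"

# Hypothetical helper: print the deploy-mode script if a hadoop executable
# is on the PATH, otherwise fall back to the local-mode script.
pick_crawl_script() {
  if command -v hadoop >/dev/null 2>&1; then
    echo "$NUTCH_HOME/runtime/deploy/bin/crawl"
  else
    echo "$NUTCH_HOME/runtime/local/bin/crawl"
  fi
}

# Same arguments as in the original attempt: seed dir, crawl ID, rounds.
echo "Would run: $(pick_crawl_script) urls/ crawl/ 1"
```

On a machine without Hadoop installed this should print the runtime/local path, which is the one to use for testing.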
Yossi.
> -----Original Message-----
> From: Kaliyug Antagonist [mailto:[email protected]]
> Sent: 23 February 2018 20:26
> To: [email protected]
> Subject: Nutch pointed to Cassandra, yet, asks for Hadoop
>
> Windows 10, Nutch 2.3.1, Cassandra 3.11.1
>
> I have extracted and built Nutch under Cygwin's home directory.
>
> I believe that the Cassandra server is working:
>
> INFO [main] 2018-02-23 16:20:41,077 StorageService.java:1442 -
> JOINING: Finish joining ring
> INFO [main] 2018-02-23 16:20:41,820 SecondaryIndexManager.java:509 -
> Executing pre-join tasks for: CFS(Keyspace='test',
> ColumnFamily='test')
> INFO [main] 2018-02-23 16:20:42,161 StorageService.java:2268 - Node
> localhost/127.0.0.1 state jump to NORMAL
> INFO [main] 2018-02-23 16:20:43,049 NativeTransportService.java:75 -
> Netty using Java NIO event loop
> INFO [main] 2018-02-23 16:20:43,358 Server.java:155 - Using Netty
> Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a,
> netty-codec=netty-codec-4.0.44.Final.452812a,
> netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a,
> netty-codec-http=netty-codec-http-4.0.44.Final.452812a,
> netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a,
> netty-common=netty-common-4.0.44.Final.452812a,
> netty-handler=netty-handler-4.0.44.Final.452812a,
> netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb,
> netty-transport=netty-transport-4.0.44.Final.452812a,
> netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
> netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a,
> netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a,
> netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO [main] 2018-02-23 16:20:43,359 Server.java:156 - Starting listening for
> CQL clients on localhost/127.0.0.1:9042 (unencrypted)...
> INFO [main] 2018-02-23 16:20:43,941 CassandraDaemon.java:527 - Not
> starting RPC server as requested. Use JMX
> (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
>
> I did the following check:
>
> apache-cassandra-3.11.1\bin>nodetool status
> Datacenter: datacenter1
> ========================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  127.0.0.1  273.97 KiB  256     100.0%            dab932f2-d138-4a1a-acd4-f63cbb16d224  rack1
>
> cqlsh connects:
>
> apache-cassandra-3.11.1\bin>cqlsh
>
> WARNING: console codepage must be set to cp65001 to support utf-8 encoding
> on Windows platforms.
> If you experience encoding problems, change your console codepage with 'chcp
> 65001' before starting cqlsh.
>
> Connected to Test Cluster at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]
> Use HELP for help.
> WARNING: pyreadline dependency missing. Install to enable tab completion.
> cqlsh> describe keyspaces
>
> system_schema system_auth system system_distributed test system_traces
>
> I followed the tutorial 'Setting up NUTCH 2.x with CASSANDRA
> <https://wiki.apache.org/nutch/Nutch2Cassandra>' and added the respective
> entries in the properties and the xml files.
>
> I go to the Cygwin prompt and attempt to crawl. Instead of using Cassandra, it
> asks for Hadoop (HBase, probably):
>
> /home/apache-nutch-2.3.1
> $ ./runtime/deploy/bin/crawl urls/ crawl/ 1
> No SOLRURL specified. Skipping indexing.
> which: no hadoop in (<dump of the classpath entries>)
> Can't find Hadoop executable. Add HADOOP_HOME/bin to the path or run in
> local mode.