Hi Kaliyug,

Nutch 2 still requires Hadoop to run; it just allows you to store data 
somewhere other than HDFS (Cassandra, in your case).
The only way to run Nutch without a Hadoop installation is local mode, 
which is recommended only for testing. To do that, run 
./runtime/local/bin/crawl instead of ./runtime/deploy/bin/crawl.
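
As a rough sketch of the choice between the two modes (not part of Nutch 
itself): the helper name choose_nutch_runtime is hypothetical, and the crawl 
arguments are simply the ones from your message. It picks runtime/deploy only 
when a hadoop executable is on the PATH, which is the condition your error 
message complains about, and otherwise falls back to runtime/local.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Nutch): choose the runtime directory
# depending on whether a `hadoop` executable can be found on the PATH.
choose_nutch_runtime() {
  if command -v hadoop >/dev/null 2>&1; then
    echo "runtime/deploy"   # Hadoop found: deploy mode is possible
  else
    echo "runtime/local"    # no Hadoop: local mode, for testing only
  fi
}

runtime_dir="$(choose_nutch_runtime)"

# Only print the command here; the arguments are copied from the original post.
echo "Would run: ./$runtime_dir/bin/crawl urls/ crawl/ 1"
```

Run it from the Nutch source directory (/home/apache-nutch-2.3.1 in your 
case) and replace the final echo with the printed command once it shows the 
mode you expect.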

        Yossi.

> -----Original Message-----
> From: Kaliyug Antagonist [mailto:kaliyugantagon...@gmail.com]
> Sent: 23 February 2018 20:26
> To: user@nutch.apache.org
> Subject: Nutch pointed to Cassandra, yet, asks for Hadoop
> 
> Windows 10, Nutch 2.3.1, Cassandra 3.11.1
> 
> I have extracted and built Nutch under Cygwin's home directory.
> 
> I believe that the Cassandra server is working:
> 
> INFO  [main] 2018-02-23 16:20:41,077 StorageService.java:1442 -
> JOINING: Finish joining ring
> INFO  [main] 2018-02-23 16:20:41,820 SecondaryIndexManager.java:509 -
> Executing pre-join tasks for: CFS(Keyspace='test',
> ColumnFamily='test')
> INFO  [main] 2018-02-23 16:20:42,161 StorageService.java:2268 - Node
> localhost/127.0.0.1 state jump to NORMAL
> INFO  [main] 2018-02-23 16:20:43,049 NativeTransportService.java:75 -
> Netty using Java NIO event loop
> INFO  [main] 2018-02-23 16:20:43,358 Server.java:155 - Using Netty
> Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a,
> netty-codec=netty-codec-4.0.44.Final.452812a,
> netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a,
> netty-codec-http=netty-codec-http-4.0.44.Final.452812a,
> netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a,
> netty-common=netty-common-4.0.44.Final.452812a,
> netty-handler=netty-handler-4.0.44.Final.452812a,
> netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb,
> netty-transport=netty-transport-4.0.44.Final.452812a,
> netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a,
> netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a,
> netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a,
> netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO  [main] 2018-02-23 16:20:43,359 Server.java:156 - Starting listening for
> CQL clients on localhost/127.0.0.1:9042 (unencrypted)...
> INFO  [main] 2018-02-23 16:20:43,941 CassandraDaemon.java:527 - Not
> starting RPC server as requested. Use JMX
> (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
> 
> I did the following check:
> 
> apache-cassandra-3.11.1\bin>nodetool status
> Datacenter: datacenter1
> ========================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  127.0.0.1  273.97 KiB  256     100.0%            dab932f2-d138-4a1a-acd4-f63cbb16d224  rack1
> 
> cqlsh connects:
> 
> apache-cassandra-3.11.1\bin>cqlsh
> 
> WARNING: console codepage must be set to cp65001 to support utf-8 encoding
> on Windows platforms.
> If you experience encoding problems, change your console codepage with 'chcp
> 65001' before starting cqlsh.
> 
> Connected to Test Cluster at 127.0.0.1:9042.
> [cqlsh 5.0.1 | Cassandra 3.11.1 | CQL spec 3.4.4 | Native protocol v4]
> Use HELP for help.
> WARNING: pyreadline dependency missing.  Install to enable tab completion.
> cqlsh> describe keyspaces
> 
> system_schema  system_auth  system  system_distributed  test  system_traces
> 
> I followed the tutorial 'Setting up NUTCH 2.x with CASSANDRA
> <https://wiki.apache.org/nutch/Nutch2Cassandra>' and added the respective
> entries in the properties and the xml files.
> 
> I go to the Cygwin prompt and attempt to crawl. Instead of using Cassandra, it
> asks for Hadoop (HBase, probably):
> 
> /home/apache-nutch-2.3.1
> $ ./runtime/deploy/bin/crawl urls/ crawl/ 1
> No SOLRURL specified. Skipping indexing.
> which: no hadoop in (<dump of the classpath entries>)
> Can't find Hadoop executable. Add HADOOP_HOME/bin to the path or run in
> local mode.
