Re: need help with store.CassandraStore

2013-08-09 Thread Lewis John Mcgibbney
Hi Kaveh,

N.B. Taking this to user@gora and after this mail please drop user@nutch

Quick question, is your cassandra server up and running at default port
9160?
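
A quick way to answer that from the machine that runs Nutch is a minimal sketch like the one below. It assumes Python is available there; `check_endpoint` is a hypothetical helper name, not part of Gora or Nutch. It only resolves the host name and attempts a plain TCP connection, so it says nothing about Cassandra itself beyond "something is listening".

```python
import socket


def check_endpoint(host, port, timeout=3.0):
    """Resolve host, then try a TCP connect.

    Returns (resolved_ip or None, reachable flag).
    """
    try:
        ip = socket.gethostbyname(host)
    except socket.gaierror:
        # The name does not resolve at all (DNS or /etc/hosts problem).
        return None, False
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return ip, True
    except OSError:
        # Resolved, but nothing accepted the connection in time.
        return ip, False
```

Calling `check_endpoint("my-server", 9160)` with your real host name in place of the placeholder should show which address the name resolves to and whether the port accepts connections.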


On Fri, Aug 9, 2013 at 3:36 PM, kaveh minooie ka...@plutoz.com wrote:

 Hi Everyone

 So I don't know if I am doing something wrong or there is actually
 something wrong but this is the issue. btw, I am using this commit of 2.x :

 commit d4deef989ffc41b9dd5e77683e73286d81e1178b
 Author: Sebastian Nagel sna...@apache.org
 Date:   Wed Aug 7 21:10:17 2013 +

 NUTCH-911 protocol-file to return proper protocol status for
 notmodified, gone, access_denied

 git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1511496 13f79535-47bb-0310-9956-ffa450edef68


 so my problem is that gora doesn't seem to be able to understand where my
 Cassandra cluster is. the gora.properties file has this line in it:

 gora.cassandrastore.servers=my-server:9160

 the gora website for cassandra mentions this
 (http://gora.apache.org/current/gora-cassandra.html):

 gora.cassandra.servers=my-server:9160

 but my problem here is that neither one of them works. I even tried putting
 them in the nutch-site.xml file with no result. gora still tries to
 connect to localhost:


 13/08/09 15:23:15 INFO connection.CassandraHostRetryService: Not
 checking that localhost(127.0.0.1):9160 is a member of the ring since there
 are no live hosts
 13/08/09 15:23:15 WARN connection.CassandraHostRetryService: Downed
 localhost(127.0.0.1):9160 host still appears to be down: Unable to open
 transport to localhost(127.0.0.1):9160 , java.net.ConnectException:
 Connection refused


 (that was from an inject command) does anyone have any idea? should this go
 to the dev list?

 --
 Kaveh Minooie




-- 
*Lewis*


Re: need help with store.CassandraStore

2013-08-09 Thread kaveh minooie
thanks Lewis for guiding me to the right mailing list :), and to answer 
your question yes, it is running at the default port. the port is not an 
issue here, the IP address is.


--
Kaveh Minooie


Re: need help with store.CassandraStore

2013-08-09 Thread Lewis John Mcgibbney
I am assuming that you are regenerating your job file if this is in Nutch
distributed mode?
If not, and you're running this as a local Nutch server, then also please
check that there are no temp files
ls -al
gora.properties
gora.properties~
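
For the distributed case, the same check can be made against the job file itself. The following is only a sketch, assuming the job file is an ordinary zip/jar archive; `parse_properties` and `gora_entries` are hypothetical helper names, and the parser handles plain `key=value` lines only, not the full Java properties syntax.

```python
import zipfile


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # / ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def gora_entries(job_path):
    """Map each archive entry named gora.properties* to its parsed contents."""
    found = {}
    with zipfile.ZipFile(job_path) as job:
        for name in job.namelist():
            # Also catches editor backups such as gora.properties~
            if name.split("/")[-1].startswith("gora.properties"):
                text = job.read(name).decode("utf-8", "replace")
                found[name] = parse_properties(text)
    return found
```

Running `gora_entries` on the path of your generated job file should list every copy of gora.properties packed into it, together with the servers value each copy carries, so a stale or duplicate copy is easy to spot.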

The entry in gora.properties should be
gora.cassandrastore.servers=localhost:9160 (if running locally)
and the host in gora-cassandra-mapping.xml should reflect the host you use
here.
You can check that the host maps properly by looking in /etc/hosts
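
To compare the two files mechanically, something like this sketch may help. It assumes the mapping file is well-formed XML whose keyspace elements carry a host attribute, as in the fragment quoted below; the function names are hypothetical and nothing here calls into Gora.

```python
import xml.etree.ElementTree as ET


def mapping_hosts(mapping_xml):
    """Return the host attribute of every <keyspace> element in the mapping text."""
    root = ET.fromstring(mapping_xml)
    return [ks.get("host") for ks in root.iter("keyspace")]


def servers_host(servers_value):
    """Strip the port from a 'host:port' value such as 'cass-node:9160'."""
    return servers_value.rsplit(":", 1)[0]


def hosts_agree(servers_value, mapping_xml):
    """True when every keyspace host equals the host part of the servers value."""
    host = servers_host(servers_value)
    hosts = mapping_hosts(mapping_xml)
    return bool(hosts) and all(h == host for h in hosts)
```

Pass the value of gora.cassandrastore.servers and the text of gora-cassandra-mapping.xml; a False result means the two files name different hosts, or that the mapping has no keyspace element at all.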
hth
Lewis


On Fri, Aug 9, 2013 at 5:04 PM, kaveh minooie ka...@plutoz.com wrote:

 nope. same exact result and I tried 'cassandraStore' with both uppercase
 and lowercase ( they are case sensitive, right? )


 13/08/09 16:59:17 INFO crawl.InjectorJob: InjectorJob: starting at
 2013-08-09 16:59:17
 13/08/09 16:59:17 INFO crawl.InjectorJob: InjectorJob: Injecting urlDir:
 /2locos/temp/url
 13/08/09 16:59:19 INFO connection.CassandraHostRetryService: Downed
 Host Retry service started with queue size -1 and retry delay 10s
 13/08/09 16:59:19 ERROR connection.HConnectionManager: Could not start
 connection pool for host localhost(127.0.0.1):9160
 13/08/09 16:59:19 INFO connection.CassandraHostRetryService: Host
 detected as down was added to retry queue: localhost(127.0.0.1):9160
 13/08/09 16:59:19 WARN connection.CassandraHostRetryService: Downed
 localhost(127.0.0.1):9160 host still appears to be down: Unable to open
 transport to localhost(127.0.0.1):9160 , java.net.ConnectException:
 Connection refused
 13/08/09 16:59:19 INFO service.JmxMonitor: Registering JMX
 me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
 13/08/09 16:59:19 ERROR store.CassandraStore: All host pools marked down.
 Retry burden pushed out to client.
 13/08/09 16:59:19 ERROR store.CassandraStore:
 [Ljava.lang.StackTraceElement;@7a6b653f
 13/08/09 16:59:19 INFO crawl.InjectorJob: InjectorJob: Using class
 org.apache.gora.cassandra.store.CassandraStore as the Gora storage
 class.
 13/08/09 16:59:20 INFO input.FileInputFormat: Total input paths to process
 : 1
 13/08/09 16:59:20 INFO util.NativeCodeLoader: Loaded the native-hadoop
 library
 13/08/09 16:59:20 WARN snappy.LoadSnappy: Snappy native library not loaded
 13/08/09 16:59:21 INFO mapred.JobClient: Running job: job_201308091131_0009
 13/08/09 16:59:22 INFO mapred.JobClient:  map 0% reduce 0%
 13/08/09 16:59:29 INFO connection.CassandraHostRetryService: Not
 checking that localhost(127.0.0.1):9160 is a member of the ring since there
 are no live hosts
 13/08/09 16:59:29 WARN connection.CassandraHostRetryService: Downed
 localhost(127.0.0.1):9160 host still appears to be down: Unable to open
 transport to localhost(127.0.0.1):9160 , java.net.ConnectException:
 Connection refused
 13/08/09 16:59:29 INFO connection.CassandraHostRetryService: Downed
 Host retry status false with host: localhost(127.0.0.1):9160
 13/08/09 16:59:30 INFO mapred.JobClient:  map 100% reduce 0%
 13/08/09 16:59:31 INFO mapred.JobClient: Job complete:
 job_201308091131_0009
 13/08/09 16:59:31 INFO mapred.JobClient: Counters: 19



 On 08/09/2013 04:51 PM, Renato Marroquín Mogrovejo wrote:

 Could you please try with this one:

 gora.cassandraStore.host=cass-node:9160




 2013/8/9 kaveh minooie ka...@plutoz.com


 so it is not working:

 from gora.properties:

 #
 # CassandraStore properties #
 #

 gora.cassandrastore.servers=cass-node:9160

 #gora.cassandra.servers=cass-node:9160


 ###
 # MemStore properties #


 from gora-cassandra-mapping.xml:


 <keyspace name="host" cluster="DoslocosCluster" host="cass-node">
   <family name="mtdt" type="super"/>
   <family name="il" type="super"/>
   <family name="ol" type="super"/>
 </keyspace>


 and inject output:

 13/08/09 16:38:34 INFO connection.CassandraHostRetryService:
 Downed Host Retry service started with queue size -1 and retry delay
 10s
 13/08/09 16:38:34 ERROR connection.HConnectionManager: Could not
 start connection pool for host localhost(127.0.0.1):9160
 13/08/09 16:38:34 INFO connection.CassandraHostRetryService: Host
 detected as down was added to retry queue: localhost(127.0.0.1):9160
 13/08/09 16:38:34 WARN connection.CassandraHostRetryService:
 Downed localhost(127.0.0.1):9160 host still appears to be down:
 Unable to open transport to localhost(127.0.0.1):9160 ,
 java.net.ConnectException: Connection refused
 13/08/09 16:38:35 INFO service.JmxMonitor: Registering JMX
 me.prettyprint.cassandra.service_Test
 Cluster:ServiceType=hector,MonitorType=hector
 13/08/09 16:38:35 ERROR store.CassandraStore: All host pools marked
 down. Retry burden pushed out to client.
 13/08/09 16:38:35 ERROR store.CassandraStore:
 

Re: need help with store.CassandraStore

2013-08-09 Thread kaveh minooie
:) yes I do regenerate the job file. I actually have scripts that make
a fresh copy from git, apply my changes and run ant to generate the
job file every time I make a change. the cassandra cluster that I am
trying to use here consists of 10 servers, and the Hadoop cluster on
which I run nutch has 11 nodes. as you can imagine every
node, or role for that matter, is found through DNS and anything localhost
is kinda meaningless here. ( the only things in my /etc/hosts files are
localhost and 127...)


I would be surprised if you remember me, since I can see on the lists how
many emails you go through every day, but I was trying to do a similar
thing with hbase a couple of months ago. That turned out to be very
unstable under some load (over 50 million pages) and it mostly has to
do with the fact that gora does not support the new version of hbase,
which supposedly doesn't have this problem anymore. by the way, if
you could point me in the right direction I would like to start working on
updating hbase support for gora. I should say that you actually are the
reason that I am trying with cassandra this time, because at the time, I
remember, you said you were using Cassandra, so I figured at least I
know of one person who is successfully doing this :) , which
automatically means that I am gonna have better odds this time, since I
didn't know of anyone, and I still don't, who was actually using hbase
in production for this purpose. ( nutch load, at least as it is now and
as long as it does the filtering on its own, is very particular, don't
you agree? )


anyway as for this issue at hand, I am going back a bit in git commits,
on the very narrow chance that this is because of recently broken
wiring or whatever. if it starts working at some point it could let us
isolate the issue, but so far no luck. I am pretty sure I am doing
something stupid somewhere and it is going to be hell finding it :). so
I guess this would be a good time for me to thank and apologize in
advance to you and all the other people who spend time here for their
attention and the amount of spam that I am gonna be generating on the list.



