Re: need help with store.CassandraStore
Great catch, Kaveh! I think we should make the port explicit so others don't run into the same problem you did, and so it becomes more intuitive to set the port used by the servers. Thanks, Kaveh! I have already added it!

Renato M.
https://issues.apache.org/jira/browse/GORA-269

2013/8/12 kaveh minooie <ka...@plutoz.com>

So it turned out the contents of gora.properties must really, really match the contents of gora-cassandra-mapping.xml :) So if gora.properties contains a line like this:

gora.cassandrastore.servers=myserver:9160

you have to use 'myserver:9160' in gora-cassandra-mapping.xml as well, and NOT 'myserver'. If you think, well, 9160 is the default port so it wouldn't matter, you would be wrong. The string value of gora.cassandrastore.servers and the host attribute of BOTH keyspace tags in gora-cassandra-mapping.xml should match as strings: either use the port in both or don't use it in either one. Otherwise it just tries to connect to localhost:9160 regardless of anything else.

On 08/09/2013 08:13 PM, Lewis John Mcgibbney wrote:

Hi Kaveh,

On Fri, Aug 9, 2013 at 7:54 PM, kaveh minooie <ka...@plutoz.com> wrote:

:) Yes, I do regenerate the job file. I actually have scripts that make a fresh copy of git, apply my changes and run ant to generate the job file every time I make a change. The Cassandra cluster that I am trying to use here consists of 10 servers, and the Hadoop cluster on which I run Nutch has 11 nodes as well. As you can imagine, every node (or role, for that matter) is found through DNS, and anything localhost is kind of meaningless here. (The only things in my /etc/hosts file are localhost and 127...)
OK, so in o.a.g.store.CassandraClient#initialize(), to define our Cassandra cluster we are using Hector's CassandraHostConfigurator as follows:

    this.cluster = HFactory.getOrCreateCluster(this.cassandraMapping.getClusterName(),
        new CassandraHostConfigurator(this.cassandraMapping.getHostName()));

this.cassandraMapping.getHostName() is the host= attribute value which we pick up from the MAPPING_FILE. According to the Hector client's CassandraHostConfigurator Javadoc, Cassandra host specifications should be of the form:

    /**
     * Creates a new {@code CassandraHostConfigurator} from the specified hosts String, formatted as
     * {@code host[:port][,host[:port]...]}.
     * @param hosts The hosts to create {@link CassandraHost}s from.
     */

I would be surprised if you remember me

Of course I do ;) Don't worry, we will get this sorted. Keep the questions coming; we'll try our best to get the answers, Kaveh.

Ta
Lewis

--
Kaveh Minooie
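The host format quoted from the Hector Javadoc, together with Kaveh's finding that the two settings must match as strings, can be illustrated with a small stand-alone sketch. `HostSpec`, `parse`, `sameEndpoints` and `sameString` are hypothetical names for illustration only; they are not part of Gora or Hector. The 9160 default simply mirrors the default port discussed in this thread, and the parser assumes host names or IPv4 addresses (not bare IPv6 literals).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not Gora/Hector API): parses a host list of the form
 * host[:port][,host[:port]...] and fills in 9160 when the port is omitted.
 */
final class HostSpec {
    static final int DEFAULT_PORT = 9160;

    final String host;
    final int port;

    HostSpec(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Splits on commas, then on the last ':'; assumes no bare IPv6 literals. */
    static List<HostSpec> parse(String spec) {
        List<HostSpec> result = new ArrayList<>();
        for (String part : spec.split(",")) {
            String entry = part.trim();
            if (entry.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = entry.lastIndexOf(':');
            if (colon < 0) {
                result.add(new HostSpec(entry, DEFAULT_PORT));
            } else {
                // Integer.parseInt throws NumberFormatException on a malformed port
                result.add(new HostSpec(entry.substring(0, colon),
                        Integer.parseInt(entry.substring(colon + 1))));
            }
        }
        return result;
    }

    /** Endpoint comparison: "myserver" and "myserver:9160" name the same endpoint. */
    static boolean sameEndpoints(String a, String b) {
        return describe(parse(a)).equals(describe(parse(b)));
    }

    /** Raw string comparison: the stricter check Kaveh found was required. */
    static boolean sameString(String a, String b) {
        return a.trim().equals(b.trim());
    }

    private static List<String> describe(List<HostSpec> specs) {
        List<String> out = new ArrayList<>();
        for (HostSpec s : specs) {
            out.add(s.host + ":" + s.port);
        }
        return out;
    }

    public static void main(String[] args) {
        String fromProperties = "myserver:9160"; // value of gora.cassandrastore.servers
        String fromMapping = "myserver";         // host= attribute on <keyspace>
        System.out.println("same endpoints: " + sameEndpoints(fromProperties, fromMapping));
        System.out.println("same string:    " + sameString(fromProperties, fromMapping));
    }
}
```

Run as-is, `main` prints `same endpoints: true` followed by `same string:    false`. Per Kaveh's report, it is the second, stricter comparison that had to hold between gora.cassandrastore.servers and the keyspace host= attribute before the connection stopped falling back to localhost:9160.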
Re: need help with store.CassandraStore
Hi Kaveh,

N.B. Taking this to user@gora; after this mail, please drop user@nutch.

Quick question: is your Cassandra server up and running on the default port 9160?

On Fri, Aug 9, 2013 at 3:36 PM, kaveh minooie <ka...@plutoz.com> wrote:

Hi Everyone,

So I don't know if I am doing something wrong or there is actually something wrong, but this is the issue. BTW, I am using this commit of 2.x:

commit d4deef989ffc41b9dd5e77683e73286d81e1178b
Author: Sebastian Nagel <sna...@apache.org>
Date: Wed Aug 7 21:10:17 2013 +
    NUTCH-911 protocol-file to return proper protocol status for notmodified, gone, access_denied
    git-svn-id: https://svn.apache.org/repos/asf/nutch/branches/2.x@1511496 13f79535-47bb-0310-9956-ffa450edef68

So my problem is that Gora doesn't seem to be able to understand where my Cassandra cluster is. The gora.properties file has this line in it:

gora.cassandrastore.servers=my-server:9160

The Gora website for Cassandra (http://gora.apache.org/current/gora-cassandra.html) mentions this:

gora.cassandra.servers=my-server:9160

But my problem here is that neither one of them works. I even tried putting them in the nutch-site.xml file with no result. Gora still tries to connect to localhost:

13/08/09 15:23:15 INFO connection.CassandraHostRetryService: Not checking that localhost(127.0.0.1):9160 is a member of the ring since there are no live hosts
13/08/09 15:23:15 WARN connection.CassandraHostRetryService: Downed localhost(127.0.0.1):9160 host still appears to be down: Unable to open transport to localhost(127.0.0.1):9160 , java.net.ConnectException: Connection refused

(That was from an inject command.) Does anyone have any idea? Should this go to the dev list?

--
Kaveh Minooie

--
Lewis
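Because the question here is which server key is actually in effect (Kaveh's gora.properties used gora.cassandrastore.servers, while the Gora page showed gora.cassandra.servers), one way to see what a given properties file really sets is to load it with java.util.Properties, which skips lines starting with #. This is a hypothetical diagnostic: `GoraServersCheck` and `keysSet` are not part of Gora or Nutch, and the sketch only inspects the file; it does not reproduce how Gora itself reads its configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical diagnostic (not Gora API): report which of the two server keys
 * discussed in this thread is set, uncommented, in a gora.properties-style file.
 */
final class GoraServersCheck {
    static final String[] CANDIDATE_KEYS = {
        "gora.cassandrastore.servers", // key used in Kaveh's gora.properties
        "gora.cassandra.servers"       // key shown on the Gora Cassandra page
    };

    /** Returns "key=value" for every candidate key with a non-blank value. */
    static List<String> keysSet(Properties props) {
        List<String> found = new ArrayList<>();
        for (String key : CANDIDATE_KEYS) {
            String value = props.getProperty(key);
            if (value != null && !value.trim().isEmpty()) {
                found.add(key + "=" + value.trim());
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Inline sample; in practice, load the gora.properties that ends up in the job file.
        String sample = "# CassandraStore properties\n"
                + "gora.cassandrastore.servers=my-server:9160\n"
                + "#gora.cassandra.servers=my-server:9160\n";
        Properties props = new Properties();
        props.load(new StringReader(sample)); // '#' lines are treated as comments
        System.out.println(keysSet(props));
    }
}
```

An empty list would mean both lines are commented out (or blank) in the file that was checked.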
Re: need help with store.CassandraStore
Thanks, Lewis, for guiding me to the right mailing list :) And to answer your question: yes, it is running on the default port. The port is not the issue here; the IP address is.

--
Kaveh Minooie
Re: need help with store.CassandraStore
I am assuming that you are regenerating your job file if this is in Nutch distributed mode? If not, and you're running this as a local Nutch server, then please also check that there are no temp files:

ls -al gora.properties gora.properties~

The entry in gora.properties should be

gora.cassandrastore.servers=localhost:9160

(if running locally), and the host in gora-cassandra-mapping.xml should reflect the host you use here. You can check that the host maps properly by looking into /etc/hosts.

hth
Lewis

On Fri, Aug 9, 2013 at 5:04 PM, kaveh minooie <ka...@plutoz.com> wrote:

Nope, same exact result, and I tried 'cassandraStore' with both uppercase and lowercase (they are case sensitive, right?)

13/08/09 16:59:17 INFO crawl.InjectorJob: InjectorJob: starting at 2013-08-09 16:59:17
13/08/09 16:59:17 INFO crawl.InjectorJob: InjectorJob: Injecting urlDir: /2locos/temp/url
13/08/09 16:59:19 INFO connection.CassandraHostRetryService: Downed Host Retry service started with queue size -1 and retry delay 10s
13/08/09 16:59:19 ERROR connection.HConnectionManager: Could not start connection pool for host localhost(127.0.0.1):9160
13/08/09 16:59:19 INFO connection.CassandraHostRetryService: Host detected as down was added to retry queue: localhost(127.0.0.1):9160
13/08/09 16:59:19 WARN connection.CassandraHostRetryService: Downed localhost(127.0.0.1):9160 host still appears to be down: Unable to open transport to localhost(127.0.0.1):9160 , java.net.ConnectException: Connection refused
13/08/09 16:59:19 INFO service.JmxMonitor: Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
13/08/09 16:59:19 ERROR store.CassandraStore: All host pools marked down. Retry burden pushed out to client.
13/08/09 16:59:19 ERROR store.CassandraStore: [Ljava.lang.StackTraceElement;@7a6b653f
13/08/09 16:59:19 INFO crawl.InjectorJob: InjectorJob: Using class org.apache.gora.cassandra.store.CassandraStore as the Gora storage class.
13/08/09 16:59:20 INFO input.FileInputFormat: Total input paths to process : 1
13/08/09 16:59:20 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/08/09 16:59:20 WARN snappy.LoadSnappy: Snappy native library not loaded
13/08/09 16:59:21 INFO mapred.JobClient: Running job: job_201308091131_0009
13/08/09 16:59:22 INFO mapred.JobClient: map 0% reduce 0%
13/08/09 16:59:29 INFO connection.CassandraHostRetryService: Not checking that localhost(127.0.0.1):9160 is a member of the ring since there are no live hosts
13/08/09 16:59:29 WARN connection.CassandraHostRetryService: Downed localhost(127.0.0.1):9160 host still appears to be down: Unable to open transport to localhost(127.0.0.1):9160 , java.net.ConnectException: Connection refused
13/08/09 16:59:29 INFO connection.CassandraHostRetryService: Downed Host retry status false with host: localhost(127.0.0.1):9160
13/08/09 16:59:30 INFO mapred.JobClient: map 100% reduce 0%
13/08/09 16:59:31 INFO mapred.JobClient: Job complete: job_201308091131_0009
13/08/09 16:59:31 INFO mapred.JobClient: Counters: 19

On 08/09/2013 04:51 PM, Renato Marroquín Mogrovejo wrote:

Could you please try with this one:

gora.cassandraStore.host=cass-node:9160

2013/8/9 kaveh minooie <ka...@plutoz.com>

So it is not working. From gora.properties:

# # CassandraStore properties # # gora.cassandrastore.servers=cass-node:9160 #gora.cassandra.servers=cass-node:9160 ### # MemStore properties #

From gora-cassandra-mapping.xml:

    <keyspace name="host" cluster="DoslocosCluster" host="cass-node">
      <family name="mtdt" type="super"/>
      <family name="il" type="super"/>
      <family name="ol" type="super"/>
    </keyspace>

And the inject output:

13/08/09 16:38:34 INFO connection.CassandraHostRetryService: Downed Host Retry service started with queue size -1 and retry delay 10s
13/08/09 16:38:34 ERROR connection.HConnectionManager: Could not start connection pool for host localhost(127.0.0.1):9160
13/08/09 16:38:34 INFO connection.CassandraHostRetryService: Host detected as down was added to retry queue: localhost(127.0.0.1):9160
13/08/09 16:38:34 WARN connection.CassandraHostRetryService: Downed localhost(127.0.0.1):9160 host still appears to be down: Unable to open transport to localhost(127.0.0.1):9160 , java.net.ConnectException: Connection refused
13/08/09 16:38:35 INFO service.JmxMonitor: Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
13/08/09 16:38:35 ERROR store.CassandraStore: All host pools marked down. Retry burden pushed out to client.
13/08/09 16:38:35 ERROR store.CassandraStore:
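Lewis's suggestion to check /etc/hosts, and Kaveh's note that every node is found through DNS, can both be checked from the JVM with java.net.InetAddress, which resolves names through the operating system's resolver (typically /etc/hosts and then DNS). The sketch below is hypothetical: `HostResolutionCheck` is not part of Gora or Nutch. It strips an optional :port suffix, assumes host names or IPv4 addresses rather than bare IPv6 literals, and flags loopback results, since a loopback address was the symptom in the logs above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/**
 * Hypothetical check (not Gora/Nutch API): resolve a host from gora.properties
 * or the mapping file and report whether it lands on a loopback address.
 */
final class HostResolutionCheck {
    static String check(String hostSpec) {
        String host = hostSpec.trim();
        int colon = host.lastIndexOf(':');
        if (colon >= 0) {
            host = host.substring(0, colon); // drop an optional :port (IPv4/host names only)
        }
        try {
            InetAddress addr = InetAddress.getByName(host); // system resolver
            if (addr.isLoopbackAddress()) {
                return host + " -> " + addr.getHostAddress()
                        + " (loopback: only valid when Cassandra runs locally)";
            }
            return host + " -> " + addr.getHostAddress();
        } catch (UnknownHostException e) {
            return host + " -> unresolved (" + e.getMessage() + ")";
        }
    }

    public static void main(String[] args) {
        String[] hosts = args.length > 0 ? args : new String[] {"localhost:9160"};
        for (String h : hosts) {
            System.out.println(check(h));
        }
    }
}
```

Since resolution is per machine, running it on the nodes that execute the Nutch tasks (for example with `cass-node:9160` as the argument) says more about the distributed job than running it on the submitting host.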
Re: need help with store.CassandraStore
:) Yes, I do regenerate the job file. I actually have scripts that make a fresh copy of git, apply my changes and run ant to generate the job file every time I make a change. The Cassandra cluster that I am trying to use here consists of 10 servers, and the Hadoop cluster on which I run Nutch has 11 nodes as well. As you can imagine, every node (or role, for that matter) is found through DNS, and anything localhost is kind of meaningless here. (The only things in my /etc/hosts file are localhost and 127...)

I would be surprised if you remember me, since I can see on the lists how many emails you go through every day, but I was trying to do a similar thing with HBase a couple of months ago. That turned out to be very unstable under some load (over 50 million pages), and it mostly has to do with the fact that Gora does not support the new version of HBase, which supposedly doesn't have this problem anymore. By the way, if you could point me in the right direction, I'd like to start working on updating HBase support for Gora. I should say that you are actually the reason I am trying Cassandra this time: at the time, I remember, you said you were using Cassandra, so I figured at least I know of one person who is successfully doing this :) That automatically means I am going to have better odds this time, since I didn't know of anyone, and I still don't, who was actually using HBase in production for this purpose. (The Nutch load, at least as it is now and as long as it does the filtering on its own, is very particular, don't you agree?)

Anyway, as for the issue at hand, I am going back a bit in the git commits, on the very narrow chance that this is because of some recently broken wiring or whatever. If it starts working at some point, that could let us isolate the issue, but so far no luck. I am pretty sure I am doing something stupid somewhere and it is going to be hell finding it :)
So I guess this would be a good time for me to thank, and apologize in advance to, you and all the other people who spend time here, for your attention and for the amount of spam I am going to be generating on the list.