Can you also make sure that the cluster name and the fully qualified address and port agree between the mapping file and gora.properties?

Thanks
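A minimal sketch of that consistency check, in Python, in case it helps. It assumes the `cluster` and `host` attributes sit on the `<keyspace>` element of the mapping, as in the mapping quoted below. The property key names are passed in as parameters because the exact keys in gora.properties are not stated in this thread; the function names and any keys you pass are hypothetical, so use whatever your file actually contains.

```python
import xml.etree.ElementTree as ET


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def find_mismatches(mapping_xml, properties_text, cluster_key, host_key):
    """Compare each <keyspace> cluster/host attribute with the properties.

    cluster_key and host_key are the property names to look up; they are
    assumptions supplied by the caller, not keys confirmed by Gora.
    Returns a list of (keyspace, attribute, mapping_value, property_value).
    """
    props = parse_properties(properties_text)
    root = ET.fromstring(mapping_xml)
    problems = []
    for keyspace in root.iter("keyspace"):
        for attr, key in (("cluster", cluster_key), ("host", host_key)):
            in_mapping = keyspace.get(attr)
            in_props = props.get(key)
            if in_mapping != in_props:
                problems.append((keyspace.get("name"), attr, in_mapping, in_props))
    return problems
```

Call `find_mismatches` with the contents of both files and the two property keys; an empty list means the values agree exactly, including whitespace.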
On Tuesday, September 30, 2014, Renato Marroquín Mogrovejo <[email protected]> wrote:

> Hi Kartik,
>
> If TTL hasn't been set or if it has been set to 0, then Gora is not using
> any TTL [1] and all your data should be persisted without any problems.
> Maybe this has something to do with the URL generating/fetching process?
> Could you determine during which process the data is changing
> (generate/fetch/parse)?
> Thanks!
>
> Renato M.
>
> [1]
> https://github.com/apache/gora/blob/master/gora-cassandra/src/main/java/org/apache/gora/cassandra/store/HectorUtils.java#L72
>
> 2014-09-30 10:00 GMT+02:00 Krishnanand, Kartik <[email protected]>:
>
>> Hi, Talat
>>
>> I am afraid that I do not understand. We have set the "ttl" value to 0,
>> which is the default value. We don't have any portions of data that
>> need to be deleted. For now, I am using a single node cluster, so for us
>> the gc_grace_seconds="0" default value would be a valid value.
>>
>> Have I missed out anything? My settings are as follows. Any suggestions
>> would be greatly appreciated.
>>
>> <gora-orm>
>>
>>   <keyspace name="projectKeyspace" cluster="MultiTest"
>>             host="192.161.23.161:9160"
>>             placement_strategy="org.apache.cassandra.locator.NetworkTopologyStrategy">
>>     <family name="p"/>
>>     <family name="f"/>
>>     <family name="sc" type="super"/>
>>
>>     <family name="mtdt" type="super"/>
>>     <family name="il" type="super"/>
>>     <family name="ol" type="super"/>
>>   </keyspace>
>>
>>   <class keyClass="java.lang.String" name="org.apache.nutch.storage.WebPage"
>>          keyspace="projectKeyspace ">
>>
>>     <!-- fetch fields -->
>>     <field name="baseUrl" family="f" qualifier="bas"/>
>>     <field name="status" family="f" qualifier="st"/>
>>     <field name="prevFetchTime" family="f" qualifier="pts"/>
>>     <field name="fetchTime" family="f" qualifier="ts"/>
>>     <field name="fetchInterval" family="f" qualifier="fi"/>
>>     <field name="retriesSinceFetch" family="f" qualifier="rsf"/>
>>     <field name="reprUrl" family="f" qualifier="rpr"/>
>>     <field name="content" family="f" qualifier="cnt"/>
>>     <field name="contentType" family="f" qualifier="typ"/>
>>     <field name="modifiedTime" family="f" qualifier="mod"/>
>>     <field name="prevModifiedTime" family="f" qualifier="pmod"/>
>>     <field name="batchId" family="f" qualifier="bid"/>
>>
>>     <!-- parse fields -->
>>     <field name="title" family="p" qualifier="t"/>
>>     <field name="text" family="p" qualifier="c"/>
>>     <field name="signature" family="p" qualifier="sig"/>
>>     <field name="prevSignature" family="p" qualifier="psig"/>
>>
>>     <!-- score fields -->
>>     <field name="score" family="f" qualifier="s"/>
>>
>>     <!-- super columns -->
>>     <field name="headers" family="sc" qualifier="h"/>
>>     <field name="inlinks" family="sc" qualifier="il"/>
>>     <field name="outlinks" family="sc" qualifier="ol"/>
>>     <field name="metadata" family="sc" qualifier="mtdt"/>
>>     <field name="markers" family="sc" qualifier="mk"/>
>>     <field name="parseStatus" family="sc" qualifier="pas"/>
>>     <field name="protocolStatus" family="sc" qualifier="prs"/>
>>   </class>
>>
>>   <class keyClass="java.lang.String" name="org.apache.nutch.storage.Host"
>>          keyspace="projectKeyspace ">
>>     <field name="metadata" family="mtdt" qualifier="mtdt"/>
>>     <field name="inlinks" family="il" qualifier="il"/>
>>     <field name="outlinks" family="ol" qualifier="ol"/>
>>   </class>
>>
>> </gora-orm>
>>
>> Thanks,
>>
>> Kartik
>>
>> From: Talat Uyarer [mailto:[email protected]]
>> Sent: Thursday, September 25, 2014 5:04 PM
>> To: [email protected]
>> Cc: [email protected]
>> Subject: Re: Crawled data not inserting in the tables
>>
>> Hi Kartik,
>>
>> The 'problem' is with your mapping settings in
>> gora-cassandra-mapping.xml. Please see the documentation [0], specifically
>> relating to the values for 'gc_grace_seconds' and also 'ttl'. This will fix
>> the problem.
>>
>> Talat
>>
>> [0] http://gora.apache.org/current/gora-cassandra.html
>>
>> Hi, Gora gurus,
>>
>> I am trying to crawl URLs starting with 12 seed URLs. I am using the Gora
>> Cassandra mapping to store the crawled data.
>>
>> I can confirm that none of the 12 URLs are being filtered and that all are
>> injected, but after running the generate, fetch and parse jobs, there are
>> only 3 entries in "column family" f.
>>
>> I am not sure what I am doing wrong. The logs have not yielded anything
>> relevant. What should I be looking at?
>>
>> Any advice would be gratefully appreciated.
>>
>> Thanks,
>>
>> Kartik
>> ------------------------------
>> This message, and any attachments, is for the intended recipient(s) only,
>> may contain information that is privileged, confidential and/or proprietary
>> and subject to important terms and conditions available at
>> http://www.bankofamerica.com/emaildisclaimer. If you are not the
>> intended recipient, please delete this message.
>>
>

--
Lewis

