Hi Kartik,

If the TTL hasn't been set, or if it has been set to 0, then Gora does not
use any TTL [1] and all your data should be persisted without any problems.
Maybe this has something to do with the URL generating/fetching process?
Could you determine during which step (generate, fetch, or parse) the data
is changing?
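
For reference, here is a rough sketch of where those two settings would sit
in gora-cassandra-mapping.xml. It assumes the attribute names described in
the Gora Cassandra documentation (gc_grace_seconds on <family>, ttl on
<field>). The keyspace, family, and field are taken from your mapping, and
the values are only illustrative, so please check them against your Gora
version.

```xml
<!-- Sketch only: attribute placement is assumed from the Gora Cassandra docs -->
<gora-orm>
    <keyspace name="projectKeyspace" cluster="MultiTest" host="192.161.23.161:9160">
        <!-- gc_grace_seconds is set per column family -->
        <family name="f" gc_grace_seconds="0"/>
    </keyspace>

    <class keyClass="java.lang.String" name="org.apache.nutch.storage.WebPage"
           keyspace="projectKeyspace">
        <!-- ttl is set per field; 0 (or leaving it unset) means no TTL is applied -->
        <field name="baseUrl" family="f" qualifier="bas" ttl="0"/>
    </class>
</gora-orm>
```
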
Thanks!


Renato M.

[1] https://github.com/apache/gora/blob/master/gora-cassandra/src/main/java/org/apache/gora/cassandra/store/HectorUtils.java#L72

2014-09-30 10:00 GMT+02:00 Krishnanand, Kartik <[email protected]>:

> Hi Talat,
>
> I am afraid that I do not understand. We have set the "ttl" value to 0,
> which is the default value. We don't have any portions of data that need
> to be deleted. For now, I am using a single-node cluster, so the default
> gc_grace_seconds="0" should be a valid value for us.
>
> Have I missed anything? My settings are as follows. Any suggestions
> would be greatly appreciated.
>
> <gora-orm>
>
>     <keyspace name="projectKeyspace" cluster="MultiTest"
>               host="192.161.23.161:9160"
>               placement_strategy="org.apache.cassandra.locator.NetworkTopologyStrategy">
>         <family name="p"/>
>         <family name="f"/>
>         <family name="sc" type="super"/>
>
>         <family name="mtdt" type="super"/>
>         <family name="il" type="super"/>
>         <family name="ol" type="super"/>
>     </keyspace>
>
>     <class keyClass="java.lang.String"
>            name="org.apache.nutch.storage.WebPage" keyspace="projectKeyspace ">
>
>         <!-- fetch fields -->
>         <field name="baseUrl" family="f" qualifier="bas"/>
>         <field name="status" family="f" qualifier="st"/>
>         <field name="prevFetchTime" family="f" qualifier="pts"/>
>         <field name="fetchTime" family="f" qualifier="ts"/>
>         <field name="fetchInterval" family="f" qualifier="fi"/>
>         <field name="retriesSinceFetch" family="f" qualifier="rsf"/>
>         <field name="reprUrl" family="f" qualifier="rpr"/>
>         <field name="content" family="f" qualifier="cnt"/>
>         <field name="contentType" family="f" qualifier="typ"/>
>         <field name="modifiedTime" family="f" qualifier="mod"/>
>         <field name="prevModifiedTime" family="f" qualifier="pmod"/>
>         <field name="batchId" family="f" qualifier="bid"/>
>
>         <!-- parse fields -->
>         <field name="title" family="p" qualifier="t"/>
>         <field name="text" family="p" qualifier="c"/>
>         <field name="signature" family="p" qualifier="sig"/>
>         <field name="prevSignature" family="p" qualifier="psig"/>
>
>         <!-- score fields -->
>         <field name="score" family="f" qualifier="s"/>
>
>         <!-- super columns -->
>         <field name="headers" family="sc" qualifier="h"/>
>         <field name="inlinks" family="sc" qualifier="il"/>
>         <field name="outlinks" family="sc" qualifier="ol"/>
>         <field name="metadata" family="sc" qualifier="mtdt"/>
>         <field name="markers" family="sc" qualifier="mk"/>
>         <field name="parseStatus" family="sc" qualifier="pas"/>
>         <field name="protocolStatus" family="sc" qualifier="prs"/>
>     </class>
>
>     <class keyClass="java.lang.String"
>            name="org.apache.nutch.storage.Host" keyspace="projectKeyspace ">
>         <field name="metadata" family="mtdt" qualifier="mtdt"/>
>         <field name="inlinks" family="il" qualifier="il"/>
>         <field name="outlinks" family="ol" qualifier="ol"/>
>     </class>
>
> </gora-orm>
>
> Thanks,
>
> Kartik
>
> From: Talat Uyarer [mailto:[email protected]]
> Sent: Thursday, September 25, 2014 5:04 PM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: Crawled data not inserting in the tables
>
> Hi Kartik,
>
> The 'problem' is with your mapping settings in gora-cassandra-mapping.xml.
> Please see the documentation [0], specifically relating to the values for
> 'gc_grace_seconds' and also 'ttl'. This will fix the problem.
>
> Talat
>
> [0] http://gora.apache.org/current/gora-cassandra.html
>
> Hi, Gora gurus,
>
> I am trying to crawl URLs starting with 12 seed URLs. I am using the Gora
> Cassandra mapping to store the crawled data.
>
> I can confirm that all 12 URLs are injected and none are filtered out,
> but after running the generate, fetch, and parse jobs, there are only 3
> entries in column family "f".
>
> I am not sure what I am doing wrong. The logs have not yielded anything
> relevant. What should I be looking at?
>
> Any advice would be greatly appreciated.
>
> Thanks,
>
> Kartik
>  ------------------------------
>
> This message, and any attachments, is for the intended recipient(s) only,
> may contain information that is privileged, confidential and/or proprietary
> and subject to important terms and conditions available at
> http://www.bankofamerica.com/emaildisclaimer. If you are not the intended
> recipient, please delete this message.
>   ------------------------------
>
