[jira] [Updated] (CASSANDRA-3136) Allow CFIF to keep going despite unavailable ranges

Mck SembWever (JIRA) Mon, 05 Sep 2011 00:45:50 -0700

     [ https://issues.apache.org/jira/browse/CASSANDRA-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Mck SembWever updated CASSANDRA-3136:
-------------------------------------

    Description: 
From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902

<use-case-1 from="Patrik Modesto">
We use Cassandra as storage for web pages: we store the HTML, all
URLs that have the same HTML data, and some computed data. We run Hadoop
MR jobs to compute lexical and thematic data for each page and to
export the data to binary files for later use. A URL gets into
Cassandra on user request (a pageview), so if we delete a URL, it comes
back quickly if the page is active. Because of that, and because there
is a lot of data, we have the keyspace set to RF=1. We can drop the
whole keyspace and it will regenerate quickly and contain only
fresh data, so we don't care about losing a node.
</use-case-1>

<use-case-2>
Trying to extract a small random sample (like a Pig SAMPLE) of data out of
Cassandra.
</use-case-2>

<use-case-3>
Searching for something or some pattern, where one hit
is enough. If you get the hit, it's a positive result regardless of whether
ranges were ignored; if you don't, and you *know* a range was
ignored along the way, you can re-run the job later.
For example, such a job could be run at regular intervals during the day until a hit
was found.
</use-case-3>

  was:
From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902

<use-case-1>
We use Cassandra as storage for web pages: we store the HTML, all
URLs that have the same HTML data, and some computed data. We run Hadoop
MR jobs to compute lexical and thematic data for each page and to
export the data to binary files for later use. A URL gets into
Cassandra on user request (a pageview), so if we delete a URL, it comes
back quickly if the page is active. Because of that, and because there
is a lot of data, we have the keyspace set to RF=1. We can drop the
whole keyspace and it will regenerate quickly and contain only
fresh data, so we don't care about losing a node.
</use-case-1>

<use-case-2>
Trying to extract a small random sample (like a Pig SAMPLE) of data out of
Cassandra.
</use-case-2>

<use-case-3>
Searching for something or some pattern, where one hit
is enough. If you get the hit, it's a positive result regardless of whether
ranges were ignored; if you don't, and you *know* a range was
ignored along the way, you can re-run the job later.
For example, such a job could be run at regular intervals during the day until a hit
was found.
</use-case-3>


> Allow CFIF to keep going despite unavailable ranges
> ---------------------------------------------------
>
>                 Key: CASSANDRA-3136
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3136
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Mck SembWever
>            Priority: Minor
>
> From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902
> <use-case-1 from="Patrik Modesto">
> We use Cassandra as storage for web pages: we store the HTML, all
> URLs that have the same HTML data, and some computed data. We run Hadoop
> MR jobs to compute lexical and thematic data for each page and to
> export the data to binary files for later use. A URL gets into
> Cassandra on user request (a pageview), so if we delete a URL, it comes
> back quickly if the page is active. Because of that, and because there
> is a lot of data, we have the keyspace set to RF=1. We can drop the
> whole keyspace and it will regenerate quickly and contain only
> fresh data, so we don't care about losing a node.
> </use-case-1>
> <use-case-2>
> Trying to extract a small random sample (like a Pig SAMPLE) of data out of
> Cassandra.
> </use-case-2>
> <use-case-3>
> Searching for something or some pattern, where one hit
> is enough. If you get the hit, it's a positive result regardless of whether
> ranges were ignored; if you don't, and you *know* a range was
> ignored along the way, you can re-run the job later.
> For example, such a job could be run at regular intervals during the day until a
> hit was found.
> </use-case-3>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
