The overhead would be in parsing. But you could skip all that if you prepended
constant-length data to your text. Something like:

field:value field:value text

Where each field name and value has a constant length.

Maybe something like guid:100, where that GUID is a key you know stands for
the file size.
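A minimal Python sketch of the fixed-length prefix idea. The details are assumptions for illustration: two metadata fields (size and date), hypothetical GUID keys in their 32-character hex form, a size zero-padded to 12 digits, and a date stored as YYYYMMDD. Because every key and value has a constant width, the header length is a constant, so the original text and each value are recovered by slicing at known offsets instead of parsing.

```python
import uuid

# Hypothetical metadata keys: one GUID per field, generated once and kept by the
# application. A UUID's .hex form is always 32 characters, so every key has the
# same length and is unlikely to occur in ordinary text.
SIZE_KEY = uuid.UUID("6f1c2a9e-0b7d-4c3e-9a51-2d8e4f6b7c10").hex
DATE_KEY = uuid.UUID("b3d54e21-7a0f-4f9c-8e62-1c5a9d3b0e47").hex

KEY_LEN = 32      # length of a UUID in .hex form
SIZE_WIDTH = 12   # size is zero-padded to 12 digits (assumed limit)
DATE_WIDTH = 8    # date is stored as YYYYMMDD (assumed format)

# Fixed offsets of each value inside "<key>:<size> <key>:<date> "
SIZE_START = KEY_LEN + 1
SIZE_END = SIZE_START + SIZE_WIDTH
DATE_START = SIZE_END + 1 + KEY_LEN + 1
DATE_END = DATE_START + DATE_WIDTH
HEADER_LEN = DATE_END + 1  # includes the trailing space before the text


def encode(text: str, size: int, date: str) -> str:
    """Prepend the fixed-width metadata header to the raw text."""
    if not 0 <= size < 10 ** SIZE_WIDTH:
        raise ValueError("size does not fit in the fixed-width field")
    if len(date) != DATE_WIDTH or not date.isdigit():
        raise ValueError("date must be YYYYMMDD")
    header = f"{SIZE_KEY}:{size:0{SIZE_WIDTH}d} {DATE_KEY}:{date} "
    if len(header) != HEADER_LEN:  # sanity check on the layout constants
        raise AssertionError("header layout does not match HEADER_LEN")
    return header + text


def decode(doc: str) -> tuple[int, str, str]:
    """Recover size, date and text by slicing at known offsets (no parsing)."""
    size = int(doc[SIZE_START:SIZE_END])
    date = doc[DATE_START:DATE_END]
    return size, date, doc[HEADER_LEN:]


if __name__ == "__main__":
    body = "hello riak"
    doc = encode(body, size=len(body.encode("utf-8")), date="20120721")
    print(decode(doc))  # (10, '20120721', 'hello riak')
```

The header is pure ASCII, so the same offsets also apply to the raw UTF-8 bytes of the stored value, and a reader can skip the header without touching the rest of the file. This sketch does not show how Riak Search's standard analyzer tokenizes the `key:value` pairs. You would need to check that against your Riak version before relying on queries like value:<word> AND value:<key-token>.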


@siculars
http://siculars.posterous.com

Sent from my iRotaryPhone

On Jul 21, 2012, at 2:16, Metin Akat <[email protected]> wrote:

> I was thinking about this too, but as I said, these text files are sometimes
> quite big: sometimes megabytes, rarely tens of megabytes. They are all
> "write once, read quite a lot". So storing them as JSON is probably going to
> put quite a lot of load on Riak and on my application (deserializing a big
> chunk of JSON on every read). Of course, I might be wrong; I'll probably have
> to benchmark it, but I don't feel very comfortable about it. Besides
> potentially being a performance issue, it also feels quite ugly to me. Have
> you done this? How big were the files? How was the performance?
> 
> On Sat, Jul 21, 2012 at 7:52 AM, Alexander Sicular <[email protected]> wrote:
> Turn your text into a JSON object, maybe something like this:
> 
> {
>   "size": 100,
>   "name": "bla",
>   "date": "1/1/2012",
>   "raw_txt": "txt"
> }
> 
> On Jul 20, 2012, at 17:49, Metin Akat <[email protected]> wrote:
> 
> > Hi,
> >
> > I am using Riak to store (relatively large) text files. I store them as
> > normal Riak objects where the value is the text of the file. Now I want to
> > index and search them. That part works: I just enabled the "standard" search
> > pre-commit hook for that bucket and the files get indexed nicely. But there
> > is one tricky requirement: I need to be able to index and search some
> > metadata about these files, for example the date of submission, the file
> > size, the file type (internal business logic), etc.
> >
> > I have been thinking about this quite a lot recently and asked several
> > times on #riak. One answer suggested creating a second "metadata" Riak
> > object for each file, linking it to the "file object" and indexing it
> > separately. That's not really what I want, because I need to be able to
> > execute "combined" queries, like value:<some word> AND date:<some date>.
> >
> > So, here is the ideal solution I'm thinking about: it would be great if it
> > were possible to modify the Riak Search index object. After the file is
> > submitted and indexed, I could just fetch the index entry and add some more
> > fields to it.
> > I see there is a bucket, automatically created by Riak Search, that holds
> > the search index objects, so I guess it is indeed possible, though I don't
> > know what to expect. Is it a good idea? If not, what else could I do to
> > solve the problem?
> >
> > Regards,
> > Metin
> > _______________________________________________
> > riak-users mailing list
> > [email protected]
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 