Update: after checking the Riak Java client and my bucket code, I noticed 
that I am doing the following:

val bucket = DB.client.createBucket(bucketName).enableForSearch().execute()

I have a feeling that “enableForSearch” is causing each object to be indexed 
in its entirety, instead of just the explicit fields. The Javadoc 
(http://basho.github.io/riak-java-client/1.0.5/com/basho/riak/client/bucket/WriteBucket.html) 
shows that search=true is set for the bucket.

Would that cause the entire object to be indexed, not just the explicit fields?
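To sanity-check the size math, here is a quick Scala sketch of the truncation Joe described, plus a rough pre-write guard. The 32 KB limit on the encoded {Index, Field, Term} tuple and the 32952-byte figure come from Joe’s email; the helper names and the byte-level approximation are mine:

```scala
// merge index stores the term size in a 15-bit field, so any encoded key
// larger than 32767 bytes silently keeps only its low 15 bits,
// i.e. (size mod 32768).
val MaxKeyBytes = 32767

// Size actually recorded for an oversized key.
def truncatedSize(actualSize: Int): Int = actualSize & 0x7fff

// The 32952-byte tuple from the log would be recorded as only 184 bytes,
// which is why binary_to_term later throws badarg on read.

// Rough pre-write guard (helper name is mine, not part of the Riak client);
// this only approximates the size of term_to_binary({Index, Field, Term}),
// ignoring Erlang term encoding overhead.
def searchKeyFits(index: String, field: String, term: String): Boolean = {
  val approxBytes = index.getBytes("UTF-8").length +
                    field.getBytes("UTF-8").length +
                    term.getBytes("UTF-8").length
  approxBytes <= MaxKeyBytes
}
```

With this, truncatedSize(32952) comes out to 184, and a 32897-character “data_followers” value clearly fails the guard, while short fields like userId pass.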



On Nov 24, 2013, at 3:02 PM, Justin Long <[email protected]> wrote:

> Interesting. As I mentioned previously, any objects in the “collector” 
> buckets all share the same structure. It is trying to index a field that is 
> actually inside a String: “data_followers” is a key inside the “data” map, 
> and the value for that key is escaped JSON (it later goes into offline 
> processing, FYI). And as you can see, there isn’t any indexing set for that 
> field.
> 
> 
> On Nov 24, 2013, at 2:56 PM, Joe Caswell <[email protected]> wrote:
> 
>> Justin,
>> 
>>   The binary in the log entry below equates to:
>> {<<"collector-collect-twitter">>,<<"data_followers">>,<<32897-byte string>>}
>> 
>>   Hope this helps.
>> 
>> Joe
>> From: Justin Long <[email protected]>
>> Date: Sunday, November 24, 2013 5:17 PM
>> To: Joe Caswell <[email protected]>
>> Cc: Richard Shaw <[email protected]>, riak-users <[email protected]>
>> Subject: Re: Runaway "Failed to compact" errors
>> 
>> Thanks Joe. I would agree that is probably the problem. I am concerned, 
>> though, since none of the fields of the objects I am storing in Riak would 
>> produce a key larger than 32 KB. Here’s a sample Scala case class (used via 
>> the Riak Java client) that represents an object in the problem bucket:
>> 
>> case class InstagramCache(
>>   @(JsonProperty@field)("identityId")
>>   @(RiakKey@field)
>>   val identityId: String, // ID of user on social network
>>   
>>   @(JsonProperty@field)("userId")
>>   @(RiakIndex@field)(name = "userId")
>>   val userId: String, // associated user ID on platform
>>   
>>   @(JsonProperty@field)("data")
>>   val data: Map[String, Option[String]],
>>   
>>   @(JsonProperty@field)("updated")
>>   var updated: Date
>>   
>> )
>> 
>> The fields identityId and userId would rarely exceed 30 characters. Is Riak 
>> trying to index the whole object?
>> 
>> Thanks
>> 
>> 
>> 
>> On Nov 24, 2013, at 2:11 PM, Joe Caswell <[email protected]> wrote:
>> 
>>> Justin,
>>> 
>>> The terms being stored in merge index are too large. The maximum size for 
>>> an {Index, Field, Term} key is 32k bytes.
>>> The binary blob in your log entry represents a tuple that was 32952 bytes.  
>>> Since merge index uses a 15-bit integer to store the term size, if the 
>>> term_to_binary of the given key is larger than 32767 bytes, the high bits 
>>> are lost, effectively storing (<large size> mod 32768) bytes.
>>> When this data is read back, binary_to_term is unable to reconstruct the 
>>> key due to the missing bytes, and throws a badarg exception.
>>> 
>>> Search index repair is documented here: 
>>> http://docs.basho.com/riak/1.4.0/cookbooks/Repairing-Search-Indexes/
>>> However, you would first need to modify your extractor so it does not 
>>> produce search keys larger than 32 KB, or the corruption will recur.
>>> 
>>> Joe Caswell
>>> 
>>> 
>>> From: Richard Shaw <[email protected]>
>>> Date: Sunday, November 24, 2013 4:25 PM
>>> To: Justin Long <[email protected]>
>>> Cc: riak-users <[email protected]>
>>> Subject: Re: Runaway "Failed to compact" errors
> 

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
