Re: [orientdb] Re: RecordID generation in distributed mode

Keith Freeman Fri, 19 Dec 2014 07:51:12 -0800

FWIW, we use multiple clusters for all of our classes and simply treat the 
entire RID (cluster:position) as the ID of each record. Since we store many 
millions of records every day for some classes, we create new clusters 
named after the date and run queries directly against specific clusters. 
Managing the clusters ourselves this way greatly improves query performance, 
at the cost of having to run multiple queries to get data spanning several 
days.
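A minimal sketch of the per-day cluster approach, assuming a hypothetical naming convention `<class>_<YYYYMMDD>` (the actual cluster names used are not given). It only builds the cluster names and one query per day using OrientDB SQL's `cluster:<name>` target. Connecting to a server and creating the clusters are left out.

```python
from datetime import date, timedelta


def daily_cluster_name(class_name: str, day: date) -> str:
    # Hypothetical convention: one cluster per class per day, e.g. "event_20141219".
    return f"{class_name.lower()}_{day:%Y%m%d}"


def queries_for_range(class_name: str, start: date, end: date) -> list:
    """Return one query per daily cluster from start to end, inclusive."""
    queries = []
    day = start
    while day <= end:
        # "cluster:<name>" targets a single cluster instead of the whole class.
        queries.append(f"SELECT FROM cluster:{daily_cluster_name(class_name, day)}")
        day += timedelta(days=1)
    return queries


if __name__ == "__main__":
    for query in queries_for_range("Event", date(2014, 12, 17), date(2014, 12, 19)):
        print(query)
```

Reading three days means three queries, which is the trade-off described above.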


We investigated OrientDB's distributed architecture, but because of many 
stability and reliability issues we decided that, for our project, running 
OrientDB independently on several servers works better.

On Monday, December 15, 2014 10:54:26 PM UTC-7, Mateusz Dymczyk wrote:
>
> @Stéphane:
>
> Sorry for hijacking your thread ;-)
>
> @Luca:
>
> Your suggestion means I would have to generate the IDs myself. I don't 
> want that, because it's hard to pull off in a distributed setting without 
> support from the database.
>
> Once a record is saved it is visible in the DB, but its ID hasn't been 
> generated yet (as you know, generating incremented sequences in a 
> distributed system isn't instant), so listing all elements in the DB would 
> show me something I cannot query by ID. I would need to somehow mark those 
> records as not yet visible, or save them in a different table, generate 
> the ID, and then move them to the final table. There are probably other 
> ways to do it, but they are all pretty clumsy and bug-prone. That's why 
> people usually rely on IDs generated by the DB: it's easier for the DB not 
> to publish the data before it is fully initialized and stored.
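A minimal in-memory sketch of the "save hidden, assign ID, then publish" workaround described above. `PendingStore`, `save`, `publish` and the `visible` flag are hypothetical names, not an OrientDB API, and the local counter stands in for a distributed ID generator. It is single-threaded and also illustrates why the approach is bug-prone: every read path has to remember the filter.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    data: dict
    public_id: Optional[int] = None  # assigned in a second step
    visible: bool = False            # hidden until public_id is set


@dataclass
class PendingStore:
    records: List[Record] = field(default_factory=list)
    # Stand-in for a distributed ID generator (hypothetical, a local counter here).
    _ids: itertools.count = field(default_factory=itertools.count)

    def save(self, data: dict) -> Record:
        # Step 1: persist the record but keep it out of listings.
        record = Record(data)
        self.records.append(record)
        return record

    def publish(self, record: Record) -> None:
        # Step 2: assign the ID first, then make the record visible.
        record.public_id = next(self._ids)
        record.visible = True

    def list_visible(self) -> List[Record]:
        # Every read path must apply this filter; forgetting it anywhere
        # exposes records that cannot be looked up by ID yet.
        return [r for r in self.records if r.visible]
```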
>
> I could first generate the ID and then save the record, but then I have 
> to handle all the failures. And what happens if the first operation fails 
> and the second goes through? Yup, a lot of corner cases.
>
> That's why I find the way you guys implemented master-replica kind of 
> weird and perplexing.
>
> One more thing to note: suppose you deploy your app on AWS or a similar 
> cloud and name your nodes "database_<someAwsKey>", where <someAwsKey> is 
> the ID of the AWS instance. If you stop and start your instances 
> frequently, which isn't rare on AWS since you want to spin up extra 
> instances only at peak times, you will end up with a lot of clusters in 
> your DB. This grows the cluster table, and the IDs can get ridiculously 
> big. I have only around 80 classes (one cluster per class), and after a 
> very short time with 3 nodes I already had 500 clusters. IDs like 
> #23414321:0 might not be very user-friendly.
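The growth can be estimated with simple arithmetic. A sketch, assuming (from Luca's example further down the thread: customer, customer_node1, customer_node2) that each class gets one cluster per node name ever seen, and ignoring OrientDB's own internal clusters. The `database_i-...` names are hypothetical.

```python
def clusters_needed(num_classes: int, node_names_seen: set) -> int:
    # Assumption: every class gets one cluster per node name that has ever
    # joined; internal clusters are not counted.
    return num_classes * len(node_names_seen)


def simulate_restarts(num_classes: int, nodes: int, restarts: int) -> int:
    seen = set()
    for cycle in range(restarts + 1):
        for n in range(nodes):
            # Hypothetical "database_<instance id>" name: a restarted instance
            # comes back with a new ID, hence a new node name.
            seen.add(f"database_i-{cycle}-{n}")
    return clusters_needed(num_classes, seen)


if __name__ == "__main__":
    print(simulate_restarts(80, 3, 0))  # 240
    print(simulate_restarts(80, 3, 1))  # 480
```

Under this assumption, 80 classes on three nodes give 240 clusters, and a single restart of all three instances under new names doubles that to 480, in the same range as the 500 clusters reported above.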
>
> Just my two cents. 
>
> Anyway, if you have any more suggestions, I think it would be fair to 
> Stéphane to move this discussion to 
> https://groups.google.com/forum/#!topic/orient-database/pdBWuacJpI0
>
> Mateusz
>
> On Tuesday, December 16, 2014 1:47:03 AM UTC+9, Stéphane Schild wrote:
>>
>> Hi,
>>
>> I think he wants to query the customers by using a unique ID without 
>> having to map it as you propose.
>>
>> Could I ask something that is important to me? As you say, the behavior 
>> in 1.7 was different from what is described here; could you quickly 
>> explain how the multi-master architecture worked in 1.7?
>>
>> Thanks
>>
>> Stéphane
>>
>> On Monday, December 15, 2014 5:28:44 PM UTC+1, Lvc@ wrote:
>>>
>>> Hi Mateusz.
>>> If you have 3 nodes, you will have:
>>>
>>>    - customer: #13:0
>>>    - customer_node1: #14:0
>>>    - customer_node2: #15:0
>>>    
>>> Whereas with 1.7 you had:
>>>
>>>    - customer: #13:0
>>>    - customer: #13:1
>>>    - customer: #13:2
>>>
>>> Well, you could compute the ID taking the node into account, like this:
>>>
>>>    - customer: #13:0 -> 0
>>>    - customer_node1: #14:0 -> 1
>>>    - customer_node2: #15:0 -> 2
>>>    - customer: #13:1 -> 3
>>>    - customer_node1: #14:1 -> 4
>>>    - customer_node2: #15:1 -> 5
>>>    - customer: #13:2 -> 6
>>>    - customer_node1: #14:2 -> 7
>>>    - customer_node2: #15:2 -> 8
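The mapping in the list above interleaves the per-node clusters: the long ID is `position * N + node index`, where N is the number of per-node clusters. A minimal sketch of both directions, assuming the list of per-node cluster IDs (here 13, 14, 15 from the example) is known and fixed; the function names are hypothetical.

```python
def rid_to_long(cluster_id: int, position: int, node_clusters: list) -> int:
    """Interleave per-node clusters into one sequence: position * N + node index."""
    if cluster_id not in node_clusters:
        raise ValueError(f"#{cluster_id} is not one of the per-node clusters")
    node = node_clusters.index(cluster_id)
    return position * len(node_clusters) + node


def long_to_rid(public_id: int, node_clusters: list) -> tuple:
    """Invert rid_to_long: recover (cluster_id, position) from the long ID."""
    if public_id < 0:
        raise ValueError("public IDs are non-negative")
    position, node = divmod(public_id, len(node_clusters))
    return node_clusters[node], position


if __name__ == "__main__":
    # customer, customer_node1, customer_node2 from the example above
    clusters = [13, 14, 15]
    print(rid_to_long(14, 1, clusters))  # 4
    print(long_to_rid(8, clusters))      # (15, 2)
```

The mapping is only stable while the list of per-node clusters stays fixed: adding a node changes N and therefore remaps every existing ID, which matters given the node churn Mateusz describes. IDs also have gaps when nodes insert at different rates, since each cluster's positions advance independently.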
>>>
>>> WDYT?
>>>
>>> Lvc@
>>>
>>>
>>> On 15 December 2014 at 14:42, Mateusz Dymczyk <[email protected]> wrote:
>>>>
>>>> Hey Luca,
>>>>
>>>> My use case is very simple: in my app, cluster and class are synonyms, 
>>>> as I have always had only one cluster per class. Furthermore, I let my 
>>>> users query the data by a long ID, REST-style: "URL/{classname}/{id}". 
>>>> The ID was basically the cluster position generated by Orient. Now I 
>>>> can't use that, because I can have N records with the same cluster 
>>>> position for a given class, where N is the number of nodes. This means I 
>>>> have to tell my users to use the whole {clusterId:clusterPos} string as 
>>>> an ID, which is very user-unfriendly and confusing...
>>>>
>>>> Somehow I find it hard to believe that no one else uses those IDs this 
>>>> way. That's pretty standard practice in ORMs etc.
>>>>
>>>> Mateusz 
>>>>
>>>> On Monday, December 15, 2014 9:28:57 PM UTC+9, Lvc@ wrote:
>>>>>
>>>>> Hi guys,
>>>>> The RID of a record is always the same across all the nodes, but the 
>>>>> sequence used is per server. Using different clusters lets us avoid 
>>>>> conflicts.
>>>>>
>>>>> Unfortunately, in distributed mode there is no way to force a clusterId 
>>>>> on creation if that clusterId is assigned to another server. This 
>>>>> restriction is mandatory to avoid conflicts.
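A toy, single-process model of the scheme described above, not OrientDB's actual implementation: `Server`, `create` and `replicate` are hypothetical names, and networking, ownership assignment and failure handling are omitted. It shows how per-server sequences plus disjoint cluster ownership keep RIDs unique, while each RID stays the same on every node.

```python
import itertools


class Server:
    """Toy node that owns clusters and keeps its own per-cluster sequence."""

    def __init__(self, name: str, owned_clusters):
        self.name = name
        self.owned = set(owned_clusters)
        self.records = {}  # (cluster_id, position) -> data, one copy per node
        self._next_position = {c: itertools.count() for c in self.owned}

    def create(self, cluster_id: int) -> tuple:
        # A node may only allocate positions in clusters it owns; this is what
        # keeps two nodes from handing out the same RID.
        if cluster_id not in self.owned:
            raise PermissionError(f"cluster {cluster_id} is owned by another server")
        return cluster_id, next(self._next_position[cluster_id])


def replicate(servers, origin: Server, cluster_id: int, data) -> tuple:
    # The origin picks the RID once; every node stores the record under that same RID.
    rid = origin.create(cluster_id)
    for server in servers:
        server.records[rid] = data
    return rid


if __name__ == "__main__":
    node0, node1, node2 = Server("node0", [13]), Server("node1", [14]), Server("node2", [15])
    nodes = [node0, node1, node2]
    print(replicate(nodes, node0, 13, {"name": "a"}))  # (13, 0)
    print(replicate(nodes, node1, 14, {"name": "b"}))  # (14, 0)
    print(replicate(nodes, node0, 13, {"name": "c"}))  # (13, 1)
```

Both nodes hand out position 0 in their own cluster, which is why N records can share the same cluster position when N nodes write to the same class.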
>>>>>
>>>>> @Mateusz, what's your use case?
>>>>>
>>>>> Lvc@
>>>>>
>>>>>
>>>>> On 15 December 2014 at 10:39, Stéphane Schild <[email protected]> 
>>>>> wrote:
>>>>>>
>>>>>> Thanks, Lvc, for your response!
>>>>>>
>>>>>> To follow up on Mateusz's question: is it really possible to have 
>>>>>> multiple records with the same ID across distributed nodes? Or is there 
>>>>>> perhaps some centralized sequence for cluster IDs that prevents 
>>>>>> different classes from using the same cluster IDs?
>>>>>>
>>>>>> -- 
>>>>>>
>>>>>> --- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "OrientDB" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to [email protected].
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>
