Thanks all for the suggestions. From what I've understood so far, I think
it's better to go with OrientDB-Lucene for now. I don't think it's worth the
effort to make ES work with OrientDB, as it clearly requires additional
work to keep the ES indices updated, and the application I am working on
is not going to have millions of users.

I have used neither ES nor Lucene before, but I guess Lucene should be
enough since ES is built on top of Lucene. So I guess I'll give it a try.
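
To illustrate the kind of extra index-updating work I mean, here is a rough
sketch of the "update scan" described further down the thread, in plain
Python. It only shows the bookkeeping: the in-memory list stands in for
OrientDB, and the names Doc and update_scan are made up for illustration
(they are not OrientDB or ES APIs). It also assumes the modification dates
are trustworthy, which is exactly the hard part discussed below.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Doc:
    rid: str            # hypothetical record id
    modified: datetime  # last-modification date stored on the record


def update_scan(docs, checkpoint, scan_start):
    """Return the ids modified at or after `checkpoint`, plus the next checkpoint.

    The next checkpoint is the START date of this scan, so a change made
    while the scan is running is picked up again by the following scan.
    """
    changed = [d.rid for d in docs if d.modified >= checkpoint]
    return changed, scan_start


t0 = datetime(2015, 3, 20, 12, 0, 0)
docs = [
    Doc("a", t0 - timedelta(hours=1)),    # unchanged since the last scan
    Doc("b", t0 + timedelta(minutes=5)),  # changed after the last checkpoint
]

# The update scan starts at t0 + 10 min and uses t0 as its checkpoint.
changed, checkpoint = update_scan(docs, t0, t0 + timedelta(minutes=10))
print(changed)  # ['b']
```

Only the ids in `changed` would then need to be re-sent to ES.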

On Fri, Mar 20, 2015 at 5:00 PM, Nicolas Harraudeau <
nicolas.harraud...@gmail.com> wrote:

> Hi Kevin,
>
> There might be a way to mark documents as updated. This is not an easy
> solution and I haven't tried it yet. It uses MVCC and Optimistic Transactions
> (you can read more about this here
> http://www.orientechnologies.com/docs/2.0/orientdb.wiki/Transactions.html
> ).
> Let's say you have your application on one side which is adding, deleting,
> updating documents in OrientDB. On the other side you have your replication
> process which reads OrientDB and writes in Elasticsearch.
>
> When your replication process starts scanning OrientDB, it first creates or
> replaces a unique vertex (let's call it the "checkpoint vertex") which
> contains the start date of the scan.
> Each time your application modifies OrientDB, it reads the checkpoint
> vertex and sets the modification date of each indexed vertex/edge to the
> checkpoint's date. If a scan started during the modification, the checkpoint
> vertex has been changed and the transaction should fail.
> For deletes, a vertex describing the delete has to be created.
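
To check my understanding of the checkpoint vertex idea, here is a minimal
sketch in plain Python. It is a simulation only: Checkpoint, modify and
ConcurrentModification are hypothetical names, the version counter merely
imitates an MVCC record version, and nothing here is OrientDB API.

```python
class ConcurrentModification(Exception):
    """Raised when the checkpoint changed during an application transaction."""


class Checkpoint:
    """Stand-in for the unique "checkpoint vertex"."""

    def __init__(self, scan_start):
        self.scan_start = scan_start
        self.version = 0  # bumped on each replace, imitating an MVCC version

    def replace(self, scan_start):
        self.scan_start = scan_start
        self.version += 1


def modify(checkpoint, record, changes, during=None):
    """Apply `changes` to `record`, stamping it with the checkpoint date."""
    seen_version = checkpoint.version  # read the checkpoint when the tx begins
    staged = {**record, **changes, "modified": checkpoint.scan_start}
    if during is not None:
        during()  # simulate something happening in the middle of the tx
    if checkpoint.version != seen_version:  # optimistic check at commit time
        raise ConcurrentModification("a scan started; retry the transaction")
    record.update(staged)


cp = Checkpoint(scan_start=100)
rec = {"name": "old"}
modify(cp, rec, {"name": "new"})  # no scan in between: commit succeeds

try:
    # A scan starts mid-transaction and replaces the checkpoint.
    modify(cp, rec, {"name": "newer"}, during=lambda: cp.replace(200))
except ConcurrentModification:
    modify(cp, rec, {"name": "newer"})  # retry, now stamped with date 200
```

The failed attempt leaves the record untouched, and the retry stamps it with
the new scan's start date, so the next update scan would pick it up.
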
>
> This has some drawbacks:
> - the application either has to know what is indexed in ES, or it has to
> set a date on every vertex/edge.
> - you must use transactions even when you want to modify one vertex/edge.
>
> I don't like this solution very much but it might be ok for you.
>
> You might also use a file or something else as a modifications log. But
> then you can't back up both the modification log and the OrientDB graph at
> the same time.
>
> Regards,
>
> On Thursday, March 19, 2015 at 5:37:18 PM UTC+1, Nicolas Harraudeau wrote:
>>
>> Hi Patrick,
>> I have looked for a way to do it myself but didn't find a correct one.
>> Here is what I found:
>>
>> Having worked with indexing problems before on another search engine and
>> other sources, there are always two different jobs:
>> - The first one does a full scan of the source. With OrientDB it is
>> possible using a simple JDBC driver and a few requests. OrientDB can be
>> completely scanned using pagination:
>> http://www.orientechnologies.com/docs/last/Pagination.html
>> - The second job is more complex. It has to fetch only modified documents
>> as often as you need in order to have up to date results.
>>
>> When fetching updates you want to scan from the start date of the last
>> scan because modifications can happen during the scan itself. Let's name
>> this start date "checkpoint".
>>
>> My first thought was that I could save the last modification timestamp in
>> OrientDB docs. But I didn't find any way to generate it during commit. It
>> MUST not be generated by the application as this would add dates which are
>> generated BEFORE the checkpoint but saved AFTER this same checkpoint. Think
>> of your application making a modification that spans the start of the
>> update scan.
>>
>> The second approach would be to create a "Modifications to scan" vertex
>> and link to it every modified document. This would not scale as it would
>> conflict more and more during transactions.
>>
>> The third approach is to use Hooks which would mark documents as
>> modified. However, the documentation on those is rather poor. In order to be
>> used by an update scan, hook registration needs to be transactional. I asked
>> here if adding a hook invalidates the running transactions (
>> https://groups.google.com/forum/#!topic/orient-database/FBHiZg68b1s) but
>> did not receive any answer. I tested it myself and found that it does not
>> work as I would like
>> (https://github.com/orientechnologies/orientdb/issues/3763). There is still
>> no information as to how it SHOULD work. No specifications.
>>
>> Maybe one of those features will make a correct update stream possible:
>> https://github.com/orientechnologies/orientdb/issues/2652
>> https://github.com/orientechnologies/orientdb/issues/1227
>>
>> In the meantime, I don't see any way to index OrientDB correctly. If
>> someone has succeeded at indexing OrientDB, I am interested too.
>>
>> OrientDB-Lucene is promising but it is too limited for me right now. I
>> cannot work without features like highlights or complex scoring.
>>
>> On Monday, March 16, 2015 at 4:41:36 PM UTC+1, Kevin I wrote:
>>>
>>> I can see that OrientDB Lucene indices can be done through
>>> orientdb-lucene <https://github.com/orientechnologies/orientdb-lucene>,
>>> but is there a way to use ElasticSearch in OrientDB? In TitanDB,
>>> ElasticSearch support was built in. It would be great if OrientDB had that
>>> too.
>>>
>>> If not, can I make the two work together out of the box? I haven't used
>>> ElasticSearch before, so it would be of great help if anyone can help me
>>> out with this.
>>>
>>> Thanks.
>>>
>>  --
>
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "OrientDB" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/orient-database/2g5VbvwDLk4/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> orient-database+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Always remember that the world around you is made by people that are no
smarter than you and me.