Thanks, all, for the suggestions. From what I've understood so far, I think it's better to go with OrientDB-Lucene for now. Since the application I am working on is not going to have millions of users, I don't think it's worth the effort to make ES work with OrientDB, as that clearly requires additional work to keep the indices in ES up to date.
I have used neither ES nor Lucene before, but I guess Lucene should be enough since ES is built on top of Lucene. So I'll give it a try.

On Fri, Mar 20, 2015 at 5:00 PM, Nicolas Harraudeau <nicolas.harraud...@gmail.com> wrote:

> Hi Kevin,
>
> There might be a way to mark documents as updated. This is not an easy
> solution and I haven't tried it yet. It uses MVCC and optimistic transactions
> (you can read more about this here:
> http://www.orientechnologies.com/docs/2.0/orientdb.wiki/Transactions.html).
> Let's say you have your application on one side, which is adding, deleting,
> and updating documents in OrientDB. On the other side you have your
> replication process, which reads OrientDB and writes to Elasticsearch.
>
> When your replication process starts scanning OrientDB, it first creates or
> replaces a unique vertex (let's call it the "checkpoint vertex") which
> contains the start date of the scan.
> Each time your application modifies OrientDB, it reads the checkpoint
> vertex and sets the modification date of each indexed vertex/edge to the
> checkpoint's date. If a scan started during the modification, the checkpoint
> vertex has been changed and the transaction should fail.
> For deletes, a vertex describing the delete has to be created.
>
> This has some drawbacks:
> - the application either has to know what is indexed in ES, or it has to
> set a date on every vertex/edge.
> - you must use transactions even when you want to modify a single
> vertex/edge.
>
> I don't like this solution very much, but it might be OK for you.
>
> You might also use a file or something else as a modification log. But
> then you can't back up both the modification log and the OrientDB graph at
> the same time.
>
> Regards,
>
> On Thursday, March 19, 2015 at 5:37:18 PM UTC+1, Nicolas Harraudeau wrote:
>>
>> Hi Patrick,
>> I have searched for a way to do it myself but didn't find a correct way to
>> do it.
>> Here is what I found:
>>
>> Having worked on indexing problems before with another search engine and
>> other sources, I have found there are always two different jobs:
>> - The first one does a full scan of the source. With OrientDB this is
>> possible using a simple JDBC driver and a few requests. OrientDB can be
>> completely scanned using pagination:
>> http://www.orientechnologies.com/docs/last/Pagination.html
>> - The second job is more complex. It has to fetch only modified documents,
>> as often as you need in order to have up-to-date results.
>>
>> When fetching updates, you want to scan from the start date of the last
>> scan, because modifications can happen during the scan itself. Let's name
>> this start date the "checkpoint".
>>
>> My first thought was that I could save the last modification timestamp in
>> OrientDB docs. But I didn't find any way to generate it during commit. It
>> MUST NOT be generated by the application, as this would add dates which are
>> generated BEFORE the checkpoint but saved AFTER this same checkpoint. Think
>> of your application making a modification that spans the start of the
>> update scan.
>>
>> The second approach would be to create a "Modifications to scan" vertex
>> and link every modified document to it. This would not scale, as it would
>> conflict more and more during transactions.
>>
>> The third approach is to use hooks which would mark documents as
>> modified. However, the documentation is rather poor on those. In order to be
>> used by an update scan, hook registration needs to be transactional. I asked
>> here if adding a hook invalidates the running transactions
>> (https://groups.google.com/forum/#!topic/orient-database/FBHiZg68b1s) but
>> did not receive any answer. I tested it myself and found that it is not
>> working as I would like
>> (https://github.com/orientechnologies/orientdb/issues/3763). There is still
>> no information as to how it SHOULD work. No specifications.
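
The checkpoint idea described above (resume each update scan from the start date of the previous scan, with timestamps assigned at commit time) can be sketched in a few lines. This is only an illustration under stated assumptions: `Store`, `Indexer`, `modified_after`, and `run` are hypothetical names, the store is a plain in-memory dictionary with a logical clock, and nothing here is an OrientDB or Elasticsearch API. Deletes are not handled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Store:
    """In-memory stand-in for the source database (hypothetical, not an OrientDB API)."""

    clock: int = 0  # logical commit clock
    docs: Dict[str, Tuple[str, int]] = field(default_factory=dict)

    def commit(self, doc_id: str, value: str) -> None:
        # The timestamp is assigned by the store at commit time, never by the caller.
        self.clock += 1
        self.docs[doc_id] = (value, self.clock)

    def modified_after(self, checkpoint: int) -> Dict[str, str]:
        return {k: v for k, (v, ts) in self.docs.items() if ts > checkpoint}


@dataclass
class Indexer:
    """Stand-in for the replication job that feeds the search index."""

    store: Store
    checkpoint_at_scan_end: bool = False  # True reproduces the naive, lossy variant
    index: Dict[str, str] = field(default_factory=dict)
    checkpoint: int = 0  # 0 means the first scan is a full scan

    def scan(self, during_scan: Optional[Callable[[], None]] = None) -> Dict[str, str]:
        scan_start = self.store.clock
        changed = self.store.modified_after(self.checkpoint)
        if during_scan is not None:
            during_scan()  # simulate the application writing while the scan runs
        self.index.update(changed)
        # The next scan resumes from where this one STARTED, so writes that
        # raced with it are fetched next time instead of being skipped.
        self.checkpoint = self.store.clock if self.checkpoint_at_scan_end else scan_start
        return changed


def run(checkpoint_at_scan_end: bool) -> Dict[str, str]:
    store = Store()
    store.commit("a", "v1")
    store.commit("b", "v1")
    indexer = Indexer(store, checkpoint_at_scan_end=checkpoint_at_scan_end)
    indexer.scan()                                  # full scan
    indexer.scan(lambda: store.commit("a", "v2"))   # a write races with the scan
    indexer.scan()                                  # next update scan
    return indexer.index


if __name__ == "__main__":
    print(run(checkpoint_at_scan_end=False))  # index ends with the latest value of "a"
    print(run(checkpoint_at_scan_end=True))   # index keeps the stale value of "a"
```

With the checkpoint taken at scan start, the write that raced with the second scan is picked up by the third one. With the checkpoint taken at scan end, that write falls behind the checkpoint and the index stays stale.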
>>
>> Maybe one of those features will make a correct update stream possible:
>> https://github.com/orientechnologies/orientdb/issues/2652
>> https://github.com/orientechnologies/orientdb/issues/1227
>>
>> In the meantime, I don't see any way to index OrientDB correctly. If
>> someone has succeeded at indexing OrientDB, I am interested too.
>>
>> OrientDB-Lucene is promising but it is too limited for me right now. I
>> cannot work without features like highlights or complex scoring.
>>
>> On Monday, March 16, 2015 at 4:41:36 PM UTC+1, Kevin I wrote:
>>>
>>> I can see that OrientDB Lucene indices can be created through
>>> orientdb-lucene <https://github.com/orientechnologies/orientdb-lucene>,
>>> but is there a way to use ElasticSearch in OrientDB? In TitanDB,
>>> ElasticSearch support was built in. It would be great if OrientDB had that
>>> too.
>>>
>>> If not, can I make the two work together out of the box? I haven't used
>>> ElasticSearch before, so it would be of great help if anyone can help me
>>> out with this.
>>>
>>> Thanks.
>>>
>> --
>
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "OrientDB" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/orient-database/2g5VbvwDLk4/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> orient-database+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

--
Always remember that the world around you is made by people that are no smarter than you and me.