Hi Patrick,
I looked for a way to do this myself but did not find a correct one. Here
is what I found:

Having worked on indexing problems before with another search engine and
other sources, I have found that there are always two different jobs:
- The first one does a full scan of the source. With OrientDB this is
possible using a simple JDBC driver and a few queries. OrientDB can be
completely scanned using pagination:
http://www.orientechnologies.com/docs/last/Pagination.html
- The second job is more complex. It has to fetch only the modified
documents, as often as you need in order to have up-to-date results.
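
A rough sketch of the first job, paging by record id. Everything here is
an assumption for illustration: `fetch_page` stands in for a real query
along the lines of `SELECT FROM Doc WHERE @rid > :last LIMIT :n` (the
class name `Doc` is hypothetical), and record ids are modelled as
(cluster, position) tuples.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical record shape: ((cluster, position), payload).
Rid = Tuple[int, int]
Record = Tuple[Rid, dict]


def full_scan(fetch_page: Callable[[Optional[Rid], int], List[Record]],
              page_size: int = 100) -> Iterator[Record]:
    """Yield every record by asking for the page after the last rid seen."""
    last_rid: Optional[Rid] = None
    while True:
        page = fetch_page(last_rid, page_size)
        if not page:
            return  # an empty page means the scan is complete
        yield from page
        last_rid = page[-1][0]


def make_in_memory_fetcher(records: List[Record]):
    """In-memory stand-in for the database query (illustration only)."""
    ordered = sorted(records, key=lambda r: r[0])

    def fetch_page(after: Optional[Rid], limit: int) -> List[Record]:
        candidates = [r for r in ordered if after is None or r[0] > after]
        return candidates[:limit]

    return fetch_page
```

Paging on the last seen id rather than on an offset means each page costs
about the same, whatever its position in the scan.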

When fetching updates you want to scan from the start date of the last scan 
because modifications can happen during the scan itself. Let's name this 
start date "checkpoint".
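
To make the checkpoint idea concrete, here is a minimal sketch. The
names `now` and `fetch_modified_since` are hypothetical callables, not
OrientDB APIs; the only point is that the next checkpoint is taken
before the query runs.

```python
from typing import Callable, List, Tuple


def incremental_scan(now: Callable[[], float],
                     fetch_modified_since: Callable[[float], List[str]],
                     previous_checkpoint: float) -> Tuple[List[str], float]:
    """Return the ids modified since previous_checkpoint and the next checkpoint.

    The next checkpoint is read BEFORE the query runs, so a change that is
    committed while the query is running falls inside the window of the
    following scan instead of being lost.
    """
    next_checkpoint = now()
    modified = fetch_modified_since(previous_checkpoint)
    return modified, next_checkpoint
```

The caller stores the returned checkpoint and passes it in as
`previous_checkpoint` on the next run. A document may be fetched twice
this way, so the indexing side has to tolerate repeated updates.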

My first thought was to store the last modification timestamp in the
OrientDB documents. But I did not find any way to generate it at commit
time. It MUST NOT be generated by the application, as this would produce
dates which are generated BEFORE the checkpoint but saved AFTER this same
checkpoint. Think of your application making a modification that spans the
start of the update scan.
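
The race can be shown with a toy model. It is only a simulation under
simplifying assumptions: each scan is treated as an instantaneous
snapshot taken at its start time, and `scans_that_see` is a hypothetical
helper, not anything OrientDB provides.

```python
from typing import List


def scans_that_see(stamp: float, commit_time: float,
                   scan_starts: List[float]) -> List[int]:
    """Return the indexes of the scans that pick up one modification.

    stamp       -- timestamp stored in the document
    commit_time -- moment the transaction actually becomes visible
    scan_starts -- start time of each scan; each one is also a checkpoint
    """
    seen = []
    previous_checkpoint = float("-inf")
    for i, start in enumerate(scan_starts):
        committed = commit_time <= start        # visible to this scan?
        in_window = stamp >= previous_checkpoint  # newer than last checkpoint?
        if committed and in_window:
            seen.append(i)
        previous_checkpoint = start
    return seen


# Application stamps the document at t=9, the commit lands at t=11,
# and scans start at t=10 and t=20.
print(scans_that_see(stamp=9, commit_time=11, scan_starts=[10, 20]))   # -> []
# Same modification, but stamped at commit time.
print(scans_that_see(stamp=11, commit_time=11, scan_starts=[10, 20]))  # -> [1]
```

With the application-generated stamp the first scan cannot see the
uncommitted change and the second scan considers it too old, so it is
never indexed.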

The second approach would be to create a "Modifications to scan" vertex and
link every modified document to it. This would not scale, as concurrent
transactions would conflict more and more on that single vertex.

The third approach is to use hooks which would mark documents as modified.
However, the documentation on them is rather poor. In order to be usable by
an update scan, hook registration needs to be transactional. I asked here
whether adding a hook invalidates the running transactions (
https://groups.google.com/forum/#!topic/orient-database/FBHiZg68b1s) but
did not receive any answer. I tested it myself and found that it does not
work as I would like (
https://github.com/orientechnologies/orientdb/issues/3763). There is still
no information as to how it SHOULD work. No specifications.

Maybe one of these features will make a correct update stream possible:
https://github.com/orientechnologies/orientdb/issues/2652
https://github.com/orientechnologies/orientdb/issues/1227

In the meantime, I don't see any way to index OrientDB correctly. If
someone has succeeded at indexing OrientDB, I am interested too.

OrientDB-Lucene is promising but it is too limited for me right now. I 
cannot work without features like highlights or complex scoring.

On Monday, March 16, 2015 at 4:41:36 PM UTC+1, Kevin I wrote:
>
> I can see that OrientDB lucene indices can be done through orientdb-lucene 
> <https://github.com/orientechnologies/orientdb-lucene>, but is there a 
> way to use ElasticSearch in OrientDB? In TitanDB, ElasticSearch support was 
> inbuilt. It would be great if OrientDB has that too.
>
> If not, can I make the two work together out of the box? I haven't used 
> ElasticSearch before, so it would be of great help if anyone can help me 
> out with this.
>
> Thanks.
>
