Re: [orientdb] Some thought regarding the usability of fulltext search in a graph vs. document environment

stefan Mon, 26 May 2014 02:06:06 -0700

Hi Riccardo,

Thank you for taking the time :).


If the information could be assembled into a single document used for 
indexing only, a search would return a pointer to the master vertex, but 
it could return the virtual document as well. 
If the temporary document is stored on the indexing side, the index 
grows bigger, but the advantage is that you have a "preview" document 
available for search results.
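
A minimal sketch of that assembly step, under stated assumptions: the 
graph is a pair of plain Python dicts, and VERTICES, EDGES and 
build_search_document are hypothetical names, not OrientDB or Lucene API. 
It flattens a master vertex and its neighbours into one document that 
carries a pointer back to the master vertex, and optionally keeps the 
fields as a stored "preview".

```python
# Hypothetical in-memory graph: vertices keyed by record id,
# edges as (label, target record id) adjacency lists.
VERTICES = {
    "#12:0": {"class": "Person", "name": "John Smith"},
    "#13:0": {"class": "Address", "street": "Pine Street"},
    "#14:0": {"class": "Postcode", "code": "980201"},
}
EDGES = {
    "#12:0": [("address", "#13:0")],
    "#13:0": [("postcode", "#14:0")],
}


def build_search_document(rid, depth=2, store_preview=False):
    """Flatten a master vertex and its neighbours (up to `depth` hops)."""
    fields = {}
    frontier = [(rid, "")]
    seen = {rid}
    for _ in range(depth + 1):
        next_frontier = []
        for current, prefix in frontier:
            # Copy the vertex's own fields, prefixed with the edge path.
            for key, value in VERTICES[current].items():
                if key != "class":
                    fields[prefix + key] = value
            # Queue unvisited neighbours for the next hop.
            for label, target in EDGES.get(current, []):
                if target not in seen:
                    seen.add(target)
                    next_frontier.append((target, prefix + label + "."))
        frontier = next_frontier
    # The index entry always points back to the master vertex.
    doc = {"master_rid": rid, "text": " ".join(fields.values())}
    if store_preview:
        # Storing the fields makes the index bigger but gives a preview.
        doc["preview"] = fields
    return doc


doc = build_search_document("#12:0", store_preview=True)
print(doc["master_rid"], "|", doc["text"])
```

With store_preview=False only the pointer and the indexed text are kept, 
which corresponds to the smaller-index option.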

We used the latter approach before, but the project we are working on now 
is too large for that to be feasible.

If the information is combined in this way, novice users (regular 
employees) can compose a Lucene query, whose syntax is close enough to 
advanced search in Google to be "acceptable" to them.

Writing a query parser that lets simple queries search over 
fulltext+graph seems far-fetched. The only alternative that has come to 
mind is to create this "searchable document" that contains information 
from various vertices.
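
To illustrate why such a document helps, here is a toy sketch, assuming 
the flattened documents already exist. It is a stand-in for a fielded 
Lucene query, not the Lucene query parser; DOCS, matches and search are 
hypothetical names, and the records are made-up sample data.

```python
from fnmatch import fnmatchcase

# Hypothetical flattened "searchable documents", one per master vertex.
DOCS = [
    {"master_rid": "#12:0", "name": "John Smith",
     "street": "Pine Street", "postcode": "980201"},
    {"master_rid": "#12:1", "name": "John Doe",
     "street": "Oak Avenue", "postcode": "980201"},
]


def matches(doc, query):
    """True if every `field:pattern` term matches a word in that field."""
    for term in query.split():
        field, _, pattern = term.partition(":")
        words = doc.get(field, "").lower().split()
        # `*` in the pattern acts as a wildcard, e.g. pine* matches pine.
        if not any(fnmatchcase(word, pattern.lower()) for word in words):
            return False
    return True


def search(query):
    """Return the record ids of the master vertices that match."""
    return [doc["master_rid"] for doc in DOCS if matches(doc, query)]


print(search("name:john street:pine* postcode:980201"))  # ['#12:0']
```

Because the address and postcode fields live in the same document as the 
name, one flat query covers all of them and no graph traversal is needed 
at search time.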

I'm an OrientDB fan. One of the biggest reasons for that is this hybrid 
model of documents with graph-based relations. The fulltext capabilities of 
competing document-only stores (Elasticsearch/Solr) offer so much more 
usability than OrientDB can offer without additional functionality like this.

I'm not sure if this is the right approach. The only thing I'm pretty sure 
of is that something like this is needed to complete the merger of the two 
domains (documents+graphs).

Regards,
  -Stefán


On Monday, 26 May 2014 06:57:00 UTC, Riccardo Tasso wrote:
>
> Very interesting post.
>
> In my application, based on OrientDB and FULLTEXT indices (not Lucene for 
> now), I recently had a similar experience.
>
> I indexed the field "title" of a document, but the user expected to find a 
> document also when a given field of a connected vertex matched the text 
> query. I don't remember if it's possible, but I'd like to achieve this 
> result with a query of the form:
>
> SELECT FROM Person WHERE name containsText "Garibaldi" OR 
> out('address').street containsText "Garibaldi"
>
> When you write a query, the result represents a "virtual document", so I 
> don't think any extra functionality will be required.
>
> Do you agree?
> Cheers,
>    Riccardo
>
>
> 2014-05-24 13:53 GMT+02:00 <[email protected]>:
>
>> Hi,
>>
>> I'm one of the OrientDB users that celebrates that Lucene can now be used 
>> for fulltext indexing in OrientDB, thank you!
>>
>> I have been using Solr for some years and used it, for example, to build 
>> an extensive "Enterprise search" where it shined (powered by Lucene).
>>
>> One of the things we found was that the MultiFieldQueryParser was quite 
>> helpful as it provided users/employees a fairly simple and powerful way to 
>> narrow their search for entities/documents.
>>
>> The usability of fulltext search diminishes somewhat when the index 
>> relies on information stored in a graph, because related information, 
>> like address+postcodes, has been spread out/normalized over many vertices 
>> and edges.
>> A search for "John that lives on Pine* in 980201" can no longer be built 
>> using the index or the MultiFieldQueryParser, at least not without 
>> combining OSQL with the Lucene query.
>>
>> What I guess I'm trying to say is that the fulltext-document-search that 
>> shines when it's based on documents (take Solr/Elasticsearch for example) 
>> is rendered quite limited when used on top of a graph if it can only 
>> consist of information from a single vertex.
>>
>> The ability to create a virtual/temporary document for fulltext indexing, 
>> from the information of a vertex and the adjacent vertices, is quite 
>> appealing to me, and it would bridge the gap between document and graph 
>> strengths and weaknesses when it comes to fulltext search.
>>
>> I realize that there is a line between what the database itself should 
>> do and what the users need to do by themselves, but keeping in mind that 
>> OrientDB's main differentiation is its mix of a document store and graph, I 
>> think that a more powerful fulltext feature that takes these differences 
>> into account could help establish it as a clear winner in both spaces.
>>
>> There are many small projects, like Django-Haystack, that focus on the 
>> ability to create virtual/temporary documents for indexing-only purposes 
>> that might be helpful in evaluating options to improve this.
>>
>> Please let me know if anyone else here shares this view or, better yet, 
>> has devised a simple way around this limitation.
>>
>> Very best regards,
>>   -Stefán
>>
>>
>>
>>
>>
>>
>>
>>  -- 
>>
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "OrientDB" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
