Re: [topbraid-users] Search the EDG in localhost 6.3.2 - using Lucene on local machine or does it require TBL server ? and how may Lucene be configured ?

2020-05-25 Thread Holger Knublauch

Hi Rob,

I think the scenario you describe is already supported by the free-text 
search within individual asset collections. The Lucene text index 
there does include all imported files. However, the Search the EDG index 
is meant to drive navigation into asset collections, so it 
doesn't include triples that come from imported files.
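
To make the distinction concrete, here is a toy sketch in Python. It is not EDG code: the graph names, the triples, and the functions closure() and build_index() are all made up for illustration. It only shows why an index built over the imports closure finds text that an index built from a collection's own triples does not.

```python
# Toy model only: graph names, triples, and function names are hypothetical.
from collections import defaultdict

# Each graph holds (subject, predicate, literal) triples plus a list of imports.
GRAPHS = {
    # A "wrapper" collection with no triples of its own, only an import.
    "urn:example:wrapper": {"triples": [], "imports": ["urn:example:file"]},
    "urn:example:file": {
        "triples": [("ex:rule1", "rdfs:label", "Wiring rule for sockets")],
        "imports": [],
    },
}


def closure(graph, seen=None):
    """Return the triples of a graph plus everything it imports, transitively."""
    seen = set() if seen is None else seen
    if graph in seen:
        return []
    seen.add(graph)
    triples = list(GRAPHS[graph]["triples"])
    for imported in GRAPHS[graph]["imports"]:
        triples.extend(closure(imported, seen))
    return triples


def build_index(triples):
    """Map each lower-cased word of a literal to the subjects that use it."""
    index = defaultdict(set)
    for subject, _predicate, text in triples:
        for word in text.lower().split():
            index[word].add(subject)
    return index


# Search inside a collection: index built over the imports closure.
collection_index = build_index(closure("urn:example:wrapper"))
# Search-the-EDG style: index built only from the collection's own triples.
global_index = build_index(GRAPHS["urn:example:wrapper"]["triples"])

print(sorted(collection_index["wiring"]))  # ['ex:rule1']
print(sorted(global_index["wiring"]))      # []
```

With a wrapper collection that holds nothing but an import, the second index is empty, which matches the "no search results" symptom discussed in this thread.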


Holger


On 26/05/2020 13:40, Rob Atkinson wrote:


Probably ought to wind up this topic if the answer is "Lucene works on 
local development server" and "Lucene needs data to be in the graph, 
not in imports"  - that said, I guess we'd still be interested in 
seeing if we can open up that easily.


It's another thread to work out how much work it would take to make it 
feasible to copy all the data into EDG and sync it automatically with 
changes in a source. At the moment we are happier with version 
controlled sources and an import via project deployment and we can 
copy data at load time rather than import.
It would be a pain to use the file import service and have to 
multipart form-encode the file contents to load it.


Is there another option to look at a deployed file and efficiently 
copy it into a new asset collection?  Still don't think this gives us 
the flexibility to keep the original data and potential additional 
data separate (segmented A-box union graphs) whilst retaining search 
capability.




Re: [topbraid-users] Search the EDG in localhost 6.3.2 - using Lucene on local machine or does it require TBL server ? and how may Lucene be configured ?

2020-05-25 Thread Rob Atkinson

Probably ought to wind up this topic if the answer is "Lucene works on 
local development server" and "Lucene needs data to be in the graph, not in 
imports"  - that said, I guess we'd still be interested in seeing if we can 
open up that easily.

It's another thread to work out how much work it would take to make it 
feasible to copy all the data into EDG and sync it automatically with 
changes in a source. At the moment we are happier with version controlled 
sources and an import via project deployment and we can copy data at load 
time rather than import. 
It would be a pain to use the file import service and have to multipart 
form-encode the file contents to load it.
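
For reference, multipart form-encoding a file can be done with the Python standard library alone. The sketch below is only an illustration: the field names ("format", "file") and the file name are hypothetical, and nothing here is taken from the EDG file import service documentation, so check the real parameter names before using it.

```python
import uuid


def encode_multipart(fields, file_field, filename, file_bytes,
                     content_type="text/turtle"):
    """Build a multipart/form-data body; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    lines = []
    # Plain form fields first.
    for name, value in fields.items():
        lines.append(f"--{boundary}".encode())
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        lines.append(b"")
        lines.append(str(value).encode("utf-8"))
    # Then the file part, carrying its own content type.
    lines.append(f"--{boundary}".encode())
    lines.append(
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"'.encode()
    )
    lines.append(f"Content-Type: {content_type}".encode())
    lines.append(b"")
    lines.append(file_bytes)
    # Closing boundary, followed by a final CRLF.
    lines.append(f"--{boundary}--".encode())
    lines.append(b"")
    body = b"\r\n".join(lines)
    return body, f"multipart/form-data; boundary={boundary}"


# Hypothetical field names and file content, for illustration only.
body, header = encode_multipart(
    {"format": "turtle"},
    "file",
    "rules_data.ttl",
    b"@prefix ex: <http://example.org/> .\n",
)
print(header)
print(len(body), "bytes")
```

The body and header could then be passed to urllib.request.Request(url, data=body, headers={"Content-Type": header}); the URL is deliberately left out because it depends on the server.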

Is there another option to look at a deployed file and efficiently copy it 
into a new asset collection?  Still don't think this gives us the 
flexibility to keep the original data and potential additional data 
separate (segmented A-box union graphs) whilst retaining search capability. 


On Tuesday, 26 May 2020 10:50:06 UTC+10, Irene Polikoff wrote:
>
> Please see below
>
> On May 25, 2020, at 8:00 PM, Rob Atkinson wrote:
>
>
>> Yes, it runs on local host.
>>
>> I can’t reproduce your issues.
>>
>> One possibility is that you have no data in the asset collections you set 
>> to be indexed. From Rob’s e-mails, I know that he uses files and asset 
>> collections in EDG are simply “wrappers” for these files. If you are 
>> following the same pattern, you will get no search results. Search the EDG 
>> indexing will index only the content that you actually have in the asset 
>> collection, not content included by reference.
>>
>>
>> We definitely need to be able to configure search to handle customised 
> graph closures  - and creating wrappers with appropriate imports is one 
> obvious way to do this, as well as being the best way to handle large 
> static data streams generated by other processes. 
>
>
> Only content in the asset collection is indexed for the Search the EDG 
> index. In other words, data must be in EDG’s triple store. If it is 
> external data, it will not be indexed. 
>
>  
> I have wondered if, something like teamwork:imports there is a property a 
> graph could have to indicate a-box content, and hence inclusion in the 
> default closure for search and display. 
>
>
> You can run queries over included content. Search in the asset collection 
> will work - even over data that is in a file. This search uses GraphQL 
> access to a graph with all imports closure. 
>
> Search the EDG uses Lucene index that is created and maintained/updated as 
> data changes. The indexing process only indexes the content in each asset 
> collection.
>
> It may be a big leap to allow editing only on local content in future - 
> but perhaps we can start with the search problem
>
> So what's the way forward - is there a piece of Java code we need to 
> rewrite here? 
>
>
> I will leave this question for Holger. 
>
> Content outside of EDG repository requires import closure, loading and all 
> operations to always be resolved in memory. Our strategic direction is to 
> minimize such cases with the objective to eventually have all data stored 
> in EDG’s repository. This aligns better with a number of goals such as 
> cloud enablement.
>
> Further, one needs to make assumptions about the lifecycle of this 
> external data (when does it change, etc.). If you have specific use cases 
> for your application and specific system architecture in mind, you can make 
> such assumptions, but we don’t. Typically, we expect customers to load data 
> into EDG repository as an asset collection, marking it “read only”. If the 
> data is largely static, why is it a problem to load it into EDG? A system 
> that creates it can create an EDG asset collection as opposed to making it 
> available as a file. If the issue is size, I don’t believe that having it 
> outside of EDG helps in dealing with size.
>
>
> Part 2:
>>
>> I've run through the documentation I can find for customizing Lucene, e.g. 
>> textindex.ui.ttl, but it doesn't give many clues about configuring all the 
>> functionality that exists in Lucene or in EDG.
>>
>> E.g. on the EDG search configuration screen (see below) the selected 
>> classes, search facets and properties are listed. Where are all these 
>> configured? 
>>
>> How can the many other optimization aspects of Lucene be configured?
>>
>>
>> Did you look at this 
>> https://doc.topquadrant.com/6.3/developer-guide/#Search_the_EDG_Customizations
>> ?
>>
>> These customizations are about selecting facets and configuring how the 
>> results page will look.
>>
>> By default, EDG will auto calculate the facets to be shown on the results 
>> page - using the “most populated” properties. But you can customize this.
>>
>>
>>
>> Current Search Configuration
>>
>> Search is currently configured, by administrator, to find items in the 
>> asset collections:
>> Selected Collections
>>
>>- Data Graphs:  Wiring Rules Data Graph
>> 

Re: [topbraid-users] Search the EDG in localhost 6.3.2 - using Lucene on local machine or does it require TBL server ? and how may Lucene be configured ?

2020-05-25 Thread Rob Atkinson

>
>
> Yes, it runs on local host.
>
> I can’t reproduce your issues.
>
> One possibility is that you have no data in the asset collections you set 
> to be indexed. From Rob’s e-mails, I know that he uses files and asset 
> collections in EDG are simply “wrappers” for these files. If you are 
> following the same pattern, you will get no search results. Search the EDG 
> indexing will index only the content that you actually have in the asset 
> collection, not content included by reference.
>
>
We definitely need to be able to configure search to handle customised 
graph closures - and creating wrappers with appropriate imports is one 
obvious way to do this, as well as being the best way to handle large 
static data streams generated by other processes. 
 
I have wondered whether, like teamwork:imports, there could be a property a 
graph could have to indicate A-box content, and hence inclusion in the 
default closure for search and display. It may be a big leap to allow 
editing only on local content in future - but perhaps we can start with the 
search problem.

So what's the way forward - is there a piece of Java code we need to rewrite 
here? 

Part 2:
>
> I've run through the documentation I can find for customizing Lucene, e.g. 
> textindex.ui.ttl, but it doesn't give many clues about configuring all the 
> functionality that exists in Lucene or in EDG.
>
> E.g. on the EDG search configuration screen (see below) the selected 
> classes, search facets and properties are listed. Where are all these 
> configured? 
>
> How can the many other optimization aspects of Lucene be configured?
>
>
> Did you look at this 
> https://doc.topquadrant.com/6.3/developer-guide/#Search_the_EDG_Customizations
> ?
>
> These customizations are about selecting facets and configuring how the 
> results page will look.
>
> By default, EDG will auto calculate the facets to be shown on the results 
> page - using the “most populated” properties. But you can customize this.
>
>
>
> Current Search Configuration
>
> Search is currently configured, by administrator, to find items in the 
> asset collections:
> Selected Collections
>
> - Data Graphs:  Wiring Rules Data Graph
>   (urn:x-evn-master:rules_data_graph)
> - Ontologies:  wiring rules data
>   (urn:x-evn-master:rules_data)
>
>
> Regards
>
> Simon 
>
> Selected Classes
>
>
> Selected Search Properties
>
>
> Selected Search Facets
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "TopBraid Suite Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to topbrai...@googlegroups.com .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/topbraid-users/a993e039-8567-4d61-97ca-fb46df1f1284%40googlegroups.com
>  
> 
> .
>
>
>



Re: [topbraid-users] Search the EDG in localhost 6.3.2 - using Lucene on local machine or does it require TBL server ? and how may Lucene be configured ?

2020-05-25 Thread Simon Opper
Thanks very much for the reply Irene.

Could you please fix the broken image links on the documentation page you
linked? The guidance on customising the facets is not visible, and I
believe this is the info I need.

[image: image.png]

Re: data wrapping, that does not seem to be the issue per se, as I was able
to get things working with the data assets we have. But it appears that
something else we are doing with automated bulk loading / scanning for
ontology manifest changes is breaking the indexing or the triggering of
Search the EDG.

Many thanks

S





On Mon, May 25, 2020 at 4:35 PM Irene Polikoff wrote:

> Simon,
>
> Please see below
>
> On May 25, 2020, at 1:02 AM, Simon Opper <
> simon.op...@surroundaustralia.com> wrote:
>
> Hi there
>
> We need to optimise Lucene in a text search App and I want to run some
> tests and explore the configuration options for Lucene on a text corpus,
> datagraph and/or ontology.
>
> Part 1 of my question is:
>
> When using a local instance of TBCME and EDG on 6.3.2, the Search the EDG box
> on the main home page of EDG is not responsive. Nothing is triggered in
> the client after the click action, as observed via Chrome dev tools when
> clicking the search button.
>
> Does Search the EDG require a server or can it be run on localhost ?
>
>
> Yes, it runs on local host.
>
> I can’t reproduce your issues.
>
> One possibility is that you have no data in the asset collections you set
> to be indexed. From Rob’s e-mails, I know that he uses files and asset
> collections in EDG are simply “wrappers” for these files. If you are
> following the same pattern, you will get no search results. Search the EDG
> indexing will index only the content that you actually have in the asset
> collection, not content included by reference.
>
>
> Part 2:
>
> I've run through the documentation I can find for customizing Lucene, e.g.
> textindex.ui.ttl, but it doesn't give many clues about configuring all the
> functionality that exists in Lucene or in EDG.
>
> E.g. on the EDG search configuration screen (see below) the selected
> classes, search facets and properties are listed. Where are all these
> configured?
>
> How can the many other optimization aspects of Lucene be configured?
>
>
> Did you look at this
> https://doc.topquadrant.com/6.3/developer-guide/#Search_the_EDG_Customizations
> ?
>
> These customizations are about selecting facets and configuring how the
> results page will look.
>
> By default, EDG will auto calculate the facets to be shown on the results
> page - using the “most populated” properties. But you can customize this.
>
>
>
> Current Search Configuration
>
> Search is currently configured, by administrator, to find items in the
> asset collections:
> Selected Collections
>
> - Data Graphs:  Wiring Rules Data Graph
>   (urn:x-evn-master:rules_data_graph)
> - Ontologies:  wiring rules data
>   (urn:x-evn-master:rules_data)
>
>
> Regards
>
> Simon
>
> Selected Classes
>
>
> Selected Search Properties
>
>
> Selected Search Facets
>



Re: [topbraid-users] Search the EDG in localhost 6.3.2 - using Lucene on local machine or does it require TBL server ? and how may Lucene be configured ?

2020-05-25 Thread Irene Polikoff
Simon,

Please see below

> On May 25, 2020, at 1:02 AM, Simon Opper wrote:
> 
> Hi there
> 
> We need to optimise Lucene in a text search App and I want to run some tests 
> and explore the configuration options for Lucene on a text corpus, datagraph 
> and/or ontology.
> 
> Part 1 of my question is:
> 
> When using a local instance of TBCME and EDG on 6.3.2, the Search the EDG box 
> on the main home page of EDG is not responsive. Nothing is triggered in the 
> client after the click action, as observed via Chrome dev tools when 
> clicking the search button.
> 
> Does Search the EDG require a server or can it be run on localhost ?

Yes, it runs on local host.

I can’t reproduce your issues.

One possibility is that you have no data in the asset collections you set to be 
indexed. From Rob’s e-mails, I know that he uses files, and the asset collections 
in EDG are simply “wrappers” for these files. If you are following the same 
pattern, you will get no search results. Search the EDG indexing will index 
only the content that you actually have in the asset collection, not content 
included by reference.
> 
> Part 2:
> 
> I've run through the documentation I can find for customizing Lucene, e.g. 
> textindex.ui.ttl, but it doesn't give many clues about configuring all the 
> functionality that exists in Lucene or in EDG.
> 
> E.g. on the EDG search configuration screen (see below) the selected classes, 
> search facets and properties are listed. Where are all these configured? 
> 
> How can the many other optimization aspects of Lucene be configured?

Did you look at this 
https://doc.topquadrant.com/6.3/developer-guide/#Search_the_EDG_Customizations 
?

These customizations are about selecting facets and configuring how the results 
page will look.

By default, EDG will auto calculate the facets to be shown on the results page 
- using the “most populated” properties. But you can customize this.
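
As a rough illustration of what "most populated" could mean, the sketch below ranks properties by how many triples use them. This is an assumption about the idea, not EDG's actual algorithm; the function name most_populated_properties and the sample triples are invented for the example.

```python
from collections import Counter


def most_populated_properties(triples, limit=3):
    """Rank predicates by how many triples use them and keep the top few."""
    counts = Counter(predicate for _subject, predicate, _obj in triples)
    return [prop for prop, _count in counts.most_common(limit)]


# Made-up sample data: three status values, two categories, one author.
triples = [
    ("ex:r1", "ex:status", "active"),
    ("ex:r2", "ex:status", "draft"),
    ("ex:r3", "ex:status", "active"),
    ("ex:r1", "ex:category", "sockets"),
    ("ex:r2", "ex:category", "lighting"),
    ("ex:r1", "ex:author", "Simon"),
]

print(most_populated_properties(triples, limit=2))  # ['ex:status', 'ex:category']
```

A customized configuration would replace such a computed list with an explicit choice of facet properties.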

> 
> 
> Current Search Configuration
> 
> Search is currently configured, by administrator, to find items in the asset 
> collections:
> 
> Selected Collections
> 
> Data Graphs:  Wiring Rules Data Graph
>  (urn:x-evn-master:rules_data_graph)
> Ontologies:  wiring rules data
>  (urn:x-evn-master:rules_data)
> 
> Regards
> 
> Simon 
> 
> Selected Classes
> 
> Selected Search Properties
> 
> Selected Search Facets
> 
> 
