Re: Indexing file with security problem

2013-07-04 Thread Sanne Grinovero
To be honest I am not familiar with ManifoldCF, so I won't say if
Hibernate Search is better or not, but it would definitely not be too
hard with Hibernate Search:

1) You annotate with @Indexed the entity referring to your PostgreSQL
table containing the metadata; with @TikaBridge you point it to the
external resource containing the document.

Returning database ids is the default behaviour.

http://docs.jboss.org/hibernate/search/4.3/reference/en-US/html_single/#d0e4244
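
As a rough illustration of point 1 (this is plain Java, not Hibernate Search or Lucene code), the idea of indexing the database metadata together with the extracted file text, while returning only database ids, could look like the sketch below. The class name, the tiny in-memory index and the sample texts are all made up for the example; in a real setup the text extraction and the index itself would be handled by Tika and Lucene.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java illustration only; this is not the Hibernate Search or Lucene API.
class MetadataIndexSketch {

    // token -> database ids of the files that contain that token
    private final Map<String, Set<Long>> postings = new HashMap<>();

    // Index one file: the name comes from the metadata row, the text from the
    // document body (assumed to be already extracted, e.g. by a parser).
    void index(long dbId, String fileName, String extractedText) {
        for (String token : tokenize(fileName + " " + extractedText)) {
            Set<Long> ids = postings.get(token);
            if (ids == null) {
                ids = new TreeSet<>();
                postings.put(token, ids);
            }
            ids.add(dbId);
        }
    }

    // Return only the database ids of matching files, never the content.
    List<Long> search(String term) {
        Set<Long> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<Long>() : new ArrayList<>(ids);
    }

    // Lower-case the text and split it on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        MetadataIndexSketch index = new MetadataIndexSketch();
        index.index(1L, "test.pdf", "Lucene in action");
        index.index(2L, "mybook.odt", "Notes about Solr and Lucene");
        System.out.println(index.search("lucene"));
        System.out.println(index.search("solr"));
    }
}
```

The caller then loads whatever else it needs from PostgreSQL using the returned ids.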

2) This is a bit more complex, but I don't think it is any more complex
than it would be with other technologies: you should encode some
security information in the index, then define a parametric filter on that.

http://docs.jboss.org/hibernate/search/4.3/reference/en-US/html_single/#query-filter
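
To make point 2 a little more concrete, here is a minimal plain-Java sketch of the idea (again not Hibernate Search or Lucene code). It assumes that the static part of the security information can be stored with each document as a set of access tokens, and that the dynamic part (profiles, roles, departments) is resolved into the user's own tokens at query time and passed in as the filter parameter. The token values such as "dept:sales", the class name and the sample texts are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; this is not the Hibernate Search or Lucene API.
class AclFilterSketch {

    // One indexed document: database id, searchable text, and the access
    // tokens (hypothetical values such as "dept:sales") stored alongside it.
    static final class Doc {
        final long dbId;
        final String text;
        final Set<String> allowedTokens;

        Doc(long dbId, String text, String... allowedTokens) {
            this.dbId = dbId;
            this.text = text.toLowerCase(Locale.ROOT);
            this.allowedTokens = new HashSet<>(Arrays.asList(allowedTokens));
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    void add(Doc doc) {
        docs.add(doc);
    }

    // The user's tokens act as the filter parameter: a document is considered
    // only if it shares at least one access token with the user.
    List<Long> search(String term, Set<String> userTokens) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<Long> ids = new ArrayList<>();
        for (Doc doc : docs) {
            boolean visible = !Collections.disjoint(doc.allowedTokens, userTokens);
            if (visible && doc.text.contains(needle)) {
                ids.add(doc.dbId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        AclFilterSketch index = new AclFilterSketch();
        index.add(new Doc(1L, "Lucene and Solr", "dept:sales"));            // test.pdf
        index.add(new Doc(2L, "Solr tips", "dept:sales", "dept:it"));       // test2.doc
        index.add(new Doc(3L, "Lucene internals", "dept:it"));              // broken.pdf

        Set<String> maggie = new HashSet<>(Arrays.asList("dept:sales"));
        Set<String> stev = new HashSet<>(Arrays.asList("dept:it"));
        System.out.println(index.search("lucene", maggie));
        System.out.println(index.search("lucene", stev));
    }
}
```

The point of the design is that the restriction is part of the search itself, so documents the user may not see never reach the result list and there is no separate pass over the hits afterwards.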

3) Not sure, sorry. But the automatic indexing triggers happen as soon
as you store the metadata, so maybe that is good enough?

Looks interesting!

Sanne - Hibernate Search team


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Indexing file with security problem

2013-06-26 Thread lukasw
Hello

I'll try to briefly describe my problem and task.
My name is Lukas and I am a Java developer. My task is to create a search
engine for different types of files (text file types only): pdf, word, odf,
xml, but not html.
I have a little experience with Lucene: about a year ago I wrote a simple
full-text search using Lucene and Hibernate Search. That was a simple project,
but now I have a very difficult search task.
We are using Java 1.7 and GlassFish 3, and I have to concentrate only on the
server side, not on the client UI. These are my three major problems:

1) All files are stored on a WebDAV server, but the information about each
file (name, id, file type, etc.) is stored in a database (PostgreSQL), so when
creating the index I need to use both sources. As the result of a query I only
need to return the file id from the database. In summary: the content of a
file is stored on the server but the information about the file is stored in
the database, so we must retrieve both.

2) The second problem is that each file has a level of secrecy. The major
problem is that this level is calculated dynamically. When calculating the
security level of a file we consider several properties. The static
property is the file's location (the folder the file is in), but there is also
dynamic information: user profiles, user roles and departments. So when
user Maggie is logged in she can search only the files test.pdf, test2.doc,
etc., but if user Stev is logged in he has different profiles than
Maggie, so he can only search for a phrase in the files broken.pdf,
mybook.odt, test2.doc, etc. I think that when, for example, a user searches
for the phrase "lucene +solr", we search all indexed documents and filter the
results afterwards. But I think that solution is not very efficient. What if
the result contains 100 files? Do we then filter the files one by one? But I
do not see any other solution. Maybe you can help me, and Lucene or Solr has
a mechanism for this.

3) The last problem is that some files are encrypted, so those files must be
indexed only once, before encryption! But I think that if we index secure
files we get a security issue, because every word from such a file is
tokenized.
I have no idea how to secure the Lucene documents and the index data store.
Is it possible?


I also have a question: do I need to use Solr for my search engine, or
should I use only Lucene and write my own search engine? As you can see, my
problem is not with indexing or searching but with file security and file
security levels.

Thanks for any hints and for the time you spend on this.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-file-with-security-problem-tp4073394.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




Re: Indexing file with security problem

2013-06-26 Thread Otis Gospodnetic
Hi,

I would start from ManifoldCF - it may save you some work.

Otis
Solr & ElasticSearch Support
http://sematext.com/