Re: Indexing file with security problem

2013-07-04 Thread Sanne Grinovero
To be honest I am not familiar with ManifoldCF, so I won't say if
Hibernate Search is better or not, but it would definitely not be too
hard with Hibernate Search:

1) You annotate with @Indexed the entity referring to your PostgreSQL
table containing the metadata; with @TikaBridge you point it to the
external resource containing the document.

Returning database ids is the default behaviour.

http://docs.jboss.org/hibernate/search/4.3/reference/en-US/html_single/#d0e4244

2) This is a bit more complex, but I don't think it is any more complex
than it would be with other technologies: you should encode some
information in the index, then define a parametric filter on that.

http://docs.jboss.org/hibernate/search/4.3/reference/en-US/html_single/#query-filter
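
As a rough illustration of 2), here is a minimal plain-Java sketch of the
idea. It is not Hibernate Search code: it is an in-memory stand-in where each
index entry carries access tokens written at indexing time, and the search
takes the current user's tokens as a parameter. The class name, the field
names and the "dept:..." token format are assumptions made up for this
example. In Hibernate Search the corresponding pieces would be a filter
definition (@FullTextFilterDef) and a parameter set on the query through
enableFullTextFilter(...).setParameter(...), as described at the link above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SecurityFilterSketch {

    /** One index entry: database id, extracted text, and the access tokens stored with it. */
    static final class Entry {
        final long fileId;
        final String text;
        final Set<String> accessTokens; // hypothetical format, e.g. "dept:hr"

        Entry(long fileId, String text, String... accessTokens) {
            this.fileId = fileId;
            this.text = text.toLowerCase();
            this.accessTokens = new HashSet<String>(Arrays.asList(accessTokens));
        }
    }

    /** Returns the ids of entries that share a token with the user AND contain the term. */
    static List<Long> search(List<Entry> index, String term, Set<String> userTokens) {
        List<Long> ids = new ArrayList<Long>();
        String needle = term.toLowerCase();
        for (Entry e : index) {
            // The filter part: skip entries the user may not see at all.
            boolean visible = !Collections.disjoint(e.accessTokens, userTokens);
            if (visible && e.text.contains(needle)) {
                ids.add(e.fileId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Entry> index = new ArrayList<Entry>();
        index.add(new Entry(1L, "Lucene and Solr notes", "dept:hr"));
        index.add(new Entry(2L, "Solr tuning guide", "dept:it"));

        // The user's tokens are computed at query time from her profile and roles.
        Set<String> maggie = new HashSet<String>(Arrays.asList("dept:hr"));
        System.out.println(search(index, "solr", maggie));
    }
}
```

With this data only file 1 comes back for a user holding "dept:hr". The
dynamic part of the security level stays outside the index: only the user's
token set changes per query, so nothing has to be re-indexed when a profile
changes.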

3) Not sure, sorry. But the automatic indexing triggers happen as soon
as you store the metadata, so maybe that is good enough?
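
To make the ordering in 3) concrete, here is a small plain-Java sketch,
again not Hibernate Search code: the text is indexed once at save time and
only the encrypted bytes are kept. The two maps are hypothetical stand-ins
for the real index and the WebDAV store, and AES-GCM with a generated key is
just an assumption for the example. Note that the index still holds the
plaintext terms, which is exactly the concern raised in the question, so the
index directory would need its own protection.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class IndexBeforeEncryptSketch {
    // term -> ids of files containing it (stand-in for the real index)
    final Map<String, Set<Long>> index = new HashMap<String, Set<Long>>();
    // file id -> IV followed by ciphertext (stand-in for the file store)
    final Map<Long, byte[]> store = new HashMap<Long, byte[]>();
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    IndexBeforeEncryptSketch() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        key = generator.generateKey();
    }

    /** Indexes the plaintext once, then stores only the encrypted bytes. */
    void save(long fileId, String plaintext) throws Exception {
        // Step 1: tokenize and index while the text is still readable.
        for (String term : plaintext.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<Long> ids = index.get(term);
            if (ids == null) {
                ids = new HashSet<Long>();
                index.put(term, ids);
            }
            ids.add(fileId);
        }
        // Step 2: encrypt and keep only the ciphertext, prefixed by its IV.
        byte[] iv = new byte[12];
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] encrypted = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        byte[] record = new byte[iv.length + encrypted.length];
        System.arraycopy(iv, 0, record, 0, iv.length);
        System.arraycopy(encrypted, 0, record, iv.length, encrypted.length);
        store.put(fileId, record);
    }

    /** Returns the ids of files whose plaintext contained the term. */
    Set<Long> search(String term) {
        Set<Long> ids = index.get(term.toLowerCase());
        return ids == null ? new HashSet<Long>() : ids;
    }
}
```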

Looks interesting!

Sanne - Hibernate Search team



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Indexing file with security problem

2013-06-26 Thread Otis Gospodnetic
Hi,

I would start from ManifoldCF - it may save you some work.

Otis
Solr & ElasticSearch Support
http://sematext.com/


Indexing file with security problem

2013-06-26 Thread lukasw
Hello

I'll try to briefly describe my problem and task.
My name is Lukas and I am a Java developer. My task is to create a search
engine for different types of files (text file types only): pdf, word, odf,
xml, but not html.
I have a little experience with Lucene: about a year ago I wrote a simple
full-text search using Lucene and Hibernate Search. That was a simple
project, but now I have a very difficult search task.
We are using Java 1.7 and GlassFish 3, and I have to concentrate only on the
server side, not the client UI. These are my three major problems:

1) All files are stored on a WebDAV server, but the information about each
file (name, id, file type, etc.) is stored in a database (PostgreSQL), so
when creating the index I need to use both sources. As a query result I only
need to return the file id from the database. In summary: the file content
is stored on the server and the file metadata in the database, so we must
retrieve both.

2) The second problem is that each file has a level of secrecy, and the main
difficulty is that this level is calculated dynamically. When calculating
the security level for a file we consider several properties: static ones
such as the file's location (the folder the file is in), but also dynamic
information such as user profiles, user roles and departments. So when user
"Maggie" is logged in she can search only the files "test.pdf", "test2.doc",
etc., but when user "Stev" is logged in he has a different profile than
Maggie, so he can only search for a phrase in the files "broken.pdf",
"mybook.odt", "test2.doc", etc. I think that when, for example, a user
searches for the phrase "lucene +solr", we search all indexed documents and
filter the results afterwards. But I think that solution is not very
efficient. What if the result contains 100 files: do we then check the files
one by one? I do not see any other solution, though. Maybe you can help me;
perhaps Lucene or Solr has a mechanism for this.

3) The last problem is that some files are encrypted, so those files must be
indexed only once, before encryption! But I think that indexing secured
files creates a security issue, because every word from such a file is
tokenized.
I have no idea how to secure Lucene documents and the index datastore.
Is it possible?


I also have a question: should I use Solr for my search engine, or use only
Lucene and write my own search engine? As you can see, my problem is not
with indexing or searching but with file security and the files' security
levels.

Thanks for any hints and for the time you spend on this.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-file-with-security-problem-tp4073394.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.