Re: indexing dovecot mailbox

2016-05-22 Thread Andreas Meyer
Hi!

I have no idea how to use the MailEntityProcessor or how to start it.

Greetings

  Andreas

Ahmet Arslan wrote on 22.05.16 at 00:52:59:

> Hi,
> 
> You might also be interested in the MailEntityProcessor of DataImportHandler.
> 
> https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-EntityProcessors
> 



Re: indexing dovecot mailbox

2016-05-22 Thread Ahmet Arslan
Hi Andreas,

Exactly, SimplePostTool does not recognize/support the file-ending.
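
To illustrate the check: the "File endings considered" list from the log can be compared against the Maildir file names. The Python sketch below only approximates that filter. The helper name has_known_ending is made up for this example and this is not SimplePostTool's actual code, but it shows why the two .log files pass while the messages in cur do not.

```python
import os

# File endings copied from the "Entering auto mode" log line in this thread.
AUTO_MODE_ENDINGS = (
    "xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,"
    "odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log"
).split(",")


def has_known_ending(filename, endings=AUTO_MODE_ENDINGS):
    """Return True if the text after the last dot is one of the listed endings.

    Hypothetical helper: an approximation of the filter, not SimplePostTool's code.
    """
    _, ext = os.path.splitext(filename)
    return ext.lstrip(".") in endings


for name in ("dovecot.index.log",
             "1461583672.Vfe03I1000f4M981621.bitmachine1:2,S"):
    print(name, has_known_ending(name))
```

For a Maildir message the "ending" is everything after the last dot (here "bitmachine1:2,S"), which is not in the list.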

If they are text files, you can change the file extension to .txt and the post 
tool will grab them.
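
A minimal sketch of that idea in Python, assuming you would rather copy the messages to a separate staging directory than rename files inside the live Maildir (dovecot keeps track of those names). The function name stage_as_txt and the staging path in the comment are made up for this example.

```python
import shutil
from pathlib import Path


def stage_as_txt(maildir_cur, staging_dir):
    """Copy every regular file in maildir_cur to staging_dir with ".txt" appended.

    The original Maildir files are left untouched.
    """
    src = Path(maildir_cur)
    dst = Path(staging_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for message in sorted(src.iterdir()):
        if message.is_file():
            target = dst / (message.name + ".txt")
            shutil.copyfile(message, target)
            copied.append(target)
    return copied


# Example call (hypothetical staging path):
#   stage_as_txt("/home/a.meyer/Postfach/cur", "/tmp/postfach-txt")
# followed by:
#   bin/post -c myfiles /tmp/postfach-txt
```

The copies end in .txt, so they should fall under the "File endings considered" list shown in the log. Note that headers and body are then indexed as one blob of plain text.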

If you have some code to read those files, you can use SolrJ to roll your own 
indexer:
https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
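
As a rough illustration of what a home-grown indexer could look like: SolrJ is a Java library, so the sketch below swaps in Python's standard mailbox module to read the Maildir and Solr's JSON update endpoint to send the documents. The field names (id, subject, from, body) are assumptions and would have to exist in your schema; the update URL is the one from the log in this thread.

```python
import json
import mailbox
import urllib.request


def maildir_to_docs(maildir_path):
    """Turn each message of a Maildir into a flat dict (field names are hypothetical)."""
    docs = []
    box = mailbox.Maildir(maildir_path, factory=None, create=False)
    for key, msg in box.iteritems():
        body_parts = []
        for part in msg.walk():
            # Only plain-text parts are collected; attachments are ignored.
            if part.get_content_type() == "text/plain":
                payload = part.get_payload(decode=True) or b""
                charset = part.get_content_charset() or "utf-8"
                try:
                    text = payload.decode(charset, errors="replace")
                except LookupError:  # unknown charset name in the message
                    text = payload.decode("utf-8", errors="replace")
                body_parts.append(text)
        docs.append({
            "id": key,
            "subject": str(msg.get("Subject", "")),
            "from": str(msg.get("From", "")),
            "body": "\n".join(body_parts),
        })
    return docs


def post_docs(docs, update_url="http://localhost:8983/solr/myfiles/update?commit=true"):
    """Send the documents to Solr as a JSON array and return the HTTP status."""
    request = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling post_docs(maildir_to_docs("/home/a.meyer/Postfach")) would then index the folder. Encoded header words (non-ASCII subjects) are not decoded in this sketch.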

Sorry, I am not familiar with this e-mail stuff; maybe Apache Tika can 
read/recognize these mail files.

Ahmet



On Sunday, May 22, 2016 1:14 PM, Andreas Meyer  wrote:
Hello!

The files I want to index are IMAP-folders of dovecot, Maildir.

bitmachine1:/home/a.meyer/Postfach/cur # file 1461583672.Vfe03I1000f4M981621.bitmachine1:2,S
1461583672.Vfe03I1000f4M981621.bitmachine1:2,S: SMTP mail, ASCII text

I can read them with the Midnight Commander. Does it have something to do
with the file ending not being recognized?

Andreas




Re: indexing dovecot mailbox

2016-05-22 Thread Andreas Meyer
Hello!

The files I want to index are IMAP-folders of dovecot, Maildir.

bitmachine1:/home/a.meyer/Postfach/cur # file 1461583672.Vfe03I1000f4M981621.bitmachine1:2,S
1461583672.Vfe03I1000f4M981621.bitmachine1:2,S: SMTP mail, ASCII text

I can read them with the Midnight Commander. Does it have something to do
with the file ending not being recognized?

 Andreas

Ahmet Arslan wrote on 22.05.16 at 00:46:32:

> Hi Meyer,
> 
> Not sure what "mailbox of dovecot" is, but SimplePostTool can recognize 
> certain file types.
> They (xml,json,...,log) are actually listed in the log msg in your email.
> 
> Can you describe the format of the files that you want to index?
> Are they text files?
> 
> ahmet



Re: indexing dovecot mailbox

2016-05-21 Thread Ahmet Arslan


Hi,

You might also be interested in the MailEntityProcessor of DataImportHandler.

https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-EntityProcessors
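
On how to start it: DataImportHandler runs are triggered over HTTP with a command parameter such as full-import or status. The Python sketch below only builds such a URL. It assumes the handler has been registered in solrconfig.xml under the path /dataimport (a common choice, but an assumption here) and uses the core URL from the log in this thread. Note also that MailEntityProcessor, as far as I know, fetches mail from an IMAP server rather than reading Maildir files from disk, so dovecot would have to be reachable over IMAP.

```python
from urllib.parse import urlencode


def dataimport_url(core_url, command="full-import", handler="/dataimport"):
    """Build the URL that triggers a DataImportHandler command.

    The handler path is an assumption; it must match solrconfig.xml.
    """
    return core_url.rstrip("/") + handler + "?" + urlencode({"command": command})


print(dataimport_url("http://localhost:8983/solr/myfiles"))
print(dataimport_url("http://localhost:8983/solr/myfiles", command="status"))
```

Opening the first URL in a browser or with curl starts the import; the second reports its progress.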



On Sunday, May 22, 2016 3:46 AM, Ahmet Arslan  wrote:
Hi Meyer,

Not sure what "mailbox of dovecot" is, but SimplePostTool can recognize certain 
file types.
They (xml,json,...,log) are actually listed in the log msg in your email.

Can you describe the format of the files that you want to index?
Are they text files?

ahmet






Re: indexing dovecot mailbox

2016-05-21 Thread Ahmet Arslan
Hi Meyer,

Not sure what "mailbox of dovecot" is, but SimplePostTool can recognize certain 
file types.
They (xml,json,...,log) are actually listed in the log msg in your email.

Can you describe the format of the files that you want to index?
Are they text files?

ahmet





indexing dovecot mailbox

2016-05-21 Thread Andreas Meyer
Hello!

Bear with me, I am new to Solr and everything is very
complex. I don't know how the thing works.

I installed solr-5.5.1.tgz and got it running. I tried to
index a dovecot mailbox with

# bin/post -c myfiles /home/a.meyer/Postfach

after I copied solr-schema.xml to /opt/solr/server/solr/myfiles/conf
as schema.xml, but no files other than dovecot.index.log and dovecot.mailbox.log
are indexed.

# bin/post -c myfiles /home/a.meyer/Postfach
/usr/lib64/jvm/jre/bin/java -classpath /opt/solr/dist/solr-core-5.5.1.jar -Dauto=yes -Dc=myfiles -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool /home/a.meyer/Postfach
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/myfiles/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, max depth=999, delay=0s
Indexing directory /home/a.meyer/Postfach (2 files, depth=0)
POSTing file dovecot.index.log (text/plain) to [base]/extract
POSTing file dovecot.mailbox.log (text/plain) to [base]/extract
Indexing directory /home/a.meyer/Postfach/cur (0 files, depth=1)
Indexing directory /home/a.meyer/Postfach/new (0 files, depth=1)
Indexing directory /home/a.meyer/Postfach/tmp (0 files, depth=1)
2 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/myfiles/update...
Time spent: 0:00:02.976

I was hoping the post command would index the email in /home/a.meyer/Postfach/cur,
but it doesn't. The content of this folder looks like this:

-rw--- 1 a.meyer users   4764 25. Apr 13:27 1461583672.Vfe03I1000f4M981621.bitmachine1:2,S
-rw--- 1 a.meyer users 276318 26. Apr 17:48 1461685694.Vfe03I1000f6M202284.bitmachine1:2,S
-rw--- 1 a.meyer users   4578 27. Apr 17:16 1461770179.Vfe03I10010aM756286.bitmachine1:2,S
-rw--- 1 a.meyer users  16981  3. Mai 10:12 1462263159.Vfe03I1000c5M88.bitmachine1:2,RS

What did I miss? I could use some help with this one.

Kind regards

  Andreas