Re: indexing dovecot mailbox

2016-05-21 Thread Ahmet Arslan


Hi,

You might also be interested in the MailEntityProcessor of DataImportHandler.

https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-EntityProcessors





Re: indexing dovecot mailbox

2016-05-21 Thread Ahmet Arslan
Hi Meyer,

Not sure what "mailbox of dovecot" is, but SimplePostTool can recognize certain 
file types.
They (xml,json,...,log) are actually listed in the log msg in your email.
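
To illustrate the point about file endings, here is a minimal Python sketch of an ending-based filter. It is not SimplePostTool's actual code; the list of endings is copied from the "Entering auto mode" log line in the original post, and the function names are made up for this example.

```python
# File endings copied from the "Entering auto mode" log line in the original post.
AUTO_ENDINGS = set(
    "xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,"
    "ott,otp,ots,rtf,htm,html,txt,log".split(",")
)


def ending_of(filename):
    """Return the text after the last dot, lower-cased ('' if there is no dot)."""
    _, dot, ending = filename.rpartition(".")
    return ending.lower() if dot else ""


def would_be_considered(filename):
    """Hypothetical stand-in for the auto-mode filter: match on file ending only."""
    return ending_of(filename) in AUTO_ENDINGS


for name in ("dovecot.index.log",
             "1461583672.Vfe03I1000f4M981621.bitmachine1:2,S"):
    print(name, "->", would_be_considered(name))
```

Under this assumption the two dovecot log files pass the filter, while the message files in cur do not, because their names end in something like "bitmachine1:2,S" rather than one of the listed endings.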

Can you describe the format of the files that you want to index?
Are they text files?

ahmet





indexing dovecot mailbox

2016-05-21 Thread Andreas Meyer
Hello!

Bear with me, I am new to solr and everything is very
complex. Don't know how the thing is working.

I installed solr-5.5.1.tgz and got it running. I tried to
index a dovecot mailbox with

# bin/post -c myfiles /home/a.meyer/Postfach

after I copied solr-schema.xml to /opt/solr/server/solr/myfiles/conf
as schema.xml, but no files other than dovecot.index.log and dovecot.mailbox.log
are indexed.

# bin/post -c myfiles /home/a.meyer/Postfach
/usr/lib64/jvm/jre/bin/java -classpath /opt/solr/dist/solr-core-5.5.1.jar 
-Dauto=yes -Dc=myfiles -Ddata=files -Drecursive=yes 
org.apache.solr.util.SimplePostTool /home/a.meyer/Postfach
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/myfiles/update...
Entering auto mode. File endings considered are 
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
Entering recursive mode, max depth=999, delay=0s
Indexing directory /home/a.meyer/Postfach (2 files, depth=0)
POSTing file dovecot.index.log (text/plain) to [base]/extract
POSTing file dovecot.mailbox.log (text/plain) to [base]/extract
Indexing directory /home/a.meyer/Postfach/cur (0 files, depth=1)
Indexing directory /home/a.meyer/Postfach/new (0 files, depth=1)
Indexing directory /home/a.meyer/Postfach/tmp (0 files, depth=1)
2 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/myfiles/update...
Time spent: 0:00:02.976

I was hoping the post command would index the email in 
/home/a.meyer/Postfach/cur,
but it doesn't. The content of this folder looks like this:

-rw--- 1 a.meyer users   4764 25. Apr 13:27 1461583672.Vfe03I1000f4M981621.bitmachine1:2,S
-rw--- 1 a.meyer users 276318 26. Apr 17:48 1461685694.Vfe03I1000f6M202284.bitmachine1:2,S
-rw--- 1 a.meyer users   4578 27. Apr 17:16 1461770179.Vfe03I10010aM756286.bitmachine1:2,S
-rw--- 1 a.meyer users  16981  3. Mai 10:12 1462263159.Vfe03I1000c5M88.bitmachine1:2,RS

What did I miss? I could use some help with this one.

Kind regards

  Andreas
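
One possible workaround, sketched here under assumptions rather than taken from the thread: the Postfach directory looks like a Maildir (cur/new/tmp), and Python's standard mailbox module can read that layout. The field names id, subject, from and content are hypothetical and would have to match the core's schema.

```python
import json
import mailbox


def maildir_to_docs(path):
    """Read a Maildir folder (cur/new/tmp layout) and build one dict per message."""
    docs = []
    box = mailbox.Maildir(path, create=False)
    for key, msg in box.iteritems():
        body = ""
        # Take the first text/plain part as the body; other parts are ignored.
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                payload = part.get_payload(decode=True) or b""
                charset = part.get_content_charset() or "utf-8"
                body = payload.decode(charset, errors="replace")
                break
        docs.append({
            "id": key,                          # Maildir key, assumed unique
            "subject": msg.get("Subject", ""),  # hypothetical field names
            "from": msg.get("From", ""),
            "content": body,
        })
    return docs


def docs_as_json(path):
    """Serialize the messages as a JSON array of documents."""
    return json.dumps(maildir_to_docs(path))
```

The JSON array could then be sent with Content-Type application/json to the core's /update handler, for example http://localhost:8983/solr/myfiles/update from the log above.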


Re: Parallel SQL doesn't support >, >=, <, <= syntax?

2016-05-21 Thread Joel Bernstein
Also agreed we should throw an exception in this scenario until it's
implemented.

Joel Bernstein
http://joelsolr.blogspot.com/

On Sat, May 21, 2016 at 5:34 PM, Joel Bernstein  wrote:

> Yes, currently only Solr range syntax is supported. The SQL greater, less
> than syntax isn't yet supported.
>
> This isn't explicitly stated in the docs, which would be a good first step.
>
> Supporting SQL greater and less than predicates should not be too
> difficult. Feel free to create a jira ticket for this.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sat, May 21, 2016 at 10:55 AM, Timothy Potter 
> wrote:
>
>> this gives expected result:
>>
>>  SELECT title_s, COUNT(*) as cnt
>> FROM movielens
>>  WHERE genre_ss='action' AND rating_i='[4 TO 5]'
>> GROUP BY title_s
>> ORDER BY cnt desc
>>  LIMIT 5
>>
>> but using >= 4 doesn't give same results (my ratings are 1-5):
>>
>>   SELECT title_s, COUNT(*) as cnt
>>  FROM movielens
>>   WHERE genre_ss='action' AND rating_i >= 4
>> GROUP BY title_s
>> ORDER BY cnt desc
>>   LIMIT 5
>>
>> on the Solr side, I see queries formulated as:
>>
>> 2016-05-21 14:53:43.096 INFO  (qtp1435804085-1419) [c:movielens
>> s:shard1 r:core_node1 x:movielens_shard1_replica1] o.a.s.c.S.Request
>> [movielens_shard1_replica1]  webapp=/solr path=/export
>>
>> params={q=((genre_ss:"action")+AND+(rating_i:"4"))=false=title_s=title_s+desc=json=2.2}
>> hits=2044 status=0 QTime=0
>>
>> which is obviously wrong ... known issue or should I open a JIRA?
>>
>> In general, rather than crafting an incorrect query that gives the
>> wrong results, we should throw an exception stating that the syntax is
>> not supported.
>>
>
>


Re: Parallel SQL doesn't support >, >=, <, <= syntax?

2016-05-21 Thread Joel Bernstein
Yes, currently only Solr range syntax is supported. The SQL greater, less
than syntax isn't yet supported.

This isn't explicitly stated in the docs, which would be a good first step.

Supporting SQL greater and less than predicates should not be too
difficult. Feel free to create a jira ticket for this.
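
Until the comparison operators are supported, a client can rewrite them into the range syntax that already works. A minimal Python sketch, assuming an open bound is written with * and an exclusive bound with a curly brace; the helper names are hypothetical and not part of Solr:

```python
def to_solr_range(op, value):
    """Translate a SQL comparison operator and value into Solr range syntax."""
    if op == ">=":
        return f"[{value} TO *]"
    if op == ">":
        return "{" + f"{value} TO *]"
    if op == "<=":
        return f"[* TO {value}]"
    if op == "<":
        return f"[* TO {value}" + "}"
    raise ValueError(f"unsupported operator: {op!r}")


def range_predicate(field, op, value):
    """Build a predicate in the quoted-range form used in the working query."""
    return f"{field}='{to_solr_range(op, value)}'"


print(range_predicate("rating_i", ">=", 4))  # rating_i='[4 TO *]'
```

The output follows the same shape as the rating_i='[4 TO 5]' predicate in the query that returned the expected result.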


Joel Bernstein
http://joelsolr.blogspot.com/

On Sat, May 21, 2016 at 10:55 AM, Timothy Potter 
wrote:

> this gives expected result:
>
>  SELECT title_s, COUNT(*) as cnt
> FROM movielens
>  WHERE genre_ss='action' AND rating_i='[4 TO 5]'
> GROUP BY title_s
> ORDER BY cnt desc
>  LIMIT 5
>
> but using >= 4 doesn't give same results (my ratings are 1-5):
>
>   SELECT title_s, COUNT(*) as cnt
>  FROM movielens
>   WHERE genre_ss='action' AND rating_i >= 4
> GROUP BY title_s
> ORDER BY cnt desc
>   LIMIT 5
>
> on the Solr side, I see queries formulated as:
>
> 2016-05-21 14:53:43.096 INFO  (qtp1435804085-1419) [c:movielens
> s:shard1 r:core_node1 x:movielens_shard1_replica1] o.a.s.c.S.Request
> [movielens_shard1_replica1]  webapp=/solr path=/export
>
> params={q=((genre_ss:"action")+AND+(rating_i:"4"))=false=title_s=title_s+desc=json=2.2}
> hits=2044 status=0 QTime=0
>
> which is obviously wrong ... known issue or should I open a JIRA?
>
> In general, rather than crafting an incorrect query that gives the
> wrong results, we should throw an exception stating that the syntax is
> not supported.
>


Re: Information for solr-user@lucene.apache.org

2016-05-21 Thread Shawn Heisey
On 5/21/2016 7:09 AM, Carl Roberts wrote:
> And, these responses are just weird.  Do they mean this user list is
> obsolete?  Is Solr no longer supported via a user list where we can
> ask questions?

You received one other reply, but that reply was sent to the list, and I
do not know if you are subscribed or not.  I am sending this directly to
you as well as to the list.

Your message was sent successfully, as evidenced by the fact that I am
replying to it.

This is a very active list.  So far I have received 798 messages in May
2016 alone.  There are a lot of subscribers, but I do not know what the
exact number is.

The message I am replying to here *did* go to the list, but I do not
know how to tell whether it was sent from a subscribed address, or
manually released to the list by a moderator.  Reading the response you
got from ezmlm that you forwarded, it sounds like you tried to send your
previous message to both of these addresses, which will not get it to
the list:

solr-user-i...@lucene.apache.org
solr-user-ow...@lucene.apache.org

I'm not sure how you inferred that the list is obsolete from the
response you included.  I don't see anything in that response to
indicate that it's dead.

To make it to the list without human intervention, your message must be
sent from a subscribed address to solr-user@lucene.apache.org.  All the
details you need to get subscribed to Solr mailing lists are here:

http://lucene.apache.org/solr/resources.html#mailing-lists

Thanks,
Shawn



Parallel SQL doesn't support >, >=, <, <= syntax?

2016-05-21 Thread Timothy Potter
this gives expected result:

  SELECT title_s, COUNT(*) as cnt
  FROM movielens
  WHERE genre_ss='action' AND rating_i='[4 TO 5]'
  GROUP BY title_s
  ORDER BY cnt desc
  LIMIT 5

but using >= 4 doesn't give same results (my ratings are 1-5):

  SELECT title_s, COUNT(*) as cnt
  FROM movielens
  WHERE genre_ss='action' AND rating_i >= 4
  GROUP BY title_s
  ORDER BY cnt desc
  LIMIT 5

on the Solr side, I see queries formulated as:

2016-05-21 14:53:43.096 INFO  (qtp1435804085-1419) [c:movielens
s:shard1 r:core_node1 x:movielens_shard1_replica1] o.a.s.c.S.Request
[movielens_shard1_replica1]  webapp=/solr path=/export
params={q=((genre_ss:"action")+AND+(rating_i:"4"))=false=title_s=title_s+desc=json=2.2}
hits=2044 status=0 QTime=0

which is obviously wrong ... known issue or should I open a JIRA?

In general, rather than crafting an incorrect query that gives the
wrong results, we should throw an exception stating that the syntax is
not supported.


Re: Information for solr-user@lucene.apache.org

2016-05-21 Thread Andrea Gazzarini
Hi Carl,
This address is valid; every subscribed user received a copy of your email.

solr-user@lucene.apache.org

Andrea
On 21 May 2016 15:10, "Carl Roberts"  wrote:

> And, these responses are just weird.  Do they mean this user list is
> obsolete?  Is Solr no longer supported via a user list where we can ask
> questions?
>
> On 5/21/16 9:08 AM, solr-user-h...@lucene.apache.org wrote:
>
>> Hi! This is the ezmlm program. I'm managing the
>> solr-user@lucene.apache.org mailing list.
>>
>> I'm working for my owner, who can be reached
>> at solr-user-ow...@lucene.apache.org.
>>
>> No information has been provided for this list.
>>
>> --- Administrative commands for the solr-user list ---
>>
>> I can handle administrative requests automatically. Please
>> do not send them to the list address! Instead, send
>> your message to the correct command address:
>>
>> To subscribe to the list, send a message to:
>> 
>>
>> To remove your address from the list, send a message to:
>> 
>>
>> Send mail to the following for info and FAQ for this list:
>> 
>> 
>>
>> Similar addresses exist for the digest list:
>> 
>> 
>>
>> To get messages 123 through 145 (a maximum of 100 per request), mail:
>> 
>>
>> To get an index with subject and author for messages 123-456 , mail:
>> 
>>
>> They are always returned as sets of 100, max 2000 per request,
>> so you'll actually get 100-499.
>>
>> To receive all messages with the same subject as message 12345,
>> send a short message to:
>> 
>>
>> The messages should contain one line or word of text to avoid being
>> treated as sp@m, but I will ignore their content.
>> Only the ADDRESS you send to is important.
>>
>> You can start a subscription for an alternate address,
>> for example "john@host.domain", just add a hyphen and your
>> address (with '=' instead of '@') after the command word:
>> 

Re: Information for solr-user@lucene.apache.org

2016-05-21 Thread Carl Roberts
Let's try this one (solr-user-digest-subscr...@lucene.apache.org) - 
maybe a real person will answer there.



Re: Information for solr-user@lucene.apache.org

2016-05-21 Thread Carl Roberts
And, these responses are just weird.  Do they mean this user list is
obsolete?  Is Solr no longer supported via a user list where we can ask
questions?


On 5/21/16 9:08 AM, solr-user-h...@lucene.apache.org wrote:

Hi! This is the ezmlm program. I'm managing the
solr-user@lucene.apache.org mailing list.

I'm working for my owner, who can be reached
at solr-user-ow...@lucene.apache.org.

No information has been provided for this list.

--- Administrative commands for the solr-user list ---

I can handle administrative requests automatically. Please
do not send them to the list address! Instead, send
your message to the correct command address:

To subscribe to the list, send a message to:


To remove your address from the list, send a message to:


Send mail to the following for info and FAQ for this list:



Similar addresses exist for the digest list:



To get messages 123 through 145 (a maximum of 100 per request), mail:


To get an index with subject and author for messages 123-456 , mail:


They are always returned as sets of 100, max 2000 per request,
so you'll actually get 100-499.

To receive all messages with the same subject as message 12345,
send a short message to:


The messages should contain one line or word of text to avoid being
treated as sp@m, but I will ignore their content.
Only the ADDRESS you send to is important.

You can start a subscription for an alternate address,
for example "john@host.domain", just add a hyphen and your
address (with '=' instead of '@') after the command word:

Re: Solr join between documents

2016-05-21 Thread elisabeth benoit
Ok, thanks for your answer! That's what I thought but just wanted to be
sure.

Best regards,
Elisabeth

2016-05-21 2:02 GMT+02:00 Erick Erickson :

> Gosh, I'm not even sure how to start to form such a query.
>
> Let's see, you have StreetB in some city identified by postal code P.
>
> Is what you're wanting "return me all pairs of documents within that
> postal code that have all the terms matching and the polygons enclosing
> those streets plus some distance intersect"?
>
> Seems difficult.
>
> Best,
> Erick
>
> On Thu, May 19, 2016 at 8:35 AM, elisabeth benoit
>  wrote:
> > Hello all,
> >
> > I was wondering if there was a solr solution for a problem I have
> > (and I'm not the only one I guess)
> >
> > We use solr as a search engine for addresses. We sometimes have requests
> > with let's say for instance
> >
> > street A close to street B City postcode
> >
> > I was wondering if some kind of join between two documents is possible in
> > solr?
> >
> > The query would be: find union of two documents matching all words
> > in query.
> >
> > Those documents have a latitude and a longitude, and we would fix a max
> > distance between two documents to be eligible for a join.
> >
> > Is there a way to do this?
> >
> > Best regards,
> > Elisabeth
>
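
If the pairing has to happen on the client, one rough approach is to run one query per street and keep only the pairs that lie close enough together. A minimal Python sketch, assuming each result is a dict with hypothetical lat and lon fields in degrees and that great-circle (haversine) distance is an acceptable measure:

```python
import math
from itertools import product

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pairs_within(streets_a, streets_b, max_km):
    """Return all (a, b) pairs whose coordinates are at most max_km apart."""
    return [
        (a, b)
        for a, b in product(streets_a, streets_b)
        if haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km
    ]
```

This compares every result for street A with every result for street B, so it only makes sense when both result lists are already narrowed down, for example by city and postcode.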


How to enable JMX to monitor Jetty

2016-05-21 Thread Georg Sorst
Hi list,

how do I correctly enable JMX in Solr 6 so that I can monitor Jetty's
thread pool?

The first step is to set ENABLE_REMOTE_JMX_OPTS="true" in bin/solr.in.sh.
This will give me JMX access to JVM properties (garbage collection, class
loading etc.) and works fine. However, this will not give me any Jetty
specific properties.

I've tried manually adding jetty-jmx.xml from the jetty 9 distribution to
server/etc/ and then starting Solr with 'java ... start.jar
etc/jetty-jmx.xml'. This works fine and gives me access to the right
properties, but seems wrong. I could similarly copy the contents of
jetty-jmx.xml into jetty.xml but this is not much better either.

Is there a correct way to do this?

Thanks!
Georg


ngroup for MLT results

2016-05-21 Thread Zheng Lin Edwin Yeo
Hi,

Would like to check, is there a way to do ngrouping for MLT queries?
I tried the following query with the MoreLikeThisHandler, but I could not
get any ngroup results.

http://localhost:8983/solr/collection1/mlt?q=testing={ngroups:%22unique(signature)%22}=0

I can only get the ngroup results like this when I use the normal
SearchHandler

  "facets":{
    "count":15177209,
    "ngroups":9181621}}


Regards,
Edwin


Re: Sorting for MLT results

2016-05-21 Thread Zheng Lin Edwin Yeo
Thanks for your reply.

But it didn't work when I executed this query.
http://localhost:8983/solr/collection1/mlt?q=testing=creation_date
desc,id asc=10

This is my configuration for the handler.

  explicit 10
json true 
edismax id, score  content subject content 2 5 3 25 10 false details  

Regards,
Edwin


On 17 May 2016 at 22:35, Alessandro Benedetti  wrote:

> Using the more like this query parser should solve your problem!
> Just use that query parser and then sort as usual.
>
> Cheers
>
> On Wed, May 11, 2016 at 4:53 AM, Zheng Lin Edwin Yeo
> wrote:
>
> > Hi,
> >
> > Would like to check, is there a function to do the sorting for MLT
> > results in Solr? I understand that there is a sort function, but that
> > only works for the main query results. It does not do any sorting for
> > the MLT results
> >
> > I'm using Solr 5.4.0.
> >
> > Regards,
> > Edwin
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
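
A sketch of what the suggestion above could look like as a request, assuming the MLT query parser is invoked through the standard /select handler with a document id (doc-1 is a placeholder) and that content is the field to compare on. The sort string is the one from the query earlier in this thread; the helper name is hypothetical.

```python
from urllib.parse import urlencode


def mlt_sorted_url(base, doc_id, qf, sort, rows=10):
    """Build a /select URL that uses the {!mlt} query parser plus a normal sort."""
    params = {
        "q": f"{{!mlt qf={qf}}}{doc_id}",  # e.g. {!mlt qf=content}doc-1
        "sort": sort,
        "rows": rows,
    }
    return f"{base}/select?{urlencode(params)}"


print(mlt_sorted_url(
    "http://localhost:8983/solr/collection1",
    "doc-1",                      # placeholder document id
    "content",
    "creation_date desc,id asc",
))
```

Note that this form of the parser takes the id of a source document rather than free text such as q=testing.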