Solr Sorting Caching

2012-09-11 Thread Amey Patil
Our Solr index (Solr 3.4) has over 100 million documents.
We frequently fire one type of query on this index to get documents, do
some processing, and dump them into another index.

Query is of the form -
*((keyword1 AND keyword2...) OR (keyword3 AND keyword4...) OR ...) AND
date:[date1 TO *]*
No. of keywords can be in the range of 100 - 1000.
We are adding sort parameter *'date asc'*.
The keyword part of the query changes very rarely but date part always
changes.

Now there are mainly 2 problems,
1) The query takes too much time.
2) Sometimes, when 'numFound' is very large for a query, it gives an OOM error
(I guess this is because of the sort).

We are not using any type of caching yet.
Will caching be helpful to solve these problems?
If yes, what type of cache or caching configuration is suitable to start
with?


Re: SolrCloud and grouping

2012-09-11 Thread Pavel Goncharik
Apparently https://issues.apache.org/jira/browse/SOLR-2592 will help you out.
Unfortunately, it seems that this issue will not be included in the Solr
4.0 release.

I'm wondering myself if there are any plans to commit and release this
issue or an equivalent, to give users control over partitioning in
SolrCloud?

On Tue, Sep 11, 2012 at 7:00 AM, Nikhil Chhaochharia
nikhil...@yahoo.com wrote:
 Hi,

 I am trying out SolrCloud using a recent Solr 4 nightly.  We use result 
 grouping (FieldCollapsing) and found that the value of ngroups returned by 
 Solr is not correct.

 My understanding is that all the documents belonging to the same group should 
 be on the same shard to ensure that ngroups returns the correct value.  
 However, the shard that a document is sent to is decided automatically based 
 on the value of the uniqueKey field.

 Is it possible for Solr to hash fieldX instead of the uniqueKey while 
 distributing the documents to the different shards? Is there some other way 
 of getting accurate values of ngroups when using SolrCloud?

 Thanks,
 Nikhil


Re: SolrCloud and grouping

2012-09-11 Thread Mark Miller
Yes, we will offer something for this - just a matter of priorities
for the 4.0 release. My current priority is heavily on the bug side at the
moment, personally.

It's more likely in 4.1 or 4.2 or something.

- Mark

On Tue, Sep 11, 2012 at 2:37 AM, Pavel Goncharik
pavel.goncha...@gmail.com wrote:
 Apparently https://issues.apache.org/jira/browse/SOLR-2592 will help you out.
 Unfortunately, it seems that this issue will not be included in the Solr
 4.0 release.

 I'm wondering myself if there are any plans to commit and release this
 issue or an equivalent, to give users control over partitioning in
 SolrCloud?

 On Tue, Sep 11, 2012 at 7:00 AM, Nikhil Chhaochharia
 nikhil...@yahoo.com wrote:
 Hi,

 I am trying out SolrCloud using a recent Solr 4 nightly.  We use result 
 grouping (FieldCollapsing) and found that the value of ngroups returned by 
 Solr is not correct.

 My understanding is that all the documents belonging to the same group 
 should be on the same shard to ensure that ngroups returns the correct 
 value.  However, the shard that a document is sent to is decided 
 automatically based on the value of the uniqueKey field.

 Is it possible for Solr to hash fieldX instead of the uniqueKey while 
 distributing the documents to the different shards? Is there some other way 
 of getting accurate values of ngroups when using SolrCloud?

 Thanks,
 Nikhil


Re: XInclude Multiple Elements

2012-09-11 Thread Amit Nithian
Way back when, I opened an issue about using XML entity includes in
Solr as a way to break up the config. I have found problems with
XInclude having multiple elements to include, because the included file is not
well formed. From what I have read, if you make it well formed, you
end up with a document that's not what you expect.

For example, my schema.xml has:

<fields>
...
<xinclude href="more_fields.xml" .../>
</fields>

and more_fields.xml has:

<field name=".."/>

which isn't well formed. You could make it well formed:

<fields>
  <field name=".."/>
</fields>

but then I think you end up with a nested <fields> element, which doesn't
work (and btw I still keep getting the blasted "failed to parse" error,
which isn't very helpful). Looking at this made me wonder if entity
includes work with Solr 4, and indeed they do! They aren't as flexible
as XIncludes, but for the purpose of breaking up an XML file into
smaller pieces they work beautifully and as you would expect.

You can simply declare your entities at the top as shown in the
earlier thread and then include them where you need. I've been using
this for years and it works fairly well.
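For schema.xml, a minimal sketch of the entity-include approach looks something
like this (the file and entity names are just placeholders):

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE schema [
  <!ENTITY morefields SYSTEM "more_fields.xml">
]>
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    &morefields;
  </fields>
</schema>

where more_fields.xml contains only <field .../> declarations and no root element.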

Cheers!
Amit


On Thu, May 31, 2012 at 7:01 AM, Bogdan Nicolau bogdan@gmail.com wrote:
 I've also tried a lot of tricks to get xpointer working with multiple child
 elements, to no success.
 In the end, I've resorted to a less pretty, other-way-around solution. I do
 something like this:
 solrconfig_common.xml - no xml declaration, no root tag, no nothing:
 <etc>etc</etc>
 <etc2>etc2</etc2>
 ...
 For each file that I need the common stuff in, I do something like this:
 solrconfig_master.xml/solrconfig_slave.xml/etc.
 <?xml version="1.0" encoding="UTF-8" ?>
 <!DOCTYPE config [
 <!ENTITY solrconfigcommon SYSTEM
 "solrconfig_common.xml">
 ]>

 <config>
 &solrconfigcommon;

 </config>

 Solr starts with 0 warnings, the configuration is properly loaded, etc.
 Property substitution also works, including inside the
 solrconfig_common.xml. Hope it helps anyone.



Re: Replication policy

2012-09-11 Thread Amit Nithian
If I understand you right, replication of data has zero downtime; it
just works and the data flows through from master to slaves. If you
want, you can configure the replication to replicate configuration
files across the cluster (although for me, my deploy script does this).
I'd recommend tweaking the warmers so that you don't get latency
spikes due to cold caches during replication.
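A minimal sketch of what a warming listener looks like in solrconfig.xml (the
query itself is just a placeholder):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">some popular query</str><str name="rows">10</str></lst>
  </arr>
</listener>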

Not being well versed in the latest Solr features (I'm a bit behind
here), I don't know if you can reload the cores on demand to indicate
the latest configurations or not but in my environment, I have a
rolling restart script that bounces a set of servers when the
schema/solrconfig changes.

HTH
Amit

On Mon, Sep 10, 2012 at 11:10 PM, Abhishek tiwari
abhishek.tiwari@gmail.com wrote:
 Hi All,

 I am having 1 master and 3 slave Solr servers (version 3.6).
 What kind of replication policy should I adopt with zero downtime & no
 data loss,

 1) when we do some configuration and schema changes on the Solr server?


Re: Solr Sorting Caching

2012-09-11 Thread Toke Eskildsen
On Tue, 2012-09-11 at 08:00 +0200, Amey Patil wrote:
 Our Solr index (Solr 3.4) has over 100 million documents.
[...]
 *((keyword1 AND keyword2...) OR (keyword3 AND keyword4...) OR ...) AND
 date:[date1 TO *]*
 No. of keywords can be in the range of 100 - 1000.
 We are adding sort parameter *'date asc'*.

Are you using a TrieDateField for the dates?

 The keyword part of the query changes very rarely but date part always
 changes.

Consider creating and re-using a filter for the keywords and let the
query consist of the date range only.
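A minimal sketch of that (the keywords and cache sizes are illustrative):

q=date:[date1 TO *]&fq=(keyword1 AND keyword2) OR (keyword3 AND keyword4)&sort=date asc

With the keyword clause moved into fq, Solr can keep it in the filterCache and
reuse it across requests; that cache is configured in solrconfig.xml, e.g.:

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>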

[...]

 2) Sometimes when 'numFound' is very large for a query, It gives OOM error
 (I guess this is because of sort).

Guessing here: You request all the results from the search, which is
potentially 100M documents? Solr is not geared towards such massive
responses. You might have better luck by paging, but even that does not
behave very well when requesting pages very far into the result set.



Re: RES: RES: Problem with accented words sorting

2012-09-11 Thread Toke Eskildsen
On Mon, 2012-09-10 at 16:04 +0200, Claudio Ranieri wrote:
 When I used the CollationKeyFilterFactory in my facet (example below),
 the facet values came out wrong. When I remove the
 CollationKeyFilterFactory from the facet field's type, the values are correct.

As Ahmed wrote, CollationKeyFilter is meant for sorting of the document
result. It works by creating a key for each value. The key is, as you
discovered, not meant for human eyes. When you do a sort on the
collation field, the key is used for ordering and the original
human-friendly text is taken from a stored field.
See https://wiki.apache.org/solr/UnicodeCollation
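A minimal sketch of a collated sort field (the locale and field names here are
just illustrative):

<fieldType name="string_collated" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.CollationKeyFilterFactory" language="pt" strength="primary"/>
  </analyzer>
</fieldType>

<field name="title" type="text" indexed="true" stored="true"/>
<field name="title_sort" type="string_collated" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>

You would then sort on title_sort but display the stored title field.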

For faceting, the dual value approach does not work, as there is no
mapping from the key to the original value. There are several possible
solutions to this (storing the original value together with the key
seems sensible), but as far as I know, Solr does not currently support
collator sorted faceting.

 Is it a bug?

No, it is a known (and significant IMO) limitation.



Re: use of filter queries in Lucene/Solr Alpha40 and Beta4.0

2012-09-11 Thread guenter.hip...@unibas.ch

Feedback on the patch:
I used build #85 (Revision: 1382192) to test the same use case (build up
an initial index of 18 million documents and run updates with around 200,000 documents).


Result: the use of fq to drill down facets is now consistent! (available
under http://sb-tp1.swissbib.unibas.ch)


Thanks for providing a quick patch!!

-Günter

On 09/07/2012 05:09 PM, Erick Erickson wrote:

Thank the guys who actually fixed it!

Thanks for bringing this up, and please let us know if Yonik's patch fixes
your problem

Best
Erick

On Thu, Sep 6, 2012 at 11:39 PM, guenter.hip...@unibas.ch
guenter.hip...@unibas.ch wrote:

Erick, thanks for the response!
Our use case is very straightforward and basic:
- no cloud infrastructure
- XMLUpdateRequest handler (transformed library bibliographic data which
is pushed by the post.jar component). For deletions I used to use the SolrJ
component until two months ago, but because of the difficulties I read about I
changed back to the basic procedure with XML documents
- around 18 million documents, no distributed shards
- once the basic use case is stable and maintainable we are heading forward
to the more fancy things ;-)

Yonik provided a patch (https://issues.apache.org/jira/browse/SOLR-3793)
yesterday morning. I'm going to run tests once it is part of the nightly
builds. As of now, if I'm not wrong
(https://builds.apache.org/job/Solr-Artifacts-4.x/), the latest build doesn't
contain it.

Best wishes from Basel, Günter


On 09/07/2012 07:09 AM, Erick Erickson wrote:

Guenter:

Are you using SolrCloud or straight Solr? And were you updating in
batches (i.e. updating multiple docs at once from SolrJ by using the
server.add(doclist) form)?

There was a bug in this process that caused various docs to show up
in various shards differently. This has been fixed in 4x, any nightly
build should have the fix.

I'm absolutely grasping at straws here, but this was a weird case that
I happen to know about...

Hossman:
of course this all goes up in smoke if you can reproduce this with any
recent compilation of the code.

FWIW
Erick

On Wed, Sep 5, 2012 at 11:29 PM, guenter.hip...@unibas.ch
guenter.hip...@unibas.ch wrote:

Hoss, I'm so happy you realized the problem because I was quite worried
about it!!

Let me know if I can provide support with testing it.
For the last two days I have been busy migrating a bunch of hosts, which should
-hopefully- be finished today.
Then I will again have the infrastructure for running tests.

Günter


On 09/05/2012 11:19 PM, Chris Hostetter wrote:

: Subject: Re: use of filter queries in Lucene/Solr Alpha40 and Beta4.0

Günter, This is definitely strange

The good news is, i can reproduce your problem.
The bad news is, i can reproduce your problem - and i have no idea
what's
causing it.

I've opened SOLR-3793 to try to get to the bottom of this, and included
some basic steps to demonstrate the bug using the Solr 4.0-BETA example
data, but i'm really not sure what the problem might be...

https://issues.apache.org/jira/browse/SOLR-3793


-Hoss












--
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: +41 (0)61 267 31 12  Fax: +41 61 267 3103
E-mail: guenter.hip...@unibas.ch
URL: www.swissbib.org / http://www.ub.unibas.ch/



Solr backup replication - restore from snapshot

2012-09-11 Thread roySolr
Hello,

I have some questions about restoring from a snapshot backup. I have a master
and run the following command:

http://solr.test.uk:/solr/replication?command=backup

It created a directory in my data directory: snapshot.20120911224532

When I want to use this backup on the master, I replace the index directory with
the snapshot directory. I restart the master and it works! Now I want to
replicate this to my (live) slaves, but the slaves don't recognize the
changes. I think the problem is the index version. The index version of the
master (created from the snapshot) is lower than the index versions on the
slaves. How can I fix this? Can I force the slaves to replicate without
looking at index versions? Can I upgrade the index version on the master?

Any help will be appreciated!
Roy







RES: RES: RES: Problem with accented words sorting

2012-09-11 Thread Claudio Ranieri
Ok Toke.
Thanks for your explanation.
This is an interesting feature to implement, because we can sort the
results correctly, but not the facets.
The facets also do not bring the total count for pagination.
I'm using facets to get the distinct values of a field. I wish to sort
and paginate them.


-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Tuesday, September 11, 2012 04:11
To: solr-user@lucene.apache.org
Subject: Re: RES: RES: Problem with accented words sorting

On Mon, 2012-09-10 at 16:04 +0200, Claudio Ranieri wrote:
 When I used the CollationKeyFilterFactory in my facet (example below),
 the facet values came out wrong. When I remove the
 CollationKeyFilterFactory from the facet field's type, the values are correct.

As Ahmed wrote, CollationKeyFilter is meant for sorting of the document result. 
It works by creating a key for each value. The key is, as you discovered, not 
meant for human eyes. When you do a sort on the collation field, the key is 
used for ordering and the original human-friendly text is taken from a stored 
field.
See https://wiki.apache.org/solr/UnicodeCollation

For faceting, the dual value approach does not work as there are no mapping 
from the key to the original value. There are several possible solutions to 
this (storing the original value together with the key seems sensible), but as 
far as I know, Solr does not currently support collator sorted faceting.

 Is it a bug?

No, it is a known (and significant IMO) limitation.



Re: Re: Get parent when the child is a search hit

2012-09-11 Thread Stein Gran
Hi,

Thanks for all the suggestions :-)

Seems like denormalization is the way to go to do this without losing
scalability and speed.

BlockJoins seem to solve another requirement I have, and that is the
parent-child relationship between, for instance, an email and its
attachments. This relationship is more stable; it does not change so often,
so block joins seem like a good approach. I see support for block joins is
not committed to Solr yet; the functionality only exists in Lucene. Does
anyone know if the block join functionality in SOLR-3076 will be committed
to Solr before Solr 4 is released?


Best,
Stein Gran


On Tue, Sep 11, 2012 at 6:29 AM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Hello,
 One more approach is BlockJoin. see SOLR-3076
 blog.griddynamics.com/2012/08/block-join-query-performs.html
  On 11.09.2012 at 5:40, 李�S liyun2...@corp.netease.com wrote:

   I think denormalizing the data is the best way.
 
  2012-09-11
 
 
 
  李�S
 
 
 
   From: jimtronic
   Sent: 2012-09-11 01:38
   Subject: Re: Get parent when the child is a search hit
   To: solr-user solr-user@lucene.apache.org
   Cc:
 
  You could create a type field with folder or file as values and
 then
  have the parentid present in the folder docs.
 
 
 



Return only matched multiValued field

2012-09-11 Thread Dotan Cohen
Assuming a multivalued, stored and indexed field with the name "comment":
when performing a search, I would like to return only the values of
"comment" which contain the match. For example:

When searching for "gold", instead of getting this result:
<doc>
  <arr name="comment">
    <str>There's a lady who's sure</str>
    <str>all that glitters is gold</str>
    <str>and she's buying a stairway to heaven</str>
  </arr>
</doc>

I would prefer to get this result:
<doc>
  <arr name="comment">
    <str>all that glitters is gold</str>
  </arr>
</doc>

(pseudo-XML from memory, may not be accurate but illustrates the point)

Thanks.

-- 
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


Re: RES: RES: RES: Problem with accented words sorting

2012-09-11 Thread Toke Eskildsen
On Tue, 2012-09-11 at 12:14 +0200, Claudio Ranieri wrote:
 This is an interesting feature to be implemented, because we can sort
 the results correctly, but not in the facets.

At work (State and University Library, Denmark) we use collator-ordered
faceting for author & title, but our current implementation suffers from
sorting upon index-open time. Roughly speaking this takes one minute per
one million terms and since we have 10M documents, we're talking 10-15
minutes before a search can be performed.

The collator-key+original term-approach would take nearly the same time
as standard index order faceting when opening the index.

 The facets also do not bring the total count for pagination. I'm
 using facets to get the distinct values of a field. I wish to
 sort and paginate them.

This seems to be the relevant JIRA issue:
https://issues.apache.org/jira/browse/SOLR-2242



RES: RES: RES: RES: Problem with accented words sorting

2012-09-11 Thread Claudio Ranieri
Ok Toke,
Is it worth opening a ticket in jira to implement the collator-key + original 
in facet?

-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk]
Sent: Tuesday, September 11, 2012 08:46
To: solr-user@lucene.apache.org
Subject: Re: RES: RES: RES: Problem with accented words sorting

On Tue, 2012-09-11 at 12:14 +0200, Claudio Ranieri wrote:
 This is an interesting feature to be implemented, because we can sort 
 the results correctly, but not in the facets.

At work (State and University Library, Denmark) we use collator-ordered
faceting for author & title, but our current implementation suffers from
sorting upon index-open time. Roughly speaking this takes one minute per one
million terms and since we have 10M documents, we're talking 10-15 minutes
before a search can be performed.

The collator-key+original term-approach would take nearly the same time as 
standard index order faceting when opening the index.

 The facets also do not bring the total count for pagination. I'm
 using facets to get the distinct values of a field. I wish to
 sort and paginate them.

This seems to be the relevant JIRA issue:
https://issues.apache.org/jira/browse/SOLR-2242



Solr 3.6.1 Source Code

2012-09-11 Thread mechravi25
Hi,

I would like to know the baselined version of the Solr 3.6.1 source code for
SVN checkout. We tried to check out from the following link and found many
baselined versions related to Solr 3.6.x.

https://svn.apache.org/repos/asf/lucene/dev/branches/

Can anyone tell me the exact SVN checkout link for the Solr 3.6.1 version?

Thanks a Lot





Re: Solr 3.6.1 Source Code

2012-09-11 Thread Thomas Matthijs
On Tue, Sep 11, 2012 at 2:43 PM, mechravi25 mechrav...@yahoo.co.in wrote:
 Hi,

 I would like to know the base lined version of Solr 3.6.1 Source code for
 svn Check out. We tried to check out from the following link and found many
 base lined versions related to Solr 3.6.x version.

 https://svn.apache.org/repos/asf/lucene/dev/branches/

 Can anyone tell me the exact svn check out link for Solr 3.6.1 version?

 Thanks a Lot

https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_6_1/


Re: Solr 3.6.1 Source Code

2012-09-11 Thread Jack Krupansky
The branch will be the source for the next release (3.6.2), if there is 
one. To get the exact source for a release, go to tags rather than 
branches. Use:


http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_3_6_1/

-- Jack Krupansky

-Original Message- 
From: mechravi25

Sent: Tuesday, September 11, 2012 8:43 AM
To: solr-user@lucene.apache.org
Subject: Solr 3.6.1 Source Code

Hi,

I would like to know the base lined version of Solr 3.6.1 Source code for
svn Check out. We tried to check out from the following link and found many
base lined versions related to Solr 3.6.x version.

https://svn.apache.org/repos/asf/lucene/dev/branches/

Can anyone tell me the exact svn check out link for Solr 3.6.1 version?

Thanks a Lot






Re: RES: RES: RES: Problem with accented words sorting

2012-09-11 Thread Ahmet Arslan
 This is an interesting feature to implement, because we
 can sort the results correctly, but not the facets.
 The facets also do not bring the total count for
 pagination.
 I'm using facets to get the distinct values of a
 field. I wish to sort and paginate them.

Distinct values can be retrieved using 
http://wiki.apache.org/solr/LukeRequestHandler too.

Regarding pagination :
http://wiki.apache.org/solr/SimpleFacetParameters#facet.offset
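For instance (host, core and field names are placeholders):

http://localhost:8983/solr/admin/luke?fl=myfield&numTerms=50
http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=myfield&facet.sort=index&facet.limit=20&facet.offset=20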


Re: Running Solr Unit Test in Eclipse

2012-09-11 Thread Jack Krupansky
Generally, source folders come in pairs - "java" for the actual code source,
and "test" for the unit tests. So, make sure that "/test" appears in the
source folder name. And unit test file names either begin or end with
"Test".


If you right click on a unit test and select "Run As", you should see "JUnit
Test". Or press Ctrl+F11 to run a unit test.


-- Jack Krupansky

-Original Message- 
From: BadalChhatbar

Sent: Tuesday, September 11, 2012 1:37 AM
To: solr-user@lucene.apache.org
Subject: Running Solr Unit Test in Eclipse

Hi All,

I am new to solr and eclipse.

I am trying to run the Solr unit tests in Eclipse, and I am getting confused at a
couple of places. (Note: I am able to run the tests using the ant command and it
all works fine.)

But when I open a unit test and go to Right Click -> "Run As Configuration",
it offers to run the unit test as a "Jetty WebApp". Is this the right
thing? If yes, then it would be great if you can provide me some
configuration steps.


And if I try to set the "Run As Configuration" to JUnit, then I'm not sure
what classes to select or what arguments to specify.

I tried to follow this document, but it didn't help much.
http://wiki.apache.org/solr/TestingSolr









Re: [Solr4 beta] error 503 on commit

2012-09-11 Thread Antoine LE FLOC'H
Hoss,

After investigating more, here is the Tomcat log below. It is indeed
the same problem: "exceeded limit of maxWarmingSearchers=2".
It is an indexing box and the comment says that we could raise this number
to 4 or something. I can do that, but I have four questions though:
  - is it something that can happen anyway?
  - what are the actions in case it does (since re-committing the same docs
with a try-again mechanism is not re-committing)?
  - I was under the impression that no new searchers are created if I don't
search, and I was not searching at that time. Where is this searcher coming
from?
  - is there a way to disable searchers during indexing, since I precisely
don't want warming during indexing?
Thanks a lot.

11 sept. 2012 15:25:08 org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener done.
11 sept. 2012 15:25:08 org.apache.solr.core.SolrCore registerSearcher
INFO: [lg_fr_alpha6_full] Registered new searcher
Searcher@61f5e795main{StandardDirectoryReader(segments_qo:13989:nrt
_13d(4.0.0.1):C4636752/2696195 _26s(4.0.0.1):C4422269
_329(4.0.0.1):C2409200 _3me(4.0.0.1):C4534687/1 _4df(4.0.0.1):C3745599/731
_4np(4.0.0.1):C3660400/5317 _4qr(4.0.0.1):C2422935/138
_4w4(4.0.0.1):C261418 _4qw(4.0.0.1):C293154 _56y(4.0.0.1):C833572
_4vc(4.0.0.1):C138593 _53b(4.0.0.1):C410764 _4yg(4.0.0.1):C168744
_51j(4.0.0.1):C134313 _56s(4.0.0.1):C151121 _55m(4.0.0.1):C40342
_59i(4.0.0.1):C167117 _58i(4.0.0.1):C82564 _57r(4.0.0.1):C79488
_57k(4.0.0.1):C2589 _5a7(4.0.0.1):C64667 _59x(4.0.0.1):C14142
_58v(4.0.0.1):C1618 _58x(4.0.0.1):C2219 _595(4.0.0.1):C15193
_590(4.0.0.1):C17855 _598(4.0.0.1):C3528 _59a(4.0.0.1):C5998
_59l(4.0.0.1):C10234 _59k(4.0.0.1):C6524 _59q(4.0.0.1):C3115
_5al(4.0.0.1):C3502 _5a4(4.0.0.1):C1602 _5a6(4.0.0.1):C1797
_5a5(4.0.0.1):C1530 _5ad(4.0.0.1):C7351 _5ac(4.0.0.1):C5797
_5ab(4.0.0.1):C8330 _5aa(4.0.0.1):C6436 _5a9(4.0.0.1):C5944
_5ag(4.0.0.1):C36424 _5aj(4.0.0.1):C7 _5ai(4.0.0.1):C26770
_5ak(4.0.0.1):C29729 _5af(4.0.0.1):C35554 _5ah(4.0.0.1):C23804)}
11 sept. 2012 15:25:08 org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
11 sept. 2012 15:25:08 org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [lg_fr_alpha6_full] webapp=/solr path=/update
params={waitSearcher=false&commit=true&wt=xml&softCommit=false&version=2.2}
{commit=} 0 107492
11 sept. 2012 15:25:08 org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false}
11 sept. 2012 15:25:08 org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@62677672 main
11 sept. 2012 15:25:08 org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
11 sept. 2012 15:25:08 org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [lg_fr_alpha6_full] webapp=/solr path=/update
params={waitSearcher=false&commit=true&wt=xml&softCommit=false&version=2.2}
{commit=} 0 93563
11 sept. 2012 15:25:08 org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false}
11 sept. 2012 15:25:08 org.apache.solr.core.SolrCore getSearcher
ATTENTION: [lg_fr_alpha6_full] PERFORMANCE WARNING: Overlapping
onDeckSearchers=2
11 sept. 2012 15:25:08 org.apache.solr.search.SolrIndexSearcher init
INFO: Opening Searcher@4479f66a main
11 sept. 2012 15:25:08 org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
11 sept. 2012 15:25:08 org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [lg_fr_alpha6_full] webapp=/solr path=/update
params={waitSearcher=false&commit=true&wt=xml&softCommit=false&version=2.2}
{commit=} 0 93178
11 sept. 2012 15:25:08 org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false}
11 sept. 2012 15:25:08 org.apache.solr.core.SolrCore getSearcher
ATTENTION: [lg_fr_alpha6_full] Error opening new searcher. exceeded limit
of maxWarmingSearchers=2, try again later.
11 sept. 2012 15:25:08 org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [lg_fr_alpha6_full] webapp=/solr path=/update
params={waitSearcher=false&commit=true&wt=xml&softCommit=false&version=2.2}
{} 0 93137
11 sept. 2012 15:25:08 org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start
commit{flags=0,_version_=0,optimize=false,openSearcher=true,waitSearcher=false,expungeDeletes=false,softCommit=false}
11 sept. 2012 15:25:08 org.apache.solr.core.SolrCore getSearcher
ATTENTION: [lg_fr_alpha6_full] Error opening new searcher. exceeded limit
of maxWarmingSearchers=2, try again later.
11 sept. 2012 15:25:08 org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: [lg_fr_alpha6_full] webapp=/solr path=/update
params={waitSearcher=false&commit=true&wt=xml&softCommit=false&version=2.2}
{} 0 90799
11 sept. 2012 15:25:08 

Solr on https

2012-09-11 Thread Cool Techi
Hi,

We are trying to run Solr on https. These are a few of the issues or problems
that are coming up. Just wanted to understand if anyone else is facing these
problems:

- We have some shards running on https, but in the shards parameter in Solr we
don't specify the protocol. How can we achieve this?
- Will replication work on https?
- Will commit and other functions work normally?

Regards,
atpug
  

Re: RES: RES: RES: RES: Problem with accented words sorting

2012-09-11 Thread Toke Eskildsen
On Tue, 2012-09-11 at 14:21 +0200, Claudio Ranieri wrote:
 Ok Toke, Is it worth opening a ticket in jira to implement the
 collator-key + original in facet?

I think it would be best to discuss it on the developer mailing list
first. I have sent a mail there: "Collator-based facet sorting in Solr".

Regards,
Toke Eskildsen



Re: Facet Sort by Index, missing indexes

2012-09-11 Thread Chris Hostetter

: I did the query twice, once with the sorting and once without the sort:
: 
: 1.  Without f.a.facet.sort=index : I have all l1, l2, l3 in count order 
: (all l1, l2, and l3 facets have counts on them)
: 
: 2.  With f.a.facet.sort=index : The facet is sorted accordingly 
: l1:..,l2:.. but l3 facets are completely missing

First off: terminology clarification.  It sounds like in your case you have
a field named "a" and you are using that field as a field facet.
Within the field "a" you have terms like "l1", "l2", "l3", etc...  When you
facet on a field, the terms are each treated as a constraint and you get
a constraint count (or facet count) for each term.


Having said that: it sounds like what you are describing is that some 
constraints are missing from the list when you use facet.sort=index (ie: 
when the constraints are in index order)

Is your example real? ie: are the terms really "l1", "l2", and "l3" or are
those just hypothetical examples?  How many terms do you see in the
response? - because by default a max of 100 constraints are returned...

https://wiki.apache.org/solr/SimpleFacetParameters#facet.limit
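(For instance, something like f.a.facet.limit=-1, i.e. no limit, on the request
would take that default cap out of the picture.)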

It would help if you could provide a full, real example of the request you
are attempting (with all params) and the actual response you get back - if
things appear to be working with facet.sort=count, but not with
facet.sort=index, then please show us both requests+responses so we can
compare.




-Hoss


Re: Semantic document format... standards?

2012-09-11 Thread Michael Della Bitta
I'm probably a little unclear about the breadth of what you want to
do, but I would recommend DC at the extremely lightweight end, and TEI
at the very heavyweight end. Perhaps you could come up with a mashup
of DC and your own fields in RDF as well.

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Tue, Sep 11, 2012 at 11:51 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Hello,

 If I'm extracting named entities, topics, key phrases/tags, etc. from 
 documents and I want to have a representation of this document, what format 
 should I use? Are there any standard or at least common formats or approaches 
 people use in such situations?

 For example, the most straightforward format might be something like this:


  <document>
    <title>doc title</title>
    <keywords>meta keywords coming from the web page</keywords>
    <content>page meat</content>
    <entities>named entities recognized in the document</entities>
    <topics>topics extracted by the annotator</topics>
    <tags>tags extracted by the annotator</tags>
    <relations>relations extracted by the annotator</relations>
  </document>

 But this is a made up format - the XML tags above are just what somebody 
 happened to pick.

 Are there any standard or at least common formats for this?


 Thanks,
 Otis
 
 Performance Monitoring - Solr - ElasticSearch - HBase - 
 http://sematext.com/spm

 Search Analytics - http://sematext.com/search-analytics/index.html


Re: Use field as bool flag for another, not indexed, field

2012-09-11 Thread Erick Erickson
You've outlined the possibilities pretty well. I don't think you
want a custom analyzer though; consider a custom UpdateHandler
and overriding the addDoc command. You can freely manipulate
the document at that point, adding or removing fields etc. So see
if the incoming doc has your original field or not and add your new
boolean field at that point...

Or, even simpler, if you're indexing from SolrJ do this on the client side.
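A minimal sketch of the client-side approach (class and field names here are
just placeholders, not from your schema):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.common.SolrInputDocument;

public class FlagIndexer {
  // Adds a doc, storing the internal URL (stored-only field in the schema) and
  // an indexed boolean flag that records whether the URL is present.
  public static void index(SolrServer server, String id, String internalUrl)
      throws Exception {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", id);
    if (internalUrl != null) {
      doc.addField("internal_url", internalUrl);      // stored=true, indexed=false
    }
    doc.addField("has_internal_url", internalUrl != null); // boolean, indexed=true
    server.add(doc);
  }
}

A query like fq=has_internal_url:true then selects the documents that have the URL.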

Best
Erick

On Sun, Sep 9, 2012 at 2:53 PM, simple350 aurel...@yahoo.com wrote:
 Hi,

 I want to be able to select from the index the documents who have a certain
 field not null. The problem is that the field is not indexed just stored.
 I'm not interested in indexing that field as it is just an internal URL.

 The idea was to add another field to the document - a boolean field - based
 on the initial field: 'True' for an existing field, 'False' for null - I could
 copy the initial field and use some analyzer having a bool result as output.

 Before trying to build a custom analyzer I wanted to ask if anything like
 this makes sense or if it is already available in Solr or if I completely
 missed some point.

 Regards,
 Alex





PatternTokenizerFactory not working to split comma separated value

2012-09-11 Thread Suneel
Hello ,

I am using the following field type for comma-separated values but it is not
working.

<fieldType name="commaDelimited" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" group="-1"
        pattern=",|\|" />
  </analyzer>
</fieldType>

<field indexed="true" multiValued="true" name="vc_cat_shape"
    omitNorms="true" omitPositions="true" omitTermFreqAndPositions="true"
    stored="false" termVectors="false" type="commaDelimited"/>

Please suggest what I did wrong.



-
Regards,

Suneel Pandey
Sr. Software Developer


Re: Solr request/response lifecycle and logging full response time

2012-09-11 Thread Chris Hostetter

: I'd still love to see a query lifecycle flowchart, but, in case it
: helps any future users or in case this is still incorrect, here's how
: I'm tackling this:

Part of the problem is deciding what you mean by lifecycle - as far as
the SolrCore is concerned, the lifecycle of the request is its execute
method -- after that there is still response writing, but SolrCore doesn't
really care about that.

From the perspective of SolrDispatchFilter, the lifecycle is longer, as
the ResponseWriter is used to format the response.

From the perspective of the servlet container, the lifecycle might be even
longer, as the client may be slow to read bytes off the wire, so the
SolrDispatchFilter & ResponseWriter may be completely done with the
response, but the ServletContainer may not yet have written all the bytes
back to the client.

That's why most people looking for the full response time usually get 
this info from the ServletContainer (logs), because it's the only place 
that knows for certain when the request is *DONE* done.

: Please advise if:
: - Flowcharts for any solr/lucene-related lifecycles exist

Jan made a pretty decent stab at this a while back, which is good for an 
end user perspective but isn't super detailed...

http://www.cominvent.com/nb/2011/04/04/solr-architecture-diagram/

: - There is a better way of doing this

If i were attempting to solve this problem, i probably would have tried to
implement it as a simple Servlet Filter that wrapped the
ServletOutputStream in something that would have done the logging on
close().  (so that it could be re-used by any ResponseWriter)
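A rough sketch of that idea, assuming the servlet 2.5 API (the class name and
log output are purely illustrative, and only getOutputStream() is wrapped here):

import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

public class ResponseTimingFilter implements Filter {
  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    final long start = System.nanoTime();
    HttpServletResponseWrapper wrapped =
        new HttpServletResponseWrapper((HttpServletResponse) res) {
      @Override public ServletOutputStream getOutputStream() throws IOException {
        final ServletOutputStream out = super.getOutputStream();
        return new ServletOutputStream() {
          @Override public void write(int b) throws IOException { out.write(b); }
          @Override public void close() throws IOException {
            out.close();
            long ms = (System.nanoTime() - start) / 1000000L;
            // logged once the response stream is closed, i.e. after response writing
            System.out.println("full response time: " + ms + " ms");
          }
        };
      }
    };
    chain.doFilter(req, wrapped);
  }
  public void init(FilterConfig cfg) {}
  public void destroy() {}
}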

-Hoss


RE: Term searches with colon(:)

2012-09-11 Thread Chris Hostetter

: Thank you for the reply and help.  The description field is part of the
: defaultHandler's eDisMax search list (qf):
...
: Similar queries for other escaped characters in description using term
: search return correctly as shown from the logs correctly.

Ok ... but you haven't really answered my main question -- what are you 
trying to match?  are you trying to search for the literal term *:* in 
the description field? are you trying to do a wildcard search for terms 
that contain a colon in the middle (ie: foo:bar), are you trying to 
match all documents in the description field?

you've said it doesn't match anything, but you haven't explained what you
expect it to match.

(Actually .. lemme back up and ask a silly question -- are the *
characters in your email actually part of the query you are sending to
Solr, or is that just an artifact of your mail client translating bold or
highlighted characters into * when converting to plain text?)

: what are you expecting that query to match?  because by backslash
: escaping the colon, what you are asking for there is for Solr to search
: for the literal string *:* in your default search field (after whatever
: query time analysis is configured on your default search field)


-Hoss


Re: SolrCloud vs SolrReplication

2012-09-11 Thread thaihai
Thanks for the answer Erick.

thaihai



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-vs-SolrReplication-tp4006327p4007019.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: PatternTokenizerFactory not working to split comma separated value

2012-09-11 Thread Jack Krupansky
I tried your field type in Solr 4.0-BETA and it works fine for input such 
as:


cat,dog|fox,bat|frog

What do you see when you use the Solr Admin Analysis web page for that text 
for your field type?


I would note that your pattern does not permit spaces as delimiters or after 
delimiters, so if your input had spaces, queries could fail unless they 
included the escaped spaces.
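(For instance, a pattern along the lines of ",\s*|\|\s*" - just an untested
sketch - would also swallow whitespace after the delimiters.)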


-- Jack Krupansky

-Original Message- 
From: Suneel

Sent: Tuesday, September 11, 2012 2:20 PM
To: solr-user@lucene.apache.org
Subject: PatternTokenizerFactory not working to split comma separated value

Hello ,

I am using following field type for comma separated value but it is not
working.

<fieldType name="commaDelimited" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" group="-1"
        pattern=",|\|" />
  </analyzer>
</fieldType>

<field indexed="true" multiValued="true" name="vc_cat_shape"
    omitNorms="true" omitPositions="true" omitTermFreqAndPositions="true"
    stored="false" termVectors="false" type="commaDelimited"/>

Please suggested what i did wrong.



-
Regards,

Suneel Pandey
Sr. Software Developer



Re: PatternTokenizerFactory not working to split comma separated value

2012-09-11 Thread Suneel Pandey
Hi Jack,

This is happening only on the Lucid cloud server: it is not splitting the
comma-separated values. But on the plain Solr server this is working perfectly.

Is there any configuration change in solrconfig.xml which can enable or disable
PatternTokenizerFactory?



-
Regards,

Suneel Pandey
Sr. Software Developer


Re: Semantic document format... standards?

2012-09-11 Thread Péter Király
Hi,

I guess the most common format today is using the schema.org ontologies.
They provide a couple of definitions and are supported by big players
such as Google, Yahoo, and Microsoft. See http://schema.org/.

Hope it helps,
Péter


On Tue, Sep 11, 2012 at 11:51 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
  Hello,
 
  If I'm extracting named entities, topics, key phrases/tags, etc. from
 documents and I want to have a representation of this document, what format
 should I use? Are there any standard or at least common formats or
 approaches people use in such situations?
 
  For example, the most straight forward format might be something like
 this:
 
 
  <document>
    <title>doc title</title>
    <keywords>meta keywords coming from the web page</keywords>
    <content>page meat</content>
    <entities>named entities recognized in the document</entities>
    <topics>topics extracted by the annotator</topics>
    <tags>tags extracted by the annotator</tags>
    <relations>relations extracted by the annotator</relations>
  </document>
 
  But this is a made up format - the XML tags above are just what somebody
 happened to pick.
 
  Are there any standard or at least common formats for this?
 
 
  Thanks,
  Otis
  
  Performance Monitoring - Solr - ElasticSearch - HBase -
 http://sematext.com/spm
 
  Search Analytics - http://sematext.com/search-analytics/index.html




-- 
Péter Király
eXtensible Catalog
http://eXtensibleCatalog.org
http://drupal.org/project/xc


Re: PatternTokenizerFactory not working to split comma separated value

2012-09-11 Thread Jack Krupansky
Is it possible that you may have indexed with an earlier pattern, changed 
the pattern, and then tried to query? If so, you need to fully re-index to 
see the change take effect.


I don't know of anything in solrconfig that should affect 
PatternTokenizerFactory.


-- Jack Krupansky

-Original Message- 
From: Suneel Pandey

Sent: Tuesday, September 11, 2012 4:09 PM
To: solr-user@lucene.apache.org
Subject: Re: PatternTokenizerFactory not working to split comma separated 
value


Hi Jack,

This is happening on only lucid cloud server not splitting comma separated
value. but on solr server this fix is working perfectly.

Is any configuration changes in solrconfig.xml which can enable and disable
PatternTokenizerFactory?



-
Regards,

Suneel Pandey
Sr. Software Developer



Getting more proper results

2012-09-11 Thread Ramo Karahasan
Hi,

 

I'm using Solr 3.5 with the following configuration:

<fieldType name="text_auto" class="solr.TextField">
  <analyzer type="index">
    <!-- <tokenizer class="solr.KeywordTokenizerFactory"/> -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/> -->
  </analyzer>
  <analyzer type="query">
    <!-- <tokenizer class="solr.KeywordTokenizerFactory"/> -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
</fieldType>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="1" catenateNumbers="1"
        catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="text_auto" indexed="true" stored="true" multiValued="false"/>
<field name="description" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="title_autocomplete" type="text_auto" indexed="true" stored="true" multiValued="false"/>
<field name="pic_thumb" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="category" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="category_id" type="int" indexed="true" stored="true" multiValued="false"/>
<field name="top_id" type="int" indexed="true" stored="true" multiValued="false"/>

 

 

I'm using it that way because I want to have an autocompletion list. Now
I'm wondering if I can influence some of the results I'm getting. I have a
lot of categories in my database. If I, for example, search for "iphone 3" I
would expect to get all iphone 3 results from the category "electronic". If I'm
searching for "iphone 3" I get the following results: "iPhone - the book",
"Apple iphone 4", "My iphone & I", etc.

If I instead write "iphone 3g", then I get the proper results: a lot of
iphones.

Why didn't my first search give me the results that I'm getting with the
second search term? I would expect the behavior to be the same. Is
it possible to configure Solr in such a way that it returns for the first
search term the results of the second one?

 

Thanks,

Rk



Re: Semantic document format... standards?

2012-09-11 Thread Paul Libbrecht
As Michael hinted, I believe RDF would be the de-facto answer.
Within it, things such as OWL or SKOS certainly represent classical formats.
Processors such as OWLAPI can go pretty far there.

As Péter hinted, schema.org might provide a way to complement an existing XML
with semantic information. The big support everyone talks about (because
apparently the big names speak out for it) I haven't yet seen to be very present,
in particular in terms of a shared toolset.

There are many, many alternatives. We've recently been in touch with a format
called DITA, which is an XML format for annotated documents and also claims to
provide semantic support (e.g. with taxonomies).

Is your goal to serve these as food for solr to index?

Paul


On 11 Sept 2012 at 17:51, Otis Gospodnetic wrote:

 Hello,
 
 If I'm extracting named entities, topics, key phrases/tags, etc. from 
 documents and I want to have a representation of this document, what format 
 should I use? Are there any standard or at least common formats or approaches 
 people use in such situations?
 
 For example, the most straight forward format might be something like this:
 
 
  <document>
    <title>doc title</title>
    <keywords>meta keywords coming from the web page</keywords>
    <content>page meat</content>
    <entities>named entities recognized in the document</entities>
    <topics>topics extracted by the annotator</topics>
    <tags>tags extracted by the annotator</tags>
    <relations>relations extracted by the annotator</relations>
  </document>
 
 But this is a made up format - the XML tags above are just what somebody 
 happened to pick.
 
 Are there any standard or at least common formats for this? 



multiple filter queries and boolean operators in SolrJ

2012-09-11 Thread Rajarshi Guha
Hi, I am accessing our Solr installation via SolrJ. Currently, we are
supporting filter queries via the addFilterQuery() method of
SolrQuery. However as far as I can see, the resultant documents that
come out of the query are the intersection of all the filters.

Ideally, what I'd like to happen is that if we have two FQ's on the
same field, the result should be the OR, whereas if we have two FQ's
on different fields it should be AND.

The thread at 
http://lucene.472066.n3.nabble.com/complex-boolean-filtering-in-fq-queries-td2038365.html
seems to suggest that I could do this by constructing a URL manually.

But can this be done via SolrJ?

-- 
Rajarshi Guha | http://blog.rguha.net
NIH Center for Advancing Translational Science


What's the rules about contributing to Solr WIKI?

2012-09-11 Thread Alexandre Rafalovitch
I just figured out how to run a custom Solr core with basic Jetty as a
Windows service with Apache procrun. Not quite a production setup and
most probably not perfect, but it might save somebody several hours
next time. I want to contribute that back to the Solr wiki for the next
person.

Do I just wade in and start editing, or is there a process/coordinator?

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


Re: What's the rules about contributing to Solr WIKI?

2012-09-11 Thread Chris Hostetter

: Subject: What's the rules about contributing to Solr WIKI?

https://wiki.apache.org/solr/#How_to_edit_this_Wiki

 This Wiki is a collaborative site, anyone can contribute and share:

 Create an account by clicking the Login link at the top of any 
 page, and picking a username and password.
 
 Edit any page by pressing Edit at the top or the bottom of the page 

If people feel that your contributions should be edited or re-organized, 
they will do so.  if conflicts of vision arise, people bring them up on 
the mailing list.


-Hoss


Re: Getting more proper results

2012-09-11 Thread Ahmet Arslan
 I'm using it that way, because I want to have an
 autocompletion list. Now
 I'm wondering if I can influence some of the results I'm
 getting. I have a
 lot of categories in my database. If I for example search
 for iphone 3 I
 would expect to get all iphone 3 from the category
 electronic. If I'm
 searching for iphone 3 I get the following results 
 iPhone - the book ,
 Apple iphone 4, My iphone  I , etc.

You can try to set your default operator to AND: q.op=AND

If you are not firing a phrase query (with quotes), you can do that too:
q="iphone 3"
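For instance (left un-encoded for readability; the default search field is
assumed to come from the schema):

select?q="iphone 3"&rows=10          (phrase query: terms must appear together)
select?q=iphone 3&q.op=AND&rows=10   (both terms required, in any order)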


Re: multiple filter queries and boolean operators in SolrJ

2012-09-11 Thread Ahmet Arslan


--- On Wed, 9/12/12, Rajarshi Guha rajarshi.g...@gmail.com wrote:

 From: Rajarshi Guha rajarshi.g...@gmail.com
 Subject: multiple filter queries and boolean operators in SolrJ
 To: solr-user@lucene.apache.org
 Date: Wednesday, September 12, 2012, 12:58 AM
 Hi, I am accessing our Solr
 installation via SolrJ. Currently, we are
 supporting filter queries via the addFilterQuery() method
 of
 SolrQuery. However as far as I can see, the resultant
 documents that
 come out of the query are the intersection of all the
 filters.
 
 Ideally, what I'd like to happen is that if we have two FQ's
 on the
 same field, the result should be the OR, whereas if we have
 two FQ's
 on different fields it should be AND.
 
 The thread at 
 http://lucene.472066.n3.nabble.com/complex-boolean-filtering-in-fq-queries-td2038365.html
 seems to suggest that I could do this by constructing a URL
 manually.
 
 But can this be done via SolrJ?

Does this work for you?

addFilterQuery("field1:(term1 OR term2)");
addFilterQuery("field2:term5");



Re: multiple filter queries and boolean operators in SolrJ

2012-09-11 Thread Chris Hostetter

fq's are always intersections; if you want to union multiple queries
you have to specify them in a single fq -- that's not a SolrJ/URL thing,
that's just a low-level detail of how Solr caches & intersects filters.

from SolrJ you just have to do a single addFilterQuery() call containing
your union query (using whatever query parser you choose)

There's been an open issue for a while talking about the logistics of
making unioned fq's more feasible.  I recently posted some thoughts
there on what i *think* would be a fairly straightforward way to
support this in a relatively non-invasive and robust way, which you may
want to look at if you are comfortable working with Java and would like to
try your hand at implementing it in Solr...

https://issues.apache.org/jira/browse/SOLR-1223
https://issues.apache.org/jira/browse/SOLR-1223?focusedCommentId=13450929&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13450929


-Hoss


Re: multiple filter queries and boolean operators in SolrJ

2012-09-11 Thread rajarshi.g...@gmail.com
Perfect!

Many thanks

Sent from my HTC One™ X, an AT&T 4G LTE smartphone

- Reply message -
From: Ahmet Arslan iori...@yahoo.com
To: solr-user@lucene.apache.org, rajarshi.g...@gmail.com
Subject: multiple filter queries and boolean operators in SolrJ
Date: Tue, Sep 11, 2012 6:36 PM




--- On Wed, 9/12/12, Rajarshi Guha rajarshi.g...@gmail.com wrote:

 From: Rajarshi Guha rajarshi.g...@gmail.com
 Subject: multiple filter queries and boolean operators in SolrJ
 To: solr-user@lucene.apache.org
 Date: Wednesday, September 12, 2012, 12:58 AM
 Hi, I am accessing our Solr
 installation via SolrJ. Currently, we are
 supporting filter queries via the addFilterQuery() method
 of
 SolrQuery. However as far as I can see, the resultant
 documents that
 come out of the query are the intersection of all the
 filters.
 
 Ideally, what I'd like to happen is that if we have two FQ's
 on the
 same field, the result should be the OR, whereas if we have
 two FQ's
 on different fields it should be AND.
 
 The thread at 
 http://lucene.472066.n3.nabble.com/complex-boolean-filtering-in-fq-queries-td2038365.html
 seems to suggest that I could do this by constructing a URL
 manually.
 
 But can this be done via SolrJ?

Does this work for you?

addFilterQuery("field1:(term1 OR term2)");
addFilterQuery("field2:term5");



Re: Semantic document format... standards?

2012-09-11 Thread Jack Krupansky
My standard question for such a situation: How are you expecting your users 
to query this data? Are they expecting simple English/natural language text, 
or are they expecting structured identifiers that can be keys into other 
data sources.


For example, are your entities simple text literal names, or might they be 
Dublin Core (DC) Agent URI identifiers?


Ditto for topics - free text vs. some SKOS vocabulary or other form of 
taxonomy.


In other words, clue us in as to your client requirements.

-- Jack Krupansky

-Original Message- 
From: Otis Gospodnetic

Sent: Tuesday, September 11, 2012 11:51 AM
To: solr-user@lucene.apache.org
Subject: Semantic document format... standards?

Hello,

If I'm extracting named entities, topics, key phrases/tags, etc. from 
documents and I want to have a representation of this document, what format 
should I use? Are there any standard or at least common formats or 
approaches people use in such situations?


For example, the most straight forward format might be something like this:


<document>
  <title>doc title</title>
  <keywords>meta keywords coming from the web page</keywords>
  <content>page meat</content>
  <entities>named entities recognized in the document</entities>
  <topics>topics extracted by the annotator</topics>
  <tags>tags extracted by the annotator</tags>
  <relations>relations extracted by the annotator</relations>
</document>

But this is a made up format - the XML tags above are just what somebody 
happened to pick.


Are there any standard or at least common formats for this?


Thanks,
Otis

Performance Monitoring - Solr - ElasticSearch - HBase - 
http://sematext.com/spm


Search Analytics - http://sematext.com/search-analytics/index.html 



solr.StrField with stored=true useless or bad?

2012-09-11 Thread sysrq
Hi,

I have a StrField to store a URL. The field definition looks like this:
<field name="link" type="string" indexed="true" stored="true" required="true" />

Type "string" is defined as usual:
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />

Then I realized that a StrField doesn't run any analyzers and stores data
verbatim. The data is just a single token.

The purpose of stored=true is to store the raw string data alongside the
analyzed/transformed data for displaying purposes. This is fine for an analyzed
solr.TextField, but for a StrField both values are the same. So is there any
reason to apply stored=true to a StrField as well?

I ask because I found a lot of sites and tutorials applying stored=true to
StrFields as well. Do they all do it wrong, or am I missing something here?


Re: solr.StrField with stored=true useless or bad?

2012-09-11 Thread Yonik Seeley
On Tue, Sep 11, 2012 at 7:03 PM,  sy...@web.de wrote:
 The purpose of stored=true is to store the raw string data besides the 
 analyzed/transformed data for displaying purposes. This is fine for an 
 analyzed solr.TextField, but for an StrField both values are the same. So is 
 there any reason to apply stored=true on a StrField as well?

You're over-thinking things a bit ;-)

if you want to search on it: index it
If you want to return it in search results: store it
Those are two orthogonal things (even for StrField).

Why?  Indexed means full-text inverted index: words (terms) point to
documents.  It's not easy/fast for a given document to find out what
terms point to it.  Stored fields are all stored together and can be
retrieved together given a document id.  Hence search finds lists of
document ids (via indexed fields), and can then return any of the
stored fields for those document ids.

-Yonik
http://lucidworks.com
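
To make the indexed/stored split concrete, here is a hedged SolrJ sketch (the
URL and field names are assumptions, not from this thread; it presumes the
field is both indexed for matching and stored for retrieval, and a Solr 4.x
SolrJ client):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class IndexedVsStoredExample {
    public static void main(String[] args) throws SolrServerException {
        // Hypothetical Solr URL and field names, for illustration only.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        // Matching runs against the inverted index (indexed="true").
        SolrQuery q = new SolrQuery("link:\"http://example.com/\"");
        // fl controls what comes back; only stored="true" fields can be returned.
        q.setFields("id", "link");

        QueryResponse rsp = server.query(q);
        for (SolrDocument doc : rsp.getResults()) {
            System.out.println(doc.getFieldValue("link")); // read from the stored copy
        }
    }
}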


Running Luke On Solr Index (getting lock error)

2012-09-11 Thread BadalChhatbar
Hi All,

I am trying to run Luke on my Solr search index (I have only indexed 3-4 xml
files so far).

When I try to open the index in Luke, I get a write.lock error on the index
and it does not show me the index.

I checked the Force Unlock option, but it didn't help either. I also tried
opening the index in read-only mode; that didn't work :(.

I have attached a screenshot of the error. (Note: I am using
lukeall-0.9.9.jar.) Do I need to select any specific option while opening
the index in Luke?

http://lucene.472066.n3.nabble.com/file/n4007084/solrIndex.jpg 





Re: solr.StrField with stored=true useless or bad?

2012-09-11 Thread Ahmet Arslan
 The purpose of stored=true is to store the raw string data
 besides the analyzed/transformed data for displaying
 purposes. This is fine for an analyzed solr.TextField, but
 for an StrField both values are the same. So is there any
 reason to apply stored=true on a StrField as well?

If you don't store it, you cannot retrieve it (displaying purposes) via fl= 
parameter. You can access indexed values via faceting, terms component etc.


Re: Solr search not working after copying a new field to an existing Indexed Field

2012-09-11 Thread Mani
Eric,
"When you add a doc with the same unique key as an old doc,
the data associated with the first version of the doc is entirely
thrown away and it's as though you'd never indexed it at all" - that is exactly
what I did. The old doc and the new doc are identical except that the Name has
changed. When I query Solr for the document, I do see the Name field with
the correct recent changes. However, if I search for the new name, I do
not get the result. So I removed all the documents entirely and then added
the same new document, and it worked. Not sure if this is a bug.

So whenever I add a new field to an existing search field, the document
apparently needs to be deleted and re-added (not just added again with the
same key, as that is not working in my case) for the search to take effect.

Thanks










Partial search

2012-09-11 Thread Mani
I have three documents with the following search field (text_en type) values. 

When I search for "Energy Field", I get the documents back in the order
presented below. However, if you look at the matches, I would expect Doc3 to
come first and Doc1 to be last.


Doc1 : Automic Energy and Peace
Doc2 : Energy One Energy Two Energy Three Energy Four
Doc3 : Mathematic Field Energy Field

What is the best way to configure my search to accommodate as many term
matches as possible?









SolrCloud fail over

2012-09-11 Thread andy
I know failover is available in Solr 4.0 right now: if one server
crashes, other servers still serve queries. I set up a SolrCloud like this
http://lucene.472066.n3.nabble.com/file/n4007117/Selection_028.png 

At first I use http://localhost:8983/solr/collection1/select?q=*%3A*&wt=xml for
queries; if the node on 8983 crashes, I have to access other nodes for queries,
like http://localhost:8900/solr/collection1/select?q=*%3A*&wt=xml

But I use that node's URL in SolrJ, so how can I change the request URL
dynamically?
Does SolrCloud support something like a virtual IP address? For example, could
I use a URL like http://collections1 in SolrJ and have the request forwarded
to an available node automatically?




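
Not from this thread, but one hedged sketch of how this is commonly handled in
SolrJ 4.x: the ZooKeeper-aware client reads the cluster state itself and routes
requests to live nodes, so the application is not pinned to one node's URL (the
ZooKeeper address and collection name below are assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudFailoverExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical ZooKeeper host:port for the cluster in the screenshot.
        CloudSolrServer server = new CloudSolrServer("localhost:9983");
        server.setDefaultCollection("collection1");

        SolrQuery query = new SolrQuery("*:*");
        QueryResponse rsp = server.query(query);
        System.out.println("numFound: " + rsp.getResults().getNumFound());

        server.shutdown();
    }
}

If a node goes down, the client picks another live replica from the cluster
state instead of failing on a hard-coded URL.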


RE: Term searches with colon(:)

2012-09-11 Thread Nemani, Raj
Sorry for not being clear.

Yes, I am trying to do a wildcard search for terms that contain a colon
in the text (ie: foo:bar) in the field list mentioned in the default
requesthandler that I posted earlier.  Description is one of those
fields.  Mpg is another field.  I have not included the entire default
field list for brevity's sake.

The *s  in my queries that I have included are part of the actual solr
query (to denote wildcards as you said earlier).
Hope I am clear this time.

Thank you again for your help.

Raj


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Tuesday, September 11, 2012 3:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Term searches with colon(:)


: Thank you for the reply and help.  The description field is part of
the
: defaultHandler's eDisMax search list (qf):
...
: Similar queries for other escaped characters in description using
term
: search return correctly as shown from the logs correctly.

Ok ... but you haven't really answered my main question -- what are you
trying to match?  are you trying to search for the literal term *:* in
the description field? are you trying to do a wildcard search for terms
that contain a colon in the middle (ie: foo:bar), are you trying to
match all documents in the description field?

you've said it doesn't match anything, but you haven't explain what you
expect it to match

(Actually .. lemme back up and ask a silly question -- are the * 
characters in your email actually part of the query you are sending to
solr, or is that just an artifact of your mail client translating bold
or highlighted characters into * when converting to plain text?)

: what are you expecting that query to match?  because by backslash
: escaping the colon, what you are asking for there is for Solr to search
: for the literal string *:* in your default search field (after whatever
: query time analysis is configured on your default search field)


-Hoss
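
For what it's worth, a small hedged sketch of building such a query string in
SolrJ: escape the colon so the parser does not treat it as a field separator,
and leave the trailing wildcard unescaped. The field name and term below are
made up for illustration.

import org.apache.solr.client.solrj.SolrQuery;

public class ColonWildcardExample {
    public static void main(String[] args) {
        // Hypothetical term containing a colon, e.g. to match "foo:bar", "foo:barbecue", ...
        String prefix = "foo:bar";

        // Escape only the colon; the trailing * must stay unescaped to act as a wildcard.
        String escaped = prefix.replace(":", "\\:");
        SolrQuery query = new SolrQuery("description:" + escaped + "*");

        System.out.println(query.getQuery()); // prints: description:foo\:bar*
    }
}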


RE: Facet Sort by Index, missing indexes

2012-09-11 Thread Dewi Wahyuni
Hi Chris,

Thanks for that tip. I checked, and it is indeed because of the constraint
limit.

Thanks  again
Dewi
 
-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Wednesday, 12 September 2012 1:57 AM
To: solr-user@lucene.apache.org
Subject: Re: Facet Sort by Index, missing indexes


: I did the query twice, once with the sorting and once without the sort:
: 
: 1.  Without f.a.facet.sort=index : I have all l1, l2, l3 in count order
: (all l1, l2, and l3 facets have counts on them)
: 
: 2.  With f.a.facet.sort=index : The facet is sorted accordingly
: l1:..,l2:.. but l3 facets are completely missing

First off, a terminology clarification: it sounds like in your case you have a
field named "a" and you are using that field as a field facet.
Within the field "a" you have terms like "l1", "l2", "l3", etc... When you facet
on a field, the terms are each treated as a constraint and you get a
constraint count (or facet count) for each term.


Having said that: it sounds like what you are describing is that some 
constraints are missing from the list when you use facet.sort=index (ie: 
when the constraints are in index order)

Is your example real? ie: are the terms really l1, l2, and l3 or are 
those just hypothetical examples?  how many terms do you see in the response? - 
because by default a max of 100 constraints are returned...

https://wiki.apache.org/solr/SimpleFacetParameters#facet.limit

It would help if you could provide a full, real example of the request you are 
attempting (with all params) and the actual response you get back - if things 
appear to be working with facet.sort=count, but not with facet.sort=index, then 
please show us both requests+responses so we can compare.




-Hoss
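
If the 100-constraint default is the culprit, a hedged SolrJ sketch of raising
it for just that field might look like this (field name "a" taken from the
discussion above; -1 is the conventional "no limit" value for facet.limit):

import org.apache.solr.client.solrj.SolrQuery;

public class FacetLimitExample {
    public static void main(String[] args) {
        SolrQuery query = new SolrQuery("*:*");
        query.setFacet(true);
        query.addFacetField("a");

        // Per-field overrides: index order, and no cap on returned constraints.
        query.set("f.a.facet.sort", "index");
        query.set("f.a.facet.limit", -1);

        System.out.println(query); // prints the encoded request parameters
    }
}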


Solr 4.0-BETA facet pivot returns no result

2012-09-11 Thread andy
I use the Solr 4.0-BETA version. My request URL is
http://localhost:8983/solr/collection1/select?q=*%3A*&rows=0&wt=xml&facet.pivot=cat,popularity,inStock&facet.pivot=popularity,cat&facet=true&facet.field=cat&facet.pivot.mincount=0

but I do not get any facet pivot info in the result

<lst name="params">
  <str name="facet">true</str>
  <str name="q">*:*</str>
  <str name="facet.field">cat</str>
  <str name="facet.pivot.mincount">0</str>
  <str name="wt">xml</str>
  <arr name="facet.pivot">
    <str>cat,popularity,inStock</str>
    <str>popularity,cat</str>
  </arr>
  <str name="rows">0</str>
</lst>
</lst>
<result name="response" numFound="32" start="0" maxScore="1.0"/>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="cat">
      <int name="electronics">14</int>
      <int name="currency">4</int>
      <int name="memory">3</int>
      <int name="connector">2</int>
      <int name="graphics card">2</int>
      <int name="hard drive">2</int>
      <int name="monitor">2</int>
      <int name="search">2</int>
      <int name="software">2</int>
      <int name="camera">1</int>
      <int name="copier">1</int>
      <int name="multifunction printer">1</int>
      <int name="music">1</int>
      <int name="printer">1</int>
      <int name="scanner">1</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>

Does anybody know the reason?



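
Not an answer to the question above, but for reference, a hedged sketch of the
same pivot request built with SolrJ (parameter names taken from the URL; reading
the pivots back would go through QueryResponse.getFacetPivot() in Solr 4.x):

import org.apache.solr.client.solrj.SolrQuery;

public class PivotRequestExample {
    public static void main(String[] args) {
        SolrQuery query = new SolrQuery("*:*");
        query.setRows(0);
        query.setFacet(true);
        query.addFacetField("cat");

        // Two pivot specs, mirroring the two facet.pivot parameters in the URL.
        query.add("facet.pivot", "cat,popularity,inStock");
        query.add("facet.pivot", "popularity,cat");
        query.set("facet.pivot.mincount", 0);

        System.out.println(query); // prints the encoded request parameters
    }
}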


Re: solr.StrField with stored=true useless or bad?

2012-09-11 Thread Amit Nithian
This is great thanks for this post! I was curious about the same thing
and was wondering why fl couldn't return the indexed
representation of a field if that field were only indexed but not
stored. My thought was to return something rather than nothing, but I didn't
pay attention to the fact that getting even the indexed
representation of a field given a document is not fast.

Thanks
Amit

On Tue, Sep 11, 2012 at 4:03 PM,  sy...@web.de wrote:
 Hi,

 I have a StrField to store an URL. The field definition looks like this:
 <field name="link" type="string" indexed="true" stored="true" required="true" />

 Type "string" is defined as usual:
 <fieldType name="string" class="solr.StrField" sortMissingLast="true" />

 Then I realized that a StrField doesn't execute any analyzers and stores data
 verbatim. The data is just a single token.

 The purpose of stored=true is to store the raw string data besides the 
 analyzed/transformed data for displaying purposes. This is fine for an 
 analyzed solr.TextField, but for an StrField both values are the same. So is 
 there any reason to apply stored=true on a StrField as well?

 I ask because I found a lot of sites and tutorials applying stored=true on
 StrFields as well. Do they all do it wrong, or am I missing something here?


Re: Solr - Lucene Debuging help

2012-09-11 Thread Amit Nithian
The wiki should probably be updated... maybe I'll take a stab at it.
I'll also try to update my article referenced there too.

When you check out the project from SVN, run "ant eclipse".

Look at this bug (https://issues.apache.org/jira/browse/SOLR-3817) and
either run the ruby program or download the patch and apply but either
way it should fix the classpath issues.

Then import the project and you can follow the remainder of the steps
in the 
http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse
article.

Cheers
Amit

On Mon, Sep 10, 2012 at 1:29 PM, BadalChhatbar badal...@yahoo.com wrote:
 Hi Steve,

 Thanks, I was able to create new project using that url. :)

 One more thing... it's giving me about 32K errors (something like "this type
 cannot be resolved").

 I tried rebuilding the project and running the ant command (build.xml), but it
 didn't help. Any suggestions on this?


 thanks





Re: In-memory indexing

2012-09-11 Thread Amit Nithian
I have wondered about this too but instead why not just set your cache
sizes large enough to house most/all of your documents and pre-warm
the caches accordingly? My bet is that a large enough document cache
may suffice but that's just a guess.

- Amit

On Mon, Sep 10, 2012 at 10:56 AM, Kiran Jayakumar
kiranjuni...@gmail.com wrote:
 Hi,

 Does anyone have any experience in hosting the entire index in a RAM disk ?
 (I'm not thinking about Lucene's RAM directory). I have some small indexes
 (less than a Gb). Also, please recommend a good RAM disk application for
 Windows (I have used Gizmo, wondering if there's any better one out there).

 Thanks
 Kiran


Re: Solr Sorting Caching

2012-09-11 Thread Amey Patil
Are you using a TrieDateField for the dates?
Yes

Consider creating and re-using a filter for the keywords and let the
query consist of the date range only.
In this case, do I have to configure any cache, or are Solr's default
configurations enough?

Guessing here: You request all the results from the search, which is
potentially 100M documents? Solr is not geared towards such massive
responses. You might have better luck by paging, but even that does not
behave very well when requesting pages very far into the result set.
We have implemented paging, but the problem is the sort. Solr tries to sort the
dates of all the documents satisfying the query, so if numFound is
very large, Solr loads all the date values into memory to sort them
and hence goes OOM. Correct me if I am wrong.


On Tue, Sep 11, 2012 at 12:30 PM, Toke Eskildsen
t...@statsbiblioteket.dk wrote:

 On Tue, 2012-09-11 at 08:00 +0200, Amey Patil wrote:
  Our solr index (Solr 3.4) has over 100 million docuemnts.
 [...]
  *((keyword1 AND keyword2...) OR (keyword3 AND keyword4...) OR ...) AND
  date:[date1 TO *]*
  No. of keywords can be in the range of 100 - 1000.
  We are adding sort parameter *'date asc'*.

 Are you using a TrieDateField for the dates?

  The keyword part of the query changes very rarely but date part always
  changes.

 Consider creating and re-using a filter for the keywords and let the
 query consist of the date range only.

 [...]

  2) Sometimes when 'numFound' is very large for a query, It gives OOM
 error
  (I guess this is because of sort).

 Guessing here: You request all the results from the search, which is
 potentially 100M documents? Solr is not geared towards such massive
 responses. You might have better luck by paging, but even that does not
 behave very well when requesting pages very far into the result set.
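
To make the filter-reuse suggestion concrete, here is a hedged SolrJ sketch:
the rarely-changing keyword expression goes into fq (so its DocSet can be
cached in the filterCache and reused), and only the date range stays in q.
Field names, keywords, and the page size are placeholders.

import org.apache.solr.client.solrj.SolrQuery;

public class FilterReuseExample {
    public static void main(String[] args) {
        SolrQuery query = new SolrQuery();

        // Only the part that changes between runs goes into q.
        query.setQuery("date:[2012-09-01T00:00:00Z TO *]");

        // The large, rarely-changing keyword expression goes into fq,
        // so the resulting DocSet can be cached and reused across requests.
        query.addFilterQuery("(keyword1 AND keyword2) OR (keyword3 AND keyword4)");

        query.addSortField("date", SolrQuery.ORDER.asc);
        query.setRows(100); // page through results instead of fetching everything at once

        System.out.println(query); // prints the encoded request parameters
    }
}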




Re: Semantic document format... standards?

2012-09-11 Thread Paul Libbrecht
Otis,

if you have a bit of time to research, I think your document may look a lot 
like the documents processed by:
http://langtech.jrc.it/
which is a flagship multilingual technology implementation and includes a 
fair amount of entity disambiguation as far as I could hear in Ralph's talk.
I do not have a more concrete pointer, sorry, and I would love to read 
something more concrete and closer to Solr about them.

Paul


On Sep 12, 2012, at 00:46, Jack Krupansky wrote:

 My standard question for such a situation: How are you expecting your users 
 to query this data? Are they expecting simple English/natural language text, 
 or are they expecting structured identifiers that can be keys into other 
 data sources.
 
 For example, are your entities simple text literal names, or might they be 
 Dublin Core (DC) Agent URI identifiers?
 
 Ditto for topics - free text vs. some SKOS vocabulary or other form of 
 taxonomy.
 
 In other words, clue us in as to your client requirements.
 
 -- Jack Krupansky
 
 -Original Message- From: Otis Gospodnetic
 Sent: Tuesday, September 11, 2012 11:51 AM
 To: solr-user@lucene.apache.org
 Subject: Semantic document format... standards?
 
 Hello,
 
 If I'm extracting named entities, topics, key phrases/tags, etc. from 
 documents and I want to have a representation of this document, what format 
 should I use? Are there any standard or at least common formats or approaches 
 people use in such situations?
 
 For example, the most straight forward format might be something like this:
 
 
 <document>
   <title>doc title</title>
   <keywords>meta keywords coming from the web page</keywords>
   <content>page meat</content>
   <entities>name entities recognized in the document</entities>
   <topics>topics extracted by the annotator</topics>
   <tags>tags extracted by the annotator</tags>
   <relations>relations extracted by the annotator</relations>
 </document>
 
 But this is a made up format - the XML tags above are just what somebody 
 happened to pick.
 
 Are there any standard or at least common formats for this?
 
 
 Thanks,
 Otis
 
 Performance Monitoring - Solr - ElasticSearch - HBase - 
 http://sematext.com/spm
 
 Search Analytics - http://sematext.com/search-analytics/index.html 



Re: Getting more proper results

2012-09-11 Thread Ramo Karahasan
Hi,

I've set it to AND and restarted Tomcat, but in my search I get the same
results. So it seems that this doesn't have an effect.

Any ideas?

Ramo

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Wednesday, September 12, 2012 00:34
To: solr-user@lucene.apache.org
Subject: Re: Getting more proper results

 I'm using it that way, because I want to have an autocompletion list. 
 Now I'm wondering if I can influence some of the results I'm getting. 
 I have a lot of categories in my database. If I for example search for 
 "iphone 3" I would expect to get all iphone 3 items from the category 
 "electronic". If I'm searching for "iphone 3" I get the following 
 results: "iPhone - the book", "Apple iphone 4", "My iphone I", etc.

You can try to set your default operator to AND: q.op=AND

If you are not firing a phrase query (with quotes), you can do that too:
q="iphone 3"
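
A brief hedged sketch of both suggestions in SolrJ (the query text is just the
example from this thread):

import org.apache.solr.client.solrj.SolrQuery;

public class DefaultOperatorExample {
    public static void main(String[] args) {
        // Require all terms by overriding the default operator...
        SolrQuery andQuery = new SolrQuery("iphone 3");
        andQuery.set("q.op", "AND");

        // ...or send an explicit phrase query instead.
        SolrQuery phraseQuery = new SolrQuery("\"iphone 3\"");

        System.out.println(andQuery);
        System.out.println(phraseQuery);
    }
}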