Re: 2 solr dataImport requests on a single core at the same time

2010-07-23 Thread kishan

Hi, thank you very much, it solved my problem.
Having multiple request handlers will not degrade performance... unless
we are sending parallel requests? Am I right?


Thanks,
Prasad


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/2-solr-dataImport-requests-on-a-single-core-at-the-same-time-tp978649p989132.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Getting FileNotFoundException with repl command=backup?

2010-07-23 Thread Alexander Rothenberg
Thanks for the info Peter, I think I ran into the same issue some time ago
and could not find out why the backup stopped and also got deleted by Solr.

I decided to stop currently running updates to Solr while a backup is running,
and wrote my own backup handler that simply copies the index files to some
location and rotates older, unneeded backups.

I thought about a cleaner solution where the backup handler would create a
LOCK on the index to prevent incoming updates from being written to the
index (the same happens while an index optimize is running). Once the
LOCK is set, a backup could run without any problems and would remove the
LOCK when done. However, I was never able to create a working LOCK that
prevents incoming updates from being applied...

-- 
Alexander Rothenberg
Fotofinder GmbH USt-IdNr. DE812854514
Software Entwicklung    Web: http://www.fotofinder.net/
Potsdamer Str. 96       Tel: +49 30 25792890
10785 Berlin            Fax: +49 30 257928999

Geschäftsführer: Ali Paczensky
Amtsgericht:     Berlin Charlottenburg (HRB 73099)
Sitz:   Berlin


Re: Tree Faceting in Solr 1.4

2010-07-23 Thread Eric Grobler
Thanks, I saw the article.

As far as I can tell the trunk archives only go back to the middle of March
and the 2 patches are from the beginning of the year.

Thus:

"These approaches can be tried out easily using a single set of sample data
and the Solr example application (assumes current trunk codebase and latest
patches posted to the respective issues)."

is a bit of an over-statement!

Regards
Eric
On Fri, Jul 23, 2010 at 6:22 AM, Jonathan Rochkind rochk...@jhu.edu wrote:

 Solr does not, yet, at least not simply, as far as I know, but there are
 ideas and some JIRA's with maybe some patches:

 http://wiki.apache.org/solr/HierarchicalFaceting


 
 From: rajini maski [rajinima...@gmail.com]
 Sent: Friday, July 23, 2010 12:34 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Tree Faceting in Solr 1.4

 I am also looking out for same feature in Solr and very keen to know
 whether
 it supports this feature of tree faceting... Or we are forced to index in
 tree faceting formatlike

 1/2/3/4
 1/2/3
 1/2
 1
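[Editorial aside: the "1/2/3/4" scheme quoted above means indexing every ancestor path of a document alongside its full path in a multivalued field, so a facet on that field rolls counts up the tree. A sketch of generating those values; the separator and function name are illustrative:]

```python
def ancestor_paths(*parts, sep="/"):
    # For ("India", "Karnataka", "Bangalore"), return every prefix path;
    # all of them would be indexed into one multivalued hierarchy field.
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]

print(ancestor_paths("India", "Karnataka", "Bangalore"))
# ['India', 'India/Karnataka', 'India/Karnataka/Bangalore']
```

Filtering on any prefix (e.g. `India/Karnataka`) then matches every document below that node.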

 In case of multilevel faceting, it will give only a 2-level tree facet, is
 what I found.

 If I give a query such as: country India, state Karnataka, and city
 Bangalore... all I want is a facet count 1) for the condition above, 2) the
 number of states in that country, and 3) the number of cities in that state.

 Like => Country: India, State: Karnataka, City: Bangalore -> 1

 State: Karnataka
  Kerala
  Tamilnadu
  Andhra Pradesh... and so on

 City: Mysore
  Hubli
  Mangalore
  Coorg and so on...


 If I am doing
 facet=on & facet.field={!ex=State}State & fq={!tag=State}State:Karnataka

 all it gives me is facets on State, excluding only that filter query. But I
 was not able to do the same on a third level, like facet.field= give me the
 counts of cities also in state Karnataka.
 Let me know a solution for this...

 Regards,
 Rajani Maski





 On Thu, Jul 22, 2010 at 10:13 PM, Eric Grobler impalah...@googlemail.com
 wrote:

  Thank you for the link.
 
  I was not aware of the multifaceting syntax - this will enable me to run
 1
  less query on the main page!
 
  However this is not a tree faceting feature.
 
  Thanks
  Eric
 
 
 
 
  On Thu, Jul 22, 2010 at 4:51 PM, SR r.steve@gmail.com wrote:
 
   Perhaps the following article can help:
  
 
 http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html
  
   -S
  
  
   On Jul 22, 2010, at 5:39 PM, Eric Grobler wrote:
  
Hi Solr Community
   
If I have:
COUNTRY CITY
Germany Berlin
Germany Hamburg
Spain   Madrid
   
Can I do faceting like:
Germany
 Berlin
 Hamburg
Spain
 Madrid
   
I tried to apply SOLR-792 to the current trunk but it does not seem
 to
  be
compatible.
Maybe there is a similar feature existing in the latest builds?
   
Thanks & Regards
Eric
  
  
 



Re: Duplicates

2010-07-23 Thread Peter Karich
Another possibility could be the well known 'field collapse' ;-)

http://wiki.apache.org/solr/FieldCollapsing

Regards,
Peter.

 Thanks.

 If I set uniqueKey on the field, can I still store duplicates?
 I need to remove duplicates only from search results; the ability to store
 duplicates should remain.

 2010/7/23 Erick Erickson erickerick...@gmail.com

   
 If the field is a single token, just define the uniqueKey on it in your
 schema.

 Otherwise, this may be of interest:
 http://wiki.apache.org/solr/Deduplication

 Haven't used it myself though...

 best
 Erick

 On Thu, Jul 22, 2010 at 6:14 PM, Pavel Minchenkov char...@gmail.com
 wrote:

 
 Hi,

 Is it possible to remove duplicates in search results by a given field?

 Thanks.

 --
 Pavel Minchenkov
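[Editorial aside: the Deduplication approach linked above works by computing a signature from chosen fields and using it to detect duplicates at index time. The idea, reduced to a client-side sketch for illustration — this is not Solr's actual SignatureUpdateProcessor code, and the field names are hypothetical:]

```python
import hashlib

def signature(doc, fields):
    # Hash the concatenation of the chosen field values -- a stand-in for
    # the signature that Solr's Deduplication support computes at index time.
    raw = "|".join(str(doc.get(f, "")) for f in fields)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()

def drop_duplicates(docs, fields):
    # Keep only the first document seen per signature.
    seen, unique = set(), []
    for doc in docs:
        sig = signature(doc, fields)
        if sig not in seen:
            seen.add(sig)
            unique.append(doc)
    return unique

docs = [{"id": 1, "title": "a"}, {"id": 2, "title": "a"}, {"id": 3, "title": "b"}]
print([d["id"] for d in drop_duplicates(docs, ["title"])])  # [1, 3]
```

Doing this in Solr itself (rather than client-side) has the advantage that the duplicate never enters the index in the first place.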



Re: Duplicates

2010-07-23 Thread Pavel Minchenkov
Thanks.

Does it work with Solr 1.4 (Solr 4.0 is mentioned in the article)?
What about performance? I need only to delete duplicates (I don't need a count
of duplicates or to select a certain duplicate).

2010/7/23 Peter Karich peat...@yahoo.de

 Another possibility could be the well known 'field collapse' ;-)

 http://wiki.apache.org/solr/FieldCollapsing

 Regards,
 Peter.

  Thanks.
 
  If I set uniqueKey on the field, then I can save duplicates?
  I need to remove duplicates only from search results. The ability to save
  duplicates are should be.
 
  2010/7/23 Erick Erickson erickerick...@gmail.com
 
 
  If the field is a single token, just define the uniqueKey on it in your
  schema.
 
  Otherwise, this may be of interest:
  http://wiki.apache.org/solr/Deduplication
 
  Haven't used it myself though...
 
  best
  Erick
 
  On Thu, Jul 22, 2010 at 6:14 PM, Pavel Minchenkov char...@gmail.com
  wrote:
 
 
  Hi,
 
  Is it possible to remove duplicates in search results by a given field?
 
  Thanks.
 
  --
  Pavel Minchenkov




-- 
Pavel Minchenkov


Re: Solr on iPad?

2010-07-23 Thread Mark Allan

Hi Stephan,

On the iPad, as with the iPhone, I'm afraid you're stuck with using  
SQLite if you want any form of database in your app.


I suppose if you wanted to get really ambitious and had a lot of time  
on your hands you could use Xcode to try and compile one of the
open-source C-based DBs/Indexers, but as with most things in OS X and iOS
development, if you're bending over yourself trying to implement  
something, you're probably doing it wrongly!  Also, I wouldn't put it  
past the AppStore guardians to reject your app purely on the basis of  
having used something other than SQLite!


Apple's cocoa-dev mailing list is very active if you have problems,  
but do your homework before asking questions or you'll get short shrift.

http://lists.apple.com/cocoa-dev

Mark

On 22 Jul 2010, at 6:12 pm, Stephan Schwab wrote:


Dear Solr community,

does anyone know whether it may be possible or has already been done  
to
bring Solr to the Apple iPad so that applications may use a local  
search

engine?

Greetings,
Stephan



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: Duplicates

2010-07-23 Thread Peter Karich
Hi Pavel!

The patch can be applied to 1.4.
The performance is ok, but for some situations it could be worse than
without the patch.
For us it works well, but others reported some exceptions
(see the patch site: https://issues.apache.org/jira/browse/SOLR-236)

 I need only to delete duplicates

Could you give us an example of what exactly you need?
(Maybe you could index each master document of the 'unique' documents
with an extra field and query for that field?)

Regards,
Peter.

 Thanks.

 Does it work with Solr 1.4 (Solr 4.0 mentioned in article)?
 What about performance? I need only to delete duplicates (I don't need cout
 of duplicates or select certain duplicate).

 2010/7/23 Peter Karich peat...@yahoo.de

   
 Another possibility could be the well known 'field collapse' ;-)

 http://wiki.apache.org/solr/FieldCollapsing

 Regards,
 Peter.

 
 Thanks.

 If I set uniqueKey on the field, then I can save duplicates?
 I need to remove duplicates only from search results. The ability to save
 duplicates are should be.

 2010/7/23 Erick Erickson erickerick...@gmail.com


   
 If the field is a single token, just define the uniqueKey on it in your
 schema.

 Otherwise, this may be of interest:
 http://wiki.apache.org/solr/Deduplication

 Haven't used it myself though...

 best
 Erick

 On Thu, Jul 22, 2010 at 6:14 PM, Pavel Minchenkov char...@gmail.com
 wrote:


 
 Hi,

 Is it possible to remove duplicates in search results by a given field?

 Thanks.

 --
 Pavel Minchenkov
   

 

   


-- 
http://karussell.wordpress.com/



Replacing text fields with numeric fields for speed

2010-07-23 Thread Gora Mohanty
Hi,

  One of the things that we were thinking of doing in order to
speed up results from Solr search is to convert fixed-text fields
(such as values from a drop-down) into numeric fields. The thinking
behind this was that searching through numeric values would be
faster than searching through text. However, I now feel that we
were barking up the wrong tree, as Lucene is probably not doing a
text search per se.

  From some experiments, I see only a small difference between a
text search on a field, and a numeric search on the corresponding
numeric field. This difference can probably be attributed to the
additional processing on the text field. Could someone clarify whether
one can expect a difference in speed between searching through a
fixed-text field and its numeric equivalent?

  I am aware of the benefit of numeric fields for range queries.

Regards,
Gora


Problem with PDF, Solr 1.4.1 Cell

2010-07-23 Thread Alessandro Benedetti
Hi all,
as I saw in this discussion [1], there were many issues with PDF indexing in
Solr 1.4 due to the Tika library (version 0.4).
In Solr 1.4.1 the Tika library is the same, so I guess the issues are the
same.
Could anyone who contributed to the previous thread help me in resolving
these issues?
I need a simple tutorial that could help me upgrade Solr Cell!

Something like this:
1) download Tika core from trunk
2) create a jar with Maven dependencies
3) unjar Solr 1.4.1 and replace the Tika library
4) jar the patched Solr 1.4.1 and enjoy!

[1]
http://markmail.org/message/zbkplnzqho7mxwy3#query:+page:1+mid:gamcxdx34ayt6ccg+state:results

Best regards

-- 
--

Benedetti Alessandro
Personal Page: http://tigerbolt.altervista.org

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Solr on iPad?

2010-07-23 Thread Chantal Ackermann
Hi,

unfortunately for iPad developers, it seems that it is not possible to
use the Spotlight engine through the SDK:

http://stackoverflow.com/questions/3133678/spotlight-search-in-the-application

Chantal

On Fri, 2010-07-23 at 10:16 +0200, Mark Allan wrote:
 Hi Stephan,
 
 On the iPad, as with the iPhone, I'm afraid you're stuck with using  
 SQLite if you want any form of database in your app.
 
 I suppose if you wanted to get really ambitious and had a lot of time  
 on your hands you could use Xcode to try and compile one of the open- 
 source C-based DBs/Indexers, but as with most things in OS X and iOS  
 development, if you're bending over yourself trying to implement  
 something, you're probably doing it wrongly!  Also, I wouldn't put it  
 past the AppStore guardians to reject your app purely on the basis of  
 having used something other than SQLite!
 
 Apple's cocoa-dev mailing list is very active if you have problems,  
 but do your homework before asking questions or you'll get short shrift.
   http://lists.apple.com/cocoa-dev
 
 Mark
 
 On 22 Jul 2010, at 6:12 pm, Stephan Schwab wrote:
 
  Dear Solr community,
 
  does anyone know whether it may be possible or has already been done  
  to
  bring Solr to the Apple iPad so that applications may use a local  
  search
  engine?
 
  Greetings,
  Stephan
 





Re: Replacing text fields with numeric fields for speed

2010-07-23 Thread Gora Mohanty
On Fri, 23 Jul 2010 14:44:32 +0530
Gora Mohanty g...@srijan.in wrote:
[...]
   From some experiments, I see only a small difference between a
 text search on a field, and a numeric search on the corresponding
 numeric field.
[...]

Well, I take that back. Running more rigorous tests with Apache
Bench shows a difference of slightly over a factor of 2 between the
median search time on the numeric field, and on the text field. The
search on the numeric field is, of course, faster. That much
of a difference puzzles me. Would someone knowledgeable about
Lucene indexes care to comment?

Regards,
Gora


Re: filter query on timestamp slowing query???

2010-07-23 Thread oferiko

I don't specify any sort order, and i do request for the score, so it is
ordered based on that.

My schema consists of these fields:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="timestamp" type="pdate" indexed="true" stored="true"
default="NOW" multiValued="false" /> (changing now to tdate)
<field name="type" type="string" indexed="true" stored="true"
required="true" />
<field name="contents" type="text" indexed="true" stored="false"
termVectors="true" />

and a typical query would be:
fl=id,type,timestamp,score&start=0&q=Coca+Cola+pepsi+-dr+pepper&fq=timestamp:[2010-07-07T00:00:00Z+TO+NOW]+AND+(type:x+OR+type:y)&rows=2000
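[Editorial aside on the fq above: because a bare NOW resolves to the current millisecond, every request produces a distinct filter string, so Solr's filter cache can never reuse the entry. Rounding the open end with date math (NOW/DAY) keeps the string stable. A small sketch, with a hypothetical helper name:]

```python
def timestamp_fq(start="2010-07-07T00:00:00Z", granularity="DAY"):
    # NOW/DAY rounds the open end of the range to midnight, so every
    # request issued on the same day yields an identical filter string
    # that the filter cache can reuse; a bare NOW changes constantly.
    return "timestamp:[%s TO NOW/%s]" % (start, granularity)

print(timestamp_fq())  # timestamp:[2010-07-07T00:00:00Z TO NOW/DAY]
```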

thanks again for your time
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/filter-query-on-timestamp-slowing-query-tp977280p989536.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Duplicates

2010-07-23 Thread Peter Karich
Pavel,

hopefully I now understand your usecase :-) but one question:

 I need to select always *one* file per folder, or
 select *only* folders that contain matched files (without files).

What do you mean here with 'or'? Do you have 2 usecases, or would one of them
be sufficient?
Because the second usecase could be solved without the patch: you could index
folders only;
then every prop_N would be a multivalued field, and you don't have the problem
of duplicate folders.

(If you don't mind ugliness, both usecases could even be handled: after you
get the folders,
 grabbing the files which matched could be done in postprocessing.)
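[Editorial aside: the postprocessing idea can be as simple as grouping the matched file documents by their folderId and keeping one hit per folder; a sketch against Pavel's example data, with hypothetical field names:]

```python
def one_file_per_folder(docs):
    # From the matched file docs, keep only the first hit per folderId,
    # preserving the result order (i.e. relevance order).
    picked, result = set(), []
    for doc in docs:
        folder = doc["folderId"]
        if folder not in picked:
            picked.add(folder)
            result.append(doc)
    return result

hits = [{"id": 1, "folderId": 0}, {"id": 3, "folderId": 0},
        {"id": 9, "folderId": 8}, {"id": 11, "folderId": 8}]
print([d["id"] for d in one_file_per_folder(hits)])  # [1, 9]
```

This reproduces the "1, 9" result Pavel asked for, at the cost of fetching more rows than are ultimately shown.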

But I fear the cleanest solution is to use the patch. Hopefully it can be
applied without hassle
against 1.4 or the trunk. If not, please ask on the patch site for assistance.

Regards,
Peter.


 Thanks, Peter!

 I'll try collapsing today.

 Example (sorry if table unformated):

 id |  type  |   prop_1  |  |  prop_N |  folderId
 
  0 | folder |   |  | |
  1 | file   |  val1 |  |  valN1  |   0
  2 | file   |  val3 |  |  valN2  |   0
  3 | file   |  val1 |  |  valN3  |   0
  4 | folder |   |  | |
  5 | folder |   |  | |
  6 | file   |  val3 |  |  valN7  |   6
  7 | file   |  val4 |  |  valN8  |   6
  8 | folder |   |  | |
  9 | file   |  val2 |  |  valN3  |   8
  10| file   |  val1 |  |  valN2  |   8
  11| file   |  val2 |  |  valN5  |   8
  12| folder |   |  | |


 I need to select always *one* file per folder, or
 select *only* folders that contain matched files (without files).

 Query:
 prop_1:val1 OR prop_2:val2

 I need results (document ids):
 1, 9
 or
 0, 8

 2010/7/23 Peter Karich peat...@yahoo.de

   
 Hi Pavel!

 The patch can be applied to 1.4.
 The performance is ok, but for some situations it could be worse than
 without the patch.
 For us it works good, but others reported some exceptions
 (see the patch site: https://issues.apache.org/jira/browse/SOLR-236)

 
 I need only to delete duplicates
   
 Could you give us an example what you exactly need?
 (Maybe you could index each master document of the 'unique' documents
 with an extra field and query for that field?)

 Regards,
 Peter.

 --
 
 Pavel Minchenkov

   


-- 
http://karussell.wordpress.com/



Re: Replacing text fields with numeric fields for speed

2010-07-23 Thread Peter Karich
Gora,

just for my interests:
does apache bench sends different queries, or from the logs, or always
the same query?
If it would be always the same query the cache of solr will come and
make the response time super small.

I would like to find a tool or script that can replay my logfile against Solr
and measure some things ... because at the moment we are using fastbench
and I would like to replace it ;-)

Regards,
Peter.
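[Editorial aside: a minimal log-replay harness along the lines Peter describes could look as follows; the log-line format and the helper names are assumptions, and dedicated tools do this far more thoroughly:]

```python
import re
import time
from statistics import median

QUERY_RE = re.compile(r"[?&]q=([^&\s]+)")

def extract_queries(log_lines):
    # Pull the q= parameter out of request-log lines (format assumed).
    out = []
    for line in log_lines:
        m = QUERY_RE.search(line)
        if m:
            out.append(m.group(1))
    return out

def replay(queries, send):
    # Run each query through `send` (e.g. an HTTP call to /select) and
    # return the median response time in seconds. Replaying real, varied
    # queries avoids the cache-only timings a single repeated query gives.
    timings = []
    for q in queries:
        start = time.perf_counter()
        send(q)
        timings.append(time.perf_counter() - start)
    return median(timings)

log = ["GET /solr/select?q=solr&rows=10", "GET /solr/select?q=lucene"]
print(extract_queries(log))  # ['solr', 'lucene']
```

In practice `send` would wrap something like urllib's urlopen against the Solr select handler.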

 On Fri, 23 Jul 2010 14:44:32 +0530
 Gora Mohanty g...@srijan.in wrote:
 [...]
   
   From some experiments, I see only a small difference between a
 text search on a field, and a numeric search on the corresponding
 numeric field.
 
 [...]

 Well, I take that back. Running more rigorous tests with Apache
 Bench shows a difference of slightly over a factor of 2 between the
 median search time on the numeric field, and on the text field. The
 search on the numeric field is, of course, faster. That much
 of a difference puzzles me. Would someone knowledgeable about
 Lucene indexes care to comment?

 Regards,
 Gora
   


Re: Solr 3.1 dev

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 6:09 AM, Eric Grobler impalah...@googlemail.com wrote:
 I have a few questions :-)

 a) Will the next release of solr be 3.0 (instead of 1.5)?

The next release will be 3.1 (matching the next lucene version off of
the 3x branch).
Trunk is 4.0-dev

 b) How stable/mature is the current 3x version?

For features that are not new, it should be very stable.

 c) Is LocalSolr implemented? where can I find a list of new features?

Solr spatial is partly implemented... currently in trunk.
http://wiki.apache.org/solr/SpatialSearch

 d) Is this the correct method to download the latest stable version?
 svn co https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x

The last official Solr release was 1.4.1
Nightly builds aren't official apache releases... but plenty of people
do use them in production environments (after appropriate testing of
course).

-Yonik
http://www.lucidimagination.com


Re: Solr 3.1 dev

2010-07-23 Thread robert mena
Hi,

is there any wiki/url of the proposed changes or new features that we should
expect with this new release?

On Fri, Jul 23, 2010 at 9:20 AM, Yonik Seeley yo...@lucidimagination.comwrote:

 On Fri, Jul 23, 2010 at 6:09 AM, Eric Grobler impalah...@googlemail.com
 wrote:
  I have a few questions :-)
 
  a) Will the next release of solr be 3.0 (instead of 1.5)?

 The next release will be 3.1 (matching the next lucene version off of
 the 3x branch).
 Trunk is 4.0-dev

  b) How stable/mature is the current 3x version?

 For features that are not new, it should be very stable.

  c) Is LocalSolr implemented? where can I find a list of new features?

 Solr spatial is partly implemented... currently in trunk.
 http://wiki.apache.org/solr/SpatialSearch

  d) Is this the correct method to download the latest stable version?
  svn co https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x

 The last official Solr release was 1.4.1
 Nightly builds aren't official apache releases... but plenty of people
 do use them in production environments (after appropriate testing of
 course).

 -Yonik
 http://www.lucidimagination.com



Re: Tree Faceting in Solr 1.4

2010-07-23 Thread Eric Grobler
Hi Erik,

Thanks for the fast update :-)
I will try it soon.

Regards
Eric

On Fri, Jul 23, 2010 at 2:37 PM, Erik Hatcher erik.hatc...@gmail.comwrote:

 I've updated the SOLR-792 patch to apply to trunk (using the solr/ directory
 as the root still, not the higher-level trunk/).

 This one I think is an important one that I'd love to see eventually become
 part of Solr built-in, but the TODOs in TreeFacetComponent ought to be taken
 care of first, to generalize this to N field levels and maybe some other
 must/nice-to-haves.

Erik



 On Jul 23, 2010, at 3:45 AM, Eric Grobler wrote:

  Thanks, I saw the article.

 As far as I can tell the trunk archives only go back to the middle of
 March
 and the 2 patches are from the beginning of the year.

 Thus:

 "These approaches can be tried out easily using a single set of sample
 data
 and the Solr example application (assumes current trunk codebase and
 latest
 patches posted to the respective issues)."

 is a bit of an over-statement!

 Regards
 Eric
 On Fri, Jul 23, 2010 at 6:22 AM, Jonathan Rochkind rochk...@jhu.edu
 wrote:

  Solr does not, yet, at least not simply, as far as I know, but there are
 ideas and some JIRA's with maybe some patches:

 http://wiki.apache.org/solr/HierarchicalFaceting


 
 From: rajini maski [rajinima...@gmail.com]
 Sent: Friday, July 23, 2010 12:34 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Tree Faceting in Solr 1.4

 I am also looking out for same feature in Solr and very keen to know
 whether
 it supports this feature of tree faceting... Or we are forced to index in
 tree faceting formatlike

 1/2/3/4
 1/2/3
 1/2
 1

 In-case of multilevel faceting it will give only 2 level tree facet is
 what
 i found..

 If i give query as : country India and state Karnataka and city
 bangalore...All what i want is a facet count  1) for condition above. 2)
 The
 number of states in that Country 3) the number of cities in that state
 ...

 Like = Country: India ,State:Karnataka , City: Bangalore 1

   State:Karnataka
Kerla
Tamilnadu
Andra Pradesh...and so on

   City:  Mysore
Hubli
Mangalore
Coorg and so on...


 If I am doing
 facet=on  facet.field={!ex=State}State  fq={!tag=State}State:Karnataka

 All it gives me is Facets on state excluding only that filter query.. But
 i
 was not able to do same on third level ..Like  facet.field= Give me the
 counts of  cities also in state Karantaka..
 Let me know solution for this...

 Regards,
 Rajani Maski





 On Thu, Jul 22, 2010 at 10:13 PM, Eric Grobler 
 impalah...@googlemail.com

 wrote:


  Thank you for the link.

 I was not aware of the multifaceting syntax - this will enable me to run

 1

 less query on the main page!

 However this is not a tree faceting feature.

 Thanks
 Eric




 On Thu, Jul 22, 2010 at 4:51 PM, SR r.steve@gmail.com wrote:

  Perhaps the following article can help:



 http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html


 -S


 On Jul 22, 2010, at 5:39 PM, Eric Grobler wrote:

  Hi Solr Community

 If I have:
 COUNTRY CITY
 Germany Berlin
 Germany Hamburg
 Spain   Madrid

 Can I do faceting like:
 Germany
 Berlin
 Hamburg
 Spain
 Madrid

 I tried to apply SOLR-792 to the current trunk but it does not seem

 to

 be

 compatible.
 Maybe there is a similar feature existing in the latest builds?

 Thanks  Regards
 Eric









solrj occasional timeout on commit

2010-07-23 Thread Nagelberg, Kallin
Hey,

I recently moved a solr app from a testing environment into a production 
environment, and I'm seeing a brand new error which never occurred during 
testing. I'm seeing this in the solrJ-based app logs:


org.apache.solr.common.SolrException: com.caucho.vfs.SocketTimeoutException: 
client timeout

com.caucho.vfs.SocketTimeoutException: client timeout

request: http://somehost:8080/solr/live/update?wt=javabinversion=1

at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)

at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)

at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)




This occurs in a service that periodically adds new documents to solr. There 
are 4 boxes that could be doing updates in parallel. In testing there were 2.





We're running on a new Resin 4 based install in production, whereas we were 
using Resin 3 in testing. Does anyone have any ideas? Help would be greatly 
appreciated!



Thanks,

-Kallin Nagelberg







Re: Tree Faceting in Solr 1.4

2010-07-23 Thread Geert-Jan Brits
If I am doing
facet=on & facet.field={!ex=State}State & fq={!tag=State}State:Karnataka

All it gives me is Facets on state excluding only that filter query.. But i
was not able to do same on third level ..Like  facet.field= Give me the
counts of  cities also in state Karantaka..
Let me know solution for this...

This looks like regular faceting to me.

1. Showing city counts given a state:
facet=on&fq=State:Karnataka&facet.field=city

2. Showing state counts given a country (similar to 1):
facet=on&fq=Country:India&facet.field=state

3. Showing city and state counts given a country:
facet=on&fq=Country:India&facet.field=state&facet.field=city

4. Showing city counts given a state + all other states not filtered by the
current state (see
http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters
):
facet=on&fq={!tag=State}state:Karnataka&facet.field={!ex=State}state&facet.field=city

5. Showing state + city counts given a country + all other countries not
filtered by the current country (similar to 4, same wiki page):
facet=on&fq={!tag=country}country:India&facet.field={!ex=country}country&facet.field=city&facet.field=state

etc.
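[Editorial aside: query strings like case 4 are easy to get wrong by hand, since the local-params braces need URL-escaping; a small sketch that builds them programmatically — the function name is illustrative:]

```python
from urllib.parse import urlencode

def tagged_facet_query(tag, fq_value, excluded_field, *extra_facets):
    # Case 4 above: a filter tagged `tag`, a facet on `excluded_field`
    # that excludes that filter, plus plain facets on any extra fields.
    params = [
        ("facet", "on"),
        ("fq", "{!tag=%s}%s" % (tag, fq_value)),
        ("facet.field", "{!ex=%s}%s" % (tag, excluded_field)),
    ]
    params += [("facet.field", f) for f in extra_facets]
    return urlencode(params)  # handles escaping of {, !, }, : etc.

print(tagged_facet_query("State", "state:Karnataka", "state", "city"))
```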

This has nothing to do with hierarchical faceting as described in SOLR-792,
btw, although I understand the possible confusion, as country > state > city
can obviously be seen as some sort of hierarchy. The first part of your
question seemed to be more about hierarchical faceting as per SOLR-792, but I
couldn't quite distill a question from that part.

Also, just a suggestion: consider using IDs instead of names for filtering;
you will get burned sooner or later otherwise.

HTH,

Geert-Jan



2010/7/23 rajini maski rajinima...@gmail.com

 I am also looking out for same feature in Solr and very keen to know
 whether
 it supports this feature of tree faceting... Or we are forced to index in
 tree faceting formatlike

 1/2/3/4
 1/2/3
 1/2
 1

 In-case of multilevel faceting it will give only 2 level tree facet is what
 i found..

 If i give query as : country India and state Karnataka and city
 bangalore...All what i want is a facet count  1) for condition above. 2)
 The
 number of states in that Country 3) the number of cities in that state ...

 Like = Country: India ,State:Karnataka , City: Bangalore 1

 State:Karnataka
  Kerla
  Tamilnadu
  Andra Pradesh...and so on

 City:  Mysore
  Hubli
  Mangalore
  Coorg and so on...


 If I am doing
 facet=on  facet.field={!ex=State}State  fq={!tag=State}State:Karnataka

 All it gives me is Facets on state excluding only that filter query.. But i
 was not able to do same on third level ..Like  facet.field= Give me the
 counts of  cities also in state Karantaka..
 Let me know solution for this...

 Regards,
 Rajani Maski





 On Thu, Jul 22, 2010 at 10:13 PM, Eric Grobler impalah...@googlemail.com
 wrote:

  Thank you for the link.
 
  I was not aware of the multifaceting syntax - this will enable me to run
 1
  less query on the main page!
 
  However this is not a tree faceting feature.
 
  Thanks
  Eric
 
 
 
 
  On Thu, Jul 22, 2010 at 4:51 PM, SR r.steve@gmail.com wrote:
 
   Perhaps the following article can help:
  
 
 http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html
  
   -S
  
  
   On Jul 22, 2010, at 5:39 PM, Eric Grobler wrote:
  
Hi Solr Community
   
If I have:
COUNTRY CITY
Germany Berlin
Germany Hamburg
Spain   Madrid
   
Can I do faceting like:
Germany
 Berlin
 Hamburg
Spain
 Madrid
   
I tried to apply SOLR-792 to the current trunk but it does not seem
 to
  be
compatible.
Maybe there is a similar feature existing in the latest builds?
   
Thanks  Regards
Eric
  
  
 



Re: Tree Faceting in Solr 1.4

2010-07-23 Thread Eric Grobler
Hi Erik,

I must be doing something wrong :-(
I took:
svn co https://svn.apache.org/repos/asf/lucene/dev/trunk  mytest
  then I copied SOLR-792.patch to folder /mytest/solr
then I ran:
  patch -p1 < SOLR-792.patch

but I get "can't find file to patch at input line 5".
Is this the correct trunk and patch command?

However, if I just manually
  - copy TreeFacetComponent.java to folder
solr/src/java/org/apache/solr/handler/component
  - add SimpleOrderedMap<SimpleOrderedMap> _treeFacets; to
ResponseBuilder.java
  - and make the changes to solrconfig.xml
I am able to compile and run your test :-)

Regards
Eric


On Fri, Jul 23, 2010 at 2:37 PM, Erik Hatcher erik.hatc...@gmail.comwrote:

 I've update the SOLR-792 patch to apply to trunk (using the solr/ directory
 as the root still, not the higher-level trunk/).

 This one I think is an important one that I'd love to see eventually part
 of Solr built-in, but the TODO's in TreeFacetComponent ought to be taken
 care of first, to generalize this to N fields levels and maybe some other
 must/nice-to-haves.

Erik



 On Jul 23, 2010, at 3:45 AM, Eric Grobler wrote:

  Thanks, I saw the article.

 As far as I can tell the trunk archives only go back to the middle of
 March
 and the 2 patches are from the beginning of the year.

 Thus:

 "These approaches can be tried out easily using a single set of sample
 data
 and the Solr example application (assumes current trunk codebase and
 latest
 patches posted to the respective issues)."

 is a bit of an over-statement!

 Regards
 Eric
 On Fri, Jul 23, 2010 at 6:22 AM, Jonathan Rochkind rochk...@jhu.edu
 wrote:

  Solr does not, yet, at least not simply, as far as I know, but there are
 ideas and some JIRA's with maybe some patches:

 http://wiki.apache.org/solr/HierarchicalFaceting


 
 From: rajini maski [rajinima...@gmail.com]
 Sent: Friday, July 23, 2010 12:34 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Tree Faceting in Solr 1.4

 I am also looking out for same feature in Solr and very keen to know
 whether
 it supports this feature of tree faceting... Or we are forced to index in
 tree faceting formatlike

 1/2/3/4
 1/2/3
 1/2
 1

 In-case of multilevel faceting it will give only 2 level tree facet is
 what
 i found..

 If i give query as : country India and state Karnataka and city
 bangalore...All what i want is a facet count  1) for condition above. 2)
 The
 number of states in that Country 3) the number of cities in that state
 ...

 Like = Country: India ,State:Karnataka , City: Bangalore 1

   State:Karnataka
Kerla
Tamilnadu
Andra Pradesh...and so on

   City:  Mysore
Hubli
Mangalore
Coorg and so on...


 If I am doing
 facet=on  facet.field={!ex=State}State  fq={!tag=State}State:Karnataka

 All it gives me is Facets on state excluding only that filter query.. But
 i
 was not able to do same on third level ..Like  facet.field= Give me the
 counts of  cities also in state Karantaka..
 Let me know solution for this...

 Regards,
 Rajani Maski





 On Thu, Jul 22, 2010 at 10:13 PM, Eric Grobler 
 impalah...@googlemail.com

 wrote:


  Thank you for the link.

 I was not aware of the multifaceting syntax - this will enable me to run

 1

 less query on the main page!

 However this is not a tree faceting feature.

 Thanks
 Eric




 On Thu, Jul 22, 2010 at 4:51 PM, SR r.steve@gmail.com wrote:

  Perhaps the following article can help:



 http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html


 -S


 On Jul 22, 2010, at 5:39 PM, Eric Grobler wrote:

  Hi Solr Community

 If I have:
 COUNTRY CITY
 Germany Berlin
 Germany Hamburg
 Spain   Madrid

 Can I do faceting like:
 Germany
 Berlin
 Hamburg
 Spain
 Madrid

 I tried to apply SOLR-792 to the current trunk but it does not seem

 to

 be

 compatible.
 Maybe there is a similar feature existing in the latest builds?

 Thanks  Regards
 Eric









Re: Duplicates

2010-07-23 Thread Pavel Minchenkov
I mean two use cases.
I can't index only folders, because I have other queries on files. Or I would
have to build another index that contains only folders, but then I would have
to take care of synchronizing folders between the two indexes.
Are range, spatial, etc. queries supported on multivalued fields?

2010/7/23 Peter Karich peat...@yahoo.de

 Pavel,

 hopefully I understand now your usecase :-) but one question:

   I need to always select *one* file per folder or
   select *only* the folders that contain matched files (without the files).

 What do you mean here with 'or'? Do you have 2 usecases or would one of
 them be sufficient?
  Because the second use case could be solved without the patch: you could
  index only folders;
  then all prop_N will be multivalued fields, and you don't have the problem
  of duplicate folders.

  (If you don't mind ugliness, both use cases could even be handled: after you
  get the folders, grabbing the files which matched can be done in
  postprocessing.)
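
The postprocessing idea is simple enough to sketch: after getting the matched file documents back, keep the first file per folderId, and the distinct folder ids give the "folders only" view. A rough illustration using the fields from the example table in this thread (plain Python, not Solr code):

```python
# Matched file documents as a query might return them, using the
# id/folderId fields from the example table in this thread.
matches = [
    {"id": 1, "folderId": 0, "prop_1": "val1"},
    {"id": 3, "folderId": 0, "prop_1": "val1"},
    {"id": 9, "folderId": 8, "prop_2": "val2"},
    {"id": 11, "folderId": 8, "prop_2": "val2"},
]

def one_file_per_folder(matches):
    """Keep only the first matched file for each folder."""
    seen, kept = set(), []
    for doc in matches:
        if doc["folderId"] not in seen:
            seen.add(doc["folderId"])
            kept.append(doc)
    return kept

files = one_file_per_folder(matches)
folder_ids = sorted({d["folderId"] for d in matches})  # "folders only" view
print([d["id"] for d in files])  # [1, 9]
print(folder_ids)                # [0, 8]
```

This reproduces the expected results from the example (files 1 and 9, or folders 0 and 8), at the cost of over-fetching rows from Solr before collapsing them client-side.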

 But I fear the cleanest solution is to use the patch. Hopefully it can be
 applied without hassles
 against 1.4 or the trunk. If not, please ask on the patch-site for
 assistance.

 Regards,
 Peter.


  Thanks, Peter!
 
  I'll try collapsing today.
 
  Example (sorry if table unformated):
 
  id |  type  |   prop_1  |  |  prop_N |  folderId
  
   0 | folder |   |  | |
   1 | file   |  val1 |  |  valN1  |   0
   2 | file   |  val3 |  |  valN2  |   0
   3 | file   |  val1 |  |  valN3  |   0
   4 | folder |   |  | |
   5 | folder |   |  | |
   6 | file   |  val3 |  |  valN7  |   6
   7 | file   |  val4 |  |  valN8  |   6
   8 | folder |   |  | |
   9 | file   |  val2 |  |  valN3  |   8
   10| file   |  val1 |  |  valN2  |   8
   11| file   |  val2 |  |  valN5  |   8
   12| folder |   |  | |
 
 
  I need to always select *one* file per folder or
  select *only* the folders that contain matched files (without the files).
 
  Query:
  prop_1:val1 OR prop_2:val2
 
  I need results (document ids):
  1, 9
  or
  0, 8
 
  2010/7/23 Peter Karich peat...@yahoo.de
 
 
  Hi Pavel!
 
  The patch can be applied to 1.4.
  The performance is ok, but for some situations it could be worse than
  without the patch.
  For us it works good, but others reported some exceptions
  (see the patch site: https://issues.apache.org/jira/browse/SOLR-236)
 
 
  I need only to delete duplicates
 
  Could you give us an example what you exactly need?
  (Maybe you could index each master document of the 'unique' documents
  with an extra field and query for that field?)
 
  Regards,
  Peter.
 
  --
 
  Pavel Minchenkov
 
 


 --
 http://karussell.wordpress.com/




-- 
Pavel Minchenkov


Re: Solr on iPad?

2010-07-23 Thread Stephan Schwab

Thanks Mark!

I'm subscribing to the cocoa-dev list.

On Jul 23, 2010, at 10:17 AM, Mark Allan [via Lucene] wrote:

 Hi Stephan, 
 
 On the iPad, as with the iPhone, I'm afraid you're stuck with using   
 SQLite if you want any form of database in your app. 
 
 I suppose if you wanted to get really ambitious and had a lot of time   
 on your hands you could use Xcode to try and compile one of the open- 
 source C-based DBs/Indexers, but as with most things in OS X and iOS   
 development, if you're bending over yourself trying to implement   
 something, you're probably doing it wrongly!  Also, I wouldn't put it   
 past the AppStore guardians to reject your app purely on the basis of   
 having used something other than SQLite! 
 
 Apple's cocoa-dev mailing list is very active if you have problems,   
 but do your homework before asking questions or you'll get short shrift. 
 http://lists.apple.com/cocoa-dev
 
 Mark 
 
 On 22 Jul 2010, at 6:12 pm, Stephan Schwab wrote: 
 
  Dear Solr community, 
  
  does anyone know whether it may be possible or has already been done   
  to 
  bring Solr to the Apple iPad so that applications may use a local   
  search 
  engine? 
  
  Greetings, 
  Stephan
 
 
 -- 
 The University of Edinburgh is a charitable body, registered in 
 Scotland, with registration number SC005336. 
 
 
 
 View message @ 
 http://lucene.472066.n3.nabble.com/Solr-on-iPad-tp987655p989269.html 
 To unsubscribe from Solr on iPad?, click here.
 


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-on-iPad-tp987655p990034.html
Sent from the Solr - User mailing list archive at Nabble.com.


Autocommit not happening

2010-07-23 Thread John DeRosa
Hi! I'm a Solr newbie, and I don't understand why autocommits aren't happening 
in my Solr installation.

My one server running Solr:

- Ubuntu 10.04 (Lucid Lynx), with all the latest updates.
- Solr 1.4.0 running on Tomcat6
- Installation was done via apt-get install solr-common solr-tomcat 
tomcat6-admin

My solrconfig.xml has:
<autoCommit> 
  <maxDocs>1</maxDocs>
  <maxTime>1</maxTime> 
</autoCommit>


My code can add documents just fine. But after 12 hours, autocommit has never 
happened! Here's what I see on my Solr Admin pages:

CORE:   
name:   core  
class:   
version:1.0  
description:SolrCore  
stats:  coreName : 
startTime : Thu Jul 22 21:38:30 UTC 2010 
refCount : 2 
aliases : [] 
name:   searcher  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  searcherName : searc...@10ed7f5c main 
caching : true 
numDocs : 0 
maxDoc : 0 
reader : 
SolrIndexReader{this=509f662e,r=readonlydirectoryrea...@509f662e,refCnt=1,segments=0}
 
readerDir : org.apache.lucene.store.NIOFSDirectory@/var/lib/solr/data/index 
indexVersion : 1279834591965 
openedAt : Thu Jul 22 23:58:28 UTC 2010 
registeredAt : Thu Jul 22 23:58:28 UTC 2010 
warmupTime : 3 
name:   searc...@10ed7f5c main  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  searcherName : searc...@10ed7f5c main 
caching : true 
numDocs : 0 
maxDoc : 0 
reader : 
SolrIndexReader{this=509f662e,r=readonlydirectoryrea...@509f662e,refCnt=1,segments=0}
 
readerDir : org.apache.lucene.store.NIOFSDirectory@/var/lib/solr/data/index 
indexVersion : 1279834591965 
openedAt : Thu Jul 22 23:58:28 UTC 2010 
registeredAt : Thu Jul 22 23:58:28 UTC 2010 
warmupTime : 3 


UPDATE HANDLERS:

name:   updateHandler  
class:  org.apache.solr.update.DirectUpdateHandler2  
version:1.0  
description:Update handler that efficiently directly updates the on-disk 
main lucene index  
stats:  commits : 2 
autocommits : 0 
optimizes : 0 
rollbacks : 0 
expungeDeletes : 0 
docsPending : 496590 
adds : 496590 
deletesById : 0 
deletesByQuery : 0 
errors : 0 
cumulative_adds : 501989 
cumulative_deletesById : 0 
cumulative_deletesByQuery : 2 
cumulative_errors : 0 


There are nearly 500K pending document adds, accumulated over the past 12 hours. I 
think we're past the specified autocommit limits. :-)

What should I look at to figure out what's preventing autocommits?

Thank you all in advance!

John
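
For reference, the maxDocs/maxTime thresholds are meant to behave roughly like the following sketch of a commit tracker. This is an illustration of the intended behaviour only, not Solr's actual DirectUpdateHandler2 code:

```python
class CommitTracker:
    """Simplified autocommit logic: commit when either the pending-doc
    count or the time elapsed since the first pending add crosses its
    configured threshold."""
    def __init__(self, max_docs, max_time_ms):
        self.max_docs, self.max_time_ms = max_docs, max_time_ms
        self.pending, self.first_add = 0, None
        self.commits = 0

    def add_doc(self, now_ms):
        if self.pending == 0:
            self.first_add = now_ms   # clock starts at the first pending add
        self.pending += 1
        if (self.pending >= self.max_docs or
                now_ms - self.first_add >= self.max_time_ms):
            self.commit()

    def commit(self):
        self.commits += 1
        self.pending, self.first_add = 0, None

tracker = CommitTracker(max_docs=1, max_time_ms=1)
for i in range(5):
    tracker.add_doc(now_ms=i)
print(tracker.commits)  # 5 -- with maxDocs=1, every add triggers a commit
```

With maxDocs=1 every single add should trigger a commit, so half a million pending docs means the thresholds are never being consulted at all; the "AutoCommit: disabled" log line in the follow-up message points the same way (my assumption: the autoCommit element is commented out, outside updateHandler, or in a solrconfig.xml that isn't the one actually being loaded).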



RE: filter query on timestamp slowing query???

2010-07-23 Thread Jonathan Rochkind
 and a typical query would be:
 fl=id,type,timestamp,score&start=0&q=Coca+Cola+pepsi+-dr+pepper&fq=timestamp:[2010-07-07T00:00:00Z+TO+NOW]+AND+(type:x+OR+type:y)&rows=2000

My understanding is that this is essentially what the solr 1.4 trie date fields 
are made for, I'd use them, should speed things up.  Not sure where the best 
documentation for them is, but see:

http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/
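
The reason trie fields speed up range queries is that each value is indexed at several precisions, so a range can be covered by a handful of coarse buckets instead of one term per distinct value. A toy illustration of the idea (my own simplification, not Lucene's actual trie encoding):

```python
def trie_terms(value, shifts=(0, 4, 8)):
    """Index-time side of the idea: store each value at several
    precisions (the raw value plus coarser, shifted-down buckets)."""
    return {(s, value >> s) for s in shifts}

def range_query_terms(lo, hi, shifts=(0, 4, 8)):
    """Query-time side: greedily cover [lo, hi] with the coarsest
    buckets that fit, so far fewer terms need matching than one
    term per distinct value in the range."""
    terms, v = [], lo
    while v <= hi:
        for s in reversed(shifts):        # try the coarsest bucket first
            size = 1 << s
            if v % size == 0 and v + size - 1 <= hi:
                terms.append((s, v >> s))
                v += size
                break
    return terms

# Covering 0..255 needs a single coarse term instead of 256 exact ones:
print(len(range_query_terms(0, 255)))   # 1
# An awkwardly aligned range still needs far fewer terms than values:
print(len(range_query_terms(3, 300)))
```

A plain (non-trie) date field behaves like the shifts=(0,) case: every second in the range is its own term, which is exactly what makes wide timestamp ranges slow.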




Re: filter query on timestamp slowing query???

2010-07-23 Thread oferiko

I'm in the process of indexing my demo data to test that; I'll have more
valid data on whether or not it made the difference in a few days.
Thanks


On 23/07/2010, at 19:42, Jonathan Rochkind [via Lucene] 
ml-node+990234-2085494904-316...@n3.nabble.com wrote:

  and a typical query would be:

 fl=id,type,timestamp,score&start=0&q=Coca+Cola+pepsi+-dr+pepper&fq=timestamp:[2010-07-07T00:00:00Z+TO+NOW]+AND+(type:x+OR+type:y)&rows=2000

My understanding is that this is essentially what the solr 1.4 trie date
fields are made for, I'd use them, should speed things up.  Not sure where
the best documentation for them is, but see:

http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/




--
 View message @
http://lucene.472066.n3.nabble.com/filter-query-on-timestamp-slowing-query-tp977280p990234.html
To unsubscribe from Re: filter query on timestamp slowing query???, click
here (link removed) =.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/filter-query-on-timestamp-slowing-query-tp977280p990337.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Autocommit not happening

2010-07-23 Thread John DeRosa
On Jul 23, 2010, at 9:37 AM, John DeRosa wrote:

 Hi! I'm a Solr newbie, and I don't understand why autocommits aren't 
 happening in my Solr installation.
 
 My one server running Solr:
 
 - Ubuntu 10.04 (Lucid Lynx), with all the latest updates.
 - Solr 1.4.0 running on Tomcat6
 - Installation was done via apt-get install solr-common solr-tomcat 
 tomcat6-admin
 
 My solrconfig.xml has:
 <autoCommit> 
   <maxDocs>1</maxDocs>
   <maxTime>1</maxTime> 
 </autoCommit>
 

[snip]

The plot thickens. var/log/tomcat6/catalina.out contains:

Jul 22, 2010 9:36:32 PM 
org.apache.solr.update.DirectUpdateHandler2$CommitTracker init
INFO: AutoCommit: disabled

What's stepping in and disabling autocommit?

John



Allow custom overrides

2010-07-23 Thread Charlie Jackson
I need to implement a search engine that will allow users to override
pieces of data and then search against or view that data. For example, a
doc that has the following values:

 

DocId   Fulltext              Meta1   Meta2   Meta3

1       The quick brown fox   foo     foo     foo

 

Now say a user overrides Meta2 :

 

DocId   Fulltext              Meta1   Meta2   Meta3

1       The quick brown fox   foo     foo     foo

                                      bar

 

For that user, if they search for Meta2:bar, I need to hit, but no other
user should hit on it. Likewise, if that user searches for Meta2:foo, it
should not hit. Also, any searches against that document for that user
should return the value 'bar' for Meta2, but should return 'foo' for
other users.  

 

I'm not sure the best way to implement this. Maybe I could do this with
field collapsing somehow? Or with payloads? Custom analyzer? Any help
would be appreciated.
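
One common way to approach this (an assumption on my part, Solr has nothing for it out of the box) is to keep per-user override records alongside the base documents and resolve them at query/display time. The required semantics from the example can be sketched like this (plain Python; the data and function names are hypothetical):

```python
base_docs = {1: {"Fulltext": "The quick brown fox", "Meta1": "foo",
                 "Meta2": "foo", "Meta3": "foo"}}
# Per-user overrides, keyed by (user, doc_id) -- only changed fields stored.
overrides = {("alice", 1): {"Meta2": "bar"}}

def doc_for(user, doc_id):
    """Base document with that user's overrides applied on top."""
    merged = dict(base_docs[doc_id])
    merged.update(overrides.get((user, doc_id), {}))
    return merged

def search(user, field, value):
    """A user's search sees only the overridden view of each document."""
    return [i for i in base_docs if doc_for(user, i).get(field) == value]

print(search("alice", "Meta2", "bar"))  # [1]  -- only alice hits on bar
print(search("bob", "Meta2", "bar"))    # []
print(search("alice", "Meta2", "foo"))  # []  -- the override masks foo
print(doc_for("bob", 1)["Meta2"])       # foo
```

Doing this purely inside a Solr index would mean indexing each override as a separate document tagged with the user id and resolving the precedence at query time; the sketch above only pins down the behaviour any such scheme has to reproduce.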

 

 

- Charlie

 



RE: filter query on timestamp slowing query???

2010-07-23 Thread Jonathan Rochkind

 and a typical query would be:

fl=id,type,timestamp,score&start=0&q=Coca+Cola+pepsi+-dr+pepper&fq=timestamp:[2010-07-07T00:00:00Z+TO+NOW]+AND+(type:x+OR+type:y)&rows=2000

On top of using trie dates, you might consider separating the timestamp portion 
and the type portion of the fq into separate fq parameters -- that will allow 
them to be stored in the filter cache separately. So for instance, if you 
include "type:x OR type:y" in queries a lot, but with different date ranges, 
then when you make a new query, the set for "type:x OR type:y" can be pulled 
from the filter cache and intersected with the other result set, and that 
portion won't have to be run again. That's probably not where your slowness is 
coming from, but it shouldn't hurt. 

Multiple fq's are essentially AND'd together, so whenever you have an fq 
that is separate clauses AND'd together, you can always separate them into 
multiple fq's; it won't affect the result set, but it will affect the caching 
possibilities. 
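
The caching benefit of splitting the fq can be shown with plain sets: each fq string maps to a cached doc-id set, and a query just intersects whatever cached sets it names. Illustrative only (Solr's filterCache holds DocSets, not Python sets, and the index/predicates here are invented):

```python
filter_cache = {}

index = {
    1: {"type": "x", "day": 5},
    2: {"type": "y", "day": 9},
    3: {"type": "z", "day": 5},
}

def docset_for(fq):
    """An fq is a (name, predicate) pair; the doc set is computed once
    per name and then served from the cache."""
    name, pred = fq
    if name not in filter_cache:
        filter_cache[name] = {i for i, doc in index.items() if pred(doc)}
    return filter_cache[name]

type_xy = ("type:x OR type:y", lambda d: d["type"] in ("x", "y"))
recent  = ("day:[5 TO 9]",     lambda d: 5 <= d["day"] <= 9)
older   = ("day:[1 TO 4]",     lambda d: 1 <= d["day"] <= 4)

def query(fqs):
    result = set(index)
    for fq in fqs:
        result &= docset_for(fq)   # intersect independently cached sets
    return result

print(sorted(query([type_xy, recent])))  # [1, 2]
# A second query with a different date range reuses the cached type set:
print(sorted(query([type_xy, older])))   # []
print(len(filter_cache))                 # 3 -- type set was computed once
```

If the type clause and the date clause were glued into one fq string, each distinct combination would be its own cache entry and nothing would ever be reused across date ranges.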

RE: Novice seeking help to change filters to search without diacritics

2010-07-23 Thread Steven A Rowe
Hi HSingh,

Maybe the mapping file I attached to 
https://issues.apache.org/jira/browse/SOLR-2013 will help?

Steve

 -Original Message-
 From: HSingh [mailto:hsin...@gmail.com]
 Sent: Thursday, July 22, 2010 11:30 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Novice seeking help to change filters to search without
 diacritics
 
 
 Hoss, thank you for your helpful response!
 
 : i think what's confusing you is that you are using the
 : MappingCharFilterFactory with that file in your text field type to
 : convert any ISOLatin1Accent characters to their base characters
 
  The problem is that a large range of characters is not getting converted
  to their base characters.  The ASCIIFoldingFilterFactory handles this
 conversion for the entire Latin character set, including the extended sets
 without having to specify individual characters and their equivalent base
 characters.
 
  Is there a way for me to switch to ASCIIFoldingFilterFactory?  If so, what
  changes do I need to make to these files?  I would appreciate your help!
 --
 View this message in context: http://lucene.472066.n3.nabble.com/Novice-
 seeking-help-to-change-filters-to-search-without-diacritics-
 tp971263p988890.html
 Sent from the Solr - User mailing list archive at Nabble.com.


RE: Spellcheck help

2010-07-23 Thread Dyer, James
In org.apache.solr.spelling.SpellingQueryConverter, find the line (#84):

final static String PATTERN = "(?:(?!(" + NMTOKEN + ":|\\d+)))[\\p{L}_\\-0-9]+";

and remove the "|\\d+" to make it:

final static String PATTERN = "(?:(?!" + NMTOKEN + ":))[\\p{L}_\\-0-9]+";

My testing shows this solves your problem.  The caveat is to test it against 
all your use cases, because obviously someone thought we should ignore leading 
digits in keywords.  Surely there's a reason, although I can't think of it.
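
The effect of that |\\d+ alternative in the lookahead can be reproduced with an ASCII approximation of the pattern (Python re here, with [A-Za-z0-9_\-] standing in for Java's [\p{L}_\-0-9], and the NMTOKEN field-name guard dropped; this simulates the behaviour, it is not the Solr class):

```python
import re

# With the digit guard: a token may not start at a digit, so the engine
# skips the leading "3" and tokenizes only "dsmax".
with_digit_guard = re.compile(r"(?!\d)[A-Za-z0-9_\-]+")
# With the guard removed, leading digits stay part of the token.
without_guard = re.compile(r"[A-Za-z0-9_\-]+")

print(with_digit_guard.findall("3dsmax tutorial"))  # ['dsmax', 'tutorial']
print(without_guard.findall("3dsmax tutorial"))     # ['3dsmax', 'tutorial']
```

That first behaviour matches the "33dsmax" symptom exactly: the spellchecker only ever saw "dsmax", suggested "3dsmax" as a correction, and the collation re-attached the skipped leading "3".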

James Dyer
E-Commerce Systems
Ingram Book Company
(615) 213-4311

-Original Message-
From: dekay...@hotmail.com [mailto:dekay...@hotmail.com] 
Sent: Saturday, July 17, 2010 12:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck help

Can anybody help me with this? :(

-Original Message- 
From: Marc Ghorayeb
Sent: Thursday, July 08, 2010 9:46 AM
To: solr-user@lucene.apache.org
Subject: Spellcheck help


Hello,

I've been trying to get rid of a bug when using the spellcheck, but so far 
with no success :( When searching for a word that starts with a number, for 
example "3dsmax", I get the results that I want, BUT the spellcheck says it 
is not correctly spelled AND the collation gives me "33dsmax". Further 
investigation shows that the spellcheck is actually only checking "dsmax", 
which it considers does not exist, and it gives me "3dsmax" for better 
results; but since I have spellcheck.collate = true, the collation that I 
show is "33dsmax", with the first 3 being the one discarded by the 
spellchecker... Otherwise, the spellcheck works correctly for normal words... 
any ideas? :(

My spellcheck field is fairly classic: whitespace tokenizer, with a 
lowercase filter... Any help would be greatly appreciated :)

Thanks,
Marc
_
Messenger arrive enfin sur iPhone ! Venez le télécharger gratuitement !
http://www.messengersurvotremobile.com/?d=iPhone 



Re: Replacing text fields with numeric fields for speed

2010-07-23 Thread Gora Mohanty
On Fri, 23 Jul 2010 14:33:54 +0200
Peter Karich peat...@yahoo.de wrote:

 Gora,
 
 just for my interests:
 does apache bench sends different queries, or from the logs, or
 always the same query?
 If it would be always the same query the cache of solr will come
 and make the response time super small.

Yes, the way that things are set up currently the query is always
the same. My reasoning was that the effect of the Solr cache should
be the same for both numeric, and text fields. I am going to be
trying some more rigorous tests, such as turning off Solr caching,
and pre-warming the query before running the tests.

 I would like to find a tool or script where I can send my logfile
 to solr and measure some things ... because at the moment we are
 using fastbench and I would like to replace it ;-)

Not sure what fastbench is, but using Solr logs as a tool to
measure search times for typical searches is an interesting idea.
Hmm, we will also need to do that, so maybe we can compare notes on
this.

Regards,
Gora


help with a schema design problem

2010-07-23 Thread Pramod Goyal
Hi,

Let's say I have a table with 3 columns: document id, Party Value and Party Type.
In this table I have 3 rows. 1st row: Document id: 1, Party Value: Pramod,
Party Type: Client. 2nd row: Document id: 1, Party Value: Raj, Party Type:
Supplier. 3rd row: Document id: 2, Party Value: Pramod, Party Type: Supplier.
Now with this table, if I use SQL it is easy for me to find all documents with
Party Value "Pramod" and Party Type "Client".

I need to design a Solr schema so that I can do the same in Solr. If I create
2 fields in the Solr schema, Party Value and Party Type, both of them
multivalued, and try to query +Pramod +Supplier, then Solr will return me the
first document, even though in the first document Pramod is a client and not a
supplier.
Thanks,
Pramod Goyal


RE: help with a schema design problem

2010-07-23 Thread Nagelberg, Kallin
I think you just want something like:

p_value:Pramod AND p_type:Supplier

no?
-Kallin Nagelberg

-Original Message-
From: Pramod Goyal [mailto:pramod.go...@gmail.com] 
Sent: Friday, July 23, 2010 2:17 PM
To: solr-user@lucene.apache.org
Subject: help with a schema design problem

Hi,

Lets say i have table with 3 columns document id Party Value and Party Type.
In this table i have 3 rows. 1st row Document id: 1 Party Value: Pramod
Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party Type:
Supplier. 3rd row Document id:2 Party Value: Pramod Party Type: Supplier.
Now in this table if i use SQL its easy for me find all document with Party
Value as Pramod and Party Type as Client.

I need to design solr schema so that i can do the same in Solr. If i create
2 fields in solr schema Party value and Party type both of them multi valued
and try to query +Pramod +Supplier then solr will return me the first
document, even though in the first document Pramod is a client and not a
supplier
Thanks,
Pramod Goyal


Sort by index order desc?

2010-07-23 Thread Ryan McKinley
Any pointers on how to sort by reverse index order?
http://search.lucidimagination.com/search/document/4a59ded3966271ca/sort_by_index_order_desc

it seems like it should be easy to do with the function query stuff,
but i'm not sure what to sort by (unless I add a new field for indexed
time)


Any pointers?

Thanks
Ryan


Re: a bug of solr distributed search

2010-07-23 Thread MitchK

Yonik,

why do we not send the output of TermsComponent of every node in the
cluster to a Hadoop instance?
Since TermsComponent does the map-part of the map-reduce concept, Hadoop
only needs to reduce the stuff. Maybe we even do not need Hadoop for this.
After reducing, every node in the cluster gets the current values to compute
the idf.
We can store this information in a HashMap-based SolrCache (or something
like that) to provide constant-time access. To keep the values up to date,
we can repeat that after every x minutes.

If we got that, it would not matter whether we use doc_X from shard_A or
shard_B, since they will all have the same scores. 

Even if we have large indices with 10 million or more unique terms, this will
only need a few megabytes of network traffic.
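
The merge step described here - each shard reports per-term document frequencies, which are summed into global df's that every node then uses for idf - can be sketched like this (an illustration of the idea, not the distributed-IDF patch in JIRA; shard data is invented):

```python
import math
from collections import Counter

# Per-shard term -> docFreq maps, as TermsComponent-style output.
shard_a = {"solr": 120, "lucene": 80}
shard_b = {"solr": 60,  "facet": 10}
shard_docs = {"a": 1000, "b": 500}

def merge_df(*shard_dfs):
    """Reduce step: sum document frequencies across all shards."""
    total = Counter()
    for dfs in shard_dfs:
        total.update(dfs)
    return total

global_df = merge_df(shard_a, shard_b)
num_docs = sum(shard_docs.values())

def idf(term):
    # Classic idf shape: every node computes the same value from the
    # merged counts, so scores become comparable across shards.
    return 1.0 + math.log(num_docs / (global_df[term] + 1))

print(global_df["solr"])  # 180 -- summed across both shards
print(round(idf("solr"), 3), round(idf("facet"), 3))
```

The rarer term ("facet") gets the higher idf from the merged counts, which is exactly the property per-shard idf fails to guarantee when term distributions differ between shards.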

Kind regards,
- Mitch


Yonik Seeley-2-2 wrote:
 
 As the comments suggest, it's not a bug, but just the best we can do
 for now since our priority queues don't support removal of arbitrary
 elements.  I guess we could rebuild the current priority queue if we
 detect a duplicate, but that will have an obvious performance impact.
 Any other suggestions?
 
 -Yonik
 http://www.lucidimagination.com
 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p990506.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: a bug of solr distributed search

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 2:23 PM, MitchK mitc...@web.de wrote:
 why do we do not send the output of TermsComponent of every node in the
 cluster to a Hadoop instance?
 Since TermsComponent does the map-part of the map-reduce concept, Hadoop
 only needs to reduce the stuff. Maybe we even do not need Hadoop for this.
 After reducing, every node in the cluster gets the current values to compute
 the idf.
 We can store this information in a HashMap-based SolrCache (or something
 like that) to provide constant-time access. To keep the values up to date,
 we can repeat that after every x minutes.

There's already a patch in JIRA that does distributed IDF.
Hadoop wouldn't be the right tool for that anyway... it's for batch
oriented systems, not low-latency queries.

 If we got that, it does not care whereas we use doc_X from shard_A or
 shard_B, since they will all have got the same scores.

That only works if the docs are exactly the same - they may not be.

-Yonik
http://www.lucidimagination.com


Re: a bug of solr distributed search

2010-07-23 Thread MitchK

... Additionally to my previous posting:
To keep this in sync we could do two things:
Wait for every server, to make sure that everyone uses the same values to
compute the score, and then apply them.
Or: let's say that we collect the new values every 15 minutes. To merge and
send them over the network, we declare that this will need 3 additional
minutes (we want to keep the network traffic for such actions very low, so
we do not send everything instantly).
Okay, and now we add 2 additional minutes, in case 3 were not enough or
something needs a little more time than we thought. After those 2
minutes, every node has to apply the new values.
Pro: if one node breaks, we do not delay the application of the new
values.
Con: we need two HashMaps, and both will have roughly the same size. That
means we will waste some RAM for this operation if we do not write the
values to disk (which I do not suggest).

Thoughts?

- Mitch

MitchK wrote:
 
 Yonik,
 
 why do we do not send the output of TermsComponent of every node in the
 cluster to a Hadoop instance?
 Since TermsComponent does the map-part of the map-reduce concept, Hadoop
 only needs to reduce the stuff. Maybe we even do not need Hadoop for this.
 After reducing, every node in the cluster gets the current values to
 compute the idf.
 We can store this information in a HashMap-based SolrCache (or something
 like that) to provide constant-time access. To keep the values up to date,
 we can repeat that after every x minutes.
 
 If we got that, it does not care whereas we use doc_X from shard_A or
 shard_B, since they will all have got the same scores. 
 
 Even if we got large indices with 10 million or more unique terms, this
 will only need some megabyte network-traffic.
 
 Kind regards,
 - Mitch
 
 
 Yonik Seeley-2-2 wrote:
 
 As the comments suggest, it's not a bug, but just the best we can do
 for now since our priority queues don't support removal of arbitrary
 elements.  I guess we could rebuild the current priority queue if we
 detect a duplicate, but that will have an obvious performance impact.
 Any other suggestions?
 
 -Yonik
 http://www.lucidimagination.com
 
 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p990551.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: a bug of solr distributed search

2010-07-23 Thread MitchK


That only works if the docs are exactly the same - they may not be. 
Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same,
shouldn't they?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p990563.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: help with a schema design problem

2010-07-23 Thread Geert-Jan Brits
With the use case you specified, it should work to just index each row, as
you described in your initial post, as a separate document.
This way p_value and p_type are both single-valued and you get a correct
combination of p_value and p_type.

However, this may not go so well with other use cases you have in mind,
e.g. requiring that no multiple results are returned with the same document
id.



2010/7/23 Pramod Goyal pramod.go...@gmail.com

 I want to do that. But if i understand correctly in solr it would store the
 field like this:

 p_value: Pramod  Raj
 p_type:  Client Supplier

 When i search
 p_value:Pramod AND p_type:Supplier

 it would give me result as document 1. Which is incorrect, since in
 document
 1 Pramod is a Client and not a Supplier.




 On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin 
 knagelb...@globeandmail.com wrote:

  I think you just want something like:
 
  p_value:Pramod AND p_type:Supplier
 
  no?
  -Kallin Nagelberg
 
  -Original Message-
  From: Pramod Goyal [mailto:pramod.go...@gmail.com]
  Sent: Friday, July 23, 2010 2:17 PM
  To: solr-user@lucene.apache.org
  Subject: help with a schema design problem
 
  Hi,
 
  Lets say i have table with 3 columns document id Party Value and Party
  Type.
  In this table i have 3 rows. 1st row Document id: 1 Party Value: Pramod
  Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party Type:
  Supplier. 3rd row Document id:2 Party Value: Pramod Party Type: Supplier.
  Now in this table if i use SQL its easy for me find all document with
 Party
  Value as Pramod and Party Type as Client.
 
  I need to design solr schema so that i can do the same in Solr. If i
 create
  2 fields in solr schema Party value and Party type both of them multi
  valued
  and try to query +Pramod +Supplier then solr will return me the first
  document, even though in the first document Pramod is a client and not a
  supplier
  Thanks,
  Pramod Goyal
 



Performance issues when querying on large documents

2010-07-23 Thread ahammad

Hello,

I have an index with lots of different types of documents. One of those
types basically contains extracts of PDF docs. Some of those PDFs can have
1000+ pages, so there would be a lot of stuff to search through.

I am experiencing really terrible performance when querying. My whole index
has about 270k documents, but less than 1000 of those are the PDF extracts.
The slow querying occurs when I search only on those PDF extracts (by
specifying filters), and return 100 results. The 100 results definitely adds
to the issue, but even cutting that down can be slow.

Is there a way to improve querying with such large results? To give an idea,
querying for a single word can take a little over a minute, which isn't
really viable for an application that revolves around searching. For now, I
have limited the results to 20, which makes the query execute in roughly
10-15 seconds. However, I would like to have the option of returning 100
results.

Thanks a lot.

 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-issues-when-querying-on-large-documents-tp990590p990590.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: help with a schema design problem

2010-07-23 Thread Pramod Goyal
In my case the document id is the unique key (each row is not a unique
document). So a single document has multiple Party Values and Party Types.
Hence I need to define both Party Value and Party Type as multi-valued. Is
there any way in Solr to say p_value[someIndex]=pramod AND
p_type[someIndex]=client?
    Is there any other way I can design my schema? I have some solutions,
but none seems to be a good one. One way would be to define a single
field in the schema as p_value_type = "client pramod", i.e. combine the
values from both fields and store them in a single field.


On Sat, Jul 24, 2010 at 12:18 AM, Geert-Jan Brits gbr...@gmail.com wrote:

 With the usecase you specified it should work to just index each Row as
 you described in your initial post to be a seperate document.
 This way p_value and p_type all get singlevalued and you get a correct
 combination of p_value and p_type.

 However, this may not go so well with other use-cases you have in mind,
 e.g.: requiring that no multiple results are returned with the same
 document
 id.



 2010/7/23 Pramod Goyal pramod.go...@gmail.com

  I want to do that. But if i understand correctly in solr it would store
 the
  field like this:
 
  p_value: Pramod  Raj
  p_type:  Client Supplier
 
  When i search
  p_value:Pramod AND p_type:Supplier
 
  it would give me result as document 1. Which is incorrect, since in
  document
  1 Pramod is a Client and not a Supplier.
 
 
 
 
  On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin 
  knagelb...@globeandmail.com wrote:
 
   I think you just want something like:
  
   p_value:Pramod AND p_type:Supplier
  
   no?
   -Kallin Nagelberg
  
   -Original Message-
   From: Pramod Goyal [mailto:pramod.go...@gmail.com]
   Sent: Friday, July 23, 2010 2:17 PM
   To: solr-user@lucene.apache.org
   Subject: help with a schema design problem
  
   Hi,
  
   Lets say i have table with 3 columns document id Party Value and Party
   Type.
   In this table i have 3 rows. 1st row Document id: 1 Party Value: Pramod
   Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party
 Type:
   Supplier. 3rd row Document id:2 Party Value: Pramod Party Type:
 Supplier.
   Now in this table if i use SQL its easy for me find all document with
  Party
   Value as Pramod and Party Type as Client.
  
   I need to design solr schema so that i can do the same in Solr. If i
  create
   2 fields in solr schema Party value and Party type both of them multi
   valued
   and try to query +Pramod +Supplier then solr will return me the first
   document, even though in the first document Pramod is a client and not
 a
   supplier
   Thanks,
   Pramod Goyal
  
 



Re: a bug of solr distributed search

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 2:40 PM, MitchK mitc...@web.de wrote:
 That only works if the docs are exactly the same - they may not be.
 Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same,
 don't they?

Documents aren't supposed to be duplicated across shards... so the
presence of multiple docs with the same id is a bug anyway.  We've
chosen to try and handle it gracefully rather than fail hard.

Some people have treated this as a feature - and that's OK as long as
expectations are set appropriately.

-Yonik
http://www.lucidimagination.com


Re: Sort by index order desc?

2010-07-23 Thread Ryan McKinley
Looks like you can sort by _docid_ to get things in index order or
reverse index order.

?sort=_docid_ asc

thank you solr!


On Fri, Jul 23, 2010 at 2:23 PM, Ryan McKinley ryan...@gmail.com wrote:
 Any pointers on how to sort by reverse index order?
 http://search.lucidimagination.com/search/document/4a59ded3966271ca/sort_by_index_order_desc

 it seems like it should be easy to do with the function query stuff,
 but i'm not sure what to sort by (unless I add a new field for indexed
 time)


 Any pointers?

 Thanks
 Ryan



Scoring Search for autocomplete

2010-07-23 Thread Frank A
Hi, I have an autocomplete that is currently working with an
NGramTokenizer, so if I search for "Yo", both "New York" and "Toyota"
are valid results.  However, I'm trying to figure out how best to
implement the search so that, from a score perspective, a string that
matches the beginning of an entire field ranks first, followed by the
beginning of a term, and then the middle of a term.  For example, if I
was searching with "vi" I would want "Virginia" ahead of "West
Virginia" ahead of "Five".

I think I can do this with three separate fields: one using a whitespace
tokenizer and an ngram filter, another using edge-ngram + whitespace,
and another using keyword + edge-ngram, then doing an OR on the 3
fields, so that "Virginia" would match all 3 and get a higher score...
but this doesn't feel right to me, so I wanted to check for better
options.

Thanks.


RE: filter query on timestamp slowing query???

2010-07-23 Thread Chris Hostetter
: On top of using trie dates, you might consider separating the timestamp 
: portion and the type portion of the fq into seperate fq parameters -- 
: that will allow them to to be stored in the filter cache seperately. So 
: for instance, if you include type:x OR type:y in queries a lot, but 
: with different date ranges, then when you make a new query, the set for 
: type:x OR type:y can be pulled from the filter cache and intersected 

definitely ... that's the one big thing that jumped out at me once you 
showed us *how* you were constructing these queries.  



-Hoss
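Concretely, separating the clauses means sending two fq parameters rather than one combined expression (assuming a date field named timestamp; URL-encoding omitted for readability):

```
?q=foo&fq=type:x OR type:y&fq=timestamp:[NOW/DAY-7DAYS TO NOW/DAY]
```

Each fq is cached as its own filter-cache entry, so the `type:x OR type:y` DocSet can be reused across queries that only vary the date range.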



Re: help with a schema design problem

2010-07-23 Thread Geert-Jan Brits
 Is there any way in solr to say p_value[someIndex]=pramod
And p_type[someIndex]=client.
No, I'm 99% sure there is not.

 One way would be to define a single field in the schema as p_value_type =
"client pramod", i.e. combine the values from both fields and store them in a
single field.
yep, for the use-case you mentioned that would definitely work. Multivalued
of course, so it can contain "supplier raj" as well.
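A sketch of what indexing that combined field might look like (field names invented, multiValued assumed):

```xml
<doc>
  <field name="doc_id">1</field>
  <!-- one combined token pair per original row -->
  <field name="p_value_type">client pramod</field>
  <field name="p_value_type">supplier raj</field>
</doc>
```

Querying it as a phrase, p_value_type:"client pramod", keeps the type and value tied to the same original row, so document 1 would no longer match a search for a supplier named Pramod.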


2010/7/23 Pramod Goyal pramod.go...@gmail.com

In my case the document id is the unique key( each row is not a unique
 document ) . So a single document has multiple Party Value and Party Type.
 Hence i need to define both Party value and Party type as multi-valued. Is
 there any way in solr to say p_value[someIndex]=pramod And
 p_type[someIndex]=client.
Is there any other way i can design my schema ? I have some solutions
 but none seems to be a good solution. One way would be to define a single
 field in the schema as p_value_type = client pramod i.e. combine the
 value
 from both the field and store it in a single field.


 On Sat, Jul 24, 2010 at 12:18 AM, Geert-Jan Brits gbr...@gmail.com
 wrote:

  With the usecase you specified it should work to just index each Row as
  you described in your initial post to be a seperate document.
  This way p_value and p_type all get singlevalued and you get a correct
  combination of p_value and p_type.
 
  However, this may not go so well with other use-cases you have in mind,
  e.g.: requiring that no multiple results are returned with the same
  document
  id.
 
 
 
  2010/7/23 Pramod Goyal pramod.go...@gmail.com
 
   I want to do that. But if i understand correctly in solr it would store
  the
   field like this:
  
    p_value: "Pramod" "Raj"
    p_type:  "Client" "Supplier"
  
   When i search
   p_value:Pramod AND p_type:Supplier
  
   it would give me result as document 1. Which is incorrect, since in
   document
   1 Pramod is a Client and not a Supplier.
  
  
  
  
   On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin 
   knagelb...@globeandmail.com wrote:
  
I think you just want something like:
   
p_value:Pramod AND p_type:Supplier
   
no?
-Kallin Nagelberg
   
-Original Message-
From: Pramod Goyal [mailto:pramod.go...@gmail.com]
Sent: Friday, July 23, 2010 2:17 PM
To: solr-user@lucene.apache.org
Subject: help with a schema design problem
   
Hi,
   
Lets say i have table with 3 columns document id Party Value and
 Party
Type.
In this table i have 3 rows. 1st row Document id: 1 Party Value:
 Pramod
Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party
  Type:
Supplier. 3rd row Document id:2 Party Value: Pramod Party Type:
  Supplier.
Now in this table if i use SQL its easy for me find all document with
   Party
Value as Pramod and Party Type as Client.
   
I need to design solr schema so that i can do the same in Solr. If i
   create
2 fields in solr schema Party value and Party type both of them multi
valued
and try to query +Pramod +Supplier then solr will return me the first
document, even though in the first document Pramod is a client and
 not
  a
supplier
Thanks,
Pramod Goyal
   
  
 



RE: help with a schema design problem

2010-07-23 Thread Nagelberg, Kallin
   When i search
   p_value:Pramod AND p_type:Supplier
  
   it would give me result as document 1. Which is incorrect, since in
   document
   1 Pramod is a Client and not a Supplier.

Would it? I would expect it to give you nothing.

-Kal



-Original Message-
From: Geert-Jan Brits [mailto:gbr...@gmail.com] 
Sent: Friday, July 23, 2010 5:05 PM
To: solr-user@lucene.apache.org
Subject: Re: help with a schema design problem

 Is there any way in solr to say p_value[someIndex]=pramod
And p_type[someIndex]=client.
No, I'm 99% sure there is not.

 One way would be to define a single field in the schema as p_value_type =
client pramod i.e. combine the value from both the field and store it in a
single field.
yep, for the use-case you mentioned that would definitely work. Multivalued
of course, so it can contain Supplier Raj as well.


2010/7/23 Pramod Goyal pramod.go...@gmail.com

In my case the document id is the unique key( each row is not a unique
 document ) . So a single document has multiple Party Value and Party Type.
  Hence i need to define both Party value and Party type as multi-valued. Is
 there any way in solr to say p_value[someIndex]=pramod And
 p_type[someIndex]=client.
Is there any other way i can design my schema ? I have some solutions
 but none seems to be a good solution. One way would be to define a single
 field in the schema as p_value_type = client pramod i.e. combine the
 value
 from both the field and store it in a single field.


 On Sat, Jul 24, 2010 at 12:18 AM, Geert-Jan Brits gbr...@gmail.com
 wrote:

  With the usecase you specified it should work to just index each Row as
  you described in your initial post to be a seperate document.
  This way p_value and p_type all get singlevalued and you get a correct
  combination of p_value and p_type.
 
  However, this may not go so well with other use-cases you have in mind,
  e.g.: requiring that no multiple results are returned with the same
  document
  id.
 
 
 
  2010/7/23 Pramod Goyal pramod.go...@gmail.com
 
   I want to do that. But if i understand correctly in solr it would store
  the
   field like this:
  
   p_value: Pramod  Raj
   p_type:  Client Supplier
  
   When i search
   p_value:Pramod AND p_type:Supplier
  
   it would give me result as document 1. Which is incorrect, since in
   document
   1 Pramod is a Client and not a Supplier.
  
  
  
  
   On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin 
   knagelb...@globeandmail.com wrote:
  
I think you just want something like:
   
p_value:Pramod AND p_type:Supplier
   
no?
-Kallin Nagelberg
   
-Original Message-
From: Pramod Goyal [mailto:pramod.go...@gmail.com]
Sent: Friday, July 23, 2010 2:17 PM
To: solr-user@lucene.apache.org
Subject: help with a schema design problem
   
Hi,
   
Lets say i have table with 3 columns document id Party Value and
 Party
Type.
In this table i have 3 rows. 1st row Document id: 1 Party Value:
 Pramod
Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party
  Type:
Supplier. 3rd row Document id:2 Party Value: Pramod Party Type:
  Supplier.
Now in this table if i use SQL its easy for me find all document with
   Party
Value as Pramod and Party Type as Client.
   
I need to design solr schema so that i can do the same in Solr. If i
   create
2 fields in solr schema Party value and Party type both of them multi
valued
and try to query +Pramod +Supplier then solr will return me the first
document, even though in the first document Pramod is a client and
 not
  a
supplier
Thanks,
Pramod Goyal
   
  
 



Re: filter query on timestamp slowing query???

2010-07-23 Thread Geert-Jan Brits
just wanted to mention a possible other route, which might be entirely
hypothetical :-)

*If* you could query on internal docid (I'm not sure that it's available
out-of-the-box, or if you can at all)
your original problem, quoted below, could IMO be simplified to asking for
the last docid inserted (that matches the other criteria from your use-case)
and in the next call filtering from that docid forward.

Every 30 minutes, i ask the index what are the documents that were added to
it, since the last time i queried it, that match a certain criteria.
From time to time, once a week or so, i ask the index for ALL the documents
that match that criteria. (i also do this for not only one query, but
several)
This is why i need the timestamp filter.

Again, I'm not entirely sure that querying / filtering on internal docids is
possible (perhaps someone can comment) but if it is, it would perhaps be
more performant.
Big IF, I know.

Geert-Jan

2010/7/23 Chris Hostetter hossman_luc...@fucit.org

 : On top of using trie dates, you might consider separating the timestamp
 : portion and the type portion of the fq into separate fq parameters --
 : that will allow them to be stored in the filter cache separately. So
 : for instance, if you include type:x OR type:y in queries a lot, but
 : with different date ranges, then when you make a new query, the set for
 : type:x OR type:y can be pulled from the filter cache and intersected

 definitely ... that's the one big thing that jumped out at me once you
 showed us *how* you were constructing these queries.



 -Hoss




RE: Novice seeking help to change filters to search without diacritics

2010-07-23 Thread HSingh

Hi Steve,  This is extremely helpful!  What is the best way to also
preserve/append the diacritics in the index in case someone searches using
them?  I deeply appreciate your help!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Novice-seeking-help-to-change-filters-to-search-without-diacritics-tp971263p990949.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: help with a schema design problem

2010-07-23 Thread Geert-Jan Brits
Multiple rows in the OP's example are combined to form one Solr document
(e.g. rows 1 and 2 both have documentid=1). Because of this combining, it
would match p_value from row 1 with p_type from row 2 (or vice versa).


2010/7/23 Nagelberg, Kallin knagelb...@globeandmail.com

When i search
p_value:Pramod AND p_type:Supplier
   
it would give me result as document 1. Which is incorrect, since in
document
1 Pramod is a Client and not a Supplier.

 Would it? I would expect it to give you nothing.

 -Kal



 -Original Message-
 From: Geert-Jan Brits [mailto:gbr...@gmail.com]
 Sent: Friday, July 23, 2010 5:05 PM
 To: solr-user@lucene.apache.org
 Subject: Re: help with a schema design problem

  Is there any way in solr to say p_value[someIndex]=pramod
 And p_type[someIndex]=client.
 No, I'm 99% sure there is not.

  One way would be to define a single field in the schema as p_value_type =
 client pramod i.e. combine the value from both the field and store it in
 a
 single field.
 yep, for the use-case you mentioned that would definitely work. Multivalued
 of course, so it can contain Supplier Raj as well.


 2010/7/23 Pramod Goyal pramod.go...@gmail.com

 In my case the document id is the unique key( each row is not a unique
  document ) . So a single document has multiple Party Value and Party
 Type.
  Hence i need to define both Party value and Party type as multi-valued.
 Is
  there any way in solr to say p_value[someIndex]=pramod And
  p_type[someIndex]=client.
 Is there any other way i can design my schema ? I have some solutions
  but none seems to be a good solution. One way would be to define a single
  field in the schema as p_value_type = client pramod i.e. combine the
  value
  from both the field and store it in a single field.
 
 
  On Sat, Jul 24, 2010 at 12:18 AM, Geert-Jan Brits gbr...@gmail.com
  wrote:
 
   With the usecase you specified it should work to just index each Row
 as
   you described in your initial post to be a seperate document.
   This way p_value and p_type all get singlevalued and you get a correct
   combination of p_value and p_type.
  
   However, this may not go so well with other use-cases you have in mind,
   e.g.: requiring that no multiple results are returned with the same
   document
   id.
  
  
  
   2010/7/23 Pramod Goyal pramod.go...@gmail.com
  
I want to do that. But if i understand correctly in solr it would
 store
   the
field like this:
   
p_value: Pramod  Raj
p_type:  Client Supplier
   
When i search
p_value:Pramod AND p_type:Supplier
   
it would give me result as document 1. Which is incorrect, since in
document
1 Pramod is a Client and not a Supplier.
   
   
   
   
On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin 
knagelb...@globeandmail.com wrote:
   
 I think you just want something like:

 p_value:Pramod AND p_type:Supplier

 no?
 -Kallin Nagelberg

 -Original Message-
 From: Pramod Goyal [mailto:pramod.go...@gmail.com]
 Sent: Friday, July 23, 2010 2:17 PM
 To: solr-user@lucene.apache.org
 Subject: help with a schema design problem

 Hi,

 Lets say i have table with 3 columns document id Party Value and
  Party
 Type.
 In this table i have 3 rows. 1st row Document id: 1 Party Value:
  Pramod
 Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party
   Type:
 Supplier. 3rd row Document id:2 Party Value: Pramod Party Type:
   Supplier.
 Now in this table if i use SQL its easy for me find all document
 with
Party
 Value as Pramod and Party Type as Client.

 I need to design solr schema so that i can do the same in Solr. If
 i
create
 2 fields in solr schema Party value and Party type both of them
 multi
 valued
 and try to query +Pramod +Supplier then solr will return me the
 first
 document, even though in the first document Pramod is a client and
  not
   a
 supplier
 Thanks,
 Pramod Goyal

   
  
 



Re: commit is taking very very long time

2010-07-23 Thread Alexey Serba
 I am not sure why some commits take very long time.
Hmm... Because it merges index segments... How large is your index?

 Also is there a way to reduce the time it takes?
You can disable commit in the DIH call and use autoCommit instead. It's
kind of a hack because you postpone the commit operation and make it async.

Another option is to set optimize=false in the DIH call ( it's true by
default ). Also you can try to increase the mergeFactor parameter, but it
may affect search performance.
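For example, the DIH request might become (commit and optimize are standard DIH parameters):

```
/dataimport?command=full-import&commit=false&optimize=false
```

with an autoCommit block in solrconfig.xml picking up the deferred commits in the background.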


Re: 2 solr dataImport requests on a single core at the same time

2010-07-23 Thread Alexey Serba
 having multiple Request Handlers will not degrade the performance
IMO you shouldn't worry unless you have hundreds of them


Re: commit is taking very very long time

2010-07-23 Thread Mark Miller
On 7/23/10 5:59 PM, Alexey Serba wrote:

 Another option is to set optimize=false in DIH call ( it's true by
 default ). 

Ouch - that should really be changed then.

- Mark


Re: Performance issues when querying on large documents

2010-07-23 Thread Alexey Serba
Do you use highlighting? ( http://wiki.apache.org/solr/HighlightingParameters )

Try to disable it and compare performance.
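A quick A/B test along those lines (the field names and filter value here are only an illustration):

```
?q=word&fq=doctype:pdf_extract&rows=20&hl=false&fl=id,title
```

Restricting fl to a few small stored fields is worth trying too: with 1000-page PDF extracts, returning the full stored text for 100 hits can dominate response time even when the search itself is fast.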

On Fri, Jul 23, 2010 at 10:52 PM, ahammad ahmed.ham...@gmail.com wrote:

 Hello,

 I have an index with lots of different types of documents. One of those
 types basically contains extracts of PDF docs. Some of those PDFs can have
 1000+ pages, so there would be a lot of stuff to search through.

 I am experiencing really terrible performance when querying. My whole index
 has about 270k documents, but fewer than 1000 of those are the PDF extracts.
 The slow querying occurs when I search only on those PDF extracts (by
 specifying filters) and return 100 results. Returning 100 results definitely
 adds to the issue, but even cutting that down can be slow.

 Is there a way to improve querying with such large results? To give an idea,
 querying for a single word can take a little over a minute, which isn't
 really viable for an application that revolves around searching. For now, I
 have limited the results to 20, which makes the query execute in roughly
 10-15 seconds. However, I would like to have the option of returning 100
 results.

 Thanks a lot.


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Performance-issues-when-querying-on-large-documents-tp990590p990590.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Autocommit not happening

2010-07-23 Thread Jay Luker
For the sake of any future googlers I'll report my own clueless but
thankfully brief struggle with autocommit.

There are two parts to the story: Part One is where I realize my
<autoCommit> config was not contained within my <updateHandler>. In
Part Two I realized I had typed <autocommit> rather than
<autoCommit>.
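For the benefit of those same future googlers, the shape that finally worked looks like this (the limits are illustrative):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>1000</maxDocs>
    <maxTime>60000</maxTime> <!-- milliseconds -->
  </autoCommit>
</updateHandler>
```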

--jay

On Fri, Jul 23, 2010 at 2:35 PM, John DeRosa jo...@ipstreet.com wrote:
 On Jul 23, 2010, at 9:37 AM, John DeRosa wrote:

 Hi! I'm a Solr newbie, and I don't understand why autocommits aren't 
 happening in my Solr installation.


 [snip]

 Never mind... I have discovered my boneheaded mistake. It's so silly, I 
 wish I could retract my question from the archives.




Re: help with a schema design problem

2010-07-23 Thread Chris Hostetter
:  Is there any way in solr to say p_value[someIndex]=pramod
: And p_type[someIndex]=client.
: No, I'm 99% sure there is not.

it's possible in code, by utilizing positions and FieldMaskingSpanQuery... 
http://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html

...but there is no QParser or RequestHandler with syntax for exposing it 
to clients.  it would have to be a custom plugin.


-Hoss



SOLR Memory Usage - Where does it go?

2010-07-23 Thread Stephen Weiss
We have been having problems with SOLR on one project lately.  Forgive  
me for writing a novel here but it's really important that we identify  
the root cause of this issue.  It is becoming unavailable at random  
intervals, and the problem appears to be memory related.  There are  
basically two ways it goes:


1) Straight up OOM error, either from Java or sometimes from the  
kernel itself.


2) Instead of throwing an OOM, the memory usage gets very high and  
then drops precipitously (say, from 92% (of 20GB) down to 60%).  Once  
the memory usage is done dropping, SOLR seems to stop responding to  
requests altogether.


It started out mostly being version #1 of the problem but now we're  
mostly seeing version #2 of the problem... and it's getting more and  
more frequent.  In either scenario the servlet container (Jetty) needs  
to be restarted to resume service.


The number of documents in the index is always going up.  They are  
relatively small in size (1K per piece max - mostly small numeric  
strings, with 5 text fields (one each for 5 languages) that are rarely  
more than 50-100 characters), and there are about 5 million of them at  
the moment (adding around 1000 every day).  The machine has 20 GB of  
RAM, Xmx is set to 18GB, and SOLR is the only thing this machine /  
servlet container does.  There are a couple other cores configured,  
but they are minuscule in comparison (one with 20 docs, and two  
more with  1 docs apiece).  Eliminating these other cores does  
not seem to make any significant impact.  This is with the SOLR 1.4.1  
release, using the SOLR-236 patch that was recently released to go  
with this version.  The patch was slightly modified in order to ensure  
that paging continued to work properly  - basically, an optimization  
that eliminated paging was removed per the instructions in this comment:


https://issues.apache.org/jira/browse/SOLR-236?focusedCommentId=12867680&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12867680


I realize this is not ideal if you want to control memory usage, but  
the design requirements of the project preclude us from eliminating  
either collapsing or paging.  It's also probably worth noting that  
these problems did not start with version 1.4.1 or this version of the  
236 patch - we actually upgraded from 1.4 because they said it fixed  
some memory leaks, hoping it would help solve this problem.


We have some test machines set up and we have been testing out various  
configuration changes.  Watching the stats in the admin area, this is  
what we've been able to figure out:


1) The fieldValueCache usage stays constant at 23 entries (one for  
each faceted field), and takes up a total size of about 750MB  
altogether.


2) Lowering or just eliminating the filterCache and the  
queryResultCache does not seem to have any serious impact - perhaps a  
difference of a few percent at the start, but after prolonged usage  
the memory still goes up seemingly uncontrolled.  It would appear the  
queryResultCache does not get much usage anyway, and even though we  
have higher eviction rates in the filterCache, this really doesn't  
seem to impact performance significantly.


3) Lowering or eliminating the documentCache also doesn't seem to have  
very much impact in memory usage, although it does make searches much  
slower.


4) We followed the instructions for configuring the HashDocSet  
parameter, but this doesn't seem to be having much impact either.


5)  All the caches, with the exception of the documentCache, are  
FastLRUCaches.  Switching between FastLRUCache and normal LRUCache in  
general doesn't seem to change the memory usage.


6) Glancing through all of the data on memory usage in the Lucene  
fieldCache would indicate that this cache is using well under 1GB of  
RAM as well.


Basically, when the servlet first starts, it uses very little RAM  
(4%).  We warm the searcher with a few standard queries that  
initialize everything in the fieldValueCache off the bat, and the  
query performance levels off at a reasonable speed, with memory usage  
around 10-12%.  At this point, almost all queries execute within a few  
100ms, if not faster.  A very few queries that return large numbers of  
collapsed documents, generally 800K up to about 2 million (we have  
about 5 distinct queries that do this), will take up to 20 seconds to  
run the first time, and up to 10 seconds thereafter.  Even after  
running all these queries, memory usage stays around 20-30%.  At this  
point, performance is optimal.  We simulate production usage, running  
queries taken from those logs through the system at a rate similar to  
production use.


For the most part, memory usage stays level.  Usage will go up as  
queries are run (this seems to correspond with when they are being  
collapsed), but then go back down as the results are returned.  Then,  
over the course of a few hours, at seemingly random 

Re: Autocommit not happening

2010-07-23 Thread John DeRosa
I'll see you, and raise. My solrconfig.xml wasn't being copied to the server by 
the deployment script.

On Jul 23, 2010, at 3:26 PM, Jay Luker wrote:

 For the sake of any future googlers I'll report my own clueless but
 thankfully brief struggle with autocommit.
 
  There are two parts to the story: Part One is where I realize my
  <autoCommit> config was not contained within my <updateHandler>. In
  Part Two I realized I had typed <autocommit> rather than
  <autoCommit>.
 
 --jay
 
 On Fri, Jul 23, 2010 at 2:35 PM, John DeRosa jo...@ipstreet.com wrote:
 On Jul 23, 2010, at 9:37 AM, John DeRosa wrote:
 
 Hi! I'm a Solr newbie, and I don't understand why autocommits aren't 
 happening in my Solr installation.
 
 
 [snip]
 
 Never mind... I have discovered my boneheaded mistake. It's so silly, I 
 wish I could retract my question from the archives.