Getting the offset of search keyword in a document

2010-07-23 Thread Ryan Chan
Hello,

I am new to Solr/Lucene and I am evaluating whether they suit my needs and
can replace our in-house system.


Our requirements:

1. I have multiple documents (1M)
2. Each document contains text ranging from a few KB to a few MB
3. I want to search for a keyword, search through all these documents,
and have it return the matched document(s), AND ALSO the offset of that
'keyword' inside the document.

Is it possible for requirement 3?
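For what it's worth, one common route to requirement 3 (a sketch, not from this thread; the field name is hypothetical) is to index term vectors with offsets and read them back through Solr's TermVectorComponent:

```xml
<!-- schema.xml: record character offsets for each indexed term -->
<field name="body" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

A request with tv=true&tv.offsets=true against a handler that includes the TermVectorComponent will then return start/end character offsets of each term per matching document.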


Re: Autocommit not happening

2010-07-23 Thread John DeRosa
I'll see you, and raise. My solrconfig.xml wasn't being copied to the server by 
the deployment script.

On Jul 23, 2010, at 3:26 PM, Jay Luker wrote:

> For the sake of any future googlers I'll report my own clueless but
> thankfully brief struggle with autocommit.
> 
> There are two parts to the story: Part One is where I realize my
> <autoCommit> config was not contained within my <updateHandler>. In
> Part Two I realized I had typed "<maxdocs>" rather than
> "<maxDocs>".
> 
> --jay
> 
> On Fri, Jul 23, 2010 at 2:35 PM, John DeRosa  wrote:
>> On Jul 23, 2010, at 9:37 AM, John DeRosa wrote:
>> 
>>> Hi! I'm a Solr newbie, and I don't understand why autocommits aren't 
>>> happening in my Solr installation.
>>> 
>> 
>> [snip]
>> 
>> "Never mind"... I have discovered my boneheaded mistake. It's so silly, I 
>> wish I could retract my question from the archives.
>> 
>> 



SOLR Memory Usage - Where does it go?

2010-07-23 Thread Stephen Weiss
We have been having problems with SOLR on one project lately.  Forgive
me for writing a novel here, but it's really important that we identify
the root cause of this issue.  Solr is becoming unavailable at random
intervals, and the problem appears to be memory related.  There are
basically two ways it goes:


1) Straight up OOM error, either from Java or sometimes from the  
kernel itself.


2) Instead of throwing an OOM, the memory usage gets very high and  
then drops precipitously (say, from 92% (of 20GB) down to 60%).  Once  
the memory usage is done dropping, SOLR seems to stop responding to  
requests altogether.


It started out mostly being version #1 of the problem but now we're  
mostly seeing version #2 of the problem... and it's getting more and  
more frequent.  In either scenario the servlet container (Jetty) needs  
to be restarted to resume service.


The number of documents in the index is always going up.  They are  
relatively small in size (1K per piece max - mostly small numeric  
strings, with 5 text fields (one each for 5 languages) that are rarely  
more than 50-100 characters), and there are about 5 million of them at  
the moment (adding around 1000 every day).  The machine has 20 GB of  
RAM, Xmx is set to 18GB, and SOLR is the only thing this machine /  
servlet container does.  There are a couple other cores configured,
but they are minuscule in comparison (one with 20 docs, and two
more with < 1 docs apiece).  Eliminating these other cores does
not seem to make any significant impact.  This is with the SOLR 1.4.1
release, using the SOLR-236 patch that was recently released to go  
with this version.  The patch was slightly modified in order to ensure  
that paging continued to work properly  - basically, an optimization  
that eliminated paging was removed per the instructions in this comment:


https://issues.apache.org/jira/browse/SOLR-236?focusedCommentId=12867680&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12867680


I realize this is not ideal if you want to control memory usage, but  
the design requirements of the project preclude us from eliminating  
either collapsing or paging.  It's also probably worth noting that  
these problems did not start with version 1.4.1 or this version of the  
236 patch - we actually upgraded from 1.4 because they said it fixed  
some memory leaks, hoping it would help solve this problem.


We have some test machines set up and we have been testing out various  
configuration changes.  Watching the stats in the admin area, this is  
what we've been able to figure out:


1) The fieldValueCache usage stays constant at 23 entries (one for  
each faceted field), and takes up a total size of about 750MB  
altogether.


2) Lowering or just eliminating the filterCache and the  
queryResultCache does not seem to have any serious impact - perhaps a  
difference of a few percent at the start, but after prolonged usage  
the memory still goes up seemingly uncontrolled.  It would appear the  
queryResultCache does not get much usage anyway, and even though we  
have higher eviction rates in the filterCache, this really doesn't  
seem to impact performance significantly.


3) Lowering or eliminating the documentCache also doesn't seem to have  
very much impact in memory usage, although it does make searches much  
slower.


4) We followed the instructions for configuring the HashDocSet  
parameter, but this doesn't seem to be having much impact either.


5)  All the caches, with the exception of the documentCache, are  
FastLRUCaches.  Switching between FastLRUCache and normal LRUCache in  
general doesn't seem to change the memory usage.


6) Glancing through all of the data on memory usage in the Lucene  
fieldCache would indicate that this cache is using well under 1GB of  
RAM as well.


Basically, when the servlet first starts, it uses very little RAM  
(<4%).  We warm the searcher with a few standard queries that  
initialize everything in the fieldValueCache off the bat, and the  
query performance levels off at a reasonable speed, with memory usage  
around 10-12%.  At this point, almost all queries execute within a few  
100ms, if not faster.  A very few queries that return large numbers of  
collapsed documents, generally 800K up to about 2 million (we have  
about 5 distinct queries that do this), will take up to 20 seconds to  
run the first time, and up to 10 seconds thereafter.  Even after  
running all these queries, memory usage stays around 20-30%.  At this  
point, performance is optimal.  We simulate production usage, running  
queries taken from those logs through the system at a rate similar to  
production use.


For the most part, memory usage stays level.  Usage will go up as  
queries are run (this seems to correspond with when they are being  
collapsed), but then go back down as the results are returned.  Then,  
over the course of a few hours, at seemingly random intervals

Re: help with a schema design problem

2010-07-23 Thread Chris Hostetter
: > Is there any way in solr to say p_value[someIndex]="pramod"
: And p_type[someIndex]="client".
: No, I'm 99% sure there is not.

it's possible in code, by utilizing positions and FieldMaskingSpanQuery... 
http://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html

...but there is no QParser or RequestHandler with syntax for exposing it 
to clients.  it would have to be a custom plugin.


-Hoss
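A rough sketch of what such custom code could look like, using the thread's p_value / p_type fields (this assumes the two fields are indexed with aligned positions; Lucene 2.9 API, untested fragment):

```java
// Same-position match: "pramod" in p_value AND "client" in p_type.
SpanQuery value  = new SpanTermQuery(new Term("p_value", "pramod"));
SpanQuery type   = new SpanTermQuery(new Term("p_type", "client"));
// Mask p_type so both clauses appear to come from the same field:
SpanQuery masked = new FieldMaskingSpanQuery(type, "p_value");
// Slop -1 with inOrder=false requires the clauses at the same position.
SpanQuery query  = new SpanNearQuery(new SpanQuery[] { value, masked }, -1, false);
```

This mirrors the example in the FieldMaskingSpanQuery javadoc linked above.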



Re: Autocommit not happening

2010-07-23 Thread Jay Luker
For the sake of any future googlers I'll report my own clueless but
thankfully brief struggle with autocommit.

There are two parts to the story: Part One is where I realize my
<autoCommit> config was not contained within my <updateHandler>. In
Part Two I realized I had typed "<maxdocs>" rather than
"<maxDocs>".

--jay

On Fri, Jul 23, 2010 at 2:35 PM, John DeRosa  wrote:
> On Jul 23, 2010, at 9:37 AM, John DeRosa wrote:
>
>> Hi! I'm a Solr newbie, and I don't understand why autocommits aren't 
>> happening in my Solr installation.
>>
>
> [snip]
>
> "Never mind"... I have discovered my boneheaded mistake. It's so silly, I 
> wish I could retract my question from the archives.
>
>
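For future googlers, a known-good autoCommit block (values illustrative) sits inside <updateHandler> in solrconfig.xml:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>   <!-- commit after this many pending docs -->
    <maxTime>60000</maxTime>   <!-- or after this many milliseconds -->
  </autoCommit>
</updateHandler>
```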


Re: Performance issues when querying on large documents

2010-07-23 Thread Alexey Serba
Do you use highlighting? ( http://wiki.apache.org/solr/HighlightingParameters )

Try to disable it and compare performance.
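If highlighting does turn out to be the bottleneck but can't be dropped, a standard mitigation (a sketch; field name hypothetical) is to store term vectors so the highlighter doesn't re-analyze multi-megabyte fields on every query:

```xml
<field name="pdf_text" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```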

On Fri, Jul 23, 2010 at 10:52 PM, ahammad  wrote:
>
> Hello,
>
> I have an index with lots of different types of documents. One of those
> types basically contains extracts of PDF docs. Some of those PDFs can have
> 1000+ pages, so there would be a lot of stuff to search through.
>
> I am experiencing really terrible performance when querying. My whole index
> has about 270k documents, but less than 1000 of those are the PDF extracts.
> The slow querying occurs when I search only on those PDF extracts (by
> specifying filters), and return 100 results. The 100 results definitely add
> to the issue, but even cutting that down can be slow.
>
> Is there a way to improve querying with such large results? To give an idea,
> querying for a single word can take a little over a minute, which isn't
> really viable for an application that revolves around searching. For now, I
> have limited the results to 20, which makes the query execute in roughly
> 10-15 seconds. However, I would like to have the option of returning 100
> results.
>
> Thanks a lot.
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Performance-issues-when-querying-on-large-documents-tp990590p990590.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: commit is taking very very long time

2010-07-23 Thread Mark Miller
On 7/23/10 5:59 PM, Alexey Serba wrote:

> Another option is to set optimize=false in DIH call ( it's true by
> default ). 

Ouch - that should really be changed then.

- Mark


Re: 2 solr dataImport requests on a single core at the same time

2010-07-23 Thread Alexey Serba
> having multiple Request Handlers will not degrade the performance
IMO you shouldn't worry unless you have hundreds of them


Re: commit is taking very very long time

2010-07-23 Thread Alexey Serba
> I am not sure why some commits take very long time.
Hmm... Because it merges index segments... How large is your index?

> Also is there a way to reduce the time it takes?
You can disable commit in the DIH call and use autoCommit instead. It's
kind of a hack because you postpone the commit operation and make it async.

Another option is to set optimize=false in DIH call ( it's true by
default ). Also you can try to increase mergeFactor parameter but it
would affect search performance.
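Both knobs go on the DIH request itself; for example (host and core names illustrative):

```
http://localhost:8983/solr/dataimport?command=full-import&commit=false&optimize=false
```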


Re: help with a schema design problem

2010-07-23 Thread Geert-Jan Brits
Multiple rows in the OP's example are combined to form 1 solr-document (e.g.
rows 1 and 2 both have documentid=1).
Because of this combining, it would match p_value from row 1 with p_type from
row 2 (or vice versa).


2010/7/23 Nagelberg, Kallin 

> > > > When i search
> > > > p_value:"Pramod" AND p_type:"Supplier"
> > > >
> > > > it would give me result as document 1. Which is incorrect, since in
> > > > document
> > > > 1 Pramod is a Client and not a Supplier.
>
> Would it? I would expect it to give you nothing.
>
> -Kal
>
>
>
> -Original Message-
> From: Geert-Jan Brits [mailto:gbr...@gmail.com]
> Sent: Friday, July 23, 2010 5:05 PM
> To: solr-user@lucene.apache.org
> Subject: Re: help with a schema design problem
>
> > Is there any way in solr to say p_value[someIndex]="pramod"
> And p_type[someIndex]="client".
> No, I'm 99% sure there is not.
>
> > One way would be to define a single field in the schema as p_value_type =
> "client pramod" i.e. combine the value from both the field and store it in
> a
> single field.
> yep, for the use-case you mentioned that would definitely work. Multivalued
> of course, so it can contain "Supplier Raj" as well.
>
>
> 2010/7/23 Pramod Goyal 
>
> >In my case the document id is the unique key( each row is not a unique
> > document ) . So a single document has multiple Party Value and Party
> Type.
> > Hence i need to define both Party value and Party type as mutli-valued.
> Is
> > there any way in solr to say p_value[someIndex]="pramod" And
> > p_type[someIndex]="client".
> >Is there any other way i can design my schema ? I have some solutions
> > but none seems to be a good solution. One way would be to define a single
> > field in the schema as p_value_type = "client pramod" i.e. combine the
> > value
> > from both the field and store it in a single field.
> >
> >
> > On Sat, Jul 24, 2010 at 12:18 AM, Geert-Jan Brits 
> > wrote:
> >
> > > With the usecase you specified it should work to just index each "Row"
> as
> > > you described in your initial post to be a separate document.
> > > This way p_value and p_type all get singlevalued and you get a correct
> > > combination of p_value and p_type.
> > >
> > > However, this may not go so well with other use-cases you have in mind,
> > > e.g.: requiring that no multiple results are returned with the same
> > > document
> > > id.
> > >
> > >
> > >
> > > 2010/7/23 Pramod Goyal 
> > >
> > > > I want to do that. But if i understand correctly in solr it would
> store
> > > the
> > > > field like this:
> > > >
> > > > p_value: "Pramod"  "Raj"
> > > > p_type:  "Client" "Supplier"
> > > >
> > > > When i search
> > > > p_value:"Pramod" AND p_type:"Supplier"
> > > >
> > > > it would give me result as document 1. Which is incorrect, since in
> > > > document
> > > > 1 Pramod is a Client and not a Supplier.
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin <
> > > > knagelb...@globeandmail.com> wrote:
> > > >
> > > > > I think you just want something like:
> > > > >
> > > > > p_value:"Pramod" AND p_type:"Supplier"
> > > > >
> > > > > no?
> > > > > -Kallin Nagelberg
> > > > >
> > > > > -Original Message-
> > > > > From: Pramod Goyal [mailto:pramod.go...@gmail.com]
> > > > > Sent: Friday, July 23, 2010 2:17 PM
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: help with a schema design problem
> > > > >
> > > > > Hi,
> > > > >
> > > > > Lets say i have table with 3 columns document id Party Value and
> > Party
> > > > > Type.
> > > > > In this table i have 3 rows. 1st row Document id: 1 Party Value:
> > Pramod
> > > > > Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party
> > > Type:
> > > > > Supplier. 3rd row Document id:2 Party Value: Pramod Party Type:
> > > Supplier.
> > > > > Now in this table if i use SQL its easy for me find all document
> with
> > > > Party
> > > > > Value as Pramod and Party Type as Client.
> > > > >
> > > > > I need to design solr schema so that i can do the same in Solr. If
> i
> > > > create
> > > > > 2 fields in solr schema Party value and Party type both of them
> multi
> > > > > valued
> > > > > and try to query +Pramod +Supplier then solr will return me the
> first
> > > > > document, even though in the first document Pramod is a client and
> > not
> > > a
> > > > > supplier
> > > > > Thanks,
> > > > > Pramod Goyal
> > > > >
> > > >
> > >
> >
>


RE: Novice seeking help to change filters to search without diacritics

2010-07-23 Thread HSingh

Hi Steve,  This is extremely helpful!  What is the best way to also
preserve/append the diacritics in the index in case someone searches using
them?  I deeply appreciate your help!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Novice-seeking-help-to-change-filters-to-search-without-diacritics-tp971263p990949.html
Sent from the Solr - User mailing list archive at Nabble.com.
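One way to keep both behaviors (a sketch; field and type names hypothetical, using the ASCIIFoldingFilterFactory available in Solr 1.4) is to index the text twice, once as-is and once folded, and search across both fields:

```xml
<!-- schema.xml -->
<field name="title"        type="text"        indexed="true" stored="true"/>
<field name="title_folded" type="text_folded" indexed="true" stored="false"/>
<copyField source="title" dest="title_folded"/>
<!-- text_folded's analyzer chain would include:
     <filter class="solr.ASCIIFoldingFilterFactory"/> -->
```

Queries with diacritics match the original field; queries without them match the folded copy.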


Re: filter query on timestamp slowing query???

2010-07-23 Thread Geert-Jan Brits
just wanted to mention a possible other route, which might be entirely
hypothetical :-)

*If* you could query on internal docid (I'm not sure that it's available
out-of-the-box, or if you can at all)
your original problem, quoted below, could imo be simplified to asking for
the last docid inserted (that matches the other criteria from your use-case)
and in the next call filtering from that docid forward.

>Every 30 minutes, i ask the index what are the documents that were added to
>it, since the last time i queried it, that match a certain criteria.
>From time to time, once a week or so, i ask the index for ALL the documents
>that match that criteria. (i also do this for not only one query, but
>several)
>This is why i need the timestamp filter.

Again, I'm not entirely sure that querying / filtering on internal docids is
possible (perhaps someone can comment) but if it is, it would perhaps be
more performant.
Big IF, I know.

Geert-Jan

2010/7/23 Chris Hostetter 

> : On top of using trie dates, you might consider separating the timestamp
> : portion and the type portion of the fq into separate fq parameters --
> : that will allow them to be stored in the filter cache separately. So
> : for instance, if you include "type:x OR type:y" in queries a lot, but
> : with different date ranges, then when you make a new query, the set for
> : "type:x OR type:y" can be pulled from the filter cache and intersected
>
> definitely ... that's the one big thing that jumped out at me once you
> showed us *how* you were constructing these queries.
>
>
>
> -Hoss
>
>


RE: help with a schema design problem

2010-07-23 Thread Nagelberg, Kallin
> > > When i search
> > > p_value:"Pramod" AND p_type:"Supplier"
> > >
> > > it would give me result as document 1. Which is incorrect, since in
> > > document
> > > 1 Pramod is a Client and not a Supplier.

Would it? I would expect it to give you nothing.

-Kal



-Original Message-
From: Geert-Jan Brits [mailto:gbr...@gmail.com] 
Sent: Friday, July 23, 2010 5:05 PM
To: solr-user@lucene.apache.org
Subject: Re: help with a schema design problem

> Is there any way in solr to say p_value[someIndex]="pramod"
And p_type[someIndex]="client".
No, I'm 99% sure there is not.

> One way would be to define a single field in the schema as p_value_type =
"client pramod" i.e. combine the value from both the field and store it in a
single field.
yep, for the use-case you mentioned that would definitely work. Multivalued
of course, so it can contain "Supplier Raj" as well.
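A sketch of that combined-field workaround with the thread's data (syntax illustrative, Solr 1.4):

```xml
<!-- schema.xml: one multiValued string field holding "type value" pairs -->
<field name="p_value_type" type="string" indexed="true" stored="true"
       multiValued="true"/>

<!-- document 1 -->
<doc>
  <field name="documentid">1</field>
  <field name="p_value_type">client pramod</field>
  <field name="p_value_type">supplier raj</field>
</doc>
```

A query like p_value_type:"client pramod" then only matches documents where that exact pairing exists.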


2010/7/23 Pramod Goyal 

>In my case the document id is the unique key( each row is not a unique
> document ) . So a single document has multiple Party Value and Party Type.
> Hence i need to define both Party value and Party type as multi-valued. Is
> there any way in solr to say p_value[someIndex]="pramod" And
> p_type[someIndex]="client".
>Is there any other way i can design my schema ? I have some solutions
> but none seems to be a good solution. One way would be to define a single
> field in the schema as p_value_type = "client pramod" i.e. combine the
> value
> from both the field and store it in a single field.
>
>
> On Sat, Jul 24, 2010 at 12:18 AM, Geert-Jan Brits 
> wrote:
>
> > With the usecase you specified it should work to just index each "Row" as
> > you described in your initial post to be a separate document.
> > This way p_value and p_type all get singlevalued and you get a correct
> > combination of p_value and p_type.
> >
> > However, this may not go so well with other use-cases you have in mind,
> > e.g.: requiring that no multiple results are returned with the same
> > document
> > id.
> >
> >
> >
> > 2010/7/23 Pramod Goyal 
> >
> > > I want to do that. But if i understand correctly in solr it would store
> > the
> > > field like this:
> > >
> > > p_value: "Pramod"  "Raj"
> > > p_type:  "Client" "Supplier"
> > >
> > > When i search
> > > p_value:"Pramod" AND p_type:"Supplier"
> > >
> > > it would give me result as document 1. Which is incorrect, since in
> > > document
> > > 1 Pramod is a Client and not a Supplier.
> > >
> > >
> > >
> > >
> > > On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin <
> > > knagelb...@globeandmail.com> wrote:
> > >
> > > > I think you just want something like:
> > > >
> > > > p_value:"Pramod" AND p_type:"Supplier"
> > > >
> > > > no?
> > > > -Kallin Nagelberg
> > > >
> > > > -Original Message-
> > > > From: Pramod Goyal [mailto:pramod.go...@gmail.com]
> > > > Sent: Friday, July 23, 2010 2:17 PM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: help with a schema design problem
> > > >
> > > > Hi,
> > > >
> > > > Lets say i have table with 3 columns document id Party Value and
> Party
> > > > Type.
> > > > In this table i have 3 rows. 1st row Document id: 1 Party Value:
> Pramod
> > > > Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party
> > Type:
> > > > Supplier. 3rd row Document id:2 Party Value: Pramod Party Type:
> > Supplier.
> > > > Now in this table if i use SQL its easy for me find all document with
> > > Party
> > > > Value as Pramod and Party Type as Client.
> > > >
> > > > I need to design solr schema so that i can do the same in Solr. If i
> > > create
> > > > 2 fields in solr schema Party value and Party type both of them multi
> > > > valued
> > > > and try to query +Pramod +Supplier then solr will return me the first
> > > > document, even though in the first document Pramod is a client and
> not
> > a
> > > > supplier
> > > > Thanks,
> > > > Pramod Goyal
> > > >
> > >
> >
>


Re: help with a schema design problem

2010-07-23 Thread Geert-Jan Brits
> Is there any way in solr to say p_value[someIndex]="pramod"
And p_type[someIndex]="client".
No, I'm 99% sure there is not.

> One way would be to define a single field in the schema as p_value_type =
"client pramod" i.e. combine the value from both the field and store it in a
single field.
yep, for the use-case you mentioned that would definitely work. Multivalued
of course, so it can contain "Supplier Raj" as well.


2010/7/23 Pramod Goyal 

>In my case the document id is the unique key( each row is not a unique
> document ) . So a single document has multiple Party Value and Party Type.
> Hence i need to define both Party value and Party type as multi-valued. Is
> there any way in solr to say p_value[someIndex]="pramod" And
> p_type[someIndex]="client".
>Is there any other way i can design my schema ? I have some solutions
> but none seems to be a good solution. One way would be to define a single
> field in the schema as p_value_type = "client pramod" i.e. combine the
> value
> from both the field and store it in a single field.
>
>
> On Sat, Jul 24, 2010 at 12:18 AM, Geert-Jan Brits 
> wrote:
>
> > With the usecase you specified it should work to just index each "Row" as
> > you described in your initial post to be a separate document.
> > This way p_value and p_type all get singlevalued and you get a correct
> > combination of p_value and p_type.
> >
> > However, this may not go so well with other use-cases you have in mind,
> > e.g.: requiring that no multiple results are returned with the same
> > document
> > id.
> >
> >
> >
> > 2010/7/23 Pramod Goyal 
> >
> > > I want to do that. But if i understand correctly in solr it would store
> > the
> > > field like this:
> > >
> > > p_value: "Pramod"  "Raj"
> > > p_type:  "Client" "Supplier"
> > >
> > > When i search
> > > p_value:"Pramod" AND p_type:"Supplier"
> > >
> > > it would give me result as document 1. Which is incorrect, since in
> > > document
> > > 1 Pramod is a Client and not a Supplier.
> > >
> > >
> > >
> > >
> > > On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin <
> > > knagelb...@globeandmail.com> wrote:
> > >
> > > > I think you just want something like:
> > > >
> > > > p_value:"Pramod" AND p_type:"Supplier"
> > > >
> > > > no?
> > > > -Kallin Nagelberg
> > > >
> > > > -Original Message-
> > > > From: Pramod Goyal [mailto:pramod.go...@gmail.com]
> > > > Sent: Friday, July 23, 2010 2:17 PM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: help with a schema design problem
> > > >
> > > > Hi,
> > > >
> > > > Lets say i have table with 3 columns document id Party Value and
> Party
> > > > Type.
> > > > In this table i have 3 rows. 1st row Document id: 1 Party Value:
> Pramod
> > > > Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party
> > Type:
> > > > Supplier. 3rd row Document id:2 Party Value: Pramod Party Type:
> > Supplier.
> > > > Now in this table if i use SQL its easy for me find all document with
> > > Party
> > > > Value as Pramod and Party Type as Client.
> > > >
> > > > I need to design solr schema so that i can do the same in Solr. If i
> > > create
> > > > 2 fields in solr schema Party value and Party type both of them multi
> > > > valued
> > > > and try to query +Pramod +Supplier then solr will return me the first
> > > > document, even though in the first document Pramod is a client and
> not
> > a
> > > > supplier
> > > > Thanks,
> > > > Pramod Goyal
> > > >
> > >
> >
>


RE: filter query on timestamp slowing query???

2010-07-23 Thread Chris Hostetter
: On top of using trie dates, you might consider separating the timestamp 
: portion and the type portion of the fq into separate fq parameters -- 
: that will allow them to be stored in the filter cache separately. So 
: for instance, if you include "type:x OR type:y" in queries a lot, but 
: with different date ranges, then when you make a new query, the set for 
: "type:x OR type:y" can be pulled from the filter cache and intersected 

definitely ... that's the one big thing that jumped out at me once you 
showed us *how* you were constructing these queries.  



-Hoss
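Concretely, the suggestion amounts to splitting one combined fq into two separately cacheable ones (values illustrative):

```
before:  fq=(type:x OR type:y) AND timestamp:[NOW/DAY-7DAYS TO NOW]
after:   fq=type:x OR type:y
         fq=timestamp:[NOW/DAY-7DAYS TO NOW]
```

The type clause is now cached once and reused across queries whose date range differs.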



Scoring Search for autocomplete

2010-07-23 Thread Frank A
Hi, I have an autocomplete that is currently working with an
NGramTokenizer so if I search for "Yo" both "New York" and "Toyota"
are valid results.  However I'm trying to figure out how to best
implement the search so that from a score perspective if the string
matches the beginning of an entire field it ranks first, followed by
the beginning of a term and then in the middle of a term.  For example
if I was searching with "vi" I would want Virginia ahead of West
Virginia ahead of Five.

I think I can do this with three separate fields, one using a white
space tokenizer and a ngram filter, another using the edge-ngram +
whitespace and another using keyword+edge-ngram, then doing an or on
the 3 fields, so that Virginia would match all 3 and get a higher
score... but this doesn't feel right to me, so I wanted to check for
better options.

Thanks.
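The intended ordering can be sanity-checked outside Solr with a small standalone toy (the 3/2/1 weights are hypothetical stand-ins for per-field boosts, not Solr's actual scoring):

```java
import java.util.*;

public class AutocompletePreview {
    // Toy stand-ins for the three analyzed fields:
    //   +3  keyword + edge-ngram    (whole field starts with the prefix)
    //   +2  whitespace + edge-ngram (some term starts with the prefix)
    //   +1  whitespace + ngram      (prefix occurs anywhere in a term)
    static int score(String prefix, String value) {
        String q = prefix.toLowerCase();
        String v = value.toLowerCase();
        int s = 0;
        if (v.startsWith(q)) s += 3;
        boolean termStart = false, termAny = false;
        for (String t : v.split("\\s+")) {
            termStart |= t.startsWith(q);
            termAny |= t.contains(q);
        }
        if (termStart) s += 2;
        if (termAny) s += 1;
        return s;
    }

    public static void main(String[] args) {
        List<String> hits = new ArrayList<>(Arrays.asList("Five", "West Virginia", "Virginia"));
        // Sort descending by score for the prefix "vi":
        hits.sort((a, b) -> score("vi", b) - score("vi", a));
        System.out.println(hits); // [Virginia, West Virginia, Five]
    }
}
```

Virginia matches all three "fields" (score 6), West Virginia only the term-level ones (3), and Five only the inner-ngram one (1), which is exactly the ranking asked for.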


Re: Sort by index order desc?

2010-07-23 Thread Ryan McKinley
Looks like you can sort by _docid_ to get things in index order or
reverse index order.

?sort=_docid_ asc

thank you solr!
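And for the reverse order the thread title asks about (request shape illustrative):

```
http://localhost:8983/solr/select?q=*:*&sort=_docid_ desc
```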


On Fri, Jul 23, 2010 at 2:23 PM, Ryan McKinley  wrote:
> Any pointers on how to sort by reverse index order?
> http://search.lucidimagination.com/search/document/4a59ded3966271ca/sort_by_index_order_desc
>
> it seems like it should be easy to do with the function query stuff,
> but i'm not sure what to sort by (unless I add a new field for indexed
> time)
>
>
> Any pointers?
>
> Thanks
> Ryan
>


Re: Securing Solr 1.4 in a glassfish container AS NEW THREAD

2010-07-23 Thread Sharp, Jonathan

Are you using the same instance of CommonsHttpSolrServer for all the
requests?


I was.

I also tried creating a new instance every x requests, also resetting  
the credentials on the new instances, to see if it would make a  
difference.


Doing that, I get an exception after several instances of the  
httpserver (again several hundred PDFs) to the effect that the socket  
is still in use... Perhaps I am not releasing the resources properly...?


-Jon
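One thing worth checking for the "socket still in use" symptom (a sketch, assuming old CommonsHttpSolrServer instances are simply dropped; commons-httpclient 3.x API):

```java
// Before discarding an old instance, release its pooled connections:
HttpClient client = ((CommonsHttpSolrServer) oldServer).getHttpClient();
client.getHttpConnectionManager().closeIdleConnections(0L);
```

Reusing a single server instance, as asked about below, avoids the issue entirely.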

On Jul 22, 2010, at 3:02 AM, "Bilgin Ibryam"  wrote:


Are you using the same instance of CommonsHttpSolrServer for all the
requests?

On Wed, Jul 21, 2010 at 4:50 PM, Sharp, Jonathan   
wrote:




Some further information --

I tried indexing a batch of PDFs with the client and Solr CELL, setting
the credentials in the httpclient. For some reason after successfully
indexing several hundred files I start getting a "SolrException:
Unauthorized" and an info message (for every subsequent file):

INFO basic authentication scheme selected
org.apache.commons.httpclient.HttpMethodDirector processWWWAuthChallenge
INFO Failure authenticating with BASIC ''@host:port

I increased session timeout in web.xml with no change. I'm looking
through the httpclient authentication now.

-Jon

-Original Message-
From: Sharp, Jonathan
Sent: Friday, July 16, 2010 8:59 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: Securing Solr 1.4 in a glassfish container AS NEW THREAD

Hi Bilgin,

Thanks for the snippet -- that helps a lot.

-Jon

-Original Message-
From: Bilgin Ibryam [mailto:bibr...@gmail.com]
Sent: Friday, July 16, 2010 1:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Securing Solr 1.4 in a glassfish container AS NEW THREAD

Hi Jon,

SolrJ (CommonsHttpSolrServer) internally uses Apache HttpClient to connect
to Solr. You can check there for some documentation.
I secured Solr also with the BASIC auth-method and use the following snippet
to access it from SolrJ:

// set username and password
((CommonsHttpSolrServer) server).getHttpClient()
    .getParams().setAuthenticationPreemptive(true);

Credentials defaultcreds =
    new UsernamePasswordCredentials("username", "secret");
((CommonsHttpSolrServer) server).getHttpClient().getState()
    .setCredentials(new AuthScope("localhost", 80, AuthScope.ANY_REALM),
                    defaultcreds);

HTH
Bilgin Ibryam



On Fri, Jul 16, 2010 at 2:35 AM, Sharp, Jonathan   
wrote:



Hi All,

I am considering securing Solr with basic auth in glassfish using the
container, by adding to web.xml and adding a sun-web.xml file to the
distributed WAR as below.

If using SolrJ to index files, how can I provide the credentials for
authentication to the http-client (or can someone point me in the direction
of the right documentation that will help me make the appropriate
modifications)?

Also any comment on the below is appreciated.

Add this to web.xml
---
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>SomeRealm</realm-name>
  </login-config>

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Admin Pages</web-resource-name>
      <url-pattern>/admin</url-pattern>
      <url-pattern>/admin/*</url-pattern>
      <http-method>GET</http-method>
      <http-method>POST</http-method>
      <http-method>PUT</http-method>
      <http-method>TRACE</http-method>
      <http-method>HEAD</http-method>
      <http-method>OPTIONS</http-method>
      <http-method>DELETE</http-method>
    </web-resource-collection>
    <auth-constraint>
      <role-name>SomeAdminRole</role-name>
    </auth-constraint>
  </security-constraint>

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Update Servlet</web-resource-name>
      <url-pattern>/update/*</url-pattern>
      <http-method>GET</http-method>
      <http-method>POST</http-method>
      <http-method>PUT</http-method>
      <http-method>TRACE</http-method>
      <http-method>HEAD</http-method>
      <http-method>OPTIONS</http-method>
      <http-method>DELETE</http-method>
    </web-resource-collection>
    <auth-constraint>
      <role-name>SomeUpdateRole</role-name>
    </auth-constraint>
  </security-constraint>

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Select Servlet</web-resource-name>
      <url-pattern>/select/*</url-pattern>
      <http-method>GET</http-method>
      <http-method>POST</http-method>
      <http-method>PUT</http-method>
      <http-method>TRACE</http-method>
      <http-method>HEAD</http-method>
      <http-method>OPTIONS</http-method>
      <http-method>DELETE</http-method>
    </web-resource-collection>
    <auth-constraint>
      <role-name>SomeSearchRole</role-name>
    </auth-constraint>
  </security-constraint>
---

Also add this as sun-web.xml




<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sun-web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Application
Server 9.0 Servlet 2.5//EN" "http://www.sun.com/software/appserver/dtds/sun-web-app_2_5-0.dtd">
<sun-web-app>
  <context-root>/Solr</context-root>

  <jsp-config>
    <property name="keepgenerated" value="true">
      <description>Keep a copy of the generated servlet class' java code.</description>
    </property>
  </jsp-config>

  <security-role-mapping>
    <role-name>SomeAdminRole</role-name>
    <group-name>SomeAdminGroup</group-name>
  </security-role-mapping>
  <security-role-mapping>
    <role-name>SomeUpdateRole</role-name>
    <group-name>SomeUpdateGroup</group-name>
  </security-role-mapping>
  <security-role-mapping>
    <role-name>SomeSearchRole</role-name>
    <group-name>SomeSearchGroup</group-name>
  </security-role-mapping>
</sun-web-app>


--

-Jon



Re: a bug of solr distributed search

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 2:40 PM, MitchK  wrote:
> That only works if the docs are exactly the same - they may not be.
> Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same,
> shouldn't they?

Documents aren't supposed to be duplicated across shards... so the
presence of multiple docs with the same id is a bug anyway.  We've
chosen to try and handle it gracefully rather than fail hard.

Some people have treated this as a feature - and that's OK as long as
expectations are set appropriately.

-Yonik
http://www.lucidimagination.com


Re: help with a schema design problem

2010-07-23 Thread Pramod Goyal
In my case the document id is the unique key (each row is not a unique
document). So a single document has multiple Party Values and Party Types.
Hence I need to define both Party Value and Party Type as multi-valued. Is
there any way in Solr to say p_value[someIndex]="pramod" AND
p_type[someIndex]="client"?
    Is there any other way I can design my schema? I have some solutions,
but none seems to be a good one. One way would be to define a single field
in the schema as p_value_type = "client pramod", i.e. combine the values
from both fields and store them in a single field.
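
A quick sketch of how that combined-field workaround behaves (the field name
p_value_type and the sample documents are illustrative assumptions, not an
actual schema):

```python
# Sketch: emulate the "combined field" workaround for keeping
# multi-valued p_value / p_type entries paired per document.
# Field and document names here are illustrative assumptions.

def build_combined(parties):
    """Join each (type, value) pair into one indexable entry."""
    return [f"{p_type} {p_value}" for p_type, p_value in parties]

def matches(doc, p_type, p_value):
    """A doc matches only if the *same* entry carries both type and value."""
    return f"{p_type} {p_value}" in doc["p_value_type"]

docs = [
    {"id": 1, "p_value_type": build_combined([("client", "pramod"),
                                              ("supplier", "raj")])},
    {"id": 2, "p_value_type": build_combined([("supplier", "pramod")])},
]

# "pramod AND supplier" now only hits doc 2, not doc 1:
hits = [d["id"] for d in docs if matches(d, "supplier", "pramod")]
print(hits)  # [2]
```

In Solr the same effect would depend on indexing the pair so it matches as a
unit, e.g. querying the combined field as a phrase.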


On Sat, Jul 24, 2010 at 12:18 AM, Geert-Jan Brits  wrote:

> With the usecase you specified it should work to just index each "Row" as
> you described in your initial post to be a seperate document.
> This way p_value and p_type all get singlevalued and you get a correct
> combination of p_value and p_type.
>
> However, this may not go so well with other use-cases you have in mind,
> e.g.: requiring that no multiple results are returned with the same
> document
> id.
>
>
>
> 2010/7/23 Pramod Goyal 
>
> > I want to do that. But if i understand correctly in solr it would store
> the
> > field like this:
> >
> > p_value: "Pramod"  "Raj"
> > p_type:  "Client" "Supplier"
> >
> > When i search
> > p_value:"Pramod" AND p_type:"Supplier"
> >
> > it would give me result as document 1. Which is incorrect, since in
> > document
> > 1 Pramod is a Client and not a Supplier.
> >
> >
> >
> >
> > On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin <
> > knagelb...@globeandmail.com> wrote:
> >
> > > I think you just want something like:
> > >
> > > p_value:"Pramod" AND p_type:"Supplier"
> > >
> > > no?
> > > -Kallin Nagelberg
> > >
> > > -Original Message-
> > > From: Pramod Goyal [mailto:pramod.go...@gmail.com]
> > > Sent: Friday, July 23, 2010 2:17 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: help with a schema design problem
> > >
> > > Hi,
> > >
> > > Lets say i have table with 3 columns document id Party Value and Party
> > > Type.
> > > In this table i have 3 rows. 1st row Document id: 1 Party Value: Pramod
> > > Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party
> Type:
> > > Supplier. 3rd row Document id:2 Party Value: Pramod Party Type:
> Supplier.
> > > Now in this table if i use SQL its easy for me find all document with
> > Party
> > > Value as Pramod and Party Type as Client.
> > >
> > > I need to design solr schema so that i can do the same in Solr. If i
> > create
> > > 2 fields in solr schema Party value and Party type both of them multi
> > > valued
> > > and try to query +Pramod +Supplier then solr will return me the first
> > > document, even though in the first document Pramod is a client and not
> a
> > > supplier
> > > Thanks,
> > > Pramod Goyal
> > >
> >
>


Performance issues when querying on large documents

2010-07-23 Thread ahammad

Hello,

I have an index with lots of different types of documents. One of those
types basically contains extracts of PDF docs. Some of those PDFs can have
1000+ pages, so there would be a lot of stuff to search through.

I am experiencing really terrible performance when querying. My whole index
has about 270k documents, but less than 1000 of those are the PDF extracts.
The slow querying occurs when I search only on those PDF extracts (by
specifying filters), and return 100 results. Returning 100 results definitely
adds to the issue, but even cutting that down can be slow.

Is there a way to improve querying with such large results? To give an idea,
querying for a single word can take a little over a minute, which isn't
really viable for an application that revolves around searching. For now, I
have limited the results to 20, which makes the query execute in roughly
10-15 seconds. However, I would like to have the option of returning 100
results.

Thanks a lot.

 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Performance-issues-when-querying-on-large-documents-tp990590p990590.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: help with a schema design problem

2010-07-23 Thread Geert-Jan Brits
With the usecase you specified it should work to just index each "Row" as
you described in your initial post to be a seperate document.
This way p_value and p_type all get singlevalued and you get a correct
combination of p_value and p_type.

However, this may not go so well with other use-cases you have in mind,
e.g.: requiring that no multiple results are returned with the same document
id.



2010/7/23 Pramod Goyal 

> I want to do that. But if i understand correctly in solr it would store the
> field like this:
>
> p_value: "Pramod"  "Raj"
> p_type:  "Client" "Supplier"
>
> When i search
> p_value:"Pramod" AND p_type:"Supplier"
>
> it would give me result as document 1. Which is incorrect, since in
> document
> 1 Pramod is a Client and not a Supplier.
>
>
>
>
> On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin <
> knagelb...@globeandmail.com> wrote:
>
> > I think you just want something like:
> >
> > p_value:"Pramod" AND p_type:"Supplier"
> >
> > no?
> > -Kallin Nagelberg
> >
> > -Original Message-
> > From: Pramod Goyal [mailto:pramod.go...@gmail.com]
> > Sent: Friday, July 23, 2010 2:17 PM
> > To: solr-user@lucene.apache.org
> > Subject: help with a schema design problem
> >
> > Hi,
> >
> > Lets say i have table with 3 columns document id Party Value and Party
> > Type.
> > In this table i have 3 rows. 1st row Document id: 1 Party Value: Pramod
> > Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party Type:
> > Supplier. 3rd row Document id:2 Party Value: Pramod Party Type: Supplier.
> > Now in this table if i use SQL its easy for me find all document with
> Party
> > Value as Pramod and Party Type as Client.
> >
> > I need to design solr schema so that i can do the same in Solr. If i
> create
> > 2 fields in solr schema Party value and Party type both of them multi
> > valued
> > and try to query +Pramod +Supplier then solr will return me the first
> > document, even though in the first document Pramod is a client and not a
> > supplier
> > Thanks,
> > Pramod Goyal
> >
>


Re: a bug of solr distributed search

2010-07-23 Thread MitchK


That only works if the docs are exactly the same - they may not be. 
Ahm, what? Why? If the uniqueID is the same, the docs *should* be the same,
shouldn't they?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p990563.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: a bug of solr distributed search

2010-07-23 Thread MitchK

... Additionally to my previous posting:
To keep this in sync, we could do two things:
Wait for every server, to make sure that everyone uses the same values to
compute the score, and then apply them.
Or: let's say we collect the new values every 15 minutes. To merge them and
send them over the network, we declare that this will need 3 additional
minutes (we want to keep the network traffic for such actions very low, so
we do not send everything instantly).
Okay, and now we add 2 more minutes, in case 3 were not enough or something
needs a little more time than we thought. After those 2 minutes, every node
has to apply the new values.
Pro: if one node breaks, we do not delay the application of the new values.
Con: we need two HashMaps, and both will have roughly the same size. That
means we will waste some RAM for this operation, if we do not write the
values to disk (which I do not suggest).
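
The reduce-and-distribute step I have in mind could be sketched like this (a
toy illustration, not the actual patch; the shard data and the classic
1 + log(N / (df + 1)) idf formula are assumptions):

```python
import math

# Toy sketch of merging per-shard document frequencies into global idf
# values that every node could then cache. Shard contents are made up.
shard_dfs = [
    {"solr": 10, "lucene": 4},   # df per term on shard A
    {"solr": 7, "hadoop": 2},    # df per term on shard B
]
shard_doc_counts = [100, 50]

# Reduce step: sum document frequencies and document counts across shards.
global_df = {}
for dfs in shard_dfs:
    for term, df in dfs.items():
        global_df[term] = global_df.get(term, 0) + df
n_docs = sum(shard_doc_counts)

# Every node applies the same idf table, so scores agree across shards.
global_idf = {t: 1 + math.log(n_docs / (df + 1))
              for t, df in global_df.items()}
print(global_idf["solr"])
```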

Thoughts?

- Mitch

MitchK wrote:
> 
> Yonik,
> 
> why do we do not send the output of TermsComponent of every node in the
> cluster to a Hadoop instance?
> Since TermsComponent does the map-part of the map-reduce concept, Hadoop
> only needs to reduce the stuff. Maybe we even do not need Hadoop for this.
> After reducing, every node in the cluster gets the current values to
> compute the idf.
> We can store this information in a HashMap-based SolrCache (or something
> like that) to provide constant-time access. To keep the values up to date,
> we can repeat that after every x minutes.
> 
> If we got that, it does not care whereas we use doc_X from shard_A or
> shard_B, since they will all have got the same scores. 
> 
> Even if we got large indices with 10 million or more unique terms, this
> will only need some megabyte network-traffic.
> 
> Kind regards,
> - Mitch
> 
> 
> Yonik Seeley-2-2 wrote:
>> 
>> As the comments suggest, it's not a bug, but just the best we can do
>> for now since our priority queues don't support removal of arbitrary
>> elements.  I guess we could rebuild the current priority queue if we
>> detect a duplicate, but that will have an obvious performance impact.
>> Any other suggestions?
>> 
>> -Yonik
>> http://www.lucidimagination.com
>> 
> 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p990551.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Autocommit not happening

2010-07-23 Thread John DeRosa
On Jul 23, 2010, at 9:37 AM, John DeRosa wrote:

> Hi! I'm a Solr newbie, and I don't understand why autocommits aren't 
> happening in my Solr installation.
> 

[snip]

"Never mind"... I have discovered my boneheaded mistake. It's so silly, I wish 
I could retract my question from the archives.



Re: a bug of solr distributed search

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 2:23 PM, MitchK  wrote:
> why do we do not send the output of TermsComponent of every node in the
> cluster to a Hadoop instance?
> Since TermsComponent does the map-part of the map-reduce concept, Hadoop
> only needs to reduce the stuff. Maybe we even do not need Hadoop for this.
> After reducing, every node in the cluster gets the current values to compute
> the idf.
> We can store this information in a HashMap-based SolrCache (or something
> like that) to provide constant-time access. To keep the values up to date,
> we can repeat that after every x minutes.

There's already a patch in JIRA that does distributed IDF.
Hadoop wouldn't be the right tool for that anyway... it's for batch
oriented systems, not low-latency queries.

> If we got that, it does not care whereas we use doc_X from shard_A or
> shard_B, since they will all have got the same scores.

That only works if the docs are exactly the same - they may not be.

-Yonik
http://www.lucidimagination.com


Re: help with a schema design problem

2010-07-23 Thread Pramod Goyal
I want to do that. But if i understand correctly in solr it would store the
field like this:

p_value: "Pramod"  "Raj"
p_type:  "Client" "Supplier"

When i search
p_value:"Pramod" AND p_type:"Supplier"

it would give me result as document 1. Which is incorrect, since in document
1 Pramod is a Client and not a Supplier.




On Fri, Jul 23, 2010 at 11:52 PM, Nagelberg, Kallin <
knagelb...@globeandmail.com> wrote:

> I think you just want something like:
>
> p_value:"Pramod" AND p_type:"Supplier"
>
> no?
> -Kallin Nagelberg
>
> -Original Message-
> From: Pramod Goyal [mailto:pramod.go...@gmail.com]
> Sent: Friday, July 23, 2010 2:17 PM
> To: solr-user@lucene.apache.org
> Subject: help with a schema design problem
>
> Hi,
>
> Lets say i have table with 3 columns document id Party Value and Party
> Type.
> In this table i have 3 rows. 1st row Document id: 1 Party Value: Pramod
> Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party Type:
> Supplier. 3rd row Document id:2 Party Value: Pramod Party Type: Supplier.
> Now in this table if i use SQL its easy for me find all document with Party
> Value as Pramod and Party Type as Client.
>
> I need to design solr schema so that i can do the same in Solr. If i create
> 2 fields in solr schema Party value and Party type both of them multi
> valued
> and try to query +Pramod +Supplier then solr will return me the first
> document, even though in the first document Pramod is a client and not a
> supplier
> Thanks,
> Pramod Goyal
>


Re: a bug of solr distributed search

2010-07-23 Thread MitchK

Yonik,

why do we do not send the output of TermsComponent of every node in the
cluster to a Hadoop instance?
Since TermsComponent does the map-part of the map-reduce concept, Hadoop
only needs to reduce the stuff. Maybe we even do not need Hadoop for this.
After reducing, every node in the cluster gets the current values to compute
the idf.
We can store this information in a HashMap-based SolrCache (or something
like that) to provide constant-time access. To keep the values up to date,
we can repeat that after every x minutes.

If we got that, it would not matter whether we use doc_X from shard_A or
shard_B, since they will all have the same scores.

Even for large indices with 10 million or more unique terms, this will only
need a few megabytes of network traffic.

Kind regards,
- Mitch


Yonik Seeley-2-2 wrote:
> 
> As the comments suggest, it's not a bug, but just the best we can do
> for now since our priority queues don't support removal of arbitrary
> elements.  I guess we could rebuild the current priority queue if we
> detect a duplicate, but that will have an obvious performance impact.
> Any other suggestions?
> 
> -Yonik
> http://www.lucidimagination.com
> 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p990506.html
Sent from the Solr - User mailing list archive at Nabble.com.


Sort by index order desc?

2010-07-23 Thread Ryan McKinley
Any pointers on how to sort by reverse index order?
http://search.lucidimagination.com/search/document/4a59ded3966271ca/sort_by_index_order_desc

it seems like it should be easy to do with the function query stuff,
but i'm not sure what to sort by (unless I add a new field for indexed
time)


Any pointers?

Thanks
Ryan


RE: help with a schema design problem

2010-07-23 Thread Nagelberg, Kallin
I think you just want something like:

p_value:"Pramod" AND p_type:"Supplier"

no?
-Kallin Nagelberg

-Original Message-
From: Pramod Goyal [mailto:pramod.go...@gmail.com] 
Sent: Friday, July 23, 2010 2:17 PM
To: solr-user@lucene.apache.org
Subject: help with a schema design problem

Hi,

Lets say i have table with 3 columns document id Party Value and Party Type.
In this table i have 3 rows. 1st row Document id: 1 Party Value: Pramod
Party Type: Client. 2nd row: Document id: 1 Party Value: Raj Party Type:
Supplier. 3rd row Document id:2 Party Value: Pramod Party Type: Supplier.
Now in this table if i use SQL its easy for me find all document with Party
Value as Pramod and Party Type as Client.

I need to design solr schema so that i can do the same in Solr. If i create
2 fields in solr schema Party value and Party type both of them multi valued
and try to query +Pramod +Supplier then solr will return me the first
document, even though in the first document Pramod is a client and not a
supplier
Thanks,
Pramod Goyal


help with a schema design problem

2010-07-23 Thread Pramod Goyal
Hi,

Let's say I have a table with 3 columns: document id, Party Value, and Party
Type. In this table I have 3 rows. 1st row: Document id: 1, Party Value:
Pramod, Party Type: Client. 2nd row: Document id: 1, Party Value: Raj, Party
Type: Supplier. 3rd row: Document id: 2, Party Value: Pramod, Party Type:
Supplier. Now in this table, if I use SQL, it is easy to find all documents
with Party Value as Pramod and Party Type as Client.

I need to design a Solr schema so that I can do the same in Solr. If I create
2 fields in the Solr schema, Party Value and Party Type, both of them
multi-valued, and try to query +Pramod +Supplier, then Solr will return the
first document, even though in the first document Pramod is a client and not
a supplier.
Thanks,
Pramod Goyal


Re: Replacing text fields with numeric fields for speed

2010-07-23 Thread Gora Mohanty
On Fri, 23 Jul 2010 14:33:54 +0200
Peter Karich  wrote:

> Gora,
> 
> just for my interests:
> does apache bench sends different queries, or from the logs, or
> always the same query?
> If it would be always the same query the cache of solr will come
> and make the response time super small.

Yes, the way that things are set up currently the query is always
the same. My reasoning was that the effect of the Solr cache should
be the same for both numeric, and text fields. I am going to be
trying some more rigorous tests, such as turning off Solr caching,
and pre-warming the query before running the tests.

> I would like to find a tool or script where I can send my logfile
> to solr and measure some things ... because at the moment we are
> using fastbench and I would like to replace it ;-)

Not sure what fastbench is, but using Solr logs as a tool to
measure search times for typical searches is an interesting idea.
Hmm, we will also need to do that, so maybe we can compare notes on
this.
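
As a starting point for replaying logs, something like this might do (a rough
sketch: the log-line format and the Solr URL are assumptions about a typical
Solr 1.4 request log):

```python
import re
import time
import urllib.request

# Rough sketch: pull the params out of Solr request-log lines and replay
# each query against a running Solr, timing the round trip. The log
# pattern and URL below are assumptions about a typical setup.
SOLR_URL = "http://localhost:8983/solr/select"
PARAM_RE = re.compile(r"params=\{([^}]*)\}")

def extract_params(log_line):
    """Return the raw params string from a request-log line, or None."""
    m = PARAM_RE.search(log_line)
    return m.group(1) if m else None

def replay(log_lines):
    for log_line in log_lines:
        params = extract_params(log_line)
        if params is None:
            continue
        start = time.time()
        urllib.request.urlopen(f"{SOLR_URL}?{params}").read()
        print(f"{time.time() - start:.3f}s  {params}")

line = "INFO: [] webapp=/solr path=/select params={q=ipod&rows=10} hits=3 status=0 QTime=2"
print(extract_params(line))  # q=ipod&rows=10
```

A real benchmark would also want warm-up runs and percentile reporting rather
than per-query prints.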

Regards,
Gora


RE: Spellcheck help

2010-07-23 Thread Dyer, James
In org.apache.solr.spelling.SpellingQueryConverter, find the line (#84):

final static String PATTERN = "(?:(?!(" + NMTOKEN + ":|\\d+)))[\\p{L}_\\-0-9]+";

and remove the |\\d+ to make it:

final static String PATTERN = "(?:(?!" + NMTOKEN + ":))[\\p{L}_\\-0-9]+";

My testing shows this solves your problem.  The caution is to test it against 
all your use cases, because obviously someone thought we should ignore leading 
digits in keywords.  Surely there's a reason, although I can't think of it.
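
The effect of the change can be illustrated outside Solr (a hedged sketch:
Python's `re` stands in for Java's regex, and `[\w\-.]+` stands in for the
NMTOKEN expression, which is an assumption about its expansion):

```python
import re

# Before: the (?!...|\d+) lookahead refuses to start a token on a digit,
# so "3dsmax" is tokenized as "dsmax" -- the bug from the original post.
before = re.compile(r"(?:(?!(?:[\w\-.]+:|\d+)))[\w\-]+")
# After removing |\d+, the lookahead only skips field prefixes like "title:".
after = re.compile(r"(?:(?![\w\-.]+:))[\w\-]+")

print(before.findall("3dsmax"))        # ['dsmax']
print(after.findall("3dsmax"))         # ['3dsmax']
print(after.findall("title:3dsmax"))   # ['3dsmax']
```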

James Dyer
E-Commerce Systems
Ingram Book Company
(615) 213-4311

-Original Message-
From: dekay...@hotmail.com [mailto:dekay...@hotmail.com] 
Sent: Saturday, July 17, 2010 12:41 PM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck help

Can anybody help me with this? :(

-Original Message- 
From: Marc Ghorayeb
Sent: Thursday, July 08, 2010 9:46 AM
To: solr-user@lucene.apache.org
Subject: Spellcheck help


Hello,I've been trying to get rid of a bug when using the spellcheck but so 
far with no success :(When searching for a word that starts with a number, 
for example "3dsmax", i get the results that i want, BUT the spellcheck says 
it is not correctly spelled AND the collation gives me "33dsmax". Further 
investigation shows that the spellcheck is actually only checking "dsmax" 
which it considers does not exist and gives me "3dsmax" for better results, 
but since i have spellcheck.collate = true, the collation that i show is 
"33dsmax" with the first 3 being the one discarded by the spellchecker... 
Otherwise, the spellcheck works correctly for normal words... any ideas? 
:(My spellcheck field is fairly classic, whitespace tokenizer, with 
lowercase filter...Any help would be greatly appreciated :)Thanks,Marc
_
Messenger arrive enfin sur iPhone ! Venez le télécharger gratuitement !
http://www.messengersurvotremobile.com/?d=iPhone 



RE: Novice seeking help to change filters to search without diacritics

2010-07-23 Thread Steven A Rowe
Hi HSingh,

Maybe the mapping file I attached to 
https://issues.apache.org/jira/browse/SOLR-2013 will help?

Steve

> -Original Message-
> From: HSingh [mailto:hsin...@gmail.com]
> Sent: Thursday, July 22, 2010 11:30 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Novice seeking help to change filters to search without
> diacritics
> 
> 
> Hoss, thank you for your helpful response!
> 
> : i think what's confusing you is that you are using the
> : MappingCharFilterFactory with that file in your "text" field type to
> : convert any ISOLatin1Accent characters to their "base" characters
> 
> The problem is that a large range of characters are not getting converting
> to their base characters.  The ASCIIFoldingFilterFactory handles this
> conversion for the entire Latin character set, including the extended sets
> without having to specify individual characters and their equivalent base
> characters.
> 
> Is there way for me to switch to ASCIIFoldingFilterFactory?  If so, what
> changes do I need to make to these files?  I would appreciate your help!
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Novice-
> seeking-help-to-change-filters-to-search-without-diacritics-
> tp971263p988890.html
> Sent from the Solr - User mailing list archive at Nabble.com.


RE: filter query on timestamp slowing query???

2010-07-23 Thread Jonathan Rochkind

> and a typical query would be:
>
fl=id,type,timestamp,score&start=0&q="Coca+Cola"+pepsi+-"dr+pepper"&fq=timestamp:[2010-07-07T00:00:00Z+TO+NOW]+AND+(type:x+OR+type:y)&
> rows=2000

On top of using trie dates, you might consider separating the timestamp portion 
and the type portion of the fq into separate fq parameters -- that will allow 
them to be stored in the filter cache separately. So for instance, if you 
include "type:x OR type:y" in queries a lot, but with different date ranges, 
then when you make a new query, the set for "type:x OR type:y" can be pulled 
from the filter cache and intersected with the other result set, and that 
portion won't have to be run again. That's probably not where your slowness is 
coming from, but it shouldn't hurt. 

Multiple fq's are essentially AND'd together, so whenever you have an fq that 
consists of separate clauses AND'd together, you can always separate them into 
multiple fq's; it won't affect the result set, but will affect the caching 
possibilities. 
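
Concretely, the split looks like this at the request level (a sketch; the
query strings are copied from the example above):

```python
from urllib.parse import urlencode

# One combined fq: cached as a single filter-cache entry, reusable only
# when the whole expression (date range included) repeats exactly.
combined = [
    ("q", '"Coca Cola" pepsi -"dr pepper"'),
    ("fq", "timestamp:[2010-07-07T00:00:00Z TO NOW] AND (type:x OR type:y)"),
]

# Split fq's: the type clause gets its own filter-cache entry and can be
# reused across queries with different date ranges.
split = [
    ("q", '"Coca Cola" pepsi -"dr pepper"'),
    ("fq", "timestamp:[2010-07-07T00:00:00Z TO NOW]"),
    ("fq", "type:x OR type:y"),
]

print(urlencode(split))
```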

Allow custom overrides

2010-07-23 Thread Charlie Jackson
I need to implement a search engine that will allow users to override
pieces of data and then search against or view that data. For example, a
doc that has the following values:

 

DocId   Fulltext             Meta1   Meta2   Meta3
1       The quick brown fox  foo     foo     foo

Now say a user overrides Meta2:

DocId   Fulltext             Meta1   Meta2   Meta3
1       The quick brown fox  foo     foo     foo
                                     bar

For that user, if they search for Meta2:bar, it needs to hit, but it should
not hit for any other user. Likewise, if that user searches for Meta2:foo, it
should not hit. Also, any searches against that document by that user should
return the value 'bar' for Meta2, but should return 'foo' for other users.

 

I'm not sure of the best way to implement this. Maybe I could do this with
field collapsing somehow? Or with payloads? A custom analyzer? Any help
would be appreciated.

 

 

- Charlie

 



Re: Autocommit not happening

2010-07-23 Thread John DeRosa
On Jul 23, 2010, at 9:37 AM, John DeRosa wrote:

> Hi! I'm a Solr newbie, and I don't understand why autocommits aren't 
> happening in my Solr installation.
> 
> My one server running Solr:
> 
> - Ubuntu 10.04 (Lucid Lynx), with all the latest updates.
> - Solr 1.4.0 running on Tomcat6
> - Installation was done via "apt-get install solr-common solr-tomcat 
> tomcat6-admin"
> 
> My solrconfig.xml has:
> 
>  1
>  1 
>
> 

[snip]

The plot thickens. var/log/tomcat6/catalina.out contains:

Jul 22, 2010 9:36:32 PM 
org.apache.solr.update.DirectUpdateHandler2$CommitTracker 
INFO: AutoCommit: disabled

What's stepping in and disabling autocommit?

John



Re: filter query on timestamp slowing query???

2010-07-23 Thread oferiko

I'm in the process of indexing my demo data to test that; I'll have more
valid data on whether or not it made the difference in a few days.
Thanks


On 23/07/2010 at 19:42, "Jonathan Rochkind [via Lucene]" <
ml-node+990234-2085494904-316...@n3.nabble.com> wrote:

> and a typical query would be:
>
fl=id,type,timestamp,score&start=0&q="Coca+Cola"+pepsi+-"dr+pepper"&fq=timestamp:[2010-07-07T00:00:00Z+TO+NOW]+AND+(type:x+OR+type:y)&

> rows=2000

My understanding is that this is essentially what the solr 1.4 trie date
fields are made for, I'd use them, should speed things up.  Not sure where
the best documentation for them is, but see:

http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/




--
 View message @
http://lucene.472066.n3.nabble.com/filter-query-on-timestamp-slowing-query-tp977280p990234.html
To unsubscribe from Re: filter query on timestamp slowing query???, click
here< (link removed) =>.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/filter-query-on-timestamp-slowing-query-tp977280p990337.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: filter query on timestamp slowing query???

2010-07-23 Thread Jonathan Rochkind
> and a typical query would be:
> fl=id,type,timestamp,score&start=0&q="Coca+Cola"+pepsi+-"dr+pepper"&fq=timestamp:[2010-07-07T00:00:00Z+TO+NOW]+AND+(type:x+OR+type:y)&
> rows=2000

My understanding is that this is essentially what the solr 1.4 trie date fields 
are made for, I'd use them, should speed things up.  Not sure where the best 
documentation for them is, but see:

http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/




Autocommit not happening

2010-07-23 Thread John DeRosa
Hi! I'm a Solr newbie, and I don't understand why autocommits aren't happening 
in my Solr installation.

My one server running Solr:

- Ubuntu 10.04 (Lucid Lynx), with all the latest updates.
- Solr 1.4.0 running on Tomcat6
- Installation was done via "apt-get install solr-common solr-tomcat 
tomcat6-admin"

My solrconfig.xml has:
 
  1
  1 



My code can add documents just fine. But after 12 hours, autocommit has never 
happened! Here's what I see on my Solr Admin pages:

CORE:   
name:   core  
class:   
version:1.0  
description:SolrCore  
stats:  coreName : 
startTime : Thu Jul 22 21:38:30 UTC 2010 
refCount : 2 
aliases : [] 
name:   searcher  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  searcherName : searc...@10ed7f5c main 
caching : true 
numDocs : 0 
maxDoc : 0 
reader : 
SolrIndexReader{this=509f662e,r=readonlydirectoryrea...@509f662e,refCnt=1,segments=0}
 
readerDir : org.apache.lucene.store.NIOFSDirectory@/var/lib/solr/data/index 
indexVersion : 1279834591965 
openedAt : Thu Jul 22 23:58:28 UTC 2010 
registeredAt : Thu Jul 22 23:58:28 UTC 2010 
warmupTime : 3 
name:   searc...@10ed7f5c main  
class:  org.apache.solr.search.SolrIndexSearcher  
version:1.0  
description:index searcher  
stats:  searcherName : searc...@10ed7f5c main 
caching : true 
numDocs : 0 
maxDoc : 0 
reader : 
SolrIndexReader{this=509f662e,r=readonlydirectoryrea...@509f662e,refCnt=1,segments=0}
 
readerDir : org.apache.lucene.store.NIOFSDirectory@/var/lib/solr/data/index 
indexVersion : 1279834591965 
openedAt : Thu Jul 22 23:58:28 UTC 2010 
registeredAt : Thu Jul 22 23:58:28 UTC 2010 
warmupTime : 3 


UPDATE HANDLERS:

name:   updateHandler  
class:  org.apache.solr.update.DirectUpdateHandler2  
version:1.0  
description:Update handler that efficiently directly updates the on-disk 
main lucene index  
stats:  commits : 2 
autocommits : 0 
optimizes : 0 
rollbacks : 0 
expungeDeletes : 0 
docsPending : 496590 
adds : 496590 
deletesById : 0 
deletesByQuery : 0 
errors : 0 
cumulative_adds : 501989 
cumulative_deletesById : 0 
cumulative_deletesByQuery : 2 
cumulative_errors : 0 


There's nearly 500K pending commits, accumulated over the past 12 hours. I 
think we're past the specified autocommit limits. :-)

What should I look at to figure out what's preventing autocommits?

Thank you all in advance!

John



Re: Solr on iPad?

2010-07-23 Thread Stephan Schwab

Thanks Mark!

I'm subscribing to the cocoa-dev list.

On Jul 23, 2010, at 10:17 AM, Mark Allan [via Lucene] wrote:

> Hi Stephan, 
> 
> On the iPad, as with the iPhone, I'm afraid you're stuck with using   
> SQLite if you want any form of database in your app. 
> 
> I suppose if you wanted to get really ambitious and had a lot of time   
> on your hands you could use Xcode to try and compile one of the open- 
> source C-based DBs/Indexers, but as with most things in OS X and iOS   
> development, if you're bending over yourself trying to implement   
> something, you're probably doing it wrongly!  Also, I wouldn't put it   
> past the AppStore guardians to reject your app purely on the basis of   
> having used something other than SQLite! 
> 
> Apple's cocoa-dev mailing list is very active if you have problems,   
> but do your homework before asking questions or you'll get short shrift. 
> http://lists.apple.com/cocoa-dev
> 
> Mark 
> 
> On 22 Jul 2010, at 6:12 pm, Stephan Schwab wrote: 
> 
> > Dear Solr community, 
> > 
> > does anyone know whether it may be possible or has already been done   
> > to 
> > bring Solr to the Apple iPad so that applications may use a local   
> > search 
> > engine? 
> > 
> > Greetings, 
> > Stephan
> 
> 
> -- 
> The University of Edinburgh is a charitable body, registered in 
> Scotland, with registration number SC005336. 
> 
> 
> 
> View message @ 
> http://lucene.472066.n3.nabble.com/Solr-on-iPad-tp987655p989269.html 
> To unsubscribe from Solr on iPad?, click here.
> 


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-on-iPad-tp987655p990034.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Duplicates

2010-07-23 Thread Pavel Minchenkov
I mean two use cases.
I can't index folders only, because I have other queries on files. Or I would
have to build another index that contains only folders, but then I would have
to take care of synchronizing folders between the two indexes.
Are range, spatial, etc. queries supported on multivalued fields?
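
For the "one file per folder" case, the postprocessing route Peter mentions
below could be sketched client-side like this (document values follow the
example table; assume the result list is already relevance-ordered):

```python
# Sketch: collapse a fetched Solr result list to one file per folder on
# the client side. Docs mirror the example table in this thread.
results = [
    {"id": 1, "type": "file", "folderId": 0},
    {"id": 3, "type": "file", "folderId": 0},
    {"id": 9, "type": "file", "folderId": 8},
    {"id": 11, "type": "file", "folderId": 8},
]

def one_file_per_folder(docs):
    """Keep only the first (best-ranked) file seen for each folder."""
    seen = set()
    kept = []
    for doc in docs:
        if doc["folderId"] not in seen:
            seen.add(doc["folderId"])
            kept.append(doc)
    return kept

print([d["id"] for d in one_file_per_folder(results)])  # [1, 9]
```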

2010/7/23 Peter Karich 

> Pavel,
>
> hopefully I understand now your usecase :-) but one question:
>
> > I need to select always *one* file per folder or
> > select *only* folders than contains matched files (without files).
>
> What do you mean here with 'or'? Do you have 2 usecases or would one of
> them be sufficient?
> Because the second usecase could be solved without the patch: you could
> index folders only,
> then all prop_N will be multivalued field. and you don't have the problem
> of duplicate folders.
>
> (If you don't mind uglyness both usecases could even handled: After you got
> the folders
>  grabbing the files which matched could be done in postprocessing)
>
> But I fear the cleanest solution is to use the patch. Hopefully it can be
> applied without hassles
> against 1.4 or the trunk. If not, please ask on the patch-site for
> assistance.
>
> Regards,
> Peter.
>
>
> > Thanks, Peter!
> >
> > I'll try collapsing today.
> >
> > Example (sorry if table unformated):
> >
> > id |  type  |   prop_1  |  |  prop_N |  folderId
> > 
> >  0 | folder |   |  | |
> >  1 | file   |  val1 |  |  valN1  |   0
> >  2 | file   |  val3 |  |  valN2  |   0
> >  3 | file   |  val1 |  |  valN3  |   0
> >  4 | folder |   |  | |
> >  5 | folder |   |  | |
> >  6 | file   |  val3 |  |  valN7  |   6
> >  7 | file   |  val4 |  |  valN8  |   6
> >  8 | folder |   |  | |
> >  9 | file   |  val2 |  |  valN3  |   8
> >  10| file   |  val1 |  |  valN2  |   8
> >  11| file   |  val2 |  |  valN5  |   8
> >  12| folder |   |  | |
> >
> >
> > I need to select always *one* file per folder or
> > select *only* folders than contains matched files (without files).
> >
> > Query:
> > prop_1:val1 OR prop_2:val2
> >
> > I need results (document ids):
> > 1, 9
> > or
> > 0, 8
> >
> > 2010/7/23 Peter Karich 
> >
> >
> >> Hi Pavel!
> >>
> >> The patch can be applied to 1.4.
> >> The performance is ok, but for some situations it could be worse than
> >> without the patch.
> >> For us it works good, but others reported some exceptions
> >> (see the patch site: https://issues.apache.org/jira/browse/SOLR-236)
> >>
> >>
> >>> I need only to delete duplicates
> >>>
> >> Could you give us an example what you exactly need?
> >> (Maybe you could index each master document of the 'unique' documents
> >> with an extra field and query for that field?)
> >>
> >> Regards,
> >> Peter.
> >>
> >> --
> >>
> > Pavel Minchenkov
> >
> >
>
>
> --
> http://karussell.wordpress.com/
>
>


-- 
Pavel Minchenkov


Re: Tree Faceting in Solr 1.4

2010-07-23 Thread Eric Grobler
Hi Erik,

I must be doing something wrong :-(
I took:
svn co https://svn.apache.org/repos/asf/lucene/dev/trunk  mytest
  then i copied SOLR-792.path to folder /mytest/solr
then i ran:
  patch -p1 < SOLR-792.patch

but I get "can't find file to patch at input line 5"
Is this the correct trunk and patch command?

However if I just manually
  - copy TreeFacetComponent.java to folder
solr/src/java/org/apache/solr/handler/component
  - add SimpleOrderedMap _treeFacets; to
ResponseBuilder.java
  - and make the changes to solrconfig.xml
I am able to compile and run your test :-)

Regards
Eric


On Fri, Jul 23, 2010 at 2:37 PM, Erik Hatcher wrote:

> I've updated the SOLR-792 patch to apply to trunk (using the solr/ directory
> as the root still, not the higher-level trunk/).
>
> This one I think is an important one that I'd love to see eventually part
> of Solr built-in, but the TODO's in TreeFacetComponent ought to be taken
> care of first, to generalize this to N fields levels and maybe some other
> must/nice-to-haves.
>
>Erik
>
>
>
> On Jul 23, 2010, at 3:45 AM, Eric Grobler wrote:
>
>  Thanks I saw the article,
>>
>> As far as I can tell the trunk archives only go back to the middle of
>> March
>> and the 2 patches are from the beginning of the year.
>>
>> Thus:
>> "These approaches can be tried out easily using a single set of sample
>> data
>> and the Solr example application (assumes current trunk codebase and
>> latest
>> patches posted to the respective issues)."
>>
>> is a bit of an over-statement!
>>
>> Regards
>> Eric
>> On Fri, Jul 23, 2010 at 6:22 AM, Jonathan Rochkind 
>> wrote:
>>
>>  Solr does not, yet, at least not simply, as far as I know, but there are
>>> ideas and some JIRA's with maybe some patches:
>>>
>>> http://wiki.apache.org/solr/HierarchicalFaceting
>>>
>>>
>>> 
>>> From: rajini maski [rajinima...@gmail.com]
>>> Sent: Friday, July 23, 2010 12:34 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Tree Faceting in Solr 1.4
>>>
>>> I am also looking out for same feature in Solr and very keen to know
>>> whether
>>> it supports this feature of tree faceting... Or we are forced to index in
>>> tree faceting format, like
>>>
>>> 1/2/3/4
>>> 1/2/3
>>> 1/2
>>> 1
>>>
>>> In-case of multilevel faceting it will give only 2 level tree facet is
>>> what
>>> i found..
>>>
>>> If i give query as : country India and state Karnataka and city
>>> bangalore...All what i want is a facet count  1) for condition above. 2)
>>> The
>>> number of states in that Country 3) the number of cities in that state
>>> ...
>>>
>>> Like => Country: India ,State:Karnataka , City: Bangalore <1>
>>>
>>>   State:Karnataka
>>>Kerla
>>>Tamilnadu
>>>Andra Pradesh...and so on
>>>
>>>   City:  Mysore
>>>Hubli
>>>Mangalore
>>>Coorg and so on...
>>>
>>>
>>> If I am doing
>>> facet=on & facet.field={!ex=State}State & fq={!tag=State}State:Karnataka
>>>
>>> All it gives me is Facets on state excluding only that filter query.. But
>>> i
>>> was not able to do same on third level ..Like  facet.field= Give me the
>>> counts of  cities also in state Karantaka..
>>> Let me know solution for this...
>>>
>>> Regards,
>>> Rajani Maski
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jul 22, 2010 at 10:13 PM, Eric Grobler <
>>> impalah...@googlemail.com
>>>
 wrote:

>>>
>>>  Thank you for the link.

 I was not aware of the multifaceting syntax - this will enable me to run

>>> 1
>>>
 less query on the main page!

 However this is not a tree faceting feature.

 Thanks
 Eric




 On Thu, Jul 22, 2010 at 4:51 PM, SR  wrote:

  Perhaps the following article can help:
>
>

>>> http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html
>>>

> -S
>
>
> On Jul 22, 2010, at 5:39 PM, Eric Grobler wrote:
>
>  Hi Solr Community
>>
>> If I have:
>> COUNTRY CITY
>> Germany Berlin
>> Germany Hamburg
>> Spain   Madrid
>>
>> Can I do faceting like:
>> Germany
>> Berlin
>> Hamburg
>> Spain
>> Madrid
>>
>> I tried to apply SOLR-792 to the current trunk but it does not seem
>>
> to
>>>
 be

> compatible.
>> Maybe there is a similar feature existing in the latest builds?
>>
>> Thanks & Regards
>> Eric
>>
>
>
>

>>>
>


Re: Tree Faceting in Solr 1.4

2010-07-23 Thread Geert-Jan Brits
>If I am doing
>facet=on & facet.field={!ex=State}State & fq={!tag=State}State:Karnataka

>All it gives me is Facets on state excluding only that filter query.. But i
>was not able to do same on third level ..Like  facet.field= Give me the
>counts of  cities also in state Karantaka..
>Let me know solution for this...

This looks like regular faceting to me.

1. Showing city counts given a state:
facet=on&fq=State:Karnataka&facet.field=city

2. Showing state counts given a country (similar to 1):
facet=on&fq=Country:India&facet.field=state

3. Showing city and state counts given a country:
facet=on&fq=Country:India&facet.field=state&facet.field=city

4. Showing city counts given a state, plus counts for all other states not
filtered by the current state (see
http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters):
facet=on&fq={!tag=State}state:Karnataka&facet.field={!ex=State}state&facet.field=city

5. Showing state and city counts given a country, plus counts for all other
countries not filtered by the current country (similar to 4):
facet=on&fq={!tag=country}country:India&facet.field={!ex=country}country&facet.field=city&facet.field=state

etc.
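When assembling parameters like those in example 4 from client code, the local-params syntax (`{!tag=...}` / `{!ex=...}`) is easy to mangle with URL escaping. A small helper can build the query string safely. This is a sketch in Python; the host, core path, and field names are assumptions, not from the thread:

```python
from urllib.parse import urlencode

def tagged_facet_query(base_url, filter_field, filter_value, facet_fields):
    """Build a Solr select URL that tags the filter query and excludes
    that tag when faceting on the filtered field (multi-select faceting)."""
    params = [
        ("q", "*:*"),
        ("facet", "on"),
        # Tag the filter so a facet.field below can exclude it.
        ("fq", "{!tag=%s}%s:%s" % (filter_field, filter_field, filter_value)),
    ]
    for field in facet_fields:
        if field == filter_field:
            # Exclude the tagged filter: counts for *all* values of this field.
            params.append(("facet.field", "{!ex=%s}%s" % (filter_field, field)))
        else:
            params.append(("facet.field", field))
    return base_url + "/select?" + urlencode(params)

url = tagged_facet_query("http://localhost:8983/solr", "state", "Karnataka",
                         ["state", "city"])
print(url)
```

Letting `urlencode` handle the escaping avoids hand-encoding `{`, `!`, and `}` in the local params.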

This has nothing to do with "Hierarchical faceting" as described in SOLR-792,
btw, although I understand the possible confusion, as Country > state > city
can obviously be seen as some sort of hierarchy. The first part of your
question seemed to be more about hierarchical faceting as per SOLR-792, but I
couldn't quite distill a question from that part.
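If you do go the path-encoding route rajini mentions (indexing 1/2/3/4, 1/2/3, 1/2, 1 per document), the per-level tokens can be generated at index time. A minimal sketch, assuming '/'-separated paths; the example values are illustrative only:

```python
def path_prefixes(path, sep="/"):
    """Turn 'India/Karnataka/Bangalore' into one token per hierarchy level,
    so a filter or facet can match any prefix of the path."""
    parts = path.split(sep)
    return [sep.join(parts[:i + 1]) for i in range(len(parts))]

tokens = path_prefixes("India/Karnataka/Bangalore")
print(tokens)  # ['India', 'India/Karnataka', 'India/Karnataka/Bangalore']
```

Indexing all prefixes into a multivalued field lets you facet at any depth by filtering on the parent prefix; newer Lucene/Solr versions ship a PathHierarchyTokenizer that does essentially this during analysis.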

Also, just a suggestion, consider using IDs instead of names for filtering;
you will get burned sooner or later otherwise.

HTH,

Geert-Jan



2010/7/23 rajini maski 

> I am also looking out for same feature in Solr and very keen to know
> whether
> it supports this feature of tree faceting... Or we are forced to index in
> tree faceting formatlike
>
> 1/2/3/4
> 1/2/3
> 1/2
> 1
>
> In-case of multilevel faceting it will give only 2 level tree facet is what
> i found..
>
> If i give query as : country India and state Karnataka and city
> bangalore...All what i want is a facet count  1) for condition above. 2)
> The
> number of states in that Country 3) the number of cities in that state ...
>
> Like => Country: India ,State:Karnataka , City: Bangalore <1>
>
> State:Karnataka
>  Kerla
>  Tamilnadu
>  Andra Pradesh...and so on
>
> City:  Mysore
>  Hubli
>  Mangalore
>  Coorg and so on...
>
>
> If I am doing
> facet=on & facet.field={!ex=State}State & fq={!tag=State}State:Karnataka
>
> All it gives me is Facets on state excluding only that filter query.. But i
> was not able to do same on third level ..Like  facet.field= Give me the
> counts of  cities also in state Karantaka..
> Let me know solution for this...
>
> Regards,
> Rajani Maski
>
>
>
>
>
> On Thu, Jul 22, 2010 at 10:13 PM, Eric Grobler  >wrote:
>
> > Thank you for the link.
> >
> > I was not aware of the multifaceting syntax - this will enable me to run
> 1
> > less query on the main page!
> >
> > However this is not a tree faceting feature.
> >
> > Thanks
> > Eric
> >
> >
> >
> >
> > On Thu, Jul 22, 2010 at 4:51 PM, SR  wrote:
> >
> > > Perhaps the following article can help:
> > >
> >
> http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html
> > >
> > > -S
> > >
> > >
> > > On Jul 22, 2010, at 5:39 PM, Eric Grobler wrote:
> > >
> > > > Hi Solr Community
> > > >
> > > > If I have:
> > > > COUNTRY CITY
> > > > Germany Berlin
> > > > Germany Hamburg
> > > > Spain   Madrid
> > > >
> > > > Can I do faceting like:
> > > > Germany
> > > >  Berlin
> > > >  Hamburg
> > > > Spain
> > > >  Madrid
> > > >
> > > > I tried to apply SOLR-792 to the current trunk but it does not seem
> to
> > be
> > > > compatible.
> > > > Maybe there is a similar feature existing in the latest builds?
> > > >
> > > > Thanks & Regards
> > > > Eric
> > >
> > >
> >
>


solrj occasional timeout on commit

2010-07-23 Thread Nagelberg, Kallin
Hey,

I recently moved a solr app from a testing environment into a production 
environment, and I'm seeing a brand new error which never occurred during 
testing. I'm seeing this in the solrJ-based app logs:


org.apache.solr.common.SolrException: com.caucho.vfs.SocketTimeoutException: 
client timeout

com.caucho.vfs.SocketTimeoutException: client timeout

request: http://somehost:8080/solr/live/update?wt=javabin&version=1

at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)

at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)

at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)




This occurs in a service that periodically adds new documents to solr. There 
are 4 boxes that could be doing updates in parallel. In testing there were 2.





We're running on a new Resin 4 based install in production, whereas we were 
using Resin 3 in testing. Does anyone have any ideas? Help would be greatly 
appreciated!



Thanks,

-Kallin Nagelberg
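[Editor's note: two things worth trying while the root cause is hunted down. SolrJ's CommonsHttpSolrServer exposes setSoTimeout()/setConnectionTimeout() to raise the client-side timeouts directly. Independently of the client library, a timed-out update can be retried with bounded backoff; a language-agnostic sketch of that retry logic, here in Python with a fake commit standing in for the real call:]

```python
import time

def retry_with_backoff(op, attempts=3, base_delay=1.0, retryable=(TimeoutError,)):
    """Run op(); on a retryable failure wait, double the delay, and try again.
    Re-raises the last error if every attempt fails."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except retryable:
            if attempt == attempts:
                raise
            time.sleep(delay)
            delay *= 2

# Fake commit that times out twice, then succeeds (stands in for the real update).
calls = {"n": 0}
def flaky_commit():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("client timeout")
    return "committed"

print(retry_with_backoff(flaky_commit, attempts=4, base_delay=0.01))  # committed
```

If the four writers' commits overlap, the server can spend long stretches warming new searchers, which is a common cause of slow or timed-out updates; consolidating commits (or relying on autoCommit) is also worth testing.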







Re: Solr 3.1 dev

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 9:33 AM, robert mena  wrote:
> Hi,
> is there any wiki/url of the proposed changes or new features that we should
> expect with this new release?

You can see what has already gone in by looking at the appropriate
CHANGES.txt in subversion.

http://svn.apache.org/viewvc/lucene/dev/trunk/solr/CHANGES.txt?view=markup
http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/CHANGES.txt?view=markup

-Yonik
http://www.lucidimagination.com


Re: Tree Faceting in Solr 1.4

2010-07-23 Thread Eric Grobler
Hi Erik,

Thanks for the fast update :-)
I will try it soon.

Regards
Eric

On Fri, Jul 23, 2010 at 2:37 PM, Erik Hatcher wrote:

> I've updated the SOLR-792 patch to apply to trunk (using the solr/ directory
> as the root still, not the higher-level trunk/).
>
> This one I think is an important one that I'd love to see eventually part
> of Solr built-in, but the TODO's in TreeFacetComponent ought to be taken
> care of first, to generalize this to N fields levels and maybe some other
> must/nice-to-haves.
>
>Erik
>
>
>
> On Jul 23, 2010, at 3:45 AM, Eric Grobler wrote:
>
>  Thanks I saw the article,
>>
>> As far as I can tell the trunk archives only go back to the middle of
>> March
>> and the 2 patches are from the beginning of the year.
>>
>> Thus:
>> "These approaches can be tried out easily using a single set of sample
>> data
>> and the Solr example application (assumes current trunk codebase and
>> latest
>> patches posted to the respective issues)."
>>
>> is a bit of an over-statement!
>>
>> Regards
>> Eric
>> On Fri, Jul 23, 2010 at 6:22 AM, Jonathan Rochkind 
>> wrote:
>>
>>  Solr does not, yet, at least not simply, as far as I know, but there are
>>> ideas and some JIRA's with maybe some patches:
>>>
>>> http://wiki.apache.org/solr/HierarchicalFaceting
>>>
>>>
>>> 
>>> From: rajini maski [rajinima...@gmail.com]
>>> Sent: Friday, July 23, 2010 12:34 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Tree Faceting in Solr 1.4
>>>
>>> I am also looking out for same feature in Solr and very keen to know
>>> whether
>>> it supports this feature of tree faceting... Or we are forced to index in
>>> tree faceting format, like
>>>
>>> 1/2/3/4
>>> 1/2/3
>>> 1/2
>>> 1
>>>
>>> In-case of multilevel faceting it will give only 2 level tree facet is
>>> what
>>> i found..
>>>
>>> If i give query as : country India and state Karnataka and city
>>> bangalore...All what i want is a facet count  1) for condition above. 2)
>>> The
>>> number of states in that Country 3) the number of cities in that state
>>> ...
>>>
>>> Like => Country: India ,State:Karnataka , City: Bangalore <1>
>>>
>>>   State:Karnataka
>>>Kerla
>>>Tamilnadu
>>>Andra Pradesh...and so on
>>>
>>>   City:  Mysore
>>>Hubli
>>>Mangalore
>>>Coorg and so on...
>>>
>>>
>>> If I am doing
>>> facet=on & facet.field={!ex=State}State & fq={!tag=State}State:Karnataka
>>>
>>> All it gives me is Facets on state excluding only that filter query.. But
>>> i
>>> was not able to do same on third level ..Like  facet.field= Give me the
>>> counts of  cities also in state Karantaka..
>>> Let me know solution for this...
>>>
>>> Regards,
>>> Rajani Maski
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Jul 22, 2010 at 10:13 PM, Eric Grobler <
>>> impalah...@googlemail.com
>>>
 wrote:

>>>
>>>  Thank you for the link.

 I was not aware of the multifaceting syntax - this will enable me to run

>>> 1
>>>
 less query on the main page!

 However this is not a tree faceting feature.

 Thanks
 Eric




 On Thu, Jul 22, 2010 at 4:51 PM, SR  wrote:

  Perhaps the following article can help:
>
>

>>> http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html
>>>

> -S
>
>
> On Jul 22, 2010, at 5:39 PM, Eric Grobler wrote:
>
>  Hi Solr Community
>>
>> If I have:
>> COUNTRY CITY
>> Germany Berlin
>> Germany Hamburg
>> Spain   Madrid
>>
>> Can I do faceting like:
>> Germany
>> Berlin
>> Hamburg
>> Spain
>> Madrid
>>
>> I tried to apply SOLR-792 to the current trunk but it does not seem
>>
> to
>>>
 be

> compatible.
>> Maybe there is a similar feature existing in the latest builds?
>>
>> Thanks & Regards
>> Eric
>>
>
>
>

>>>
>


Re: Tree Faceting in Solr 1.4

2010-07-23 Thread Erik Hatcher
I've updated the SOLR-792 patch to apply to trunk (using the solr/  
directory as the root still, not the higher-level trunk/).


This one I think is an important one that I'd love to see eventually  
part of Solr built-in, but the TODO's in TreeFacetComponent ought to  
be taken care of first, to generalize this to N fields levels and  
maybe some other must/nice-to-haves.


Erik


On Jul 23, 2010, at 3:45 AM, Eric Grobler wrote:


Thanks I saw the article,

As far as I can tell the trunk archives only go back to the middle  
of March

and the 2 patches are from the beginning of the year.

Thus:
"These approaches can be tried out easily using a single set of
sample data
and the Solr example application (assumes current trunk codebase and
latest
patches posted to the respective issues)."

is a bit of an over-statement!

Regards
Eric
On Fri, Jul 23, 2010 at 6:22 AM, Jonathan Rochkind  
 wrote:


Solr does not, yet, at least not simply, as far as I know, but  
there are

ideas and some JIRA's with maybe some patches:

http://wiki.apache.org/solr/HierarchicalFaceting



From: rajini maski [rajinima...@gmail.com]
Sent: Friday, July 23, 2010 12:34 AM
To: solr-user@lucene.apache.org
Subject: Re: Tree Faceting in Solr 1.4

I am also looking out for same feature in Solr and very keen to know
whether
it supports this feature of tree faceting... Or we are forced to  
index in

tree faceting format, like

1/2/3/4
1/2/3
1/2
1

In-case of multilevel faceting it will give only 2 level tree facet  
is what

i found..

If i give query as : country India and state Karnataka and city
bangalore...All what i want is a facet count  1) for condition  
above. 2)

The
number of states in that Country 3) the number of cities in that  
state ...


Like => Country: India ,State:Karnataka , City: Bangalore <1>

   State:Karnataka
Kerla
Tamilnadu
Andra Pradesh...and so on

   City:  Mysore
Hubli
Mangalore
Coorg and so on...


If I am doing
facet=on & facet.field={!ex=State}State & fq={! 
tag=State}State:Karnataka


All it gives me is Facets on state excluding only that filter  
query.. But i
was not able to do same on third level ..Like  facet.field= Give me  
the

counts of  cities also in state Karantaka..
Let me know solution for this...

Regards,
Rajani Maski





On Thu, Jul 22, 2010 at 10:13 PM, Eric Grobler 
wrote:



Thank you for the link.

I was not aware of the multifaceting syntax - this will enable me  
to run

1

less query on the main page!

However this is not a tree faceting feature.

Thanks
Eric




On Thu, Jul 22, 2010 at 4:51 PM, SR  wrote:


Perhaps the following article can help:




http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html


-S


On Jul 22, 2010, at 5:39 PM, Eric Grobler wrote:


Hi Solr Community

If I have:
COUNTRY CITY
Germany Berlin
Germany Hamburg
Spain   Madrid

Can I do faceting like:
Germany
Berlin
Hamburg
Spain
Madrid

I tried to apply SOLR-792 to the current trunk but it does not  
seem

to

be

compatible.
Maybe there is a similar feature existing in the latest builds?

Thanks & Regards
Eric











Re: Solr 3.1 dev

2010-07-23 Thread robert mena
Hi,

is there any wiki/url of the proposed changes or new features that we should
expect with this new release?

On Fri, Jul 23, 2010 at 9:20 AM, Yonik Seeley wrote:

> On Fri, Jul 23, 2010 at 6:09 AM, Eric Grobler 
> wrote:
> > I have a few questions :-)
> >
> > a) Will the next release of solr be 3.0 (instead of 1.5)?
>
> The next release will be 3.1 (matching the next lucene version off of
> the 3x branch).
> Trunk is 4.0-dev
>
> > b) How stable/mature is the current 3x version?
>
> For features that are not new, it should be very stable.
>
> > c) Is LocalSolr implemented? where can I find a list of new features?
>
> Solr spatial is partly implemented... currently in trunk.
> http://wiki.apache.org/solr/SpatialSearch
>
> > d) Is this the correct method to download the latest stable version?
> > svn co https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x
>
> The last official Solr release was 1.4.1
> Nightly builds aren't official apache releases... but plenty of people
> do use them in production environments (after appropriate testing of
> course).
>
> -Yonik
> http://www.lucidimagination.com
>


Re: Solr 3.1 dev

2010-07-23 Thread Yonik Seeley
On Fri, Jul 23, 2010 at 6:09 AM, Eric Grobler  wrote:
> I have a few questions :-)
>
> a) Will the next release of solr be 3.0 (instead of 1.5)?

The next release will be 3.1 (matching the next lucene version off of
the 3x branch).
Trunk is 4.0-dev

> b) How stable/mature is the current 3x version?

For features that are not new, it should be very stable.

> c) Is LocalSolr implemented? where can I find a list of new features?

Solr spatial is partly implemented... currently in trunk.
http://wiki.apache.org/solr/SpatialSearch

> d) Is this the correct method to download the latest stable version?
> svn co https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x

The last official Solr release was 1.4.1
Nightly builds aren't official apache releases... but plenty of people
do use them in production environments (after appropriate testing of
course).

-Yonik
http://www.lucidimagination.com


Re: Replacing text fields with numeric fields for speed

2010-07-23 Thread Peter Karich
Gora,

just out of interest:
does Apache Bench send different queries (e.g. taken from the logs), or always
the same query?
If it is always the same query, Solr's caches will kick in and
make the response times look artificially small.

I would like to find a tool or script that can replay my log file against Solr
and measure response times ... because at the moment we are using fastbench
and I would like to replace it ;-)

Regards,
Peter.
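[Editor's note: a minimal log-replay harness is easy to sketch: read one query string per line from a file, issue each against Solr once (so each distinct query hits a cold cache at most once, unlike ab repeating one query), and report summary latencies. The log format and Solr URL below are assumptions:]

```python
import time
import urllib.parse
import urllib.request
from statistics import median

def load_queries(lines):
    """One raw query string per line; blank lines and #-comments are skipped."""
    return [ln.strip() for ln in lines
            if ln.strip() and not ln.strip().startswith("#")]

def summarize(latencies):
    """Count, median, and worst-case latency for a list of timings (seconds)."""
    return {"n": len(latencies), "median": median(latencies), "max": max(latencies)}

def replay(queries, base_url="http://localhost:8983/solr/select?q="):
    """Issue each query once against Solr and time the round trip."""
    timings = []
    for q in queries:
        start = time.time()
        urllib.request.urlopen(base_url + urllib.parse.quote(q)).read()
        timings.append(time.time() - start)
    return timings

qs = load_queries(["ipod", "", "# comment", "video"])
print(qs)                               # ['ipod', 'video']
print(summarize([0.02, 0.05, 0.03]))    # {'n': 3, 'median': 0.03, 'max': 0.05}
```

Reporting the median rather than the mean keeps one slow GC pause or cold-cache query from dominating the result.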

> On Fri, 23 Jul 2010 14:44:32 +0530
> Gora Mohanty  wrote:
> [...]
>   
>>   From some experiments, I see only a small difference between a
>> text search on a field, and a numeric search on the corresponding
>> numeric field.
>> 
> [...]
>
> Well, I take that back. Running more rigorous tests with Apache
> Bench shows a difference of slightly over a factor of 2 between the
> median search time on the numeric field, and on the text field. The
> search on the numeric field is, of course, faster. That much
> of a difference puzzles me. Would someone knowledgeable about
> Lucene indexes care to comment?
>
> Regards,
> Gora
>   


Re: Duplicates

2010-07-23 Thread Peter Karich
Pavel,

hopefully I now understand your use case :-) but one question:

> I always need to select *one* file per folder, or
> select *only* the folders that contain matched files (without the files).

What do you mean by 'or' here? Do you have two use cases, or would one of them
be sufficient?
The second use case could be solved without the patch: you could index folders
only;
then each prop_N becomes a multivalued field, and you don't have the problem of
duplicate folders.

(If you don't mind ugliness, both use cases could even be handled: after you
get the folders,
grabbing the files which matched could be done in post-processing.)

But I fear the cleanest solution is to use the patch. Hopefully it can be
applied without hassle
against 1.4 or the trunk. If not, please ask on the patch page for assistance.

Regards,
Peter.


> Thanks, Peter!
>
> I'll try collapsing today.
>
> Example (sorry if table unformatted):
>
> id |  type  |   prop_1  |  |  prop_N |  folderId
> 
>  0 | folder |   |  | |
>  1 | file   |  val1 |  |  valN1  |   0
>  2 | file   |  val3 |  |  valN2  |   0
>  3 | file   |  val1 |  |  valN3  |   0
>  4 | folder |   |  | |
>  5 | folder |   |  | |
>  6 | file   |  val3 |  |  valN7  |   6
>  7 | file   |  val4 |  |  valN8  |   6
>  8 | folder |   |  | |
>  9 | file   |  val2 |  |  valN3  |   8
>  10| file   |  val1 |  |  valN2  |   8
>  11| file   |  val2 |  |  valN5  |   8
>  12| folder |   |  | |
>
>
> I always need to select *one* file per folder, or
> select *only* the folders that contain matched files (without the files).
>
> Query:
> prop_1:val1 OR prop_2:val2
>
> I need results (document ids):
> 1, 9
> or
> 0, 8
>
> 2010/7/23 Peter Karich 
>
>   
>> Hi Pavel!
>>
>> The patch can be applied to 1.4.
>> The performance is ok, but for some situations it could be worse than
>> without the patch.
>> For us it works good, but others reported some exceptions
>> (see the patch site: https://issues.apache.org/jira/browse/SOLR-236)
>>
>> 
>>> I need only to delete duplicates
>>>   
>> Could you give us an example what you exactly need?
>> (Maybe you could index each master document of the 'unique' documents
>> with an extra field and query for that field?)
>>
>> Regards,
>> Peter.
>>
>> --
>> 
> Pavel Minchenkov
>
>   


-- 
http://karussell.wordpress.com/
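[Editor's note: the post-processing Peter mentions is straightforward for the data in Pavel's table: walk the matched file documents in ranked order and keep the first one seen per folderId. A sketch; the document shape is assumed from the table, and the rank order of the matches is illustrative:]

```python
def one_file_per_folder(docs):
    """Keep the first (i.e. best-ranked) file per folderId; docs is assumed
    to be the already-ranked Solr result list."""
    seen = set()
    kept = []
    for doc in docs:
        fid = doc["folderId"]
        if fid not in seen:
            seen.add(fid)
            kept.append(doc)
    return kept

# Matches for prop_1:val1 OR prop_2:val2 from the example table, in rank order:
matches = [
    {"id": 1, "folderId": 0},
    {"id": 3, "folderId": 0},
    {"id": 9, "folderId": 8},
    {"id": 10, "folderId": 8},
    {"id": 11, "folderId": 8},
]
print([d["id"] for d in one_file_per_folder(matches)])  # [1, 9]
```

This reproduces the "1, 9" result Pavel asked for, but note the usual caveat with client-side collapsing: paging and total counts are computed before the dedup, so it only works cleanly when you can fetch all matches.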



Re: filter query on timestamp slowing query???

2010-07-23 Thread oferiko

I don't specify any sort order, and I do request the score, so results are
ordered by that.

My schema consists of these fields:
 
 (changing now to tdate)
 


and a typical query would be:
fl=id,type,timestamp,score&start=0&q="Coca+Cola"+pepsi+-"dr+pepper"&fq=timestamp:[2010-07-07T00:00:00Z+TO+NOW]+AND+(type:x+OR+type:y)&rows=2000

thanks again for your time
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/filter-query-on-timestamp-slowing-query-tp977280p989536.html
Sent from the Solr - User mailing list archive at Nabble.com.
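[Editor's note: one thing worth checking with that fq: an upper bound of NOW produces a textually different filter on every request, so Solr's filter cache can never reuse it. Rounding with Solr date math (NOW/DAY+1DAY, or NOW/HOUR for finer granularity) makes the fq repeatable and cacheable. A sketch of building such a filter string; the field name is taken from the message, the rounding choice is an assumption:]

```python
def cache_friendly_range_fq(field, start, granularity="DAY"):
    """Range filter whose upper bound is rounded with Solr date math, so
    repeated requests produce an identical fq that can hit the filter cache."""
    upper = "NOW/%s+1%s" % (granularity, granularity)
    return "%s:[%s TO %s]" % (field, start, upper)

fq = cache_friendly_range_fq("timestamp", "2010-07-07T00:00:00Z")
print(fq)  # timestamp:[2010-07-07T00:00:00Z TO NOW/DAY+1DAY]
```

Combined with the switch to tdate already planned, this often removes most of the cost of a timestamp filter.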


Re: Replacing text fields with numeric fields for speed

2010-07-23 Thread Gora Mohanty
On Fri, 23 Jul 2010 14:44:32 +0530
Gora Mohanty  wrote:
[...]
>   From some experiments, I see only a small difference between a
> text search on a field, and a numeric search on the corresponding
> numeric field.
[...]

Well, I take that back. Running more rigorous tests with Apache
Bench shows a difference of slightly over a factor of 2 between the
median search time on the numeric field, and on the text field. The
search on the numeric field is, of course, faster. That much
of a difference puzzles me. Would someone knowledgeable about
Lucene indexes care to comment?

Regards,
Gora


Re: Delta import processing duration

2010-07-23 Thread Qwerky

I found my problem! It was a bad custom EntityProcessor I wrote.

My EntityProcessor wasn't checking for hasNext() on the Iterator from my
FileImportDataImportHandler, it was just returning next(). The second bug
was that when the Iterator ran out of records it was returning an empty
Map (it now returns null).
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Delta-import-processing-duration-tp987562p989425.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr 3.1 dev

2010-07-23 Thread Eric Grobler
Hi Everyone

I have a few questions :-)

a) Will the next release of solr be 3.0 (instead of 1.5)?

b) How stable/mature is the current 3x version?

c) Is LocalSolr implemented? where can I find a list of new features?

d) Is this the correct method to download the latest stable version?
svn co https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x

Thanks & regards
eric


Re: Duplicates

2010-07-23 Thread Pavel Minchenkov
Thanks, Peter!

I'll try collapsing today.

Example (sorry if table unformatted):

id |  type  |   prop_1  |  |  prop_N |  folderId

 0 | folder |   |  | |
 1 | file   |  val1 |  |  valN1  |   0
 2 | file   |  val3 |  |  valN2  |   0
 3 | file   |  val1 |  |  valN3  |   0
 4 | folder |   |  | |
 5 | folder |   |  | |
 6 | file   |  val3 |  |  valN7  |   6
 7 | file   |  val4 |  |  valN8  |   6
 8 | folder |   |  | |
 9 | file   |  val2 |  |  valN3  |   8
 10| file   |  val1 |  |  valN2  |   8
 11| file   |  val2 |  |  valN5  |   8
 12| folder |   |  | |


I always need to select *one* file per folder, or
select *only* the folders that contain matched files (without the files).

Query:
prop_1:val1 OR prop_2:val2

I need results (document ids):
1, 9
or
0, 8

2010/7/23 Peter Karich 

> Hi Pavel!
>
> The patch can be applied to 1.4.
> The performance is ok, but for some situations it could be worse than
> without the patch.
> For us it works good, but others reported some exceptions
> (see the patch site: https://issues.apache.org/jira/browse/SOLR-236)
>
> > I need only to delete duplicates
>
> Could you give us an example what you exactly need?
> (Maybe you could index each master document of the 'unique' documents
> with an extra field and query for that field?)
>
> Regards,
> Peter.
>
> --
Pavel Minchenkov


Re: Solr on iPad?

2010-07-23 Thread Chantal Ackermann
Hi,

unfortunately for iPad developers, it seems that it is not possible to
use the Spotlight engine through the SDK:

http://stackoverflow.com/questions/3133678/spotlight-search-in-the-application

Chantal

On Fri, 2010-07-23 at 10:16 +0200, Mark Allan wrote:
> Hi Stephan,
> 
> On the iPad, as with the iPhone, I'm afraid you're stuck with using  
> SQLite if you want any form of database in your app.
> 
> I suppose if you wanted to get really ambitious and had a lot of time  
> on your hands you could use Xcode to try and compile one of the open- 
> source C-based DBs/Indexers, but as with most things in OS X and iOS  
> development, if you're bending over yourself trying to implement  
> something, you're probably doing it wrongly!  Also, I wouldn't put it  
> past the AppStore guardians to reject your app purely on the basis of  
> having used something other than SQLite!
> 
> Apple's cocoa-dev mailing list is very active if you have problems,  
> but do your homework before asking questions or you'll get short shrift.
>   http://lists.apple.com/cocoa-dev
> 
> Mark
> 
> On 22 Jul 2010, at 6:12 pm, Stephan Schwab wrote:
> 
> > Dear Solr community,
> >
> > does anyone know whether it may be possible or has already been done  
> > to
> > bring Solr to the Apple iPad so that applications may use a local  
> > search
> > engine?
> >
> > Greetings,
> > Stephan
> 





Problem with Pdf, Sol 1.4.1 Cell

2010-07-23 Thread Alessandro Benedetti
Hi all,
as I saw in this discussion [1] there were many issues with PDF indexing in
Solr 1.4 due to the Tika library (version 0.4).
In Solr 1.4.1 the Tika library is the same, so I guess the issues are the
same.
Could anyone, who contributed to the previous thread, help me in resolving
these issues?
I need a simple tutorial that could help me to upgrade Solr Cell!

Something like this:
1) download Tika core from trunk
2) create a jar with Maven dependencies
3) unjar Solr 1.4.1 and swap in the new Tika library
4) re-jar the patched Solr 1.4.1 and enjoy!

[1]
http://markmail.org/message/zbkplnzqho7mxwy3#query:+page:1+mid:gamcxdx34ayt6ccg+state:results

Best regards

-- 
--

Benedetti Alessandro
Personal Page: http://tigerbolt.altervista.org

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Replacing text fields with numeric fields for speed

2010-07-23 Thread Gora Mohanty
Hi,

  One of the things that we were thinking of doing in order to
speed up results from Solr search is to convert fixed-text fields
(such as values from a drop-down) into numeric fields. The thinking
behind this was that searching through numeric values would be
faster than searching through text. However, I now feel that we
were barking up the wrong tree, as Lucene is probably not doing a
text search per se.

  From some experiments, I see only a small difference between a
text search on a field, and a numeric search on the corresponding
numeric field. This difference can probably be attributed to the
additional processing on the text field. Could someone clarify on
whether one can expect a difference in speed between searching
through a fixed-text field, and its numeric equivalent?

  I am aware of the benefit of numeric fields for range queries.

Regards,
Gora
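[Editor's note: for what it's worth, the conversion Gora describes is trivial to test: assign each fixed drop-down value a stable integer code at index time and search on the code. The query side must apply the same mapping. A sketch; the category values are invented:]

```python
def build_codes(values):
    """Stable value -> int code mapping for a fixed vocabulary.
    Sorting first keeps the codes independent of input order."""
    return {v: i for i, v in enumerate(sorted(values))}

codes = build_codes(["books", "music", "video"])
print(codes["music"])  # 1

# At index time you would write codes[value] into an int field;
# at query time, search on codes[user_selection] instead of the raw text.
```

Since the vocabulary is fixed (drop-down values), the mapping can be built once and shared between the indexer and the query front end.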


Re: Duplicates

2010-07-23 Thread Peter Karich
Hi Pavel!

The patch can be applied to 1.4.
The performance is ok, but for some situations it could be worse than
without the patch.
For us it works good, but others reported some exceptions
(see the patch site: https://issues.apache.org/jira/browse/SOLR-236)

> I need only to delete duplicates

Could you give us an example what you exactly need?
(Maybe you could index each master document of the 'unique' documents
with an extra field and query for that field?)

Regards,
Peter.

> Thanks.
>
> Does it work with Solr 1.4 (Solr 4.0 mentioned in article)?
> What about performance? I need only to delete duplicates (I don't need the
> count of duplicates or to select a particular duplicate).
>
> 2010/7/23 Peter Karich 
>
>   
>> Another possibility could be the well known 'field collapse' ;-)
>>
>> http://wiki.apache.org/solr/FieldCollapsing
>>
>> Regards,
>> Peter.
>>
>> 
>>> Thanks.
>>>
>>> If I set uniqueKey on the field, can I still store duplicates?
>>> I need to remove duplicates only from search results. The ability to store
>>> duplicates should remain.
>>>
>>> 2010/7/23 Erick Erickson 
>>>
>>>
>>>   
 If the field is a single token, just define the uniqueKey on it in your
 schema.

 Otherwise, this may be of interest:
 http://wiki.apache.org/solr/Deduplication

 Haven't used it myself though...

 best
 Erick

 On Thu, Jul 22, 2010 at 6:14 PM, Pavel Minchenkov 
 wrote:


 
> Hi,
>
> Is it possible to remove duplicates in search results by a given field?
>
> Thanks.
>
> --
> Pavel Minchenkov
>   
>>
>> 
>
>   


-- 
http://karussell.wordpress.com/



Re: Solr on iPad?

2010-07-23 Thread Mark Allan

Hi Stephan,

On the iPad, as with the iPhone, I'm afraid you're stuck with using  
SQLite if you want any form of database in your app.


I suppose if you wanted to get really ambitious and had a lot of time  
on your hands you could use Xcode to try and compile one of the open- 
source C-based DBs/Indexers, but as with most things in OS X and iOS  
development, if you're bending over yourself trying to implement  
something, you're probably doing it wrongly!  Also, I wouldn't put it  
past the AppStore guardians to reject your app purely on the basis of  
having used something other than SQLite!


Apple's cocoa-dev mailing list is very active if you have problems,  
but do your homework before asking questions or you'll get short shrift.

http://lists.apple.com/cocoa-dev

Mark

On 22 Jul 2010, at 6:12 pm, Stephan Schwab wrote:


Dear Solr community,

does anyone know whether it may be possible or has already been done  
to
bring Solr to the Apple iPad so that applications may use a local  
search

engine?

Greetings,
Stephan



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: Duplicates

2010-07-23 Thread Pavel Minchenkov
Thanks.

Does it work with Solr 1.4 (Solr 4.0 mentioned in article)?
What about performance? I need only to delete duplicates (I don't need the
count of duplicates or to select a particular duplicate).

2010/7/23 Peter Karich 

> Another possibility could be the well known 'field collapse' ;-)
>
> http://wiki.apache.org/solr/FieldCollapsing
>
> Regards,
> Peter.
>
> > Thanks.
> >
> > If I set uniqueKey on the field, can I still keep duplicates?
> > I need to remove duplicates only from search results. The ability to
> > keep duplicates should remain.
> >
> > 2010/7/23 Erick Erickson 
> >
> >
> >> If the field is a single token, just define the uniqueKey on it in your
> >> schema.
> >>
> >> Otherwise, this may be of interest:
> >> http://wiki.apache.org/solr/Deduplication
> >>
> >> Haven't used it myself though...
> >>
> >> best
> >> Erick
> >>
> >> On Thu, Jul 22, 2010 at 6:14 PM, Pavel Minchenkov 
> >> wrote:
> >>
> >>
> >>> Hi,
> >>>
> >>> Is it possible to remove duplicates in search results by a given field?
> >>>
> >>> Thanks.
> >>>
> >>> --
> >>> Pavel Minchenkov
>
>


-- 
Pavel Minchenkov
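If neither the Deduplication update processor nor a field-collapsing patch is workable on 1.4, one stopgap is to over-fetch rows and drop duplicates by the field on the client side. A minimal sketch (the helper name and document shape are made up for illustration):

```python
def dedupe_by_field(docs, field):
    """Keep only the first (best-ranked) document per field value.

    `docs` is the ordered result list from Solr; documents missing the
    field all share the key None and so collapse together.
    """
    seen = set()
    unique = []
    for doc in docs:
        key = doc.get(field)
        if key in seen:
            continue  # a higher-ranked doc with this value was already kept
        seen.add(key)
        unique.append(doc)
    return unique

results = [
    {"id": 1, "title": "a"},
    {"id": 2, "title": "a"},  # duplicate title: hidden from display only
    {"id": 3, "title": "b"},
]
print([d["id"] for d in dedupe_by_field(results, "title")])
```

The drawback is that facet counts and numFound still include the duplicates, and you must request more rows than you display to fill the page.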


Re: Duplicates

2010-07-23 Thread Peter Karich
Another possibility could be the well known 'field collapse' ;-)

http://wiki.apache.org/solr/FieldCollapsing

Regards,
Peter.

> Thanks.
>
> If I set uniqueKey on the field, can I still keep duplicates?
> I need to remove duplicates only from search results. The ability to
> keep duplicates should remain.
>
> 2010/7/23 Erick Erickson 
>
>   
>> If the field is a single token, just define the uniqueKey on it in your
>> schema.
>>
>> Otherwise, this may be of interest:
>> http://wiki.apache.org/solr/Deduplication
>>
>> Haven't used it myself though...
>>
>> best
>> Erick
>>
>> On Thu, Jul 22, 2010 at 6:14 PM, Pavel Minchenkov 
>> wrote:
>>
>> 
>>> Hi,
>>>
>>> Is it possible to remove duplicates in search results by a given field?
>>>
>>> Thanks.
>>>
>>> --
>>> Pavel Minchenkov



Re: Tree Faceting in Solr 1.4

2010-07-23 Thread Eric Grobler
Thanks, I saw the article.

As far as I can tell the trunk archives only go back to the middle of March
and the 2 patches are from the beginning of the year.

Thus:

"These approaches can be tried out easily using a single set of sample data
and the Solr example application (assumes current trunk codebase and latest
patches posted to the respective issues)."

is a bit of an over-statement!

Regards
Eric
On Fri, Jul 23, 2010 at 6:22 AM, Jonathan Rochkind  wrote:

> Solr does not, yet, at least not simply, as far as I know, but there are
> ideas and some JIRA's with maybe some patches:
>
> http://wiki.apache.org/solr/HierarchicalFaceting
>
>
> 
> From: rajini maski [rajinima...@gmail.com]
> Sent: Friday, July 23, 2010 12:34 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Tree Faceting in Solr 1.4
>
> I am also looking for the same feature in Solr and am very keen to know
> whether it supports tree faceting, or whether we are forced to index in a
> tree faceting format like
>
> 1/2/3/4
> 1/2/3
> 1/2
> 1
>
> In case of multilevel faceting, I found it gives only a 2-level tree
> facet.
>
> If I give a query such as country:India AND state:Karnataka AND
> city:Bangalore, all I want is a facet count 1) for the condition above,
> 2) the number of states in that country, and 3) the number of cities in
> that state...
>
> Like => Country: India ,State:Karnataka , City: Bangalore <1>
>
> State:Karnataka
>  Kerla
>  Tamilnadu
>  Andra Pradesh...and so on
>
> City:  Mysore
>  Hubli
>  Mangalore
>  Coorg and so on...
>
>
> If I am doing
> facet=on & facet.field={!ex=State}State & fq={!tag=State}State:Karnataka
>
> All it gives me is facets on state, excluding only that filter query, but
> I was not able to do the same at the third level, e.g. a facet.field that
> also gives me the counts of cities in the state Karnataka.
> Please let me know a solution for this...
>
> Regards,
> Rajani Maski
>
>
>
>
>
> On Thu, Jul 22, 2010 at 10:13 PM, Eric Grobler wrote:
>
> > Thank you for the link.
> >
> > I was not aware of the multifaceting syntax - this will enable me to run
> 1
> > less query on the main page!
> >
> > However this is not a tree faceting feature.
> >
> > Thanks
> > Eric
> >
> >
> >
> >
> > On Thu, Jul 22, 2010 at 4:51 PM, SR  wrote:
> >
> > > Perhaps the following article can help:
> > >
> >
> http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html
> > >
> > > -S
> > >
> > >
> > > On Jul 22, 2010, at 5:39 PM, Eric Grobler wrote:
> > >
> > > > Hi Solr Community
> > > >
> > > > If I have:
> > > > COUNTRY CITY
> > > > Germany Berlin
> > > > Germany Hamburg
> > > > Spain   Madrid
> > > >
> > > > Can I do faceting like:
> > > > Germany
> > > >  Berlin
> > > >  Hamburg
> > > > Spain
> > > >  Madrid
> > > >
> > > > I tried to apply SOLR-792 to the current trunk but it does not seem
> to
> > be
> > > > compatible.
> > > > Maybe there is a similar feature existing in the latest builds?
> > > >
> > > > Thanks & Regards
> > > > Eric
> > >
> > >
> >
>
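The path-prefix indexing scheme rajini describes (1/2/3/4, 1/2/3, 1/2, 1) can be generated mechanically at index time. A minimal sketch (the helper name is made up; later Solr releases ship a PathHierarchyTokenizer that performs this expansion during analysis):

```python
def hierarchy_tokens(path, sep="/"):
    """Expand 'India/Karnataka/Bangalore' into all of its ancestor
    prefixes, so that faceting on any prefix counts every document
    indexed below it."""
    parts = path.split(sep)
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]

print(hierarchy_tokens("India/Karnataka/Bangalore"))
# ['India', 'India/Karnataka', 'India/Karnataka/Bangalore']
```

Index the returned tokens into a multivalued field; a facet query filtered on `India/Karnataka` then yields the city-level counts directly, which covers the third level that plain multi-select faceting misses.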


Re: Getting FileNotFoundException with repl command=backup?

2010-07-23 Thread Alexander Rothenberg
Thanks for the info Peter, I think I ran into the same issue some time ago
and could not find out why the backup stopped and also got deleted by Solr.

I decided to stop currently running updates to Solr while the backup is
running, and wrote my own backup handler that simply copies the index files
to some location and rotates older, unneeded backups.

I thought about a cleaner solution where the backup handler would create a
LOCK on the index to prevent incoming updates from writing into it (the
same happens while an index optimize is running). Then, while the LOCK is
set, a backup could run without any problems and would remove the LOCK when
done. But I was never able to create a working LOCK that prevents incoming
updates from being applied...
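Alexander's copy-and-rotate backup handler could look roughly like this. A sketch only: the snapshot naming and the `keep` policy are made up, and it assumes (as he describes) that no writer touches the index while the copy runs:

```python
import shutil
import time
from pathlib import Path

def backup_index(index_dir, backup_root, keep=3):
    """Copy the index directory to a timestamped snapshot directory and
    delete the oldest snapshots beyond `keep`.

    Timestamped names sort lexicographically in chronological order, so a
    plain sort identifies the oldest snapshots to rotate away.
    """
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)
    snapshot = backup_root / time.strftime("snapshot.%Y%m%d%H%M%S")
    shutil.copytree(index_dir, snapshot)  # destination must not yet exist
    snapshots = sorted(p for p in backup_root.iterdir() if p.is_dir())
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return snapshot
```

Note that without the LOCK he describes, a commit or merge during the copy can still leave the snapshot inconsistent, which is why pausing updates first matters.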

-- 
Alexander Rothenberg
Fotofinder GmbH USt-IdNr. DE812854514
Software EntwicklungWeb: http://www.fotofinder.net/
Potsdamer Str. 96   Tel: +49 30 25792890
10785 BerlinFax: +49 30 257928999

Geschäftsführer:Ali Paczensky
Amtsgericht:Berlin Charlottenburg (HRB 73099)
Sitz:   Berlin


Re: Duplicates

2010-07-23 Thread Pavel Minchenkov
Thanks.

If I set uniqueKey on the field, can I still keep duplicates?
I need to remove duplicates only from search results. The ability to keep
duplicates should remain.

2010/7/23 Erick Erickson 

> If the field is a single token, just define the uniqueKey on it in your
> schema.
>
> Otherwise, this may be of interest:
> http://wiki.apache.org/solr/Deduplication
>
> Haven't used it myself though...
>
> best
> Erick
>
> On Thu, Jul 22, 2010 at 6:14 PM, Pavel Minchenkov 
> wrote:
>
> > Hi,
> >
> > Is it possible to remove duplicates in search results by a given field?
> >
> > Thanks.
> >
> > --
> > Pavel Minchenkov
> >
>



-- 
Pavel Minchenkov