Deploy Solritas as a separate application?

2011-10-04 Thread jwang
Solritas is a nice search UI integrated with Solr with many features we could
use. However we do not want to build our UI into our Solr instance. We will
have a front-end web app interfacing with Solr. Is there an easy way to
deploy Solritas as a separate application (e.g., Solritas with SolrJ to
query a backend Solr instance)?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Deploy-Solritas-as-a-separate-application-tp3392326p3392326.html
Sent from the Solr - User mailing list archive at Nabble.com.


is there any attribute in schema.xml to avoid duplication in solr?

2011-10-04 Thread nagarjuna
Hi everybody

  I want to know whether there is any attribute in schema.xml to avoid
duplications?

pls reply 
thanks

--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-any-attribute-in-schema-xml-to-avoid-duplication-in-solr-tp3392408p3392408.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: UniqueKey filed length exceeds

2011-10-04 Thread kiran.bodigam
Thanks for your reply, Erick.

(Here my use case is: -MM-DD 13:54:11.414632 needs to be the unique key.)

When I try to search the data with
http://localhost:8080/solr/select/?q=2009-11-04:13:51:07.348184
it throws the following error.

Though I changed my schema to a text field, I am still getting the following
error:

<fieldType name="string" class="solr.TextField" sortMissingLast="true"
omitNorms="true"/>

Kindly check my stack trace:

SEVERE: org.apache.solr.common.SolrException: undefined field 2009-11-04
at
org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1254)
at
org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer.getAnalyzer(IndexSchema.java:410)
at
org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer.reusableTokenStream(IndexSchema.java:385)
at
org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:574)
at
org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:158)
at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1421)
at 
org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1309)
at 
org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1237)
at
org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1226)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80)
at org.apache.solr.search.QParser.getQuery(QParser.java:142)
at
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:84)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/UniqueKey-filed-length-exceeds-tp3389759p3392432.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Documents Indexed, SolrJ see nothing before long time

2011-10-04 Thread darul
Thank you Christopher. I have found one issue in my code when building a
query, but I still do not know why it is not working.

When I comment out this line, I get the right result count:

// solrQuery.setParam("fq", "+creation_date:[* TO NOW] +type:QUESTION");

where creation_date is a Date field and type a String field.

I have tried these two lines, which also make the query retrieve the wrong
count, whereas with curl it works:

solrQuery.addFilterQuery("creation_date:[* TO NOW]");
solrQuery.addFilterQuery("type:QUESTION");

When I remove the filter on the date, it works:

solrQuery.addFilterQuery("type:QUESTION");

Any problem with adding two filters, one of them with a date?

Is there any problem with this syntax for filter queries and SolrJ?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Documents-Indexed-SolrJ-see-nothing-before-long-time-tp3389721p3392473.html
Sent from the Solr - User mailing list archive at Nabble.com.


how to avoid duplicates in search results?

2011-10-04 Thread nagarjuna
Hi everybody
  i got the following response
<code>
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
<lst name="params">
  <str name="df">groups</str>
  <str name="indent">on</str>
  <str name="start">0</str>
  <str name="q">participate</str>
  <str name="version">2.2</str>
  <str name="rows">30</str>
  </lst>
  </lst>
<result name="response" numFound="2" start="0">
<doc>
  <str name="description">testing group</str>
  <str name="name">testing group</str>
  <str name="url">http://abc.xyz.com/groups/testing-group/discussions/62</str>
  </doc>
<doc>
  <str name="description">testing group</str>
  <str name="name">testing group</str>
  <str name="url">http://abc.xyz.com/groups/testing-group/discussions/62</str>
  </doc>
  </result>
  </response>
</code>

I need to remove the duplicate results.

Can anyone give me suggestions?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-avoid-duplicates-in-search-results-tp3392524p3392524.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Documents Indexed, SolrJ see nothing before long time

2011-10-04 Thread darul
Well, I guess it is stupid to make a +creation_date:[* TO NOW] filter.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Documents-Indexed-SolrJ-see-nothing-before-long-time-tp3389721p3392538.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to avoid duplicates in search results?

2011-10-04 Thread Edoardo Tosca
You can probably use the Grouping feature:
http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters

There is also a Document Duplicate Detection at index time:
http://wiki.apache.org/solr/Deduplication
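If reindexing with the Deduplication update processor isn't an option, duplicates can also be collapsed on the client after the response comes back. A hypothetical sketch in plain Java (no SolrJ types; the class and method names are mine) that keeps the first document per value of the url field, as in the duplicated response above:

```java
import java.util.*;

public class DedupByField {
    // Keep the first document for each distinct value of a key field
    // (here intended for the "url" field of the search response).
    public static List<Map<String, String>> dedup(
            List<Map<String, String>> docs, String keyField) {
        Set<String> seen = new HashSet<>();
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            // Set.add returns false for values we have already seen.
            if (seen.add(doc.get(keyField))) {
                out.add(doc);
            }
        }
        return out;
    }
}
```

Note this only cleans up the page being displayed; numFound and paging still count the duplicates, which is why index-time deduplication or grouping is the more complete fix.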





-- 
Edoardo Tosca
Sourcesense - making sense of Open Source: http://www.sourcesense.com


Re: SolrJ Annotation for multiValued field

2011-10-04 Thread darul
well, another mistake, it works...sorry ;)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-Annotation-for-multiValued-field-tp3390255p3392652.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: email - DIH

2011-10-04 Thread jb
Nobody to help?

I tried telnet to get information about the emails. Via telnet with IMAP I
can get all required fields. Is this an implementation issue?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/email-DIH-tp2711416p3392846.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query failing because of omitTermFreqAndPositions

2011-10-04 Thread Michael McCandless
This is because, within one segment only 1 value (omitP or not) is
possible, for all the docs in that segment.

This then means, on merging segments with different values for omitP,
Lucene must reconcile the different values, and that reconciliation
will favor omitting positions (if it went the other way, Lucene would
have to make up fake positions, which seems very dangerous).

Even if you delete all documents containing that field, and optimize
down to one segment, this omitPositions bit will still stick,
because of how Lucene stores the metadata per field.

omitNorms also behaves this way: once omitted, always omitted.

Mike McCandless

http://blog.mikemccandless.com



On Tue, Oct 4, 2011 at 1:41 AM, Isan Fulia isan.fu...@germinait.com wrote:
 Hi Mike,

 Thanks for the information. But why is it that, once positions have been
 omitted in the past, the field will always omit positions even if
 omitPositions is made false?

 Thanks,
 Isan Fulia.

 On 29 September 2011 17:49, Michael McCandless 
 luc...@mikemccandless.comwrote:

 Once a given field has omitted positions in the past, even for just
 one document, it sticks and that field will forever omit positions.

 Try creating a new index, never omitting positions from that field?

 Mike McCandless

 http://blog.mikemccandless.com

 On Thu, Sep 29, 2011 at 1:14 AM, Isan Fulia isan.fu...@germinait.com
 wrote:
  Hi All,
 
  My schema consisted of the field textForQuery, which was defined as
  <field name="textForQuery" type="text" indexed="true" stored="false"
  multiValued="true"/>
 
  After indexing 10 lakhs of documents I changed the field to
  <field name="textForQuery" type="text" indexed="true" stored="false"
  multiValued="true" omitTermFreqAndPositions="true"/>
 
  So documents that were indexed after that omitted the position information
  of the terms.
  As a result I was not able to search text which relies on position
  information, e.g. "coke studio at mtv", even though it is present in some
  documents.
 
  So I again changed the field textForQuery to
  <field name="textForQuery" type="text" indexed="true" stored="false"
  multiValued="true"/>
 
  But now even for new documents added, a query requiring position
  information is still failing.
  For example, I reindexed certain documents that contain "coke studio at
  mtv", but the query still returns no documents when searching for
  textForQuery:"coke studio at mtv"
 
  Can anyone please help me out with why this is happening.
 
 
  --
  Thanks & Regards,
  Isan Fulia.
 




 --
 Thanks & Regards,
 Isan Fulia.



Re: UniqueKey filed length exceeds

2011-10-04 Thread Jamie Johnson
It looks like your query is getting parsed as a field and a value:

field: 2009-11-04
value: 13:51:07.348184

If you'd like to make a query like this you need to escape the ':', so
something like

2009-11-04\:13\:51\:07.348184

See the following link for more information

http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#Escaping%20Special%20Characters
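SolrJ users can call ClientUtils.escapeQueryChars for this; as a stdlib-only illustration of the same idea (the class and method names here are mine, and the character set follows the escaping rules on the page above):

```java
// Minimal sketch of Lucene query-syntax escaping, mirroring what
// SolrJ's ClientUtils.escapeQueryChars does. Not a Solr API --
// just an illustration of which characters need a leading backslash.
public class QueryEscaper {
    // Characters with special meaning to the Lucene query parser,
    // plus whitespace.
    private static final String SPECIALS = "\\+-!():^[]\"{}~*?|&;/ ";

    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The timestamp from this thread: each ':' (and '-') must be
        // escaped so the parser does not treat "2009-11-04" as a field name.
        System.out.println(escape("2009-11-04 13:51:07.348184"));
    }
}
```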





Re: is there any attribute in schema.xml to avoid duplication in solr?

2011-10-04 Thread Tom Gullo
The uniqueKey field avoids entries with the same id.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/is-there-any-attribute-in-schema-xml-to-avoid-duplication-in-solr-tp3392408p3393085.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to achieve Indexing @ 270GiB/hr

2011-10-04 Thread Pranav Prakash
Greetings,

While going through the article "265% indexing speedup with Lucene's
concurrent flushing"
(http://java.dzone.com/news/265-indexing-speedup-lucenes?mz=33057-solr_lucene)
I was stunned by the endless possibilities in which indexing speed could be
increased.

I'd like to take inputs from everyone over here as to how to achieve this
speed. As far as I understand there are two broad ways of feeding data to
Solr -

   1. Using DataImportHandler
   2. Using HTTP to POST docs to Solr.

The speeds at which the article describes indexing seems kinda too much to
expect using the second approach. Or is it possible using multiple instances
feeding docs to Solr?

My current setup does the following -

   1. Execute SQL queries to create database of documents that needs to be
   fed.
   2. Go through the columns one by one, and create XMLs for them and send
   it over to Solr in batches of max 500 docs.


Even if using DataImportHandler what are the ways this could be optimized?
If I am able to solve the problem of indexing data in our current setup, my
life would become a lot easier.
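For the HTTP approach, throughput usually comes from batching and posting from several threads, rather than single-document posts. A hypothetical sketch of just the batching half (the send callback stands in for an actual SolrJ add() or HTTP POST; all names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedIndexer {
    private final int batchSize;
    // Stands in for the real call that posts a batch to Solr.
    private final Consumer<List<String>> send;
    private final List<String> buffer = new ArrayList<>();

    public BatchedIndexer(int batchSize, Consumer<List<String>> send) {
        this.batchSize = batchSize;
        this.send = send;
    }

    public void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is buffered; also called once at the end of the run
    // so a final partial batch is not lost.
    public void flush() {
        if (!buffer.isEmpty()) {
            send.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

Running several of these in parallel against one Solr instance, with a commit only at the end, is the usual way to approach the speeds the article describes.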


*Pranav Prakash*

temet nosce

Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com |
Google http://www.google.com/profiles/pranny


RE: solr searching for special characters?

2011-10-04 Thread Ahmet Arslan
 
 how it is possible also explain me and which tokenizer class can support
 for finding the special characters.

Probably WhitespaceTokenizer will do the job for you. Plus you need to escape
special characters (if you are using the defType=lucene query parser).

Anyhow you need to provide us more details. 
http://wiki.apache.org/solr/UsingMailingLists
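For reference, a whitespace-tokenized fieldType along those lines might look like this in schema.xml (the name text_ws is illustrative):

```xml
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split only on whitespace, so punctuation and special
         characters stay inside the tokens. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```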


Re: Deploy Solritas as a separate application?

2011-10-04 Thread Erik Hatcher

On Oct 3, 2011, at 23:32 , jwang wrote:

 Solritas is a nice search UI integrated with Solr with many features we could
 use. However we do not want to build our UI into our Solr instance. We will
 have a front-end web app interfacing with Solr. Is there an easy way to
 deploy Solritas as a separate application (e.g., Solritas with SolrJ to
 query a backend Solr instance)?

Well thanks!

It currently can't be deployed into a front-end webapp as is.  [though one 
could set it up as a non-doc-containing front-end to another shard, perhaps]

The Velocity stuff is quite straightforward though, so it'd be easy to create a 
servlet that passed requests to Solr and fed the response to Velocity 
templates.  I've thought about implementing this sort of thing a million times 
over.  I've come up with what I think is a very lean/clean way to do this, 
still using Velocity templates, but using JRuby (via Sinatra) as the front-end 
web tier.  I've implemented, a while ago, a proof-of-concept of this here:

https://github.com/lucidimagination/Prism

I'd like to make time to flesh this out even further.  The templates could be 
made practically the same from inside Solr's wt=velocity infrastructure to an 
external system using Velocity to render Solr responses.

In general - what you're asking for specifically doesn't exist that I know of, 
but the pieces are all kind of there to put this together.  And Prism is my 
opinionated way that Solr-backed UI can be built leanly and cleanly.
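The "servlet that passes requests to Solr" idea can be sketched with nothing but the JDK's built-in HttpServer. This is a hypothetical illustration, not Prism or Solritas code: the backend URL is an assumption, and the template step is reduced to a pass-through comment.

```java
// Hypothetical front-end web tier that forwards search requests to a
// backend Solr and renders the response itself. The Solr URL and the
// templating step are placeholders.
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SolrFrontend {
    // Assumed backend Solr instance; adjust to your deployment.
    static final String SOLR = "http://localhost:8983/solr/select";

    public static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            server.createContext("/search", exchange -> {
                String q = exchange.getRequestURI().getRawQuery();
                String solrResponse;
                try (InputStream in = new URL(SOLR + "?" + q).openStream()) {
                    solrResponse = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                }
                // A real version would feed solrResponse to Velocity
                // templates here; this sketch just passes it through.
                byte[] body = solrResponse.getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```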

Erik



Re: SOLR HttpCache Qtime

2011-10-04 Thread Lord Khan Han
We are using this QTime field and publishing it on our front-end web page.
Even though the HTTP cache decreases the QTime in reality, it is still
returning the cached old QTime value. We can use our internal qtime instead
of Solr's, but I just wonder: is there any way to tell Solr to re-calculate
the QTime when the response comes from the HTTP cache?

On Tue, Oct 4, 2011 at 4:16 AM, Erick Erickson erickerick...@gmail.comwrote:

 Why do you want to? QTime is the time Solr
 spends searching. The cached value will,
 indeed, be from the query that filled
 in the HTTP cache. But what are you doing
 with that information that you want to correct
 it?

 That said, I have no clue how you'd attempt to
 do this.

 Best
 Erick

 On Sat, Oct 1, 2011 at 5:55 PM, Lord Khan Han khanuniver...@gmail.com
 wrote:
  Hi,
 
  Is there any way to get the correct QTime when we use HTTP caching? I think
  Solr caches the QTime too, so it gives the same QTime in the response
  whatever time the query actually takes to finish. How can I set QTime
  correctly from Solr when I use HTTP caching?
 
  thanks
 



Re: Shingle and Query Performance

2011-10-04 Thread Lord Khan Han
We figured out that if we use only the shingle field, not combined with
outputUnigrams, performance gets better. If we use outputUnigrams it is worse
than the normal index field, so we decided to make a separate shingle-only
field and use that field for the main queries.

On Wed, Aug 31, 2011 at 1:01 PM, Lord Khan Han khanuniver...@gmail.comwrote:

 Thanks Erick. If I figure out something I will let you know as well. Nobody
 replied except you; I thought there might be more people involved here.

 Thanks


 On Wed, Aug 31, 2011 at 3:47 AM, Erick Erickson 
 erickerick...@gmail.comwrote:

 OK, I'll have to defer because this makes no sense.
 4+ seconds in the debug component?

 Sorry I can't be more help here, but nothing really
 jumps out.
 Erick

 On Tue, Aug 30, 2011 at 12:45 PM, Lord Khan Han khanuniver...@gmail.com
 wrote:
  Below is the output of the debug. I am measuring pure Solr QTime, which
  shows in the QTime field in the Solr XML response.
 
  <arr name="parsed_filter_queries">
  <str>mrank:[0 TO 100]</str>
  </arr>
  <lst name="timing">
  <double name="time">8584.0</double>
  <lst name="prepare">
  <double name="time">12.0</double>
  <lst name="org.apache.solr.handler.component.QueryComponent">
  <double name="time">12.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.FacetComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.HighlightComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.StatsComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.SpellCheckComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.DebugComponent">
  <double name="time">0.0</double>
  </lst>
  </lst>
  <lst name="process">
  <double name="time">8572.0</double>
  <lst name="org.apache.solr.handler.component.QueryComponent">
  <double name="time">4480.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.FacetComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.HighlightComponent">
  <double name="time">41.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.StatsComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.SpellCheckComponent">
  <double name="time">0.0</double>
  </lst>
  <lst name="org.apache.solr.handler.component.DebugComponent">
  <double name="time">4051.0</double>
  </lst>
  </lst>
  </lst>
 
  On Tue, Aug 30, 2011 at 5:38 PM, Erick Erickson 
 erickerick...@gmail.comwrote:
 
  Can we see the output if you specify both
  debugQuery=on&debug=true
 
  the debug=true will show the time taken up with various
  components, which is sometimes surprising...
 
  Second, we never asked the most basic question, what are
  you measuring? Is this the QTime of the returned response?
  (which is the time actually spent searching) or the time until
  the response gets back to the client, which may involve lots besides
  searching...
 
  Best
  Erick
 
  On Tue, Aug 30, 2011 at 7:59 AM, Lord Khan Han 
 khanuniver...@gmail.com
  wrote:
   Hi Eric,
  
   Fields are lazy loading, content stored in solr and machine 32 gig..
 solr
   has 20 gig heap. There is no swapping.
  
    As you see we have many phrases in the same query. I couldn't find a way
    to drop qtime to subseconds. Surprisingly the non-shingled test has a
    better qtime!
  
  
   On Mon, Aug 29, 2011 at 3:10 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
  
   Oh, one other thing: have you profiled your machine
   to see if you're swapping? How much memory are
   you giving your JVM? What is the underlying
   hardware setup?
  
   Best
   Erick
  
   On Mon, Aug 29, 2011 at 8:09 AM, Erick Erickson 
  erickerick...@gmail.com
   wrote:
200K docs and 36G index? It sounds like you're storing
your documents in the Solr index. In and of itself, that
shouldn't hurt your query times, *unless* you have
lazy field loading turned off, have you checked that
lazy field loading is enabled?
   
   
   
Best
Erick
   
On Sun, Aug 28, 2011 at 5:30 AM, Lord Khan Han 
  khanuniver...@gmail.com
   wrote:
Another interesting thing is: all one-word or multi-word queries,
    including phrase queries such as "barack obama", are slower in the shingle
    configuration. What am I doing wrong? Without shingles "barack obama" has
    a query time of 300ms, with shingles 780ms.
   
   
On Sat, Aug 27, 2011 at 7:58 PM, Lord Khan Han 
  khanuniver...@gmail.com
   wrote:
   
Hi,
   
What is the difference between solr 3.3  and the trunk ?
I will try 3.3  and let you know the results.
   
   
Here is the search handler:

<requestHandler name="search" class="solr.SearchHandler"
   default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <!--str

Analyzer Tokenizer for Exact and Contains search on single field

2011-10-04 Thread Satish Talim
I am a Solr newbie.

Let's say we have a field with 4 records as follows:

James
James Edward
James Edward Gray
JamesEdward

a. In Solr 3.4, I want an exact search on the given field for James
Edward. Record 2 should be returned.

b. Next on the same field, I want to check whether James is contained in
the field, then records 1, 2 and 3 should be returned.

Which standard analyzer, tokenizer can one apply on one single field, to get
these results?

Satish
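One common approach (not stated in this thread, so treat it as an illustrative sketch; all names are mine) is to copy the source field into two differently analyzed fields: a KeywordTokenizer-based type for exact match and a whitespace-tokenized type for term-level "contains" match:

```xml
<!-- Exact match: the whole field value becomes a single token. -->
<fieldType name="string_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Contains: split on whitespace so each word is searchable. -->
<fieldType name="text_contains" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name_exact" type="string_exact" indexed="true" stored="false"/>
<field name="name_contains" type="text_contains" indexed="true" stored="false"/>
<copyField source="name" dest="name_exact"/>
<copyField source="name" dest="name_contains"/>
```

With these definitions, name_exact:"James Edward" should match only record 2, while name_contains:James should match records 1, 2, and 3 but not "JamesEdward" (which stays one token).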


Re: Determining master/slave from ZK in SolrCloud

2011-10-04 Thread Jamie Johnson
Ok, so I am pretty sure this information is not available.  What is
the most appropriate way to add information like this to ZK?  I can
obviously look for the system properties enable.master and
enable.slave, but that won't be foolproof since someone could put
this in the config file instead and not as a system property.  Is
there a way to determine this quickly programmatically without having
to go through all of the request handlers in the SolrCore?

On Mon, Oct 3, 2011 at 5:52 PM, Jamie Johnson jej2...@gmail.com wrote:
 Is it possible to determine if a solr instance is a master or a slave
 in replication terms based on the information that is placed in ZK in
 SolrCloud?



Re: SOLR HttpCache Qtime

2011-10-04 Thread Erick Erickson
But if the HTTP cache is what's returning the value,
Solr never sees anything at all, right? So Solr
doesn't have a chance to do anything here.

Best
Erick
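Not from this thread, but if the number being published should reflect what the client actually experienced (cache hits included), timing the request on the client side avoids the problem entirely. A minimal sketch, where the Runnable stands in for the actual HTTP call to Solr:

```java
public class ClientTimer {
    // Measure elapsed wall-clock milliseconds around any request,
    // whether it is served by Solr or by an HTTP cache in front of it.
    public static long timeMillis(Runnable runQuery) {
        long start = System.nanoTime();
        runQuery.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```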




Indexing PDF

2011-10-04 Thread Héctor Trujillo
Hi all, I'm indexing PDF files with SolrJ, and most of them work. But with
some files I've got problems because they store strange characters. I got
this content stored:
+++

Starting a Search Application

Abstract
Starting
a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i

Starting a Search Application A Lucid Imagination White Paper ¥ April 2009
Page ii Do You Need Full-text Search?
∞
∞
∞
Starting
a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1

Re: Indexing PDF

2011-10-04 Thread Paul Libbrecht
full of boxes for me.
Héctor, you need another way to reference these!
(e.g. a URL)

paul



Re: Determining master/slave from ZK in SolrCloud

2011-10-04 Thread Jamie Johnson
I'm putting this out there for comment.

Right now I'm in ZKController and have changed register as follows:


public void register(SolrCore core, boolean forcePropsUpdate) throws
IOException,

and at line 479 I've added this

 SolrRequestHandler requestHandler = core.getRequestHandler("/replication");
boolean master = false;
if (requestHandler != null && requestHandler instanceof ReplicationHandler) {
    ReplicationHandler replicationHandler =
        (ReplicationHandler) requestHandler;
    master = replicationHandler.isMaster();
}
props.put("replication.master", master);


I also modified CoreContainer and ReplicationHandler to support what I
have above.  Does this seem a reasonable way to approach this?


Also to provide some history, I need this so I can programmatically
determine which servers are masters and slaves and subsequently which
should be written to (for updates) and which should be queried.


On Tue, Oct 4, 2011 at 10:26 AM, Jamie Johnson jej2...@gmail.com wrote:
 Ok, so I am pretty sure this information is not available.  What is
 the most appropriate way to add information like this to ZK?  I can
 obviously look for the system properties enable.master and
 enable.slave, but that won't be foolproof since someone could put
 this in the config file instead and not as a system property.  Is
 there a way to determine this quickly programmatically without having
 to go through all of the request handlers in the solrCore?

 On Mon, Oct 3, 2011 at 5:52 PM, Jamie Johnson jej2...@gmail.com wrote:
 Is it possible to determine if a solr instance is a master or a slave
 in replication terms based on the information that is placed in ZK in
 SolrCloud?




Re: Determining master/slave from ZK in SolrCloud

2011-10-04 Thread Mark Miller
Because the distributed indexing phase of SolrCloud will not use replication, 
we have not really gone down this path at all.

One thing we are considering is adding the ability to add various roles to each 
shard as hints - eg a shard might be designated a searcher and another an 
indexer.

You might be able to piggy back on this to label things master/slave. A 
ZooKeeper aware replication handler could then use this information.

There is nothing to stop you from adding this information to zookeeper 
yourself, using standard zookeeper tools - but just putting the information is 
only half the problem - something then needs to read it.

- Mark Miller
lucidimagination.com
2011.lucene-eurocon.org | Oct 17-20 | Barcelona

On Oct 4, 2011, at 10:26 AM, Jamie Johnson wrote:

 Ok, so I am pretty sure this information is not available.  What is
 the most appropriate way to add information like this to ZK?  I can
 obviously look for the system properties enable.master and
 enable.slave, but that won't be foolproof since someone could put
 this in the config file instead and not as a system property.  Is
 there a way to determine this quickly programmatically without having
 to go through all of the request handlers in the solrCore?
 
 On Mon, Oct 3, 2011 at 5:52 PM, Jamie Johnson jej2...@gmail.com wrote:
 Is it possible to determine if a solr instance is a master or a slave
 in replication terms based on the information that is placed in ZK in
 SolrCloud?
 













Re: Determining master/slave from ZK in SolrCloud

2011-10-04 Thread Jamie Johnson
Thanks for the reply Mark.  So a couple of questions.

When is distributed indexing going to be available on Trunk?  Are
there docs on it now?

I think having Roles on the shard would scratch the itch here, because
as you said I could then include a role which indicated what to do
with this server.

My use case is actually for something outside of Solr.  As of right
now we are not on the latest trunk (actually a few months back I
think) but could push to upgrade to it if the distributed indexing
code was available today, but management may still shoot that down
because of a short timeline.  Suffice it to say that I'll be reading
this information by another application to handle distributed indexing
externally.  The version that I'm working on requires that the
application be responsible for performing the distribution.


On Tue, Oct 4, 2011 at 12:05 PM, Mark Miller markrmil...@gmail.com wrote:
 Because the distributed indexing phase of SolrCloud will not use replication, 
 we have not really gone down this path at all.

 One thing we are considering is adding the ability to add various roles to 
 each shard as hints - eg a shard might be designated a searcher and another 
 an indexer.

 You might be able to piggy back on this to label things master/slave. A 
 ZooKeeper aware replication handler could then use this information.

 There is nothing to stop you from adding this information to zookeeper 
 yourself, using standard zookeeper tools - but just putting the information 
 is only half the problem - something then needs to read it.

 - Mark Miller
 lucidimagination.com
 2011.lucene-eurocon.org | Oct 17-20 | Barcelona

 On Oct 4, 2011, at 10:26 AM, Jamie Johnson wrote:

 Ok, so I am pretty sure this information is not available.  What is
 the most appropriate way to add information like this to ZK?  I can
 obviously look for the system properties enable.master and
 enable.slave, but that won't be foolproof since someone could put
 this in the config file instead and not as a system property.  Is
 there a way to determine this quickly programmatically without having
 to go through all of the request handlers in the solrCore?

 On Mon, Oct 3, 2011 at 5:52 PM, Jamie Johnson jej2...@gmail.com wrote:
 Is it possible to determine if a solr instance is a master or a slave
 in replication terms based on the information that is placed in ZK in
 SolrCloud?















Re: Determining master/slave from ZK in SolrCloud

2011-10-04 Thread Jamie Johnson
also as an FYI I created this JIRA

https://issues.apache.org/jira/browse/SOLR-2811

which perhaps should be removed if the roles option comes to life.  Is
there a JIRA on that now?

On Tue, Oct 4, 2011 at 12:12 PM, Jamie Johnson jej2...@gmail.com wrote:
 Thanks for the reply Mark.  So a couple of questions.

 When is distributed indexing going to be available on Trunk?  Are
 there docs on it now?

 I think having Roles on the shard would scratch the itch here, because
 as you said I could then include a role which indicated what to do
 with this server.

 My use case is actually for something outside of Solr.  As of right
 now we are not on the latest trunk (actually a few months back I
 think) but could push to upgrade to it if the distributed indexing
 code was available today, but management may still shoot that down
 because of a short timeline.  Suffice it to say that I'll be reading
 this information by another application to handle distributed indexing
 externally.  The version that I'm working on requires that the
 application be responsible for performing the distribution.


 On Tue, Oct 4, 2011 at 12:05 PM, Mark Miller markrmil...@gmail.com wrote:
 Because the distributed indexing phase of SolrCloud will not use 
 replication, we have not really gone down this path at all.

 One thing we are considering is adding the ability to add various roles to 
 each shard as hints - eg a shard might be designated a searcher and another 
 an indexer.

 You might be able to piggy back on this to label things master/slave. A 
 ZooKeeper aware replication handler could then use this information.

 There is nothing to stop you from adding this information to zookeeper 
 yourself, using standard zookeeper tools - but just putting the information 
 is only half the problem - something then needs to read it.

 - Mark Miller
 lucidimagination.com
 2011.lucene-eurocon.org | Oct 17-20 | Barcelona

 On Oct 4, 2011, at 10:26 AM, Jamie Johnson wrote:

 Ok, so I am pretty sure this information is not available.  What is
 the most appropriate way to add information like this to ZK?  I can
 obviously look for the system properties enable.master and
 enable.slave, but that won't be foolproof since someone could put
 this in the config file instead and not as a system property.  Is
 there a way to determine this quickly programmatically without having
 to go through all of the request handlers in the solrCore?

 On Mon, Oct 3, 2011 at 5:52 PM, Jamie Johnson jej2...@gmail.com wrote:
 Is it possible to determine if a solr instance is a master or a slave
 in replication terms based on the information that is placed in ZK in
 SolrCloud?
















Re: sorting using function query results are notin order

2011-10-04 Thread abhayd
any help?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/sorting-using-function-query-results-are-notin-order-tp3390926p3393781.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Analyzer Tokenizer for Exact and Contains search on single field

2011-10-04 Thread Steven A Rowe
Hi Satish,

I don't think there is a single analyzer that does what you want.

However, you could send the info to a second field with copyField, and use e.g. 
WhitespaceTokenizer on one field for contains-style queries, and 
KeywordTokenizer on the other field (or just use the string field type) for 
exact matches.
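A minimal sketch of that two-field setup in schema.xml terms (field and type names here are illustrative, not from the original post):

```xml
<!-- Illustrative only: "name" is tokenized for contains-style queries;
     its content is copied into a string field for exact matches. -->
<field name="name"       type="text_ws" indexed="true" stored="true"/>
<field name="name_exact" type="string"  indexed="true" stored="false"/>
<copyField source="name" dest="name_exact"/>
```

A contains query would then go against `name` (e.g. `name:James`), and an exact query against `name_exact` (e.g. `name_exact:"James Edward"`).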

Steve

 -Original Message-
 From: Satish Talim [mailto:satish.ta...@gmail.com]
 Sent: Tuesday, October 04, 2011 10:02 AM
 To: solr-user@lucene.apache.org
 Subject: Analyzer Tokenizer for Exact and Contains search on single field
 
 I am a Solr newbie.
 
 Let's say we have a field with 4 records as follows:
 
 James
 James Edward
 James Edward Gray
 JamesEdward
 
 a. In Solr 3.4, I want an exact search on the given field for "James
 Edward". Record 2 should be returned.
 
 b. Next, on the same field, I want to check whether "James" is contained in
 the field; then records 1, 2 and 3 should be returned.
 
 Which standard analyzer, tokenizer can one apply on one single field, to
 get
 these results?
 
 Satish


Re: Determining master/slave from ZK in SolrCloud

2011-10-04 Thread Jamie Johnson
So I see this JIRA which references roles

https://issues.apache.org/jira/browse/SOLR-2765

I'm looking at implementing what Yonik suggested, namely in
solrconfig.xml I have something like

<core name="${coreName}" instanceDir="." shard="${shard}"
collection="${collection}" roles="searcher,indexer"/>

these will be pulled out and added to the cloudDescriptor so they can
be saved in ZK.  Seem reasonable?
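As a sketch of how a client outside Solr might consume such a roles attribute once it lands in ZK — class and method names below are hypothetical, not part of any Solr API:

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only: split a roles="searcher,indexer" attribute
// into role hints and decide whether to route updates to this shard.
public class RoleHints {

    static Set<String> parseRoles(String rolesAttr) {
        Set<String> roles = new LinkedHashSet<String>();
        if (rolesAttr != null) {
            for (String r : rolesAttr.split(",")) {
                String role = r.trim().toLowerCase(Locale.ROOT);
                if (!role.isEmpty()) {
                    roles.add(role);
                }
            }
        }
        return roles;
    }

    static boolean acceptsUpdates(Set<String> roles) {
        // Route writes only to shards advertising the "indexer" role.
        return roles.contains("indexer");
    }

    public static void main(String[] args) {
        Set<String> roles = parseRoles("searcher,indexer");
        System.out.println(acceptsUpdates(roles)); // prints: true
    }
}
```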

On Tue, Oct 4, 2011 at 12:17 PM, Jamie Johnson jej2...@gmail.com wrote:
 also as an FYI I created this JIRA

 https://issues.apache.org/jira/browse/SOLR-2811

 which perhaps should be removed if the roles option comes to life.  Is
 there a JIRA on that now?

 On Tue, Oct 4, 2011 at 12:12 PM, Jamie Johnson jej2...@gmail.com wrote:
 Thanks for the reply Mark.  So a couple of questions.

 When is distributed indexing going to be available on Trunk?  Are
 there docs on it now?

 I think having Roles on the shard would scratch the itch here, because
 as you said I could then include a role which indicated what to do
 with this server.

 My use case is actually for something outside of Solr.  As of right
 now we are not on the latest trunk (actually a few months back I
 think) but could push to upgrade to it if the distributed indexing
 code was available today, but management may still shoot that down
 because of a short timeline.  Suffice it to say that I'll be reading
 this information by another application to handle distributed indexing
 externally.  The version that I'm working on requires that the
 application be responsible for performing the distribution.


 On Tue, Oct 4, 2011 at 12:05 PM, Mark Miller markrmil...@gmail.com wrote:
 Because the distributed indexing phase of SolrCloud will not use 
 replication, we have not really gone down this path at all.

 One thing we are considering is adding the ability to add various roles to 
 each shard as hints - eg a shard might be designated a searcher and another 
 an indexer.

 You might be able to piggy back on this to label things master/slave. A 
 ZooKeeper aware replication handler could then use this information.

 There is nothing to stop you from adding this information to zookeeper 
 yourself, using standard zookeeper tools - but just putting the information 
 is only half the problem - something then needs to read it.

 - Mark Miller
 lucidimagination.com
 2011.lucene-eurocon.org | Oct 17-20 | Barcelona

 On Oct 4, 2011, at 10:26 AM, Jamie Johnson wrote:

 Ok, so I am pretty sure this information is not available.  What is
 the most appropriate way to add information like this to ZK?  I can
 obviously look for the system properties enable.master and
 enable.slave, but that won't be foolproof since someone could put
 this in the config file instead and not as a system property.  Is
 there a way to determine this quickly programmatically without having
 to go through all of the request handlers in the solrCore?

 On Mon, Oct 3, 2011 at 5:52 PM, Jamie Johnson jej2...@gmail.com wrote:
 Is it possible to determine if a solr instance is a master or a slave
 in replication terms based on the information that is placed in ZK in
 SolrCloud?

















Suggestions feature

2011-10-04 Thread Milan Dobrota
I am working on a feature similar to YouTube suggestions (where the videos
are suggested based on your viewing history).
What I do is parse the history and get the user's interests, in the form of
weighted topics. When I boost according to those interests, the dominant
ones take over the result list.

Is there any way to use the boost so that there is some variety in the
results?

Thanks
Milan


Re: sorting using function query results are notin order

2011-10-04 Thread Yonik Seeley
Hmmm, try adding fl={!func}Count
to make sure Count is an indexed field and function queries are
getting the right values.

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference



On Mon, Oct 3, 2011 at 3:42 PM, abhayd ajdabhol...@hotmail.com wrote:
 hi
 I am trying to sort results from solr using sum(count,score) function.
 Basically it's not adding things correctly.
 For example here is partial sample response
         "Count":54,
         "UserQuery":"how to",
         "score":1.2550932,
         "query({!dismax qf=UserQuery v='how'})":1.2550932,
         "sum(Count,query({!dismax qf=UserQuery v='how'}))":1.2550932},

 how come the addition of 54 + 1.2550932 is equal to 1.2550932?

 What am I doing wrong?
 here is my complete query
 
 http://localhost:10101/solr/autosuggest/select?q=howstart=0indent=onwt=jsonrows=5sort=sum%28Count,query%28{!dismax%20qf=UserQuery%20v=%27how%27}%29%29%20descfl=UserQuery,score,Count,query%28{!dismax%20qf=UserQuery%20v=%27how%27}%29,sum%28Count,query%28{!dismax%20qf=UserQuery%20v=%27how%27}%29%29debug=true

 {
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "sort":"sum(Count,query({!dismax qf=UserQuery v='how'})) desc",
      "wt":"json",
      "rows":"5",
      "indent":"on",
      "fl":"UserQuery,score,Count,query({!dismax qf=UserQuery v='how'}),sum(Count,query({!dismax qf=UserQuery v='how'}))",
      "debug":"true",
      "start":"0",
      "q":"how"}},
  "response":{"numFound":2628,"start":0,"maxScore":1.2550932,"docs":[
      {
        "Count":54,
        "UserQuery":"how to",
        "score":1.2550932,
        "query({!dismax qf=UserQuery v='how'})":1.2550932,
        "sum(Count,query({!dismax qf=UserQuery v='how'}))":1.2550932},
      {
        "Count":51,
        "UserQuery":"how to text",
        "score":0.8964951,
        "query({!dismax qf=UserQuery v='how'})":0.8964951,
        "sum(Count,query({!dismax qf=UserQuery v='how'}))":0.8964951},
      {
        "Count":117,
        "UserQuery":"how to block calls",
        "score":0.7171961,
        "query({!dismax qf=UserQuery v='how'})":0.7171961,
        "sum(Count,query({!dismax qf=UserQuery v='how'}))":0.7171961},
      {
        "Count":109,
        "UserQuery":"how to call forward",
        "score":0.7171961,
        "query({!dismax qf=UserQuery v='how'})":0.7171961,
        "sum(Count,query({!dismax qf=UserQuery v='how'}))":0.7171961},
      {
        "Count":79,
        "UserQuery":"how do I pay my bill?",
        "score":0.7171961,
        "query({!dismax qf=UserQuery v='how'})":0.7171961,
        "sum(Count,query({!dismax qf=UserQuery v='how'}))":0.7171961}]
  },


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/sorting-using-function-query-results-are-notin-order-tp3390926p3390926.html
 Sent from the Solr - User mailing list archive at Nabble.com.
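A plain-arithmetic check of what the numbers in the thread imply: if `Count` were being read as an indexed numeric field, the first document's sort key would be 54 + 1.2550932; since the response shows exactly the query score instead, the function is most likely seeing 0 for `Count` (e.g. because it is not indexed as a numeric type) — which is what Yonik's `fl={!func}Count` check would reveal.

```java
// Quick sanity check of the diagnosis above.
public class SumCheck {
    public static void main(String[] args) {
        double count = 54.0;
        double score = 1.2550932;
        System.out.println(count + score); // expected sort key, approx. 55.255
        System.out.println(0.0 + score);   // what the response actually shows
    }
}
```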



Re: Determining master/slave from ZK in SolrCloud

2011-10-04 Thread Jamie Johnson
so my initial test worked, this appeared in ZK now

roles=searcher,indexer

which I can use to tell whether it should be written to or not.  It
required fewer changes to other files as well.

I needed to change
  CloudDescriptor (add roles variable/methods)
  CoreContainer (parse roles attribute)
  ZKController (store roles in ZK)

and now what is in ZK is the following:

roles=searcher,indexer
node_name=hostname:8983_solr
url=http://hostname:8983/solr/

As I said this will meet my need, I can provide my changes back to the
JIRA referenced if this is what is expected from roles.
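For the reading side Mark mentioned, the node payload above is plain key=value text, so a client could parse it with java.util.Properties once it has fetched the bytes (e.g. via ZooKeeper's getData). A sketch with the payload inlined for illustration:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Sketch: parse the key=value text stored at a shard's ZK node.
// A real client would fetch the bytes from ZooKeeper first.
public class ZkNodeProps {

    static Properties parse(String nodeData) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(nodeData));
        return props;
    }

    public static void main(String[] args) throws IOException {
        String nodeData = "roles=searcher,indexer\n"
                + "node_name=hostname:8983_solr\n"
                + "url=http://hostname:8983/solr/\n";
        Properties props = parse(nodeData);
        System.out.println(props.getProperty("roles")); // prints: searcher,indexer
    }
}
```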

On Tue, Oct 4, 2011 at 12:45 PM, Jamie Johnson jej2...@gmail.com wrote:
 So I see this JIRA which references roles

 https://issues.apache.org/jira/browse/SOLR-2765

 I'm looking at implementing what Yonik suggested, namely in
 solrconfig.xml I have something like

    <core name="${coreName}" instanceDir="." shard="${shard}"
 collection="${collection}" roles="searcher,indexer"/>

 these will be pulled out and added to the cloudDescriptor so they can
 be saved in ZK.  Seem reasonable?

 On Tue, Oct 4, 2011 at 12:17 PM, Jamie Johnson jej2...@gmail.com wrote:
 also as an FYI I created this JIRA

 https://issues.apache.org/jira/browse/SOLR-2811

 which perhaps should be removed if the roles option comes to life.  Is
 there a JIRA on that now?

 On Tue, Oct 4, 2011 at 12:12 PM, Jamie Johnson jej2...@gmail.com wrote:
 Thanks for the reply Mark.  So a couple of questions.

 When is distributed indexing going to be available on Trunk?  Are
 there docs on it now?

 I think having Roles on the shard would scratch the itch here, because
 as you said I could then include a role which indicated what to do
 with this server.

 My use case is actually for something outside of Solr.  As of right
 now we are not on the latest trunk (actually a few months back I
 think) but could push to upgrade to it if the distributed indexing
 code was available today, but management may still shoot that down
 because of a short timeline.  Suffice it to say that I'll be reading
 this information by another application to handle distributed indexing
 externally.  The version that I'm working on requires that the
 application be responsible for performing the distribution.


 On Tue, Oct 4, 2011 at 12:05 PM, Mark Miller markrmil...@gmail.com wrote:
 Because the distributed indexing phase of SolrCloud will not use 
 replication, we have not really gone down this path at all.

 One thing we are considering is adding the ability to add various roles to 
 each shard as hints - eg a shard might be designated a searcher and 
 another an indexer.

 You might be able to piggy back on this to label things master/slave. A 
 ZooKeeper aware replication handler could then use this information.

 There is nothing to stop you from adding this information to zookeeper 
 yourself, using standard zookeeper tools - but just putting the 
 information is only half the problem - something then needs to read it.

 - Mark Miller
 lucidimagination.com
 2011.lucene-eurocon.org | Oct 17-20 | Barcelona

 On Oct 4, 2011, at 10:26 AM, Jamie Johnson wrote:

 Ok, so I am pretty sure this information is not available.  What is
 the most appropriate way to add information like this to ZK?  I can
 obviously look for the system properties enable.master and
 enable.slave, but that won't be foolproof since someone could put
 this in the config file instead and not as a system property.  Is
 there a way to determine this quickly programmatically without having
 to go through all of the request handlers in the solrCore?

 On Mon, Oct 3, 2011 at 5:52 PM, Jamie Johnson jej2...@gmail.com wrote:
 Is it possible to determine if a solr instance is a master or a slave
 in replication terms based on the information that is placed in ZK in
 SolrCloud?


















Solr Schema and how?

2011-10-04 Thread caman
Hello all,

We have a screen builder application where users design their own forms.
They have a choice of creating form fields with type date, text, numbers, large
text, etc., up to a total of 500 fields supported on a screen.
Once screens are designed, the system automatically handles type checking for
valid data entries on the front end, even though data of any type gets stored
as text.
So as you can imagine, the table is huge, with 600+
columns (screenId, recordId, field1 ... field500), and every column is set as
'text'. The same table stores data for every screen designed in the system.

So basically here are my questions

1. How best to index it? I did it using dynamic field 'field*' which works
great
2. Since everything is text, I'm not sure how to enable filtering on each field.
E.g., if a user wants 'greater than' or 'less than' type queries
on a number field (stored as text), that data somehow needs to be stored as a
number in Solr, but I don't think I have a way to do that, since 'field2' may
be a 'number' field for 'screen1' and a 'date' for screen2.




Would appreciate any ideas to handle this? 



thanks
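One common workaround (a sketch, not the only option): instead of a single untyped `field*` pattern, index each screen field under a type-specific dynamic-field suffix chosen at index time from the screen definition, so numeric and date range queries work. The entries below follow the stock Solr example schema conventions; the suffix choice per column is an assumption:

```xml
<!-- Sketch: typed dynamic fields; the indexer appends the suffix that
     matches the screen designer's declared type for that column. -->
<dynamicField name="*_s"  type="string" indexed="true" stored="true"/>
<dynamicField name="*_i"  type="tint"   indexed="true" stored="true"/>
<dynamicField name="*_dt" type="tdate"  indexed="true" stored="true"/>
```

A numeric field2 on screen1 would then be indexed as `field2_i`, enabling queries like `field2_i:[10 TO 100]`, while screen2's date field2 could be indexed as `field2_dt` without conflict.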

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Schema-and-how-tp3393989p3393989.html
Sent from the Solr - User mailing list archive at Nabble.com.


Search on content_type

2011-10-04 Thread ahmad ajiloo
Hi
I'm using Nutch for crawling and indexed my data using the index-more plugin,
and added my required field (like content_type) to schema.xml in Solr.
Now how can I search for PDF files (one kind of content_type) using this new
index? What query should I enter to search over PDF files in Solr?


Re: Indexing PDF

2011-10-04 Thread ahmad ajiloo
I have this problem too, when indexing some Persian PDF files.

2011/10/4 Héctor Trujillo hecto...@gmail.com

 Hi all, I'm indexing PDF files with SolrJ, and most of them work. But with
 some files I've got problems because they store strange characters. I got
 this content stored:
 +++

 Starting a Search Application

 
 Abstract

 Starting
 a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i

 
 Starting a Search Application A Lucid Imagination White Paper ¥ April 2009
 Page ii Do You Need Full-text Search?

 ∞

 ∞
 ∞

 Starting
 a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1

 

Re: SOLR HttpCache Qtime

2011-10-04 Thread Lord Khan Han
I just want to be sure... because it's Solr's internal HTTP cache, not an
outside HTTP cache.

On Tue, Oct 4, 2011 at 5:39 PM, Erick Erickson erickerick...@gmail.comwrote:

 But if the HTTP cache is what's returning the value,
 Solr never sees anything at all, right? So Solr
 doesn't have a chance to do anything here.

 Best
 Erick

 On Tue, Oct 4, 2011 at 9:24 AM, Lord Khan Han khanuniver...@gmail.com
 wrote:
  We are using this QTime field and publishing it in our front-end web app. Even
  though the HTTP cache decreases the QTime in reality, it still returns the
  cached old QTime value. We could use our internal QTime instead of Solr's, but
  I just wonder: is there any way to tell Solr to re-calculate the QTime when
  the response comes from the HTTP cache?
 
  On Tue, Oct 4, 2011 at 4:16 AM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Why do you want to? QTime is the time Solr
  spends searching. The cached value will,
  indeed, be from the query that filled
  in the HTTP cache. But what are you doing
  with that information that you want to correct
  it?
 
  That said, I have no clue how you'd attempt to
  do this.
 
  Best
  Erick
 
  On Sat, Oct 1, 2011 at 5:55 PM, Lord Khan Han khanuniver...@gmail.com
  wrote:
   Hi,
  
   Is there any way to get the correct QTime when we use HTTP caching? I think
   Solr is also caching the QTime, so it gives the same QTime in the response
   no matter how long it takes to finish. How can I set the QTime correctly
   from Solr when I have HTTP caching on?
  
   thanks
  
 
 



Case Insensitive String

2011-10-04 Thread Strokin, Eugene
Hello, I know that this topic was already discussed, but I want to make sure I 
understood it right.
I need to have a field for a user's email. I should be able to find
document(s) by this field, and the match should be exact and case-insensitive.
Based on what I've found in previous discussions, I couldn't use the
solr.StrField class, but should use solr.TextField instead.
Also, I suspect very much that the requirements could change later and I would
have to store not an email as the identifier, but some free text, potentially
with spaces and other whitespace, and still be able to do an exact,
case-insensitive match.
So I came up with this type:

<fieldType name="string_ci" class="solr.TextField"
    sortMissingLast="true" omitNorms="true">
  <analyzer>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

So, if I have a document with a field of this type containing the value:
"ABC 123 xyz"
then a query for "xyz" shouldn't return the document, nor should "xyz 123 ABC",
but "abc 123 XYZ" should.
Am I correct in my assumption, or am I missing something?

Any comments are appreciated,
Thank you,
Eugene S.
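For reference: a solr.TextField analyzer needs a tokenizer, and the commonly seen sketch for exact-but-case-insensitive matching (including values with spaces) keeps the whole value as a single token:

```xml
<fieldType name="string_ci" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- Treat the entire field value as one token, then lowercase it. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this type, a query for "abc 123 XYZ" matches a stored "ABC 123 xyz", while "xyz" alone or "xyz 123 ABC" does not.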


Re: Indexing PDF

2011-10-04 Thread Robert Muir
Your Persian PDF problem is different, and already taken care of in PDFBox trunk:

https://issues.apache.org/jira/browse/PDFBOX-1127

On Tue, Oct 4, 2011 at 2:04 PM, ahmad ajiloo ahmad.aji...@gmail.com wrote:
 I have this problem too, when indexing some Persian PDF files.

 2011/10/4 Héctor Trujillo hecto...@gmail.com

 Hi all, I'm indexing PDF files with SolrJ, and most of them work. But with
 some files I've got problems because they store strange characters. I got
 this content stored:
 +++

 Starting a Search Application

 
 Abstract

 Starting
 a Search Application A Lucid Imagination White Paper ¥ April 2009 Page i

 
 Starting a Search Application A Lucid Imagination White Paper ¥ April 2009
 Page ii Do You Need Full-text Search?

 ∞

 ∞
 ∞

 Starting
 a Search Application A Lucid Imagination White Paper ¥ April 2009 Page 1

 

http request works, but wget same URL fails

2011-10-04 Thread Fred Zimmerman
This HTTP request works as desired (bringing back a CSV file):

http://zimzazsearch3-1.bitnamiapp.com:8983/solr/select?indent=on&version=2.2&q=battleship&wt=csv

but the same URL submitted via wget produces the 500 error reproduced below.

I want the wget to download the csv file.  What's going on?

FredZ


bitnami@ip-10-202-202-68:/opt/bitnami/apache2/htdocs/scripts$ --2011-10-04
19:33:41--  http://zimzazsearch3-1.bitnamiapp.com:8983/solr/select?indent=on
Resolving zimzazsearch3-1.bitnamiapp.com... 75.101.204.213
Connecting to zimzazsearch3-1.bitnamiapp.com|75.101.204.213|:8983...
connected.
HTTP request sent, awaiting response... 500 null
 java.lang.NullPointerException \tat
java.io.StringReader.<init>(StringReader.java:33) \tat
org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203) \tat
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80) \tat
org.apache.solr.search.QParser.getQuery(QParser.java:142) \tat
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:81)
\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) \tat
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
\tat
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
\tat
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
\tat
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
\tat
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
\tat
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
\tat org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
\tat
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
\tat
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
\tat
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
\tat org.mortbay.jetty.Server.handle(Server.java:326) \tat
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) \tat
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
\tat org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) \tat
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) \tat
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) \tat
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
\tat o
2011-10-04 19:33:41 ERROR 500: null  java.lang.NullPointerException \tat
java.io.StringReader.<init>(StringReader.java:33) \tat
org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:203) \tat
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:80) \tat
org.apache.solr.search.QParser.getQuery(QParser.java:142) \tat
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:81)
\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) \tat
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
\tat
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
\tat
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
\tat
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
\tat
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
\tat
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
\tat org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
\tat
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
\tat
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
\tat
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
\tat org.mortbay.jetty.Server.handle(Server.java:326) \tat
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) \tat
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
\tat org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) \tat
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) \tat
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) \tat
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
\tat o.



-
Subscribe to the Nimble Books Mailing List  http://eepurl.com/czS- for
monthly updates


Re: Case Insensitive String

2011-10-04 Thread Ahmet Arslan
 Hello, I know that this topic was
 already discussed, but I want to make sure I understood it
 right.
 I need to have a field for email of a user. I should be
 able to find a document(s) by this field, and it should be
 exact match, and case insensitive.
 Based on that I've found from previous discussions, I
 couldn't use solr.StrField class, but should use
 solr.TextField class instead.
 Also, I've suspect very match that later the requirements
 could change and I should be able to store not an email as
 identifier, but some free text, potentially with spaces, and
 some other white spaces, and still should be able to do
 exact case insensitive match.
 So I come up with such type:
 
     <fieldType name="string_ci" class="solr.TextField"
         sortMissingLast="true" omitNorms="true">
         <analyzer>
             <filter class="solr.LowerCaseFilterFactory" />
         </analyzer>
     </fieldType>
 
 So, if I have a document with a field of such type, and it
 would contain value like this:
 ABC 123 xyz
 A xyz query shouldn't return the document, nor xyz 123
 ABC query, but abc 123 XYZ should.
 Am I correct in my assumption, or am I missing something?

Yep, all correct. However, no tokenizer is defined in your analyzer; in your
case KeywordTokenizerFactory should be used.
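
For reference, the quoted type with a tokenizer added might look like the
sketch below (adjust names to your schema):

```xml
<fieldType name="string_ci" class="solr.TextField"
    sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- emit the whole field value as a single token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- then lowercase it for case-insensitive exact match -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```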


Private text fields

2011-10-04 Thread Kristian Rickert
I'm trying to find a way to query a private field in solr while
using the text fields.

So I want to allow private tags only searchable by an assigned owner.
The private tags will also query along side regular keyword tags.

Here's an example:

Company A (identified by idA) searches and finds company B (identified
by idB).  Company A would like to add some tags specific to Company B
and only searchable from company A.

So far I've found a few ways to implement this, and all of them feel
really sloppy as far as what it'll do to the index by either making
too many dynamic columns or manipulating the keywords in the value.
I'd like to know if solr can do this out of the box in any other way.
It looks like I can just create a custom field (PrivateString or
something like that) but before I head down that route, I'd like to
know what solr can do now.

So here are a couple of ways I've figured out so far, and I don't like either of them:
Option 1: Use Dynamic Fields for the text field
I can make the dynamic fields in solr that specify the field name
followed by the company ID.  For example, if Company A (idA) above
adds the words great client to the field tags in the index, I can
make the field as follows:

tags_idA=great client

But I'm going to be working with over 10K clients, and don't think
this is a great idea.  It would allow for simple syntax and a fast
implementation, but I'd like to avoid crowding the index with 1000s of
extra columns if I can avoid it.

Option 2: Store the private tags with a company ID prepended to the keyword
When searching the private tag field, I can store each keyword by
prepending the identifier to that word.

Although this would make the query easier, it'll hamper some other
text features that expect a word to be stored as it would be searched
on.


Option 3: Customize dismax to handle a PrivateField type
This would be a custom multi-value poly field that would hold the
private owner ID as well as the full text that we want to index.  I've
not yet done this route but it seems like the cleanest way as far as
resulting syntax and index storage.  If any of the specified fields
are a PrivateField, then it can automatically search the appropriate
IDs

So when indexing, I can have the PrivateFieldType look something like this:

Doc1:
privateTags=[{,great client},{3,bad client}]
Doc2:
privateTags=[{,scott tiger},{3,bad client}]


So when I perform a query:

http://localhost:8080/usercore/select/?q=client&qoid=&qf=firstName%20lastName%20privateTags&defType=dismax

So from the above, I'd want it to search the firstName, lastName and
privateTags fields.  However, I'd want solr to realize that the
privateTags field is a PrivateFieldType, look at the qoid parameter, and
only return matches whose ID matches the qoid value.

So the above query will only return Doc1 because it matches the
private tag with an ID of .



Thoughts?  Ideas?


RE: Suggestions feature

2011-10-04 Thread Steven A Rowe
Hi Milan,

I have three ideas:

1. Boost by log(weight) instead of just by weight.  This would reduce 
weight-to-weight ratios and so reduce the likelihood of hit list domination, 
while still retaining the user's relative preferences.  Multiple log 
applications will further decrease the weight-to-weight ratios and thus 
increase variety.

2. Take the top N topics by weight, thresholding either by some arbitrary 
weight or at some arbitrary N, and then boost those N topics equally.  This 
would increase variety at the expense of ignoring the user's minor interests.

3. Just apply a fixed boost to all of the user's interests and ignore their 
associated weights.  (This is equivalent to taking strategy #1 to the limit.)

Steve

 -Original Message-
 From: Milan Dobrota [mailto:mi...@milandobrota.com]
 Sent: Tuesday, October 04, 2011 12:56 PM
 To: solr-user@lucene.apache.org
 Subject: Suggestions feature
 
 I am working on a feature similar to Youtube suggestions (where the
 videos are suggested based on your viewing history).
 What I do is parse the history and get the user's interests, in the form
 of weighted topics. When I boost according to those interests, the dominant
 ones take over the result list.
 
 Is there any way to use the boost so that there is some variety in the
 results?
 
 Thanks
 Milan


Re: http request works, but wget same URL fails

2011-10-04 Thread Fred Zimmerman
got it.

curl 
http://zimzazsearch3-1.bitnamiapp.com:8983/solr/select/?indent=on&q=video&fl=name,id&wt=csv
works like a champ.





On Tue, Oct 4, 2011 at 15:35, Fred Zimmerman w...@nimblebooks.com wrote:

 This http request works as desired (bringing back a csv file)


  http://zimzazsearch3-1.bitnamiapp.com:8983/solr/select?indent=on&version=2.2&q=battleship&wt=csv

 but the same URL submitted via wget produces the 500 error reproduced
 below.

 I want the wget to download the csv file.  What's going on?

 FredZ


 bitnami@ip-10-202-202-68:/opt/bitnami/apache2/htdocs/scripts$ --2011-10-04
 19:33:41--
 http://zimzazsearch3-1.bitnamiapp.com:8983/solr/select?indent=on
 Resolving zimzazsearch3-1.bitnamiapp.com... 75.101.204.213
 Connecting to zimzazsearch3-1.bitnamiapp.com|75.101.204.213|:8983...
 connected.
 HTTP request sent, awaiting response... 500 null
  [stack trace snipped — identical to the one quoted above]

Re: Remove results limit

2011-10-04 Thread Tomás Fernández Löbbe
Hi Andrew, I think this question belongs to the users list more than to the
dev's list.

Programmatically, it depends on the client library you are using, if you are
using SolrJ, it should be something like:

SolrQuery query = new SolrQuery();
...
query.setRows(20);
query.setStart(40);

You can also change the default on the server side, to do this you have to
modify the request handler configuration on the solrconfig.xml file, adding
something like:

<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <int name="rows">20</int>
  </lst>
</requestHandler>

I hope this helps,

Tomás


On Tue, Oct 4, 2011 at 3:19 PM, Andrew Clark
andrew.clark.at...@gmail.comwrote:

 How about programmatically? Is there some config on the server I can change
 or some API call that changes the default resultset size?


 On Tue, Oct 4, 2011 at 2:00 PM, Erick Erickson erickerick...@gmail.comwrote:

 Set rows=200

 or page through it
 start=40&rows=20

 and on the next one

 start=60&rows=20

 Best
 Erick

 On Tue, Oct 4, 2011 at 1:44 PM, Andrew Clark
 andrew.clark.at...@gmail.com wrote:
  I get 193 documents found in my SolrDocumentList, but only 10 of them
 are
  returned to me.. how can I remove the 10 document limit?
  Thanks,
  Andrew
 

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org





Hierarchical faceting with Date

2011-10-04 Thread Ravi Bulusu
Hi,

I'm trying to perform a hierarchical (pivot) faceted search and it doesn't
work with date (as one of the field).

My questions are
1. Is this a supported feature or just a bug that needs to be addressed?
2. If it is not intended to be supported, what is the complexity involved in
implementing it.

FYI
I'm using a SOLR 4.0 build from 09/30/2011.

Regards
Ravi Bulusu


Re: SOLR error with custom FacetComponent

2011-10-04 Thread Ravi Bulusu
Thanks for your response.
I could solve my use case with your suggestion.

-Ravi Bulusu

On Sat, Sep 24, 2011 at 1:51 PM, Ravi Bulusu ravi.b...@gmail.com wrote:

 Erik,

 Unfortunately the facet fields are not static. The fields are dynamic SOLR
 fields and are generated by different applications.
 The field names will be populated into a data store (like memcache) and
 facets have to be driven from that data store.

 I need to write a Custom FacetComponent which picks up the facet fields
 from the data store.
 Thanks for your response.

 -Ravi Bulusu

 Subject:
 Re: SOLR error with custom FacetComponent
 From:
 Erik Hatcher erik.hatcher@...
 Date:
 2011-09-21 18:18
 Why create a custom facet component for this?

 Simply add lines like this to your request handler(s):

 <str name="facet.field">manu_exact</str>

 either in defaults or appends sections.

 Erik

 On Wed, Sep 21, 2011 at 2:00 PM, Ravi Bulusu ravi.b...@gmail.com wrote:

 Hi All,


 I'm trying to write a custom SOLR facet component and I'm getting some
 errors when I deploy my code into the SOLR server.

 Can you please let me know what I'm doing wrong? I appreciate your help on
 this issue. Thanks.

 *Issue*

 I'm getting an error saying "Error instantiating SearchComponent: My
 Custom Class is not a org.apache.solr.handler.component.SearchComponent".

 My custom class inherits from *FacetComponent* which extends from *
 SearchComponent*.

 My custom class is defined as follows…

 I implemented the process method to meet our functionality.

 We have some default facets that have to be sent every time, irrespective
 of the Query request.


 /**
  * @author ravibulusu
  */
 public class MyFacetComponent extends FacetComponent {
     ....
 }





Re: UniqueKey filed length exceeds

2011-10-04 Thread Chris Hostetter

: if you'd like to make a query like this you need to escape the : so
: something like

or use the term QParser, which was created for the explicit purpose of 
never needing to worry about escaping terms in your index...

q={!term f=id}2009-11-04:13:51:07.348184



-Hoss


Re: is there any attribute in schema.xml to avoid duplication in solr?

2011-10-04 Thread Chris Hostetter
:   i want to know whether is there any attribute in schema.xml to avoid
: the duplications?

you need to explain your problem better ... "duplications" can mean 
different things to different people.  (duplicate documents? duplicate 
terms in a field? etc...)

please provide a detailed description of your *real* goal and the 
*specifics* of the problems you are encountering...

https://wiki.apache.org/solr/UsingMailingLists

-Hoss


composite Unique Keys?

2011-10-04 Thread Jason Toy
I have several different document types that I store.  I use a serialized
integer that is unique to the document type.  If I use id as the uniqueKey,
then there is a possibility to have colliding docs on the id, what would be
the best way to have a unique id given I am storing my unique identifier in
an integer field? I've seen other solutions where the unique key consists of
DocTypeString 123245 as the id, which seems pretty inefficient to me.
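
One compact alternative, sketched here purely as an illustration (it assumes
the type codes and per-type serials each fit in 32 bits, which the message
does not state; the values 3 and 123245 are made up), is to pack both parts
into a single 64-bit key instead of concatenating strings:

```java
// Sketch: pack a doc-type code and a per-type serial into one 64-bit id.
// Assumes both fit in 32 bits (not stated in the original message).
public class PackedId {
    static long pack(int typeCode, int serial) {
        return ((long) typeCode << 32) | (serial & 0xFFFFFFFFL);
    }
    static int typeOf(long id)   { return (int) (id >>> 32); }
    static int serialOf(long id) { return (int) id; }

    public static void main(String[] args) {
        long id = pack(3, 123245);               // made-up type code / serial
        System.out.println(typeOf(id) + " / " + serialOf(id)); // 3 / 123245
    }
}
```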


Re: sorting using function query results are notin order

2011-10-04 Thread abhayd
that was it..COunt was not indexed...works fine now

--
View this message in context: 
http://lucene.472066.n3.nabble.com/sorting-using-function-query-results-are-notin-order-tp3390926p3394876.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: sorting using function query results are notin order

2011-10-04 Thread Chris Hostetter

: Subject: Re: sorting using function query results are notin order
: 
: that was it..COunt was not indexed...works fine now

Hmmm... which version of solr are you using?  
what exactly is the FieldType of your Count field?

Since Solr 3.1, attempting to use a function on a non-indexed field 
*should* fail with a clear error...

https://issues.apache.org/jira/browse/SOLR-2348




-Hoss


Re: how to avoid duplicates in search results?

2011-10-04 Thread Chris Hostetter

: There is also a Document Duplicate Detection at index time:
: http://wiki.apache.org/solr/Deduplication

Or just setting url as your UniqueKey field would solve this simple 
usecase.  But it's not entirely clear what else you consider duplicates 
besides this one example.

:  <doc>
:   <str name="description">testing group</str>
:   <str name="name">testing group</str>
:   <str
:  name="url">http://abc.xyz.com/groups/testing-group/discussions/62</str>
:  </doc>
:  <doc>
:   <str name="description">testing group</str>
:   <str name="name">testing group</str>
:   <str
:  name="url">http://abc.xyz.com/groups/testing-group/discussions/62</str>
:  </doc>

-Hoss


StreamingUpdateSolrServer and commitWithin

2011-10-04 Thread Leonardo Souza
Hi,

I'm confused about using StreamingUpdateSolrServer and the commitWithin
parameter in conjunction with waitSearcher and waitFlush.

Does it make sense a request like this?

UpdateRequest updateRequest = new UpdateRequest();
updateRequest.setCommitWithin(12);
updateRequest.setWaitSearcher(false);
updateRequest.setWaitFlush(true);
updateRequest.add(doc);
this.solrServer.request(updateRequest);

I'm not sure if setWaitSearcher(false) is being honored, look at my log

INFO o.apache.solr.update.UpdateHandler - start
commit(optimize=false,waitFlush=true,*waitSearcher=true*
,expungeDeletes=false)

My guess is if commitWithin is used we do not have to care about setting
waitSearcher or waitFlush.

thanks

--
Leonardo S Souza


Re: solr 1.4 facet.limit behaviour in merging from several shards

2011-10-04 Thread Chris Hostetter

: OK, if SOLR-2403, being related to the bug I described, has been fixed in
: SOLR 3.4 then we are safe, since we are in the process of migration. Is it
: possible to verify this somehow? Is FacetComponent the class I should
: start checking this from? Can you give any other pointers?

According to Jira, it has not been fixed (note the "Unresolved" status).

FacetComponent is definitely the class that needs to be fixed.  If you 
are interested in working on a patch, see Yonik's comments in the 
issue about how to approach it, in particular about how fixing 
mincount=1 is definitely solvable even if mincount > 1 is a 
harder/intractable problem.  The change sounds simple, but writing test 
cases to verify it is going to also take some effort.


-Hoss


Re: indexing FTP documet with solrj

2011-10-04 Thread Chris Hostetter

: I want to index some document with solrj API's but the URL of theses
: documents is FTP, 
:  How to set username and password for FTP acount in solrj 
: 
: in solrj API there is CommonsHttpSolrServer method but i do not find any
: method for FTP configuration 

it sounds like you are getting confused between using SolrJ to talk to 
*Solr* and using SolrJ to index arbitrary URLs.

SolrJ doesn't do any crawling -- if you have data that you want to index 
then your client code needs to decide what that data is (and where it 
comes from) and feed that data to SolrJ as documents to index.  the only 
URLs that SolrJ knows about are:
 * the URL for talking to Solr
 * strings that SolrJ passes to solr as document fields that may just so 
   happen to be URLs (SolrJ doesn't know/care)
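
A minimal sketch of that division of labor (host, path, credentials and field
names below are all made up; the SolrJ calls are shown as comments because
they need solr-solrj on the classpath):

```java
import java.net.URL;

// Your client code does the FTP fetch — credentials ride in the URL itself —
// and SolrJ only ships the resulting document to Solr.
public class FtpIndexSketch {
    public static void main(String[] args) throws Exception {
        URL ftpUrl = new URL("ftp://user:secret@ftp.example.com/docs/report.txt");
        System.out.println(ftpUrl.getUserInfo()); // user:secret — SolrJ never sees this
        // InputStream in = ftpUrl.openStream();  // performs the FTP login + fetch
        // String body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        // SolrInputDocument doc = new SolrInputDocument();
        // doc.addField("id", "report-1");
        // doc.addField("text", body);
        // new CommonsHttpSolrServer("http://localhost:8983/solr").add(doc);
    }
}
```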

-Hoss


Re: Any plans to support function queries on score?

2011-10-04 Thread Chris Hostetter

: Do you have any plans to support function queries on score field? for
: example, sort=floor(product(score, 100)+0.5) desc?

You most certainly can compute function queries on the score of a 
query, but you have to be explicit about which query you want to use the 
score of.  You seem to already know this...

: I can't use subquery in this case because I am trying to use secondary
: sorting, however I will be open for that if someone successfully use
: another field to boost the results.

...i don't understand your explanation of why you can't specify a 
subquery to indicate what you want to sort on.

a) the subquery can be the exact query you executed (you can even use 
variable substitution to guarantee it)

b) whether you use secondary sorting has no impact on how the function is 
computed

Here's an example using the solr sample data that works great...

q=ipod&fl=inStock,id,price,score&sort=inStock+desc,+product(price,query($q))+desc




-Hoss


Re: SOLR HttpCache Qtime

2011-10-04 Thread Erick Erickson
Still doesn't make sense to me. There is no
Solr HTTP cache that I know of. There is a
queryResultCache. There is a filterCache.
There is a documentCache. There's may
even be custom cache implementations.
There's a fieldValueCache. There's
no http cache internal to Solr as far as I
can tell.

If you're asking if documents returned from
the queryResultCache have QTimes that
reflect the actual time spent (near 0), I'm
pretty sure the answer is yes.

If this doesn't answer your question, please
take the time to formulate a complete question.
It'll get you your answers quicker than multiple
twitter-style exchanges.

Best
Erick

On Tue, Oct 4, 2011 at 2:22 PM, Lord Khan Han khanuniver...@gmail.com wrote:
 I just want to be sure.. because it's Solr's internal HTTP cache, not an
 outside HTTP cacher

 On Tue, Oct 4, 2011 at 5:39 PM, Erick Erickson erickerick...@gmail.comwrote:

 But if the HTTP cache is what's returning the value,
 Solr never sees anything at all, right? So Solr
 doesn't have a chance to do anything here.

 Best
 Erick

 On Tue, Oct 4, 2011 at 9:24 AM, Lord Khan Han khanuniver...@gmail.com
 wrote:
  We are using this Qtime field and publishing in our front web. Even the
  httpCache decreasing the Qtime in reality, its still using the cached old
  Qtime value . We can use our internal qtime instead of Solr's but I just
  wonder is there any way to say Solr if its coming httpCache  re-calculate
  the Qtime.
 
  On Tue, Oct 4, 2011 at 4:16 AM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Why do you want to? QTime is the time Solr
  spends searching. The cached value will,
  indeed, be from the query that filled
  in the HTTP cache. But what are you doing
  with that information that you want to correct
  it?
 
  That said, I have no clue how you'd attempt to
  do this.
 
  Best
  Erick
 
  On Sat, Oct 1, 2011 at 5:55 PM, Lord Khan Han khanuniver...@gmail.com
  wrote:
   Hi,
  
   Is there anyway to get correct Qtime when we use http caching ? I
 think
  Solr
   caching also the Qtime so giving the the same Qtime in response what
 ever
   takes it to finish ..  How I can set Qtime correcly from solr when I
 use
   http caching On.
  
   thanks
  
 
 




Re: schema changes changes 3.3 to 3.4?

2011-10-04 Thread Erick Erickson
It looks to me like you changed the analysis
chain for the field in question by removing
stemmers of some sort or other. The quickest
way to answer this kind of question is to get
familiar with the admin/analysis page (don't
forget to check the verbose checkboxes). Enter
the term in both the index and query boxes and
hit the button. It shows you exactly what parts
of the chain performed what actions.

So your index analysis chain probably removed
the plurals, but your query-side didn't. So I'm
guessing that not only didn't it show the
metadata, it didn't even find the document in
question.

But that's a guess at this point.

Best
Erick

On Mon, Oct 3, 2011 at 4:22 PM, jo jairo.or...@firmex.com wrote:
 Hi, I have the following issue on my test environment
 when I do a query with the full word the reply no longer contains the
 attr_meta
 ex:
 http://solr1:8983/solr/core_1/select/?q=stegosaurus

 <arr name="attr_content_encoding">
 <str>ISO-8859-1</str>
 </arr>
 <arr name="attr_content_language">
 <str>en</str>
 </arr>

 but if I remove just one letter it shows the expected response
 ex:
 http://solr1:8983/solr/core_1/select/?q=stegosauru

 <arr name="attr_content_encoding">
 <str>ISO-8859-1</str>
 </arr>
 <arr name="attr_meta">
 <str>stream_source_info</str>
 <str>document</str>
 <str>stream_content_type</str>
 <str>text/plain</str>
 <str>stream_size</str>
 <str>81</str>
 <str>Content-Encoding</str>
 <str>ISO-8859-1</str>
 <str>stream_name</str>
 <str>filex123.txt</str>
 <str>Content-Type</str>
 <str>text/plain</str>
 <str>resourceName</str>
 <str>dinosaurs5.txt</str>
 </arr>


 For troubleshooting I replaced the schema.xml from 3.3 into 3.4 and it works
 just fine. I can't find what changes in the schema would cause this, any
 clues?

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/schema-changes-changes-3-3-to-3-4-tp3391019p3391019.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to obtain the Explained output programmatically ?

2011-10-04 Thread Erick Erickson
If we're talking SolrJ here, I *think* you can
get it from the NamedList<Object> returned
from SolrResponse.getResponse, but I confess
I haven't tried it.

If not SolrJ, what is your plugin doing? And
where do you expect your plugin to be in
the query process?

Best
Erick

On Mon, Oct 3, 2011 at 5:31 PM, David Ryan help...@gmail.com wrote:
 Thanks Hoss!

 debug.explain.structured
 (https://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured)
 is definitely helpful.  It adds some structure to the plain explained output.
 Is there a way to access these structured outputs in Java code (e.g., via
 Solr plugin class)?
 We could write an HTML parser to examine the output in the browser, but it's
 probably not the best way to do that.



 On Mon, Oct 3, 2011 at 2:11 PM, Chris Hostetter 
 hossman_luc...@fucit.orgwrote:


 :
  http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&indent=on&debugQuery=true&fl=id,score
        ...
 : the web browser.   Is there a way to access the explained output
 : programmatically?

 https://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured


 -Hoss




Re: SOLR HttpCache Qtime

2011-10-04 Thread Nicholas Chase
Seems to me what you're asking is how to have an accurate query time 
when you're getting a response that's been cached by an HTTP cache.  
This might be from the browser, or from a proxy, or from something else, 
but it's not from Solr.  The reason that the QTime doesn't change is 
because it's the entire response -- results, parameters, Qtime, and all 
-- that's cached.  Solr isn't making a new request; it doesn't even know 
that a request has been made.  So if you do 6 requests, and the last 5 
come from the cache, Solr has done only one request, with one Qtime.


So it sounds to me that you are looking for the RESPONSE time, which 
would be different from the QTime, and would, I suppose, come from your 
application, and not from Solr.


  Nick

On 10/4/2011 7:44 PM, Erick Erickson wrote:

Still doesn't make sense to me. There is no
Solr HTTP cache that I know of. There is a
queryResultCache. There is a filterCache.
There is a documentCache. There may
even be custom cache implementations.
There's a fieldValueCache. There's
no http cache internal to Solr as far as I
can tell.

If you're asking if documents returned from
the queryResultCache have QTimes that
reflect the actual time spent (near 0), I'm
pretty sure the answer is yes.

If this doesn't answer your question, please
take the time to formulate a complete question.
It'll get you your answers quicker than multiple
twitter-style exchanges.

Best
Erick

On Tue, Oct 4, 2011 at 2:22 PM, Lord Khan Hankhanuniver...@gmail.com  wrote:

I just want to be sure... because it's Solr's internal HTTP cache, not an
outside HTTP cacher

On Tue, Oct 4, 2011 at 5:39 PM, Erick Ericksonerickerick...@gmail.comwrote:


But if the HTTP cache is what's returning the value,
Solr never sees anything at all, right? So Solr
doesn't have a chance to do anything here.

Best
Erick

On Tue, Oct 4, 2011 at 9:24 AM, Lord Khan Hankhanuniver...@gmail.com
wrote:

We are using this Qtime field and publishing it in our front-end web app. Even
though the httpCache decreases the Qtime in reality, it still returns the cached
old Qtime value. We can use our internal qtime instead of Solr's, but I just
wonder: is there any way to tell Solr, when the response comes from the
httpCache, to re-calculate the Qtime?

On Tue, Oct 4, 2011 at 4:16 AM, Erick Ericksonerickerick...@gmail.com
wrote:


Why do you want to? QTime is the time Solr
spends searching. The cached value will,
indeed, be from the query that filled
in the HTTP cache. But what are you doing
with that information that you want to correct
it?

That said, I have no clue how you'd attempt to
do this.

Best
Erick

On Sat, Oct 1, 2011 at 5:55 PM, Lord Khan Hankhanuniver...@gmail.com
wrote:

Hi,

Is there any way to get the correct Qtime when we use HTTP caching? I think
Solr is caching the Qtime as well, so it gives the same Qtime in the response
no matter how long the request takes to finish. How can I set the Qtime
correctly from Solr when I use HTTP caching?

thanks





Re: sorting using function query results are notin order

2011-10-04 Thread abhayd
hi
  solr-spec-version
  4.0.0.2011.07.19.16.15.08
  solr-impl-version
  4.0-SNAPSHOT ${svnversion} - ad895d - 2011-07-19 16:15:08
  lucene-spec-version
  4.0-SNAPSHOT
  lucene-impl-version
  4.0-SNAPSHOT ${svnversion} - ad895d - 2011-07-19 16:15:13



--
View this message in context: 
http://lucene.472066.n3.nabble.com/sorting-using-function-query-results-are-notin-order-tp3390926p3395106.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: sorting using function query results are notin order

2011-10-04 Thread Chris Hostetter
:   solr-spec-version
:   4.0.0.2011.07.19.16.15.08
:   solr-impl-version
:   4.0-SNAPSHOT ${svnversion} - ad895d - 2011-07-19 16:15:08

Uh, ok ... that doesn't really answer my question at all.

According to that info, your version of solr was *built* on 2011-07-19, 
using some snapshot of trunk, but the svn version info is missing, so 
there is no way of knowing how old that snapshot is.  

you also didn't answer my question about the field type you are using for 
Count.

details matter.

SOLR-2348 was committed to trunk on 2011-02-17 (r1071459).  When I tried to 
reproduce your example using the current trunk, and an un-indexed 
TrieIntField (I edited popularity in the example schema.xml to make it 
indexed=false), I got the expected error message instead of silently 
computing the wrong value when I tried to use popularity in a function...

http://localhost:8983/solr/select?q=ipod&fl=id,product%28popularity,price%29

So unless your 2011-07-19 build was from a version of svn that was already 
4 months old, there may still be something wrong that we should try to get 
to the bottom of.


-Hoss


Scoring of DisMax in Solr

2011-10-04 Thread David Ryan
Hi,


When I examine the score calculation of DisMax in Solr,   it looks to me
that DisMax is using  tf x idf^2 instead of tf x idf.
Does anyone have insight why tf x idf is not used here?

Here is the score contribution from one one field:

score(q,c) =  queryWeight x fieldWeight
   = tf x idf x idf x queryNorm x fieldNorm

Here is the example that I used to derive the formula above. Clearly, idf is
multiplied twice in the score calculation.
http://localhost:8983/solr/select/?q=GB&version=2.2&start=0&rows=10&indent=on&debugQuery=true&fl=id,score

<str name="6H500F0">
0.18314168 = (MATCH) sum of:
  0.18314168 = (MATCH) weight(text:gb in 1), product of:
0.35845062 = queryWeight(text:gb), product of:
  2.3121865 = idf(docFreq=6, numDocs=26)
  0.15502669 = queryNorm
0.5109258 = (MATCH) fieldWeight(text:gb in 1), product of:
  1.4142135 = tf(termFreq(text:gb)=2)
  2.3121865 = idf(docFreq=6, numDocs=26)
  0.15625 = fieldNorm(field=text, doc=1)
</str>
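Plugging the numbers from that explain output back in confirms the observation: idf enters once through queryWeight and once through fieldWeight, so the overall product is tf x idf^2 x queryNorm x fieldNorm.

```python
# Values copied from the explain output above.
tf, idf = 1.4142135, 2.3121865
query_norm, field_norm = 0.15502669, 0.15625

query_weight = idf * query_norm        # ~0.35845062, as printed above
field_weight = tf * idf * field_norm   # ~0.5109258
score = query_weight * field_weight    # ~0.18314168, matching the explain

# idf appears once in each factor, hence idf squared in the total.
```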


Thanks!


DIH full-import with clean=false is still removing old data

2011-10-04 Thread Pulkit Singhal
Hello,

I have a unique dataset of 1,110,000 products, each as its own file.
It is split across three directories of 500,000, 110,000, and 500,000
files respectively.

When I run:
http://localhost:8983/solr/bbyopen/dataimport?command=full-import&clean=false&commit=true
The first 500,000 entries are successfully indexed and then the next
110,000 entries also work ... but after I run the third full-import on
the last set of 500,000 entries, the document count remains at 610,000
... it doesn't go up to 1,110,000!

1) Is there some kind of limit here? Why can the full-import keep the
initial 500,000 entries and then let me do a full-import with 110,000
more entries ... but when I try to do a 3rd full-import, the document
count doesn't go up.

2) I know for sure that all the data is unique. Since I am not doing
delta-imports, I have NOT specified any primary key in the
data-import.xml file. But I do have a uniqueKey in the schema.xml
file.

Any tips?
- Pulkit
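One thing worth ruling out first: even with clean=false, Solr overwrites any document whose uniqueKey already exists rather than adding a duplicate, so colliding keys leave the document count flat. A toy model of that add-or-overwrite behavior (plain Python, nothing Solr-specific):

```python
# An index keyed by uniqueKey behaves like a dict: re-adding an
# existing key replaces the old document instead of growing the count.
index = {}
for batch in (["a", "b", "c"], ["d"], ["a", "b", "c"]):  # third batch repeats keys
    for key in batch:
        index[key] = {"id": key}
print(len(index))  # 4, not 7
```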



Re: schema changes changes 3.3 to 3.4?

2011-10-04 Thread jo
Interesting... I did not make changes to the default settings, but I will
definitely give that a shot. Thanks; I will comment later if I find a
solution besides replacing the schema with the default one on 3.3.

thanks

JO

--
View this message in context: 
http://lucene.472066.n3.nabble.com/schema-changes-changes-3-3-to-3-4-tp3391019p3395265.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 1.4 facet.limit behaviour in merging from several shards

2011-10-04 Thread Yonik Seeley
On Tue, Oct 4, 2011 at 7:13 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 : OK, if SOLR-2403 being related to the bug I described, has been fixed in
 : SOLR 3.4 than we are safe, since we are in the process of migration. Is it
 : possible to verify this somehow? Is the FacetComponent class the one I should
 : start checking this from? Can you give any other pointers?

 According to Jira, it has not been fixed (note the Unresolved status)

Boy, I wish jira defaulted to showing the All tab.
I *think* this actually has been fixed and I just forgot to close the issue?

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference


 FacetComponent is definitely the class that needs to be fixed.  If you
 are interested in working on a patch, see Yonik's comments in the
 issue about how to approach it, in particular about how fixing
 mincount=1 is definitely solvable even if mincount > 1 is a
 harder/intractable problem.  The change sounds simple, but writing test
 cases to verify it is also going to take some effort.


 -Hoss
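The shape of the bug is easy to reproduce in miniature: each shard returns only its local top-N terms, so a term that is moderately frequent on every shard can be missed or under-counted in the merge. A simplified sketch of the failure mode, not Solr's actual FacetComponent logic:

```python
from collections import Counter

shard1 = Counter({"apple": 5, "pear": 4})
shard2 = Counter({"cherry": 5, "pear": 4})
limit = 1  # stands in for facet.limit

# Merge only what each shard returned, i.e. its local top-`limit` terms.
returned = Counter()
for shard in (shard1, shard2):
    for term, count in shard.most_common(limit):
        returned[term] += count

merged_top = returned.most_common(limit)         # "pear" was never returned
true_top = (shard1 + shard2).most_common(limit)  # the real global top term
print(merged_top, true_top)
```

Over-requesting from each shard (asking for more than `limit` terms) and then refining is the usual mitigation, which is what the JIRA discussion is about.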



Re: sorting using function query results are notin order

2011-10-04 Thread abhayd
hi hoss,
I see this in CHANGES.txt:
$Id: CHANGES.txt 1148494 2011-07-19 19:25:01Z hossman $

repository root: http://svn.apache.org/repos/asf/lucene/dev/trunk
Revision:1148519

If you let me know what/where to look for, I can send details.

Here is the field definition:

   <field name="Count" type="sint" indexed="true" stored="true" multiValued="false"/>
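Not an answer to the version question, but for reference: the legacy sint type is deprecated on trunk, and a field has to be indexed for the FieldCache-backed function queries to read it. A Trie-based definition along these lines (a sketch against the example schema; the type name is mine) sidesteps both issues:

```xml
<!-- Sketch only: Trie-based int type; tune precisionStep for your range queries -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>

<!-- indexed="true" is what lets Count be used inside a function query -->
<field name="Count" type="tint" indexed="true" stored="true" multiValued="false"/>
```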



--
View this message in context: 
http://lucene.472066.n3.nabble.com/sorting-using-function-query-results-are-notin-order-tp3390926p3395385.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: DIH full-import with clean=false is still removing old data

2011-10-04 Thread Pulkit Singhal
Bah it worked after cleaning it out for the 3rd time, don't know what
I did differently this time :(

<result name="response" numFound="1110983" start="0"/>

On Tue, Oct 4, 2011 at 8:00 PM, Pulkit Singhal pulkitsing...@gmail.com wrote:
 Hello,

 I have a unique dataset of 1,110,000 products, each as its own file.
 It is split across three directories of 500,000, 110,000, and 500,000
 files respectively.

 When I run:
 http://localhost:8983/solr/bbyopen/dataimport?command=full-import&clean=false&commit=true
 The first 500,000 entries are successfully indexed and then the next
 110,000 entries also work ... but after I run the third full-import on
 the last set of 500,000 entries, the document count remains at 610,000
 ... it doesn't go up to 1,110,000!

 1) Is there some kind of limit here? Why can the full-import keep the
 initial 500,000 entries and then let me do a full-import with 110,000
 more entries ... but when I try to do a 3rd full-import, the document
 count doesn't go up.

 2) I know for sure that all the data is unique. Since I am not doing
 delta-imports, I have NOT specified any primary key in the
 data-import.xml file. But I do have a uniqueKey in the schema.xml
 file.

 Any tips?
 - Pulkit



A simple query?

2011-10-04 Thread alexw
Hi all,

This may seem to be an easy one but I have been struggling to get it
working. To simplify things, let's say I have a field that can contain any
combination of the 26 alphabetic letters, space delimited:

<doc>
  <myfield>a b</myfield>
</doc>
<doc>
  <myfield>b c</myfield>
</doc>
<doc>
  <myfield>x y z</myfield>
</doc>

The search term is a list of user-specified letters, for example: a b y z

I would like only the following docs to be returned:

1. Any doc that contains exactly the 4 letters a b y z (sequence not
important)
2. Any doc that contains only a subset of the 4 letters a b y z (sequence
not important)

Note that a doc containing any letters other than the 4 letters a b y z will
not qualify. So in this case, only the first doc should be returned.
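Whichever field type ends up being used (text_ws keeps the space-separated letters as separate tokens), the intended rule is just a subset test, which is easy to pin down client-side first; a sketch:

```python
def qualifies(doc_field: str, query: str) -> bool:
    # A doc qualifies iff its letters form a subset of the query letters.
    return set(doc_field.split()) <= set(query.split())

docs = ["a b", "b c", "x y z"]
print([d for d in docs if qualifies(d, "a b y z")])  # ['a b']
```

One way to phrase the same constraint as a Solr query on a whitespace-tokenized field is to require at least one of the query letters and exclude every letter outside the query, e.g. +(a b y z) -c -d ... over the remaining alphabet, though that grows with the alphabet size.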

Can someone shed some light here and let me know how to get this working,
specifically:

1. What should the field type be (text, text_ws, string...)?
2. What does the query look like?

Thanks in advance!







--
View this message in context: 
http://lucene.472066.n3.nabble.com/A-simple-query-tp3395465p3395465.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR HttpCache Qtime

2011-10-04 Thread Lance Norskog
Solr supports having the browser cache the results. If your client code
supports this caching, or your code goes through an HTTP cacher like Squid,
it could return a cached page for a query. Is this what you mean?

On Tue, Oct 4, 2011 at 4:55 PM, Nicholas Chase nch...@earthlink.net wrote:

 Seems to me what you're asking is how to have an accurate query time when
 you're getting a response that's been cached by an HTTP cache.  This might
 be from the browser, or from a proxy, or from something else, but it's not
 from Solr.  The reason that the QTime doesn't change is because it's the
 entire response -- results, parameters, Qtime, and all -- that's cached.
  Solr isn't making a new request; it doesn't even know that a request has
 been made.  So if you do 6 requests, and the last 5 come from the cache,
 Solr has done only one request, with one Qtime.

 So it sounds to me that you are looking for the RESPONSE time, which would
 be different from the QTime, and would, I suppose, come from your
 application, and not from Solr.

   Nick


 On 10/4/2011 7:44 PM, Erick Erickson wrote:

 Still doesn't make sense to me. There is no
 Solr HTTP cache that I know of. There is a
 queryResultCache. There is a filterCache.
 There is a documentCache. There may
 even be custom cache implementations.
 There's a fieldValueCache. There's
 no http cache internal to Solr as far as I
 can tell.

 If you're asking if documents returned from
 the queryResultCache have QTimes that
 reflect the actual time spent (near 0), I'm
 pretty sure the answer is yes.

 If this doesn't answer your question, please
 take the time to formulate a complete question.
 It'll get you your answers quicker than multiple
 twitter-style exchanges.

 Best
 Erick

 On Tue, Oct 4, 2011 at 2:22 PM, Lord Khan Hankhanuniver...@gmail.com
  wrote:

 I just want to be sure... because it's Solr's internal HTTP cache, not an
 outside HTTP cacher

 On Tue, Oct 4, 2011 at 5:39 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  But if the HTTP cache is what's returning the value,
 Solr never sees anything at all, right? So Solr
 doesn't have a chance to do anything here.

 Best
 Erick

 On Tue, Oct 4, 2011 at 9:24 AM, Lord Khan Hankhanuniver...@gmail.com
 wrote:

 We are using this Qtime field and publishing it in our front-end web app.
 Even though the httpCache decreases the Qtime in reality, it still returns
 the cached old Qtime value. We can use our internal qtime instead of Solr's,
 but I just wonder: is there any way to tell Solr, when the response comes
 from the httpCache, to re-calculate the Qtime?

 On Tue, Oct 4, 2011 at 4:16 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  Why do you want to? QTime is the time Solr
 spends searching. The cached value will,
 indeed, be from the query that filled
 in the HTTP cache. But what are you doing
 with that information that you want to correct
 it?

 That said, I have no clue how you'd attempt to
 do this.

 Best
 Erick

 On Sat, Oct 1, 2011 at 5:55 PM, Lord Khan Hankhanuniver...@gmail.com
 
 wrote:

 Hi,

 Is there any way to get the correct Qtime when we use HTTP caching? I think
 Solr is caching the Qtime as well, so it gives the same Qtime in the response
 no matter how long the request takes to finish. How can I set the Qtime
 correctly from Solr when I use HTTP caching?

 thanks





-- 
Lance Norskog
goks...@gmail.com