date:20120917

Thank you for the reply. I have done a bit of reading and it says I can also
use this one:

filter class=solr.NGramFilterFactory minGramSize=3 maxGramSize=30 /

This is what I will use I think, as it weeds out words like at I as a
bonus.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Only-exact-match-searches-working-tp4008160p4008264.html
Sent from the Solr - User mailing list archive at Nabble.com.

Taking a full text, then truncate and duplicate with stopwords

I've hit a bit of a wall and would appreciate some guidance. I want to index
a large block of text, like such:



I dont want to store this as it is in Solr, I want to instead have two
versions of it. One as a truncated form, and one as a keyword form.

*Truncated Form:*


*Keyword Form (using stopwords to remove common words):*


How should I be doing this. Purely with index analyzer's?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Taking-a-full-text-then-truncate-and-duplicate-with-stopwords-tp4008269.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Taking a full text, then truncate and duplicate with stopwords

 I dont want to store this as it is in Solr, I want to
 instead have two
 versions of it. One as a truncated form, and one as a
 keyword form.
 
 *Truncated Form:*

If truncated form means first N characters then copyField can be used
http://wiki.apache.org/solr/SchemaXml#Copy_Fields
 
 *Keyword Form (using stopwords to remove common words):*

Are you going to use this keyword form for searching or displaying purposes?

Re: Question about Fuzzy search in Solr

2012-09-17 Thread Rahul Warawdekar

Thanks.
Is any extra configuration from the Solr side to make this work ?
Any additional text files like synonyms.txt, any additional fields or any
changes in schema.xml or solrconfig.xml ?

On Mon, Sep 17, 2012 at 4:45 PM, Rafał Kuć r@solr.pl wrote:

 Hello!

 Is this what you are looking for

 https://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/queryparsersyntax.html#Fuzzy%20Searches
 ?

 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  Hi,

  I need to know how we can implement fuzzy searches using Solr.
  Can someone provide any links to any relevant documentation ?




-- 
Thanks and Regards
Rahul A. Warawdekar

Re: 1.3 to 3.6 migration

2012-09-17 Thread Sujatha Arun

Hi Jack,

Thanks.

Even though I have mentioned compound Index to true in the Indexconfig
section of schema for 3.6 version ,it still seems to create normal Index
files.

Attached is the solrconfig.xml

Please let me know if anything wrong

Regards
Sujatha

On Sat, Sep 15, 2012 at 9:43 PM, Jack Krupansky j...@basetechnology.comwrote:

Correcting myself, for #4, Solr doesn't analyze string fields such as
the unique key field, but... a transformer or other logic, say in DIH, that
constructs the document key values might behave differently between Solr
1.3 and 3.6. Maybe there was a bug in 1.3 that caused distinct keys to map
to the same value (causing documents to be discarded), but now in 3.6 the
mapping is correct and distinct (and more documents are correctly indexed.

-- Jack Krupansky

-Original Message- From: Jack Krupansky
Sent: Saturday, September 15, 2012 10:34 AM

To: solr-user@lucene.apache.org
Subject: Re: 1.3 to 3.6 migration

Try some queries in both the old and the new and identify some documents
that appear in one and not the other. Then examine a couple of those docs
in
detail one field at a time and see if anything is suspicious. Take each
field value and enter it into the Solr Admin Analysis page to see how Solr
3.6 analyzes the field value compared to 1.3.

Four likely scenarios:
1. The additional docs were not present when you indexed with 1.3.
2. Your indexing tool (DIH, or whatever) may have discarded the docs in
1.3
due to some issue that has now been resolved.
3. Solr 1.3 got an error those documents but your indexing process
continued
despite the error, while Solr 3.6 may not have hit those errors, possibly
because it is more flexible and has more features now.
4. Your key values analyze differently in Solr 3.6 so that the keys of the
extra documents mapped to other existing keys in Solr 1.3, causing the
extra documents to overwrite existing documents in Solr 1.3.

-- Jack Krupansky

-Original Message- From: Sujatha Arun
Sent: Saturday, September 15, 2012 2:39 AM
To: solr-user@lucene.apache.org
Subject: Re: 1.3 to 3.6 migration

Can you please elaborate?

Regards
Sujatha

On Sat, Sep 15, 2012 at 1:34 AM, Otis Gospodnetic
otis.gospodne...@gmail.com wrote:

Hi,

Maybe your indexer is different/modified/buggy?

Otis
--
Search Analytics -
http://sematext.com/search-**analytics/index.htmlhttp://sematext.com/search-analytics/index.html
Performance Monitoring -
http://sematext.com/spm/index.**htmlhttp://sematext.com/spm/index.html

On Fri, Sep 14, 2012 at 3:23 PM, Sujatha Arun suja.a...@gmail.com
wrote:
Hi,

Just migrated to 3.6.1 from 1.3 version with the following observation

Indexed content using the same source

1.3
3.6.1
Number of documents indexed 11505 13937
Index Time - Full Index 170ms
171ms
Index size 23 MB
31MB
Query Time [first time] for *:* 44 ms
187

and *:* query is not cached in 3.6.1 in query result cache ,is this
expected?

some points:

Even though I used the same data source ,the number of documents indexed
seem to be more in 3.6.1 [ not sure why?]
All the other params including index size and query time seem to be more
instnead of less in 3.6.1 and queries are not getting cached in 3.6.1

Attached the schema's - any pointers?

Regards
Sujatha

?xml version=1.0 encoding=UTF-8 ?
!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the License); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an AS IS BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
--

config

abortOnConfigurationError${solr.abortOnConfigurationError:true}/abortOnConfigurationError

luceneMatchVersionLUCENE_36/luceneMatchVersion

!-- The DirectoryFactory to use for indexes.
solr.StandardDirectoryFactory, the default, is filesystem based.
solr.RAMDirectoryFactory is memory based, not persistent, and doesn't work with replication. --
directoryFactory name=DirectoryFactory class=${solr.directoryFactory:solr.StandardDirectoryFactory}/

indexConfig
!-- Values here affect all index writers and act as a default unless overridden. --
useCompoundFiletrue/useCompoundFile

mergeFactor4/mergeFactor

Re: Question about Fuzzy search in Solr

2012-09-17 Thread Rafał Kuć

Hello!

There is no need to include any changes or additional component to
have fuzzy search working in Solr.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Thanks.
 Is any extra configuration from the Solr side to make this work ?
 Any additional text files like synonyms.txt, any additional fields or any
 changes in schema.xml or solrconfig.xml ?

 On Mon, Sep 17, 2012 at 4:45 PM, Rafał Kuć r@solr.pl wrote:

 Hello!

 Is this what you are looking for

 https://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/queryparsersyntax.html#Fuzzy%20Searches
 ?

 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  Hi,

  I need to know how we can implement fuzzy searches using Solr.
  Can someone provide any links to any relevant documentation ?

Re: Question about Fuzzy search in Solr

2012-09-17 Thread Rahul Warawdekar

Got it.
Thanks Rafał !

On Mon, Sep 17, 2012 at 6:37 PM, Rafał Kuć r@solr.pl wrote:

 Hello!

 There is no need to include any changes or additional component to
 have fuzzy search working in Solr.

 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  Thanks.
  Is any extra configuration from the Solr side to make this work ?
  Any additional text files like synonyms.txt, any additional fields or any
  changes in schema.xml or solrconfig.xml ?

  On Mon, Sep 17, 2012 at 4:45 PM, Rafał Kuć r@solr.pl wrote:

  Hello!
 
  Is this what you are looking for
 
 
 https://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/queryparsersyntax.html#Fuzzy%20Searches
  ?
 
  --
  Regards,
   Rafał Kuć
   Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
 ElasticSearch
 
   Hi,
 
   I need to know how we can implement fuzzy searches using Solr.
   Can someone provide any links to any relevant documentation ?
 
 





-- 
Thanks and Regards
Rahul A. Warawdekar

Stats field with decimal values

2012-09-17 Thread Gustav

Hello everyone,
When im using stats=truestats=product_price parameter, it returns me the
following structure:

lst name=stats
lst name=stats_fields
lst name=produto_preco
double name=min1.0/double
double name=max1.0/double
long name=count7/long
long name=missing0/long
double name=sum7.0/double
double name=sumOfSquares7.0/double
double name=mean1.0/double
double name=stddev0.0/double
/lst
/lst
/lst

What im looking for is these 2:
double name=min1.0/double
double name=max1.0/double
Is it possible to them be returned as decimal values?
Like this:
double name=min1.00/double
double name=max1.00/double

Tnks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stats-field-with-decimal-values-tp4008292.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Taking a full text, then truncate and duplicate with stopwords

Purely for searching. 

The truncated form is just to show to the user as a preview, and the keyword
form is for the keyword searching.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Taking-a-full-text-then-truncate-and-duplicate-with-stopwords-tp4008269p4008295.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Stats field with decimal values

Could you clue us in as to why this is important to you? I mean, any modern 
programming language should be capable of dealing with parsing 1.0 if it 
can deal with parsing 1.00.


-- Jack Krupansky

-Original Message- 
From: Gustav

Sent: Monday, September 17, 2012 9:19 AM
To: solr-user@lucene.apache.org
Subject: Stats field with decimal values

Hello everyone,
When im using stats=truestats=product_price parameter, it returns me the
following structure:

lst name=stats
lst name=stats_fields
lst name=produto_preco
double name=min1.0/double
double name=max1.0/double
long name=count7/long
long name=missing0/long
double name=sum7.0/double
double name=sumOfSquares7.0/double
double name=mean1.0/double
double name=stddev0.0/double
/lst
/lst
/lst

What im looking for is these 2:
double name=min1.0/double
double name=max1.0/double
Is it possible to them be returned as decimal values?
Like this:
double name=min1.00/double
double name=max1.00/double

Tnks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stats-field-with-decimal-values-tp4008292.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Indexing PDF-Files using Solr Cell


Add the fmap.content=your-stored-field to the URL.

Or if your schema doesn't already have a content field, add one that is 
stored and it will automatically be used.


-- Jack Krupansky

-Original Message- 
From: Alexander Troost

Sent: Monday, September 17, 2012 1:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing PDF-Files using Solr Cell

Thank you for your response.

I'm writing my Bachelor-Thesis about Solr and my company doesn't want me to
use a beta-version.

I dont want to be annoying, but how do i direct the content to a stored
filed and so on... in the URL i use for the HTTP-POST? In a config-file?





2012/9/17 Jack Krupansky j...@basetechnology.com


Be sure to direct the content to a stored field (such as content)
which you can add to your fl field list to return. Then use a copyField
to copy that stored field to the  text field for searching.

Again, this is all simplified in Solr 4.0-BETA.


-- Jack Krupansky

-Original Message- From: Alexander Troost
Sent: Sunday, September 16, 2012 11:59 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing PDF-Files using Solr Cell


Hi, first of all: Thank you for that quick response!

But i am not sure if i am doing this right.

For my point of view the command now has to look like:

curl 
http://localhost:8983/solr/**update/extract?literal.id=**
doc11literal.filename=markus**fmap.content=textcommit=truehttp://localhost:8983/solr/update/extract?literal.id=doc11literal.filename=markusfmap.content=textcommit=true

-F myfile=@markus.pdf

When I am seaching now for Text in the PDF, i am getting the result:

result name=response numFound=1 start=0
doc
str name=authorA28240/str
arr name=content_typestr**application/pdf/str/arr
str name=iddoc11/str
date name=last_modified2012-09-**17T03:49:39Z/date
/doc
/result

SORRY for being such a newbie and sorry for my bad english. It's 6 AM here
and i spend the whole night at the computer :-)

Greetz

A


2012/9/17 Jack Krupansky j...@basetechnology.com

 The content will be sent to the content field, which you can redirect

using the fmap.content=some-field request parameter. You need to
explicitly set the file name field yourself, using the
literal.your-file-name-field=file-name request parameter.


Also, if using Solr 4.0-BETA, you can simply use the SimplePostTool
(post.jar) to send documents to SolrCell, which will automatically take
care of these extra steps.

-- Jack Krupansky

-Original Message- From: Alexander Troost
Sent: Sunday, September 16, 2012 10:16 PM
To: solr-user@lucene.apache.org
Subject: Indexing PDF-Files using Solr Cell


Hello *,

I've got a problem indexing and searching PDF-Files.

It seems like Solr doenst index the name of the file.

In returning i only get
result name=response numFound=1 start=0docstr
name=authorA28240/strarr
name=content_typestrapplication/pdf/str/arrstr

name=iddoc5/strdate
name=last_modified2012-09-17T01:45:39Z/date/doc/result


He founds the right document, but no content or title is displayed in the
XML-Response. Where do i config that?

I index my documents (right now) via curl

e.g.:

curl 
http://localhost:8983/solr/update/extract?literal.id=**http://localhost:8983/solr/**update/extract?literal.id=**

doc7commit=truehttp://**localhost:8983/solr/update/**
extract?literal.id=doc7**commit=truehttp://localhost:8983/solr/update/extract?literal.id=doc7commit=true



-F myfile=@xyz.pdf


Where is my mistake?

Greeting

Alex

Re: Taking a full text, then truncate and duplicate with stopwords

In an attempt to answer my own question, is this a good solution.

Before I was thinking of importing my fulltext description once, then
sorting it into two seperate fields in solr, one truncated, one keyword.

How about instead actually importing my fulltext description twice. Then I
can import it first into truncated_description and then again into
keyword_description.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Taking-a-full-text-then-truncate-and-duplicate-with-stopwords-tp4008269p4008327.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Only exact match searches working

That will match internal substrings in addition to prefix strings. EdgeNGram 
does only prefix substrings, which is generally what people want. So, 
NGramFilter would match England when the query is land or gland, 
gla, etc.


Use the Solr Admin Analysis UI to enter text to see how the filter analyzes 
it to make sure it is what you expect.


-- Jack Krupansky

-Original Message- 
From: Spadez

Sent: Monday, September 17, 2012 7:16 AM
To: solr-user@lucene.apache.org
Subject: Re: Only exact match searches working

Thank you for the reply. I have done a bit of reading and it says I can also
use this one:

filter class=solr.NGramFilterFactory minGramSize=3 maxGramSize=30 /

This is what I will use I think, as it weeds out words like at I as a
bonus.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Only-exact-match-searches-working-tp4008160p4008264.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Question about Fuzzy search in Solr

That doc is out of date for 4.0. See the 4.0 Javadoc on FuzzyQuery for 
updated info. The tilda right operand is now an integer editing distance 
(number of times to insert char, delete char, change char, or transpose two 
adjacent chars to map index term to query term) that is limited to 2.


Be aware that if you use fuzzy query in 3.6/3.6.1 or earlier, it will change 
when you go to 4.0.


-- Jack Krupansky

-Original Message- 
From: Rafał Kuć

Sent: Monday, September 17, 2012 7:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Question about Fuzzy search in Solr

Hello!

Is this what you are looking for
https://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/queryparsersyntax.html#Fuzzy%20Searches
?

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch


Hi,



I need to know how we can implement fuzzy searches using Solr.
Can someone provide any links to any relevant documentation ?

Re: Question about Fuzzy search in Solr

2012-09-17 Thread Rahul Warawdekar

Thanks Jack.
We are using Solr 3.4.

On Mon, Sep 17, 2012 at 8:18 PM, Jack Krupansky j...@basetechnology.comwrote:

 That doc is out of date for 4.0. See the 4.0 Javadoc on FuzzyQuery for
 updated info. The tilda right operand is now an integer editing distance
 (number of times to insert char, delete char, change char, or transpose two
 adjacent chars to map index term to query term) that is limited to 2.

 Be aware that if you use fuzzy query in 3.6/3.6.1 or earlier, it will
 change when you go to 4.0.

 -- Jack Krupansky

 -Original Message- From: Rafał Kuć
 Sent: Monday, September 17, 2012 7:15 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Question about Fuzzy search in Solr


 Hello!

 Is this what you are looking for
 https://lucene.apache.org/**core/old_versioned_docs/**versions/3_0_0/**
 queryparsersyntax.html#Fuzzy%**20Searcheshttps://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/queryparsersyntax.html#Fuzzy%20Searches
 ?

 --
 Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  Hi,


  I need to know how we can implement fuzzy searches using Solr.
 Can someone provide any links to any relevant documentation ?





-- 
Thanks and Regards
Rahul A. Warawdekar

Re: Taking a full text, then truncate and duplicate with stopwords

--- On Mon, 9/17/12, Spadez james_will...@hotmail.com wrote:

 From: Spadez james_will...@hotmail.com
 Subject: Re: Taking a full text, then truncate and duplicate with stopwords
 To: solr-user@lucene.apache.org
 Date: Monday, September 17, 2012, 5:32 PM
 In an attempt to answer my own
 question, is this a good solution.

 Before I was thinking of importing my fulltext description
 once, then
 sorting it into two seperate fields in solr, one truncated,
 one keyword.

 How about instead actually importing my fulltext description
 twice. Then I
 can import it first into truncated_description and then
 again into
 keyword_description.

Have you used copyField? copyField source=keyword_description 
dest=truncated_description maxChars=3000/

field name=truncated_description indexed=false stored=true
field name=keyword_description indexed=true stored=false

Installing Tomcat as the user solr?

2012-09-17 Thread Ken Clarke

Can I have some clarification about installing Tomcat as the user solr?  See 
http://wiki.apache.org/solr/SolrTomcat#Installing_Tomcat_6 second paragraph, 
which states Create the solr user. As solr, extract the Tomcat 6.0 download 
into /opt/tomcat6.

Does this user need a home-dir?  (I'm guessing no).  Should it have it's own 
private group?  If so, is that group a system group with GID  500?  What about 
a login shell (again I'm guessing not necessary)

The documentation doesn't go on to say that you should switch to the solr user 
account when installing SOLR.  Sorry if that sounds like a dumb question, but 
there is no explanation about why tomcat needs to be installed as solr rather 
than tomcat or root.

Thanks.

Apache solr for Oracle DB

2012-09-17 Thread vijaym

Hi

I am to planning use APache solr for Oracle DB based (Future we may may use
some other DB) search for our project. Its going to be a customer faced
product and we are using Spring MVC frame work. Could anybody help me how
can i integrate  Apache Solr with my project or could any body suggest me a
better document?

Thanks  Regards
Vijay



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Apache-solr-for-Oracle-DB-tp4008351.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Installing Tomcat as the user solr?

2012-09-17 Thread Michael Della Bitta

I probably wouldn't suggest running Tomcat as root because of the
principle of least privilege, but aside from that, it's sort of
immaterial what you call the account, particularly if you already have
a 'tomcat' daemon account set up.

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Mon, Sep 17, 2012 at 11:13 AM, Ken Clarke
k_cla...@perlprogrammer.net wrote:
 Can I have some clarification about installing Tomcat as the user solr?  See 
 http://wiki.apache.org/solr/SolrTomcat#Installing_Tomcat_6 second paragraph, 
 which states Create the solr user. As solr, extract the Tomcat 6.0 download 
 into /opt/tomcat6.

 Does this user need a home-dir?  (I'm guessing no).  Should it have it's own 
 private group?  If so, is that group a system group with GID  500?  What 
 about a login shell (again I'm guessing not necessary)

 The documentation doesn't go on to say that you should switch to the solr 
 user account when installing SOLR.  Sorry if that sounds like a dumb 
 question, but there is no explanation about why tomcat needs to be installed 
 as solr rather than tomcat or root.

 Thanks.

Re: Taking a full text, then truncate and duplicate with stopwords

Thank you for the reply.


The trouble is, I want the truncated desciption to still have the keywords.

If I pass it to the keyword_descipriton and remove words like and i
then if etc, then copy it across to truncated_description, my truncated
description will not be a sentance, it will only be keywords.

*How I want my truncated text to be:*
Several men are in the locker room of a golf club. A cell phone on a bench
rings and a man engages the hands-free speaker function and begins to talk.
Everyone else...

*How it would be under your scenario:*
Several men locker room golf club cell phone bench rings man engages
hands-free speaker function begins talk Everyone else




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Taking-a-full-text-then-truncate-and-duplicate-with-stopwords-tp4008269p4008358.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Only exact match searches working

Ok. I can still define GramSize too?

*filter class=solr.EdgeNGramFilterFactory minGramSize=3
maxGramSize=30 /*



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Only-exact-match-searches-working-tp4008160p4008361.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Taking a full text, then truncate and duplicate with stopwords

 The trouble is, I want the truncated desciption to still
 have the keywords.


copyField copies raw text, it has noting to do with analysis.

RE: Solr 4.0 - Join performance

2012-09-17 Thread Eric Khoury

Hi David, I see that you committed the work for solr-3304 to the 4.x tree, 
which is great news, thanks.I'm not fully familiar with the process, does that 
mean its currently available in the nighty builds?Eric.
  Date: Wed, 29 Aug 2012 08:44:14 -0700
 From: dsmi...@mitre.org
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 4.0 - Join performance

 The solr.GeoHashFieldType is useless; I'd like to see it deprecated then 
 removed.  You'll need to go with unreleased code and apply patches or wait 
 till Solr 4.

 ~ David

 On Aug 29, 2012, at 10:53 AM, Eric Khoury [via Lucene] wrote:

 Awesome, thanks David.  In the meantime, could I potentially use geohash, or 
 something similar?  Geohash looks like it supports seperate lon or lat 
 range queries which would help, but its not a multivalue field, which I need.
   Date: Wed, 29 Aug 2012 07:20:42 -0700

  From: [hidden 
  email]x-msg://228/user/SendEmail.jtp?type=nodenode=4004060i=0
  To: [hidden 
  email]x-msg://228/user/SendEmail.jtp?type=nodenode=4004060i=1
  Subject: Re: Solr 4.0 - Join performance

  Solr 4 is certainly the goal.  There's a bit of a setback at the moment 
  until some of the Lucene spatial API is re-thought.  I'm working heavily on 
  such things this week.
  ~ David

  On Aug 28, 2012, at 6:22 PM, Eric Khoury [via Lucene] wrote:

  David, Solr support for this will come in Solr-3304 I 
  suppose?http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4Any idea 
  if this is going to make it into Solr 4.0? Thanks,Eric.
Date: Wed, 15 Aug 2012 07:07:21 -0700

   From: [hidden 
   email]x-msg://178/user/SendEmail.jtp?type=nodenode=4003852i=0
   To: [hidden 
   email]x-msg://178/user/SendEmail.jtp?type=nodenode=4003852i=1
   Subject: RE: Solr 4.0 - Join performance

   You would index rectangles of 0 height but that have a left edge 'x' of 
   the
   start time and a right edge 'x' of your end time.  You can index a 
   variable
   number of these per Solr document and then query by either a point or
   another rectangle to find documents which intersect your query shape.  It
   can't do a completely within based query, just intersection for now.  I
   really look forward to seeing this wrapped up in some sort of 
   RangeFieldType
   so that users don't have to think in spatial terms.

   -
Author: 
   http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
   --
   View this message in context: 
   http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4001404.html
   Sent from the Solr - User mailing list archive at 
   Nabble.comhttp://Nabble.comhttp://Nabble.comhttp://Nabble.com/.

  If you reply to this email, your message will be added to the discussion 
  below:

  NAMLx-msg://228/http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml

  -
   Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
  --
  View this message in context: 
  http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4004035.html
  Sent from the Solr - User mailing list archive at 
  Nabble.comhttp://Nabble.com.

 If you reply to this email, your message will be added to the discussion 
 below:
 http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4004060.html
 To unsubscribe from Solr 4.0 - Join performance, click 
 herehttp://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=3998827code=RFNNSUxFWUBtaXRyZS5vcmd8Mzk5ODgyN3wxMDE2NDI2OTUw.
 NAMLhttp://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml

 -
  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4004073.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Taking a full text, then truncate and duplicate with stopwords

Maybe I dont understand, but if you are copying the keyword description field
and then truncating it then the truncated form will only have keywords too.
That isnt what I want. I want the truncated form to have words like a
the it etc that would have been removed when added to
keyword_description.

copyField source=keyword_description dest=truncated_description
maxChars=3000/



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Taking-a-full-text-then-truncate-and-duplicate-with-stopwords-tp4008269p4008372.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Only exact match searches working

 Ok. I can still define GramSize too?
 
 *filter class=solr.EdgeNGramFilterFactory
 minGramSize=3
 maxGramSize=30 /*

Yes you can. 
http://lucene.apache.org/solr/api-3_6_1/org/apache/solr/analysis/EdgeNGramFilterFactory.html

Re: Taking a full text, then truncate and duplicate with stopwords

--- On Mon, 9/17/12, Spadez james_will...@hotmail.com wrote:

 From: Spadez james_will...@hotmail.com
 Subject: Re: Taking a full text, then truncate and duplicate with stopwords
 To: solr-user@lucene.apache.org
 Date: Monday, September 17, 2012, 7:10 PM
 Maybe I dont understand, but if you
 are copying the keyword description field
 and then truncating it then the truncated form will only
 have keywords too.
 That isnt what I want. I want the truncated form to have
 words like a
 the it etc that would have been removed when added to
 keyword_description.

 copyField source=keyword_description
 dest=truncated_description
 maxChars=3000/

If you add a document 
add
doc
field name=keyword_description
Several men are in the locker room of a golf club. A cell phone on a bench 
rings and a man engages the hands-free speaker function and begins to talk. 
Everyone else in the room stops to listen. The man hangs up. The other men in 
the locker room are looking at him in astonishment. Then he smiles and asks: 
Anyone know whose phone is???!!!
/field

you will see that truncated_description will have joining words (a, the, etc).

Re: Taking a full text, then truncate and duplicate with stopwords

The only catch here is that copyField might truncate in the middle of a 
word, yielding an improper term.


-- Jack Krupansky

-Original Message- 
From: Ahmet Arslan

Sent: Monday, September 17, 2012 11:54 AM
To: solr-user@lucene.apache.org
Subject: Re: Taking a full text, then truncate and duplicate with stopwords


The trouble is, I want the truncated desciption to still
have the keywords.



copyField copies raw text, it has noting to do with analysis.

Re: Taking a full text, then truncate and duplicate with stopwords

I'm really confused here. I have a document which is say 4000 words long. I
want to get this put into two fields in Solr without having to save the
original document in its entirety within Solr.

When I import my fulltext (4000 word) document to Solr I was going to put it
straight into keyword_document which uses stopwords to remove words like
and it this. Now I only have 3000 words for example.

Then if I do copy command to move it into truncate_document then even though
I can reduce it down to say 100 words, it is lacking words like and it
and this because it has been copied from the keyword_document.

I want the following scenario:

truncate_document to have 100 words including words like and it and
this
keyword_docment to have only stop words removed
And finally only have the fulltext document, full length and all stop words,
exist in my SQL database.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Taking-a-full-text-then-truncate-and-duplicate-with-stopwords-tp4008269p4008380.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Taking a full text, then truncate and duplicate with stopwords

 Then if I do copy command to move it into truncate_document
 then even though
 I can reduce it down to say 100 words, it is lacking words
 like and it
 and this because it has been copied from the
 keyword_document.

That's not true. copy operation is performed before analysis (stopword removal, 
lowercasing etc). It will copy raw text of keyword_document field. It has 
noting to do with analysis of source field.

Re: Taking a full text, then truncate and duplicate with stopwords

You said it has been copied from the keyword_document [field], but the 
reality is that Solr is not copying from the indexed value of the field, but 
from the source value for the field. The idea is that multiple fields can be 
based on the same source value even if they analyze and index the value in 
different ways.


-- Jack Krupansky

-Original Message- 
From: Spadez

Sent: Monday, September 17, 2012 12:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Taking a full text, then truncate and duplicate with stopwords

I'm really confused here. I have a document which is say 4000 words long. I
want to get this put into two fields in Solr without having to save the
original document in its entirety within Solr.

When I import my fulltext (4000 word) document to Solr I was going to put it
straight into keyword_document which uses stopwords to remove words like
and it this. Now I only have 3000 words for example.

Then if I do copy command to move it into truncate_document then even though
I can reduce it down to say 100 words, it is lacking words like and it
and this because it has been copied from the keyword_document.

I want the following scenario:

truncate_document to have 100 words including words like and it and
this
keyword_docment to have only stop words removed
And finally only have the fulltext document, full length and all stop words,
exist in my SQL database.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Taking-a-full-text-then-truncate-and-duplicate-with-stopwords-tp4008269p4008380.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr 4.0 - Join performance

2012-09-17 Thread David Smiley (@MITRE.org)

Yes absolutely.  Since 4.0 hasn't been released, anything with a fix version to 
4.0 basically implies trunk as well.  Also notice my comment Committed to 
trunk  4x which is explicit.
~ David

On Sep 17, 2012, at 12:02 PM, Eric Khoury [via Lucene] wrote:


Hi David, I see that you committed the work for solr-3304 to the 4.x tree, 
which is great news, thanks.I'm not fully familiar with the process, does that 
mean its currently available in the nighty builds?Eric.
  Date: Wed, 29 Aug 2012 08:44:14 -0700

 From: [hidden 
 email]x-msg://175/user/SendEmail.jtp?type=nodenode=4008368i=0
 To: [hidden email]x-msg://175/user/SendEmail.jtp?type=nodenode=4008368i=1
 Subject: Re: Solr 4.0 - Join performance

 The solr.GeoHashFieldType is useless; I'd like to see it deprecated then 
 removed.  You'll need to go with unreleased code and apply patches or wait 
 till Solr 4.

 ~ David

 On Aug 29, 2012, at 10:53 AM, Eric Khoury [via Lucene] wrote:


 Awesome, thanks David.  In the meantime, could I potentially use geohash, or 
 something similar?  Geohash looks like it supports seperate lon or lat 
 range queries which would help, but its not a multivalue field, which I need.
   Date: Wed, 29 Aug 2012 07:20:42 -0700

  From: [hidden 
  email]x-msg://228/user/SendEmail.jtp?type=nodenode=4004060i=0
  To: [hidden 
  email]x-msg://228/user/SendEmail.jtp?type=nodenode=4004060i=1
  Subject: Re: Solr 4.0 - Join performance
 
  Solr 4 is certainly the goal.  There's a bit of a setback at the moment 
  until some of the Lucene spatial API is re-thought.  I'm working heavily on 
  such things this week.
  ~ David
 
  On Aug 28, 2012, at 6:22 PM, Eric Khoury [via Lucene] wrote:
 
 
  David, Solr support for this will come in Solr-3304 I 
  suppose?http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4Any idea 
  if this is going to make it into Solr 4.0? Thanks,Eric.
Date: Wed, 15 Aug 2012 07:07:21 -0700
 
   From: [hidden 
   email]x-msg://178/user/SendEmail.jtp?type=nodenode=4003852i=0
   To: [hidden 
   email]x-msg://178/user/SendEmail.jtp?type=nodenode=4003852i=1
   Subject: RE: Solr 4.0 - Join performance
  
   You would index rectangles of 0 height but that have a left edge 'x' of 
   the
   start time and a right edge 'x' of your end time.  You can index a 
   variable
   number of these per Solr document and then query by either a point or
   another rectangle to find documents which intersect your query shape.  It
   can't do a completely within based query, just intersection for now.  I
   really look forward to seeing this wrapped up in some sort of 
   RangeFieldType
   so that users don't have to think in spatial terms.
  
  
  
   -
Author: 
   http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
   --
   View this message in context: 
   http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4001404.html
   Sent from the Solr - User mailing list archive at 
   Nabble.comhttp://Nabble.comhttp://Nabble.comhttp://Nabble.com/http://Nabble.comhttp://Nabble.com/http://Nabble.com/.
 
 
  
  If you reply to this email, your message will be added to the discussion 
  below:
 
  NAMLx-msg://228/http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml
 
 
 
 
 
  -
   Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
  --
  View this message in context: 
  http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4004035.html
  Sent from the Solr - User mailing list archive at 
  Nabble.comhttp://Nabble.comhttp://Nabble.comhttp://Nabble.com/.


 
 If you reply to this email, your message will be added to the discussion 
 below:

 NAMLx-msg://175/http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml





 -
  Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4004073.html
 Sent from the Solr - User mailing list archive at 
 Nabble.comhttp://Nabble.com.



If you reply to this email, your message will be added to the discussion below:
http://lucene.472066.n3.nabble.com/Solr-4-0-Join-performance-tp3998827p4008368.html
To unsubscribe from Solr 4.0 - Join performance, click

Re: Stats field with decimal values

2012-09-17 Thread Gustav

Well, my client is asking if is it possible, im just providing the search
enginne to him, not working directly with the application. Dont know exactly
in what language he is programming.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stats-field-with-decimal-values-tp4008292p4008395.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Clustering

2012-09-17 Thread Denis Kuzmenok




Sorry for late response. To be strict, here is what i want:

* I get documents all the time. Let's assume those are news (It's
rather similar thing).

* Every time i get new batch of news i should add them to Solr index
and get cluster information for that document. Store this information
in the DB (so i should know each document's cluster).

* I can't wait for cluster definition service/program to launch from
time to time, but it should define clusters on the fly.

* I want to be able to get clusters only for some period of time (For
example i want to search for clusters only for documents that were
loader one month ago).

* I will have tens of thousands of new documents every day and overall
base of several millions.

I'm reading Mahout in action now. But maybe you can point me to what i
need.
--- Исходное сообщение ---
От кого: Chandan Tamrakar chandan.tamra...@nepasoft.com
Кому: solr-user@lucene.apache.org
Дата: 4 сентября 2012, 12:30:56
Тема: Re: Solr Clustering





yes there is a solr component if you want to cluster solr documents , check
the following linkhttp://wiki.apache.org/solr/ClusteringComponent
Carrot2 might be good if you want to cluster few thousands of documents ,
for example when user search solr , just cluster the  search results

Mahout is much more scalable and probably you need Hadoop for that


thanks
chandan

On Tue, Sep 4, 2012 at 2:10 PM, Denis Kuzmenok forward...@ukr.net wrote:



  Original Message 
 Subject: Solr Clustering
 From: Denis Kuzmenok forward...@ukr.net
 To: solr-user@lucene.apache.org CC:

 Hi, all.
 I know there is carrot2 and mahout for clustering. I want to implement
 such thing:
 I fetch documents and want to group them into clusters when they are added
 to index (i want to filter similar documents for example for 1 week). i
 need these documents quickly, so i cant rely on some postponed
 calculations. Each document should have assigned cluster id (like group
 similar documents into clusters and assign each document its cluster id.
 It's something similar to news aggregators like google news. I dont need
 to search for clusters with documents older than 1 week (for example). Each
 document will have its unique id and saved into DB. But solr will have
 cluster id field also.
 Is it possible to implement this with solr/carrot/mahout?




-- 
Chandan Tamrakar
*
*

Re: Selective field level security

2012-09-17 Thread Peter Sturge

Hi,

Solr doesn't have any built-in mechanism for document/field level security
- basically it's delegated to the container to provide security, but this
of course won't apply to specific documents and/or fields.
There are are a lot of ways to skin this cat, some bits of which have been
covered by your message.

What can be the trickiest thing about this isn't so much adding indexed
fields etc., but rather how you plan to determine who the 'searching user'
actually is.
This task can seem not too bad at first, then all sorts of worms start
streaming out of the can (e.g. how to avoid spoofing/identity theft).
Once you're app is confident it has a bona-fide user, you then need a way
to map the user to a set of fields/docs/permissions etc. that he/she
can/can't look at.

There are plenty of approaches - mainly driven by:
 * where your original data lives (outside of Solr? does it still exist?
etc)
 * is there an external ACL mechanism that you can use (e.g. file system
permissions)
 * how do you manage users? (e.g. internal emplyoyees? public website
account holders? anyone?)

Two Jiras of note might help you in your quest:
SOLR-1872   (a good approach if you don't have access to the original
data at search-time)
SOLR-1895   (uses ManifoldCF - good if you have access to original data
and use its permissions - e.g. file system ACL)

HTH,
Peter





On Mon, Sep 17, 2012 at 7:44 PM, Nalini Kartha nalinikar...@gmail.comwrote:

 Hi,

 We're trying to push some security related info into the index which will
 control which users can search certain fields and we're wondering what the
 best way to accomplish this is.

 Some records that are being indexed and searched can have certain fields
 marked as private. When a field is marked as private, some querying users
 should not see/search on it whereas some super users can.

 Here's the solutions we're considering -

- Index a separate boolean value into a new _INTERNAL field to indicate
if the corresponding field value is marked private or not and include a
filter in the query when the searching user is not a super user.

 So for eg., consider that a record can contain 3 fields - field[123] where
 field1 and field2 can be marked as private but field3 cannot.

 Record A has only field1 marked as private, record B has both field1 and
 field2 marked as private.

 When we index these records here's what we'd end up with in the index -

 Record A -
 field1:something,  field1_INTERNAL:1, field2:something,
 field2_INTERNAL:0, field3:something
 Record B -
 field1:something,  field1_INTERNAL:1, field2:something,
 field2_INTERNAL:1, field3:something

 If the searching user is NOT a super user then the query (let's say it's
 'hidden security') needs to look like this-

 ((field3:hidden) OR (field1:hidden AND field1_INTERNAL:0) OR (field2:hidden
 AND field2_INTERNAL:0)) AND ((field3:security) OR (field1:security AND
 field1_INTERNAL:0) OR (field2:security AND field2_INTERNAL:0))

 Manipulating the query this way seems painful and error prone so we're
 wondering if Solr provides anything out of the box that would help with
 this?


- Index the private values themselves into a separate _INTERNAL field
and then determine which fields to query depending on the visibility of
 the
searching user.

 So using the example from above, here's what the indexed records would look
 like -

 Record A - field1_INTERNAL:something, field2:something,
  field3:something
 Record B - field1_INTERNAL:something, field2_INTERNAL:something,
 field3:something

 If the searching user is NOT a super user then the query just needs to be
 against the regular fields whereas if the searching user IS a super user,
 the query needs to be against BOTH the regular and INTERNAL fields.

 The issue with this solution is that since the number of docs that include
 the INTERNAL fields is going to be much fewer we're wondering if relevancy
 would be messed up when we're querying both regular and internal fields for
 super users?

 Thoughts?

 Thanks,
 Nalini

RE: Selective field level security

2012-09-17 Thread Swati Swoboda

Hi Nalini,

We had similar requirements and this is how we did it (using your example):

Record A:
Field1_All: something
Field1_Private: something
Field2_All: ''
Field2_Private: something private
Field3_All: ''
Field3_Private: something very private

Fields_All: something
Fields_Private: something something private something very private

Basically, we're just using a lot of copy fields and dynamic fields. Instead of 
storing a type, we just change the column name. So if someone who had access to 
private fields, we would perform our search in the private column fields:

(fields_private:something)

Or if you want a specific field:

(field1_private:something) OR (field2_private:something) or 
(field3_private:something)

Likewise, if someone didn't have access to the private fields, we would only 
search in the all fields. We also created a super field so that we don't 
have to search each individual field -- we use copyfields to copy all private 
fields into the super field and just search that.

I hope this helps.

Swati

-Original Message-
From: Nalini Kartha [mailto:nalinikar...@gmail.com] 
Sent: Monday, September 17, 2012 2:45 PM
To: solr-user@lucene.apache.org
Subject: Selective field level security

Hi,

We're trying to push some security related info into the index which will 
control which users can search certain fields and we're wondering what the best 
way to accomplish this is.

Some records that are being indexed and searched can have certain fields marked 
as private. When a field is marked as private, some querying users should not 
see/search on it whereas some super users can.

Here's the solutions we're considering -

   - Index a separate boolean value into a new _INTERNAL field to indicate
   if the corresponding field value is marked private or not and include a
   filter in the query when the searching user is not a super user.

So for eg., consider that a record can contain 3 fields - field[123] where
field1 and field2 can be marked as private but field3 cannot.

Record A has only field1 marked as private, record B has both field1 and
field2 marked as private.

When we index these records here's what we'd end up with in the index -

Record A -
field1:something,  field1_INTERNAL:1, field2:something, field2_INTERNAL:0, 
field3:something Record B - field1:something,  field1_INTERNAL:1, 
field2:something, field2_INTERNAL:1, field3:something

If the searching user is NOT a super user then the query (let's say it's 
'hidden security') needs to look like this-

((field3:hidden) OR (field1:hidden AND field1_INTERNAL:0) OR (field2:hidden AND 
field2_INTERNAL:0)) AND ((field3:security) OR (field1:security AND
field1_INTERNAL:0) OR (field2:security AND field2_INTERNAL:0))

Manipulating the query this way seems painful and error prone so we're 
wondering if Solr provides anything out of the box that would help with this?


   - Index the private values themselves into a separate _INTERNAL field
   and then determine which fields to query depending on the visibility of the
   searching user.

So using the example from above, here's what the indexed records would look 
like -

Record A - field1_INTERNAL:something, field2:something,  field3:something 
Record B - field1_INTERNAL:something, field2_INTERNAL:something, 
field3:something

If the searching user is NOT a super user then the query just needs to be 
against the regular fields whereas if the searching user IS a super user, the 
query needs to be against BOTH the regular and INTERNAL fields.

The issue with this solution is that since the number of docs that include the 
INTERNAL fields is going to be much fewer we're wondering if relevancy would be 
messed up when we're querying both regular and internal fields for super users?

Thoughts?

Thanks,
Nalini

FilterCache Memory consumption high

2012-09-17 Thread Mike Schultz

I've looked through documentation and postings and expect that a single
filter cache entry should be approx MaxDoc/8 bytes.

Our frequently updated index (replication every 3 minutes) has maxdoc ~= 23
Million.

So I'm figuring 3MB per entry.  With CacheSize=512 I expect something like
1.5GB of RAM, but with the server in steady state after 1/2 hour, it is 7GB
larger than without the cache.

I can understand maybe a 2x difference, given the warming searcher but 4x I
don't understand.

I do have maxWarmingSearchers = 2, but have never seen 2 searchers
sumiltaneously being warmed.

Ideas anybody?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/FilterCache-Memory-consumption-high-tp4008444.html
Sent from the Solr - User mailing list archive at Nabble.com.

Help with slow Solr Cloud query

2012-09-17 Thread jimtronic

Hi,

I've got a set up as follows:

- 13 cores
- 2 servers 
- running Solr 4.0 Beta with numShards=1 and an embedded zookeeper.

I'm trying to figure out why some complex queries are running so slowly in
this setup versus quickly in a standalone mode.

Given a query like: /select?q=(some complex query)

It runs fast and gets faster (caches) when only running one server:

1. ?fl=*q=(complex query)wt=jsonrows=24 (QTime 3)

When, I issue the same query to the cluster and watch the logs, it looks
like it's actually performing the query 3 times like so:

1. ?q=(complex
query)distrib=falsewt=javabinrows=24version=2NOW=1347911018556shard.url=(server1)|(server2)fl=id,scoredf=textstart=0isShard=truefsv=true
(QTime 2)

2. ?ids=(ids from query
1)distrib=falsewt=javabinrows=24version=2df=textfl=*shard.url=(server1)|(server2)NOW=1347911018556start=0q=(complex
query)isShard=true (QTime 4)

3.  ?fl=*q=(complex query)wt=jsonrows=24 (QTime 459)

Why is it performing #3? It already has everything it needs in #2 and #3
seems to be really slow even when warmed and cached.

As stated above, this query is fast when running on a single server that is
warmed and cached.

Since my query is complex, I could understand some slowness if I was
attempting this across multiple shards, but since there's only one shard,
shouldn't it just pick one server and query it?

Thanks!
Jim





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-with-slow-Solr-Cloud-query-tp4008448.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Stats field with decimal values

2012-09-17 Thread Swati Swoboda

You can use an XSL response writer to transform your values to have a different 
precision.

http://wiki.apache.org/solr/XsltResponseWriter

Would most likely be better for your client to just do it on his end though. He 
is probably parsing the response anyway.

-Original Message-
From: Gustav [mailto:xbihy...@sharklasers.com] 
Sent: Monday, September 17, 2012 1:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Stats field with decimal values

Well, my client is asking if is it possible, im just providing the search 
enginne to him, not working directly with the application. Dont know exactly in 
what language he is programming.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Stats-field-with-decimal-values-tp4008292p4008395.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: FilterCache Memory consumption high

2012-09-17 Thread Yonik Seeley

On Mon, Sep 17, 2012 at 3:44 PM, Mike Schultz mike.schu...@gmail.com wrote:
 So I'm figuring 3MB per entry.  With CacheSize=512 I expect something like
 1.5GB of RAM, but with the server in steady state after 1/2 hour, it is 7GB
 larger than without the cache.

Heap size and memory use aren't quite the same thing.
Try running jconsole (it comes with every JDK), attaching to the
process, and then make it run multiple garbage collections to see what
the heap shrinks down to.

-Yonik
http://lucidworks.com

Re: Taking a full text, then truncate and duplicate with stopwords

Ah, ok this is news to me and makes a lot more sense. If I can just run this
back past you to make sure I understand. If I move my full_text to 

If I move my fulltext document from my SQL database to keyword_document it
will contain the original fulltext in the source, but the index will have
the stopword filter, lowercase filter etc applied. Then by copying this to
truncated_document the original source is being moved?

*This is my definition for keyword_description, using the stopwords.txt*
fieldType name=keyword_description class=solr.TextField
positionIncrementGap=100
analyzer type=index
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.StopFilterFactory ignoreCase=true
words=stopwords.txt enablePositionIncrements=true /
filter class=solr.LowerCaseFilterFactory/
filter class=solr.EdgeNGramFilterFactory minGramSize=3 maxGramSize=30
/
/analyzer
analyzer type=query
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.StopFilterFactory ignoreCase=true
words=stopwords.txt enablePositionIncrements=true /
filter class=solr.LowerCaseFilterFactory/
/analyzer
/fieldType

*Then this to do the copying across. Is there somewhere specific to put this
within the schema.xml?*
copyField source=keyword_description dest=truncated_description
maxChars=3000/

*Then do I need to have definitions for the truncated description in the
same way that I did for keyword_description?*
fieldType name=truncated_description class=solr.TextField
positionIncrementGap=100
analyzer type=index
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.LowerCaseFilterFactory/
filter class=solr.EdgeNGramFilterFactory minGramSize=3 maxGramSize=30
/
/analyzer
analyzer type=query
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.LowerCaseFilterFactory/
/analyzer
/fieldType



Jack Krupansky-2 wrote
 
 You said it has been copied from the keyword_document [field], but the 
 reality is that Solr is not copying from the indexed value of the field,
 but 
 from the source value for the field. The idea is that multiple fields can
 be 
 based on the same source value even if they analyze and index the value in 
 different ways.
 
 -- Jack Krupansky
 
 -Original Message- 
 From: Spadez
 Sent: Monday, September 17, 2012 12:29 PM
 To: solr-user@.apache
 Subject: Re: Taking a full text, then truncate and duplicate with
 stopwords
 
 I'm really confused here. I have a document which is say 4000 words long.
 I
 want to get this put into two fields in Solr without having to save the
 original document in its entirety within Solr.
 
 When I import my fulltext (4000 word) document to Solr I was going to put
 it
 straight into keyword_document which uses stopwords to remove words like
 and it this. Now I only have 3000 words for example.
 
 Then if I do copy command to move it into truncate_document then even
 though
 I can reduce it down to say 100 words, it is lacking words like and it
 and this because it has been copied from the keyword_document.
 
 I want the following scenario:
 
 truncate_document to have 100 words including words like and it and
 this
 keyword_docment to have only stop words removed
 And finally only have the fulltext document, full length and all stop
 words,
 exist in my SQL database.
 
 
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Taking-a-full-text-then-truncate-and-duplicate-with-stopwords-tp4008269p4008380.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 

Jack Krupansky-2 wrote
 
 You said it has been copied from the keyword_document [field], but the 
 reality is that Solr is not copying from the indexed value of the field,
 but 
 from the source value for the field. The idea is that multiple fields can
 be 
 based on the same source value even if they analyze and index the value in 
 different ways.
 
 -- Jack Krupansky
 
 -Original Message- 
 From: Spadez
 Sent: Monday, September 17, 2012 12:29 PM
 To: solr-user@.apache
 Subject: Re: Taking a full text, then truncate and duplicate with
 stopwords
 
 I'm really confused here. I have a document which is say 4000 words long.
 I
 want to get this put into two fields in Solr without having to save the
 original document in its entirety within Solr.
 
 When I import my fulltext (4000 word) document to Solr I was going to put
 it
 straight into keyword_document which uses stopwords to remove words like
 and it this. Now I only have 3000 words for example.
 
 Then if I do copy command to move it into truncate_document then even
 though
 I can reduce it down to say 100 words, it is lacking words like and it
 and this because it has been copied from the keyword_document.
 
 I want the following scenario:
 
 truncate_document to have 100 words including words like and it and
 this
 keyword_docment to have only stop words removed
 And finally only have the fulltext document, full length and all stop
 words,
 exist in my SQL database.
 
 
 
 
 
 --
 View this message in

Re: Taking a full text, then truncate and duplicate with stopwords

You're getting the hang of it. No particular location for CopyField, just 
not within fields or types. Putting them after your fields makes sense. 
See the Solr example schema.


-- Jack Krupansky

-Original Message- 
From: Spadez

Sent: Monday, September 17, 2012 4:47 PM
To: solr-user@lucene.apache.org
Subject: Re: Taking a full text, then truncate and duplicate with stopwords

Ah, ok this is news to me and makes a lot more sense. If I can just run this
back past you to make sure I understand. If I move my full_text to

If I move my fulltext document from my SQL database to keyword_document it
will contain the original fulltext in the source, but the index will have
the stopword filter, lowercase filter etc applied. Then by copying this to
truncated_document the original source is being moved?

*This is my definition for keyword_description, using the stopwords.txt*
fieldType name=keyword_description class=solr.TextField
positionIncrementGap=100
analyzer type=index
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.StopFilterFactory ignoreCase=true
words=stopwords.txt enablePositionIncrements=true /
filter class=solr.LowerCaseFilterFactory/
filter class=solr.EdgeNGramFilterFactory minGramSize=3 maxGramSize=30
/
/analyzer
analyzer type=query
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.StopFilterFactory ignoreCase=true
words=stopwords.txt enablePositionIncrements=true /
filter class=solr.LowerCaseFilterFactory/
/analyzer
/fieldType

*Then this to do the copying across. Is there somewhere specific to put this
within the schema.xml?*
copyField source=keyword_description dest=truncated_description
maxChars=3000/

*Then do I need to have definitions for the truncated description in the
same way that I did for keyword_description?*
fieldType name=truncated_description class=solr.TextField
positionIncrementGap=100
analyzer type=index
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.LowerCaseFilterFactory/
filter class=solr.EdgeNGramFilterFactory minGramSize=3 maxGramSize=30
/
/analyzer
analyzer type=query
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.LowerCaseFilterFactory/
/analyzer
/fieldType



Jack Krupansky-2 wrote


You said it has been copied from the keyword_document [field], but the
reality is that Solr is not copying from the indexed value of the field,
but
from the source value for the field. The idea is that multiple fields can
be
based on the same source value even if they analyze and index the value in
different ways.

-- Jack Krupansky

-Original Message- 
From: Spadez

Sent: Monday, September 17, 2012 12:29 PM
To: solr-user@.apache
Subject: Re: Taking a full text, then truncate and duplicate with
stopwords

I'm really confused here. I have a document which is say 4000 words long.
I
want to get this put into two fields in Solr without having to save the
original document in its entirety within Solr.

When I import my fulltext (4000 word) document to Solr I was going to put
it
straight into keyword_document which uses stopwords to remove words like
and it this. Now I only have 3000 words for example.

Then if I do copy command to move it into truncate_document then even
though
I can reduce it down to say 100 words, it is lacking words like and it
and this because it has been copied from the keyword_document.

I want the following scenario:

truncate_document to have 100 words including words like and it and
this
keyword_docment to have only stop words removed
And finally only have the fulltext document, full length and all stop
words,
exist in my SQL database.





--
View this message in context:
http://lucene.472066.n3.nabble.com/Taking-a-full-text-then-truncate-and-duplicate-with-stopwords-tp4008269p4008380.html
Sent from the Solr - User mailing list archive at Nabble.com.



Jack Krupansky-2 wrote


You said it has been copied from the keyword_document [field], but the
reality is that Solr is not copying from the indexed value of the field,
but
from the source value for the field. The idea is that multiple fields can
be
based on the same source value even if they analyze and index the value in
different ways.

-- Jack Krupansky

-Original Message- 
From: Spadez

Sent: Monday, September 17, 2012 12:29 PM
To: solr-user@.apache
Subject: Re: Taking a full text, then truncate and duplicate with
stopwords

I'm really confused here. I have a document which is say 4000 words long.
I
want to get this put into two fields in Solr without having to save the
original document in its entirety within Solr.

When I import my fulltext (4000 word) document to Solr I was going to put
it
straight into keyword_document which uses stopwords to remove words like
and it this. Now I only have 3000 words for example.

Then if I do copy command to move it into truncate_document then even
though
I can reduce it down to say 100 words, it is lacking words like and it
and this because it has been copied from the keyword_document.

I

Re: Installing Tomcat as the user solr?

2012-09-17 Thread Ken Clarke


Ok, I'll try running as tomcat.

The wiki has a problem with the tomcat startup script.  It looks like it's 
supposed to be a link which allows us to download a shell script, but when I 
click it, I get the error message You are not allowed to do AttachFile on 
this page. Login and try again..  The link I'm talking about is 1 line 
above 
http://wiki.apache.org/solr/SolrTomcat?action=AttachFiledo=viewtarget=tomcat6#Building_Solr



- Original Message - 
From: Michael Della Bitta michael.della.bi...@appinions.com

To: solr-user@lucene.apache.org
Sent: Monday, September 17, 2012 12:32 PM
Subject: Re: Installing Tomcat as the user solr?


I probably wouldn't suggest running Tomcat as root because of the
principle of least privilege, but aside from that, it's sort of
immaterial what you call the account, particularly if you already have
a 'tomcat' daemon account set up.

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Mon, Sep 17, 2012 at 11:13 AM, Ken Clarke
k_cla...@perlprogrammer.net wrote:
Can I have some clarification about installing Tomcat as the user solr? 
See http://wiki.apache.org/solr/SolrTomcat#Installing_Tomcat_6 second 
paragraph, which states Create the solr user. As solr, extract the Tomcat 
6.0 download into /opt/tomcat6.


Does this user need a home-dir?  (I'm guessing no).  Should it have it's 
own private group?  If so, is that group a system group with GID  500? 
What about a login shell (again I'm guessing not necessary)


The documentation doesn't go on to say that you should switch to the solr 
user account when installing SOLR.  Sorry if that sounds like a dumb 
question, but there is no explanation about why tomcat needs to be 
installed as solr rather than tomcat or root.


Thanks.

broken links in solr wiki

2012-09-17 Thread Petersen, Robert

Hi group,

On this wiki page these two links below are broken as they are also on 
lucidworks' version, can someone point me at the correct locations please?  I 
googled around and came up with possible good links.

Thanks
Robi

http://wiki.apache.org/solr/LanguageAnalysis#Other_Tips
http://lucidworks.lucidimagination.com/display/solr/Language+Analysis

solr.KeywordMarkerFilterFactory

A sample Solr protwords.txt with 
commentshttp://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/conf/protwords.txt
 can be found in the Source Repository.

Is this it?  
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/protwords.txt



solr.StemmerOverrideFilterFactory

A sample stemdict.txt with 
commentshttp://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/conf/stemdict.txt
 can be found in the Source Repository.

Is this it?  
https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/test-files/solr/conf/stemdict.txt?p=1227271
  (needs the ?p= parameter???)

Re: broken links in solr wiki