Re: Is SOLR best suited to this application - Finding co-ordinates

2012-08-01 Thread Spadez
Normalising the data is a good idea, and it would be easy to do since I would
only have around 50,000 entries, BUT it is a bit complicated with addresses, I
think. Let's say I store the data in this form:

Town, City, Country

London, England
Swindon, Wiltshire, England
Wiltshire, England
England

What happens if someone searches just "London", or just "Swindon"? I assume
it wouldn't return any results, because they would have to type "London,
England", for example. If I include an entry for both "London" and "London,
England" then the autocomplete will show both, which would confuse the user.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-SOLR-best-suited-to-this-application-Finding-co-ordinates-tp3998308p3998547.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr upgrade from 1.4 to 3.6

2012-08-01 Thread Chantal Ackermann
Hi Kalyan,

that is because SolrJ uses javabin as its format, which carries class version numbers
in the serialized objects that do not match. Set the format to XML (the wt
parameter) and it will work (maybe JSON would, as well).
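
As a sketch of the SolrJ side (class names as of SolrJ 3.6, if I recall correctly;
on older SolrJ the client class was CommonsHttpSolrServer; the URL is illustrative),
switching the client from javabin to the XML response format looks roughly like this:

    // Sketch: make SolrJ exchange XML (wt=xml) instead of javabin.
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.impl.XMLResponseParser;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class XmlClientSketch {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            server.setParser(new XMLResponseParser()); // avoids the javabin version mismatch
            QueryResponse rsp = server.query(new SolrQuery("*:*"));
            System.out.println("numFound: " + rsp.getResults().getNumFound());
        }
    }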

Chantal
 

Am 31.07.2012 um 20:50 schrieb Manepalli, Kalyan:

 Hi all,
We are trying to upgrade our Solr instance from 1.4 to 3.6. We 
 use the SolrJ API to fetch the data from the index. We see that SolrJ version 3.6 is 
 not compatible with the index generated with 1.4.
 Is this a known issue, and is there a workaround for it?
 
 Thanks,
 Kalyan Manepalli
 



auto completion search with solr using NGrams in SOLR

2012-08-01 Thread aniljayanti
I want to implement an auto-completion search with Solr using NGrams. If the
user is searching for names of employees, then auto-completion should be
applied, i.e.:

if the user types "j", then show the names starting with "j"; if the user types
"ja", then show the names starting with "ja"; if "jac", then show the names
starting with "jac"; if "jack", then show the names starting with "jack".

Below are my configuration settings in schema.xml. Please let me know if
anything is wrong.

Below is my code in schema.xml:

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="empname" type="edgytext" indexed="true" stored="true"/>
<field name="autocomplete_text" type="edgytext" indexed="true" stored="true"
       omitNorms="true" omitTermFreqAndPositions="true"/>
<copyField source="empname" dest="text"/>

When I'm searching with the name "mado" or "madonna" I am getting employee
names, but when searching with "madon" I am not getting any data.

Please help me with this.


Thanks in Advance,

Anil.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559.html
Sent from the Solr - User mailing list archive at Nabble.com.


Urgent: Facetable but not Searchable Field

2012-08-01 Thread jayakeerthi s
All,

We have a requirement where we need to implement 2 fields as facetable,
but the values of the fields should not be searchable.

Please let me know whether this feature is supported in Solr. If yes, what
configuration needs to be done in schema.xml and solrconfig.xml to achieve
this?

This is kind of urgent, as we need to reply about this functionality.


Thanks in advance,

Jay


Re: Urgent: Facetable but not Searchable Field

2012-08-01 Thread Michael Kuhlmann

On 01.08.2012 13:58, jayakeerthi s wrote:

We have a requirement, where we need to implement 2 fields as Facetable,
but the values of the fields should not be Searchable.


Simply don't search on it; then it's not searchable.

Or do I simply not understand your question? As long as Dismax doesn't 
have the attribute in its qf parameter, it doesn't get searched.


Or, if the user has direct access to Solr, then she can search on the 
attribute. And she can delete the index, or crash the server, if she likes.


So the short answer is: no. Facetable fields must be searchable. But 
usually this is not a problem.


-Kuli


Re: Urgent: Facetable but not Searchable Field

2012-08-01 Thread Yonik Seeley
On Wed, Aug 1, 2012 at 7:58 AM, jayakeerthi s mail2keer...@gmail.com wrote:
 We have a requirement, where we need to implement 2 fields as Facetable,
 but the values of the fields should not be Searchable.

The "user fields" (uf) feature of the edismax parser may work for you:

http://wiki.apache.org/solr/ExtendedDisMax#uf_.28User_Fields.29
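
As a rough illustration (the field names are hypothetical and spaces are shown
unencoded for readability), a request that searches only the intended fields
and blocks explicit queries against the facet-only fields might look like:

    http://localhost:8983/solr/select?defType=edismax&q=shoes
        &qf=title description
        &uf=* -brand_facet -size_facet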

-Yonik
http://lucidimagination.com


AW: auto completion search with solr using NGrams in SOLR

2012-08-01 Thread Markus Klose
Your configuration of the fieldtype looks quite OK.

Which field are you searching in: text, empname, or autocomplete_text?
If you are searching in autocomplete_text, how do you add content to it? Is 
there another copyField statement? If you are searching in text, what 
fieldtype does that field have?

You can use analysis.jsp (linked from the admin console) to check what 
happens to your content at index time and at search time, and whether there is a match.
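
On a default install, that page is usually reachable at something like:

    http://localhost:8983/solr/admin/analysis.jsp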

Best regards from Augsburg,

Markus Klose
SHI Elektronische Medien GmbH 
 

-Original Message- 
From: aniljayanti [mailto:anil.jaya...@gmail.com] 
Sent: Wednesday, August 1, 2012 12:05 PM
To: solr-user@lucene.apache.org
Subject: auto completion search with solr using NGrams in SOLR

I want to implement an auto-completion search with Solr using NGrams. If the 
user is searching for names of employees, then auto-completion should be 
applied, i.e.: 

if the user types "j", then show the names starting with "j"; if the user types 
"ja", then show the names starting with "ja"; if "jac", then show the names 
starting with "jac"; if "jack", then show the names starting with "jack".

Below are my configuration settings in schema.xml. Please let me know if anything 
is wrong.

Below is my code in schema.xml:

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="empname" type="edgytext" indexed="true" stored="true"/>
<field name="autocomplete_text" type="edgytext" indexed="true" stored="true"
       omitNorms="true" omitTermFreqAndPositions="true"/>
<copyField source="empname" dest="text"/>

When I'm searching with the name "mado" or "madonna" I am getting employee 
names, but when searching with "madon" I am not getting any data.

Please help me with this.


Thanks in Advance,

Anil.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559.html
Sent from the Solr - User mailing list archive at Nabble.com.


termFrequency off and still use fastvector highlighter?

2012-08-01 Thread abhayd
Hi,
We would like to turn off term frequency (TF) for a field, but we still want to
use the FastVectorHighlighter.

How would we do that?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/termFrequncy-off-and-still-use-fastvector-highlighter-tp3998590.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Urgent: Facetable but not Searchable Field

2012-08-01 Thread Jack Krupansky
The indexed and stored field attributes are independent, so you can 
define a facet field as stored but not indexed (stored="true" 
indexed="false"), so that the field can be faceted but not indexed.


In addition, you can also use a copyField to copy the original values for an 
indexed field (before the values get analyzed and transformed to be placed 
in the index as terms) to a stored field to facet them (or vice versa).


-- Jack Krupansky

-Original Message- 
From: jayakeerthi s

Sent: Wednesday, August 01, 2012 6:58 AM
To: solr-user@lucene.apache.org ; solr-user-h...@lucene.apache.org ; 
solr-dev-h...@lucene.apache.org

Subject: Urgent: Facetable but not Searchable Field

All,

We have a requirement, where we need to implement 2 fields as Facetable,
but the values of the fields should not be Searchable.

Please let me know is this feature Supported in Solr If yes what would be
the Configuration to be done in Schema.xml and Solrconfig.xml to achieve
the same.

This is kind of urgent as we need to reply on the functionality.


Thanks in advance,

Jay 



Re: Urgent: Facetable but not Searchable Field

2012-08-01 Thread Michael Kuhlmann

On 01.08.2012 15:40, Jack Krupansky wrote:

The indexed and stored field attributes are independent, so you can
define a facet field as stored but not indexed (stored=true
indexed=false), so that the field can be faceted but not indexed.


?

A field must be indexed to be used for faceting.
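
As a sketch of what a facet-only field could look like in schema.xml (the field
names and the source field are hypothetical; "string" is the stock StrField type):
the field is indexed so it can be faceted, and it is simply never listed in the
(e)dismax qf/pf parameters, so ordinary queries do not hit it.

    <!-- hypothetical facet-only field: indexed for faceting, never listed in qf -->
    <field name="brand_facet" type="string" indexed="true" stored="false"/>
    <copyField source="brand" dest="brand_facet"/>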

-Kuli


Re: Urgent: Facetable but not Searchable Field

2012-08-01 Thread Jack Krupansky
Oops. Obviously facet fields must be indexed. Not sure what I was thinking 
at the moment.


-- Jack Krupansky

-Original Message- 
From: Michael Kuhlmann

Sent: Wednesday, August 01, 2012 8:54 AM
To: solr-user@lucene.apache.org
Subject: Re: Urgent: Facetable but not Searchable Field

On 01.08.2012 15:40, Jack Krupansky wrote:

The indexed and stored field attributes are independent, so you can
define a facet field as stored but not indexed (stored=true
indexed=false), so that the field can be faceted but not indexed.


?

A field must be indexed to be used for faceting.

-Kuli 



Cloud and cores

2012-08-01 Thread Pierre GOSSÉ
Hi all,

I'm playing around with SolrCloud and followed the indications I found at 
http://wiki.apache.org/solr/SolrCloud/

-  Started Instance 1 with embedded zk

-  Started Instances 2, 3 and 4 using Instance 1 as the zk server.

Everything works fine.

Then, using CoreAdmin, I add a second core in collection1 for Instances 1 and 3 
... everything is OK in the admin GUI, meaning that the graph shows 2 shards of 
3 server addresses each, with the instances that have 2 cores appearing twice on the graph:

collection1
  shard1: wks-pge:7574, wks-pge:8900, wks-pge:8983
  shard2: wks-pge:8983, wks-pge:7500, wks-pge:8900

On Instances 1 and 3 I have 2 cores, both shown at the bottom of the left column 
and in the CoreAdmin screen.

I restart everything, and find the servers in what seems to be an inconsistent 
state: i.e. the graph still shows 2 shards of 3 server addresses, but CoreAdmin 
no longer shows my additional cores.

Is there a problem in SolrCloud or CoreAdmin, or did I just do something stupid 
here? :)

Pierre



Map Complex Datastructure with Solr

2012-08-01 Thread Thomas Gravel
Hi,
how can I map this complex data structure in Solr?

Document
- Groups
  - Group_ID
  - Group_Name
  - ...
- Title
- Chapter
  - Chapter_Title
  - Chapter_Content


Or

Product
- Groups
  - Group_ID
  - Group_Name
  - ...
- Title
- Articles
  - Article_ID
  - Article_Color
  - Article_Size
Thanks for ideas


Re: Map Complex Datastructure with Solr

2012-08-01 Thread Jack Krupansky
The general rule is to flatten the structures. You have a choice between 
sharing common fields between tables, such as "title", or adding a 
prefix/suffix to qualify them, such as "document_title" vs. "product_title".
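
As a sketch of what flattening might look like for the product/article structure
in this thread (all field names here are hypothetical), each article becomes its
own Solr document that repeats its parent's fields:

    <!-- one flattened document per article; parent product/group fields repeated -->
    <add>
      <doc>
        <field name="group_id">G1</field>
        <field name="group_name">Shirts</field>
        <field name="product_title">tank top</field>
        <field name="article_id">TR47</field>
        <field name="article_color">red</field>
        <field name="article_size">XL</field>
      </doc>
    </add>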


You also have the choice of storing different tables in separate Solr 
cores/collections, but then you have the burden of querying them separately 
and coordinating the separate results on your own. It all depends on your 
application.


A lot hinges on:

1. How do you want to search the data?
2. How do you want to access the fields once the Solr documents have been 
identified by a query - such as fields to retrieve, join, etc.


So, once the data is indexed, what are your requirements for accessing the 
data? E.g., some sample pseudo-queries and the fields you want to access.


-- Jack Krupansky

-Original Message- 
From: Thomas Gravel

Sent: Wednesday, August 01, 2012 9:52 AM
To: solr-user@lucene.apache.org
Subject: Map Complex Datastructure with Solr

Hi,
how can I map these complex Datastructure in Solr?

Document
   - Groups
- Group_ID
- Group_Name
- .
  - Title
  - Chapter
- Chapter_Title
- Chapter_Content


Or

Product
   - Groups
- Group_ID
- Group_Name
- .
  - Title
  - Articles
- Artilce_ID
- Artilce_Color
- Artilce_Size

Thanks for ideas 



Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-01 Thread Robert Muir
On Tue, Jul 31, 2012 at 2:34 PM, roz dev rozde...@gmail.com wrote:
 Hi All

 I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing that
 when we are indexing lots of data with 16 concurrent threads, Heap grows
 continuously. It remains high and ultimately most of the stuff ends up
 being moved to Old Gen. Eventually, Old Gen also fills up and we start
 getting into excessive GC problem.

Hi: I don't claim to know anything about how Tomcat manages threads,
but really you shouldn't have all these objects.

In general snowball stemmers should be reused per-thread-per-field.
But if you have a lot of fields*threads, especially if there really is
high thread churn on tomcat, then this could be bad with snowball:
see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841

I think it would be useful to see if you can tune tomcat's threadpool
as he describes.
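
A sketch of what that could look like in Tomcat's server.xml (values are
illustrative; the idea is a bounded, long-lived pool so per-thread analyzer
state gets reused instead of churned):

    <Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
              maxThreads="200" minSpareThreads="25" maxIdleTime="60000"/>
    <Connector port="8080" protocol="HTTP/1.1" executor="tomcatThreadPool"
               connectionTimeout="20000"/>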

Separately: Snowball stemmers are currently really RAM-expensive, for
stupid reasons. Each one creates a ton of Among objects; e.g., an EnglishStemmer
today is about 8KB.

I'll regenerate these and open a JIRA issue, as the Snowball code
generator in their SVN was improved recently, and each one now takes
about 64 bytes instead (the Amongs are static and reused).

Still, this won't really solve your problem, because the analysis
chain could have other heavy parts in initialization, but it seems good to fix.

As a workaround until then, you can also just use the good old
PorterStemmer (PorterStemFilterFactory in Solr).
It's not exactly the same as using Snowball(English), but it's pretty
close and also much faster.
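
In schema.xml terms, the swap would look roughly like this (a sketch; the rest
of the analyzer chain stays as it is):

    <!-- instead of: <filter class="solr.SnowballPorterFilterFactory" language="English"/> -->
    <filter class="solr.PorterStemFilterFactory"/>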

-- 
lucidimagination.com


RE: Cloud and cores

2012-08-01 Thread Pierre GOSSÉ
It may have something to do with SOLR-3425, but I'm not that sure it fits.

I made some more tests.

Case 1: with SolrCloud
I can create a new core on one of the servers via the admin GUI or via a CREATE 
directive in the URL. The data folder is created (but no conf folder; I believe the 
zk conf is used). However, ./solr/solr.xml is not updated with the new core 
parameters.
If I restart the server, the core is lost (but the data folder is kept).

Case 2: on a single Solr server
Creation of a new core fails via the GUI with this error:
GRAVE: org.apache.solr.common.SolrException: Error executing default 
implementation of CREATE
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:396)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:141)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:359)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:175)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:351)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)

at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in 
classpath or 'solr\core2\conf/', cwd=F:\solr-4.0\Test
at 
org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:294)
at 
org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:260)
at org.apache.solr.core.Config.<init>(Config.java:111)
at org.apache.solr.core.Config.<init>(Config.java:78)
at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:117)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:742)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:391)
... 29 more

Using a URL CREATE and giving relative paths for solrconfig.xml and schema.xml 
fails later on stopwords.txt.

Again, solr/solr.xml is not updated, but the runtime exception could explain 
that in this case.
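
For reference, the kind of CoreAdmin CREATE call used here looks roughly like the
following (host, core name and paths are illustrative; the line is wrapped for
readability):

    http://localhost:8983/solr/admin/cores?action=CREATE&name=core2
        &instanceDir=core2&config=solrconfig.xml&schema=schema.xml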

Pierre
-Original Message-
From: Pierre GOSSÉ [mailto:pierre.go...@arisem.com] 
Sent: Wednesday, August 1, 2012 4:22 PM
To: solr-user@lucene.apache.org
Subject: Cloud and cores

Hi all,

I'm playing around with SolrCloud and followed indications I found at 
http://wiki.apache.org/solr/SolrCloud/

-  Started Instance 1 with embedded zk

-  Started Instances 2 3 and 4 using Instance 1 as zk server.

Everything works fine.

Then, using CoreAdmin, I add a second core in collection1 for Instance 1 and 3 
... everything is ok in the admin GUI, meaning that the graph show 2 shards of 
3 server addresses each, those 

StandardTokenizerFactory is behaving differently in Solr 3.6?

2012-08-01 Thread raonalluri
I have a field type like the following:

<fieldType name="text_general_name" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


This type behaves differently in Solr 3.3 and 3.6. In 3.3, the following
doesn't return any records, because there is no author called 'Gerri Killis'
(but there is an author called 'Gerri Jonathan'):

/select/?q=author:Gerri\ Killis

In 3.6, the same query returns records, because there is an author called
'Gerri Jonathan'. So is something wrong in 3.6? I didn't expect any records
here, because there is no author called 'Gerri Killis'.

/select/?q=author:Gerri\ Killis


Your help is appreciated.

Thanks
Srini



--
View this message in context: 
http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-is-behaving-differently-in-Solr-3-6-tp3998623.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Map Complex Datastructure with Solr

2012-08-01 Thread Thomas Gravel
Thanks for the answer.

I have to explain where the problem is...

In a shop solution you may have products and articles.
The product is the parent of all the articles...

In JSON:

{
  "product_name": "tank top",
  "article_list": [
    {
      "color": "red",
      "price": 10.99,
      "size": "XL",
      "inStore": true
    },
    {
      "color": "blue",
      "price": 15.99,
      "size": "XL",
      "inStore": false
    }
  ]
}

The problem is not the search (I think, because you can use
copyField), but the search results...

I have read about the possibility to create my own FieldTypes, but I don't know
if this is the answer to my issue...

2012/8/1 Jack Krupansky j...@basetechnology.com:
 The general rule is to flatten the structures. You have a choice between
 sharing common fields between tables, such as title, or adding a
 prefix/suffix to qualify them, such as document_title vs. product_title.

 You also have the choice of storing different tables in separate Solr
 cores/collections, but then you have the burden of querying them separately
 and coordinating the separate results on your own. It all depends on your
 application.

 A lot hinges on:

 1. How do you want to search the data?
 2. How do you want to access the fields once the Solr documents have been
 identified by a query - such as fields to retrieve, join, etc.

 So, once the data is indexed, what are your requirements for accessing the
 data? E.g., some sample pseudo-queries and the fields you want to access.

 -- Jack Krupansky

 -Original Message- From: Thomas Gravel
 Sent: Wednesday, August 01, 2012 9:52 AM
 To: solr-user@lucene.apache.org
 Subject: Map Complex Datastructure with Solr


 Hi,
 how can I map these complex Datastructure in Solr?

 Document
- Groups
 - Group_ID
 - Group_Name
 - .
   - Title
   - Chapter
 - Chapter_Title
 - Chapter_Content


 Or

 Product
- Groups
 - Group_ID
 - Group_Name
 - .
   - Title
   - Articles
 - Artilce_ID
 - Artilce_Color
 - Artilce_Size

 Thanks for ideas


Re: Map Complex Datastructure with Solr

2012-08-01 Thread Alexandre Rafalovitch
Sorry, that did not explain the problem, just gave more info about the data
layout. What are you actually trying to get out of SOLR?

Are you saying you want the parent's details repeated in every entry? Or are
you saying that you want to be able to find entries and, from there,
be able to find the specific parent?

Whatever you do, SOLR will return you a list of flat entries plus some
statistics on occurrences and facets. Given that, what would you like
to see?

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Aug 1, 2012 at 12:33 PM, Thomas Gravel thomas.gra...@gmail.com wrote:
 Thanks for the answer.

 Ich have to explain, where the problem is...

 you may have at the shop solutions products and articles.
 The product is the parent of all articles...

 in json...

 {
product_name: tank top,
article_list: [
  {
   color: red,
   price: 10.99,
   size: XL,
   inStore: true
  },
  {
   color: blue,
   price: 15.99,
   size: XL,
   inStore: false
  }
]
 }

 the problem is not the search (i think, because you can use
 copyField), but the searchresults...

 I have read the possibility to create own FieldTypes, but I don't know
 if this is the answer of my issues...

 2012/8/1 Jack Krupansky j...@basetechnology.com:
 The general rule is to flatten the structures. You have a choice between
 sharing common fields between tables, such as title, or adding a
 prefix/suffix to qualify them, such as document_title vs. product_title.

 You also have the choice of storing different tables in separate Solr
 cores/collections, but then you have the burden of querying them separately
 and coordinating the separate results on your own. It all depends on your
 application.

 A lot hinges on:

 1. How do you want to search the data?
 2. How do you want to access the fields once the Solr documents have been
 identified by a query - such as fields to retrieve, join, etc.

 So, once the data is indexed, what are your requirements for accessing the
 data? E.g., some sample pseudo-queries and the fields you want to access.

 -- Jack Krupansky

 -Original Message- From: Thomas Gravel
 Sent: Wednesday, August 01, 2012 9:52 AM
 To: solr-user@lucene.apache.org
 Subject: Map Complex Datastructure with Solr


 Hi,
 how can I map these complex Datastructure in Solr?

 Document
- Groups
 - Group_ID
 - Group_Name
 - .
   - Title
   - Chapter
 - Chapter_Title
 - Chapter_Content


 Or

 Product
- Groups
 - Group_ID
 - Group_Name
 - .
   - Title
   - Articles
 - Artilce_ID
 - Artilce_Color
 - Artilce_Size

 Thanks for ideas


Re: Map Complex Datastructure with Solr

2012-08-01 Thread Thomas Gravel
Hm, OK, I think I have to write out my example data and the queries I want
to make, plus the response I expect...

Data:

{
  "product_id": "xyz76",
  "product_name": "tank top",
  "brand": "adidas",
  "description": "this is the long description of the product",
  "short_description": "this is the short description of the product",
  "product_image": "/images/tanktop.jpg",
  "product_image": "/images/tanktop2.jpg",
  "article_list": [
    {
      "article_number": "TR47",
      "color": "red",
      "price": 10.99,
      "size": "XL",
      "unit": "1 piece",
      "inStore": true
    },
    {
      "article_number": "TR48",
      "color": "blue",
      "price": 15.99,
      "size": "XL",
      "unit": "1 piece",
      "inStore": false
    }
  ]
}

I want to search on:
- article_number (i.e. with inStore = true)
- color
- description
- short_description
- product_name

Facets:
- brand
- color
- size
- price

Example query-response:
{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": {
      "indent": "on",
      "start": "0",
      "q": "IBProductName:Durch*",
      "wt": "json",
      "version": "2.2",
      "rows": "10"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "product_id": "xyz76",
        "product_name": "tank top",
        "brand": "adidas",
        "description": "this is the long description of the product",
        "short_description": "this is the short description of the product",
        "product_image": "/images/tanktop.jpg",
        "product_image": "/images/tanktop2.jpg",
        "article_list": [
          {
            "color": "red",
            "price": 10.99,
            "size": "XL",
            "unit": "1 piece",
            "inStore": true
          },
          {
            "color": "blue",
            "price": 15.99,
            "size": "XL",
            "unit": "1 piece",
            "inStore": false
          }
        ]
      }
    ]
  }
}


2012/8/1 Alexandre Rafalovitch arafa...@gmail.com:
 Sorry, that did not explain the problem, just more info about data
 layout. What are you actually trying to get out of SOLR?

 Are you saying you want parent's details repeated in every entry? Are
 you saying that you want to be able to find entries and from there,
 being able to find specific parent.

 Whatever you do, SOLR will return you a list of flat entries plus some
 statistics on occurrences and facets. Given that, what would you like
 to see?

 Regards,
Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all
 at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Wed, Aug 1, 2012 at 12:33 PM, Thomas Gravel thomas.gra...@gmail.com 
 wrote:
 Thanks for the answer.

 Ich have to explain, where the problem is...

 you may have at the shop solutions products and articles.
 The product is the parent of all articles...

 in json...

 {
product_name: tank top,
article_list: [
  {
   color: red,
   price: 10.99,
   size: XL,
   inStore: true
  },
  {
   color: blue,
   price: 15.99,
   size: XL,
   inStore: false
  }
]
 }

 the problem is not the search (i think, because you can use
 copyField), but the searchresults...

 I have read the possibility to create own FieldTypes, but I don't know
 if this is the answer of my issues...

 2012/8/1 Jack Krupansky j...@basetechnology.com:
 The general rule is to flatten the structures. You have a choice between
 sharing common fields between tables, such as title, or adding a
 prefix/suffix to qualify them, such as document_title vs. product_title.

 You also have the choice of storing different tables in separate Solr
 cores/collections, but then you have the burden of querying them separately
 and coordinating the separate results on your own. It all depends on your
 application.

 A lot hinges on:

 1. How do you want to search the data?
 2. How do you want to access the fields once the Solr documents have been
 identified by a query - such as fields to retrieve, join, etc.

 So, once the data is indexed, what are your requirements for accessing the
 data? E.g., some sample pseudo-queries and the fields you want to access.

 -- Jack Krupansky

 -Original Message- From: Thomas Gravel
 Sent: Wednesday, August 01, 2012 9:52 AM
 To: solr-user@lucene.apache.org
 Subject: Map Complex Datastructure with Solr


 Hi,
 how can I map these complex Datastructure in Solr?

 Document
- Groups
 - Group_ID
 - Group_Name
 - 

Exact match on few fields, fuzzy on others

2012-08-01 Thread Pranav Prakash
Hi Folks,

I am using Solr 3.4 and my document schema has the attributes title,
transcript, and author_name. Presently, I am using DisMax to search for a user
query across transcript. I would also like to do an exact search on
author_name, so that for the query "Albert Einstein" I would want to get all
the documents which contain Albert or Einstein in transcript, and also those
documents which have author_name exactly equal to 'Albert Einstein'.

Can we do this with the dismax query parser? The schemas for both fields are
below:

<fieldType name="text_commongrams" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt"
            ignoreCase="true"
            expand="true"/>
    <filter class="solr.CommonGramsFilterFactory"
            words="stopwords_en.txt"
            ignoreCase="true"/>
    <filter class="solr.StopFilterFactory"
            words="stopwords_en.txt"
            ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            preserveOriginal="1"/>
  </analyzer>
</fieldType>
<fieldType name="text_standard" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            words="stopwords_en.txt"
            ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            preserveOriginal="1"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt"
            ignoreCase="true"
            expand="false"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>

<field name="title" type="text_commongrams" indexed="true" stored="true"
       multiValued="false"/>
<field name="author_name" type="text_standard" indexed="true" stored="false"/>


--
*Pranav Prakash*

temet nosce


4.0 Strange Commit/Replication Issue

2012-08-01 Thread Briggs Thompson
Hello all,

I am running 4.0 alpha and have encountered something I am unable to
explain. I am indexing content to a master server, and the data is
replicating to a slave. The odd part is that when searching through the UI,
no documents show up on master with a standard *:* query. All cache types
are set to zero. I know indexing is working because I am watching the logs
and I can see documents getting added, not to mention the data is written
to the filesystem. I have autocommit set to 60000 ms (1 minute), so it isn't a
commit issue.

The very strange part is that the slave is correctly replicating the data,
and it is searchable in the UI on the slave (but not master). I don't
understand how/why the data is visible on the slave and not visible on the
master. Does anyone have any thoughts on this or seen it before?

Thanks in advance!
Briggs


Solr spellcheck for words with quotes

2012-08-01 Thread Shri Kanish
Hi,
I use Solr as the search engine for our application. We have a title "Pandora's 
Star". When I give a query such as 
http://localhost:8983/solr/select?q=pandora's star&spellcheck=true&spellcheck.collate=true
 
I get the response below:

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="pandora">
      <int name="numFound">1</int>
      <int name="startOffset">10</int>
      <int name="endOffset">17</int>
      <arr name="suggestion">
        <str>pandora's</str>
      </arr>
    </lst>
    <str name="collation">text_engb:pandora's's star</str>
  </lst>
</lst>
 
The word goes in as "pandora" and not as "pandora's", and an additional 's is 
appended to the collation result. Below is my configuration for spellcheck:
 

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100"
           omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_selma.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_selma.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
 
Please suggest
 
Thanks,
Shri

Re: 4.0 Strange Commit/Replication Issue

2012-08-01 Thread Tomás Fernández Löbbe
Could your autocommit in the master be using openSearcher=false? If you
go to the Master admin, do you see that the searcher has all the segments
that you see in the filesystem?
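
For reference, a solrconfig.xml autocommit that also opens a new searcher (so
committed documents become visible on the master) might look roughly like this
sketch, inside the updateHandler section:

    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>true</openSearcher>
    </autoCommit>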



On Wed, Aug 1, 2012 at 4:24 PM, Briggs Thompson w.briggs.thomp...@gmail.com
 wrote:

 Hello all,

 I am running 4.0 alpha and have encountered something I am unable to
 explain. I am indexing content to a master server, and the data is
 replicating to a slave. The odd part is that when searching through the UI,
 no documents show up on master with a standard *:* query. All cache types
 are set to zero. I know indexing is working because I am watching the logs
 and I can see documents getting added, not to mention the data is written
 to the filesystem. I have autocommit set to 6 (1 minute) so it isn't a
 commit issue.

 The very strange part is that the slave is correctly replicating the data,
 and it is searchable in the UI on the slave (but not master). I don't
 understand how/why the data is visible on the slave and not visible on the
 master. Does anyone have any thoughts on this or seen it before?

 Thanks in advance!
 Briggs



Re: StandardTokenizerFactory is behaving differently in Solr 3.6?

2012-08-01 Thread raonalluri
I noticed that the escape character in the query is getting ignored in Solr
3.6.

For the following, 3.3 gives results where 'Featuring Chimp' is matched. But
in 3.6, it gives results where Featuring or Chimp or Featuring Chimp is
matched. Any idea what the difference between my 3.3 and 3.6 environments is
that causes these inconsistent results?

/select/?q=title:Featuring\ Chimp



--
View this message in context: 
http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-is-behaving-differently-in-Solr-3-6-tp3998623p3998665.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: 4.0 Strange Commit/Replication Issue

2012-08-01 Thread Briggs Thompson
That is the problem. I wasn't aware of that new feature in 4.0. Thanks for
the quick response Tomás.

-Briggs

On Wed, Aug 1, 2012 at 3:08 PM, Tomás Fernández Löbbe tomasflo...@gmail.com
 wrote:

 Could your autocommit in the master be using openSearcher=false? If you
 go to the Master admin, do you see that the searcher has all the segments
 that you see in the filesystem?



 On Wed, Aug 1, 2012 at 4:24 PM, Briggs Thompson 
 w.briggs.thomp...@gmail.com
  wrote:

  Hello all,
 
  I am running 4.0 alpha and have encountered something I am unable to
  explain. I am indexing content to a master server, and the data is
  replicating to a slave. The odd part is that when searching through the
 UI,
  no documents show up on master with a standard *:* query. All cache types
  are set to zero. I know indexing is working because I am watching the
 logs
  and I can see documents getting added, not to mention the data is written
  to the filesystem. I have autocommit set to 6 (1 minute) so it isn't
 a
  commit issue.
 
  The very strange part is that the slave is correctly replicating the
 data,
  and it is searchable in the UI on the slave (but not master). I don't
  understand how/why the data is visible on the slave and not visible on
 the
  master. Does anyone have any thoughts on this or seen it before?
 
  Thanks in advance!
  Briggs
 



Re: StandardTokenizerFactory is behaving differently in Solr 3.6?

2012-08-01 Thread Jack Krupansky

Which query parser do you have set in your request handler?

There was a problem with edismax in 3.6 with the WordDelimiterFilter that 
sounds exactly like your symptom. The workaround is to enclose the term in 
quotes (to make it a phrase); otherwise the terms get ORed rather 
than ANDed.
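
That is, something along the lines of (query shown unencoded for readability):

    /select/?q=title:"Featuring Chimp"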


-- Jack Krupansky

-Original Message- 
From: raonalluri

Sent: Wednesday, August 01, 2012 3:25 PM
To: solr-user@lucene.apache.org
Subject: Re: StandardTokenizerFactory is behaving differently in Solr 3.6?

I noticed, escape character which is in the query, is getting ignored in 
solr

3.6.

For the following 3.3 gives results where 'Featuring Chimp' is matched. But
in 3.6, it gives results where Featuring or Chimp or Featuring Chimp is
matched. Any idea what is the difference between my 3.3 and 3.6 environments
for this inconsistent results?

/select/?q=title:Featuring\ Chimp



--
View this message in context: 
http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-is-behaving-differently-in-Solr-3-6-tp3998623p3998665.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: StandardTokenizerFactory is behaving differently in Solr 3.6?

2012-08-01 Thread raonalluri
Jack, thanks a lot for your reply. We are using the LuceneQParser query parser. I
agree that if I phrase the string by adding double quotes, I am good.

But I am checking whether there is any fix for this without changing the query.
As we are in a production environment, we would need to change the queries in
different places.

Can we escape this issue by changing the query parser?

regards
Srini



--
View this message in context: 
http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-is-behaving-differently-in-Solr-3-6-tp3998623p3998677.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: StandardTokenizerFactory is behaving differently in Solr 3.6?

2012-08-01 Thread Jack Krupansky
This may simply be a matter of changing the default query operator from OR 
to AND. Try adding q.op=AND to your request.
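
For example, keeping the existing query and only appending the parameter (shown
unencoded for readability):

    /select/?q=title:Featuring\ Chimp&q.op=AND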


-- Jack Krupansky

-Original Message- 
From: raonalluri

Sent: Wednesday, August 01, 2012 4:26 PM
To: solr-user@lucene.apache.org
Subject: Re: StandardTokenizerFactory is behaving differently in Solr 3.6?

Jack, thanks a lot for your reply. We are using LuceneQParser query parser. 
I

agree, if I phrase the string by adding double quotes, I am good.

But I am checking if there is any fix for this without changing the query.
As we are in production environment, we need to change the quries in
different places.

Can we escape from this issue by change the query parser?

regards
Srini



--
View this message in context: 
http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-is-behaving-differently-in-Solr-3-6-tp3998623p3998677.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Exact match on few fields, fuzzy on others

2012-08-01 Thread Jack Krupansky
Try edismax with the pf2 option, which will automatically boost documents 
that contain occurrences of adjacent terms, as you have suggested.


See:
http://wiki.apache.org/solr/ExtendedDisMax
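
A rough sketch of such a request, using the field names from your question (the
boost value is illustrative, and spaces are shown unencoded):

    defType=edismax&q=Albert Einstein&qf=transcript&pf2=author_name^10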

-- Jack Krupansky

-Original Message- 
From: Pranav Prakash

Sent: Wednesday, August 01, 2012 1:21 PM
To: solr-user@lucene.apache.org
Subject: Exact match on few fields, fuzzy on others

Hi Folks,

I am using Solr 3.4 and my document schema has attributes - title,
transcript, author_name. Presently, I am using DisMax to search for a user
query across transcript. I would also like to do an exact search on
author_name so that for a query Albert Einstein, I would want to get all
the documents which contain Albert or Einstein in transcript and also those
documents which have author_name exactly as 'Albert Einstein'.

Can we do this by dismax query parser? The schema for both the fields are
below:

<fieldType name="text_commongrams" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt"
            ignoreCase="true"
            expand="true"/>
    <filter class="solr.CommonGramsFilterFactory"
            words="stopwords_en.txt"
            ignoreCase="true"/>
    <filter class="solr.StopFilterFactory"
            words="stopwords_en.txt"
            ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            preserveOriginal="1"/>
  </analyzer>
</fieldType>
<fieldType name="text_standard" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"
            words="stopwords_en.txt"
            ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            preserveOriginal="1"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt"
            ignoreCase="true"
            expand="false"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>

<field name="title" type="text_commongrams" indexed="true" stored="true"
       multiValued="false"/>
<field name="author_name" type="text_standard" indexed="true" stored="false"/>


--
*Pranav Prakash*

temet nosce 



Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter

2012-08-01 Thread roz dev
Thanks, Robert, for these inputs.

Since we do not really need the Snowball analyzer for this field, we will not use
it for now. If this still does not address our issue, we will tweak the thread
pool as per eks dev's suggestion - I am a bit hesitant to make this change yet, as
we would be reducing the thread pool, which could adversely impact our throughput.

If the Snowball filter is being optimized for Solr 4 beta, that would be
great for us. If you have already filed a JIRA issue for this, please let me
know and I will follow it.

Thanks again
Saroj





On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir rcm...@gmail.com wrote:

 On Tue, Jul 31, 2012 at 2:34 PM, roz dev rozde...@gmail.com wrote:
  Hi All
 
  I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing
 that
  when we are indexing lots of data with 16 concurrent threads, Heap grows
  continuously. It remains high and ultimately most of the stuff ends up
  being moved to Old Gen. Eventually, Old Gen also fills up and we start
  getting into excessive GC problem.

 Hi: I don't claim to know anything about how tomcat manages threads,
 but really you shouldnt have all these objects.

 In general snowball stemmers should be reused per-thread-per-field.
 But if you have a lot of fields*threads, especially if there really is
 high thread churn on tomcat, then this could be bad with snowball:
 see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841

 I think it would be useful to see if you can tune tomcat's threadpool
 as he describes.

 separately: Snowball stemmers are currently really ram-expensive for
 stupid reasons.
 each one creates a ton of Among objects, e.g. an EnglishStemmer today
 is about 8KB.

 I'll regenerate these and open a JIRA issue: as the snowball code
 generator in their svn was improved
 recently and each one now takes about 64 bytes instead (the Among's
 are static and reused).

 Still this wont really solve your problem, because the analysis
 chain could have other heavy parts
 in initialization, but it seems good to fix.

 As a workaround until then you can also just use the good old
 PorterStemmer (PorterStemFilterFactory in solr).
 Its not exactly the same as using Snowball(English) but its pretty
 close and also much faster.

 --
 lucidimagination.com